[FFmpeg-devel] [BUG] UTF-8 decoder vulnerable to character spoofing attacks
Loren Merritt
lorenm
Tue Oct 23 05:44:40 CEST 2007
On Mon, 22 Oct 2007, Rich Felker wrote:
> On Mon, Oct 22, 2007 at 07:15:35PM +0200, Reimar Döffinger wrote:
>
>> Well, I always thought those bugs are due to extremely bad practices in
>> checking data. At least I always considered UTF-8 as a method of
>> compressing 32 bit data.
>
> UTF-8 is not a compression algorithm. It's a character encoding. This
> is like the first FAQ (or rather frequent pitfall) about UTF-8.
What is the distinction? As a multimedia developer, "compression" and
"encoding" mean the same to me.
Unicode is a set of symbols (in the information theory sense, not to be
confused with "glyphs"), with a meaning attached to each. UTF-8 is an
assignment of a bitstring (between 1 and 4 bytes) to each symbol. That
sounds like the very definition of Variable Length Coding, i.e.
compression.
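To make the variable-length point concrete, here is a minimal sketch (not from the original mail) of how many bytes UTF-8 assigns to a code point. The function name utf8_length is hypothetical; the range boundaries are the standard ones, and the loop cross-checks them against Python's built-in encoder.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one Unicode scalar value.

    utf8_length is an illustrative name, not part of any library.
    """
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:       # 7 payload bits  -> 1 byte
        return 1
    if code_point < 0x800:      # 11 payload bits -> 2 bytes
        return 2
    if code_point < 0x10000:    # 16 payload bits -> 3 bytes
        return 3
    return 4                    # up to U+10FFFF  -> 4 bytes


# Cross-check against Python's own UTF-8 encoder.
for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```

Low code points get short codes and high ones get long codes, which is the shape of a variable-length code, although the lengths are fixed by range rather than derived from symbol frequencies.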
UTF-8 has additional goals, like mapping each character string to a unique
bit string. But that has nothing to do with whether it compresses or not.
For example, FFV1 also leaves no decisions up to the encoder (beyond a
couple of variants selected in the header).
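The uniqueness goal is what the subject of this thread turns on: a decoder that accepts non-shortest ("overlong") byte sequences lets several bit strings decode to the same character. Below is a minimal sketch of a decoder for a single sequence that enforces the shortest form. It is an illustration only, not FFmpeg's decoder, and decode_one is a hypothetical name.

```python
def decode_one(data: bytes) -> int:
    """Decode exactly one UTF-8 sequence, rejecting non-shortest forms.

    decode_one is an illustrative helper, not an FFmpeg or library API.
    """
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    # For each sequence length: payload bits of the lead byte and the
    # smallest code point that genuinely needs that many bytes.
    if lead < 0x80:
        length, value, minimum = 1, lead, 0
    elif 0xC0 <= lead < 0xE0:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead < 0xF0:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead < 0xF8:
        length, value, minimum = 4, lead & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong sequence length")
    for byte in data[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    if value < minimum:
        # A shorter sequence could have carried this value.
        raise ValueError("overlong encoding")
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return value


print(hex(decode_one(b"/")))        # the one-byte form of U+002F
try:
    decode_one(b"\xc0\xaf")         # two-byte form carrying the same value
except ValueError as err:
    print("rejected:", err)
```

Without the minimum check, both b"/" and b"\xc0\xaf" would decode to U+002F, so a filter that only looks for the byte 0x2F could be bypassed.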
--Loren Merritt