[FFmpeg-devel] [BUG] UTF-8 decoder vulnerable to character spoofing attacks

Rich Felker dalias
Tue Oct 23 01:46:44 CEST 2007


On Mon, Oct 22, 2007 at 07:15:35PM +0200, Reimar Döffinger wrote:
> > > i would first like to understand under what circumstances the current code
> > > is causing a real problem (security or normal bug)
> > 
> > It could actually cause a crash, e.g. if a string sits right at the top of
> > the heap and contains nothing but 0xff bytes, then the UTF-8 decoder will
> > read 8 bytes and crash.
> 
> No, the comment says quite clearly that GET_UTF8 may read up to 7
> bytes. It is a usage error to use it when it is not possible to read at least
> an additional 7 bytes.

OK, sorry, I missed that. But it doesn't make the code any more
correct, just not subject to crashing...
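
To make the hazard concrete, here is a minimal sketch of the kind of lax
decoding being discussed (hypothetical code, not the actual GET_UTF8
macro): it derives the sequence length from the leading 1 bits of the lead
byte and then reads that many continuation bytes unconditionally, so a
buffer ending in 0xff gets read 7 bytes past its end unless the caller
provides padding.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical lax decoder, for illustration only (not FFmpeg's GET_UTF8):
 * the sequence length comes from the count of leading 1 bits in the lead
 * byte, and the continuation bytes are read without any bounds check. */
static uint32_t lax_utf8_decode(const uint8_t *p, size_t *consumed)
{
    uint32_t lead = p[0];
    int ones = 0;

    while (ones < 8 && (lead & (0x80u >> ones)))  /* count leading 1 bits */
        ones++;

    int extra = ones ? ones - 1 : 0;              /* 0xff -> 7 extra bytes */
    uint32_t val = ones ? lead & (0xffu >> (ones + 1)) : lead;

    for (int i = 1; i <= extra; i++)
        val = (val << 6) | (p[i] & 0x3f);         /* reads past the buffer
                                                     if it ends with 0xff
                                                     and has no padding   */
    *consumed = 1 + extra;
    return val;
}

The point of the sketch is only that the number of reads is controlled by
the lead byte; documenting a padding requirement does not change the fact
that the data itself is never validated.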

> > As I said, it will also incorrectly decode illegal aliases for
> > characters rather than signalling an error. I don't think this will lead
> > to vulns in the current code (since the UTF-8 decoder is hardly used
> > anyway), but it's bad to have buggy code that could cause problems
> > when someone uses it in the future. Even if not, it teaches people who
> > read it extremely bad practices.
> 
> Well, I always thought those bugs were due to extremely bad practices in
> checking data. At least I have always considered UTF-8 a method of
> compressing 32-bit data.

UTF-8 is not a compression algorithm; it's a character encoding. This is
practically the first FAQ entry (or rather, the most frequent pitfall)
about UTF-8.
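
The security consequence of treating it as mere bit-packing is exactly the
aliasing in the subject line: an overlong sequence such as 0xC0 0xAF packs
the same value as the single byte 0x2F ('/'), so a filter that inspects the
raw bytes before a lax decoder runs can be bypassed. A conforming decoder
has to reject such sequences; as a sketch (hypothetical helper, not FFmpeg
code), the check is a comparison against the minimum code point each
sequence length is allowed to encode:

#include <stdint.h>

/* Hypothetical strictness check (not FFmpeg code): after assembling a code
 * point from a seq_len-byte sequence, reject it if it could have been
 * written in fewer bytes.  0xC0 0xAF assembles to 0x2F, which is below the
 * 2-byte minimum of 0x80, so it must be treated as an error, not as '/'. */
static int utf8_is_overlong(uint32_t val, int seq_len)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    return seq_len >= 2 && seq_len <= 4 && val < min_cp[seq_len];
}

This check (together with rejecting surrogates and values above 0x10FFFF,
per RFC 3629) is what separates a UTF-8 decoder from a bit unpacker.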

> We would rightly call anyone who does security
> checking of zlib-compressed data via strchr or the like an idiot,

This is completely unrelated. You are not dealing with compression.
You are dealing with text that must conform to a rigid specification
for many EXTREMELY good reasons. Failing to understand those reasons
is not a justification for breaking the spec any more than failing to
understand the requirements of NUT would be a justification for
generating invalid NUT streams.

Rich
