[FFmpeg-devel] [PATCH] avutil: add av_memcpy() to avoid undefined behavior with NULL, NULL, 0

Sun Jul 7 20:13:42 EEST 2019

sön 2019-07-07 klockan 18:19 +0200 skrev Michael Niedermayer:
> On Sat, Jul 06, 2019 at 11:22:59PM +0200, Tomas Härdin wrote:
> > lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer:
> > > On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> > > > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > > > > As we are already off topic, heres an example to test static
> > > > > analysis, does this trigger undefined behavior by executing the
> > > > > memcpy
> > > > > for some user input ?
> > > > > 
> > > > > void f(unsigned bigint a) {
> > > > >     bigint i;
> > > > >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> > > > >         ;
> > > > >     if (a > 20 && i > ((bigint)1 << a))
> > > > >         memcpy(NULL, NULL, 0);
> > > > > }
> > > > > 
> > > > > i know its a lame example but just to remind that static analysis
> > > > > has
> > > > > limitations. (your mail sounds a bit like static analysis could
> > > > > replace
> > > > > everything ...)
> > > > 
> > > > That is acually perfectly legal since the intersection between
> > > > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> > > > Here's an example that validates:
> > > > 
> > > > #include <stdint.h>
> > > > #include <string.h>
> > > > 
> > > > /*@ axiomatic Foo {
> > > >     axiom Bar: \forall integer a;
> > > >       0 <= a <= 62 ==>
> > > >         1 <= (1<<a) <= (1<<62);
> > > >     }
> > > >  */
> > > > 
> > > > /*@ requires 0 <= a <= 62;
> > > >     assigns ((char*)NULL)[0..-1];
> > > >  */
> > > > void f(uint64_t a) {
> > > >     int64_t i = 2;
> > > >     int64_t a2 = (1LL << a) + 1;
> > > >     /*@ loop invariant 2 <= i <= a2;
> > > >         loop assigns i;
> > > >      */
> > > >     for (; a2 % i; i++)
> > > >         ;
> > > >     //@ assert a2 % i == 0;
> > > >     if (a > 20 && i > ((int64_t)1 << a))
> > > >         memcpy(NULL, NULL, 0);
> > > > }
> > > 
> > > this code is wrong.
> > > Imagine this was real code, and to make it fit in the static
> > > analyzer one changes it like this
> > > 
> > > why is it worng ?
> > > the range should not have a upper bound of 62 in fact there is no
> > > reason to run this function with input below 1LL<<33 that is not
> > > 33 that is 1LL<<33
> > 
> > All bignum implementations I've seen have upper bounds. 
> 
> How can this help ?
> 
> assume we proof avutil is free of UB, then we update some external lib
> or change to a platform with a larger INT_MAX and boom, the proof is
> no longer valid 
> (this could happen if such implementation limits are used in a proof)

Yes, you have to target the prover to every architecture you intend to
support. I ran into this exact issue when verifying some embedded code
where int is 16-bit. Adding the option -machdep x86_16 was enough to
fix that. I've seen posts on the Frama-C ML about adding support for
other platforms, such as the 128-bit mode in RISC-V.

You also can't verify external libraries of course. But FFmpeg is
notoriously NIH:y, so that wouldn't be an issue for a very long  time.
But, you can also augment function prototypes with guesstimated
contracts. This is analogous to reading headers and forming your own
mental model about how a given library works.

/Tomas