[FFmpeg-devel] [PATCH 2/7] decode: add a method for attaching lavc-internal data to frames

Fri Oct 13 21:03:12 EEST 2017

On Fri, Oct 06, 2017 at 12:38:44AM +0100, Mark Thompson wrote:
> On 05/10/17 23:59, Mark Thompson wrote:
> > On 05/10/17 23:01, Michael Niedermayer wrote:
> >> On Thu, Oct 05, 2017 at 09:03:40PM +0100, Mark Thompson wrote:
> >>> On 05/10/17 17:47, Michael Niedermayer wrote:
> >>>> On Wed, Oct 04, 2017 at 02:04:54PM +0200, wm4 wrote:
> >>>>> On Wed, 4 Oct 2017 13:37:31 +0200
> >>>>> Tobias Rapp <t.rapp at noa-archive.com> wrote:
> >>>>>
> >>>>>> On 04.10.2017 11:34, wm4 wrote:
> >>>>>>> On Wed, 4 Oct 2017 11:22:37 +0200
> >>>>>>> Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>>>>>>   
> >>>>>>>> On Wed, Oct 04, 2017 at 09:12:29AM +0200, wm4 wrote:  
> >>>>>>>>> On Tue, 3 Oct 2017 21:40:58 +0200
> >>>>>>>>> Michael Niedermayer <michael at niedermayer.cc> wrote:
> >>>>>>>>>      
> >>>>>>>>>> On Tue, Oct 03, 2017 at 03:15:13PM +0200, wm4 wrote:  
> >>>>>>>>>>> From: Anton Khirnov <anton at khirnov.net>
> >>>>>>>>>>>        
> >>>>>>>>>>      
> >>>>>>>>>>> Use the AVFrame.opaque_ref field. The original user's opaque_ref is
> >>>>>>>>>>> wrapped in the lavc struct and then unwrapped before the frame is
> >>>>>>>>>>> returned to the caller.  
> >>>>>>>>>>
> >>>>>>>>>> this is a ugly hack
> >>>>>>>>>>
> >>>>>>>>>> one and the same field should not be used to hold both the
> >>>>>>>>>> users opaque_ref as well as a structure which is itself not a user
> >>>>>>>>>> opaque_ref  
> >>>>>>>>>
> >>>>>>>>> While the AVFrame is within libavcodec, it's libavcodec's frame, not
> >>>>>>>>> the user's. Thus your claim doesn't make too much sense. libavcodec
> >>>>>>>>> fully controls the meaning for its own AVFrames' opaque_ref, but
> >>>>>>>>> reconstruct the user's value when returning it.  
> >>>>>>>>
> >>>>>>>> i disagree  
> >>>>>>>
> >>>>>>> Well, you're wrong anyway.
> >>>>>>>   
> >>>>>>>> such hacks should not be added, we do have enough hacks already  
> >>>>>>>
> >>>>>>> It's not a hack.  
> >>>>>>
> >>>>>> Changing the semantics of a field during its lifetime, even when only 
> >>>>>> done within private code, is at least unexpected behavior.
> >>>>>
> >>>>> That's not done.
> >>>>
> >>>> The semantics are defined by the docs, which state:
> >>>> "AVBufferRef for free use by the API user."
> >>>>
> >>>> And before the patch this is true, all instances of this field are
> >>>> controled by the user application and are consistent.
> >>>>
> >>>> after the patch the AVFrames used by a codec have their opaque_ref
> >>>> replaced by a wraped structure relative to what the outside of this
> >>>> codec has.
> >>>>
> >>>>
> >>>>> Conceptually the AVFrame with the changed behavior is
> >>>>> a new reference. Internally, AVFrame.opaque_ref will always have the
> >>>>> same semantics, pointing to the FrameDecodeData struct. Only at points
> >>>>> where the AVFrame ref is converted to/from the user struct this is
> >>>>> changed.
> >>>>>
> >>>>>>> This is done strictly when returning a valid AVFrame, so there is no
> >>>>>>> room for mistakes.  
> >>>>>>
> >>>>>> The room for mistake might not increase for external developers but it 
> >>>>>> increases for internal developers (maintenance cost).
> >>>>>
> >>>>> Like where? There are only 2 places where the code needs to deal with
> >>>>> it, and these are in shared code, not individual decoders.
> >>>>
> >>>> just greping for AVFrame in the headers shows callbacks, a direct
> >>>> pointer to a AVFrame and the API functions that interface the codec
> >>>>
> >>>> Just thinking of a codec that instanciates another codec and how
> >>>> exactly the callback which may originate from the user or the outer
> >>>> codec would unwrap the potential nested wraping.
> >>>> I really dont think we want this in FFmpeg
> >>>> And this is just one example ...
> >>>
> >>
> >>> I don't understand this discussion.
> >>
> >> yes, i realize this, but iam not sure why
> >>
> >> its pretty simple and clear but you seem to skip over parts of what
> >> i said.
> >>
> >>
> >>>
> >>> As far as I can tell, the sequence is this:
> >>>
> >>> * libavcodec allocates an AVFrame structure.
> >>> * libavcodec calls get_buffer2 with that AVFrame structure; the user fills its fields.
> >>> * libavcodec extracts the buffer references and metadata from the AVFrame, and maybe frees it (some codecs reuse a single AVFrame for the lifetime of the codec, others allocate them each time).
> >>> ~ decoding happens, the buffers are written to ~
> >>> * The user allocates a new AVFrame structure.
> >>> * The user calls receive_frame with that new AVFrame structure; libavcodec its fields with references to the decoded data and associated metadata.
> >>> * The user can then read the buffers of the frame, along with its metadata.
> >>>
> >>> Why would it matter what happens in the middle?  The AVFrame structure at the end is not the AVFrame structure at the start, and the user can't assume anything about it at all - if they try to dereference a pointer to the AVFrame supplied by libavcodec for get_buffer2 after get_buffer2 has returned then they are definitely invoking undefined behaviour.
> >>
> >> First this does not consider nested codecs.
> >> A decoder can instanciate a internal decoder, if that now uses the
> >> user callback it would have to unwrap twice for it if it used a
> >> callback from the outer decoder it has to unwrap once.
> >> This of course also depends on how it instanciates the decoder, that
> >> is at which API level.
> > 
> > All decoders pass back to their callers the opaque reference they received, so I don't think nesting is relevant.
> > 
> >> Next, if you look at the API and search for AVFrame you will find
> >> that there are more callbacks than get_buffer2() that use AVFrames.
> >> there is for example also
> >>  void (*draw_horiz_band)(struct AVCodecContext *s,
> >>                             const AVFrame *src, int offset[AV_NUM_DATA_POINTERS],
> >>                             int y, int type, int height);
> >>
> >> Which too has a AVFrame, which too must be unwraped and in a way specific
> >> on the nesting depth of the decoder.
> > 
> > Hmm, ok, I didn't know about this one, and it was never mentioned before.
> > 
> >> and then there is
> >> AVFrame *coded_frame;
> >> in AVCodecContext
> >>
> >> Which can be accessed by the user as well as codec.
> > 
> > And immediately above it:
> > 
> >      * - decoding: unused
> > 
> >> Iam pretty sure if i look more ill find more examples.> 
> >> And future API changes would all have to consider this wraping and
> >> unwraping.
> >>
> >> And then all above is just about decoders, there are encoders too
> >> some share code with decoders and at the same time encoders take
> >> AVFrames from the user. Does the shared code expect wraped opaque_ref
> >> or not?
> >> All this adds alot of complexity to maintaining the code and it would
> >> add many bugs
> >>
> >> The opaque_ref wraping is a really bad design. Iam not sure why
> >> people defend it.
> > 
> > I disagree with your implication here.  I was under the impression that decoders have exactly one entry and one exit point for frames (get_buffer2 and receive_frame): since the opaque reference is only considered at those two points, there is no additional complexity anywhere and no other code need care about it.  However, apparently the API abstraction has additional holes in it which I was not aware of (the draw_horiz_band callback you have noted above), which seems to invalidate that argument.
> > 

> > Are there any other holes in the decoding or encoding abstractions through which frames are able to leak?  It would be helpful to have a list of these somewhere so that people considering how AVFrames travel through libavcodec don't get caught out by this sort of thing in future.

I dont really know and as a maintainer of some of this code, i dont
really want to have to keep track of how exactly AVFrames can pass
through the code.

And as the one taking care of a lot of security i totally have to
reject this. It turns a robust system into something thats really
fragile. This is only secure if every way a AVFrame can pass through
the code has each interface point carefully convert the opaque_ref type.
Thats a large source of bugs and exploits.
Just look at how hard it is to get this patch work initially.
How many future commits will change the way AVFrames move and incorrectly
update all the wraping. developers are not machines this suprising
requirement of (un)wraping on interfaces will not be done correctly
in a few years from now even if you get it all correct now.

Now add libavformat, libavdevice, libavfilter specific opaque_ref
wraping

you really consider this to be good design by any definition of "good"?

> 
> Having thought about this a bit more, the wrapped_avframe decoder rather messes with it too.  If the input frame (packet) has opaque_ref set then the unwrap process would try to use it and fail.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is what and why we do it that matters, not just one of them.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171013/fb671ba6/attachment.sig>