[FFmpeg-devel] [PATCH] avformat/utils: add helper functions to retrieve index entries from an AVStream

zsugabubus zsugabubus at national.shitposting.agency
Thu Mar 25 22:28:10 EET 2021


On Thu, Mar 25, 2021 at 02:21:43PM +0100, Marvin Scholz wrote:
> On 25 Mar 2021, at 12:55, Nicolas George wrote:
> > James Almer (12021-03-24):
> >> I think it's clear by now that nothing i could say will convince you it's
> >> better to not return a pointer to an internal array when there are safer
> >> alternatives, and i already gave my reasons why, none of which satisfied
> >> you, so i don't see the point in keeping this discussion going.
> >
> > I find this comment quite offensive. You did not manage to convince me
> > because your arguments have not been up to the task. Do not try to push
> > the fault on me, and I will refrain from accusing you of not taking my
> > arguments into account. Coming to an agreement is a process, it requires
> > both parts to refine their arguments progressively.
> >
> > This is a matter of choosing the least of several drawbacks. So let us
> > compare the drawbacks and not muddle things further.
> >
> > For me:
> >
> > 1. having a dynamic allocation is way way worse than
> > 2. having sizeof(AVIndexEntry) in the ABI, which is somewhat worse than
> > 3. having a function with many arguments, which is a tiny bit worse than
> > 4. having a "use this pointer immediately" constraint.
> >
> > We agree except on 3>4, so let us focus on that.
> >
> > Option (3) has these practical drawbacks: Many arguments involves more
> > typing and the risk of messing with the order and getting invalid
> > values. It also requires redesigning the API if we add fields and
> > exporting them is useful. And it requires either the overhead of NULL
> > checks or the caller declaring unneeded variables.
> >
> > Option (4) has the obvious practical drawback that misusing the API
> > causes undefined behavior.
> >
> > The constraint of using a pointer immediately on risk of undefined
> > behavior is actually a frequent one, in FFmpeg but also in C at large:
> > gethosbtyname, localtime, etc.
> >
> > For me, that makes it approximately on par with the risk of messing the
> > order of the many arguments.
> >
> > Which leaves more typing, NULL checks overhead or useless variables
> > (still more typing).
> >
> > It is tiny, I have no trouble admitting, but it is tiny in favor of one
> > solution.
> >
> > If you do not agree with these estimates, please explain exactly where.
> 
> I disagree with your ordering too,
> for me 4. is clearly worse than 3. here, as the harm that can be
> done by mixing up arguments (in this case) is less than use of a
> possibly invalid pointer.

If you mess the order of the arguments in an unlucky way or forgot to
initialize one, that's also undefined behavior--though you may get
warnings.

> And mixed up arguments would probably be noticed easier when testing
> than reads of possibly invalid memory.

Every operation is potentially invalid unless you make sure that you
keep the constraints given by the API. That's the end of the story.

Returning to the previous streams example, you also have to make sure,
you do not call avformat_free_context() somewhere before accessing
AVStream, because that would invalidate that. You can do it easily
without getting any warnings or errors. Why nobody complains about this?
(Because we are not in Rust but in C.)

> Even when documenting the constraint of immediately using the pointer,
> it could easily be overlooked/forgotten.

Just like everything else.

> It does not even has to be me forgetting the constraint, but could be
> someone else changing the code in the future, adding some API call
> before the pointer is used, unaware of the possible consequences and
> that could invalidate the pointer (if I understand James explanation
> correctly). This could go unnoticed easily.

Keep away the cat from the code and sit somebody to the chair who used
to read the documentation and familiar with the code he modifies.

Anyway, if you modify index entries while iterating, then you may miss
some entries, or count one twice, who knows. So it is the violation of
the API anyways, and you will be happy when you receive an invalid
memory access or a fatal av_free() error that will save you days of
debugging.

> So IMO having a function with many arguments seems like a better and
> safer tradeoff in this case to me…

Safe-safe... we are still in C, homeland of UB. Your program will not
get magically more or less safer depending on what a single FFmpeg API
function returns, if you or your colleague (or >me< or >anybody<) never
used the standard library before and/or does not read the documentation.


Let's investigate a different thing that have not been mentioned before:
What if AVIndexEntry changes?

Fields ADDED/User INTERESTED:
  (3): Must use the new av_get_index_entry2(); have to be very careful
       about the order of parameters.
  (4): AVIndexEntry.new_field.

Fields ADDED/User NOT interested:
  (3): Deprecation warning for av_get_index_entry(); will have to use
       function with the new signature, passing NULL, taking care of the
       order of arguments.
  (4): -

Fields REMOVED/User was INTERESTED:
  (3): Potentially deprecation warning for av_get_index_entry?(); have
       to use function with the new signature, taking care of the order
       of arguments.
  (4): Deprecation warning for the field; have to use a different field
       or something. ((If field access would be hidden behind a function
       call deprecation may not be even necessary in all cases.))

Fields REMOVED/User not INTERESTED:
  (3): Potentially deprecation warning for av_get_index_entry?(); have
       to use function with the new signature, taking care of the order
       of arguments, again.
  (4): -

Old (3) methods surely cause API bloat.

Changes in fundamental fields require major changes in all interfaces
since old values may not be easily substituted by anything other. In
every other case (4) seems a better choice.

So IMO const AVIndexEntry *av_get_index_entry(AVStream, unsigned) has
the advantage that:
- Simple: Implementation is one line, no branches, no memcpy, no
  AVERROR()s that you 100% not interested (for this function) and can be
  more easily maintained.
- Superset of the "many arguments" proposal: If user wants some fields
  to live longer, it can copy it... without branches.
- Returned object can be extended with functions. (Computed values.)


At the end, if you take some random pointer and you expect it too stay
valid for some unspecified time, yes, you may face some unexpected
result.

There are objects that stay usable until the corresponding free() call
and there are objects that stay usable until some other condition. You
have to look up the API documentation for the function signature,
reading what it does, does this function really does what I need and
lastly for the correct usage.

For me, (4) seems much simpler and simpler things are always a better
choice in the long run.

--
zsugabubus

P.S.: It probably will not be an issue with AVIndexEntry, but a complex
structure with pointers would be even more hassle using (3), because
output variables would require special memdup()ing and free()ing. Going
down the rabbit hole, how many objects should be created at the end? I
think this approach is not viable in big so why should it be used in
small.


More information about the ffmpeg-devel mailing list