[FFmpeg-devel] [PATCH 1/3] avcodec/avpacket: add av_packet_resize()

Sun Mar 14 17:35:48 EET 2021

On 3/14/2021 7:34 AM, Anton Khirnov wrote:
> Quoting James Almer (2021-03-12 18:24:47)
>>
>> The padding exists AFAIK because something, like an optimized bitstream
>> reader that tries to process several bytes at the same time, may end up
>> reading or writing pass the reported end of the buffer.
>> That means that if reading and it's not all zeroes, it could in theory
>> have unpredictable results. Hence why everything always zeroes the
>> padding, even av_shrink_packet().
> 
> On that topic, I've been wondering about eliminating this requirement.
> Does anyone know which code it is precisely that depends on the padding
> being zeroed? Is this optimization really worth it?
> It seems rather silly to jump through all these hoops for an
> unmeasurable benefit in one decoder.

Some subtitle demuxers didn't look at pkt->size and depended on the 
padding to work as a 0 string terminator, but Marton fixed that in a 
patchset sent yesterday.

Beyond that, i think the v210 decoder and encoder read and write past 
the end of the buffer if you use the simd functions.

> 
> It would also give us zero-copy packet splitting.
> 
>>
>> And yes, what you describe is what some bitstream filters and the
>> matroska demuxer do. They just create several packet references pointing
>> to the same data buffer, but using different offsets for the data
>> pointer. They all have "padding", but in many cases said padding is the
>> beginning of the payload of another packet, and that's not ideal.
>>
>>>
>>>>
>>>> I'm sure there are other similar valid scenarios where attempting to
>>>> shrink a non writable packet is not "bad API usage".
>>>>
>>>>>
>>>>> If you want to shrink a writable packet, then maybe you don't even
>>>>> need zero padding, because the buffer possibly already contains some
>>>>> defined value, and the main reason of zero padding is avoiding
>>>>> reading uninitialized data...
>>>>
>>>> If i allocate a payload of size 1024, ultimately fill only 512 bytes,
>>>> then resize it to that value (typical scenario in lavf demuxers), if i
>>>> don't zero the bytes after the 512 offset then they will remain
>>>> uninitialized.
>>>
>>> I am not sure I see your point here, if the data after the padding is
>>> uninitalized, that is not a problem. I made a typo above, and meant
>>> non-writable packet in my comment. And my reasoning is that if a packet
>>> is non-writable, it already contains initialized data with a zero
>>> padding. If you shrink that by reducing pkt->size only, you will not
>>> have uninitialized data, only the padding will not be zeroed. And that
>>> is preferable to copying the data only for zeroing the padding, because
>>> - as far as I know - the padding does not have to be zeroed, it only
>>> have to be initialized.
>>>
>>> I agree that it is not nice that av_shrink_packet() writes something to
>>> the packet, people may not think about it and misuse it instead of
>>> directly decreaseing pkt->size when they need a partial packet. That is
>>> why I suggested the assert(). One might argue that it kind of breaks
>>> behaviour, but I'd say it does not break it too much, and writing to a
>>> non-writable packet was broken in the first place.
>>>
>>> Alternatively we can change av_shrink_packet() to only zero the padding
>>> if the packet is writable, which has the benefit that it will do what
>>> people generally expect, no matter if you throw a writable or a
>>> non-writable packet to it.
>>>
>>> A third alternative is to introduce void av_shrink_packet2() in which
>>> you explicitly document that a writable packet is needed and do the
>>> assert there, and deprecate av_shrink_packet().
>>
>> Not a fan of functions with a 2 suffix. And to be honest, I really don't
>> care about what we do with av_shrink_packet().
>> Do you want to keep it around? Ok. Want to deprecate and remove it?
>> Better. Want to add an assert to it? Not a fan and it may result in
>> unexpected crashes for library users, but whatever.
> 
> I would suggest keeping av_shrink_packet() with a big scary warning that
> it does not check for ownership and it is the caller's responsibility to
> ensure that writing to the packet is safe.

If we can remove the zero-padding requirement, or the padding 
requirement altogether, then that would no longer be necessary.

> 
> Also, I'd like to emphasise that av_*_is_writable() is not gospel. It is
> merely a convention that is used "by default", when you don't have more
> accurate information.
> Some bits of code may use other conventions and consider a buffer
> writable even if av_buffer_is_writable() returns 0. For example this is
> at the core of frame threading, where a reference to a frame currently
> being decoded is propagated to other threads before decoding finishes.
> The owner thread then writes to the frame because frame-mt conventions
> allow it to, even though there are multiple references to the frame.
>