[FFmpeg-devel] [PATCH 1/3] avcodec/avpacket: add av_packet_resize()

Mon Mar 15 06:15:14 EET 2021

James Almer:
> On 3/14/2021 7:34 AM, Anton Khirnov wrote:
>> Quoting James Almer (2021-03-12 18:24:47)
>>>
>>> The padding exists AFAIK because something, like an optimized bitstream
>>> reader that tries to process several bytes at the same time, may end up
>>> reading or writing pass the reported end of the buffer.
>>> That means that if reading and it's not all zeroes, it could in theory
>>> have unpredictable results. Hence why everything always zeroes the
>>> padding, even av_shrink_packet().
>>
>> On that topic, I've been wondering about eliminating this requirement.
>> Does anyone know which code it is precisely that depends on the padding
>> being zeroed? Is this optimization really worth it?
>> It seems rather silly to jump through all these hoops for an
>> unmeasurable benefit in one decoder.
> 
> Some subtitle demuxers didn't look at pkt->size and depended on the
> padding to work as a 0 string terminator, but Marton fixed that in a
> patchset sent yesterday.

You mean subtitle decoders; some subtitle demuxers have a different bug:
They only zero-terminate their extradata with a single '\0', not with
AV_INPUT_BUFFER_PADDING_SIZE. See ff_bprint_to_codecpar_extradata().

And ff_startcode_find_candidate_c() also requires that data is
zero-terminated immediately after the buffer. It is trivial to fix. (My
old patchset would actually avoid overreading altogether.)

> 
> Beyond that, i think the v210 decoder and encoder read and write past
> the end of the buffer if you use the simd functions.
> 
>>
>> It would also give us zero-copy packet splitting.
>>
>>>
>>> And yes, what you describe is what some bitstream filters and the
>>> matroska demuxer do. They just create several packet references pointing
>>> to the same data buffer, but using different offsets for the data
>>> pointer. They all have "padding", but in many cases said padding is the
>>> beginning of the payload of another packet, and that's not ideal.
>>>
>>>>
>>>>>
>>>>> I'm sure there are other similar valid scenarios where attempting to
>>>>> shrink a non writable packet is not "bad API usage".
>>>>>
>>>>>>
>>>>>> If you want to shrink a writable packet, then maybe you don't even
>>>>>> need zero padding, because the buffer possibly already contains some
>>>>>> defined value, and the main reason of zero padding is avoiding
>>>>>> reading uninitialized data...
>>>>>
>>>>> If i allocate a payload of size 1024, ultimately fill only 512 bytes,
>>>>> then resize it to that value (typical scenario in lavf demuxers), if i
>>>>> don't zero the bytes after the 512 offset then they will remain
>>>>> uninitialized.
>>>>
>>>> I am not sure I see your point here, if the data after the padding is
>>>> uninitalized, that is not a problem. I made a typo above, and meant
>>>> non-writable packet in my comment. And my reasoning is that if a packet
>>>> is non-writable, it already contains initialized data with a zero
>>>> padding. If you shrink that by reducing pkt->size only, you will not
>>>> have uninitialized data, only the padding will not be zeroed. And that
>>>> is preferable to copying the data only for zeroing the padding, because
>>>> - as far as I know - the padding does not have to be zeroed, it only
>>>> have to be initialized.
>>>>
>>>> I agree that it is not nice that av_shrink_packet() writes something to
>>>> the packet, people may not think about it and misuse it instead of
>>>> directly decreaseing pkt->size when they need a partial packet. That is
>>>> why I suggested the assert(). One might argue that it kind of breaks
>>>> behaviour, but I'd say it does not break it too much, and writing to a
>>>> non-writable packet was broken in the first place.
>>>>
>>>> Alternatively we can change av_shrink_packet() to only zero the padding
>>>> if the packet is writable, which has the benefit that it will do what
>>>> people generally expect, no matter if you throw a writable or a
>>>> non-writable packet to it.
>>>>
>>>> A third alternative is to introduce void av_shrink_packet2() in which
>>>> you explicitly document that a writable packet is needed and do the
>>>> assert there, and deprecate av_shrink_packet().
>>>
>>> Not a fan of functions with a 2 suffix. And to be honest, I really don't
>>> care about what we do with av_shrink_packet().
>>> Do you want to keep it around? Ok. Want to deprecate and remove it?
>>> Better. Want to add an assert to it? Not a fan and it may result in
>>> unexpected crashes for library users, but whatever.
>>
>> I would suggest keeping av_shrink_packet() with a big scary warning that
>> it does not check for ownership and it is the caller's responsibility to
>> ensure that writing to the packet is safe.
> 
> If we can remove the zero-padding requirement, or the padding
> requirement altogether, then that would no longer be necessary.
> 
>>
>> Also, I'd like to emphasise that av_*_is_writable() is not gospel. It is
>> merely a convention that is used "by default", when you don't have more
>> accurate information.
>> Some bits of code may use other conventions and consider a buffer
>> writable even if av_buffer_is_writable() returns 0. For example this is
>> at the core of frame threading, where a reference to a frame currently
>> being decoded is propagated to other threads before decoding finishes.
>> The owner thread then writes to the frame because frame-mt conventions
>> allow it to, even though there are multiple references to the frame.
>>
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".