[FFmpeg-devel] [PATCH 12/12] lavf/framecrcenc: do not hash side data

Fri May 14 23:24:34 EEST 2021

James Almer:
> On 5/14/2021 3:16 PM, Michael Niedermayer wrote:
>> On Fri, May 14, 2021 at 09:42:10AM -0300, James Almer wrote:
>>> On 5/14/2021 5:01 AM, Anton Khirnov wrote:
>>>> Quoting Michael Niedermayer (2021-05-10 16:06:01)
>>>>> On Sun, Apr 25, 2021 at 09:03:20AM +0200, Anton Khirnov wrote:
>>>>>> There are no guarantees that all side data types have the same
>>>>>> representation on all platforms.
>>>>>
>>>>>> @@ -65,63 +51,6 @@ static int framecrc_write_packet(struct
>>>>>> AVFormatContext *s, AVPacket *pkt)
>>>>>>                 pkt->stream_index, pkt->dts, pkt->pts,
>>>>>> pkt->duration, pkt->size, crc);
>>>>>>        if (pkt->flags != AV_PKT_FLAG_KEY)
>>>>>>            av_strlcatf(buf, sizeof(buf), ", F=0x%0X", pkt->flags);
>>>>>> -    if (pkt->side_data_elems) {
>>>>>> -        int i;
>>>>>> -        av_strlcatf(buf, sizeof(buf), ", S=%d",
>>>>>> pkt->side_data_elems);
>>>>>
>>>>> The number and type of the side data elements should not be affected
>>>>> by the problem described.
>>>>> Maybe we could leave that. Bugs could manifest as the absence or
>>>>> addition
>>>>> of side data, seeing that in framecrc_write_packet() output may be
>>>>> usefull
>>>>
>>>> No strong opinion here - patches welcome I guess.
>>>
>>> I can do it, but it will be a "breaking" change, assuming there are
>>> parsers
>>> that read the output of this muxer.
>>
>> does anyone know of such parsers ?
> 
> No, it's hypothetical.
> 
>> or does anyone have an idea what such parser could be used for ?
> 
> Users that can't or don't want to write programs using the libraries and
> find it easier to write perl scripts that parse the output of CLI like
> ffmpeg and ffprobe.
> 
> Technically speaking, many of our regression tests do exactly that.
> 
>>
>>
>>> Right now you removed all the trailing properties, which while also
>>> breaking, a sane parser made for the old output can simply assume
>>> that the
>>> line was truncated. But if we re-add the side data amount and sizes
>>> for each
>>> of them without the hashes, things can get parsed the wrong way.
>>>
>>> Namely, it used to be:
>>>> 0,          0,          0,       16,    57008, 0x43416399,
>>>> S=2,        8, 0x08e5014f,       88, 0xd65a04db
>>>
>>> And now it will be something like:
>>>> 0,          0,          0,       16,    57008, 0x43416399,
>>>> S=2,        8,       88
>>>
>>> So the question is, do we care about this? The framehash/framemd5 muxer
>>> prints a version number, which lets us make breaking additions
>>> parsers can
>>> then properly handle. Should we then just consider that parsing framecrc
>>> output is not a supported scenario?
>>
>> the version should probably be increased
> 
> framemd5/framehash != framecrc. The former is versioned, but the latter,
> which this discussion is about, is not, so we should decide if that
> means its output is to be considered fixed or not.
> 
> I'm inclined to say it shouldn't. There's framehash for a versioned
> output that's guaranteed to not change, which also supports adler32 (the
> algorithm used by framecrc).
> 

framehash btw uses it properly, whereas framecrc initializes the adler32
checksum to zero (it should be one). Last time I checked, switching FATE
to framehash (or making framecrc initialize the checksum to 1) would
amount to a diff of about 70000 lines, so I didn't do it.
Of course, framehash has the same problems with side-data hashing, but
it has a version option so that we can make changes.
If we decide to switch FATE to framehash, we should probably deprecate
framecrc. I could make this change.

- Andreas