[FFmpeg-devel] [PATCH v2 2/4] ffmpeg: Add display_matrix option

Sat Aug 20 16:32:36 EEST 2022

On 18 Aug 2022, at 12:58, Gyan Doshi wrote:

> On 2022-08-17 05:55 pm, Anton Khirnov wrote:
>> Quoting Gyan Doshi (2022-08-17 12:53:11)
>>>
>>> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
>>>> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>>>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>>>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>>>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>>>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>>>>> $subject
>>>>>>>>>
>>>>>>>>> -Thilo
>>>>>>>>>     From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 
>>>>>>>>> 00:00:00 2001
>>>>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp at gmail.com>
>>>>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>>>>
>>>>>>>>> This enables overriding the rotation as well as 
>>>>>>>>> horizontal/vertical
>>>>>>>>> flip state of a specific video stream on the input side.
>>>>>>>>>
>>>>>>>>> Additionally, switch the singular test that was utilizing the 
>>>>>>>>> rotation
>>>>>>>>> metadata to instead override the input display rotation, thus 
>>>>>>>>> leading
>>>>>>>>> to the same result.
>>>>>>>>> ---
>>>>>>>> I still don't see how it's better to squash multiple options 
>>>>>>>> into a
>>>>>>>> single option.
>>>>>>>>
>>>>>>>> It requires all this extra infrastructure and in the end it's 
>>>>>>>> less
>>>>>>>> user-friendly, because user-understandable things like rotation 
>>>>>>>> or flips
>>>>>>>> are now hidden under "display matrix". How many users would 
>>>>>>>> know what a
>>>>>>>> display matrix is?
>>>>>>> FWIW I think Gyan's request to do this all in one option that 
>>>>>>> effect one thing (the display matrix) is valid.
>>>>>> I don't.
>>>>>>
>>>>>> It may be one thing internally, but modeling user interfaces 
>>>>>> based on
>>>>>> internal representation is a sinful malpractice. More 
>>>>>> importantly, I see
>>>>>> no advantage from doing it - it only makes the option parsing 
>>>>>> more
>>>>>> complicated.
>>>>> It's not based on ffmpeg's 'internal representation'. All 
>>>>> transform
>>>>> attributes are stored as a composite in one mathematical object.
>>>> Keyword "stored". It is internal representation. Users should not 
>>>> care
>>>> how it is stored, the entire point of our project is to 
>>>> shield
>>>> them from that as much as possible.
>>>>
>>>>> Evaluating the matrix values will need to look at all sources of
>>>>> contribution. So gathering and presenting all these attributes in 
>>>>> a single
>>>>> option (+ docs) makes it clearer to the user at the cost of an 
>>>>> initial
>>>>> learning curve.
>>>> Are you seriously expecting all users who want to mark a video as
>>>> rotated or flipped to learn about display matrices?
>>> They don't need to know how to encode or decode the matrix if they 
>>> don't
>>> want to. Only that it is the container.
>>>
>>> The difference is between
>>>
>>>    -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
>>>
>>> and
>>>
>>>    -display_matrix:v:0 rotate=90:hflip=1:scale=2
>>>
>>> The latter syntax is all too familiar to users from AVFrame filters 
>>> and
>>> BSFs.
>> The syntax similarity is misleading - filters are applied in the 
>> order
>> you list them, while these options are always applied in fixed order.
>> The analogous filters are also called rotate, [vf]flip, and scale -
>> there is no display_matrix filter.
>
> The display matrix is effected as a single matrix multiplication to 
> obtain output pixel co-ordinates which incorporates all the
> encoded transforms so it is analogous to multiple options within a 
> filter like eq or hue, not multiple filters.
>
> About SEI messaging,  the h264 metadata BSF still obtains (and 
> extracts) those attributes as a display matrix as that is the internal 
> messaging format regardless of ultimate storage form.
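
To make the two positions above concrete, here is a minimal sketch, not the 
patch's actual code. It assumes a hypothetical key=value parser, plain 
floats instead of the fixed-point values used in the real side data, and 
an assumed fixed order of scale, then flip, then rotate. All function 
names are made up for illustration.

```python
import math


def parse_options(spec):
    """Split 'key=value:key=value' into a dict of floats (hypothetical helper)."""
    opts = {}
    for pair in spec.split(":"):
        if not pair:
            continue  # tolerate an empty spec or stray ':'
        key, _, value = pair.partition("=")
        opts[key] = float(value)
    return opts


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def display_matrix(spec):
    """Compose one 3x3 matrix from rotate/hflip/vflip/scale entries.

    Assumption: the order is fixed (scale, then flip, then rotate),
    no matter how the keys are ordered in the option string.
    """
    opts = parse_options(spec)
    theta = math.radians(opts.get("rotate", 0.0))
    c, s = math.cos(theta), math.sin(theta)
    rotate = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    sx = -1.0 if opts.get("hflip", 0.0) else 1.0
    sy = -1.0 if opts.get("vflip", 0.0) else 1.0
    flip = [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]
    k = opts.get("scale", 1.0)
    scale = [[k, 0.0, 0.0], [0.0, k, 0.0], [0.0, 0.0, 1.0]]
    # Column-vector convention: the rightmost matrix is applied first.
    return matmul(rotate, matmul(flip, scale))


def apply_to_point(m, x, y):
    """Map (x, y) through the composed matrix with a single multiplication."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

In this sketch "rotate=90:hflip=1:scale=2" and "scale=2:hflip=1:rotate=90" 
give the same matrix, which illustrates both points: the transforms end up 
in one matrix that is applied in a single multiplication, and the order in 
which the user lists them has no effect, unlike a filter chain.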

Thanks for all your comments! Unfortunately, I don’t see real 
consensus emerging here.

I see that a single option finds more acceptance (3:1).
I also see that using AVDict is frowned upon by one person, though the 
alternative suggestion of a parser for an SVG-style (new) syntax is not 
backed up by anyone else.

Therefore, my interpretation is to go with the majority and stick with 
a single option parsed via AVDict.
However, I don’t want to cut the discussion short or override anyone’s 
opinion. I’ll pick this up next week and, if no further arguments 
emerge, continue with that approach.

Thanks,
Thilo