[FFmpeg-devel] [PATCH v2 2/4] ffmpeg: Add display_matrix option

Thilo Borgmann thilo.borgmann at mail.de
Sat Aug 20 16:32:36 EEST 2022



On 18 Aug 2022, at 12:58, Gyan Doshi wrote:

> On 2022-08-17 05:55 pm, Anton Khirnov wrote:
>> Quoting Gyan Doshi (2022-08-17 12:53:11)
>>>
>>> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
>>>> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>>>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>>>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>>>>> On 16.08.22 at 16:10, Anton Khirnov wrote:
>>>>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>>>>> $subject
>>>>>>>>>
>>>>>>>>> -Thilo
>>>>>>>>> From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>>>>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp at gmail.com>
>>>>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>>>>
>>>>>>>>> This enables overriding the rotation as well as horizontal/vertical
>>>>>>>>> flip state of a specific video stream on the input side.
>>>>>>>>>
>>>>>>>>> Additionally, switch the singular test that was utilizing the rotation
>>>>>>>>> metadata to instead override the input display rotation, thus leading
>>>>>>>>> to the same result.
>>>>>>>>> ---
>>>>>>>> I still don't see how it's better to squash multiple options into a
>>>>>>>> single option.
>>>>>>>>
>>>>>>>> It requires all this extra infrastructure and in the end it's less
>>>>>>>> user-friendly, because user-understandable things like rotation or flips
>>>>>>>> are now hidden under "display matrix". How many users would know what a
>>>>>>>> display matrix is?
>>>>>>> FWIW I think Gyan's request to do this all in one option that
>>>>>>> affects one thing (the display matrix) is valid.
>>>>>> I don't.
>>>>>>
>>>>>> It may be one thing internally, but modeling user interfaces based on
>>>>>> internal representation is a sinful malpractice. More importantly, I see
>>>>>> no advantage from doing it - it only makes the option parsing more
>>>>>> complicated.
>>>>> It's not based on ffmpeg's 'internal representation'. All transform
>>>>> attributes are stored as a composite in one mathematical object.
>>>> Keyword "stored". It is internal representation. Users should not care
>>>> how it is stored; the entire point of our project is to shield them
>>>> from that as much as possible.
>>>>
>>>>> Evaluating the matrix values will need to look at all sources of
>>>>> contribution. So gathering and presenting all these attributes in a single
>>>>> option (+ docs) makes it clearer to the user at the cost of an initial
>>>>> learning curve.
>>>> Are you seriously expecting all users who want to mark a video as
>>>> rotated or flipped to learn about display matrices?
>>> They don't need to know how to encode or decode the matrix if they don't
>>> want to. Only that it is the container.
>>>
>>> The difference is between
>>>
>>>    -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
>>>
>>> and
>>>
>>>    -display_matrix:v:0 rotate=90:hflip=1:scale=2
>>>
>>> The latter syntax is all too familiar to users from AVFrame filters and
>>> BSFs.
>> The syntax similarity is misleading - filters are applied in the order
>> you list them, while these options are always applied in fixed order.
>> you list them, while these options are always applied in fixed order.
>> The analogous filters are also called rotate, [vf]flip, and scale -
>> there is no display_matrix filter.
>
> The display matrix is effected as a single matrix multiplication to
> obtain output pixel co-ordinates, which incorporates all the encoded
> transforms, so it is analogous to multiple options within a filter like
> eq or hue, not multiple filters.
>
> About SEI messaging, the h264 metadata BSF still obtains (and extracts)
> those attributes as a display matrix, as that is the internal messaging
> format regardless of ultimate storage form.

Thanks for all your comments! Unfortunately, I don’t see a real consensus
emerging here.

I see that a single option finds more acceptance (3:1).
I see that using AVDict is frowned upon by one person, though the alternative
suggestion of a parser for a new, SVG-style syntax is not backed up by anyone
else.

Therefore my interpretation would be to go with the majority: stick with one
option and stick with AVDict.
However, I don’t want to cut the discussion short or override anyone’s
opinion. I’m going to pick this up next week, and if no more arguments
emerge, I’ll continue with that.

Thanks,
Thilo

