[FFmpeg-devel] [PATCH v2 2/4] ffmpeg: Add display_matrix option

Thu Aug 18 13:58:09 EEST 2022

On 2022-08-17 05:55 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-08-17 12:53:11)
>>
>> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
>>> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>>>> $subject
>>>>>>>>
>>>>>>>> -Thilo
>>>>>>>>     From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>>>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp at gmail.com>
>>>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>>>
>>>>>>>> This enables overriding the rotation as well as horizontal/vertical
>>>>>>>> flip state of a specific video stream on the input side.
>>>>>>>>
>>>>>>>> Additionally, switch the singular test that was utilizing the rotation
>>>>>>>> metadata to instead override the input display rotation, thus leading
>>>>>>>> to the same result.
>>>>>>>> ---
>>>>>>> I still don't see how it's better to squash multiple options into a
>>>>>>> single option.
>>>>>>>
>>>>>>> It requires all this extra infrastructure and in the end it's less
>>>>>>> user-friendly, because user-understandable things like rotation or flips
>>>>>>> are now hidden under "display matrix". How many users would know what a
>>>>>>> display matrix is?
>>>>>> FWIW I think Gyan's request to do this all in one option that affects one thing (the display matrix) is valid.
>>>>> I don't.
>>>>>
>>>>> It may be one thing internally, but modeling user interfaces based on
>>>>> internal representation is a sinful malpractice. More importantly, I see
>>>>> no advantage from doing it - it only makes the option parsing more
>>>>> complicated.
>>>> It's not based on ffmpeg's 'internal representation'. All transform
>>>> attributes are stored as a composite in one mathematical object.
>>> Keyword "stored". It is internal representation. Users should not care
>>> how it is stored, the entire point of our project is to shield
>>> them from that as much as possible.
>>>
>>>> Evaluating the matrix values will need to look at all sources of
>>>> contribution. So gathering and presenting all these attributes in a single
>>>> option (+ docs) makes it clearer to the user at the cost of an initial
>>>> learning curve.
>>> Are you seriously expecting all users who want to mark a video as
>>> rotated or flipped to learn about display matrices?
>> They don't need to know how to encode or decode the matrix if they don't
>> want to. Only that it is the container.
>>
>> The difference is between
>>
>>    -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
>>
>> and
>>
>>    -display_matrix:v:0 rotate=90:hflip=1:scale=2
>>
>> The latter syntax is all too familiar to users from AVFrame filters and
>> BSFs.
> The syntax similarity is misleading - filters are applied in the order
> you list them, while these options are always applied in fixed order.
> The analogous filters are also called rotate, [vf]flip, and scale -
> there is no display_matrix filter.
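Whichever spelling is chosen, a combined argument like the quoted `rotate=90:hflip=1:scale=2` has to be split into fields before any matrix is built. Below is a minimal C sketch of that step. The `struct dm_opts` type and the `parse_dm_opts()` function are hypothetical and do not come from the patch. A real implementation would more likely reuse libavutil's option-parsing helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of an argument such as
 * "rotate=90:hflip=1:scale=2". Key names follow the example in the
 * thread; they are not taken from the patch. */
struct dm_opts {
    double rotate; /* degrees */
    int    hflip;  /* 0 or 1 */
    int    vflip;  /* 0 or 1 */
    double scale;  /* uniform scale factor */
};

/* Split ':'-separated key=value pairs into *o. Returns 0 on success,
 * -1 on an unknown key, a missing '=' or a non-numeric value. */
static int parse_dm_opts(const char *arg, struct dm_opts *o)
{
    assert(arg && o);
    o->rotate = 0.0;           /* defaults: identity transform */
    o->hflip  = o->vflip = 0;
    o->scale  = 1.0;

    for (const char *p = arg; *p; ) {
        const char *end = strchr(p, ':');
        if (!end)
            end = p + strlen(p);       /* last pair runs to '\0' */
        const char *eq = memchr(p, '=', (size_t)(end - p));
        if (!eq)
            return -1;                 /* pair without '=' */

        char *vend;
        double v = strtod(eq + 1, &vend);
        if (vend == eq + 1 || vend != end)
            return -1;                 /* empty or non-numeric value */

        size_t klen = (size_t)(eq - p);
        if (klen == 6 && !strncmp(p, "rotate", 6))
            o->rotate = v;
        else if (klen == 5 && !strncmp(p, "hflip", 5))
            o->hflip = v != 0.0;
        else if (klen == 5 && !strncmp(p, "vflip", 5))
            o->vflip = v != 0.0;
        else if (klen == 5 && !strncmp(p, "scale", 5))
            o->scale = v;
        else
            return -1;                 /* unknown key */

        p = *end ? end + 1 : end;      /* step past ':' if present */
    }
    return 0;
}
```

The pairs only fill in fields, so `scale=2:hflip=1:rotate=90` produces the same struct as the quoted order. Any order of application must therefore come from the code that later builds the matrix, not from the string.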

The display matrix is applied as a single matrix multiplication to
obtain the output pixel coordinates, and that one matrix incorporates
all the encoded transforms. So it is analogous to multiple options
within one filter, like eq or hue, not to multiple filters.
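Two things can be illustrated with a small C sketch: all transforms fold into one matrix, and the factors are combined in a fixed order rather than a user-chosen one. The sketch makes several assumptions. It uses a floating-point 3x3 homogeneous matrix with column vectors (p' = M * p). Rotation is counterclockwise with the y axis up, and it is limited to multiples of 90 degrees so no libm is needed. libavutil instead stores an `int32_t[9]` in fixed point with its own conventions. All names here (`mat3`, `compose()`, and so on) are hypothetical.

```c
#include <assert.h>

/* 3x3 homogeneous transform, column-vector convention: p' = M * p.
 * Floating point for readability; libavutil's display matrix is an
 * int32_t[9] in fixed point with its own conventions. */
typedef struct { double m[3][3]; } mat3;

/* r = a * b, i.e. apply b first, then a. */
static mat3 mat3_mul(mat3 a, mat3 b)
{
    mat3 r;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            r.m[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    return r;
}

/* Counterclockwise rotation (y axis up), multiples of 90 degrees only
 * so the sketch needs no libm. */
static mat3 mat3_rotate(int deg)
{
    static const int c[4] = { 1, 0, -1, 0 };   /* cos(q * 90) */
    static const int s[4] = { 0, 1, 0, -1 };   /* sin(q * 90) */
    assert(deg % 90 == 0);
    int q = ((deg / 90) % 4 + 4) % 4;
    mat3 r = {{ { c[q], -s[q], 0 }, { s[q], c[q], 0 }, { 0, 0, 1 } }};
    return r;
}

static mat3 mat3_hflip(void) { mat3 r = {{ {-1, 0, 0}, {0,  1, 0}, {0, 0, 1} }}; return r; }
static mat3 mat3_vflip(void) { mat3 r = {{ { 1, 0, 0}, {0, -1, 0}, {0, 0, 1} }}; return r; }
static mat3 mat3_scale(double k) { mat3 r = {{ {k, 0, 0}, {0, k, 0}, {0, 0, 1} }}; return r; }

/* Fold every attribute into ONE matrix in a fixed order chosen for
 * this sketch: scale, then flips, then rotation. */
static mat3 compose(int rotate_deg, int hflip, int vflip, double scale)
{
    mat3 m = mat3_scale(scale);
    if (hflip) m = mat3_mul(mat3_hflip(), m);
    if (vflip) m = mat3_mul(mat3_vflip(), m);
    return mat3_mul(mat3_rotate(rotate_deg), m);
}

/* One multiplication maps a pixel coordinate; the bottom row is
 * (0 0 1) here, so no perspective divide is needed. */
static void mat3_apply(mat3 m, double x, double y, double *ox, double *oy)
{
    *ox = m.m[0][0] * x + m.m[0][1] * y + m.m[0][2];
    *oy = m.m[1][0] * x + m.m[1][1] * y + m.m[1][2];
}
```

Swapping two factors changes the result. Flipping horizontally and then rotating by 90 degrees sends (1, 0) to (0, -1). Rotating first and then flipping sends it to (0, 1). So whichever option spelling is used, the documentation has to state the order in which the stored attributes are combined.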

Regarding SEI messages, the h264_metadata BSF still obtains (and
extracts) those attributes as a display matrix, since that is the
internal messaging format regardless of the ultimate storage form.
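The H.264 display orientation SEI carries three relevant fields: `hor_flip`, `ver_flip`, and a 16-bit `anticlockwise_rotation` in units of 360/2^16 degrees. Its semantics apply the flips before the rotation. The C sketch below shows that this payload reduces to the same rotate/flip attributes an override would set. It assumes hypothetical `struct disp_orient_sei` / `orient_from_sei()` names, leaves out the cancel and repetition fields, and is not the h264_metadata BSF code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the display orientation SEI fields used
 * here; cancel flag and repetition period are omitted. */
struct disp_orient_sei {
    int      hor_flip;
    int      ver_flip;
    uint16_t anticlockwise_rotation; /* units of 360 / 2^16 degrees */
};

/* The attribute triple an override would set: flips, then rotation. */
struct orient {
    double rotate_ccw_deg;
    int    hflip, vflip;
};

/* Flips are applied before the anticlockwise rotation in the SEI
 * semantics, so the triple carries over; only the angle is rescaled. */
static struct orient orient_from_sei(const struct disp_orient_sei *s)
{
    assert(s);
    struct orient o;
    o.rotate_ccw_deg = s->anticlockwise_rotation * 360.0 / 65536.0;
    o.hflip = s->hor_flip != 0;
    o.vflip = s->ver_flip != 0;
    return o;
}
```

For example, `anticlockwise_rotation = 0x4000` corresponds to 90 degrees and `0xC000` to 270 degrees.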

Regards,
Gyan