[FFmpeg-devel] [PATCH 1/3] spherical: Add tiled equirectangular type and projection-specific properties

Vittorio Giovara vittorio.giovara at gmail.com
Wed Feb 15 20:43:42 EET 2017

Hi Aaron,

On Wed, Feb 15, 2017 at 11:48 AM, Aaron Colwell <acolwell at google.com> wrote:
>> If the spec changes, it will be the contents of the equi/cbmp/mesh.
>> By exporting them raw as extradata, said changes in the spec would
>> require no changes to our implementation.
> This is one of the main reasons I was suggesting this path. I think of these
> extradata fields much like the extra data that codecs have. It really is
> only important to the code that needs to render a specific projection. For
> transcoding, you mainly just need to convey the information in a lossless
> manner from demuxer to muxer.

I understand that, but "transcoding" is not the only task performed here.

Yes, I agree that for your single usecase of converting mp4 metadata
to mkv metadata your solution a feasible one, but the code here also
1. convey information to the users
2. fill in side data and frame data (remember this is a pkt side data
that is exported to frames since spherical is a frame property, it
doesn't matter that currently the spec is container level only for
3. offer it to any muxer for additional parsing

You can't expect to fill in binary data and have all three places
behave correctly, and handle multiple theoretical version while at it.
You need an abstraction like the one provided by the API in this

> I anticipate the spec will change in the future. My plan is that no change
> will break what is currently specified in the spec right now, but I
> anticipate some changes will be made. Having a solution that can gracefully
> handle this would be nice.

A *binary* solution is certainly not nice, especially if you have
multiple version.
Besides, you should really make an effort to change the spec as little
as possible now that it is public, as more and more software will
depend on it and you can't realistically expect that everyone will be
always up-to-date (ffmpeg included).

>> >> Wouldn't it be a better idea to export the binary data of the
>> >> equi/cbmp/mesh boxes into an extradata-like field in the
>> >> AVSphericalMapping struct, and let the downstream application parse
>> >> it instead?
>> >
>> > No I don't think so, lavf is an abstraction layer and one of its tasks
>> > is to provide a (simple?) unified entry layer. and letting the user
>> > parse binary data is IMO bad design and very fragile. Also it's not
>> > impossible that another standard for tagging spherical metadata
>> > emerges in the future: the current API could very easily wrap it,
>> > while exporting the binary entry would be too specification-specific
>> > and it would be tied too heavily on the google draft.
> I agree with Vittorio that having some form of abstraction is a good thing
> and having binary data in places can be problematic. It feels like we could
> find some middle ground here by providing helper functions that parse the
> binary data into projection specific structs and back just like codecs code
> tends to do. I feel like this provides a reasonable balance between having a
> common set of fields where things actually have common semantics like
> projection_type, yaw/pitch/roll, & extra_data while also providing a way to
> get access to projection specific information in a simple way.

I don't feel like this has anything to do with the current patchset.
This set only allows for properly conveying information in a clean
matten to the user (and a muxer if you will can be consider as a plain
user). Having a single parsing routine is IMO a useless optimization,
and can be discussed in another thread.

> At the end of the day players really just need to care about a rendering
> mesh so in some sense it would be nice to have that be the abstraction for
> the player use case. That is basically what we have done in our internal
> player implementations. That could easily be handled by helper functions,
> but would be a bad representation for AVSphericalMapping because it would
> make transcoding/transmuxing painful.

Again, a "player" is not your single user here, you have probing tool
that need to provide data understandable by human people. And please
stop saying transcoding/transmuxing is painful, on the opposite it's
incredibly simple, you just need to read AVSphericalMapping (exactly
as any other user) and write down this information. If you think that
new (non-binary) fields should be added to further simplify muxing,
please do tell.

>> Wait for Aaron's opinion before addressing reviews and pushing. He
>> sent a different patchset himself and it wouldn't be nice to push
>> yours without at least giving him a chance to comment.
> Thank you. I really appreciate this. I hope it is clear I'm really trying to
> find some middle ground here. My team has been working on this stuff inside
> Google for several years and I'm just trying to bring some of that
> experience and learning into the open here. I defer to you because this is
> your project and I am relatively sparse contributor.

Please don't get me wrong, it's really commendable that Google is
opening up more and more, after using open source for so many years.
On the other hand, do understand that Google is not the only ffmpeg
user in the world, and that it needs to respect additional, incredibly
different, and possibly boring use cases all around that might matter
to other users.

More information about the ffmpeg-devel mailing list