[FFmpeg-devel] [PATCH] matroskaenc: Add support for writing video projection.

Vittorio Giovara vittorio.giovara at gmail.com
Sat Feb 4 17:46:26 EET 2017

On Fri, Feb 3, 2017 at 5:57 PM, Aaron Colwell <acolwell at google.com> wrote:
> I still think you don't understand what these fields do given what you say
> here. Yes there is a little more math. At the end of the day all these
> fields do is specify a the min & max for the latitude & longitude. This
> essentially translates to adding scale factors and offsets in your shader or
> something similar in your 3D geometry creation logic. I get it if
> implementations don't want to do this small bit of extra work, but saying
> this is a different type seems strange because you wouldn't do this when
> talking about cropped 2D images.

If I don't understand these fields, maybe the specification could do a
better job at explaining what they are for ;)

I am aware that the final projection is the same, but the actual
problem is that if we don't introduce a new type we would be conveying
a different semantics to a single projection type. In other words we
require applications to behave differently according to two different
fields (the projection type and the offset fields) rather than a
single one. So yes, the projection is the same, the usecase is
different, the app behaviour is different, the final processing is
different -- I think that all this warrants a separate type.

If I still didn't get my point across, let's try with an example: an
application that does not support the tiled equirectangular
projection, with a separate type it can immediately discern that it is
an unsupported type and inform the user, with a single type, instead,
it has to sort-of-implement the tiling and check for fields that
should not matter. Another example: look at AVStereo3DType, there is a
side-by-side and a side-by-side-quincunx : an application that does
not support quincux can just abort and notify the user, if they were
two separate fields, you could have applications that do not support
quincunx but try to render the side-by-side part (which usually
results in a garbage rendering).

So I reiterate that in my opinion a separate enum value which
"notifies" apps that they should check additional types is the way to

>> It is too late to change the spec, but I do believe that the usecase
>> is different enough to add a new type, in order to not overcomplicate
>> the implementation.
> It feels like you are just moving the problem to the demuxers and muxers
> here. Adding a new type means all demuxers will have to contain logic to
> generate these different types and all muxers will have to contain logic to
> collapse these types back down to a single value.

Yes, I would a 100% add more logic to the 2 muxers and 2 demuxers that
implement this spec rather than the thousands applications that
implement the ffmpeg api. Also the "different logic" is literally an
"if" case, if not little else.

> I don't really want to keep arguing about this. If folks really want
> different types then I'll do it just because I want to get reading and
> writing of metadata working end-to-end. I'd like to make a small request to
> use the term "cropped equirectagular" instead of "tiled equirectangular" but
> I don't feel to strongly about that.

Please don't, "cropped" has entirely different meaning, and it's
already confusing that you can do bitstream level cropping and surface
cropping. If you really hate the term "tiled" you could use "bounded
equirectangular" as it is used in the spec.

> I suppose this is just a difference in style so I don't feel too strongly
> about it. I was just trying to use unions & structs here to make it a little
> clearer about which fields are expected to be valid and when. The fields
> seemed to be disjoint sets so I was trying to use language features to
> convey that.
> I'd also like to float a potential alternative solution. We could just
> convey the projection private data as a size and byte array in this struct.
> That would allow me to pass the metadata from demuxers to muxers so I can
> achieve my goals, and allow actual parsing of the information to be deferred
> until someone needs it. How do you feel about this compromise position?

Again I don't see any advantage in using your proposal, it makes the
code difficult to read and hard to debug. Binary metadata are
intrinsically bad in my opinion, for this use case you could just add
four fields, and read/write four times.

I still have the code I had around when I implemented that.

+        /* 0.32 fixed point */
+        t = sc->height * (float) avio_rb32(pb) / ((uint64_t) 1 << 32);
+        b = sc->height * (float) avio_rb32(pb) / ((uint64_t) 1 << 32);
+        l = sc->width * (float) avio_rb32(pb) / ((uint64_t) 1 << 32);
+        r = sc->width * (float) avio_rb32(pb) / ((uint64_t) 1 << 32);


+    sc->spherical->left_offset   = l;
+    sc->spherical->top_offset    = t;
+    sc->spherical->right_offset  = r;
+    sc->spherical->bottom_offset = b;

(and the reverse for writing of course).

More information about the ffmpeg-devel mailing list