[FFmpeg-devel] [PATCH v1] avcodec/v210enc: add yuv420p/yuv420p10 input pixel format support

Devin Heitmueller dheitmueller at ltnglobal.com
Sun Sep 22 05:49:21 EEST 2019


> On Sep 21, 2019, at 4:44 PM, Michael Niedermayer <michael at niedermayer.cc> wrote:
> 
>> The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  
> 
> 4:2:0 and 4:2:2 have a chroma plane with different resolution.
> converting between planes of different resolution is what I called scaling.
> 
> 
>> It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).
> 
> yes, doing an operation in the encoder avoids a filter being inserted which
> does that operation.
> That's true for every encoder and every filter.

The key thing here is that the encoder is already touching every pixel anyway, so avoiding the filter allows the conversion to happen at essentially zero cost: the chroma lines are simply reused as we repack the pixels into the requisite v210 layout.
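
To make that concrete, here is a minimal sketch of the idea (illustrative only, not the actual patch; it assumes a little-endian host, 10-bit progressive input, and a width that is a multiple of 6):

    #include <stdint.h>

    /* Pack one luma line of yuv420p10 into v210.  The caller passes the
     * chroma row for luma line "y" as u_plane + (y >> 1) * u_stride, so
     * each chroma line is simply reused for two luma lines -- that is
     * the entire 4:2:0 -> 4:2:2 "expansion", and it costs nothing extra
     * because every pixel is already being read to build the v210 words.
     * (Interlaced chroma only changes which row gets picked, so the two
     * fields stay separate.) */
    static void pack_line_420p10_to_v210(uint32_t *dst,
                                         const uint16_t *y_row,
                                         const uint16_t *u_row,
                                         const uint16_t *v_row,
                                         int width /* multiple of 6 */)
    {
        for (int x = 0; x < width; x += 6) {
            const uint16_t *y = y_row + x;
            const uint16_t *u = u_row + x / 2;
            const uint16_t *v = v_row + x / 2;

            /* v210: 6 pixels become four 32-bit little-endian words,
             * three 10-bit components per word */
            *dst++ = u[0] | (uint32_t)y[0] << 10 | (uint32_t)v[0] << 20;
            *dst++ = y[1] | (uint32_t)u[1] << 10 | (uint32_t)y[2] << 20;
            *dst++ = v[1] | (uint32_t)y[3] << 10 | (uint32_t)u[2] << 20;
            *dst++ = y[4] | (uint32_t)v[2] << 10 | (uint32_t)y[5] << 20;
        }
    }

The only thing the 4:2:0 support changes is the chroma-row pointer arithmetic; the inner loop is identical to the 4:2:2 case.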

> Also replacing interpolation by a nearest neighbor implementation
> is quite expectedly faster.

Yes, and we can certainly argue about whether interpolating chroma when going from 4:2:0 to 4:2:2 actually has any visible benefit.  I can, however, say that the cost of having swscale in the pipeline is considerable.  In fact, I didn’t appreciate it myself until I was trying to deliver 1080p60 in realtime to four decklink outputs and couldn’t keep up on my target platform.  And because filters generally aren’t threaded, I got hit with one of those cases where I had to break out the profiler and ask “why on Earth is the main ffmpeg thread so busy?”


> one problem is
> the user can setup the scale filter with high quality in mind or with 
> low quality and speed in mind.
> But after this patch she always gets low quality because the low-quality
> conversion code is hardcoded in the encoder, which pretends to support 420.
> The outside code has no chance to know it shouldn't feed 420 if high quality
> is wanted.

The user can still insert a scaler explicitly, or use the pix_fmt option so that the format filter gets put into the pipeline (e.g. -pix_fmt yuv422p10le, or an explicit -vf scale,format=yuv422p10le).

> 
> Also why should this be in one encoder and not be available to other
> encoders supporting 4:2:2 input ?
> A solution should work for all of them

I would assume this would really only be helpful in encoders which support 4:2:2 but not 4:2:0, since a typical encoder that accepts 4:2:0 would preserve that subsampling in its output (i.e. it wouldn’t blindly upsample 4:2:0 to 4:2:2 for no good reason).

I did actually consider writing a separate filter which just does packed/planar conversion and 4:2:0 to 4:2:2 (as opposed to using swscale).  In this case, though, the additional modularity of such a filter was outweighed by my goal of minimizing the number of times I copy the frame data.  Combining it with the v210 encoding meant only a single pass over the data.

> 
> I am not sure what the best solution is, but simply hardcoding this in
> one encoder feels rather wrong

The scale filter performs three basic roles:
1.  Scaling
2.  Packed to planar conversion (or vice versa)
3.  Colorspace conversion

I suppose someone could potentially redesign swscale to include a fast path for cases where scaling isn’t actually required (i.e. cases where only 2 and 3 are needed).
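
Purely as a sketch of that idea (the names below are hypothetical, not swscale’s actual internals):

    /* Hypothetical dispatch: role 1 (scaling) is detectable up front
     * from the geometry, so roles 2 and 3 could go down a cheap
     * single-pass repack/convert path instead of the full scaler. */
    enum conv_path { CONV_MEMCPY, CONV_REPACK_ONLY, CONV_FULL_SCALER };

    static enum conv_path pick_path(int src_w, int src_h, int src_fmt,
                                    int dst_w, int dst_h, int dst_fmt)
    {
        if (src_w == dst_w && src_h == dst_h)
            return src_fmt == dst_fmt ? CONV_MEMCPY       /* nothing to do      */
                                      : CONV_REPACK_ONLY; /* roles 2 and 3 only */
        return CONV_FULL_SCALER;                          /* role 1: real scaling */
    }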

Just so we’re all on the same page: this wasn’t a case of random or premature optimization.  I have a specific use case where I’m decoding four instances of 1080p60 video and the platform can’t keep up without this change.  It’s the result of profiling the entire pipeline, as opposed to some unit test with a benchmark.  In fact, I don’t particularly agree with Limin's numbers from the benchmark option, since that fails to take into account caching behavior or memory bandwidth on a constrained platform (a problem which is exacerbated when running multiple instances).

In a perfect world we would have very small operations which each perform some discrete function, and we could combine all of those in a pipeline.  In the real world, though, significant benefits can be gained by combining certain operations to avoid copying the same pixels over and over again.

Devin

