[FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

Fri Dec 18 11:44:27 CET 2015

On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote:
> On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > > > 
> > > > ---
> > > > 
> > > > Hi,
> > > > 
> > > > This commit is likely to break FATE on ARM, since the current C code path
> > > > seems to use less precision.
> > > > 
> > > > How should I proceed to fix it?
> > > 
> > > Hmm.
> > > Can the precision of the C code path, and of any asm implementation of it
> > > under BITEXACT (if one exists), be raised without a speed loss?
> > > If so, that would be an option.
> > 
> > We are currently facing 4 cases (with this patch applied)
> > 
> >   * [1] ARM +ACCURATE_RND: uses NEON, 13-bit coefficients and 32-bit
> >   precision overall
> >   * [2] ARM -ACCURATE_RND: uses NEON, 6-bit coefficients and 16-bit
> >   precision overall
> 
> >   * [3] x86 +ACCURATE_RND: uses a C code path with lookup tables
> 
> Which LUT do you mean here?

The table filled by ff_yuv2rgb_c_init_tables. I'm not sure it is actually
used, though; I haven't looked at the C code that does the conversion
(yet).

> 
> 
> >   * [4] x86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13-bit
> >   coefficients (libswscale/yuv2rgb.c around line 800).
> > 
> > Notes:
> >   * The 4 outputs differ, with [3] being ugly (ghosting, non-sharp
> >   edges).
> > 
> >   * [1] and [4] (13-bit coefficient accuracy) should be identical but
> >   differ slightly.
> > 
> > Questions:
> > 
> 
> >   * What is the meaning of the ACCURATE_RND flag?
> 
> It should enable accurate rounding.
> 
> 
> >   * Does [3] use some kind of interpolation instead of duplicating
> >   chroma lines? Its output seems inferior to the other code paths.
> 
> Are you sure that is true for real images?
> It's easy to reach wrong conclusions with synthetic inputs,
> unless you only intend to use the scaler on such inputs.
> 
> Either way, line duplication is likely not optimal for real images:
> they are not made of constant-color blocks aligned to a camera's
> 2x2 samples.
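For reference, the two vertical chroma upsampling options discussed here can be sketched on a single chroma column. This is a generic illustration under an assumption: the interpolating variant uses 3:1 weights, which suit chroma sited midway between the two luma lines it covers. The function names are hypothetical, and which siting and filter the swscale C path actually uses has not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line duplication: each chroma sample is repeated on two output lines.
 * 'out' must hold 2 * n samples. */
static void upsample_dup(const uint8_t *c, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[2 * i] = out[2 * i + 1] = c[i];
}

/* Linear interpolation with 3:1 weights, assuming each chroma sample sits
 * midway between the two luma lines it covers. Edge samples are repeated.
 * 'out' must hold 2 * n samples. */
static void upsample_linear(const uint8_t *c, size_t n, uint8_t *out)
{
    assert(n > 0); /* the edge clamping below needs at least one sample */
    for (size_t i = 0; i < n; i++) {
        unsigned prev = c[i > 0 ? i - 1 : 0];
        unsigned next = c[i + 1 < n ? i + 1 : n - 1];
        /* +2 rounds to nearest before dividing by 4 */
        out[2 * i]     = (uint8_t)((3u * c[i] + prev + 2) >> 2);
        out[2 * i + 1] = (uint8_t)((3u * c[i] + next + 2) >> 2);
    }
}
```

For a chroma edge going from 0 to 255, duplication yields 0, 0, 255, 255 while the interpolating version yields 0, 64, 191, 255. That softer transition can look blurrier on synthetic block patterns even when it is closer to what a camera would have captured.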
> 
> 
> >   * Is [3] the output that should be taken as reference?
> 
> I'd say the reference is reality: the output should be as close as
> possible to an image taken at the new resolution in the first place.
> 
> 
> >   * Should we use BITEXACT instead of ACCURATE_RND to determine the
> >   precision used?
> 
> BITEXACT exists to avoid platform differences and allow regression tests.
> 
> If all else is equal, it would be best if C and asm matched, and if
> C is bad, then it should be improved.

Here are the C, MMX and NEON outputs from a photo:
http://0x5c.me/yuv2rgb/photos

The C and NEON outputs look almost the same.
The MMX one has slightly different colors overall.
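To illustrate how paths that differ only in coefficient precision end up with slightly different colors, here is a minimal fixed-point sketch. It assumes BT.601 limited-range coefficients and a plain round-to-nearest descale; the constants, helper names and term ordering are illustrative assumptions, not what the libswscale C, MMX or NEON code actually does, and it does not model the 16-bit intermediate width of the NEON path.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BT.601 limited-range coefficients (not necessarily the exact
 * constants or rounding used by libswscale). */
#define K_Y  1.164383 /* 255/219 */
#define K_RV 1.596027
#define K_GU 0.391762
#define K_GV 0.812968
#define K_BU 2.017232

/* Quantize a coefficient to 'bits' fractional bits, rounding to nearest. */
static int quant(double k, int bits) { return (int)(k * (1 << bits) + 0.5); }

/* Round a value with 'bits' fractional bits to an integer, clip to 0..255. */
static uint8_t descale(int x, int bits)
{
    if (x < 0)
        return 0; /* also avoids right-shifting a negative int */
    x = (x + (1 << (bits - 1))) >> bits;
    return (uint8_t)(x > 255 ? 255 : x);
}

/* Round a floating-point reference value and clip it to 0..255. */
static uint8_t round_clip(double x)
{
    if (x <= 0.0)
        return 0;
    if (x >= 255.0)
        return 255;
    return (uint8_t)(x + 0.5);
}

/* Convert one pixel using coefficients quantized to 'bits' fractional bits. */
static void yuv_to_rgb_fixed(int y, int u, int v, int bits, uint8_t rgb[3])
{
    assert(bits >= 1 && bits <= 16); /* keeps all products within int range */
    int cy = quant(K_Y, bits) * (y - 16);
    rgb[0] = descale(cy + quant(K_RV, bits) * (v - 128), bits);
    rgb[1] = descale(cy - quant(K_GU, bits) * (u - 128)
                        - quant(K_GV, bits) * (v - 128), bits);
    rgb[2] = descale(cy + quant(K_BU, bits) * (u - 128), bits);
}

/* Largest per-channel deviation from the rounded floating-point result,
 * over all Y values and a coarse grid of U/V values. */
static int max_abs_error(int bits)
{
    int worst = 0;
    for (int y = 0; y < 256; y++)
        for (int u = 0; u < 256; u += 8)
            for (int v = 0; v < 256; v += 8) {
                double dy = K_Y * (y - 16);
                uint8_t ref[3] = {
                    round_clip(dy + K_RV * (v - 128)),
                    round_clip(dy - K_GU * (u - 128) - K_GV * (v - 128)),
                    round_clip(dy + K_BU * (u - 128)),
                };
                uint8_t fx[3];
                yuv_to_rgb_fixed(y, u, v, bits, fx);
                for (int c = 0; c < 3; c++) {
                    int d = fx[c] - ref[c];
                    if (d < 0)
                        d = -d;
                    if (d > worst)
                        worst = d;
                }
            }
    return worst;
}
```

For example, for Y = 200 and neutral chroma, the 6-bit version gives R = 216 while the 13-bit version gives R = 214 (the floating-point value is about 214.25). With 13 fractional bits the sketch never deviates from the rounded floating-point result by more than 1, whereas 6 bits can be off by 2 or more.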

Since figuring out what the C code is actually doing and making the NEON asm
match its output is likely to take some time, would you be OK with the
following on ARM: +ACCURATE_RND uses the C code path (so FATE is not broken),
and -ACCURATE_RND uses the NEON code path with 16-bit precision? (IMHO, on
this platform speed matters more than the slight quality gain of the 32-bit
version.)

This behaviour will affect not only yuv420p but also nv12 and nv21 inputs.
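The proposed ARM selection policy, including the input formats it would affect, can be sketched as a small selection function. The flag macro, enums and function name below are hypothetical stand-ins (the real flag is libswscale's SWS_ACCURATE_RND, whose value is not reproduced here), and falling back to C when NEON is unavailable is an added assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for libswscale's SWS_ACCURATE_RND flag bit;
 * the value is illustrative only. */
#define FLAG_ACCURATE_RND (1 << 0)

/* Input formats the proposal would affect. */
typedef enum { IN_YUV420P, IN_NV12, IN_NV21 } in_format;

typedef enum { PATH_C, PATH_NEON_16 } convert_path;

/* Sketch of the proposed (temporary) ARM policy:
 *   +ACCURATE_RND -> C path, so existing FATE references keep matching;
 *   -ACCURATE_RND -> NEON path with 16-bit precision.
 * 'have_neon' models a runtime CPU check (assumption). */
static convert_path select_arm_yuv2rgb(in_format fmt, int flags, int have_neon)
{
    assert(fmt == IN_YUV420P || fmt == IN_NV12 || fmt == IN_NV21);
    if (!have_neon || (flags & FLAG_ACCURATE_RND))
        return PATH_C;
    return PATH_NEON_16;
}
```

Keeping the decision in one place would make it easy to switch +ACCURATE_RND over to a matching NEON path later without touching the callers.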

This is a temporary (and lame) solution until I have some time to work
on it.

Matthieu
[...]