[FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
Matthieu Bouron
matthieu.bouron at gmail.com
Fri Dec 18 11:44:27 CET 2015
On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote:
> On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > > >
> > > > ---
> > > >
> > > > Hi,
> > > >
> > > > This commit is likely to break fate on arm since the current C code path
> > > > seems to use less precision.
> > > >
> > > > How should I proceed to fix it ?
> > >
> > > hmm
> > > can the precision of the C code path and any asm impl of it under
> > > bitexact (if they exist), be changed to higher precision without
> > > speed loss?
> > > if so that would be an option
> >
> > We are currently facing 4 cases (with this patch applied)
> >
> > * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
> > precision overall
> > * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
> > precision overall
>
> > * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables
>
> which LUT do you mean here ?
The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used
though, I haven't looked at the C code that actually does the conversion
(yet).
>
>
> > * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
> > coefficients (libswscale/yuv2rgb.c around line 800).
> >
> > Notes:
> > * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
> > edges).
> >
> > * [1] and [4] (13bit coefficient accuracy) should be the same but have
> > slight differences.
> >
> > Questions:
> >
>
> > * What is the meaning of the ACCURATE_RND flag ?
>
> it should enable accurate rounding
>
>
> > * Does [3] use some kind of interpolation instead of duplicating
> > chroma lines ? Its output seems inferior to the other code paths.
>
> are you sure that is true for real images?
> it's easy to end up with wrong conclusions with synthetic inputs
> unless you want to use the scaler only for such inputs.
>
> either way line duplication is likely not optimal for real images
> I am not made of constant color blocks that are aligned to some camera's
> 2x2 samples
>
>
> > * Is [3] the output that should be taken as reference ?
>
> I'd say the reference is reality: make the output as close as possible to
> what an image of the new resolution would be if it had been taken that way
>
>
> > * Should we use BITEXACT instead of ACCURATE_RND to determine the
> > precision used ?
>
> BITEXACT is to avoid platform differences and allow regression tests
>
> if all else is equal it would be best if C and asm matches, and if
> C is bad then it should be improved
Here are the C, MMX and NEON outputs from a photo:
http://0x5c.me/yuv2rgb/photos
The C and NEON outputs look almost the same.
The MMX one has slightly different "colors" overall.
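To illustrate how the coefficient width alone can shift colors slightly, here
is a minimal sketch (not the swscale or NEON code). It assumes limited-range
BT.601 constants (1.164 and 1.596), which this thread does not specify, and the
helper names clip_u8 and red_fixed are made up for the example. It computes
only the red channel, with the coefficients rounded to a given number of
fractional bits.

```c
#include <assert.h>

/* Clamp an intermediate result to the 8-bit output range. */
static int clip_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/*
 * Hypothetical helper: red channel of one pixel from Y and V (Cr), with the
 * conversion coefficients quantized to `bits` fractional bits.
 * Assumption: limited-range BT.601 constants (1.164, 1.596).
 */
static int red_fixed(int y, int v, int bits)
{
    assert(bits > 0 && bits < 16);

    const int one = 1 << bits;
    const int cy  = (int)(1.164 * one + 0.5); /* luma coefficient, rounded   */
    const int crv = (int)(1.596 * one + 0.5); /* V-to-red coefficient        */

    /* Multiply-accumulate, then add half an LSB so the shift rounds. */
    int acc = cy * (y - 16) + crv * (v - 128) + (one >> 1);

    if (acc < 0)          /* avoid right-shifting a negative value */
        return 0;
    return clip_u8(acc >> bits);
}
```

With these assumed constants, a white-level input (Y=235, V=128) gives 253
with 6 fractional bits and 255 with 13, so two code paths that differ only in
coefficient width can already disagree by a couple of levels.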
Figuring out what the C code is actually doing and making the NEON asm
match its output is likely to take some time. Would you be OK if, on the
ARM platform, +ACCURATE_RND uses the C code path (so fate is not broken)
and -ACCURATE_RND uses the NEON code path with 16bit precision (IMHO,
speed is preferable to the slight quality gain of the 32bit version on
this platform)?
This behaviour will affect yuv420p but also nv12 and nv21 inputs.
This is kind of a temporary (and lame) solution until I have some time
to work on it.
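The proposed selection could be sketched roughly as follows. This is only an
illustration of the rule described above, not the actual init code: the enum,
the function name select_path and the have_neon parameter are hypothetical,
and SKETCH_ACCURATE_RND is a local stand-in for SWS_ACCURATE_RND (its value
here is assumed to match swscale.h but is redefined so the sketch is
standalone).

```c
#include <assert.h>

/* Local stand-in for SWS_ACCURATE_RND (assumed value, see lead-in). */
#define SKETCH_ACCURATE_RND 0x40000

/* Hypothetical labels for the two code paths discussed in this mail. */
typedef enum {
    PATH_C,        /* existing C conversion, keeps fate output unchanged */
    PATH_NEON_16   /* NEON conversion with 16bit precision               */
} conv_path;

/* Pick a conversion path from the scaler flags and CPU capability. */
static conv_path select_path(int flags, int have_neon)
{
    if (!have_neon)
        return PATH_C;                 /* no NEON: nothing to choose    */
    if (flags & SKETCH_ACCURATE_RND)
        return PATH_C;                 /* +ACCURATE_RND: stay on C path */
    return PATH_NEON_16;               /* -ACCURATE_RND: prefer speed   */
}
```

The same rule would apply to the nv12 and nv21 inputs, since they would go
through the same selection.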
Matthieu
[...]