[FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

Tue Jun 13 01:18:23 EEST 2017

On 2017-06-12 18:57, Michael Niedermayer wrote:
> On Mon, Jun 12, 2017 at 03:36:06PM +0200, James Darnley wrote:
>> Rounding contributed by Ronald S. Bultje
>> ---
>>  libavcodec/tests/x86/dct.c       |  2 ++
>>  libavcodec/x86/idctdsp_init.c    | 19 +++++++++++++++++++
>>  libavcodec/x86/simple_idct.h     |  3 +++
>>  libavcodec/x86/simple_idct10.asm |  8 ++++++++
>>  4 files changed, 32 insertions(+)
>
> this (3) and the patches 1 and 2 break te idct
>
> ./ffplay ~/videos/matrixbench_mpeg2.mpg
> looks pretty bad

If that would happen to be the FATE sample
mpeg2/matrixbench_mpeg2.lq1.mpg then I see that too.

As I said on IRC I was able to partly remedy it by moving patch 6 up.

On 2017-06-12 19:00, Michael Niedermayer wrote:
> On Mon, Jun 12, 2017 at 03:36:03PM +0200, James Darnley wrote:
>> I think I have reached the final state for these patches.  There has been little
>> change to the 1st, 3rd, 4th, and 5th.
>>
>> The 2nd adds an option to explicitly control what the macro does after the IDCT.
>> This allows the small optimisation for 8-bit of not storing the data back to the
>> source block.
>>
> 
>> The 6th lets the IDCT use the slightly different coefficients to get exact
>> output compared with the MMX original.  This is rather messy but I think it is
>> slightly better than trying to alter the code macro.  A word diff looks much
>> cleaner than the line diff git uses by default.
> 
> i see no changes in fate
> is the changed code not tested in any fate test (in which case please
> add one) but quite possibly i also misundestand as i didnt look at
> what exactly is slihtly different

While I am not completely sure, I think that is be because the simplemmx
doesn't produce the same output as the C so is avoided in fate tests.
The new functions are chosen with these conditions:

> +        if (ARCH_X86_64 &&
> +            !high_bit_depth &&
> +            avctx->lowres == 0 &&
> +            (avctx->idct_algo == FF_IDCT_AUTO ||
> +                avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
> +                avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {

As is the old MMX, minus the x86-64 part.

Where as others (the 10 and 12 bit) have "avctx->idct_algo ==
FF_IDCT_SIMPLE" at the end.

Further to this, doing sed -i 's/idct simple/idct simplemmx/g' across
the files which match makes fate fail.  I looked at where that matched
and it appeared to be mostly command line options in the testing files.

That was done on master so I think that proves my original thought.

Ultimately I have no idea what is going on here.  I've tested decoding
the sample mentioned above before my inline to external conversion of
the MMX.  I've done that immediately after that commit.  I've done that
on the current master.  I get the same result for all of those.

Going to my patches.  I can get the same result with -cpuflags none.
Surely -cpuflags mmx+mmxext would then be the same as above.

I'm quitting for the night before I flame out.