[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

Ronald S. Bultje rsbultje at gmail.com
Wed Nov 30 14:57:38 EET 2016


Hi,

On Wed, Nov 30, 2016 at 7:10 AM, James Darnley <jdarnley at obe.tv> wrote:

> On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> > 2016-11-29 17:14 GMT+01:00 James Darnley <jdarnley at obe.tv>:
> >> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> >>> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarnley at obe.tv>:
> >>>> sse2:
> >>>> complex: 4.13x faster (1514 vs. 367 cycles)
> >>>> simple:  4.38x faster (1836 vs. 419 cycles)
> >>>>
> >>>> avx:
> >>>> complex: 1.07x faster (260 vs. 244 cycles)
> >>>> simple:  1.03x faster (284 vs. 274 cycles)
> >>>
> >>> What are you comparing?
> >
> >> The AVX comparison is it versus SSE2.
> >
> > This wasn't obvious to me.
>
> I've made it more verbose but I'm not sure whether it is any better.
> Care to give your opinion, Carl?
>
> >     Nehalem:
> >      - sse2:
> >        - complex: 4.13x faster (1514 vs. 367 cycles)
> >        - simple:  4.38x faster (1836 vs. 419 cycles)
> >
> >     Haswell:
> >      - sse2:
> >        - complex: 3.61x faster ( 936 vs. 260 cycles)
> >        - simple:  3.97x faster (1126 vs. 284 cycles)
> >      - avx (versus sse2):
> >        - complex: 1.07x faster (260 vs. 244 cycles)
> >        - simple:  1.03x faster (284 vs. 274 cycles)
>
> I included the sse2 results for Haswell to show that the avx version is
> (slightly) better than sse2 on the same machine.


Ah! Now it makes sense. I had no idea why your SSE2 results changed from
367 (SSE2 vs. C) to 260 cycles (AVX vs. SSE2).

Ronald
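
For anyone re-reading the numbers above, here is a minimal sketch of how the
"Nx faster" figures relate to the cycle counts. It assumes each ratio is
simply baseline cycles divided by optimized cycles. The `speedup` helper and
the `RESULTS` table are illustrative names, not part of the patch or of
checkasm. Ratios recomputed from the rounded cycle counts can differ from the
quoted ones in the last digit (936 / 260 gives 3.60, the post says 3.61),
presumably because the quoted ratios came from unrounded measurements; that
explanation is an assumption as well.

```python
def speedup(baseline_cycles, optimized_cycles):
    """Return how many times faster the optimized version is than the baseline."""
    return baseline_cycles / optimized_cycles


# (label, baseline cycles, optimized cycles), copied from the quoted results.
# "sse2" rows compare against C; "avx" rows compare against sse2.
RESULTS = [
    ("Nehalem sse2 complex", 1514, 367),
    ("Nehalem sse2 simple", 1836, 419),
    ("Haswell sse2 complex", 936, 260),
    ("Haswell sse2 simple", 1126, 284),
    ("Haswell avx complex (vs sse2)", 260, 244),
    ("Haswell avx simple (vs sse2)", 284, 274),
]

if __name__ == "__main__":
    for label, base, opt in RESULTS:
        print(f"{label}: {speedup(base, opt):.2f}x faster ({base} vs. {opt} cycles)")
```

Laid out this way, the Haswell sse2 result (260 cycles) is visibly the
baseline of the avx row, which is the link the first version of the commit
message left implicit.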

