[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

James Darnley jdarnley at obe.tv
Wed Nov 30 14:10:43 EET 2016


On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> 2016-11-29 17:14 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>>> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>>>> sse2:
>>>> complex: 4.13x faster (1514 vs. 367 cycles)
>>>> simple:  4.38x faster (1836 vs. 419 cycles)
>>>>
>>>> avx:
>>>> complex: 1.07x faster (260 vs. 244 cycles)
>>>> simple:  1.03x faster (284 vs. 274 cycles)
>>>
>>> What are you comparing?
> 
>> The AVX comparison is against SSE2.
> 
> This wasn't obvious to me.

I've made it more verbose, but I'm not sure whether it is any better.
Would you care to give your opinion, Carl?

>     Nehalem:
>      - sse2:
>        - complex: 4.13x faster (1514 vs. 367 cycles)
>        - simple:  4.38x faster (1836 vs. 419 cycles)
> 
>     Haswell:
>      - sse2:
>        - complex: 3.61x faster ( 936 vs. 260 cycles)
>        - simple:  3.97x faster (1126 vs. 284 cycles)
>      - avx (versus sse2):
>        - complex: 1.07x faster (260 vs. 244 cycles)
>        - simple:  1.03x faster (284 vs. 274 cycles)

I included the sse2 results for Haswell to show that the avx version is
(slightly) faster.


