[FFmpeg-devel] [PATCH] move h264 chromaMC x86 code to yasm
Ronald S. Bultje
rsbultje
Thu Sep 2 12:46:23 CEST 2010
Hi,
On Sat, Aug 28, 2010 at 8:50 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> On Sat, Aug 28, 2010 at 7:14 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> As per subject. FATE passes on x86-32/64 OSX, and this patch fixes
>> fate-vp6 on Win64 (which currently fails because of unmarked
>> clobbering of xmm registers). There are some nice optimizations that
>> could be done after this is applied, e.g. adding rv40 ssse3 mc should
>> be easy-as-hell, but all that is left for later.
>>
>> Since this doesn't change the SIMD code in any significant way, I
>> didn't profile it, but I can do that if preferred.
>
> Just to make sure:
>
> after:
> 437 dezicycles in w=8, 8388464 runs, 144 skips
> 526 dezicycles in w=4, 524274 runs, 14 skips
>
> 436 dezicycles in w=8, 8388489 runs, 119 skips
> 525 dezicycles in w=4, 524277 runs, 11 skips
>
> 444 dezicycles in w=8, 8388392 runs, 216 skips
> 530 dezicycles in w=4, 524262 runs, 26 skips
>
> 435 dezicycles in w=8, 8388455 runs, 153 skips
> 522 dezicycles in w=4, 524277 runs, 11 skips
>
> 442 dezicycles in w=8, 8388452 runs, 156 skips
> 530 dezicycles in w=4, 524280 runs, 8 skips
>
> before:
> 454 dezicycles in w=8, 8388477 runs, 131 skips
> 566 dezicycles in w=4, 524276 runs, 12 skips
>
> 448 dezicycles in w=8, 8388482 runs, 126 skips
> 571 dezicycles in w=4, 524278 runs, 10 skips
>
> 450 dezicycles in w=8, 8388485 runs, 123 skips
> 568 dezicycles in w=4, 524274 runs, 14 skips
>
> 450 dezicycles in w=8, 8388466 runs, 142 skips
> 568 dezicycles in w=4, 524273 runs, 15 skips
>
> 449 dezicycles in w=8, 8388475 runs, 133 skips
> 564 dezicycles in w=4, 524266 runs, 22 skips
>
> So it's actually microscopically faster than the inline asm. No idea
> why, I didn't change much, if anything at all...
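
To put a number on "microscopically faster", the five runs quoted above can be averaged per block width. The following is only a rough sketch: the figures are copied from the timings above, while the helper name `speedup_percent` and the choice of a plain arithmetic mean are assumptions made for illustration, not part of the original benchmark.

```python
from statistics import mean

# Dezicycle counts copied from the five runs quoted above.
after = {"w=8": [437, 436, 444, 435, 442], "w=4": [526, 525, 530, 522, 530]}
before = {"w=8": [454, 448, 450, 450, 449], "w=4": [566, 571, 568, 568, 564]}


def speedup_percent(before_runs, after_runs):
    """Relative reduction in mean dezicycles, in percent (hypothetical helper)."""
    b, a = mean(before_runs), mean(after_runs)
    return 100.0 * (b - a) / b


for width in ("w=8", "w=4"):
    print(
        f"{width}: before {mean(before[width]):.1f}, "
        f"after {mean(after[width]):.1f}, "
        f"{speedup_percent(before[width], after[width]):.1f}% fewer dezicycles"
    )
```

With these inputs the means come out to 450.2 vs. 438.8 for w=8 (about 2.5% fewer dezicycles) and 567.4 vs. 526.6 for w=4 (about 7.2% fewer). Five runs are too few to say much about variance, so treat these as ballpark figures.
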
Applied...
Ronald