[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Justin Ruggles
justin.ruggles
Mon Jan 31 19:18:18 CET 2011
On 01/31/2011 12:21 AM, Loren Merritt wrote:
>> +cglobal ac3_exponent_min_%1, 3,4,2, exp, reuse_blks, expn, offset
>> + cmp reuse_blksq, 0
>
> shl sets flags.
OK, this works fine with:
    shl reuse_blksq, 8
    jz .end
>> + je .end
>> + sub expnq, mmsize
>> + shl reuse_blksq, 8
>> +.nextexp:
>> + mov offsetq, reuse_blksq
>> + mova m0, [expq+offsetq]
>> + sub offsetq, 256
>> +.nextblk:
>> + PMINUB m0, [expq+offsetq], m1
>> + sub offsetq, 256
>> + jae .nextblk
>> + mova [expq], m0
>> + add expq, mmsize
>> + sub expnq, mmsize
>> + jae .nextexp
>
> ja, and remove the first sub
I get some very weird mmx2 results when I remove the first sub and
change jae to ja.
Athlon64 X2 6000+
sse2: 3006 -> 2753
mmx2: 5228 -> 5453
mmx: 5490 -> 5430
Atom 330
sse2: 6834 -> 3779
mmx2: 9951 -> 10525
mmx: 11390 -> 11325
Both CPUs agree on the direction of the change (sse2 and mmx get faster,
mmx2 gets slower), except that on the Athlon64 the mmx2 version ends up
slower than the mmx version. What do you suggest?
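
For reference, the two loop-counter variants discussed above can be modelled in C to check that they run the same number of iterations. This is only a sketch: the function names are made up for illustration, `step` stands in for mmsize, and it assumes expn is a nonzero multiple of mmsize (the two forms differ otherwise). `jae` is taken when the preceding `sub` does not borrow, and `ja` when it neither borrows nor yields zero.

```c
#include <assert.h>

/* Hypothetical model of the original form:
 *     sub expnq, mmsize          (before the loop)
 *   .nextexp: ...; sub expnq, mmsize; jae .nextexp
 */
static unsigned iters_presub_jae(unsigned n, unsigned step)
{
    unsigned count = 0, old;

    assert(step != 0 && n != 0 && n % step == 0);
    n -= step;                  /* the "first sub" */
    do {
        count++;                /* one pass over the loop body */
        old = n;
        n  -= step;             /* sub expnq, mmsize */
    } while (old >= step);      /* jae: taken when the sub did not borrow */
    return count;
}

/* Hypothetical model of the suggested form: no first sub, jae -> ja. */
static unsigned iters_ja(unsigned n, unsigned step)
{
    unsigned count = 0, old;

    assert(step != 0 && n != 0 && n % step == 0);
    do {
        count++;
        old = n;
        n  -= step;             /* sub expnq, mmsize */
    } while (old > step);       /* ja: no borrow and result is nonzero */
    return count;
}
```

Under that assumption both functions return n / step, so the suggested form should only save one instruction of setup rather than change how much work the loop does.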
-Justin
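
As a reading aid, here is a scalar C sketch of what the quoted assembly loop appears to compute: for each of the expn exponents in the first block, the minimum over that block and the following reuse_blks blocks, written back to the first block. The name `exponent_min_ref` is hypothetical, and the block stride of 256 bytes is inferred from the `shl reuse_blksq, 8` in the patch; this is not taken from the FFmpeg sources.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_STRIDE 256   /* assumed from "shl reuse_blksq, 8" */

/* Hypothetical scalar model of the quoted ac3_exponent_min loop. */
static void exponent_min_ref(uint8_t *exp, int reuse_blks, int expn)
{
    assert(reuse_blks >= 0 && expn >= 0);

    if (reuse_blks == 0)        /* matches the early "jz .end" */
        return;

    for (int i = 0; i < expn; i++) {
        uint8_t min_exp = exp[i];

        /* Compare against the same position in each following block. */
        for (int blk = 1; blk <= reuse_blks; blk++) {
            uint8_t e = exp[i + blk * BLOCK_STRIDE];
            if (e < min_exp)
                min_exp = e;
        }
        exp[i] = min_exp;       /* result goes into the first block */
    }
}
```

The SIMD versions do the same thing for mmsize exponents at a time, with PMINUB replacing the scalar compare.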