[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Ronald S. Bultje
rsbultje
Mon Jan 31 21:19:43 CET 2011
Hi,
On Mon, Jan 31, 2011 at 2:53 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Mon, 31 Jan 2011, Justin Ruggles wrote:
>
>> I get some very weird mmx2 results when I remove the first sub and
>> change jae to ja.
>>
>> Athlon64 X2 6000+
>> sse2: 3006 -> 2753
>> mmx2: 5228 -> 5453
>>  mmx: 5490 -> 5430
>>
>> Atom 330
>> sse2:  6834 -> 3779
>> mmx2:  9951 -> 10525
>>  mmx: 11390 -> 11325
>>
>> Both CPUs are consistent in the change, except that on Athlon64 the mmx2
>> version is slower than the mmx version. What do you suggest?
>
> I usually blame such weird results on code alignment, but I have no
> systematic way to fix them.
Same here. Try adding an ALIGN <num> (8 or 16) directly before the loop
label, or disassemble before and after the change and see where
alignment could cause issues.
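
A rough sketch of the ALIGN suggestion, in plain NASM/yasm syntax for
x86-64 (SysV calling convention). This is a made-up byte-wise minimum
function, not the exponent_min() code from the patch; the function name,
register choices and the value 16 are assumptions for illustration only.

```nasm
; Hypothetical example, not the code under review.
; void min_u8_sse2(uint8_t *dst, const uint8_t *src, intptr_t len);
; Computes dst[i] = min(dst[i], src[i]) for len bytes.
; Assumes len is a positive multiple of 16 and both pointers are
; 16-byte aligned (movdqa and the pminub memory operand require it).
section .text
global min_u8_sse2
min_u8_sse2:
    xor     rax, rax            ; rax = byte offset, starts at 0
ALIGN 16                        ; pad with NOPs so .loop starts on a 16-byte boundary
.loop:
    movdqa  xmm0, [rdi+rax]     ; load 16 bytes from dst
    pminub  xmm0, [rsi+rax]     ; unsigned byte-wise minimum with src
    movdqa  [rdi+rax], xmm0     ; store the result back to dst
    add     rax, 16
    cmp     rax, rdx
    jb      .loop               ; continue while offset < len
    ret
```

To check the effect, something like "objdump -d" on the object file
before and after the change shows the address of the loop label, so you
can see whether the loop body now starts on (or straddles) a 16-byte
boundary.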
Ronald