[FFmpeg-devel] [PATCH 2/2] tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}
Christophe Gisquet
christophe.gisquet at gmail.com
Tue Feb 11 08:16:36 CET 2014
Hi,
2014-02-11 6:02 GMT+01:00 James Almer <jamrial at gmail.com>:
>> What did however affect speed negatively was calling the asm functions using
>> all seven elements from TTAFilter as arguments as i mentioned I'd do in my
>> previous email. I lost about 10 cycles on Win64 and 38 on Win32 just by doing
>> that.
>> I assume this is because of the prologue code in x86inc.
>>
>> I'll send an updated patch soon. If you find any dependencies please tell so.
>>
>
> New patchset sent. Kinda bummed at the loss of performance for using seven
> general purpose registers for the arguments, but if it's safer then it can't
> be helped.
Well, I don't feel entirely confident, but it makes sense that it
works. I don't know what other people think, nor do I know of a way to
mitigate a potential issue. Leaving a comment next to the declaration
of the TTA* struct about the need for a total size that is a multiple
of 16, and making sure the table addresses in TTAFilter remain
aligned, might help, but I fear it is not foolproof at all.
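For what it's worth, a compile-time check could back up such a comment.
Below is a minimal sketch of the idea, assuming a C11 compiler. The
struct is a made-up stand-in (ExampleFilter, its fields and the helper
example_tables_aligned are hypothetical and do not mirror the real
TTAFilter layout); only the technique is the point:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real struct: names and sizes are
 * illustrative only, NOT the actual TTAFilter from libavcodec. */
typedef struct ExampleFilter {
    int32_t shift, round, error, mode;  /* 16 bytes of scalar state */
    _Alignas(16) int32_t qm[8];         /* tables read with aligned loads */
    _Alignas(16) int32_t dx[8];
    _Alignas(16) int32_t dl[8];
} ExampleFilter;

/* Break the build if someone changes the layout carelessly. */
static_assert(sizeof(ExampleFilter) % 16 == 0,
              "ExampleFilter size must be a multiple of 16");
static_assert(offsetof(ExampleFilter, qm) % 16 == 0, "qm must be 16-byte aligned");
static_assert(offsetof(ExampleFilter, dx) % 16 == 0, "dx must be 16-byte aligned");
static_assert(offsetof(ExampleFilter, dl) % 16 == 0, "dl must be 16-byte aligned");

/* Runtime counterpart, usable in an assert() before calling the asm:
 * returns 1 if all three tables sit on 16-byte boundaries. */
int example_tables_aligned(const ExampleFilter *f)
{
    return ((uintptr_t)f->qm % 16 == 0) &&
           ((uintptr_t)f->dx % 16 == 0) &&
           ((uintptr_t)f->dl % 16 == 0);
}
```

This only guards the layout; it does not help if the struct ends up in
memory that is not itself 16-byte aligned, so it is a safety net rather
than a guarantee.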
Regarding the register dependency, it's minor compared to what I
missed, but its impact may depend on how good the CPU you test on is
at out-of-order execution, so it's always worth keeping in mind.
Thanks for the reordering, it's much cleaner now. Also you can use
SWAP to "virtually" rename registers so as to keep consistency between
code paths.
--
Christophe