[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder
Mon Sep 1 05:03:41 CEST 2008
On Thu, Aug 28, 2008 at 10:06:45PM -0400, David Conrad wrote:
> Attached gives me about a 45% faster overall DCA decode on my penryn. Name
> suggestions for the function welcome.
> Regression tests pass, and I get bit-identical output.
> 81883 dezicycles in ff_dca_qmf_mul_c, 16380 runs, 4 skips
> 81067 dezicycles in ff_dca_qmf_mul_c, 32761 runs, 7 skips
> 82178 dezicycles in ff_dca_qmf_mul_c, 65528 runs, 8 skips
> 82789 dezicycles in ff_dca_qmf_mul_c, 131051 runs, 21 skips
> 11990 dezicycles in ff_dca_qmf_mul_sse, 16270 runs, 114 skips
> 12518 dezicycles in ff_dca_qmf_mul_sse, 32538 runs, 230 skips
> 12260 dezicycles in ff_dca_qmf_mul_sse, 65126 runs, 410 skips
> 12254 dezicycles in ff_dca_qmf_mul_sse, 130235 runs, 837 skips
Nice, but as you probably already know, my high-level optimizations
broke your patch.
If you want to update it, also look at ff_mpa_synth_filter(), which performs
the same windowing operation but with a quite different implementation. I
do not know which way is more efficient in SIMD; actually, I don't know which
is better for C either ...
Also, it would be interesting to add a float "ff_mpa_synth_filter", which
would probably make our mp3 decoder faster on normal desktop systems.
We are just missing a float 32-point (type II) DCT for that; otherwise the
code from dca could be shared as is.
Basically, what I am suggesting is that our mp3 decoder should get a
complete float decoding path in addition to the fixed-point path ...
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.