[FFmpeg-devel] [PATCH] SSE RDFT
Måns Rullgård
mans
Sat Mar 20 23:07:14 CET 2010
Alex Converse <alex.converse at gmail.com> writes:
> 2010/3/20 Måns Rullgård <mans at mansr.com>
>
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>
>> > On Sun, Mar 14, 2010 at 3:23 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >> I'm sure I've made some embarrassingly amateurish mistakes here.
>> >> Feedback is more than welcome.
>> >>
>> >> --Alex
>> >
>> > In the interests of getting away from discussions about yasm and into
>> > actually reviewing the asm...
>> >
>> > +///sign mask of RDFT sine terms
>> >
>> > Three / ?
>> >
>> > Looking at the asm overall, it looks like there's a huge amount of
>> > moving stuff around and very little actual calculation. Is there no
>> > better way to organize it?
>> >
>> > + "movlps (%4,%0,4), %%xmm4 \n\t"
>> > + "unpcklps %%xmm4, %%xmm4 \n\t"
>> > + "movlps (%5,%0,4), %%xmm3 \n\t"
>> > + "unpcklps %%xmm3, %%xmm3 \n\t"
>> >
>> > This looks like a candidate for movsldup in an SSE3 version.
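A plain-C model of the lane semantics may help when weighing the movsldup suggestion. This is a sketch, not SSE code: `vec4`, `vec4_equal` and the `model_*` functions are hypothetical names, and each function only mimics what the named instruction sequence leaves in the destination register. movsldup reads 16 bytes and duplicates the even-indexed lanes, so it is not a drop-in replacement for movlps+unpcklps on two separate tables. It would match if the needed values sat in even slots, for example in an interleaved {c0, s0, c1, s1} table. That layout is an assumption about how the tables could be reorganized, not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Four float lanes of an XMM register, modelled as a plain array. */
typedef struct { float v[4]; } vec4;

/* movlps (p), xmm ; unpcklps xmm, xmm
 * movlps fills lanes 0-1 with p[0], p[1]; unpcklps of a register with
 * itself interleaves its two low lanes, giving {x0, x0, x1, x1}.
 * Lanes 2-3 are never read, so only 8 bytes of p matter. */
static vec4 model_movlps_unpcklps(const float *p)
{
    vec4 r = { { p[0], p[0], p[1], p[1] } };
    return r;
}

/* movsldup (p), xmm (SSE3): loads 16 bytes and duplicates the
 * even-indexed lanes. */
static vec4 model_movsldup(const float *p)
{
    vec4 r = { { p[0], p[0], p[2], p[2] } };
    return r;
}

/* movshdup (p), xmm (SSE3): loads 16 bytes and duplicates the
 * odd-indexed lanes. */
static vec4 model_movshdup(const float *p)
{
    vec4 r = { { p[1], p[1], p[3], p[3] } };
    return r;
}

/* Lane-by-lane comparison of two modelled registers. */
static bool vec4_equal(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        if (a.v[i] != b.v[i])
            return false;
    return true;
}
```

With an interleaved table {c0, s0, c1, s1}, `model_movsldup` yields {c0, c0, c1, c1} and `model_movshdup` yields {s0, s0, s1, s1}. That is what the two movlps+unpcklps pairs produce when they read from two separate tables {c0, c1} and {s0, s1}.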
>>
>> Well?
>>
>
> Sorry, I've been a little tied up trying to finish up PS.
>
> There is a lot of data shuffling in here. One potential reduction is
> reorganizing the trig tables, but keeping extra trig tables around is always
> a bit controversial.
FWIW, the NEON FFT uses interleaved trig tables.
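Interleaving is purely a layout change. The following is a minimal sketch in C, assuming single-precision cosine and sine tables indexed by the same k. `interleave_trig` and `rotate_by_twiddle` are hypothetical names, not FFmpeg functions, and the exact layout used by the NEON FFT is not reproduced here. Storing each cosine next to its sine lets one load fetch both factors of a twiddle instead of gathering from two tables. If the interleaved table replaces the separate ones, no extra memory is needed; keeping both is the "extra trig tables" cost discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: merge separate cosine and sine tables of length n
 * into one table of length 2*n laid out as {c0, s0, c1, s1, ...}, so that
 * one 16-byte load at dst + 2*k yields {c_k, s_k, c_(k+1), s_(k+1)}. */
static void interleave_trig(float *dst, const float *cos_tab,
                            const float *sin_tab, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = cos_tab[i];
        dst[2 * i + 1] = sin_tab[i];
    }
}

/* Hypothetical consumer: multiply (*re + i * *im) by (c_k + i * s_k),
 * reading both factors from adjacent slots of the interleaved table.
 * The sign convention is illustrative; a real transform may use the
 * conjugate twiddle instead. */
static void rotate_by_twiddle(const float *tab, size_t k,
                              float *re, float *im)
{
    float c = tab[2 * k];
    float s = tab[2 * k + 1];
    float new_re = *re * c - *im * s;
    float new_im = *re * s + *im * c;
    *re = new_re;
    *im = new_im;
}
```

For example, interleaving cos = {1, 0} and sin = {0, 1} gives {1, 0, 0, 1}, and rotating (1, 0) by twiddle k = 1 gives (0, 1).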
--
Måns Rullgård
mans at mansr.com