[FFmpeg-devel] [PATCH] Add filter_limit to VP3 loop filter parameters
David Conrad
lessen42
Sat Oct 4 12:34:19 CEST 2008
On Oct 4, 2008, at 5:05 AM, Jason Garrett-Glaser wrote:
> On Sat, Oct 4, 2008 at 1:49 AM, Diego Biurrun <diego at biurrun.de>
> wrote:
>> On Sat, Oct 04, 2008 at 12:17:03AM -0400, David Conrad wrote:
>>>
>>> The C implementation of the VP3 loop filter uses a table of
>>> pre-calculated values for the filter function. This doesn't work so
>>> well in SIMD; it's faster to calculate the function with the limit
>>> value.
>>>
>>> Implementing the C version without the table is slightly faster on my
>>> Penryn, but about 20% slower on P4, so it's probably not worth it to
>>> replace the C version to get rid of the table completely.
>>
>> Depending on how much speedup you get elsewhere I think you can ignore
>> P4 benchmark numbers. The P4 is a deadborn child.
>
> Indeed. If you have to take into account Pentium 4, you have to write
> custom versions of all of your assembly to use pshufw instead of movq
> for moves between registers, because on Pentium 4 moves between SIMD
> registers take an absurd 6 cycles.
>
> And since I don't think anyone is willing to do that, we can pretty
> much ignore it in most all other code, too.

I mentioned P4 simply because that was the only other CPU I had access
to at the time; these functions only matter for non-x86 computers
anyway once we have MMX versions of them.

At any rate, I'll have access to a PPC later today that I'll try it on.
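
For readers following the table-versus-calculation point above, here is a
minimal sketch of the two forms in C. It is not the code from the patch. It
assumes the usual VP3/Theora-style limit function (values up to the limit
pass through, values between the limit and twice the limit ramp back down
to zero, anything larger becomes zero). The names bound_direct,
build_bound_table, forms_agree and FILTER_RANGE are hypothetical and chosen
only for illustration.

```c
#include <stdlib.h>

/* Hypothetical bound on the magnitude of the raw filter value. */
#define FILTER_RANGE 128

/* Direct form: derive the bounded value from the limit on the fly.
 * This is the shape that maps more naturally onto SIMD. */
static int bound_direct(int v, int limit)
{
    int a = abs(v);

    if (a <= limit)
        return v;                 /* small values pass through unchanged */
    if (a >= 2 * limit)
        return 0;                 /* large values are suppressed entirely */
    /* in between, ramp back towards zero while keeping the sign of v */
    return v > 0 ? 2 * limit - a : a - 2 * limit;
}

/* Table form: precompute the result for every v in
 * [-FILTER_RANGE, FILTER_RANGE]. The table must hold
 * 2 * FILTER_RANGE + 1 entries and is indexed by v + FILTER_RANGE. */
static void build_bound_table(int *table, int limit)
{
    int *t = table + FILTER_RANGE;
    int x;

    for (x = -FILTER_RANGE; x <= FILTER_RANGE; x++)
        t[x] = 0;
    for (x = 0; x <= limit && x <= FILTER_RANGE; x++) {
        t[x]  =  x;
        t[-x] = -x;
    }
    for (x = limit; x < 2 * limit && x <= FILTER_RANGE; x++) {
        t[x]  = 2 * limit - x;
        t[-x] = x - 2 * limit;
    }
}

/* Returns 1 if both forms give the same result over the whole range. */
static int forms_agree(int limit)
{
    int table[2 * FILTER_RANGE + 1];
    int v;

    build_bound_table(table, limit);
    for (v = -FILTER_RANGE; v <= FILTER_RANGE; v++)
        if (table[v + FILTER_RANGE] != bound_direct(v, limit))
            return 0;
    return 1;
}
```

The table form costs one load per value in scalar C, while the direct form
needs only compares and subtractions against the limit, which is why a SIMD
version would want the limit passed in rather than a pointer to the table.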