[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER
Guillaume POIRIER
poirierg
Sun Nov 5 16:50:10 CET 2006
Hi,
On 11/4/06, skal <skal65535 at orange.fr> wrote:
>
> Hi everybody,
>
> In case it is of interest: it seems to me that an SSE version of
> H263_LOOP_FILTER is possible by replacing
> "psubusb %%mm4, %%mm2 \n\t"\
> "movq %%mm2, %%mm3 \n\t"\
> "psubusb %%mm4, %%mm3 \n\t"\
> "psubb %%mm3, %%mm2 \n\t"\
> at dsputil_mmx.c:587 (fresh cvs), by:
> "psubusb %%mm4, %%mm2 \n\t"\
> "pminub %%mm4, %%mm2 \n\t"\
>
> + maybe a little re-org of the loop (mm3 is gone).
Please send a patch and I'll try to benchmark the speed change.
Note that movq is very slow on P4, so any code that removes a
mov(q|dqu|..) can give a worthwhile speed-up there.
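
Before benchmarking, the claimed equivalence can be checked with a scalar
model of one byte lane. What follows is only a sketch: the helper names
(subus8, minu8, old_tail, new_tail, annexj_ramp, two_insn_ramp) are made up
for illustration and are not in dsputil_mmx.c. The ramp part additionally
assumes that mm2 holds 2*strength and mm4 holds |d| on entry, and that
"UpDownRamp" means the function from H.263 Annex J; neither is confirmed by
the quoted snippet.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of psubusb on one byte lane: unsigned saturating a - b. */
static uint8_t subus8(uint8_t a, uint8_t b) { return a > b ? (uint8_t)(a - b) : 0; }

/* Scalar model of pminub on one byte lane: unsigned minimum. */
static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Current tail: movq mm2,mm3 ; psubusb mm4,mm3 ; psubb mm3,mm2 */
static uint8_t old_tail(uint8_t mm2, uint8_t mm4)
{
    uint8_t mm3 = mm2;              /* movq    %%mm2, %%mm3 */
    mm3 = subus8(mm3, mm4);         /* psubusb %%mm4, %%mm3 */
    return (uint8_t)(mm2 - mm3);    /* psubb   %%mm3, %%mm2 (mm3 <= mm2, no wrap) */
}

/* Proposed tail: pminub mm4,mm2 */
static uint8_t new_tail(uint8_t mm2, uint8_t mm4) { return minu8(mm2, mm4); }

/* Number of (mm2, mm4) byte pairs on which the two tails disagree. */
static int count_tail_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            bad += old_tail((uint8_t)a, (uint8_t)b) != new_tail((uint8_t)a, (uint8_t)b);
    return bad;
}

/* H.263 Annex J ramp on the magnitude x = |d| (sign handled elsewhere):
 * max(0, x - max(0, 2*(x - strength))). */
static int annexj_ramp(int x, int strength)
{
    int t = 2 * (x - strength);
    if (t < 0) t = 0;
    int r = x - t;
    return r < 0 ? 0 : r;
}

/* ASSUMPTION: mm2 = 2*strength, mm4 = x on entry.
 * psubusb mm4,mm2 ; pminub mm4,mm2 */
static uint8_t two_insn_ramp(uint8_t x, uint8_t strength)
{
    uint8_t mm2 = (uint8_t)(2 * strength);  /* needs strength <= 127 */
    mm2 = subus8(mm2, x);                   /* psubusb %%mm4, %%mm2 */
    return minu8(mm2, x);                   /* pminub  %%mm4, %%mm2 */
}

/* Number of (strength, x) pairs on which the two ramps disagree. */
static int count_ramp_mismatches(void)
{
    int bad = 0;
    for (int s = 0; s < 128; s++)
        for (int x = 0; x < 256; x++)
            bad += annexj_ramp(x, s) != two_insn_ramp((uint8_t)x, (uint8_t)s);
    return bad;
}

/* Exhaustive check of both claims; aborts on the first failure. */
void check_all(void)
{
    assert(count_tail_mismatches() == 0);
    assert(count_ramp_mismatches() == 0);
}
```

Calling check_all() from a small main() is enough to run it. The tail check
works because a - max(a - b, 0) equals min(a, b) for unsigned bytes, which is
why mm3 and the movq can go.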
> Well, this is just for the fun of it, since the speed-up
> (if any) might not be worth a special version...
Once I have a patch to play with, I can benchmark it on P4, PM, and K8... :)
> (gotta love these saturated instructions. All of h263's
> UpDownRamp() with 2 instructions is quite fun)
Mmmm... grep -r "UpDownRamp" libav* returns nothing here, and neither
does Google Code Search.
What kind of code are you referring to?
Guillaume
--
With DADVSI (http://en.wikipedia.org/wiki/DADVSI), France finally has
a lead over the USA in selling out individuals' rights to corporations!
Vive la France!