[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions
Michael Niedermayer
michaelni
Fri Jan 2 20:51:20 CET 2009
On Fri, Jan 02, 2009 at 02:03:48PM -0500, Jason Garrett-Glaser wrote:
> $subject
>
> Benchmarks:
>
> Cathedral:
> idct_add16: 293 -> 282 clocks
> idct_add16intra: 343 -> 257 clocks
>
> "300" sample (contains almost no i16x16 blocks so I didn't test add16intra):
> idct_add16: 518 -> 433
>
> Higher benefit is due to higher bitrate, most likely.
>
> idct_DC was omitted from idct_add16 because the extra branching logic
> turned out to make it significantly slower (the branching becomes much
> more complicated and less likely as *both* 4x4 DCT blocks have to be
> DC-only for it to work).
>
> x264 iDCT code was modified to add a stride parameter, required for ffh264.
>
> x86util.asm was included from x264 in full for simplicity's sake and
> ease of use for adding future x264 assembly that uses it.
[...]
> Index: libavcodec/x86/h264dsp_mmx.c
> ===================================================================
> --- libavcodec/x86/h264dsp_mmx.c (revision 16408)
> +++ libavcodec/x86/h264dsp_mmx.c (working copy)
> @@ -472,6 +472,79 @@
> }
> }
>
> +#ifdef HAVE_YASM
> +static void ff_h264_idct_dc_add8_mmx2(uint8_t *dst, int16_t *block, int stride)
> +{
> + int dc0 = (block[ 0] + 32) >> 6;
> + int dc1 = (block[16] + 32) >> 6;
> + __asm__ volatile(
> + "movd %0, %%mm0 \n\t"
> + "movd %1, %%mm2 \n\t"
> + "pshufw $0, %%mm0, %%mm0 \n\t"
> + "pshufw $0, %%mm2, %%mm2 \n\t"
> + "pxor %%mm1, %%mm1 \n\t"
> + "pxor %%mm3, %%mm3 \n\t"
> + "psubw %%mm0, %%mm1 \n\t"
> + "psubw %%mm2, %%mm3 \n\t"
> + "packuswb %%mm2, %%mm0 \n\t"
> + "packuswb %%mm3, %%mm1 \n\t"
> + ::"r"(dc0),
> + "r"(dc1)
> + );
A random idea (untested; ignore it if it is slower):
movd "block[ 0]", %%mm0 // 0 0 X D
punpcklwd "block[16]", %%mm0 // x X d D
paddsw "32", %%mm0
psraw $6, %%mm0
punpcklwd %%mm0, %%mm0 // d d D D
pxor %%mm1, %%mm1 // 0 0 0 0
psubw %%mm0, %%mm1 // -d-d-D-D
packuswb %%mm1, %%mm0 // -d-d-D-D d d D D
pshufw $0xFA, %%mm0, %%mm1 // -d-d-d-d-D-D-D-D
punpcklwd %%mm0, %%mm0 // d d d d D D D D
Except for that, patch OK.
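
For readers following the lane diagrams, here is a minimal scalar sketch in C of what the quoted setup appears to prepare. It is not code from the patch. It assumes that the two 4x4 blocks sit side by side in an 8x4 pixel area, and that the elided remainder of the function applies the packed dc and -dc bytes with unsigned saturating add and subtract (paddusb/psubusb style). The name idct_dc_add8_ref and the three helpers are hypothetical.

```c
#include <stdint.h>

/* Clamp to 0..255, as packuswb does for each 16-bit lane. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Unsigned saturating byte add and subtract (paddusb / psubusb behaviour). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    int s = a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical scalar reference, not part of the patch.
 * Assumption: block[0] and block[16] are the DC coefficients of two
 * 4x4 blocks placed next to each other (8 pixels wide, 4 rows high). */
static void idct_dc_add8_ref(uint8_t *dst, const int16_t *block, int stride)
{
    for (int half = 0; half < 2; half++) {
        int dc = (block[16 * half] + 32) >> 6; /* same rounding as dc0/dc1 */
        uint8_t pos = clip_u8(dc);             /* dc if positive, else 0   */
        uint8_t neg = clip_u8(-dc);            /* -dc if negative, else 0  */

        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                uint8_t *p = &dst[y * stride + 4 * half + x];
                /* One of pos/neg is always 0, so this equals clip(*p + dc). */
                *p = sub_sat_u8(add_sat_u8(*p, pos), neg);
            }
        }
    }
}
```

Splitting dc into a non-negative part and a negated part is what lets the pixel update stay entirely in unsigned bytes: both the original sequence and the shorter one suggested above end with one register of "d d d d D D D D" and one of "-d-d-d-d-D-D-D-D" after unsigned saturation.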
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Answering whether a program halts or runs forever is
On a Turing machine, in general impossible (Turing's halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.