[FFmpeg-devel] [PATCH 5/7] ARM: NEON optimised H.264 8x8 and 16x16 qpel MC
Ian Caulfield
ian.caulfield
Mon Dec 8 14:54:24 CET 2008
2008/12/5 Mans Rullgard <mans at mansr.com>:
> +
> + vshl.i16 q3, q1, #4
> + vshl.i16 q1, q1, #2
> + vshl.i16 q15, q2, #2
> + vadd.i16 q1, q1, q3
> + vadd.i16 q2, q2, q15
> +
> + vshl.i16 q3, q9, #4
> + vshl.i16 q9, q9, #2
> + vshl.i16 q15, q10, #2
> + vadd.i16 q9, q9, q3
> + vadd.i16 q10, q10, q15
> +
> + vsub.i16 q1, q1, q2
> + vsub.i16 q9, q9, q10
Would the following be any faster? I don't know what the pipeline
interlocks will be like, or whether you have a spare register to hold
the scalars (or whether setting up the scalars would make it slower):
vmul.i16 q1, q1, <scalar set to 20>
vmul.i16 q9, q9, <scalar set to 20>
vmls.i16 q1, q2, <scalar set to 5>
vmls.i16 q9, q10, <scalar set to 5>
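
Working through the quoted shifts and adds, each 16-bit lane ends up holding 20*q1 - 5*q2 (16+4 for the first operand, 1+4 for the second), so those are the constants a vmul/vmls pair would need in order to match it. Below is a rough C model of a single lane, only to check that arithmetic; it says nothing about cycle counts or interlocks on real hardware. The function names are made up for illustration and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of the quoted shift/add sequence.
 * a models a lane of q1 (or q9), b a lane of q2 (or q10).
 * Results wrap modulo 2^16, as the .i16 NEON operations do. */
static uint16_t lane_shift_add(uint16_t a, uint16_t b)
{
    uint16_t a16 = (uint16_t)(a << 4);    /* vshl.i16 q3,  q1, #4  */
    uint16_t a4  = (uint16_t)(a << 2);    /* vshl.i16 q1,  q1, #2  */
    uint16_t b4  = (uint16_t)(b << 2);    /* vshl.i16 q15, q2, #2  */
    uint16_t a20 = (uint16_t)(a4 + a16);  /* vadd.i16 q1,  q1, q3  */
    uint16_t b5  = (uint16_t)(b + b4);    /* vadd.i16 q2,  q2, q15 */
    return (uint16_t)(a20 - b5);          /* vsub.i16 q1,  q1, q2  */
}

/* The same lane computed as multiply, then multiply-subtract,
 * assuming scalars of 20 and 5. */
static uint16_t lane_mul_mls(uint16_t a, uint16_t b)
{
    uint16_t acc = (uint16_t)(a * 20u);   /* vmul.i16 by scalar 20 */
    return (uint16_t)(acc - b * 5u);      /* vmls.i16 by scalar 5  */
}

/* Compare both forms over a strided sample of 16-bit inputs and
 * return how many (a, b) pairs disagree. */
static unsigned count_mismatches(uint32_t step)
{
    unsigned bad = 0;
    assert(step > 0);
    for (uint32_t a = 0; a <= 0xFFFFu; a += step)
        for (uint32_t b = 0; b <= 0xFFFFu; b += step)
            if (lane_shift_add((uint16_t)a, (uint16_t)b) !=
                lane_mul_mls((uint16_t)a, (uint16_t)b))
                bad++;
    return bad;
}
```

Calling count_mismatches() with a small prime step (251, say) samples a spread of inputs, including ones that overflow 16 bits. Both forms are exact modulo 2^16, so they should agree on every pair.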
Ian