[FFmpeg-devel] [PATCH 09/10] lavc/flacenc: add sse4 version of the 32-bit lpc encoder
Clément Bœsch
u at pkh.me
Wed Feb 12 07:49:19 CET 2014
On Wed, Feb 12, 2014 at 12:11:21AM +0100, James Darnley wrote:
> From 1.3 to 2.5 times faster. Runtime reduced by 4 to 58%. As with the
> 16-bit version the speed-up generally increases with compression_level.
>
> Also like the 16-bit version, it is not used with levels less than 3.
> ---
> libavcodec/x86/flac_dsp_gpl.asm | 97 +++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/flacdsp_init.c | 5 ++
> 2 files changed, 102 insertions(+), 0 deletions(-)
>
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
> index 9e9249a..e36c76b 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -22,6 +22,14 @@
>
> %include "libavutil/x86/x86util.asm"
>
> +SECTION_RODATA
> +
> +pd_0_int_min: times 2 dd 0, -2147483648
> +pq_int_min: times 2 dq -2147483648
> +pq_int_max: times 2 dq 2147483647
> +
> +SECTION .text
> +
> INIT_XMM sse4
> %if ARCH_X86_64
> cglobal flac_enc_lpc_16, 6, 8, 8, 0, res, smp, len, order, coefs, shift
> @@ -89,3 +97,92 @@ movd m3, shiftmp
> sub lenmp, (3*mmsize)/4
> jg .looplen
> RET
> +
> +%macro PMINSQ 3
> + mova %3, %2
> + pcmpgtq %3, %1
pcmpgtq %3, %2, %1
> + pand %1, %3
> + pandn %3, %2
> + por %1, %3
> +%endmacro
> +
> +%macro PMAXSQ 3
> + mova %3, %1
> + pcmpgtq %3, %2
ditto: pcmpgtq %3, %1, %2
> + pand %1, %3
> + pandn %3, %2
> + por %1, %3
> +%endmacro
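For anyone following the two macros above: each one is a compare-and-mask select, needed because SSE4 has no packed signed 64-bit min/max instruction. Below is a minimal scalar sketch in C of what a single 64-bit lane computes. The function names are made up for illustration and are not part of the patch. The clamp helper at the end is only a guess at how the pq_int_min/pq_int_max constants are meant to be used, since the rest of the patch is elided here.

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane. All names here are hypothetical. */

/* Mimics pcmpgtq: all ones when a > b (signed compare), else zero. */
static uint64_t gt_mask(int64_t a, int64_t b)
{
    return a > b ? UINT64_MAX : 0;
}

/* PMINSQ a, b, tmp:
 *   tmp = (b > a)          mova + pcmpgtq
 *   a   = a & tmp          pand   (keep a where it is the smaller one)
 *   tmp = ~tmp & b         pandn  (keep b everywhere else)
 *   a   = a | tmp          por
 */
static int64_t pminsq_lane(int64_t a, int64_t b)
{
    uint64_t mask = gt_mask(b, a);
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* PMAXSQ a, b, tmp: same select, but the mask is (a > b). */
static int64_t pmaxsq_lane(int64_t a, int64_t b)
{
    uint64_t mask = gt_mask(a, b);
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* Assumption: clamp a 64-bit intermediate to the int32 range, which is
 * what the pq_int_min / pq_int_max constants appear to be for. */
static int32_t clip_int32_lane(int64_t v)
{
    v = pmaxsq_lane(v, INT32_MIN);
    v = pminsq_lane(v, INT32_MAX);
    return (int32_t)v;
}
```

When both inputs are equal the mask is zero in either macro, so the second operand is selected, which gives the same value anyway.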
[...]
--
Clément B.