[FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

Ronald S. Bultje rsbultje at gmail.com
Fri Jan 15 04:21:08 CET 2016


Hi,

On Thu, Jan 14, 2016 at 9:55 PM, James Almer <jamrial at gmail.com> wrote:

> On 1/14/2016 11:05 PM, James Darnley wrote:
> > 2.6 times faster
> > ---
> > I have one question now.  Should I make the function name match the assembly
> > existing deblock/loop filter functions?  I took the current name from the C (as
> > I was originally trying to use a gather instruction but that didn't offer any
> > benefit).
> > ---
> >  libavcodec/x86/h264_deblock.asm | 40 ++++++++++++++++++++++++++++++++++++++++
> >  libavcodec/x86/h264dsp_init.c   |  4 ++++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> > index 5151f3c..20f0814 100644
> > --- a/libavcodec/x86/h264_deblock.asm
> > +++ b/libavcodec/x86/h264_deblock.asm
> > @@ -864,7 +864,47 @@ ff_chroma_inter_body_mmxext:
> >      DEBLOCK_P0_Q0
> >      ret
> >
> > +cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize
>
> This will not work with x86_32 compilers that don't have aligned stack (Like msvc)
> because r6 is needed to store the stack pointer.


If you don't need r%dm (looks like you don't, but I didn't check
exhaustively), you can also use a negative stack size (0 - mmsize -
ARCH_X86_64 * 2 * mmsize); x86inc will then not allocate an extra
register to hold the original stack pointer.
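
Roughly, only the last argument of the cglobal line would change. This is
an untested sketch, not a reviewed replacement: it assumes the function
body never reads its arguments through r0m..r4m, and it is a fragment
that only assembles inside h264_deblock.asm with x86inc/x86util included.

```nasm
; As posted in the patch (positive stack size). When the stack has to be
; aligned manually, x86inc takes an extra register for the original
; stack pointer:
;   cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize

; Suggested variant: same amount of stack space, requested as a negative
; size so that no extra register is used for the stack pointer.
; Assumption: nothing in the body uses r0m..r4m.
cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, 0 - mmsize - ARCH_X86_64*2*mmsize
```

The leading "0 -" just keeps the whole expression a single negative
value for the macro argument.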

Ronald

