[FFmpeg-devel] [FFmpeg-Devel] [GSoC] [PATCH 4/4] Added asm file for deblocking filter

James Almer jamrial at gmail.com
Mon Mar 23 00:38:54 CET 2015


On 22/03/15 4:48 PM, Tucker DiNapoli wrote:
> From: Tucker DiNapoli <T.DiNapoli42 at gmail.com>

[...]

> +;; Utility functions, maybe these should be macros for speed.
> +
> +;; void duplicate(uint8 *src, int stride)
> +;; duplicate block_size pixels 5 times upwards
> +cglobal duplicate, 2, 2, 1
> +    neg r2
> +    mova m0, [r1]

The first gpr is r0, not r1. That is also why r7 failed on x86_32, as Michael pointed out (r7 on x86_32
would be the stack pointer, so the %define is not created).
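
For the two lines quoted above, I'd expect something along these lines (untested, and assuming src and
stride are still the first and second arguments, as in your comment):

    cglobal duplicate, 2, 2, 1
        neg r1            ; stride, second argument
        mova m0, [r0]     ; src, first argument

You can also name the arguments after the counts and let x86inc create the aliases, which avoids this
kind of off-by-one:

    cglobal duplicate, 2, 2, 1, src, stride
        neg strideq
        mova m0, [srcq]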

[...]

> +%macro gen_deblock 0
> +;; This is a version of do_a_deblock that should work for mmx,sse and avx
> +;; on x86 and x85_64.
> +cglobal do_a_deblock, 5, 7, 7 ;src, step, stride, ppcontext, mode
> +    ;;alignment check:
> +    ;; there might be a better way to do this
> +    mov r6, mmsize
> +    and r6, rsp
> +    jz .aligned
> +    sub rsp, r6
> +.aligned:
> +    sub rsp, (22*mmsize)+gprsize
> +    mov [rsp + 22*mmsize], r6

If you need to allocate space on the stack, pass the number of bytes you need as the fifth parameter
to cglobal, after the number of xmm regs. x86inc will take care of updating the stack pointer and of
dealing with alignment on platforms where it knows the stack is not guaranteed to be aligned (msvc for
example).
For this case it could be "cglobal do_a_deblock, 5, 7, 7, 22*mmsize" or such.
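
Roughly, and untested, the start of the function would then look like this. The argument names are taken
from your comment, and the store is only there to show how the reserved area is addressed:

    cglobal do_a_deblock, 5, 7, 7, 22*mmsize, src, step, stride, ppcontext, mode
        ;; no manual rsp adjustment or alignment check here, x86inc
        ;; has already reserved 22*mmsize bytes at [rsp]
        mova [rsp + 0*mmsize], m0   ; first of the 22 slots
        ;; ...
        RET                         ; x86inc's RET also restores the stack pointer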

Read the documentation about PROLOGUE and cglobal in x86inc.asm for more info and for the constraints
you'll have to keep in mind (an extra reg is needed to store the original stack pointer if the stack is
not aligned, INIT_YMM forces the usage of said reg regardless of platform since the stack is only
guaranteed to be 16-byte aligned by default, etc).

