[FFmpeg-devel] [PATCH] avfilter/vf_w3fdif: add x86 SIMD
James Almer
jamrial at gmail.com
Fri Oct 9 20:06:44 CEST 2015
On 10/9/2015 1:44 PM, Paul B Mahol wrote:
> +cglobal w3fdif_complex_low, 4, 7, 9, 0, work_line, in_lines_cur0, coef, linesize
> + movq m3, [coefq]
> + DEFINE_ARGS work_line, in_lines_cur0, in_lines_cur1, linesize, offset, in_lines_cur2, in_lines_cur3
> + SPLATW m0, m3, 0
> + SPLATW m1, m3, 1
> + SPLATW m2, m3, 2
> + SPLATW m3, m3, 3
> + SBUTTERFLY wd, 0, 1, 7
> + SBUTTERFLY wd, 2, 3, 7
Looking at this again, m0 and m1 end up holding the same data, and so do m2
and m3. There is no need for the SBUTTERFLY to interleave the coeffs. Just
splat two of them per register:
    movq   m0, [coefq+0]
    pshufd m2, m0, q1111
    SPLATD m0
And since this saves two XMM regs (the cglobal line declares nine, and
x86_32 only has eight), you can enable the function for x86_32.
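
The equivalence claimed above can be checked with a small scalar model in C. This is only a sketch: the helper names (splat_word, sbutterfly_wd, splat_dword, coef_setups_match) are made up for illustration and are not FFmpeg functions. Each array models the eight 16-bit lanes of one XMM register.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8 /* 16-bit lanes in a 128-bit XMM register */

/* Model of SPLATW: broadcast word `idx` of src into every lane of dst. */
static void splat_word(int16_t dst[LANES], const int16_t src[LANES], int idx)
{
    int16_t v = src[idx]; /* read first so dst may alias src */
    for (int i = 0; i < LANES; i++)
        dst[i] = v;
}

/* Model of SBUTTERFLY wd: a receives the interleaved low halves of (a, b),
 * b receives the interleaved high halves. */
static void sbutterfly_wd(int16_t a[LANES], int16_t b[LANES])
{
    int16_t lo[LANES], hi[LANES];
    for (int i = 0; i < LANES / 2; i++) {
        lo[2 * i]     = a[i];
        lo[2 * i + 1] = b[i];
        hi[2 * i]     = a[i + LANES / 2];
        hi[2 * i + 1] = b[i + LANES / 2];
    }
    memcpy(a, lo, sizeof(lo));
    memcpy(b, hi, sizeof(hi));
}

/* Model of a dword broadcast (pshufd with q0000 or q1111): copy 32-bit
 * element `idx`, i.e. one pair of words, to all four dword positions. */
static void splat_dword(int16_t dst[LANES], const int16_t src[LANES], int idx)
{
    int16_t w0 = src[2 * idx], w1 = src[2 * idx + 1];
    for (int i = 0; i < LANES / 2; i++) {
        dst[2 * i]     = w0;
        dst[2 * i + 1] = w1;
    }
}

/* Returns 1 if the SPLATW + SBUTTERFLY setup and the two-register
 * dword-broadcast setup produce the same coefficient registers. */
static int coef_setups_match(const int16_t coef[4])
{
    int16_t in[LANES] = {0}; /* movq loads 4 words, upper half is zero */
    int16_t m0[LANES], m1[LANES], m2[LANES], m3[LANES];
    int16_t n0[LANES], n2[LANES];

    memcpy(in, coef, 4 * sizeof(int16_t));

    /* Setup from the patch: four word splats, then two butterflies. */
    splat_word(m0, in, 0);
    splat_word(m1, in, 1);
    splat_word(m2, in, 2);
    splat_word(m3, in, 3);
    sbutterfly_wd(m0, m1);
    sbutterfly_wd(m2, m3);

    /* Suggested setup: one dword broadcast per coefficient pair. */
    splat_dword(n2, in, 1); /* pshufd m2, m0, q1111 */
    splat_dword(n0, in, 0); /* SPLATD m0 */

    return !memcmp(m0, m1, sizeof(m0)) && !memcmp(m2, m3, sizeof(m2)) &&
           !memcmp(m0, n0, sizeof(m0)) && !memcmp(m2, n2, sizeof(m2));
}
```

Calling coef_setups_match() with any four coefficients should return 1: after the butterflies every register is just a repeated (c0, c1) or (c2, c3) pair, which is what a dword broadcast produces directly.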
> + mov offsetq, 0
> + mov in_lines_cur3q, [in_lines_cur0q+gprsize*3]
> + mov in_lines_cur2q, [in_lines_cur0q+gprsize*2]
> + mov in_lines_cur1q, [in_lines_cur0q+gprsize]
> + mov in_lines_cur0q, [in_lines_cur0q]
> +
> +.loop
> + movh m4, [in_lines_cur0q+offsetq]
> + movh m5, [in_lines_cur1q+offsetq]
> + pxor m7, m7
You can zero this once outside the loop, as long as nothing inside the loop
overwrites it. That makes one pxor total instead of two per iteration.
> + punpcklbw m4, m7
> + punpcklbw m5, m7
> + SBUTTERFLY wd, 4, 5, 7
Use any free reg here and below for the fourth argument to avoid overwriting
the zeroed one.
> + pmaddwd m4, m0
> + pmaddwd m5, m1
Use m0 for both here, of course.
> + movh m6, [in_lines_cur2q+offsetq]
> + movh m8, [in_lines_cur3q+offsetq]
> + pxor m7, m7
> + punpcklbw m6, m7
> + punpcklbw m8, m7
> + SBUTTERFLY wd, 6, 8, 7
> + pmaddwd m6, m2
> + pmaddwd m8, m3
And m2 here (or make it m1).
> + paddd m4, m6
> + paddd m5, m8
> + mova [work_lineq+offsetq*4], m4
> + mova [work_lineq+offsetq*4+mmsize], m5
> + add offsetq, mmsize/2
> + sub linesized, mmsize/2
> + jg .loop
> +REP_RET
The same can be done for complex_high (even if it's not enough to get it
working on x86_32).
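
For reference, here is a scalar C model of one loop iteration with the shared coefficient registers, compared against the plain four-tap sum that the quoted loop appears to compute. It is a sketch under assumptions: the function names are hypothetical, not FFmpeg code, and each iteration is taken to cover 8 pixels, matching the mmsize/2 step with 16-byte registers.

```c
#include <stdint.h>

#define LANES 8 /* 16-bit lanes in a 128-bit XMM register */

/* Model of punpcklbw against zero followed by SBUTTERFLY wd: zero-extend
 * 8 pixels from each of two lines and interleave them pairwise. */
static void interleave(const uint8_t *a, const uint8_t *b,
                       int16_t lo[LANES], int16_t hi[LANES])
{
    for (int i = 0; i < LANES / 2; i++) {
        lo[2 * i]     = a[i];
        lo[2 * i + 1] = b[i];
        hi[2 * i]     = a[i + LANES / 2];
        hi[2 * i + 1] = b[i + LANES / 2];
    }
}

/* Model of pmaddwd: multiply adjacent word pairs and sum each pair
 * into one of four 32-bit results. */
static void pmaddwd(int32_t dst[4], const int16_t src[LANES],
                    const int16_t coef[LANES])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int32_t)src[2 * i]     * coef[2 * i] +
                 (int32_t)src[2 * i + 1] * coef[2 * i + 1];
}

/* One iteration of the loop using only two coefficient registers. */
static void complex_low_simd_model(int32_t out[8],
                                   const uint8_t *l0, const uint8_t *l1,
                                   const uint8_t *l2, const uint8_t *l3,
                                   const int16_t coef[4])
{
    int16_t c01[LANES], c23[LANES], lo[LANES], hi[LANES];
    int32_t acc_lo[4], acc_hi[4], tmp_lo[4], tmp_hi[4];

    for (int i = 0; i < LANES / 2; i++) {
        c01[2 * i] = coef[0]; c01[2 * i + 1] = coef[1]; /* m0 */
        c23[2 * i] = coef[2]; c23[2 * i + 1] = coef[3]; /* m2 (or m1) */
    }

    interleave(l0, l1, lo, hi);
    pmaddwd(acc_lo, lo, c01); /* same register for both halves */
    pmaddwd(acc_hi, hi, c01);

    interleave(l2, l3, lo, hi);
    pmaddwd(tmp_lo, lo, c23);
    pmaddwd(tmp_hi, hi, c23);

    for (int i = 0; i < 4; i++) {
        out[i]     = acc_lo[i] + tmp_lo[i]; /* first mova */
        out[4 + i] = acc_hi[i] + tmp_hi[i]; /* second mova, +mmsize */
    }
}

/* Plain four-tap weighted sum for the same 8 pixels. */
static void complex_low_scalar(int32_t out[8],
                               const uint8_t *l0, const uint8_t *l1,
                               const uint8_t *l2, const uint8_t *l3,
                               const int16_t coef[4])
{
    for (int i = 0; i < 8; i++)
        out[i] = l0[i] * coef[0] + l1[i] * coef[1] +
                 l2[i] * coef[2] + l3[i] * coef[3];
}
```

Both functions should agree for any 8-bit input, since the low and high halves of the interleaved data pair line 0 with line 1 (or line 2 with line 3) in the same order and therefore need the same coefficient pair.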