[FFmpeg-devel] [PATCH] avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs
Ronald S. Bultje
rsbultje at gmail.com
Sun Oct 4 23:49:10 CEST 2015
Hi,
On Sun, Oct 4, 2015 at 3:46 PM, Paul B Mahol <onemda at gmail.com> wrote:
> + .loop:
> + movd m10, [ana_matrix_rq+ 0]
> + movd m11, [ana_matrix_rq+ 4]
> + movd m12, [ana_matrix_rq+ 8]
> + movd m13, [ana_matrix_rq+12]
> + movd m14, [ana_matrix_rq+16]
> + movd m15, [ana_matrix_rq+20]
> + pshufd m10, m10, q0000
> + pshufd m11, m11, q0000
> + pshufd m12, m12, q0000
> + pshufd m13, m13, q0000
> + pshufd m14, m14, q0000
> + pshufd m15, m15, q0000
>
[..]
> + movd m10, [ana_matrix_bq+ 0]
> + movd m11, [ana_matrix_bq+ 4]
> + movd m12, [ana_matrix_bq+ 8]
> + movd m13, [ana_matrix_bq+12]
> + movd m14, [ana_matrix_bq+16]
> + movd m15, [ana_matrix_bq+20]
> + pshufd m10, m10, q0000
> + pshufd m11, m11, q0000
> + pshufd m12, m12, q0000
> + pshufd m13, m13, q0000
> + pshufd m14, m14, q0000
> + pshufd m15, m15, q0000
>
So, you want more registers, right? :-D. OK, so let's talk stack usage. You
want aligned stack space here to hold all these constants, so you don't need
to recreate them in every loop iteration.
Change:
cglobal name, n_args, n_gprs, n_xmms, arg1, arg2, arg3
to:
cglobal name, n_args, n_gprs, n_xmms, aligned_memory_in_bytes, arg1, arg2, arg3
In your case, the memory to add is 6*mmsize*3 bytes (6 constants for each of
r/g/b).
Now, in the function, prepare the stack space first:
movd m10, [ana_matrix_rq+0]
[etc for the other r args]
pshufd m10, m10, q0000
[etc for the other r args]
mova [rsp+mmsize*0], m10
[etc for the others into rsp+mmsize*1 through rsp+mmsize*5]
Now do the same for g/b in slots mmsize*6 through 11 and mmsize*12 through 17.
Now as pshufb argument, use [rsp+mmsize*N], with N running from 0 to 17.
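To make the slot layout concrete, here is a rough scalar model of the setup in C rather than x86inc asm. It is only a sketch: xmm_t, splat32, prepare_constants and coef are made-up names, mmsize is assumed to be 16 (SSE), and the coefficients are assumed to be 32-bit ints, as the movd loads suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 16-byte XMM registers, 6 coefficients, 3 channels. */
enum { MMSIZE = 16, LANES = MMSIZE / 4, NCOEF = 6, NCHAN = 3 };
enum { STACK_BYTES = NCOEF * MMSIZE * NCHAN };  /* 6*mmsize*3 */

/* Stand-in for one XMM register holding four 32-bit lanes. */
typedef struct { int32_t lane[LANES]; } xmm_t;

/* Models "movd m, [mem]" followed by "pshufd m, m, q0000". */
static xmm_t splat32(int32_t x)
{
    xmm_t v;
    for (int i = 0; i < LANES; i++)
        v.lane[i] = x;
    return v;
}

/* Runs once, before the loop: slots 0-5 hold r, 6-11 hold g, 12-17 hold b.
 * Each store models "mova [rsp+mmsize*n], m". */
static void prepare_constants(xmm_t *stack, const int32_t *r,
                              const int32_t *g, const int32_t *b)
{
    const int32_t *matrix[NCHAN] = { r, g, b };
    for (int c = 0; c < NCHAN; c++)
        for (int i = 0; i < NCOEF; i++)
            stack[c * NCOEF + i] = splat32(matrix[c][i]);
}

/* Inside the loop: models using [rsp+mmsize*n] as a memory operand. */
static const xmm_t *coef(const xmm_t *stack, int n)
{
    assert(n >= 0 && n < NCOEF * NCHAN);
    return &stack[n];
}
```

With mmsize = 16 that is 18 slots, so 288 bytes of stack, and the loop body no longer needs m10-15 for constants at all.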
> + packusdw m1, m1
> + packuswb m1, m1
> + pshufb m7, m1, [rshuf]
Try to do r/g/b all at the same time (especially now that you have more
registers available since m10-15 are free): packusdw r/g together, then
packuswb r/g and b/nothing together, so that you have a single output
register instead of 3. That also saves you the pors at the end.
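A scalar C model of that packing order, in case it helps to see where the bytes end up. This is a sketch and not code from the patch: the helper names are made up, the final pshufb and its rgb_shuf mask are my assumption about how you would interleave to packed RGB, and mmsize is taken as 16.

```c
#include <stdint.h>

/* packusdw: signed 32-bit -> unsigned 16-bit with saturation.
 * Low four words come from a, high four from b. */
static void packusdw(uint16_t out[8], const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = a[i] < 0 ? 0 : a[i] > 65535 ? 65535 : (uint16_t)a[i];
        out[i + 4] = b[i] < 0 ? 0 : b[i] > 65535 ? 65535 : (uint16_t)b[i];
    }
}

/* packuswb reads each word as signed 16-bit and saturates to unsigned 8-bit. */
static uint8_t sat_u8(uint16_t w)
{
    int32_t s = w >= 0x8000 ? (int32_t)w - 0x10000 : (int32_t)w;
    return s < 0 ? 0 : s > 255 ? 255 : (uint8_t)s;
}

static void packuswb(uint8_t out[16], const uint16_t a[8], const uint16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_u8(a[i]);
        out[i + 8] = sat_u8(b[i]);
    }
}

/* pshufb: a mask byte with bit 7 set yields 0, else it selects src[mask & 15]. */
static void pshufb(uint8_t out[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 15];
}

/* Four pixels of r/g/b dwords in, one register of packed bytes out. */
static void pack_rgb(uint8_t out[16], const int32_t r[4],
                     const int32_t g[4], const int32_t b[4])
{
    /* Hypothetical mask: planar r0-3 g0-3 b0-3 -> r0 g0 b0 r1 g1 b1 ... */
    static const uint8_t rgb_shuf[16] = {
        0, 4, 8,  1, 5, 9,  2, 6, 10,  3, 7, 11,  0x80, 0x80, 0x80, 0x80
    };
    uint16_t rg[8], bb[8];
    uint8_t planar[16];

    packusdw(rg, r, g);        /* r and g share one register */
    packusdw(bb, b, b);        /* b with "nothing" (itself) */
    packuswb(planar, rg, bb);  /* bytes: r0-3 g0-3 b0-3 b0-3 */
    pshufb(out, planar, rgb_shuf);
}
```

One shuffle on one register replaces three shuffles plus the pors. The top four bytes of the result are zero here, so how you store the 12 useful bytes is up to you.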
Ronald