[FFmpeg-devel] [PATCH v2] swscale/ppc: VSX-optimize yuv2rgb_full

Wed Mar 20 20:38:34 EET 2019

On Wed, Mar 20, 2019 at 04:06:45PM +0200, Lauri Kasanen wrote:
> ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
>         -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
>         -cpuflags 0 -v error -
> 
> This uses 32-bit mul, so POWER8 only.
> 
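A note on the POWER8 requirement: the int32 vectors the patch declares (vy32_l/vy32_r and friends) suggest that the 16-bit samples are widened into two 32-bit halves and then multiplied by the conversion coefficients. A multiply on 32-bit integer vector elements (vmuluwm) first appeared in Power ISA 2.07, i.e. POWER8. Earlier AltiVec integer multiplies only operate on 8- or 16-bit elements. The plain-C sketch below illustrates only that widening step. The function name and the coefficient are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for one 8-lane int16 vector that is split into two
 * 4-lane int32 halves (cf. vy32_l / vy32_r in the patch) and multiplied
 * lane-wise by a coefficient. On POWER8 each half would be one 32-bit
 * vector multiply; here it is plain C so it runs on any CPU.
 * Hypothetical helper, not the patch's code. */
static void widen_mul_s16x8(const int16_t in[8], int32_t coeff,
                            int32_t out_l[4], int32_t out_r[4])
{
    assert(out_l != out_r);
    for (int i = 0; i < 4; i++) {
        /* Widen before multiplying so the product is formed in 32 bits;
         * it would not fit in an int16_t lane. */
        out_l[i] = (int32_t)in[i]     * coeff;
        out_r[i] = (int32_t)in[i + 4] * coeff;
    }
}
```

For example, with a hypothetical coefficient of 4096, the sample 32767 becomes 134213632, far outside the int16_t range.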
> The following output formats get about 4.5x speedup:
> 
> rgb24
>   39980 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8774 UNITS in yuv2packed1,   32768 runs,      0 skips
> bgr24
>   40069 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8772 UNITS in yuv2packed1,   32766 runs,      2 skips
> rgba
>   39759 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8681 UNITS in yuv2packed1,   32767 runs,      1 skips
> bgra
>   39729 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8696 UNITS in yuv2packed1,   32766 runs,      2 skips
> argb
>   39766 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8672 UNITS in yuv2packed1,   32766 runs,      2 skips
> bgra
>   39784 UNITS in yuv2packed1,   32768 runs,      0 skips
>    8659 UNITS in yuv2packed1,   32767 runs,      1 skips
> 
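The "about 4.5x" figure can be checked from the pairs above. Dividing each before figure by its after figure gives ratios between roughly 4.56x and 4.59x, so the claim is, if anything, slightly conservative. The sketch below hard-codes the numbers from the mail. The names `timings`, `speedup` and `report_speedups` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Timer readings (UNITS) from the mail, rows in the order listed:
 * C path first, VSX path second. */
static const double timings[][2] = {
    { 39980.0, 8774.0 },
    { 40069.0, 8772.0 },
    { 39759.0, 8681.0 },
    { 39729.0, 8696.0 },
    { 39766.0, 8672.0 },
    { 39784.0, 8659.0 },
};

#define N_TIMINGS (sizeof(timings) / sizeof(timings[0]))

/* Speedup is the ratio of the old per-call cost to the new one. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}

/* Print one ratio per row and return the smallest ratio seen. */
static double report_speedups(void)
{
    double min = 0.0;
    for (size_t i = 0; i < N_TIMINGS; i++) {
        double s = speedup(timings[i][0], timings[i][1]);
        printf("row %zu: %.2fx\n", i + 1, s);
        if (i == 0 || s < min)
            min = s;
    }
    return min;
}
```

The smallest ratio is the first row (39980 / 8774, about 4.56x), and the largest is the last (39784 / 8659, about 4.59x).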
> Signed-off-by: Lauri Kasanen <cand at gmx.com>
> ---
>  libswscale/ppc/swscale_vsx.c | 291 ++++++++++++++++++++++++++++++++++++
> +++++++ 1 file changed, 291 insertions(+)
> 
> v2: HAVE_POWER8 from ifdef to if
> 
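The v2 note presumably relates to how FFmpeg's configure writes every HAVE_* symbol to config.h: each one is defined as either 0 or 1. As a result, `#ifdef HAVE_POWER8` is true even on builds without POWER8 support. The value has to be tested instead, either with `#if` or with a plain `if`, which is the FFmpeg style that lets the compiler see both paths. The sketch below fakes a non-POWER8 config.h to show the difference. The function names are hypothetical.

```c
#include <assert.h>

/* FFmpeg's config.h defines HAVE_* as 0 or 1; simulate a build
 * configured without POWER8. */
#define HAVE_POWER8 0

/* Wrong test: the macro is defined (to 0), so this branch is taken. */
static int selected_with_ifdef(void)
{
#ifdef HAVE_POWER8
    return 1;
#else
    return 0;
#endif
}

/* Value test: the branch is still parsed and type-checked, but the
 * compiler can drop it as dead code when HAVE_POWER8 is 0. */
static int selected_with_if(void)
{
    if (HAVE_POWER8)
        return 1;
    return 0;
}
```

With the fake config above, `selected_with_ifdef()` returns 1 and `selected_with_if()` returns 0.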
> diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c
> index 01eb46c..062ab0d 100644
> --- a/libswscale/ppc/swscale_vsx.c
> +++ b/libswscale/ppc/swscale_vsx.c
> @@ -422,6 +422,248 @@ yuv2NBPSX(16, BE, 1, 16, int32_t)
>  yuv2NBPSX(16, LE, 0, 16, int32_t)
>  #endif
> 
> +static av_always_inline void
> +yuv2rgb_full_1_vsx_template(SwsContext *c, const int16_t *buf0,
> +                     const int16_t *ubuf[2], const int16_t *vbuf[2],
> +                     const int16_t *abuf0, uint8_t *dest, int dstW,
> +                     int uvalpha, int y, enum AVPixelFormat target,
> +                     int hasAlpha)
> +{
> +    const int16_t *ubuf0 = ubuf[0], *vbuf0 = vbuf[0];
> +    const int16_t *ubuf1 = ubuf[1], *vbuf1 = vbuf[1];
> +    vector int16_t vy, vu, vv, A = vec_splat_s16(0), tmp16;
> +    vector int32_t vy32_l, vy32_r, vu32_l, vu32_r, vv32_l, vv32_r,
> tmp32, tmp32_2;
> +    vector int32_t R_l, R_r, G_l, G_r, B_l, B_r;

error: corrupt patch at line 26
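The error is most likely caused by line wrapping in the mailed patch. The declaration starting with `+    vector int32_t vy32_l, ...` was split, so its tail `tmp32, tmp32_2;` ends up on a line with no leading ' ', '+' or '-', which a unified-diff hunk does not allow. The wrapped diffstat line near the top is harmless, because it sits outside the diff. Below is a simplified checker that finds such a line. It is a hypothetical helper and not how git apply is implemented.

```c
#include <assert.h>

/* Return the 1-based index of the first hunk body line that does not
 * start with ' ', '+', '-' or '\' ("\ No newline at end of file"),
 * or 0 if every line is acceptable. Simplification (assumption): the
 * hunk header's line counts are not checked, and empty lines are
 * flagged even though some tools accept them as blank context. */
static int first_bad_hunk_line(const char *const lines[], int n)
{
    for (int i = 0; i < n; i++) {
        char c = lines[i][0];
        if (c != ' ' && c != '+' && c != '-' && c != '\\')
            return i + 1;
    }
    return 0;
}
```

Sending the patch with git send-email instead of pasting it into a mail client avoids this kind of wrapping.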

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.