[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize hscale_fast
Lauri Kasanen
cand at gmx.com
Tue Apr 30 14:38:28 EEST 2019
On Wed, 24 Apr 2019 14:02:16 +0300
Lauri Kasanen <cand at gmx.com> wrote:
> ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
> -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw
>
> 4.27x speedup for hyscale_fast:
> 24796 UNITS in hyscale_fast, 4096 runs, 0 skips
> 5797 UNITS in hyscale_fast, 4096 runs, 0 skips
>
> 4.48x speedup for hcscale_fast:
> 19911 UNITS in hcscale_fast, 4095 runs, 1 skips
> 4437 UNITS in hcscale_fast, 4096 runs, 0 skips
>
> Signed-off-by: Lauri Kasanen <cand at gmx.com>
> ---
> libswscale/ppc/swscale_vsx.c | 196 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 196 insertions(+)
>
> This has the same limit as the x86 version: it only handles output widths that
> are the same as or larger than the input. Shrinking would require a gather load,
> which doesn't exist on PPC and is slow even on x86 AVX. I tried a manual gather
> load, and the resulting vector function was 20% slower than C.
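
For readers following the width limit above, here is a minimal scalar sketch of
fast bilinear horizontal scaling and of why shrinking is awkward to vectorize.
It is not the code from the patch: the names hscale_fast_sketch and source_span
are hypothetical, the 16.16 fixed-point step and 7-bit blend weight are
assumptions modelled on the usual C fallback, and the group size of 16 outputs
is only an illustrative vector width.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar fast-bilinear horizontal scale (illustrative sketch only).
 * x_inc is the source step per output pixel in 16.16 fixed point:
 * 0x10000 means same width, smaller values upscale, larger values shrink.
 * Output samples carry 7 fractional bits (input value * 128). */
static void hscale_fast_sketch(int16_t *dst, int dst_w,
                               const uint8_t *src, int src_w, unsigned x_inc)
{
    unsigned xpos = 0;

    assert(src_w >= 1 && dst_w >= 1);
    for (int i = 0; i < dst_w; i++) {
        unsigned xx = xpos >> 16;                /* integer source index */
        int xalpha  = (int)((xpos & 0xFFFF) >> 9); /* blend weight, 0..127 */

        if (xx >= (unsigned)(src_w - 1))
            /* No right neighbour: replicate the last source pixel. */
            dst[i] = (int16_t)(src[src_w - 1] * 128);
        else
            dst[i] = (int16_t)((src[xx] << 7) +
                               (src[xx + 1] - src[xx]) * xalpha);
        xpos += x_inc;
    }
}

/* Number of consecutive source bytes touched by n consecutive outputs
 * starting at position xpos, counting the right neighbour of the last one. */
static unsigned source_span(unsigned xpos, unsigned x_inc, unsigned n)
{
    unsigned first = xpos >> 16;
    unsigned last  = ((xpos + (n - 1) * x_inc) >> 16) + 1;

    return last - first + 1;
}
```

With x_inc at or below 0x10000, a group of 16 outputs never touches more than
17 consecutive source bytes, so contiguous loads plus a shuffle can cover it.
Once x_inc exceeds 0x10000 the span grows with the shrink ratio (62 bytes at
4:1 in this sketch), which is where a gather-style load would come in.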
Applying.
- Lauri