[FFmpeg-devel] [PATCH] 8-bit hevc decoding optimization on aarch64 with neon

Carl Eugen Hoyos ceffmpeg at gmail.com
Sat Nov 18 19:50:08 EET 2017


2017-11-18 18:35 GMT+01:00 Rafal Dabrowa <fatwildcat at gmail.com>:

> For performance testing the following command was used:
>
>     time ./ffmpeg -hide_banner -i ~/bbb-1280x720-cfg06.mkv -f yuv4mpegpipe - >/dev/null

An alternative is:
./ffmpeg -benchmark -i ~/bbb-1280x720-cfg06.mkv -f null -

> The video file was pre-read before test to minimize disk reads during testing.
> Program execution time without optimization was as follows:
>
> real    11m48.576s
> user    43m8.111s
> sys     0m12.469s
>
> Execution time with optimizations:
>
> real    6m17.046s
> user    21m19.792s
> sys     0m14.724s

Looks impressive.
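
For reference, the ratios implied by the quoted `time` output can be
checked with a small sketch like the one below. The helper names
to_seconds() and speedup() are made up for illustration and are not part
of FFmpeg or of the patch.

```c
#include <stdio.h>

/* Hypothetical helper: convert a `time`-style "XmY.YYYs" reading,
 * split into minutes and seconds, to plain seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Hypothetical helper: ratio of baseline to optimized time.
 * A value above 1.0 means the optimized run was faster. */
static double speedup(double baseline_s, double optimized_s)
{
    return baseline_s / optimized_s;
}

/* Print the ratios for the figures quoted above. */
static void print_quoted_speedups(void)
{
    printf("real: %.2fx\n",
           speedup(to_seconds(11, 48.576), to_seconds(6, 17.046)));
    printf("user: %.2fx\n",
           speedup(to_seconds(43, 8.111), to_seconds(21, 19.792)));
}
```

With the quoted numbers this works out to roughly 1.88x in wall-clock
time and roughly 2.02x in user CPU time.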


> +av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
> +{
> +    int cpu_flags = av_get_cpu_flags();
> +
> +    if (have_neon(cpu_flags) && bit_depth == 8) {
> +        NEON8_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 1, 0, epel_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv);
> +        NEON8_FNASSIGN(c->put_hevc_epel_uni, 1, 0, epel_uni_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel_uni, 1, 1, epel_uni_hv);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 0, 0, pel_bi_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 0, 1, epel_bi_h);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 0, epel_bi_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 1, epel_bi_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 0, 1, qpel_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 1, 0, qpel_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 0, 1, qpel_uni_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 0, qpel_uni_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 0, 0, pel_bi_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 0, 1, qpel_bi_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 0, qpel_bi_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 1, qpel_bi_hv);
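
The definition of NEON8_FNASSIGN is not part of the hunk quoted above.
As a rough illustration of the pattern only, the sketch below assumes the
macro fills one table slot per block width by token pasting. Everything
named DEMO_* or demo_* is hypothetical, and DemoDSP is a simplified
stand-in, not the real HEVCDSPContext.

```c
#include <stddef.h>

/* Hypothetical, simplified function-pointer type and context. */
typedef int (*demo_mc_fn)(void);

typedef struct DemoDSP {
    /* [width index][vertical filter][horizontal filter] */
    demo_mc_fn put_epel[3][2][2];
} DemoDSP;

/* Stand-ins for per-width assembly routines; each returns an id. */
#define DEMO_FN(name, id) static int demo_##name(void) { return id; }
DEMO_FN(epel_h4, 104) DEMO_FN(epel_h8, 108) DEMO_FN(epel_h16, 116)
DEMO_FN(epel_v4, 204) DEMO_FN(epel_v8, 208) DEMO_FN(epel_v16, 216)

/* Assumed shape of the macro: one assignment per block width, with the
 * width appended to the function name by token pasting. */
#define DEMO_FNASSIGN(member, v, h, fn) do { \
    (member)[0][v][h] = demo_##fn##4;        \
    (member)[1][v][h] = demo_##fn##8;        \
    (member)[2][v][h] = demo_##fn##16;       \
} while (0)

/* Mirrors the guard in the quoted init function: only install the
 * optimized routines when the CPU feature is present and depth is 8. */
static void demo_init(DemoDSP *c, int have_simd, int bit_depth)
{
    if (have_simd && bit_depth == 8) {
        DEMO_FNASSIGN(c->put_epel, 0, 1, epel_h);
        DEMO_FNASSIGN(c->put_epel, 1, 0, epel_v);
    }
}
```

With a table laid out like this, each assignment line can be commented
out or sent on its own, which is what makes testing in smaller portions
practical.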

I wonder if it would have made sense to test and send the patch
in smaller portions, so that the portions with possible improvements
can be identified.

Thank you, Carl Eugen

