[FFmpeg-devel] [PATCH] 8-bit hevc decoding optimization on aarch64 with neon
Carl Eugen Hoyos
ceffmpeg at gmail.com
Sat Nov 18 19:50:08 EET 2017
2017-11-18 18:35 GMT+01:00 Rafal Dabrowa <fatwildcat at gmail.com>:
> For performance testing the following command was used:
>
> time ./ffmpeg -hide_banner -i ~/bbb-1280x720-cfg06.mkv -f yuv4mpegpipe - >/dev/null
An alternative is:
./ffmpeg -benchmark -i ~/bbb-1280x720-cfg06.mkv -f null -
> The video file was pre-read before test to minimize disk reads during testing.
> Program execution time without optimization was as follows:
>
> real 11m48.576s
> user 43m8.111s
> sys 0m12.469s
>
> Execution time with optimizations:
>
> real 6m17.046s
> user 21m19.792s
> sys 0m14.724s
Looks impressive.
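To put a number on it: converting the quoted `time` readings to seconds gives a wall-clock speedup of roughly 1.88x and a user-time speedup of roughly 2.02x. A small sketch of that arithmetic follows; the helper names `to_seconds` and `speedup` are made up for this illustration and are not part of FFmpeg.

```c
#include <assert.h>

/* Convert a `time`-style "XmY.YYYs" reading into seconds.
 * Hypothetical helper, only for this illustration. */
double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Ratio of baseline time to optimized time; a value above 1 means faster. */
double speedup(double baseline, double optimized)
{
    assert(optimized > 0.0);
    return baseline / optimized;
}
```

With the figures quoted above, `speedup(to_seconds(11, 48.576), to_seconds(6, 17.046))` covers the "real" lines and `speedup(to_seconds(43, 8.111), to_seconds(21, 19.792))` the "user" lines.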
> +av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
> +{
> +    int cpu_flags = av_get_cpu_flags();
> +
> +    if (have_neon(cpu_flags) && bit_depth == 8) {
> +        NEON8_FNASSIGN(c->put_hevc_epel, 0, 0, pel_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 1, 0, epel_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv);
> +        NEON8_FNASSIGN(c->put_hevc_epel_uni, 1, 0, epel_uni_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel_uni, 1, 1, epel_uni_hv);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 0, 0, pel_bi_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 0, 1, epel_bi_h);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 0, epel_bi_v);
> +        NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 1, epel_bi_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 0, 0, pel_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 0, 1, qpel_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 1, 0, qpel_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 0, 1, qpel_uni_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 0, qpel_uni_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 0, 0, pel_bi_pixels);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 0, 1, qpel_bi_h);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 0, qpel_bi_v);
> +        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 1, qpel_bi_hv);
I wonder if it would have made sense to test and send the patches
in smaller portions, so that those with possible improvements
can be identified.
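Splitting should be feasible because each assignment in the quoted init function fills an independent slot of a function-pointer table, so any subset can be enabled and benchmarked on its own. The following is a minimal sketch of that pattern, not the patch's actual code: `DemoDSPContext`, `DEMO_FNASSIGN`, `demo_init` and the stand-in kernels are all hypothetical, and the `[size][vertical][horizontal]` index layout is my assumption about how the quoted `0/1` arguments are meant.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SIZES 3   /* assumption: stands in for the real number of block-size slots */

typedef const char *(*demo_fn)(void);

typedef struct DemoDSPContext {
    /* assumed layout: [block size][vertical filter?][horizontal filter?] */
    demo_fn put_epel[DEMO_SIZES][2][2];
} DemoDSPContext;

/* Stand-in kernels: each one only reports its own name. */
#define DEMO_KERNEL(name) \
    static const char *demo_##name(void) { return #name; }

DEMO_KERNEL(pel_pixels_c)
DEMO_KERNEL(epel_h_c)
DEMO_KERNEL(epel_v_c)
DEMO_KERNEL(epel_hv_c)
DEMO_KERNEL(pel_pixels_neon)
DEMO_KERNEL(epel_h_neon)

/* Hypothetical analogue of NEON8_FNASSIGN: fill one [v][h] slot for every size. */
#define DEMO_FNASSIGN(member, v, h, fn)              \
    do {                                             \
        for (int i = 0; i < DEMO_SIZES; i++)         \
            (member)[i][v][h] = demo_##fn;           \
    } while (0)

void demo_init(DemoDSPContext *c, int have_simd, int bit_depth)
{
    assert(c != NULL);

    /* Portable fallbacks are installed first. */
    DEMO_FNASSIGN(c->put_epel, 0, 0, pel_pixels_c);
    DEMO_FNASSIGN(c->put_epel, 0, 1, epel_h_c);
    DEMO_FNASSIGN(c->put_epel, 1, 0, epel_v_c);
    DEMO_FNASSIGN(c->put_epel, 1, 1, epel_hv_c);

    /* Only the slots overridden here change; the rest keep the fallback,
     * which is what allows a series to add kernels a few at a time. */
    if (have_simd && bit_depth == 8) {
        DEMO_FNASSIGN(c->put_epel, 0, 0, pel_pixels_neon);
        DEMO_FNASSIGN(c->put_epel, 0, 1, epel_h_neon);
    }
}
```

In this sketch only the copy and horizontal slots are overridden, so the vertical and 2-D slots still point at the fallbacks; a follow-up patch could add those and be measured separately.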
Thank you, Carl Eugen