[FFmpeg-devel] [Updated PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

Wed Apr 23 15:59:19 CEST 2014

On Wed, 23 Apr 2014 03:26:22 +0100, Michael Niedermayer <michaelni at gmx.at> wrote:
> is it faster to do all the steps intermingled?
> I am asking because the code should be simpler if it just uses
> the optimized start code search and optimized header parsing
> while maintaining the current structure
>
> for example the header parsing could be optimized like below:

OK, I've tried out your patch, and I also tried converting the start code
searches in find_next_marker and vc1_find_frame_end to use the fast
search function. The times (filtered to include only VC-1 functions) look
like this:

      Before        MN version    MN + fast search  BA version
M2TS  250.0 ± 11.2  160.9 ±  7.3  47.6 ± 9.0        27.2 ± 3.4
MKV   149.0 ± 12.8   70.8 ± 11.2  17.6 ± 4.7         1.7 ± 0.8

In other words, yes, there still seems to be a significant speed
improvement from mixing the steps together. I suspect this comes down to
the fact that the buffers that are used with real-world streams tend to
be bigger than even the L2 cache on the ARM11.
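
For readers unfamiliar with the kind of start code search being discussed,
here is a minimal sketch in portable C. It is not the code from either
patch, and find_start_code is a hypothetical name, not an FFmpeg function.
It only illustrates the general idea of skipping ahead: the 00 00 01 prefix
ends in a nonzero byte, so whenever the byte under inspection is nonzero
and does not complete a prefix, the next possible match ends at least
three bytes further on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): return the offset of the
 * first 00 00 01 start code prefix in buf[0..size), or size if the
 * buffer contains none. */
static size_t find_start_code(const uint8_t *buf, size_t size)
{
    size_t i = 2;                      /* i indexes the candidate 0x01 byte */

    while (i < size) {
        if (buf[i] == 0) {
            /* A zero may be the first or second byte of a prefix,
             * so the 0x01 could be the very next byte. */
            i += 1;
        } else if (buf[i] == 1 && buf[i - 1] == 0 && buf[i - 2] == 0) {
            return i - 2;              /* found 00 00 01 */
        } else {
            /* A nonzero byte that does not complete a prefix cannot be
             * part of one, so the next 0x01 of a match is at least
             * three bytes ahead. */
            i += 3;
        }
    }
    return size;
}
```

A caller would typically invoke this repeatedly, restarting at the returned
offset plus three, to walk every start code in a buffer. Doing the header
work at each hit while the surrounding bytes are still in cache, rather
than in a separate pass over the buffer, is the kind of interleaving the
timings above are comparing.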

Ben