[FFmpeg-devel] [PATCH 0/5][TESTERS WANTED] Improve startcode search

Sun Jun 2 01:47:14 EEST 2019

This patch is about improving the startcode search in
libavcodec/startcode.c; this is used by the H.264 and the VC-1 parsers.
(In this context, "startcode" always means the MPEG-1/2/4, H.264/5,
VC-1 startcode 0x00 0x00 0x01, potentially with another leading zero.)
There are currently three things to improve about it:

1. It doesn't really find startcodes, but searches for zeros and lets
the caller weed out the real startcodes from the candidates. This leads
to lots of unnecessary function calls (millions per GB of video parsed)
with the accompanying overhead.
2. It uses a suboptimal pattern for its search; improving it can improve
performance.
3. If HAVE_FAST_UNALIGNED is false and there is no system-dependent
startcode search function available (ARMv6 is currently the only system
with its own search function), it resorts to checking each byte
one-by-one and is therefore very slow.
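
To illustrate points 1 and 2, here is a minimal byte-wise sketch of a
search that only reports real 0x00 0x00 0x01 sequences and skips ahead
three bytes whenever the inspected byte rules out a startcode. This is
not the code from this patchset; the name find_startcode is hypothetical
and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of FFmpeg: returns the offset of the
 * first 0x00 0x00 0x01 sequence in buf, or size if there is none. */
static size_t find_startcode(const uint8_t *buf, size_t size)
{
    size_t i = 2; /* index of the potential 0x01 byte */

    while (i < size) {
        if (buf[i] > 1) {
            /* This byte cannot be part of any startcode. */
            i += 3;
        } else if (buf[i] == 0) {
            /* Possibly a leading zero; the 0x01 may follow soon. */
            i += 1;
        } else if (buf[i - 1] == 0 && buf[i - 2] == 0) {
            /* buf[i] == 1 preceded by two zeros: a real startcode. */
            return i - 2;
        } else {
            /* A 0x01 that is not preceded by two zeros. */
            i += 3;
        }
    }
    return size;
}
```

A caller of such a function never has to filter out runs of zeros that
are not followed by 0x01, which is where the unnecessary calls in point
1 come from.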

I have solved all three of these issues. At first, I wanted to keep
using possibly unaligned reads if HAVE_FAST_UNALIGNED is true, but my
benchmarks showed that aligned reads are superior even then, so the new
implementation uses aligned reads on all platforms regardless of
HAVE_FAST_UNALIGNED. This allowed me to simplify the code a bit. You can
take a look at the older version at [1].

The alignment check is actually simple: Make sure that a pointer, when
cast to uintptr_t, is divisible by 4 or 8, respectively. But given that
the C standard leaves the relationship between pointer and uintptr_t
mostly undefined (the only guarantee is that after casting a pointer to
void to uintptr_t and back to void* the result compares equal to the
original pointer), I'd appreciate it if someone tested this on systems
where unaligned accesses lead to crashes or to abysmal performance. It
would also be nice if someone could complement my x64 benchmarks with
benchmarks for other systems. (Remember: For benchmarks on ARMv6 one
should comment
out lines 114-117 in libavcodec/arm/h264dsp_init_arm.c (and lines 31-34
in libavcodec/arm/vc1dsp_init_arm.c if one wants to test via the VC-1
parser) to disable the platform-specific startcode-search functions.
I am actually curious how my version fares against the hand-written
assembly version.)
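
For testers who want to see what kind of check is meant, here is a
rough sketch of an alignment computation via uintptr_t followed by
aligned four-byte reads that look for a zero byte. It is not the code
from the patchset: all function names are hypothetical, and it assumes
(as the patch does, and as the C standard does not guarantee) that the
low bits of the uintptr_t value reflect the address alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes to skip until p is aligned to
 * "alignment" (a power of two). Assumes the integer value of the
 * pointer reflects its address, which is implementation-defined. */
static size_t bytes_until_aligned(const uint8_t *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)(const void *)p;
    return (size_t)(-addr & (alignment - 1));
}

/* Nonzero if at least one of the four bytes of x is zero. */
static int has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Hypothetical helper: offset of the first zero byte, or size. */
static size_t find_zero_candidate(const uint8_t *buf, size_t size)
{
    size_t i = 0;
    size_t head = bytes_until_aligned(buf, 4);

    if (head > size)
        head = size;
    /* Unaligned head: check byte by byte. */
    for (; i < head; i++)
        if (buf[i] == 0)
            return i;
    /* Aligned body: one four-byte read per iteration. memcpy keeps
     * the sketch free of aliasing problems. */
    for (; i + 4 <= size; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, 4);
        if (has_zero_byte(w))
            break;
    }
    /* Locate the exact byte, or scan the remaining tail. */
    for (; i < size; i++)
        if (buf[i] == 0)
            return i;
    return size;
}
```

On a platform where the uintptr_t assumption does not hold, the head
loop would stop at the wrong place and the word reads would be
misaligned, which is exactly the case that needs testing.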

One should not use a container like Matroska to test this, because
there every block contains a whole frame, so the startcode search isn't
used at all. Use e.g. transport streams instead.
And when benchmarking, one should not benchmark calls to the
startcode_find_candidate function directly (because the current code
returns lots of false positives and this patchset changes this), but
rather the calls to h264_find_frame_end or to h264_parse.

- Andreas

PS: Thanks to Mark for testing the earlier version [1] of this
patchset on an ARM device on which, so he thought, unaligned accesses
can be configured to raise SIGBUS. He encountered no issues with my
patch, but he suspects that the CPU fixes up four-byte unaligned
accesses by itself, whereas eight-byte unaligned accesses trap as
expected. So further testing would be good.

[1]: https://github.com/mkver/FFmpeg/commits/start_3

Andreas Rheinhardt (5):
  startcode: Use common macro
  startcode: Switch to aligned reads
  startcode: Stop overreading
  startcode: Don't return false positives
  startcode: Filter out non-startcodes earlier

 libavcodec/h264dsp.h   |   7 +--
 libavcodec/startcode.c | 128 ++++++++++++++++++++++++++++++++++-------
 libavcodec/vc1dsp.h    |   6 +-
 3 files changed, 112 insertions(+), 29 deletions(-)

-- 
2.21.0