[FFmpeg-devel] Idea about speedup of startcode search
Fri Feb 8 14:35:11 CET 2008
Here is an idea for a tiny optimization that I stumbled over while scanning
for MPEG start codes and want to share.
I was working on a stream-correction tool similar to ProjectX, and profiling on
a weak architecture (VIA C3) showed that the start code scan in the
GOP analysis was the hot spot, eating over 50% of CPU time.
This is typically a simple for() loop iterating over a buffer and
checking for zero bytes. H.264 has the same problem with its awkward "emulation
prevention three byte" handling for NALs, see libavcodec/h264.c line 1393
(SVN of today).
That code searches for 00 00 03 xx patterns; for MPEG-1/2 or H.264 start codes
you usually look for 00 00 01 xx or 00 00 00 01.
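For reference, a minimal sketch of the kind of byte-wise scan meant here, looking for the 00 00 01 prefix. The function name find_startcode_simple is made up for illustration; this is not the code from libavcodec/h264.c or from the attachment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset of the first 00 00 01 prefix
 * in buf[0..len), or len if there is none. */
static size_t find_startcode_simple(const uint8_t *buf, size_t len)
{
    /* One comparison chain per scanned byte; this is the slow baseline. */
    for (size_t i = 0; i + 3 <= len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}
```

The same loop shape works for 00 00 03 by changing the last comparison.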
This can be done in a simple C loop, but gcc does a bad job with it and uses up
to 7 instructions per scanned byte. On a Core 2 Duo, measurement shows 2.8
instructions per byte.
This can be brought down to 0.8 cycles per byte (less than 2
instructions per byte) with the following idea:
All the start codes mentioned above contain two consecutive zero bytes. To
find candidates, load 8 bytes into an MMX register and check the four 2-byte
words for equality with zero, using a packed compare, packing the result to
4x1 bytes, OR-ing and testing. Do this for the 8 bytes at address x and at x+1,
so that pairs at both even and odd offsets are covered, and repeat until
two consecutive zero bytes are found; then fine-check with C code.
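The filtering step can be sketched in portable C. Note that this is a stand-in and not the MMX version described above: instead of a packed compare in an MMX register it uses a 64-bit integer and a bit trick to test whether any of the four 16-bit lanes is zero. The names has_zero_lane16 and find_zero_pair are hypothetical, and this is not the attached test code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one of the four 16-bit lanes of v is zero. */
static int has_zero_lane16(uint64_t v)
{
    return ((v - 0x0001000100010001ULL) & ~v & 0x8000800080008000ULL) != 0;
}

/* Hypothetical helper: returns the offset of the first 00 00 pair in
 * buf[0..len), or len if there is none. */
static size_t find_zero_pair(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    /* Coarse filter: the load at i covers pairs starting at even offsets,
     * the load at i+1 covers the odd ones. Each step reads bytes i..i+8. */
    while (i + 9 <= len) {
        uint64_t a, b;
        memcpy(&a, buf + i, 8);      /* unaligned-safe load at x   */
        memcpy(&b, buf + i + 1, 8);  /* unaligned-safe load at x+1 */
        if (has_zero_lane16(a) || has_zero_lane16(b))
            break;                   /* candidate somewhere in i..i+7 */
        i += 8;
    }

    /* Fine check with plain C, also handles the tail of the buffer. */
    for (; i + 1 < len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i;
    return len;
}
```

A caller would then look at the bytes after the returned offset to decide whether it is 00 00 01, 00 00 03 or 00 00 00 01, and resume scanning behind it otherwise.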
This only pays off for large chunks of data with rather rare start codes, but
that is mostly the case. Every byte of an H.264 stream must be piped
through the "emulation prevention three byte" checker.
The gain may however be too small to bother: at 20 Mbit/s, H.264 gives
about 2.38 MB/s to parse, so this saves ca. 5 million CPU cycles per second,
only a tiny fraction of a 2 GHz CPU. But everything counts...
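The estimate can be checked with a quick calculation. It assumes that the 2.8 figure measured on the Core 2 Duo is read as cycles per byte (the text above calls it instructions per byte) and that the faster scan costs 0.8 cycles per byte:

```latex
\begin{align*}
\frac{20 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/byte}}
  &= 2.5 \times 10^{6}\ \text{byte/s} \approx 2.38\ \text{MiB/s} \\
(2.8 - 0.8)\ \tfrac{\text{cycles}}{\text{byte}} \times 2.5 \times 10^{6}\ \tfrac{\text{byte}}{\text{s}}
  &= 5 \times 10^{6}\ \text{cycles/s} \\
\frac{5 \times 10^{6}\ \text{cycles/s}}{2 \times 10^{9}\ \text{cycles/s}}
  &= 0.25\,\%
\end{align*}
```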
Anyway, it may not be an important idea, but if anyone wants to try it
out, here is some test code that I have written and declare as free to
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4522 bytes
Desc: not available