[FFmpeg-devel] [PATCH] FLAC parser

Michael Chinen mchinen
Thu Oct 21 18:39:45 CEST 2010


On Thu, Oct 21, 2010 at 1:05 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Oct 20, 2010 at 03:38:14PM +0200, Michael Chinen wrote:
>> Hi,
>>
>> On Tue, Oct 19, 2010 at 2:48 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >[...]
>> >> I did profiling again and it turns out I missed one exit point for the
>> >> function the last time. The non-flat wrap buffer version is about
>> >> 2-4% faster overall. I've squashed it into the 0003.
>> >
>> >what is the speed difference between current svn and after this patch ?
>>
>> I used the -benchmark flag for 'ffmpeg -i fourminsong.flac a.wav' and
>> five runs and got
>> without patch: utime = 2.044-2.042s
>> with patch:    utime = 2.363-2.379s
>>
>> So flac demuxing with the parser is slower.
>
> it's not a problem when the parser is needed, like for -acodec copy but when
> it is not needed then a 15% slowdown is a problem. That said it of course
> would be nicer if it was faster than that even when needed
>
>
>
> [...]
>> >
>> >
>> > [...]
>> >> +static int find_headers_search(FLACParseContext *fpc, uint8_t *buf, int buf_size,
>> >> +                               int search_start)
>> >> +
>> >> +{
>> >> +    FLACFrameInfo fi;
>> >> +    int size = 0, i;
>> >> +    uint8_t *header_buf;
>> >> +
>> >> +    for (i = 0; i < buf_size - 1; i++) {
>> >> +        if ((AV_RB16(buf + i) & 0xFFFE) == 0xFFF8) {
>> >
>> > something based on testing several positions at once is likely faster
>> > like
>> > x= AV_RB32()
>> > (x & ~(x+0x01010101))&0x80808080
>> > that will detect 0xFF bytes and only after that testing the 4 positions for
>> > FFF8
>>
>> Hmm. Since in both cases (header there/header not there) this will
>> require more masks on a 2 byte int how will it be faster?
>> Also since it is 15 bits that we are looking for is the 32 bit
>> handling a mistake?
>
> the code is executed 4 times less often than your 2 byte masking
> see ff_avc_find_startcode_internal() for something quite similar

Thanks, I see the light now. At first I didn't realize you meant
processing in 4-byte chunks.
The header search is about 2-3x faster with the multiple-byte processing:
fastest without multibyte processing:
357748 dezicycles in with more pos testing, 16337 runs, 47 skips
slowest with:
127551 dezicycles in with more pos testing, 15364 runs, 1020 skips

(of course there are more skips so it is harder to profile)
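
For reference, the 4-bytes-at-a-time scan can be sketched roughly as
below. This is only an illustration of the idea, not the code in the
attached 0003 patch: read_be32, is_flac_sync and find_flac_sync are
made-up names (read_be32 stands in for AV_RB32), and the loop bounds
are my own assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit value (stand-in for AV_RB32). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Does the 15-bit FLAC sync code (0xFFF8 or 0xFFF9) start at p?
 * Reads p[0] and p[1]. */
static int is_flac_sync(const uint8_t *p)
{
    return p[0] == 0xFF && (p[1] & 0xFE) == 0xF8;
}

/* Return the offset of the first sync code at or after `start`,
 * or -1 if there is none. Hypothetical helper for illustration. */
static int find_flac_sync(const uint8_t *buf, int buf_size, int start)
{
    int i = start, j;

    assert(buf_size >= 0 && start >= 0);

    /* Fast path: one mask test per 4 bytes. buf[i + 4] must be
     * readable because a sync code at i + 3 extends into it. */
    while (i + 4 < buf_size) {
        uint32_t x = read_be32(buf + i);
        /* Nonzero whenever any of the 4 bytes is 0xFF. */
        if ((x & ~(x + 0x01010101U)) & 0x80808080U) {
            for (j = 0; j < 4; j++)
                if (is_flac_sync(buf + i + j))
                    return i + j;
        }
        i += 4;
    }

    /* Tail: check the remaining positions one at a time. */
    for (; i + 1 < buf_size; i++)
        if (is_flac_sync(buf + i))
            return i;

    return -1;
}
```

As far as I can tell the mask test can flag a word that has no 0xFF
byte (a carry out of a lower 0xFF byte can trip it for 0xFE), but it
never misses one, so it only works as a prefilter and the 4 positions
still get checked for FFF8 afterwards.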

The -benchmark utime dropped down to a range of
utime = 2.232-2.236
vs the prepatch:
utime = 2.049-2.058

so now it is a slowdown of about 9% (roughly 2.23s vs 2.05s).

I added this to the patch and reran the regression tests.
Let me know if something needs to be done for the changelog as well.

Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-move-decode_frame_header-from-flacdec.c-to-flac.c-h.patch
Type: application/octet-stream
Size: 8119 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101021/29def2fc/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-error-codes-for-FLAC-header-parsing-and-move-log.patch
Type: application/octet-stream
Size: 7369 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101021/29def2fc/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Add-FLAC-Parser.patch
Type: application/octet-stream
Size: 37467 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101021/29def2fc/attachment-0002.obj>


