[FFmpeg-devel] Idea about speedup of startcode search

Thorsten Jordan tjordan
Fri Feb 8 15:51:31 CET 2008


Michael Niedermayer wrote:
> On Fri, Feb 08, 2008 at 02:35:11PM +0100, Thorsten Jordan wrote:
>> Hello,
>>
> If gcc compiles it to 7 instructions per scanned byte that is a bug in gcc
> which should be reported!
> As it can easily do it with 3 instructions (or less if unrolled further),
> that is:
> 
> xor %%eax, %%eax
> 1:
> cmpb %%al,  (%%ebx, %%ecx)
>  jz blah
> cmpb %%al, 2(%%ebx, %%ecx)
>  jz blah2
> add $2, %%ebx
>  jnc 1b
it could, but fails to do so...
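
For illustration, the quoted loop appears to probe every second byte for zero, which is enough because a start code contains two adjacent zero bytes. A rough C sketch of that idea follows. It assumes the target is the 3-byte start code 00 00 01; the function name find_startcode_stride2 is made up for this example and is not FFmpeg code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from FFmpeg.
 * Returns the offset of the first 00 00 01 sequence in buf[0..len),
 * or -1 if there is none.
 *
 * A start code has zero bytes at offsets p and p+1, and one of those
 * two offsets is odd, so probing only the odd offsets cannot miss it.
 */
static ptrdiff_t find_startcode_stride2(const uint8_t *buf, size_t len)
{
    size_t i;

    if (len < 3)
        return -1;

    for (i = 1; i + 1 < len; i += 2) {
        if (buf[i] != 0)
            continue;               /* common case: one compare per 2 bytes */

        /* buf[i] is the second zero of the pair */
        if (buf[i - 1] == 0 && buf[i + 1] == 1)
            return (ptrdiff_t)(i - 1);

        /* buf[i] is the first zero of the pair */
        if (i + 2 < len && buf[i + 1] == 0 && buf[i + 2] == 1)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Whether a compiler turns this into the tight compare-and-branch sequence quoted above is a separate question and would need checking in the generated assembly.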

>> 			     "packsswb %%mm0, %%mm0	\n\t"
>> 			     "packsswb %%mm1, %%mm1	\n\t"
>> 			     "por %%mm1, %%mm0		\n\t"
> 
> movq (%0), %%mm0
> por  1(%0), %%mm0
> pcmpeqb %%mm2, %%mm0
> packsswb %%mm0, %%mm0
wow, that would be even fewer cycles per scanned byte... good idea
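
The trick in the quoted sequence is to OR the block with itself shifted by one byte, so that a zero byte in the result marks two adjacent zero bytes in the input. A portable C sketch of the same idea is below. It swaps the MMX registers for plain 64-bit integers and the pcmpeqb/packsswb test for the usual SWAR zero-byte check, so it is an analogue and not the code from this thread. The names has_zero_byte and find_zero_pair are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one byte of v is zero (SWAR test). */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/*
 * Hypothetical helper, not taken from FFmpeg.
 * Returns the offset of the first pair of adjacent zero bytes in
 * buf[0..len), or -1 if there is none. A caller looking for a start
 * code would then check the byte(s) following the pair.
 */
static ptrdiff_t find_zero_pair(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    /* Each step reads buf[i] .. buf[i + 8], i.e. 9 bytes. */
    while (i + 9 <= len) {
        uint64_t a, b;

        memcpy(&a, buf + i, 8);       /* bytes i   .. i+7 */
        memcpy(&b, buf + i + 1, 8);   /* bytes i+1 .. i+8 */

        /* A zero byte in (a | b) means buf[k] == buf[k+1] == 0 for some k. */
        if (has_zero_byte(a | b))
            break;
        i += 8;
    }

    /* Byte-wise pass: pins down the exact offset and covers the tail. */
    for (; i + 1 < len; i++)
        if ((buf[i] | buf[i + 1]) == 0)
            return (ptrdiff_t)i;

    return -1;
}
```

memcpy is used for the unaligned loads to stay within standard C. How this compares in speed to the MMX version has not been measured here.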

Hmm, I guess that with ff_avc_find_startcode already available, my idea
isn't needed any more. No problem for me though.

Thanks for the suggestions, to Loren too!

-- 
Regards, Thorsten



