[FFmpeg-devel] Idea about speedup of startcode search
Fri Feb 8 15:51:31 CET 2008
Michael Niedermayer wrote:
> On Fri, Feb 08, 2008 at 02:35:11PM +0100, Thorsten Jordan wrote:
> If gcc compiles it to 7 instructions per scanned byte, that is a bug in gcc
> which should be reported!
> It can easily do it with 3 instructions (or fewer if unrolled further),
> that is:
> xor %%eax, %%eax
> cmpb %%al, (%%ebx, %%ecx)
> jz blah
> cmpb %%al, 2(%%ebx, %%ecx)
> jz blah2
> add $2, %%ebx
> jnc 1
It could, but it fails to do so...
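The loop Michael sketches only tests every second byte. This works because a 00 00 01 start code at position p needs buf[p] == 0 and buf[p+1] == 0, and one of those two indices is always odd; so if an odd-indexed byte is nonzero, no start code can begin at that byte or the one before it. Below is a minimal portable C sketch of that idea, assuming a plain byte buffer. The function name `find_startcode_skip2` is hypothetical; it is not FFmpeg's `ff_avc_find_startcode`, and it is not a transcription of the assembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the offset of the first 00 00 01 start code
 * in buf[0..size), or -1 if there is none. Only odd-indexed bytes are tested
 * in the fast path; a nonzero odd byte rules out start codes at i-1 and i. */
static ptrdiff_t find_startcode_skip2(const uint8_t *buf, size_t size)
{
    for (size_t i = 1; i < size; i += 2) {
        if (buf[i] != 0)
            continue;               /* common case: one compare, skip 2 bytes */
        /* Candidate p = i-1 (buf[p+1] == buf[i] == 0 is already known). */
        if (i + 1 < size && buf[i - 1] == 0 && buf[i + 1] == 1)
            return (ptrdiff_t)(i - 1);
        /* Candidate p = i. */
        if (i + 2 < size && buf[i + 1] == 0 && buf[i + 2] == 1)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Candidates are checked in increasing order of p, so the first start code in the buffer is the one returned.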
>> "packsswb %%mm0, %%mm0 \n\t"
>> "packsswb %%mm1, %%mm1 \n\t"
>> "por %%mm1, %%mm0 \n\t"
> movq (%0), %%mm0
> por 1(%0), %%mm0
> pcmpeqb %%mm2, %%mm0
> packsswb %%mm0, %%mm0
Wow, that would be even fewer cycles per scanned byte... good idea.
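Michael's MMX version ORs the 8 bytes at offset 0 with the 8 bytes at offset 1, so a result lane is zero exactly when buf[k] and buf[k+1] are both zero, and then compares all lanes against zero at once (the snippet does not show it, but %%mm2 presumably holds zero). The sketch below is a portable C stand-in for that trick, not MMX: it swaps `pcmpeqb` for the classic "has a zero byte" bit test on a `uint64_t`, and uses that only as a filter before a byte-wise check. The name `find_startcode_swar` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of v is zero (classic SWAR test). */
static int has_zero_byte(uint64_t v)
{
    return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Hypothetical helper: offset of the first 00 00 01 in buf[0..size), or -1. */
static ptrdiff_t find_startcode_swar(const uint8_t *buf, size_t size)
{
    size_t i = 0;
    uint64_t a, b;

    /* Each step reads buf[i .. i+8], so it needs i + 9 <= size. */
    while (i + 9 <= size) {
        memcpy(&a, buf + i, 8);         /* like movq (%0), %%mm0       */
        memcpy(&b, buf + i + 1, 8);     /* operand of por 1(%0), %%mm0 */
        if (has_zero_byte(a | b)) {     /* like pcmpeqb against zero   */
            /* Some lane k has buf[i+k] == buf[i+k+1] == 0: check all 8. */
            for (size_t k = 0; k < 8; k++) {
                size_t p = i + k;
                if (p + 2 < size && buf[p] == 0 && buf[p + 1] == 0 &&
                    buf[p + 2] == 1)
                    return (ptrdiff_t)p;
            }
        }
        i += 8;
    }
    /* Scalar tail for the last few bytes. */
    for (; i + 2 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return (ptrdiff_t)i;
    return -1;
}
```

Using `memcpy` for the loads avoids unaligned-access and aliasing problems, and because the bit test is only used as a yes/no filter, the result does not depend on byte order.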
Hmm, I guess that, given ff_avc_find_startcode, my idea isn't needed
any more. No problem for me, though.
Thanks for the suggestions, and to Loren too!