[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization
Laurent Desnogues
laurent.desnogues
Wed Jul 2 12:54:31 CEST 2008
On Wed, Jul 2, 2008 at 12:37 PM, Måns Rullgård <mans at mansr.com> wrote:
>>> 0000001c <f2>:
>>> 1c: e92d4010 stmdb sp!, {r4, lr}
>>> 20: e2504001 subs r4, r0, #1 ; 0x1
>>> 24: 38bd8010 ldmccia sp!, {r4, pc}
>>> 28: e2444001 sub r4, r4, #1 ; 0x1
>>> 2c: ebfffffe bl 0 <q>
>>> 30: e3740001 cmn r4, #1 ; 0x1
>>> 34: 1afffffb bne 28 <q+0x28>
>>> 38: e8bd8010 ldmia sp!, {r4, pc}
>>>
>>> I'm curious, what is the output of your compiler?
>>
>> CSL 2007q3 and 2008q1 both generate this:
>>
>> 00000000 <f2>:
>> 0: e92d4070 push {r4, r5, r6, lr}
>> 4: e2505000 subs r5, r0, #0 ; 0x0
>> 8: 08bd8070 popeq {r4, r5, r6, pc}
>> c: e3a04000 mov r4, #0 ; 0x0
>> 10: e2844001 add r4, r4, #1 ; 0x1
>> 14: ebfffffe bl 0 <q>
>> 18: e1540005 cmp r4, r5
>> 1c: 1afffffb bne 10 <f2+0x10>
>> 20: e8bd8070 pop {r4, r5, r6, pc}
>>
>> 00000024 <f1>:
>> 24: e3500001 cmp r0, #1 ; 0x1
>> 28: e92d4070 push {r4, r5, r6, lr}
>> 2c: e1a05000 mov r5, r0
>> 30: 48bd8070 popmi {r4, r5, r6, pc}
>> 34: e3a04000 mov r4, #0 ; 0x0
>> 38: e2844001 add r4, r4, #1 ; 0x1
>> 3c: ebfffffe bl 0 <q>
>> 40: e1540005 cmp r4, r5
>> 44: 1afffffb bne 38 <q+0x38>
>> 48: e8bd8070 pop {r4, r5, r6, pc}
>
> That's exactly what I got too. It's curious that it saves r6, even
> though it is never used. Perhaps it does this to keep the stack
> 8-byte aligned.
Certainly, since it's an EABI requirement. It's probably
faster to do it this way than to correct the SP with an
explicit sub instruction.
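
To make the arithmetic concrete, here is a minimal sketch in C. The
helper names are made up for illustration, and it assumes 4-byte
registers and an SP that was already 8-byte aligned on entry. Pushing
three registers (r4, r5, lr) would move SP by 12 bytes, while adding
r6 as padding makes it 16:

```c
#include <assert.h>

/* Hypothetical helper: bytes written by a push of `nregs` 32-bit registers. */
unsigned push_bytes(unsigned nregs)
{
    return nregs * 4u;
}

/* Hypothetical helper: nonzero if SP stays 8-byte aligned after the push,
 * assuming it was 8-byte aligned before. */
int keeps_sp_aligned(unsigned nregs)
{
    return push_bytes(nregs) % 8u == 0;
}

/* Self-check for the two cases discussed above. */
void check_push_examples(void)
{
    assert(!keeps_sp_aligned(3)); /* push {r4, r5, lr}: 12 bytes     */
    assert(keeps_sp_aligned(4));  /* push {r4, r5, r6, lr}: 16 bytes */
}
```

So a single push of four registers gives the alignment for free,
where a three-register push would need a separate SP adjustment.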
It should also be noted that conditionally executing
ld/st instructions is not a very good idea in general:
if your processor speculates heavily, you might
stall the pipeline at that point or at the next ld/st.
In this case it should be OK, since there is no
other ld/st close after the popmi, and the popmi has
been scheduled a few instructions after the cmp.
> Also curious is why r4 and r5 are used, rather than
> the callee-saved r1 and r2. What a waste of 4 bytes stack space.
That's what I thought too. I am wondering if it's a property
of gcc 4.2 itself or of the ARM back-end.
Laurent