[FFmpeg-devel] [PATCH] ARM: remove useless stack push/pop
Måns Rullgård
mans at mansr.com
Wed Jun 9 01:43:54 CEST 2010
Rafaël Carré <rafael.carre at gmail.com> writes:
> Hi,
>
> r12 doesn't need to be saved in called functions because it's a scratch
> register.
>
> While I'm here, did anyone try to build FFmpeg with -mthumb yet?
Yes, gcc generated invalid asm. For that reason, and others, we force
-marm. There is no gain from using thumb with ffmpeg.
> "grep -Er '(pop|ldm).*pc' libavcodec/arm" shows that there are a lot of
> functions which can't be called from Thumb on ARMv4T: using ldm ...,pc
> will not perform the switch from ARM to Thumb on these CPUs.
So use interworking if you need to. Any decent linker supports that.
> If you want to support both thumb code and armv4t this needs changing
> to use 1 more instruction (without speed cost on anything but arm7tdmi
> where it would take 1 more cycle to return).
Thumb doesn't work anyway, so there's no point.
See also a blog post I did some time ago on the topic. Perhaps I
should revisit that.
BTW, many, if not most, Cortex-A8 chips in the field have hardware
bugs rendering any mixing of Thumb and ARM code unreliable. Older
cores work, but most of those are pre-Thumb2 and the speed penalty
there is too great for FFmpeg.
> diff --git a/libavcodec/arm/jrevdct_arm.S b/libavcodec/arm/jrevdct_arm.S
> index 4fcf351..4ce37d0 100644
> --- a/libavcodec/arm/jrevdct_arm.S
> +++ b/libavcodec/arm/jrevdct_arm.S
> @@ -58,7 +58,7 @@
> .align
>
> function ff_j_rev_dct_arm, export=1
> - stmdb sp!, { r4 - r12, lr } @ all callee saved regs
> + stmdb sp!, { r4 - r11, lr } @ all callee saved regs
>
> sub sp, sp, #4 @ reserve some space on the stack
> str r0, [ sp ] @ save the DCT pointer to the stack
> @@ -369,7 +369,7 @@ empty_odd_column:
> the_end:
> @ The end....
> add sp, sp, #4
> - ldmia sp!, { r4 - r12, pc } @ restore callee saved regs and return
> + ldmia sp!, { r4 - r11, pc } @ restore callee saved regs and return
Does this function call any other functions? If so, the stack must
maintain 8-byte alignment, and this is the easiest way to accomplish
that. Not that you'd want to use that DCT implementation anyway.
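
The alignment arithmetic can be checked with a small sketch. It is only
an illustration: the function names are hypothetical, registers are
assumed to be 4 bytes wide, sp is assumed to be 8-byte aligned on entry,
and only the push and the 4-byte reservation visible in the hunk above
are counted. It says nothing about whether the function actually calls
anything.

```c
/* Bytes taken from the stack: `regs` pushed 32-bit registers plus
   `extra` bytes reserved with "sub sp, sp, #extra". */
unsigned frame_bytes(unsigned regs, unsigned extra)
{
    return regs * 4u + extra;
}

/* Non-zero if sp is still 8-byte aligned after moving by `bytes`,
   assuming it was 8-byte aligned on function entry. */
int sp_aligned8(unsigned bytes)
{
    return bytes % 8u == 0;
}
```

{ r4 - r12, lr } is 10 registers and { r4 - r11, lr } is 9, so
frame_bytes(10, 4) is 44 and frame_bytes(9, 4) is 40. Under the
assumptions above, only the second is a multiple of 8.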
--
Måns Rullgård
mans at mansr.com