[FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack.
Reimar Döffinger
Reimar.Doeffinger at gmx.de
Sun Oct 9 19:36:24 EEST 2022
> On 9 Oct 2022, at 16:11, Rémi Denis-Courmont <remi at remlab.net> wrote:
>
> Le sunnuntaina 9. lokakuuta 2022, 16.14.47 EEST Reimar Döffinger a écrit :
>> Use load/store instructions that modify sp to save
>> registers to stack, like it is done for all other
>> functions.
>> At least valgrind complains about the current code.
>> ---
>> libavutil/aarch64/tx_float_neon.S | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/libavutil/aarch64/tx_float_neon.S
>> b/libavutil/aarch64/tx_float_neon.S index 4126c3b812..4be93cc963 100644
>> --- a/libavutil/aarch64/tx_float_neon.S
>> +++ b/libavutil/aarch64/tx_float_neon.S
>> @@ -866,10 +866,10 @@ FFT16_FN ns_float, 1
>>
>> .macro FFT32_FN name, no_perm
>> function ff_tx_fft32_\name\()_neon, export=1
>> - stp d8, d9, [sp, #-16]
>> - stp d10, d11, [sp, #-32]
>> - stp d12, d13, [sp, #-48]
>> - stp d14, d15, [sp, #-64]
>> + stp d8, d9, [sp, #-16]!
>> + stp d10, d11, [sp, #-16]!
>> + stp d12, d13, [sp, #-16]!
>> + stp d14, d15, [sp, #-16]!
>
> While this fixes the ABI violation, it introduces multiple data dependencies on
> stack pointer due to write-back.
That is true in principle, this is not done consistently at all.
> The idiomatic way to do this is to allocate the entire needed stack space in
> the first store / last load, and use positive offsets elsewhence.
Are you sure this is really relevant at all, considering it's so rarely done in the code?
I'm fine looking into doing that consistently if that's what it takes, but I don't think it's good to make this the only function in the file (if not the whole lib) that does it this way.
If we want the idiomatic way to be used, we should
update all the code so people get the right idea.
(I'll note that some places might be tricky, there seems to be code that updates sp back and forth in a loop which looks like it might be for no reason, or there is a weird reason - check SR_TRANSFORM_DEF in the same file).
Just to clarify the details, then end result should look like this?
- stp d8, d9, [sp, #-16]
- stp d10, d11, [sp, #-32]
- stp d12, d13, [sp, #-48]
- stp d14, d15, [sp, #-64]
+ stp d14, d15, [sp, #-16*4]!
+ stp d8, d9, [sp, #16*3]
+ stp d10, d11, [sp, #16*2]
+ stp d12, d13, [sp, #16]
Or something slightly different?
Best regards,
Reimar
More information about the ffmpeg-devel
mailing list