[FFmpeg-devel] Armada 370 problem causes ffmpeg segmentation fault
Simon Nash
simon at cjnash.com
Tue Jan 8 23:32:30 EET 2019
Hello,
This is my first message to ffmpeg-devel, so please excuse me if
anything here doesn't conform to the normal mailing list conventions.
I have encountered a problem with ffmpeg (a segmentation fault) that
occurs only when running ffmpeg on the Marvell Armada 370 processor.
I used gdb to debug the cause of the problem and found that it was
caused by a malfunction in the floating-point unit of the Armada 370.
This is not specific to a single machine but appears to affect all
Armada 370 chips.
The problem occurs in the following section of code in
libavfilter/af_afir.c (lines 441 to 447):
for (ch = 0; ch < ctx->inputs[1]->channels; ch++) {
float *time = (float *)s->in[1]->extended_data[!s->one2many * ch];
for (i = 0; i < s->nb_taps; i++)
power += time[i] * time[i];
}
s->gain = sqrtf(ch / power);
The generated assembler code for the inner loop includes the following:
0x0018a8d4 <activate+1660>: add.w r0, r6, #100 ; 0x64
0x0018a8d8 <activate+1664>: movs r3, #0
0x0018a8da <activate+1666>: vldr s15, [r0, #-100] ; 0xffffff9c
0x0018a8de <activate+1670>: adds r3, #8
0x0018a8e0 <activate+1672>: vldr s8, [r0, #-96] ; 0xffffffa0
0x0018a8e4 <activate+1676>: cmp r3, lr
0x0018a8e6 <activate+1678>: vldr s9, [r0, #-92] ; 0xffffffa4
0x0018a8ea <activate+1682>: pld [r0]
0x0018a8ee <activate+1686>: vldr s10, [r0, #-88] ; 0xffffffa8
0x0018a8f2 <activate+1690>: vmla.f32 s12, s15, s15
0x0018a8f6 <activate+1694>: vldr s11, [r0, #-84] ; 0xffffffac
0x0018a8fa <activate+1698>: vldr s13, [r0, #-80] ; 0xffffffb0
0x0018a8fe <activate+1702>: vldr s14, [r0, #-76] ; 0xffffffb4
0x0018a902 <activate+1706>: vldr s15, [r0, #-72] ; 0xffffffb8
0x0018a906 <activate+1710>: add.w r0, r0, #32
When the 32-bit floating-point multiply instruction
0x0018a8f2 <activate+1690>: vmla.f32 s12, s15, s15
at activate+1690 is executed, there is a segmentation fault. This
should never happen because the vmla.f32 instruction uses register
values only. This problem happens when the value of the operand in
s15 is small enough to cause an underflow when multiplied by itself
(7.04930082e-28 in this case). I have a console log showing the
state of the registers before and after single-stepping through the
vmla.f32 instruction and I would be pleased to make this available
in any convenient form.
In other cases (when ffmpeg was compiled with a lower optimization
level for debugging purposes), the vmla.f32 instruction completes
but corrupts the unrelated register r7, causing a segmentation fault
on a subsequent instruction. It seems that vmla.f32 produces
unpredictable results on the Armada 370 when underflow occurs.
I have investigated possible workarounds. The problem does not occur
if the 32-bit float value is converted to a 64-bit value and
multiplied using a vmla.f64 instruction producing a 64-bit result.
This is because there is no underflow from ther multiplication.
The following patch (diff file from git) does this. Changing to
64-bit multiplication would (I believe) avoid the possibility of
underflow for any 32-bit coefficient value, which would mean that
the code works correctly on the Armada 370 as well as all other
processors.
index 31919f6..3e69107 100644
--- a/libavfilter/af_afir.c
+++ b/libavfilter/af_afir.c
@@ -375,6 +375,7 @@ static int convert_coeffs(AVFilterContext *ctx)
int left, offset = 0, part_size, max_part_size;
int ret, i, ch, n;
float power = 0;
+ double power_d = 0;
s->nb_taps = ff_inlink_queued_samples(ctx->inputs[1]);
if (s->nb_taps <= 0)
@@ -441,9 +442,12 @@ static int convert_coeffs(AVFilterContext *ctx)
for (ch = 0; ch < ctx->inputs[1]->channels; ch++) {
float *time = (float *)s->in[1]->extended_data[!s->one2many * ch];
- for (i = 0; i < s->nb_taps; i++)
- power += time[i] * time[i];
+ for (i = 0; i < s->nb_taps; i++) {
+ double coeff_d = (double)time[i];
+ power_d += coeff_d * coeff_d; // ensure no underflow
+ }
}
+ power = (float)power_d;
s->gain = sqrtf(ch / power);
break;
default:
This produces the following assembler code, with vmla.f32 replaced with
vmla.f64:
0x0018a8d4 <activate+1660>: add.w r0, r6, #100 ; 0x64
0x0018a8d8 <activate+1664>: movs r3, #0
0x0018a8da <activate+1666>: vldr s12, [r0, #-100] ; 0xffffff9c
0x0018a8de <activate+1670>: adds r3, #8
0x0018a8e0 <activate+1672>: vldr s0, [r0, #-96] ; 0xffffffa0
0x0018a8e4 <activate+1676>: cmp r3, lr
0x0018a8e6 <activate+1678>: vldr s2, [r0, #-92] ; 0xffffffa4
0x0018a8ea <activate+1682>: pld [r0]
0x0018a8ee <activate+1686>: vldr s4, [r0, #-88] ; 0xffffffa8
0x0018a8f2 <activate+1690>: vcvt.f64.f32 d6, s12
0x0018a8f6 <activate+1694>: vcvt.f64.f32 d0, s0
0x0018a8fa <activate+1698>: vcvt.f64.f32 d1, s2
0x0018a8fe <activate+1702>: vcvt.f64.f32 d2, s4
0x0018a902 <activate+1706>: vmla.f64 d7, d6, d6
0x0018a906 <activate+1710>: vldr s6, [r0, #-84] ; 0xffffffac
0x0018a90a <activate+1714>: vldr s8, [r0, #-80] ; 0xffffffb0
0x0018a90e <activate+1718>: vldr s10, [r0, #-76] ; 0xffffffb4
0x0018a912 <activate+1722>: vldr s12, [r0, #-72] ; 0xffffffb8
0x0018a916 <activate+1726>: add.w r0, r0, #32
I have tested this and it works on the Armada 370 as well as on
other ARMv7 platforms.
The change to using 64-bit multiplication is unlikely to be a
performance issue because this code runs only during activation.
However, if there are concerns about using 64-bit multiplication
on processors that don't have the bug, I can modify the patch to
check whether the processor is the Armada 370 and use the current
32-bit multiplication code when running on other processors.
I can provide a test case if required. This will reproduce the bug
reliably if ffmpeg is running on an Armada 370 machine.
I realize that this is not a bug in FFmpeg but I would like to ask
the FFmpeg developers to apply this patch (or equivalent) so that
ffmpeg is able to run correctly on the Armada 370. My application
(MinimServer) is using ffmpeg for various audio transformations
including applying FIR filters and has users with devices that use
the Armada 370 as well as users on many other platforms.
Please advise me what I should do now to help make this change
happen. Many thanks.
Best regards,
Simon
More information about the ffmpeg-devel
mailing list