[FFmpeg-devel] [PATCH] avutil/md5: fix unaligned loads

James Almer jamrial at gmail.com
Wed Feb 24 17:51:05 CET 2016


On 2/24/2016 1:15 PM, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Feb 24, 2016 at 10:47 AM, James Almer <jamrial at gmail.com> wrote:
> 
>> On 2/24/2016 12:13 PM, Ronald S. Bultje wrote:
>>> Hi,
>>>
>>> On Tue, Feb 23, 2016 at 8:40 PM, James Almer <jamrial at gmail.com> wrote:
>>>
>>>> Tested on x86 and benched with no apparent speed loss
>>>
>>>
>>> That's because x86 supports unaligned loads.
>>>
>>> How come you get unaligned loads? Shouldn't this prevent it?
>>>
>>> -    if (HAVE_BIGENDIAN || (!HAVE_FAST_UNALIGNED && ((intptr_t)src & 3))) {
>>> +    if (!HAVE_FAST_UNALIGNED && ((intptr_t)src & 3)) {
>>>         while (src < end) {
>>>             memcpy(ctx->block, src, 64);
>>>             body(ctx->ABCD, (uint32_t *) ctx->block, 1);
>>>
>>> Ronald
>>
>> That code is never compiled/executed on x86 because HAVE_FAST_UNALIGNED is 1
>> there.
> 
> 
> So then I don't understand what ubsan is complaining about? Maybe
> HAVE_FAST_UNALIGNED should be disabled when running under ubsan?
> 
> Ronald

That codepath is slower because it memcpys each 64-byte block into an aligned
temp buffer before processing it. Doing one memcpy plus one body() call (with a
block count of 1) for every block is probably slow, so forcing that path is not
a good idea.
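For reference, a minimal sketch of the "aligned temp buffer" pattern being discussed. `consume_words()` is a hypothetical stand-in for md5.c's body() (it only sums the sixteen words of each block, it is not MD5), and `process_via_temp()` is likewise a made-up name. Every block is copied into an aligned uint32_t array, so all word loads are aligned, at the cost of one memcpy and one call per block:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for md5.c's body(): sums the 16 native-endian
 * words of each 64-byte block so different paths can be compared. */
static uint32_t consume_words(const uint32_t *w, int nblocks)
{
    uint32_t acc = 0;
    for (int b = 0; b < nblocks; b++)
        for (int i = 0; i < 16; i++)
            acc += w[b * 16 + i];
    return acc;
}

/* Bounce-buffer path: copy every 64-byte block into an aligned
 * temporary and process it with a block count of 1. Safe for any src
 * alignment, but pays one memcpy and one call per block. */
static uint32_t process_via_temp(const uint8_t *src, size_t nblocks)
{
    uint32_t block[16]; /* uint32_t array, so suitably aligned */
    uint32_t acc = 0;
    for (size_t b = 0; b < nblocks; b++) {
        memcpy(block, src + 64 * b, 64);
        acc += consume_words(block, 1);
    }
    return acc;
}
```

The direct path would instead hand src to the block function once with the full block count, which avoids the copies but is only well-defined if src is suitably aligned.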

UBSan is complaining about the unaligned loads done by body() when src is
passed to it directly instead of the temp buffer from the AVMD5 struct.
See the codepath right below the one quoted above.
The errors in question are variations of "src/libavutil/md5.c:128:21391:
runtime error: load of misaligned address 0x000006b30b26 for type
'uint32_t', which requires 4 byte alignment".
Using t = AV_RL32(X) instead of t = X[0], where X is of type 'uint32_t *',
solves this, since AV_RL32 and the AV_RN macros are meant for unaligned loads.
Are these bogus errors? I don't really know.
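To illustrate what UBSan is flagging, a minimal sketch (not FFmpeg's actual macro definitions). In C, dereferencing a `const uint32_t *` that points at a misaligned address is undefined behaviour even on CPUs like x86 that tolerate such loads in hardware. Copying the four bytes with memcpy() is always defined, and compilers typically emit a single load for it on x86. `load_u32_unaligned()` is a hypothetical helper in the spirit of AV_RN32, returning the value in host byte order:

```c
#include <stdint.h>
#include <string.h>

/* The flagged pattern is roughly:
 *     uint32_t t = *(const uint32_t *)p;   // UB if p is not 4-byte aligned
 * The helper below performs the same native-endian load without any
 * alignment requirement, because memcpy works on bytes. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```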

By using AV_RL32 I also removed the need for big-endian systems to use
the temp buffer codepath, since AV_RL32 does the bswap as required.
This may result in a slight speed boost on those systems, so it would be nice
if someone with a PPC machine could test it.
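The endianness point can be sketched the same way. MD5 interprets its input as little-endian 32-bit words (RFC 1321), and AV_RL32 reads a little-endian value regardless of host byte order. A portable way to express those semantics, assuming a hypothetical `read_le32()` (a sketch, not how FFmpeg implements AV_RL32), is to assemble the bytes explicitly:

```c
#include <stdint.h>

/* Hypothetical helper with AV_RL32-like semantics: interpret four bytes
 * as a little-endian 32-bit value. The bytes are combined with shifts,
 * so the result is the same on little- and big-endian hosts and p needs
 * no particular alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

For example, the bytes 78 56 34 12 yield 0x12345678 on both x86 and PPC, which is why a big-endian host no longer needs the separate bswap-through-temp-buffer path when the loads go through such a macro.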

