[FFmpeg-devel] [PATCH] use AV_RB16 in cabac refill

Sat Mar 27 01:38:03 CET 2010

2010/3/25 Måns Rullgård <mans at mansr.com>:
> Alexander Strange <astrange at ithinksw.com> writes:
>
>> On Mar 25, 2010, at 4:08 AM, David Conrad wrote:
>>
>>> On Mar 25, 2010, at 3:30 AM, Alexander Strange wrote:
>>>
>>>> Measured decode_cabac_residual as 1 cycle faster on x86-64. Didn't try anywhere else, but I'd be a little interested in what ARM does.
>>>
>>> It ought to be two instructions fewer and faster. However, both llvm and gcc decide to zero-extend from 16 bits twice, and (llvm-)gcc-4.2 decides to load bytestream twice.
>>
>> Hmm, zero-extending in bswap_16 isn't really surprising, since asm
>> operands are always extended to int.
>
> That depends on how the asm is written.
>
>> The only solution there is to write AV_RB16 in asm too.
>>
>> --disable-asm is remarkably bad; I think it should be using
>> (p[0] << 8 | p[1]) instead of __attribute__((packed)) and bswap_16
>> when FAST_UNALIGNED isn't defined.
>
> I don't quite understand that.

If I configure for ARM with --disable-asm (using iPhone gcc), this:

#include "libavutil/intreadwrite.h"

int test(uint16_t *p);

int test(uint16_t *p)
{
    return AV_RB16(p);
}

turns into this:

int test(uint16_t *p)
{
    return bswap_16((((const union unaligned_16 *) (p))->l));
}

which compiles to:

_test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldrb    r3, [r0, #0]    @ zero_extendqisi2
        ldrb    r0, [r0, #1]    @ zero_extendqisi2
        @ lr needed for prologue
        orr     r3, r3, r0, asl #8
        mov     r0, r3, lsr #8
        orr     r0, r0, r3, asl #8
        uxth    r0, r0
        bx      lr

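because gcc (apparently) always uses byte loads for packed structures.
It assembles the little-endian value from two byte loads and then
byte-swaps it with mov/orr/uxth, where combining the bytes in
big-endian order in the first place would have needed only the one orr.
I don't know if anyone cares how well --disable-asm compiles; I was
just curious.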
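A minimal sketch comparing the two approaches, assuming a little-endian host and GCC/clang attribute syntax. The names `sketch_unaligned_16`, `sketch_bswap_16`, `rb16_packed_bswap` and `rb16_bytes` are hypothetical stand-ins, not the actual libavutil definitions. The first function mirrors the preprocessed output above (packed-union load followed by a byte swap); the second is the `(p[0] << 8 | p[1])` form suggested in the thread. Both read the same big-endian 16-bit value from a possibly unaligned pointer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the packed union seen in the preprocessed
 * output; the real libavutil definition may differ. may_alias lets it
 * be read through a pointer into a byte buffer. */
union sketch_unaligned_16 {
    uint16_t l;
} __attribute__((packed, may_alias));

/* Plain C 16-bit byte swap, standing in for bswap_16 with no asm. */
static inline uint16_t sketch_bswap_16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Mirrors the --disable-asm expansion: native 16-bit load through a
 * packed union, then a byte swap. Correct only on little-endian hosts. */
static inline unsigned rb16_packed_bswap(const void *p)
{
    return sketch_bswap_16(((const union sketch_unaligned_16 *)p)->l);
}

/* Suggested fallback: build the big-endian value directly from bytes.
 * Endian-independent and needs no alignment. */
static inline unsigned rb16_bytes(const void *p)
{
    const uint8_t *b = p;
    return (unsigned)b[0] << 8 | b[1];
}
```

For example, with `const uint8_t buf[3] = { 0x00, 0x12, 0x34 };`, both `rb16_packed_bswap(buf + 1)` and `rb16_bytes(buf + 1)` yield `0x1234` on a little-endian host. The byte-wise form gives the compiler nothing to undo, so on ARM it would plausibly reduce to the two `ldrb` and a single `orr` with a shifted operand.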