[FFmpeg-devel] [PATCH] use AV_RB16 in cabac refill
Alexander Strange
astrange
Sat Mar 27 01:38:03 CET 2010
2010/3/25 M?ns Rullg?rd <mans at mansr.com>:
> Alexander Strange <astrange at ithinksw.com> writes:
>
>> On Mar 25, 2010, at 4:08 AM, David Conrad wrote:
>>
>>> On Mar 25, 2010, at 3:30 AM, Alexander Strange wrote:
>>>
>>>> Measured 1 cycle faster decode_cabac_residual on x86-64. Didn't try anywhere else, but I'd be a little interested in what arm does.
>>>
>>> It ought to be 2 instruction less and faster. However, both llvm and gcc decide to zero extend from 16 bits twice, and (llvm-)gcc-4.2 decides to load bytestream twice.
>>
>> Hmm, zero-extending in bswap_16 isn't really surprising, since asm
>> operands are always extended to int.
>
> That depends on how the asm is written.
>
>> The only solution there is to write AV_RB16 in asm too.
>>
>> --disable-asm is remarkably bad, I think it should be using
>> (p[0] << 8 | p[1]) instead of __attribute__((packed)) and bswap_16
>> when FAST_UNALIGNED isn't defined.
>
> I don't quite understand that.
If I configure for arm with --disable-asm (using iPhone gcc), this:
#include "libavutil/intreadwrite.h"
int test(uint16_t *p);
int test(uint16_t *p)
{
return AV_RB16(p);
}
turns into this:
int test(uint16_t *p)
{
return bswap_16((((const union unaligned_16 *) (p))->l));
}
which compiles to:
_test:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldrb r3, [r0, #0] @ zero_extendqisi2
ldrb r0, [r0, #1] @ zero_extendqisi2
@ lr needed for prologue
orr r3, r3, r0, asl #8
mov r0, r3, lsr #8
orr r0, r0, r3, asl #8
uxth r0, r0
bx lr
because it (apparently) always uses byte loads for packed structures.
I don't know if anyone cares how well --disable-asm compiles, I was
just curious.
More information about the ffmpeg-devel
mailing list