[FFmpeg-devel] [PATCH] Fix non-rounding up to next 16-bit aligned bug in IFF decoder
Thu Apr 29 15:35:22 CEST 2010
Sebastian Vater <cdgs.basty at googlemail.com> writes:
> Måns Rullgård wrote:
>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>>> Just got the idea, we can get rid of the GetBitContext
>>> completely...Instead of reading 4 bits, we simply read a byte:
>>> const uint8_t lut_offsets = *buf++; // instead of get_bits(gb,4);
>> That's a separate thing.
> Separate in what way? What did you mean exactly?
Separate from the LUT byte order.
>>> Then we unroll the loop by 8 and do two lookups in the LUT, one with >> 4
>>> and one with & 0x0F, or we even get rid of this and create a LUT
>>> with 256 entries using AV_WN64A / AV_RN64A ;-)
>>> The advantage here is that on a 64-bit CPU we get another nice speed
>>> improvement ;-)
>>> If we avoid calculations with AV_RN64A etc.
>> Those macros don't do any calculations. All they do is some magic to
>> avoid type aliasing errors.
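
To illustrate the point that such accessors perform no arithmetic, here is a
minimal sketch of an aliasing-safe 64-bit load and store written portably
with memcpy. The names load64 and store64 are hypothetical stand-ins and are
not FFmpeg's actual AV_RN64A / AV_WN64A definitions; compilers typically
reduce each call to a single move.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an aliasing-safe 64-bit read.
 * memcpy copies the bytes unchanged, so no calculation is involved. */
static inline uint64_t load64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical stand-in for an aliasing-safe 64-bit write. */
static inline void store64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}
```
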
> Yes, I know, but I meant stuff like (lut0[...] << 32ULL) | lut1[...];
Why on earth would you do that?
> But this isn't necessary if we use an 8-bit table storing uint64_t's...
That would fall apart completely on 32-bit machines. I doubt any
speedup you might see on 64-bit is worth the added complexity of
doing it conditionally. Just leave it as 32-bit.
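
As a rough illustration of the 32-bit variant, the sketch below reads one
byte of bitplane data at a time and does two lookups in a 16-entry table of
32-bit words, one with >> 4 and one with & 0x0F. It is not the actual
libavcodec code: build_nibble_lut and decode_plane_row are hypothetical
names, memcpy stands in for the aligned read/write macros, and the table is
filled bytewise so the result does not depend on host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 16-entry table for one bitplane.
 * Each entry holds 4 chunky pixels (one byte each); the most significant
 * bit of the nibble maps to the first pixel. */
static void build_nibble_lut(uint32_t lut[16], int plane)
{
    for (int n = 0; n < 16; n++) {
        uint8_t px[4];
        for (int i = 0; i < 4; i++)
            px[i] = (uint8_t)(((n >> (3 - i)) & 1) << plane);
        memcpy(&lut[n], px, sizeof(px));
    }
}

/* Hypothetical helper: OR one bitplane row into chunky 8-bit pixels.
 * dst must hold 8 * buf_size bytes. */
static void decode_plane_row(uint8_t *dst, const uint8_t *buf, int buf_size,
                             const uint32_t lut[16])
{
    for (int i = 0; i < buf_size; i++) {
        uint8_t b = buf[i];          /* one byte instead of two 4-bit reads */
        uint32_t v;

        memcpy(&v, dst, 4);          /* pixels 0..3: high nibble */
        v |= lut[b >> 4];
        memcpy(dst, &v, 4);

        memcpy(&v, dst + 4, 4);      /* pixels 4..7: low nibble */
        v |= lut[b & 0x0F];
        memcpy(dst + 4, &v, 4);

        dst += 8;
    }
}
```

Only 32-bit words are ever loaded or stored here, so the same code path
works on both 32-bit and 64-bit machines without conditional compilation.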
>>> gcc just should use 2 registers on 32-bit CPU and that's it.
>> Should, but doesn't.
> With the way I meant above, it should... I'll test that now, though
> without a completed table, and tell you what it does.
Believe me, it doesn't. GCC is terrible with 64-bit data on 32-bit
machines. Do not tempt it.
mans at mansr.com