[FFmpeg-devel] [PATCH] Heavy optimization of IFF decoder
Måns Rullgård
mans
Tue Apr 27 20:57:23 CEST 2010
Sebastian Vater <cdgs.basty at googlemail.com> writes:
> Ronald S. Bultje a ?crit :
>> Hi,
>>
>> 2010/4/27 M?ns Rullg?rd <mans at mansr.com>:
>>
>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>>
>>>> On Mon, Apr 26, 2010 at 7:39 PM, Sebastian Vater
>>>> <cdgs.basty at googlemail.com> wrote:
>>>>
>>>>> + const uint32_t lut[] = {0x0000000,
>>>>> + 0x1000000 << plane,
>>>>> + 0x0010000 << plane,
>>>>> + 0x1010000 << plane,
>>>>> + 0x0000100 << plane,
>>>>> + 0x1000100 << plane,
>>>>> + 0x0010100 << plane,
>>>>> + 0x1010100 << plane,
>>>>> + 0x0000001 << plane,
>>>>> + 0x1000001 << plane,
>>>>> + 0x0010001 << plane,
>>>>> + 0x1010001 << plane,
>>>>> + 0x0000101 << plane,
>>>>> + 0x1000101 << plane,
>>>>> + 0x0010101 << plane,
>>>>> + 0x1010101 << plane};
>>>>>
>>>> I really can't imagine that a static const lut[][] isn't faster. which
>>>> file did you use to test this? (Is it on mphq/samples?)
>>>>
>>> A static table whose values are shifted in the loop is 7% faster on ARM.
>>>
>>
>> It's a little slower on x86 (~12%). However, a (static) 2D array is
>> faster (3%) over the original patch. Mans just said that's fine on ARM
>> as well, so you should probably implement that (don't forget that
>> plane is const, so do a const *lut = table[plane] before entering the
>> loop, else gcc messes up).
>>
>
> Please clarify on this, what kind of static 2D array should I use now?
> Just apply this one? Well, this one is 20% slower (increases from 6.2k
> to 8.9k) for me than my original.
>
> --
>
> Best regards,
> :-) Basty/CDGS (-:
>
> diff --git a/libavcodec/iff.c b/libavcodec/iff.c
> index be13f4e..ca8a790 100644
> --- a/libavcodec/iff.c
> +++ b/libavcodec/iff.c
> @@ -36,6 +36,33 @@ typedef struct {
> uint8_t * planebuf;
> } IffContext;
>
> +#define DECODEPLANE8(plane) {0x0000000, \
> + 0x1000000 << plane, \
> + 0x0010000 << plane, \
> + 0x1010000 << plane, \
> + 0x0000100 << plane, \
> + 0x1000100 << plane, \
> + 0x0010100 << plane, \
> + 0x1010100 << plane, \
> + 0x0000001 << plane, \
> + 0x1000001 << plane, \
> + 0x0010001 << plane, \
> + 0x1010001 << plane, \
> + 0x0000101 << plane, \
> + 0x1000101 << plane, \
> + 0x0010101 << plane, \
> + 0x1010101 << plane} \
> +
> +// 8 planes * 4-bit mask
> +static const uint32_t decodeplane8_tab[8][16] = {DECODEPLANE8(0),
> + DECODEPLANE8(1),
> + DECODEPLANE8(2),
> + DECODEPLANE8(3),
> + DECODEPLANE8(4),
> + DECODEPLANE8(5),
> + DECODEPLANE8(6),
> + DECODEPLANE8(7)};
> +
Please use a bit more sane formatting, e.g. something like this:
+#define LUT8(plane) { \
+ 0x0000000 << plane, \
+ 0x1000000 << plane, \
+ 0x0010000 << plane, \
+ 0x1010000 << plane, \
+ 0x0000100 << plane, \
+ 0x1000100 << plane, \
+ 0x0010100 << plane, \
+ 0x1010100 << plane, \
+ 0x0000001 << plane, \
+ 0x1000001 << plane, \
+ 0x0010001 << plane, \
+ 0x1010001 << plane, \
+ 0x0000101 << plane, \
+ 0x1000101 << plane, \
+ 0x0010101 << plane, \
+ 0x1010101 << plane, \
+ }
+
+static const uint32_t plane8_lut[8][16] = {
+ LUT8(0), LUT8(1), LUT8(2), LUT8(3),
+ LUT8(4), LUT8(5), LUT8(6), LUT8(7),
+};
> /**
> * Convert CMAP buffer (stored in extradata) to lavc palette format
> */
> @@ -95,13 +122,22 @@ static av_cold int decode_init(AVCodecContext *avctx)
> * @param bps bits_per_coded_sample (must be <= 8)
> * @param plane plane number to decode as
> */
> -static void decodeplane8(uint8_t *dst, const uint8_t *const buf, int buf_size, int bps, int plane)
> +static inline void decodeplane8(uint8_t *dst,
> + const uint8_t *const buf,
> + const unsigned buf_size,
> + const unsigned bps,
> + const unsigned plane)
Const on simple variables is mostly pointless.
> {
> GetBitContext gb;
> unsigned int i;
> const unsigned b = (buf_size * 8) + bps - 1;
> + const unsigned b32 = b & ~3;
> init_get_bits(&gb, buf, buf_size * 8);
> - for(i = 0; i < b; i++) {
> + for(i = 0; i < b32; i += 4) {
> + const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
> + AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
> + }
I suggest using a local variable here, like this:
const uint32_t *lut = decodeplane8_tab[plane];
[...]
uint32_t v = lut[get_bits(...)];
I don't trust gcc to do that on its own.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list