[FFmpeg-devel] [PATCH] VP8 decoder

Ronald S. Bultje rsbultje
Wed Jun 16 19:17:02 CEST 2010


Hi,

On Tue, Jun 15, 2010 at 10:22 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Jun 15, 2010 at 09:12:30AM -0400, David Conrad wrote:
>> +static const uint8_t subpel_filters[7][6] = {
>> +    { 0,   6, 123,  12,   1,   0 },
>> +    { 2,  11, 108,  36,   8,   1 },
>> +    { 0,   9,  93,  50,   6,   0 },
>> +    { 3,  16,  77,  77,  16,   3 },
>> +    { 0,   6,  50,  93,   9,   0 },
>> +    { 1,   8,  36, 108,  11,   2 },
>> +    { 0,   1,  12, 123,   6,   0 },
>> +};
>> +
>> +
>> +#define FILTER_6TAP(src, F, stride) \
>> +    av_clip_uint8((F[2]*src[x+0*stride] - F[1]*src[x-1*stride] + F[0]*src[x-2*stride] + \
>> +                   F[3]*src[x+1*stride] - F[4]*src[x+2*stride] + F[5]*src[x+3*stride] + 64) >> 7)
>> +
>> +#define VP8_EPEL(SIZE) \
>> +static void put_vp8_epel ## SIZE ## _h_c(uint8_t *dst, uint8_t *src, int stride, int h, int mx, int my) \
>> +{ \
>> +    const uint8_t *filter = subpel_filters[mx-1]; \
>> +    int x, y; \
>> +\
>> +    for (y = 0; y < h; y++) { \
>> +        for (x = 0; x < SIZE; x++) \
>> +            dst[x] = FILTER_6TAP(src, filter, 1); \
>> +        dst += stride; \
>> +        src += stride; \
>> +    } \
>> +} \
>> +\
>
> maybe it makes sense to write separate functions for the 4tap cases instead of 0*

See attached. It's bit-identical.

Before (START/STOP_TIMER around mc_func in vp8_mc()), sample
Elephants_Dream-360p-Stereo.webm:
50580 dezicycles in count, 4 runs, 0 skips
26520 dezicycles in count, 8 runs, 0 skips
14430 dezicycles in count, 16 runs, 0 skips
8268 dezicycles in count, 32 runs, 0 skips
5156 dezicycles in count, 64 runs, 0 skips
3573 dezicycles in count, 128 runs, 0 skips
2904 dezicycles in count, 256 runs, 0 skips
2630 dezicycles in count, 512 runs, 0 skips
2505 dezicycles in count, 1024 runs, 0 skips
2437 dezicycles in count, 2048 runs, 0 skips
2446 dezicycles in count, 4095 runs, 1 skips
2628 dezicycles in count, 8191 runs, 1 skips
2710 dezicycles in count, 16381 runs, 3 skips
2788 dezicycles in count, 32761 runs, 7 skips
3452 dezicycles in count, 60376 runs, 5160 skips
7742 dezicycles in count, 111261 runs, 19811 skips
17758 dezicycles in count, 234934 runs, 27210 skips
23312 dezicycles in count, 496768 runs, 27520 skips
26031 dezicycles in count, 1020481 runs, 28095 skips
25711 dezicycles in count, 2068103 runs, 29049 skips

After:
48690 dezicycles in count, 4 runs, 0 skips
25560 dezicycles in count, 8 runs, 0 skips
13935 dezicycles in count, 16 runs, 0 skips
8025 dezicycles in count, 32 runs, 0 skips
5038 dezicycles in count, 64 runs, 0 skips
3531 dezicycles in count, 128 runs, 0 skips
2906 dezicycles in count, 256 runs, 0 skips
2629 dezicycles in count, 512 runs, 0 skips
2487 dezicycles in count, 1024 runs, 0 skips
2403 dezicycles in count, 2048 runs, 0 skips
2371 dezicycles in count, 4096 runs, 0 skips
2490 dezicycles in count, 8192 runs, 0 skips
2659 dezicycles in count, 16384 runs, 0 skips
2793 dezicycles in count, 32764 runs, 4 skips
3513 dezicycles in count, 60844 runs, 4692 skips
7247 dezicycles in count, 113071 runs, 18001 skips
14665 dezicycles in count, 231779 runs, 30365 skips
20881 dezicycles in count, 493536 runs, 30752 skips
23749 dezicycles in count, 1017309 runs, 31267 skips
23970 dezicycles in count, 2064886 runs, 32266 skips

In other words, roughly 7-10% faster over the long runs (for example, 25711
vs. 23970 dezicycles at about 2M runs). David, OK to apply to your git tree?

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4tapfuncs.diff
Type: application/octet-stream
Size: 6655 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100616/ff9858b6/attachment.obj>


