[Ffmpeg-devel] A question about 'dct_unquantize_h263_intra'

Tue Jan 2 21:22:26 CET 2007

On Tuesday 02 January 2007 19:44, Michael Niedermayer wrote:

> > Is it safe to assume:
> > 1. Result of 'level = level * qmul - qadd' will never overflow signed
> > 16-bits?
>
> yes
>
> > 2. DCTELEM *block is always at least 8 bytes aligned?
>
> yes if not its a bug

Thanks for the information.

> > 3. Processing extra elements after block[nCoeffs] is safe (up to but not
> > including block[(nCoeffs + 7) / 8 * 8])?
>
> block[0 .. 63] is always safe
> nCoeffs <= 64

So overwriting values in this buffer with garbage after nCoeffs is ok and they
are not used later anyway? Or at least processing them in the same way as 
the previous elements is ok?

> > +static inline void dct_unquantize_h263_helper_c(DCTELEM *block, int
> > qmul, int qadd, int count) +{
> > +    int i, level;
> > +    for (i = 0; i < count; i++) {
> > +        level = block[i];
> > +        if (level) {
> > +            if (level < 0) {
> > +                level = level * qmul - qadd;
> > +            } else {
> > +                level = level * qmul + qadd;
> > +            }
> > +            block[i] = level;
> > +        }
> > +    }
> > +}
>
> this looks like a duplicate of dct_unquantize_h263_inter_c() ?

Yes, it is a subset of code that is used in dct_unquantize_h263_inter_c().
Anyway, the reason for have this part of code moved  to a separate function is
to easily do correctness and performance tests of assembly optimized code. It
is used as a reference implementation and dct_unquantize_h263_helper_armv5te() 
macro (armv5te optimized implementation) is directly compared to it. Both
these implementations are used from 'test-unquantize.c' test program:
https://garage.maemo.org/svn/mplayer/trunk/libavcodec/tests/

I generally don't like the idea of developing assembly code unless it is
(almost) completely  covered by regression tests. Also as the goal of assembly
optimizations is performance, tests which measure performance are also very
important. Now as I have this simple test program, I can do with the assembly
code whatever I like and most of the bugs are immediately discovered. Also I
can immediately see performance results of different variations of code and
ask other people to run benchmarks on other machines.

My plans also include adding some tests for idct later (in fact I have some of
the test code, but it needs to be cleaned up a bit). It also should verify
correctness and performance (also simulating real memory usage patterns -
accesses to 'DCTELEM *block' always hit cache, but 'uint8_t *dest' has a lot
of cache misses). The last part of tests will include motion compensation
code. After all this code is covered by tests, I will start optimizing it
quite aggresively. The same framework will be also useful if somebody would
like to optimize code for iwmmxt (xscale) or armv6 SIMD instructions.

So while this function is obviously a duplication of code, I don't feel like
removing it unless some better solution exists which would allow to keep
regression tests. What is the preferred method of testing assembly code in
ffmpeg now?

As the number of ARM users (let alone developers) is marginally low, having
good testing code is even more important. But I hope to have more developers
joining. One thing is to convince them that ffmpeg is actually still not '99%
optimized' for ARM as some people believe :)
http://www.oesf.org/forums/index.php?showtopic=22280&view=findpost&p=148412

Another thing to have is a good testing framework, so that the development can
be done easier and with less risk of breaking code.

Probably it is worth to experiment with correctness/performance testing code
in the maemo mplayer svn first and try to submit patches to ffmpeg a
bit later once some complete solution emerges?

> > +#define dct_unquantize_h263_helper_armv5te(__block, __qmul, __qadd,
> > __count) \
>
> things starting with __ are reserved in C please dont use such names

Thanks, I'll rename these identifiers, that's not a problem.

By the way, what is 's->h263_aic' in this code? If it is set, there should be
possible to have a faster shortcut version for 'qadd = 0' branch. What kind of
codec uses it and how can I encode video to get some test sample?