[FFmpeg-devel] FASTDIV macro

Sun Nov 9 15:48:49 CET 2008

Michael Niedermayer <michaelni at gmx.at> writes:

> On Sun, Nov 09, 2008 at 04:00:15PM +0200, Siarhei Siamashka wrote:
>> On Sunday 09 November 2008, Måns Rullgård wrote:
>> > Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> > > On Saturday 08 November 2008, Måns Rullgård wrote:
>> > >> libavutil/internal.h defines a macro, FASTDIV(), for fast 32/16-bit
>> > >> division by means of multiplying by a table value.  If the
>> > >> architecture is not ARM or x86, which have asm versions, this macro is
>> > >> defined as a normal division if CONFIG_FASTDIV is not set.  The odd
>> > >> thing is, nothing ever sets CONFIG_FASTDIV.  Something is clearly not
>> > >> right here.
>> > >
>> > > The right thing here would be a patch with a description like
>> > > "Enabling FASTDIV macro for architecture X improves performance of
>> > > FFmpeg on this use case by Y percent..."
>> > >
>> > >> I see these alternatives to fix it:
>> > >
>> > > I think you first need to provide some kind of convincing proof that
>> > > it is broken. This macro is definitely useful for ARM processors
>> > > without a hardware division instruction. In other cases I suspect
>> > > that something like what the FASTDIV macro does could be
>> > > implemented in silicon itself (some cases of division could be
>> > > performed faster than others). Even a benchmark of FASTDIV
>> > > vs. native division for modern x86 cores would be interesting to
>> > > see.
>> >
>> > What are you talking about?  I am not suggesting to change anything
>> > for ARM or x86.
>> 
>> I'm talking about the FASTDIV macro. Its primary use is to improve
>> performance. Because of that any decisions about what to do must be
>> primarily done based on the benchmarks, and not based on the
>> theoretical discussion. You proposed a number of options, but the
>> critical information is missing: performance impact of any of these
>> options. You do have some x86 box, several ARM devices and PS3
>> unless I'm missing something. So what's the problem with providing
>> some benchmark numbers as well?
>> 
>> > I'm talking about what to do with the impossible-to-enable
>> > C version using the table.
>> 
>> Of course it is possible just by patching a few lines of code ;)
>> 
>> 
>> Attached is a very crude synthetic benchmarking program. Of
>> course it does not take into account possible cache misses on the
>> table access, nor the fact that sometimes we may need to use
>> expressions like "b==1 ? a : FASTDIV(a, b)".
>
> the b==1 special case can be avoided if you replace the >>32 by
> a shift in the range 24..31 and adjust the table accordingly. in that
> respect it's also possible to fine-tune the values so they have fewer
> bits set if this would be faster, and i think it is for some ARM ...

For the record, there are two places where FASTDIV() is used
conditionally:

libavcodec/flacenc.c:    k = av_log2(n<256 ? FASTDIV(sum2,n) : sum2/n);

libavcodec/vorbis_dec.c-  uint_fast16_t step= dim==1 ? vr->partition_size
libavcodec/vorbis_dec.c:                    : FASTDIV(vr->partition_size, dim);
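
To make the two guards above concrete, here is a minimal sketch of a
table-based division. It is not the code from libavutil/internal.h: the
names inv_table, init_inv_table, fastdiv, div_any and count_mismatches
are hypothetical, and the table is assumed to hold ceil(2^32 / b) for
divisors below 256. Under that assumption b==1 needs special handling
because 2^32 does not fit in a 32-bit table entry, and divisors beyond
the table need a fallback to plain division.

```c
#include <assert.h>
#include <stdint.h>

#define INV_TABLE_SIZE 256  /* assumed table size, divisors 2..255 */

/* Hypothetical reciprocal table; entries 0 and 1 are left unused. */
uint32_t inv_table[INV_TABLE_SIZE];

/* Fill inv_table[b] with ceil(2^32 / b) for every b the table covers. */
void init_inv_table(void)
{
    for (uint32_t b = 2; b < INV_TABLE_SIZE; b++)
        inv_table[b] = (uint32_t)((((uint64_t)1 << 32) + b - 1) / b);
}

/* Multiply by the reciprocal and keep the high 32 bits of the product. */
uint32_t fastdiv(uint32_t a, uint32_t b)
{
    assert(b >= 2 && b < INV_TABLE_SIZE);
    return (uint32_t)(((uint64_t)a * inv_table[b]) >> 32);
}

/* Both guards combined: b==1 (as in vorbis_dec.c) and the table
 * range check (as in flacenc.c). */
uint32_t div_any(uint32_t a, uint32_t b)
{
    if (b == 1)
        return a;              /* 2^32 is not representable in the table */
    if (b < INV_TABLE_SIZE)
        return fastdiv(a, b);
    return a / b;              /* divisor outside the table */
}

/* Count inputs 0..max_a where div_any() disagrees with the '/' operator. */
unsigned count_mismatches(uint32_t max_a)
{
    unsigned bad = 0;
    for (uint32_t b = 1; b < INV_TABLE_SIZE; b++)
        for (uint32_t a = 0; a <= max_a; a++)
            if (div_any(a, b) != a / b)
                bad++;
    return bad;
}
```

Call init_inv_table() once, then count_mismatches(65535) compares the
sketch against ordinary division for all 16-bit numerators. Whether the
multiply is actually faster than a native divide on a given CPU is
exactly the question that needs benchmarking.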

-- 
Måns Rullgård
mans at mansr.com