[FFmpeg-devel] FASTDIV macro
Sun Nov 9 15:00:15 CET 2008
On Sunday 09 November 2008, Måns Rullgård wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > On Saturday 08 November 2008, Måns Rullgård wrote:
> >> libavutil/internal.h defines a macro, FASTDIV(), for fast 32/16-bit
> >> division by means of multiplying by a table value. If the
> >> architecture is not ARM or x86, which have asm versions, this macro is
> >> defined as a normal division if CONFIG_FASTDIV is not set. The odd
> >> thing is, nothing ever sets CONFIG_FASTDIV. Something is clearly not
> >> right here.
> > The right thing here would be a patch with a description like
> > "Enabling FASTDIV macro for architecture X improves performance of
> > FFmpeg on this use case by Y percent..."
> >> I see these alternatives to fix it:
> > I think you first need to provide some kind of convincing proof that
> > it is broken. This macro is definitely useful for ARM processors
> > without a hardware division instruction. In other cases I suspect
> > that something like what the FASTDIV macro does could be
> > implemented in silicon itself (some cases of division could be
> > performed faster than others). Even a benchmark of FASTDIV
> > vs. native division on modern x86 cores would be interesting to
> > see.
> What are you talking about? I am not suggesting to change anything
> for ARM or x86.
I'm talking about the FASTDIV macro. Its primary purpose is to improve
performance. Because of that, any decision about what to do must be based
primarily on benchmarks rather than on theoretical discussion. You proposed a
number of options, but the critical information is missing: the performance
impact of each of these options. You do have an x86 box, several ARM devices
and a PS3, unless I'm missing something. So what's the problem with providing
some benchmark numbers as well?
> I'm talking about what to do with the impossible-to-enable
> C version using the table.
Of course it is possible just by patching a few lines of code ;)
A very crude synthetic benchmark program is attached. Of course, it does not
take into account possible cache misses on the table access, or the fact that
sometimes we may need to use expressions like
"b==1 ? a : FASTDIV(a, b)".
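
To make the table approach and the "b==1" guard concrete, here is a minimal
sketch in C. It is not the attached program and not the FFmpeg source. It
assumes the table holds ceil(2^32 / b) in 32 bits, with the entry for 1
saturated to 2^32 - 1 because 2^32 itself does not fit. The names
inverse_table, TABLE_SIZE, div_guarded and count_mismatches are made up for
this illustration, and fastdiv_c is borrowed from the benchmark labels only.

```c
#include <stdint.h>

#define TABLE_SIZE 256  /* hypothetical table size, chosen for this sketch */

static uint32_t inverse_table[TABLE_SIZE];

/* Assumption: entry b holds ceil(2^32 / b); entry 1 saturates to 2^32 - 1. */
static void init_inverse_table(void)
{
    inverse_table[0] = 0;
    inverse_table[1] = UINT32_MAX;
    for (uint32_t b = 2; b < TABLE_SIZE; b++)
        inverse_table[b] = (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);
}

/* Division replaced by a 32x32->64 multiply and a shift. */
static inline uint32_t fastdiv_c(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * inverse_table[b]) >> 32);
}

/* With a saturated entry for 1, fastdiv_c(a, 1) yields a - 1 for a > 0,
 * so callers that may see b == 1 need the guard. */
static inline uint32_t div_guarded(uint32_t a, uint32_t b)
{
    return b == 1 ? a : fastdiv_c(a, b);
}

/* Compare against the native division for 16-bit dividends. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t b = 2; b < TABLE_SIZE; b++)
        for (uint32_t a = 0; a <= UINT16_MAX; a++)
            bad += fastdiv_c(a, b) != a / b;
    return bad;
}
```

Under these assumptions the result is exact whenever a * (b - 1) < 2^32,
which covers all 16-bit dividends for the table size used here. Call
init_inverse_table() once before using the other functions.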
The benchmark results are as follows:
--- Pentium-M, gcc 4.3.2 (-O2) ---
normaldiv(-1896828497) : time=2.195s
fastdiv_c(-1896828497) : time=0.564s
fastdiv_asm_x86(-1896828497) : time=0.416s
--- Core2 (64-bit), gcc 4.1.2 (-O2) ---
normaldiv(-1896828497) : time=0.681s
fastdiv_c(-1896828497) : time=0.183s
fastdiv_asm_x86(-1896828497) : time=0.222s
--- ARM11, gcc 4.3.1 (-O2) ---
normaldiv(-1896828497) : time=43.910s
fastdiv_c(-1896828497) : time=5.480s
fastdiv_asm_armv4(-1896828497) : time=5.049s
fastdiv_asm_armv6(-1896828497) : time=4.629s
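
For easier comparison, the normaldiv and fastdiv_c times above can be turned
into speedup ratios. The small C sketch below only restates the numbers
already listed; the struct and function names are hypothetical and exist
just for this illustration.

```c
#include <stdio.h>

/* Times in seconds, copied from the results above. */
struct timing {
    const char *cpu;
    double normaldiv_s;
    double fastdiv_c_s;
};

static const struct timing timings[] = {
    { "Pentium-M",      2.195,  0.564 },
    { "Core2 (64-bit)", 0.681,  0.183 },
    { "ARM11",          43.910, 5.480 },
};

/* How many times faster the table-based C version ran. */
static double speedup(const struct timing *t)
{
    return t->normaldiv_s / t->fastdiv_c_s;
}

static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
        printf("%-15s fastdiv_c speedup: %.2fx\n",
               timings[i].cpu, speedup(&timings[i]));
}
```

By this measure the plain C table version is a bit under 4x faster than
native division on both x86 machines and about 8x faster on the ARM11.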
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 6690 bytes
Desc: not available