[FFmpeg-devel] FASTDIV macro
Sun Nov 9 15:41:37 CET 2008
On Sun, Nov 09, 2008 at 04:00:15PM +0200, Siarhei Siamashka wrote:
> On Sunday 09 November 2008, Måns Rullgård wrote:
> > Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > > On Saturday 08 November 2008, Måns Rullgård wrote:
> > >> libavutil/internal.h defines a macro, FASTDIV(), for fast 32/16-bit
> > >> division by means of multiplying by a table value. If the
> > >> architecture is not ARM or x86, which have asm versions, this macro is
> > >> defined as a normal division if CONFIG_FASTDIV is not set. The odd
> > >> thing is, nothing ever sets CONFIG_FASTDIV. Something is clearly not
> > >> right here.
> > >
> > > The right thing here would be a patch with a description like
> > > "Enabling FASTDIV macro for architecture X improves performance of
> > > FFmpeg for this use case by Y percent..."
> > >
> > >> I see these alternatives to fix it:
> > >
> > > I think you first need to provide some kind of convincing proof that
> > > it is broken. This macro is definitely useful for ARM processors
> > > without a hardware division instruction. In other cases I suspect
> > > that something like what the FASTDIV macro does could be
> > > implemented in silicon itself (some cases of division could be
> > > performed faster than others). Even a benchmark of FASTDIV
> > > vs. native division for modern x86 cores would be interesting to
> > > see.
> > What are you talking about? I am not suggesting to change anything
> > for ARM or x86.
> I'm talking about the FASTDIV macro. Its primary use is to improve
> performance. Because of that, any decision about what to do must be based
> primarily on benchmarks, and not on theoretical discussion. You
> proposed a number of options, but the critical information is missing:
> the performance impact of each of these options. You do have an x86 box, several
> ARM devices and a PS3 unless I'm missing something. So what's the problem with
> providing some benchmark numbers as well?
> > I'm talking about what to do with the impossible-to-enable
> > C version using the table.
> Of course it is possible, just by patching a few lines of code ;)
> A very crude synthetic benchmarking program is attached. Of course it
> does not take into account possible cache misses on the table access, nor
> the fact that sometimes we may need to use expressions like
> "b==1 ? a : FASTDIV(a, b)".
The b==1 special case can be avoided if you replace the >>32 with a shift by
something in the range 24..31 and adjust the table accordingly. In that respect
it's also possible to fine-tune the values so they have fewer bits set if this
would be faster, and I think it is for some ARM ...
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf