[FFmpeg-devel] [PATCH] WMA Voice decoder
Fri Jan 22 17:56:26 CET 2010
"Ronald S. Bultje" <rsbultje at gmail.com> writes:
> 2010/1/22 Måns Rullgård <mans at mansr.com>:
>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>> On Fri, Jan 22, 2010 at 11:15 AM, Uoti Urpala <uoti.urpala at pp1.inet.fi> wrote:
>>>> x*49995 / 41 = x*(1219*41 + 16) / 41 = 1219*x + x * 16 / 41
>>>> In the last form x*16 is at most 1048544.
>>> In fastdiv form, this'd be 3 muls, an add and a shift. Is that still
>>> faster than 1 mul + 1 div?
>> Don't forget the table lookup.
>> Multiplication takes typically 3-5 cycles. Division takes 15-40
>> cycles if the CPU has a hardware divider, 50-100 cycles if not. The
>> fastdiv should be faster, even when including the table lookup. If
>> this is called at all frequently, the table should be in L2 cache,
>> which costs typically 10-20 cycles, still faster than a division.
> If you read 3 numbers (you need the "41" in fastdiv-form, the 1219 and
> the 16), does it make a difference? Or because you'd put them in the
> same table (3 numbers for 9 possible values of y), that's not
> an issue?
Plain division is almost always slower than any simple way to avoid
it. If in doubt, benchmark.
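
For reference, a minimal sketch of the two ideas quoted above, assuming
x fits in 16 bits (the thread only gives the bound on x*16). The names
div41_recip, scale_split and scale_plain are made up for illustration,
and the reciprocal constant is computed inline rather than read from a
lookup table, so this is not the code from the patch. It splits
x*49995/41 into 1219*x + x*16/41 and replaces the remaining division
by a multiply and a shift:

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^32 / 41): fixed-point reciprocal of 41 (41 * INV41 = 2^32 + 4). */
#define INV41 104755300u

/* a / 41 via multiply and shift; exact for a < 2^30, which covers 16 * x
 * for any 16-bit x. */
static uint32_t div41_recip(uint32_t a)
{
    return (uint32_t)(((uint64_t)a * INV41) >> 32);
}

/* x * 49995 / 41 rewritten as 1219 * x + x * 16 / 41, since
 * 49995 = 1219 * 41 + 16. Assumes x <= 0xFFFF. */
static uint32_t scale_split(uint32_t x)
{
    return 1219 * x + div41_recip(16 * x);
}

/* Reference form: one multiplication and one real division. */
static uint32_t scale_plain(uint32_t x)
{
    return (uint32_t)((uint64_t)x * 49995 / 41);
}

/* Exhaustive self-check over 16-bit inputs; aborts on the first mismatch. */
static void self_check(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        assert(scale_split(x) == scale_plain(x));
}
```

Calling self_check() confirms the two forms agree over the assumed
range. Whether scale_split() actually beats scale_plain() on a given
CPU is what a benchmark would have to show.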
mans at mansr.com