[FFmpeg-devel] SH4: optimization attempts
Måns Rullgård
Thu Jan 27 16:03:31 CET 2011
"Ronald S. Bultje" <rsbultje at gmail.com> writes:
> Hi,
>
> On Thu, Jan 27, 2011 at 4:27 AM, Guennadi Liakhovetski
> <g.liakhovetski at gmx.de> wrote:
>> Some of the proposed patches are generic, like the vp8, replacing
>> multiplication by addition in several filter functions, but, as you see
>> below, it didn't bring any results.
> [..]
>> - a0 = (27*w + 63) >> 7;
>> - a1 = (18*w + 63) >> 7;
>> - a2 = ( 9*w + 63) >> 7;
>> + w9 = 9 * w;
>> + w9_63 = w9 + 63;
>> +
>> + a2 = w9_63 >> 7;
>> + w9_63 += w9;
>> + a1 = w9_63 >> 7;
>> + w9_63 += w9;
>> + a0 = w9_63 >> 7;
>
> I might be playing devil's advocate here, but please do check the
> disassembly before and after your patches. You'd be (pleasantly!)
> surprised at the kind of code that gcc sometimes generates for this
> kind of stuff. (Then again, it might just be that your code does
> improve it and you just can't measure the effect...)
If you're serious about VP8 on SH4, you should be writing those
functions entirely in asm. That will gain at least 20% speed on top
of what you might achieve in C alone.
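
For checking the disassembly as suggested above, the two variants from the
patch can be isolated into a small file. This is only a sketch: the function
names are hypothetical (not from the FFmpeg source), and it assumes w is a
small signed value, roughly the range of a clamped 8-bit sample, so nothing
overflows.

```c
#include <assert.h>

/* Original form from the patch: three independent multiplications. */
static void coeffs_mul(int w, int a[3])
{
    a[0] = (27 * w + 63) >> 7;
    a[1] = (18 * w + 63) >> 7;
    a[2] = ( 9 * w + 63) >> 7;
}

/* Proposed form: one multiplication, then running additions. */
static void coeffs_add(int w, int a[3])
{
    int w9    = 9 * w;
    int w9_63 = w9 + 63;      /*  9*w + 63 */

    a[2] = w9_63 >> 7;
    w9_63 += w9;              /* 18*w + 63 */
    a[1] = w9_63 >> 7;
    w9_63 += w9;              /* 27*w + 63 */
    a[0] = w9_63 >> 7;
}

/* Returns 1 if both forms agree for every w in [lo, hi], else 0.
 * The range is an assumption; pick whatever your filter can produce. */
static int coeffs_agree(int lo, int hi)
{
    for (int w = lo; w <= hi; w++) {
        int m[3], s[3];
        coeffs_mul(w, m);
        coeffs_add(w, s);
        if (m[0] != s[0] || m[1] != s[1] || m[2] != s[2])
            return 0;
    }
    return 1;
}
```

Both functions shift the same intermediate values, so they should return
identical results. Compiling the file with -O2 -S on your SH4 toolchain and
comparing the two function bodies shows whether the compiler already turns
the multiplications into shifts and adds on its own.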
--
Måns Rullgård
mans at mansr.com