[FFmpeg-cvslog] r16207 - trunk/libavcodec/h264.c

Thu Dec 18 16:36:57 CET 2008

On Thu, 18 Dec 2008 11:48:58 -0000 (UTC), M?ns Rullg?rd wrote:
>
>Michael Niedermayer wrote:
>> On Thu, Dec 18, 2008 at 04:27:34AM +0000, M?ns Rullg?rd wrote:
>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>
>>> > On Thu, Dec 18, 2008 at 03:57:54AM +0000, M?ns Rullg?rd wrote:
>>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>>> >>
>>> >> > On Thu, Dec 18, 2008 at 02:57:17AM +0000, M?ns Rullg?rd wrote:
>>> >> >> michael <subversion at mplayerhq.hu> writes:
>>> >> >>
>>> >> >> > Author: michael
>>> >> >> > Date: Thu Dec 18 03:53:18 2008
>>> >> >> > New Revision: 16207
>>> >> >> >
>>> >> >> > Log:
>>> >> >> > Use the new idct functions (except chroma as it was slower in
>>> >> >> > benchmarks) cathedral +0.5% speed
>>> >> >> > aladin +0.6% speed [note aladin has been cat-ed 10 times to reduce
>>> >> >> > the influence of init time]
>>> >> >> > Speedup also verified via START/STOP_TIMER (difference was very
>>> >> >> > significant for the changed parts)
>>> >> >>
>>> >> >> How much does this hurt on architectures that don't yet have the new
>>> >> >> SIMD functions?
>>> >> >
>>> >> > there are no really new SIMD functions.
>>> >> > I just moved the loops like
>>> >> > for(i=0; i<16; i++)
>>> >> >     dsp->idct4x4_add(blah blah);
>>> >> >
>>> >> > into dsputil so they are
>>> >> >
>>> >> > for(i=0; i<16; i++)
>>> >> >     idct4x4_add_simdwhatever(blah blah);
>>> >> >
>>> >> > that way gcc can inline the function and avoids up to 15 calls through
>>> >> > dsp->
>>> >> >
>>> >> > adding support for this to your favorite architecture is a matter of
>>> >> > copy & paste and adjusting the function names.
>>> >>
>>> >> I can see how it can be done.  I'm asking how much of an impact this
>>> >> has on performance until it's been done.  What percentage of the old
>>> >> calls are affected?
>>> >
>>> > depends on the video, if i have to guess and thats just a guess id say
>>> > more than 50% of the idct work will go to new code
>>>
>>> That's enough to give a significant slowdown on ARM.  How about a
>>> little coordination next time?
>>
>> Iam not sure what you suggest?
>>
>> I could of course send out a warning liks
>> "will in an hour (if it passes tests and benchmarks) commit code that will
>>  require arch specific optims to be updated, until that update they will
>>  be slower. I will update x86 myself"
>>
>> but i dont think that would have helped you.
>
>That is more or less what you did, and it did not help me at all.
>What I want is that before you commit the change, you send a patch
>to the list and wait one day for arch maintainers to catch up, then
>apply.

what it sounds like to me:

"i want you to halt your work for one day, every time you optimize x86.
until we test it for speed regressions on other arch"

i can see both sides of the issue, but do not have a solution.

it maybe wise to have an arm box on FATE which developers can push
patches (which are then automatically built and benchmarked)... i dont
think all developers have all kinds of hardware running to test all
changes.

-compn