[FFmpeg-cvslog] r16207 - trunk/libavcodec/h264.c

Måns Rullgård mans
Thu Dec 18 17:07:50 CET 2008


compn wrote:
> On Thu, 18 Dec 2008 11:48:58 -0000 (UTC), M?ns Rullg?rd wrote:
>>
>>Michael Niedermayer wrote:
>>> On Thu, Dec 18, 2008 at 04:27:34AM +0000, M?ns Rullg?rd wrote:
>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>
>>>> > On Thu, Dec 18, 2008 at 03:57:54AM +0000, M?ns Rullg?rd wrote:
>>>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>>>> >>
>>>> >> > On Thu, Dec 18, 2008 at 02:57:17AM +0000, M?ns Rullg?rd wrote:
>>>> >> >> michael <subversion at mplayerhq.hu> writes:
>>>> >> >>
>>>> >> >> > Author: michael
>>>> >> >> > Date: Thu Dec 18 03:53:18 2008
>>>> >> >> > New Revision: 16207
>>>> >> >> >
>>>> >> >> > Log:
>>>> >> >> > Use the new idct functions (except chroma as it was slower in
>>>> >> >> > benchmarks) cathedral +0.5% speed
>>>> >> >> > aladin +0.6% speed [note aladin has been cat-ed 10 times to reduce
>>>> >> >> > the influence of init time]
>>>> >> >> > Speedup also verified via START/STOP_TIMER (difference was very
>>>> >> >> > significant for the changed parts)
>>>> >> >>
>>>> >> >> How much does this hurt on architectures that don't yet have the new
>>>> >> >> SIMD functions?
>>>> >> >
>>>> >> > there are no really new SIMD functions.
>>>> >> > I just moved the loops like
>>>> >> > for(i=0; i<16; i++)
>>>> >> >     dsp->idct4x4_add(blah blah);
>>>> >> >
>>>> >> > into dsputil so they are
>>>> >> >
>>>> >> > for(i=0; i<16; i++)
>>>> >> >     idct4x4_add_simdwhatever(blah blah);
>>>> >> >
>>>> >> > that way gcc can inline the function and avoids up to 15 calls
>>>> >> > through dsp->
>>>> >> >
>>>> >> > adding support for this to your favorite architecture is a matter of
>>>> >> > copy & paste and adjusting the function names.
>>>> >>
>>>> >> I can see how it can be done.  I'm asking how much of an impact this
>>>> >> has on performance until it's been done.  What percentage of the old
>>>> >> calls are affected?
>>>> >
>>>> > depends on the video, if i have to guess and thats just a guess id say
>>>> > more than 50% of the idct work will go to new code
>>>>
>>>> That's enough to give a significant slowdown on ARM.  How about a
>>>> little coordination next time?
>>>
>>> Iam not sure what you suggest?
>>>
>>> I could of course send out a warning liks
>>> "will in an hour (if it passes tests and benchmarks) commit code that will
>>>  require arch specific optims to be updated, until that update they will
>>>  be slower. I will update x86 myself"
>>>
>>> but i dont think that would have helped you.
>>
>>That is more or less what you did, and it did not help me at all.
>>What I want is that before you commit the change, you send a patch
>>to the list and wait one day for arch maintainers to catch up, then
>>apply.
>
> what it sounds like to me:
>
> "i want you to halt your work for one day, every time you optimize x86.
> until we test it for speed regressions on other arch"

That is not at all what I said.  Did even pay attention to the change?
Michael made a change to some SIMD functions, *disabling* them entirely
on non-x86 until they are updated to a new interface.

> i can see both sides of the issue, but do not have a solution.
>
> it maybe wise to have an arm box on FATE which developers can push
> patches (which are then automatically built and benchmarked)... i dont
> think all developers have all kinds of hardware running to test all
> changes.

I am not asking for all changes to be benchmarked on all platforms.
I am asking for *one day* of advance notice before existing assembler
optimisations are disabled by an interface change.  After the one
day has passed, the change can be committed even if some architecture
has not been updated.

Normal changes that happen to have a negative effect on some architecture
will have to be discovered after the fact.

Do you see the difference?

-- 
M?ns Rullg?rd
mans at mansr.com




More information about the ffmpeg-cvslog mailing list