[FFmpeg-devel] [PATCH] SSE dct32()
Michael Niedermayer
michaelni
Sun Jun 20 20:19:18 CEST 2010
On Sun, Jun 20, 2010 at 07:51:28PM +0200, Michael Niedermayer wrote:
> On Sun, Jun 20, 2010 at 01:12:48PM +0100, Måns Rullgård wrote:
> > Vitor Sessak <vitor1001 at gmail.com> writes:
> >
> > > On 06/20/2010 01:33 PM, Måns Rullgård wrote:
> > >> Vitor Sessak <vitor1001 at gmail.com> writes:
> > >>
> > >>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
> > >>>> Vitor Sessak <vitor1001 at gmail.com> writes:
> > >>>>
> > >>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
> > >>>>>>> out and in != out.
> > >>>>>>
> > >>>>>> Now I am confused, I thought the 3% you quoted was about in == out vs
> > >>>>>> in != out?
> > >>>>>
> > >>>>> No, the 3% slowdown was when converting our general code (using FFT)
> > >>>>> to have in != out.
> > >>>>
> > >>>> And that was due to missed optimisations caused by gcc not knowing
> > >>>> that those pointers don't alias each other. Marking them restrict is
> > >>>> not good either, since we actually want to pass the same value
> > >>>> sometimes.
> > >>>
> > >>> That and one extra used register.
> > >>
> > >> So what do we do? I see the following options:
> > >>
> > >> 1. Change mp3 decoder to work with inplace transform.
> > >
> > > Looks hard to do with no speed loss
> >
> > Just hard or impossible?
>
> Hard, not impossible.
> Just consider that dct32() trashes its input array.
>
> Either way, the in != out thing is not a big issue if it's not slower.
> What is a big issue is that high-level optimizations have to be done
> before asm optimisations.
>
> Is our dct32() code optimal? If I didn't miscount, mp3lib does 4 butterflies
> fewer, but I could have miscounted. Also our dct32() should be benchmarked
I miscounted ^^;; (my grep counted the 4 BF() in BF1/BF2 too).
Still, my point stands that our dct32() code should be benchmarked against
other decoders.
Note though that I only looked at mp3lib from MPlayer svn, which is a bit
outdated.
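
To make the aliasing point quoted above concrete, here is a minimal sketch in C. It is not FFmpeg's dct32(); butterfly_pass() and butterfly_pass_noalias() are hypothetical names, and the single butterfly stage is only an illustration. The first variant loads both inputs before storing, so it may be called with in == out, but the compiler has to assume every store to out can change in. The second variant promises no aliasing via restrict, which may allow better code generation but makes an in == out call undefined behaviour.

```c
#include <stddef.h>

/* Hypothetical single butterfly stage over n floats (n even):
 * out[i] gets the sum and out[n-1-i] the difference of the mirrored pair.
 * Both inputs are loaded before either store, so out == in is allowed.
 * Without restrict, the compiler must assume out and in may overlap. */
void butterfly_pass(float *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        float a = in[i];
        float b = in[n - 1 - i];
        out[i]         = a + b;
        out[n - 1 - i] = a - b;
    }
}

/* Same stage, but the caller promises that out and in do not overlap.
 * Calling this with out == in is undefined behaviour, which is why
 * restrict does not fit code that sometimes passes the same buffer. */
void butterfly_pass_noalias(float *restrict out, const float *restrict in,
                            size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        float a = in[i];
        float b = in[n - 1 - i];
        out[i]         = a + b;
        out[n - 1 - i] = a - b;
    }
}
```

Calling butterfly_pass(buf, buf, n) transforms buf in place, while butterfly_pass_noalias() must always be given two distinct buffers. Whether the restrict variant is actually faster depends on the compiler and target, so it would need to be measured.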
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If you really think that XML is the answer, then you definitly missunderstood
the question -- Attila Kinali