[FFmpeg-devel] [PATCH] snow SSE2 add_yblock
Fri Aug 31 14:29:58 CEST 2007
On Fri, Aug 31, 2007 at 11:07:49AM +0200, Reimar Döffinger wrote:
> On Fri, Aug 31, 2007 at 05:04:44AM +0200, Michael Niedermayer wrote:
> > On Fri, Aug 31, 2007 at 02:57:05AM +0200, Reimar Döffinger wrote:
> > > On Thu, Aug 30, 2007 at 06:19:49PM +0200, Michael Niedermayer wrote:
> > > > On Thu, Aug 30, 2007 at 04:56:41PM +0200, Reimar Döffinger wrote:
> > > > > The attached patch should contain a working version.
> > > > > I have replaced several of the hardcoded registers with something more
> > > > > flexible, because I also found it nicer to read.
> > > > > Suggestions welcome (though IMO optimizations should be done after
> > > > > applying and re-enabling).
> > > > > And better not to try reading the patch; apply it and read the resulting
> > > > > asm instead, since diff made something quite butchered out of this.
> > > >
> > > > While I am glad that you fix the bugs, clean up the code and all,
> > > > this really doesn't belong in a single patch.
> > >
> > > Well, I considered it a replacement of the old code. If you consider it
> > > bugfixes and improvements to the current code, I guess not.
> > > Splitting it properly will involve loads of patches. I can do that
> > > eventually, but it will take some time and involve quite a few
> > > cosmetic/not so useful patches (also since I don't remember for certain
> > > in every case why I did them like this).
> > I don't think it requires that many patches;
> > it's just: fix bugs (1-2 patches),
> > rewrite (cosmetic, generating identical object files) (1 patch)
> Then you don't end up with what this patch does; there would still be:
> reorder instructions so they make more sense, are easier to understand and are more
> consistent between the two variants
Well, I didn't even notice that you reordered the instructions relative
to the current code...
This of course should be separate and benchmarked.
> use some different xmm registers so they correspond more closely to
> variables in the C code (and again are more similar between the two variants)
> do not use hardcoded registers
That should be benchmarked as well; I've seen cases where leaving the register
choice to gcc caused a slowdown (namely all the H.264 CABAC code, where
they are currently hardcoded).
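The difference between "hardcoded registers" and "leaving the register choice to gcc" can be illustrated with a minimal sketch that is not taken from the snow patch. The helpers `add8_hardcoded` and `add8_allocated` are hypothetical names; both add eight `int16_t` values with `paddw`, assuming GCC/Clang inline asm on x86 with SSE2 (a plain C fallback is used elsewhere). The first names `%xmm0`/`%xmm1` directly and declares them as clobbers; the second uses `"x"` operand constraints so the compiler picks the xmm registers.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not code from the snow patch.
 * Both add eight int16_t values from src into dst with wrap-around (paddw). */

#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_SSE2_INLINE_ASM 1
#else
#define HAVE_SSE2_INLINE_ASM 0
#endif

/* Plain C fallback used when SSE2 inline asm is unavailable. */
static void add8_c(int16_t *dst, const int16_t *src)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(dst[i] + src[i]);
}

/* Variant 1: hardcoded registers. The template names %xmm0 and %xmm1
 * directly, so they must be listed as clobbers; gcc has no say in
 * which vector registers are used. */
static void add8_hardcoded(int16_t *dst, const int16_t *src)
{
#if HAVE_SSE2_INLINE_ASM
    __asm__ volatile(
        "movdqu (%0), %%xmm0   \n\t"
        "movdqu (%1), %%xmm1   \n\t"
        "paddw  %%xmm1, %%xmm0 \n\t"
        "movdqu %%xmm0, (%0)   \n\t"
        :
        : "r"(dst), "r"(src)
        : "xmm0", "xmm1", "memory");
#else
    add8_c(dst, src);
#endif
}

/* 16-byte vector type so the "x" constraint can bind it to an xmm register. */
typedef short v8hi __attribute__((vector_size(16)));

/* Variant 2: compiler-chosen registers. The "=&x" constraints let gcc
 * pick any free xmm registers for a and b and substitute them for %0/%1. */
static void add8_allocated(int16_t *dst, const int16_t *src)
{
#if HAVE_SSE2_INLINE_ASM
    v8hi a, b;
    __asm__ volatile(
        "movdqu (%2), %0 \n\t"
        "movdqu (%3), %1 \n\t"
        "paddw  %1, %0   \n\t"
        "movdqu %0, (%2) \n\t"
        : "=&x"(a), "=&x"(b)
        : "r"(dst), "r"(src)
        : "memory");
#else
    add8_c(dst, src);
#endif
}
```

Both variants compute the same result; they differ only in who allocates the xmm registers. That choice can change the code gcc generates around the asm block in either direction, which is why it should be benchmarked rather than assumed to be an improvement.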
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope