[FFmpeg-devel] [PATCH] h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.

Michael Niedermayer michaelni at gmx.at
Wed Dec 3 23:04:48 CET 2014


On Wed, Dec 03, 2014 at 10:39:00PM +0100, Reimar Döffinger wrote:
> On Wed, Dec 03, 2014 at 01:19:48PM +0100, Michael Niedermayer wrote:
> > On Wed, Dec 03, 2014 at 09:00:39AM +0100, Reimar Döffinger wrote:
> > > On 03.12.2014, at 01:40, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > > On Sat, Nov 22, 2014 at 02:09:01PM +0100, Reimar Döffinger wrote:
> > > >> On Mon, Nov 17, 2014 at 01:41:13PM +0100, Michael Niedermayer wrote:
> > > >>> On Mon, Nov 17, 2014 at 08:19:32AM +0100, Reimar Döffinger wrote:
> > > >>>> On 17.11.2014, at 02:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > >>>>> On Sat, Nov 15, 2014 at 06:16:03PM +0100, Reimar Döffinger wrote:
> > > >>>>>> 11674 -> 10877 decicycles on my Phenom II.
> > > >>>>>> Overall speedup was unfortunately within measurement error.
> > > >>>>> 
> > > > >>>>> here it's 10153 -> 10135
> > > >>>> 
> > > >>>> I suspect it also depends a bit on the compiler and how it changes the surrounding code.
> > > >>>> Note that I also tested with PIC actually.
> > > >>>> 
> > > > >>>>> but I have a slightly odd feeling about the changes to the asm code;
> > > > >>>>> I am not sure if all assemblers will be happy about the changed
> > > > >>>>> code
> > > >>>> 
> > > >>>> Do you mean particularly the movzbl change?
> > > >>> 
> > > >>> yes and the k stuff
> > > >>> 
> > > >>> 
> > > >>>> I am also unsure about that, I think there was a reason for that %k6 mess...
> > > >>>> But this as well as movzx seemed to work for me...
> > > >>> 
> > > > >>> It works here too; I just have the feeling it might fail on some odd
> > > > >>> assembler or platform. That's not meant to keep you from pushing this,
> > > > >>> just that it might need to be reverted or fixed if such
> > > > >>> problems actually occur.
> > > >> 
> > > >> I pushed it.
> > > >> If anyone sees issues please tell me and I'll look into it!
> > > > 
> > > > > I think these FATE failures are caused by it, but that's based just
> > > > > on the other commits in the range looking unlikely:
> > > > 
> > > > http://fate.ffmpeg.org/report.cgi?time=20141122231657&slot=x86_64-darwin-clang-3.5-O3
> > > > http://fate.ffmpeg.org/report.cgi?time=20141122223720&slot=x86_64-darwin-clang-3.5
> > > 
> > > That's annoying, I only expected compile errors, this looks more like a compiler bug.
> > > Can someone run tests?
> > > Does just using the "m" instead of "r" constraint like on 32 bit fix it?
> > 
> > still aborts with:
> 
> Oh dear.
> On re-reading the code it seems I got a bit confused on what %0 actually
> points to (I somehow thought it actually pointed to the on-stack x86_reg).
> I can't test and benchmark today, but I think this one might fix it:

applied

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu