[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Michael Niedermayer michaelni at gmx.at
Sat Mar 29 13:58:32 CET 2014


On Sat, Mar 29, 2014 at 03:18:44PM +1100, Matt Oliver wrote:
> OK here is a slightly different approach to fixing the missing CLTD
> instruction support for icl inline asm. After looking into the situation
> further I noticed that CLTD is only used in 2 places and in both places it
> is for generating a sign mask.
> 
> Since on most modern processors CLTD (technically CDQ) is a 2 clock cycle
> instruction, a sign mask can alternatively be created using an arithmetic
> right shift (which is always a 1 clock cycle instruction). Although this
> often requires an extra mov instruction to back up the register contents,
> this extra mov is again 1 clock cycle, so the net difference is nothing. So
> on most current processors replacing it has zero net difference (in fact
> the uops for cdq are often a mov/sar anyway). It's only different on older
> processors such as AMD's K8/K9/Jaguar and Pentium M etc. that actually have a
> 1 clock cycle cdq. To try to rectify this I renamed some registers to optimize
> performance through removing pipeline stalls and allowing throughput
> optimization so that the mov/sar can be dual issued on any processor that
> supports it. This should reduce times by 1 clock cycle.
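As a rough illustration of the two ways of producing a sign mask discussed
above (this is not the code from the patch), the following C sketch puts a
cltd-based helper next to a mov/sar equivalent. The function names are
hypothetical, and the inline asm is guarded so that the file falls back to
plain C where GNU-style x86 inline asm is not available.

```c
#include <stdint.h>

/* Hypothetical helper names; an illustrative sketch, not the actual patch. */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* cltd (cdq) sign-extends eax into edx, so both registers are fixed:
 * the input must be in eax ("a") and the mask comes out in edx ("d"). */
static inline int32_t sign_mask_cltd(int32_t x)
{
    int32_t mask;
    __asm__ ("cltd" : "=d"(mask) : "a"(x));
    return mask;
}

/* mov + sar: the compiler is free to pick any general-purpose register. */
static inline int32_t sign_mask_sar(int32_t x)
{
    int32_t mask = x;                       /* the extra mov that preserves x */
    __asm__ ("sarl $31, %0" : "+r"(mask));  /* replicate the sign bit */
    return mask;
}
#else
/* Portable branch-free fallback: 0 for x >= 0, -1 (all ones) for x < 0. */
static inline int32_t sign_mask_cltd(int32_t x) { return -(int32_t)(x < 0); }
static inline int32_t sign_mask_sar(int32_t x)  { return -(int32_t)(x < 0); }
#endif
```

Both helpers return 0 for non-negative input and -1 for negative input; the
difference is only in which registers the compiler is allowed to use.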

2 instructions instead of 1 increase pressure on the code cache,
and codecs like h264 are complex enough that the code cache size
should have a significant effect.



> 
> So the net performance difference depends on the processor, but it will either
> be zero diff or potentially 1 clock cycle faster (changing it in MASK_ABS
> removes the fixed eax/edx register requirements and allows the compiler to
> arrange things as it wants. This allows it to remove a mov or 2 as well).
> Since we are talking about 1 clock cycle here, benchmarking shows no real
> world difference between the 2 approaches. However, this second approach
> does work on icl whereas the previous one does not. So I won't technically call
> this an optimization, as it has no real world performance diff, but it does
> fix icl inline asm support.
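For context, the MASK_ABS operation mentioned above derives a sign mask and
then uses it to take an absolute value without a branch. A minimal C sketch
of that idea follows. The function name and signature are hypothetical, and
it assumes that >> on a negative signed value is an arithmetic shift
(implementation-defined in C, but the behaviour of GCC, Clang and ICC).

```c
#include <stdint.h>

/* Hypothetical function for illustration only.
 * Stores the sign mask (0 or -1) in *mask and returns |level|.
 * Assumes an arithmetic right shift for negative values.
 * Not valid for level == INT32_MIN, whose absolute value overflows. */
static inline int32_t mask_abs_sketch(int32_t *mask, int32_t level)
{
    *mask = level >> 31;              /* 0 if level >= 0, -1 if level < 0 */
    return (level ^ *mask) - *mask;   /* negative: (~level) + 1 == -level */
}
```

Written this way, no particular register is required for the mask, which is
the freedom the paragraph above refers to.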

Did you benchmark this?

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator