[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows
protogonoi at gmail.com
Sat Apr 5 05:17:41 CEST 2014
Here's an additional patch that modifies some of the inline asm to make it
work under icl.
One previous issue was a lea instruction that would not compile under 64-bit
icl. It seems icl has some sort of conformance check that fails on a 32-bit
lea operating on what it must assume is an address (which is why it only
fails in 64-bit builds, where forcing a 32-bit value creates a mismatch).
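For illustration only, here is a rough sketch of a 32-bit lea used as an add in GCC-style inline asm. The helper name add_keep_flags is hypothetical and is not taken from the patch. The variant shown uses 64-bit base/index registers with a 32-bit destination; whether icl accepts that form has not been checked here, so treat it as an assumption rather than a fix.

```c
#include <stdint.h>

/* Hypothetical helper: add two 32-bit values with lea, which does not
 * modify the flags register. Not the code from the patch. */
static inline uint32_t add_keep_flags(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t r;
    /* 32-bit destination with 64-bit address registers (%q modifier).
     * The all-32-bit form would be "leal (%1, %2), %0", i.e. a 32-bit
     * lea on operands that look like an address. Only the low 32 bits
     * of the sum are kept, so the result is (a + b) mod 2^32. */
    __asm__("leal (%q1, %q2), %0" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    return a + b;   /* portable fallback without inline asm */
#endif
}
```

The fallback branch keeps the sketch compilable on other targets, where it is just an ordinary add.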
On a second look, the lea is only being used to perform an add without
modifying the flags register. However, that add just recreates a value that
already existed in a register (the code subtracts a value and then uses the
lea to re-add it). So the lea can be removed and the original value used
instead. This requires an additional register to hold the old value for a
couple of instructions, but based on where the code is used there doesn't
appear to be much issue with that (removing an add operation outweighs the
extra register cost).
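The transformation described above can be modelled in plain C. This is a minimal sketch under assumptions, not the patched function: the names step_readd, step_saved and check_equivalence are made up for illustration, and the real code is inline asm whose exact operands are not shown in this mail.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Old pattern (hypothetical model): subtract, then re-add to get the
 * original value back. In asm the re-add was a lea so that the flags
 * produced by the subtract were left untouched. */
static uint32_t step_readd(uint32_t code, uint32_t low, int *bit)
{
    uint32_t diff     = code - low;  /* sub: borrows iff code < low */
    uint32_t restored = diff + low;  /* lea: equals the original code */
    *bit = code >= low;
    return *bit ? diff : restored;
}

/* New pattern: keep a copy of the original in a spare register, so no
 * re-add is needed. */
static uint32_t step_saved(uint32_t code, uint32_t low, int *bit)
{
    uint32_t saved = code;           /* the extra register */
    uint32_t diff  = code - low;
    *bit = code >= low;
    return *bit ? diff : saved;
}

/* Both variants should agree for all inputs, including wraparound. */
static void check_equivalence(void)
{
    static const uint32_t s[] = { 0u, 1u, 255u, 0x7FFFFFFFu,
                                  0x80000000u, 0xFFFFFFFFu };
    const size_t n = sizeof(s) / sizeof(s[0]);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int b1, b2;
            uint32_t r1 = step_readd(s[i], s[j], &b1);
            uint32_t r2 = step_saved(s[i], s[j], &b2);
            assert(r1 == r2 && b1 == b2);
        }
    }
}
```

The two functions differ only in how the original value is recovered, which is the trade described above: one fewer add in exchange for one more live register.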
The function itself benches as being ~5.3% faster when tested over 100
million iterations on random data. To test the effect of the changed
register usage I tested it within vp9's decode_coeffs_b_generic and
vp5's vp5_parse_vector_adjustment. The performance gains were obviously
smaller due to the rest of the function overhead (0.5% and 3.9% respectively), but
there was no performance degradation and this new function compiles under
icl.
The only remaining inline asm still in master that does not compile under
icl is BRANCHLESS_GET_CABAC in x86/cabac.h. So if no one has any objections
I was going to move that to external asm.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1551 bytes
Desc: not available