[FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

James Darnley jdarnley at obe.tv
Fri Dec 2 01:49:23 EET 2016


On 2016-12-01 23:16, Michael Niedermayer wrote:
> On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
>> Yorkfield:
>>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>>  - sse2: 3.35x faster (278 vs.  83 cycles)
>>
>> Skylake:
>>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>>  - sse2: 2.34x faster (169 vs.  72 cycles)
>>  - avx:  2.32x faster (169 vs.  73 cycles)
>> ---
>>  libavcodec/x86/h264_deblock_10bit.asm | 118 ++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/h264dsp_init.c         |   9 +++
>>  2 files changed, 127 insertions(+)
> 
> breaks build on linux x86-32
> 
> YASM    libavcodec/x86/h264_deblock_10bit.o
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error:  (Each undefined symbol is reported only once.)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode

Ah.  I shouldn't do clever things like trying to use the byte-sized
registers.  As the warning says, bpl is only a register in 64-bit mode,
so the 32-bit build fails.  The byte register isn't needed here, so I
have changed this locally, and also in the 4:2:0 chroma intra patch.


