[FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm
Ivan Kalvachev
ikalvachev at gmail.com
Sun Jun 11 12:34:46 EEST 2017
On 6/10/17, James Darnley <james.darnley at gmail.com> wrote:
> On 2017-06-09 13:41, Ivan Kalvachev wrote:
>> On 6/9/17, Michael Niedermayer <michael at niedermayer.cc> wrote:
>>> seems this breaks build with mingw64, didnt investigate but it
>>> fails with these errors:
>>>
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> collect2: error: ld returned 1 exit status
>>> collect2: error: ld returned 1 exit status
>>> make: *** [ffmpeg_g.exe] Error 1
>>> make: *** Waiting for unfinished jobs....
>>> make: *** [ffprobe_g.exe] Error 1
>>
>>
>> const_*_edge is used on only one place is the code.
>> Would you check if this patch fixes the issue.
>>
>> I expected that the addresses would be pre-calculated
>> by n/yasm as one value and indexed
>> relative to the section start.
>> Instead it seems that each entry is represented with
>> its own address and offset from it.
>> Since the offset is negative it uses all 64 bits and
>> it makes difference if it is truncated to 32 bits.
>>
>> Same issue could happen with clang tools.
>
> The problem is with the relative addressing. You need to load the real
> address first before you can offset with another register at runtime. So
> something like:
>
>> mov reg1, [read_only_const]
lea ?
>> mova mmreg, [reg1 + reg2]
.
OK, Getting mingw for my distro is problem
and compiling one myself would take a bit more effort/time.
So I'm posting a patch that "should" work.
======
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -406,7 +406,7 @@ align 16
; uint32 N - Number of vector elements. Must be 0 < N < 8192
;
%macro PVQ_FAST_SEARCH 0
-cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
+cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N
%define tmpX rsp
; movsxdifnidn Nq, Nd
@@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
add Nq, r4q ; Nq = align(Nq, mmsize)
sub rsp, Nq ; allocate tmpX[Nq]
+%ifdef PIC
+ lea r5q, [const_align_abs_edge] ; rip+const
+ movups m3, [r5q+r4q-mmsize] ; this is
the bit mask for the padded read at the end of the input
+%else
movups m3, [const_align_abs_edge-mmsize+r4q] ; this is
the bit mask for the padded read at the end of the input
+%endif
lea r4q, [Nq-mmsize] ; Nq is rounded up (aligned
up) to mmsize, so r4q can't become negative here, unless N=0.
movups m2, [inXq + r4q]
======
What I find surprising is that PIC is enabled only on Windows and does not seem
to depend on CONFIG_PIC, so textrels are used all over assembly code.
Do I miss something?
Are there option(s) to signal/error when texrel is been used in code
that should be pic ?
More information about the ffmpeg-devel
mailing list