[FFmpeg-devel] [PATCH 3/3][RFC] avfilter/vf_chromakey: Add OpenCL acceleration
Timothy Gu
timothygu99 at gmail.com
Wed Oct 14 04:50:37 CEST 2015
On Tue, Oct 13, 2015 at 3:28 AM Timo Rothenpieler <timo at rothenpieler.org> wrote:
> > Hi
> >
> > I used your filter, but the kernel fails to compile. You should
> > consider the "double" type in the kernel: some GPU cards do not support
> > double.
> > I added "#pragma OPENCL_EXTENSION cl_khr_fp64: enable" to the kernel, but
> > it does not work.
> >
> > I will check the error tomorrow.
>
> So far I have tested this filter on Nvidia on Linux, using driver 355, and
> on Intel's CPU-based OpenCL SDK.
> Using floats instead of doubles could affect the keying quality.
>
It segfaults here with:
ffmpeg -f lavfi -i allrgb -vf chromakey=green:opencl=1 -f null -
gdb doesn't give anything useful:
$ gdb ffmpeg_g
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ffmpeg_g...done.
(gdb) r -loglevel debug -f lavfi -i allrgb -vf chromakey=green:opencl=1 -f null -
Starting program: /home/timothy-gu/ffmpeg/ffmpeg/ffmpeg_g -loglevel debug -f lavfi -i allrgb -vf chromakey=green:opencl=1 -f null -
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ffmpeg version N-75973-g3cff255 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.9.2 (Debian 4.9.2-10)
configuration: --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --extra-ldflags='-L/opt/intel/opencl-1.2-5.0.0.57/lib64 -Wl,-rpath,/opt/intel/opencl-1.2-5.0.0.57/lib64' --enable-opencl
libavutil 55. 3.100 / 55. 3.100
libavcodec 57. 5.100 / 57. 5.100
libavformat 57. 3.100 / 57. 3.100
libavdevice 57. 0.100 / 57. 0.100
libavfilter 6. 11.100 / 6. 11.100
libswscale 4. 0.100 / 4. 0.100
libswresample 2. 0.100 / 2. 0.100
libpostproc 54. 0.100 / 54. 0.100
Splitting the commandline.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'lavfi'.
Reading option '-i' ... matched as input file with argument 'allrgb'.
Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'chromakey=green:opencl=1'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'null'.
Reading option '-' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input file allrgb.
Applying option f (force format) with argument lavfi.
Successfully parsed a group of options.
Opening an input file: allrgb.
detected 4 logical cores
[New Thread 0x7fffed3b9700 (LWP 13438)]
[New Thread 0x7fffecbb8700 (LWP 13439)]
[New Thread 0x7fffec3b7700 (LWP 13440)]
[New Thread 0x7fffebbb6700 (LWP 13441)]
[New Thread 0x7fffeb3b5700 (LWP 13442)]
[Parsed_allrgb_0 @ 0x1bb0fa0] size:4096x4096 rate:25/1 duration:-1.000000 sar:1/1
[AVFilterGraph @ 0x1baff00] query_formats: 2 queried, 1 merged, 0 already done, 0 delayed
[lavfi @ 0x1baf700] All info found
Input #0, lavfi, from 'allrgb':
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0, 1, 1/25: Video: rawvideo, 1 reference frame (RGB[24] / 0x18424752), rgb24, 4096x4096 [SAR 1:1 DAR 1:1], 1/25, 25 tbr, 25 tbn, 25 tbc
Successfully opened the file.
Parsing a group of options: output file -.
Applying option vf (set video filters) with argument chromakey=green:opencl=1.
Applying option f (force format) with argument null.
Successfully parsed a group of options.
Opening an output file: -.
Successfully opened the file.
[New Thread 0x7fffe4bb2700 (LWP 13443)]
[New Thread 0x7fffe43b1700 (LWP 13444)]
[New Thread 0x7fffe3bb0700 (LWP 13445)]
[New Thread 0x7fffe33af700 (LWP 13446)]
[New Thread 0x7fffe2bae700 (LWP 13447)]
[Parsed_chromakey_0 @ 0x1bb52a0] Setting 'color' to value 'green'
[Parsed_chromakey_0 @ 0x1bb52a0] Setting 'opencl' to value '1'
[OPENCLUTILS @ 0x1510100] Could not get device ID: DEVICE NOT FOUND:
[OPENCLUTILS @ 0x1510100] Platform Name: Intel(R) Corporation, Device Name: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
[New Thread 0x7fffdb6b8700 (LWP 13448)]
[New Thread 0x7fffdaeb6700 (LWP 13450)]
[New Thread 0x7fffdb2b7700 (LWP 13449)]
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'video_size' to value '4096x4096'
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'pix_fmt' to value '2'
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'time_base' to value '1/25'
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'pixel_aspect' to value '1/1'
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x1f86a00] Setting 'frame_rate' to value '25/1'
[graph 0 input from stream 0:0 @ 0x1f86a00] w:4096 h:4096 pixfmt:rgb24 tb:1/25 fr:25/1 sar:1/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x1ecaa80] Setting 'flags' to value 'bicubic'
[auto-inserted scaler 0 @ 0x1ecaa80] w:iw h:ih flags:'bicubic' interl:0
[Parsed_chromakey_0 @ 0x1bb52a0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_chromakey_0'
[AVFilterGraph @ 0x1bb47a0] query_formats: 3 queried, 1 merged, 1 already done, 0 delayed
[auto-inserted scaler 0 @ 0x1ecaa80] picking yuva444p out of 3 ref:rgb24 alpha:0
[auto-inserted scaler 0 @ 0x1ecaa80] w:4096 h:4096 fmt:rgb24 sar:1/1 -> w:4096 h:4096 fmt:yuva444p sar:1/1 flags:0x4
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf57.3.100
Stream #0:0, 0, 1/25: Video: rawvideo, 1 reference frame (Y4[0][8] / 0x8003459), yuva444p, 4096x4096 [SAR 1:1 DAR 1:1], 1/25, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
Metadata:
encoder : Lavc57.5.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> rawvideo (native))
Press [q] to stop, [?] for help
[null @ 0x1bb32e0] Encoder did not produce proper pts, making some up.
frame= 2 fps=0.0 q=-0.0 size=N/A time=00:00:00.08 bitrate=N/A
Program received signal SIGSEGV, Segmentation fault.
0x00007fffd8136e6b in ?? ()
(gdb) bt
#0 0x00007fffd8136e6b in ?? ()
#1 0x00000000ffffd094 in ?? ()
#2 0x00007ffff3f3a459 in _IO_vsnprintf (string=0x7fffffffc6c0 "
\307\377\377\377\177", maxlen=<optimized out>,
format=0x2 <error: Cannot access memory at address 0x2>, args=0x1) at vsnprintf.c:119
#3 0x40efc02000000000 in ?? ()
#4 0x40efc02000000000 in ?? ()
#5 0x40efc02000000000 in ?? ()
#6 0x000000ff000000ff in ?? ()
#7 0x000000ff000000ff in ?? ()
#8 0x000000ff000000ff in ?? ()
#9 0x000000ff000000ff in ?? ()
#10 0x0056005600560056 in ?? ()
#11 0x0056005600560056 in ?? ()
#12 0x004a004a004a004a in ?? ()
#13 0x004a004a004a004a in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) disass $pc-32 $pc+32
A syntax error in expression, near `$pc+32'.
(gdb) disass $pc-32,$pc+32
Dump of assembler code from 0x7fffd8136e4b to 0x7fffd8136e8b:
0x00007fffd8136e4b: (bad)
0x00007fffd8136e4c: add %al,(%rax)
0x00007fffd8136e4e: mov (%rdi,%r9,1),%bl
0x00007fffd8136e52: vpextrb $0x6,%xmm8,%esi
0x00007fffd8136e58: test $0x1,%al
0x00007fffd8136e5a: mov %eax,0x52c(%rsp)
0x00007fffd8136e61: je 0x7fffd8136e6e
0x00007fffd8136e63: mov 0x6a8(%rsp),%rdi
=> 0x00007fffd8136e6b: mov (%rdi,%rdx,1),%al
0x00007fffd8136e6e: mov %bl,0x500(%rsp)
0x00007fffd8136e75: vpextrb $0x8,%xmm8,%edi
0x00007fffd8136e7b: mov %edi,0x520(%rsp)
0x00007fffd8136e82: test $0x1,%cl
0x00007fffd8136e85: mov %bl,0x560(%rsp)
End of assembler dump.
When it does work, it gives a pretty good speedup (~2.1x), even though the
OpenCL device here is just the CPU.
Timothy