[FFmpeg-user] memory leak with vaapi decode since ffmpeg_vaapi: Convert to use hw_frames_ctx only

Mark Thompson sw at jkqxz.net
Sun May 28 16:39:43 EEST 2017


On 28/05/17 12:22, Andy Furniss wrote:
> ==13089== HEAP SUMMARY:
> ==13089==     in use at exit: 1,641,516,251 bytes in 201,131 blocks
> ==13089==   total heap usage: 2,075,668 allocs, 1,874,537 frees, 68,823,339,225 bytes allocated
> ==13089==
> ==13089== 368 bytes in 1 blocks are possibly lost in loss record 908 of 932
> ==13089==    at 0x4C28B06: malloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:299)
> ==13089==    by 0xFBED2DE: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:56)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2C21: ff_vaapi_decode_make_param_buffer (/ffmpeg/libavcodec/vaapi_decode.c:40)
> ==13089==    by 0xAFDA07: vaapi_h264_start_frame (/ffmpeg/libavcodec/vaapi_h264.c:286)
> ==13089==    by 0x853E0B: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:690)
> ==13089==    by 0x853E0B: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 22,968 bytes in 11 blocks are possibly lost in loss record 922 of 932
> ==13089==    at 0x4C28B06: malloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:299)
> ==13089==    by 0xFBED2DE: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:56)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2D46: ff_vaapi_decode_make_slice_buffer (/ffmpeg/libavcodec/vaapi_decode.c:86)
> ==13089==    by 0xAFCCBB: vaapi_h264_decode_slice (/ffmpeg/libavcodec/vaapi_h264.c:380)
> ==13089==    by 0x853CCD: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:703)
> ==13089==    by 0x853CCD: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 3,042,432 (800,640 direct, 2,241,792 indirect) bytes in 10,008 blocks are definitely lost in loss record 925 of 932
> ==13089==    at 0x4C2A898: calloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
> ==13089==    by 0xFBED2C0: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:49)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2C21: ff_vaapi_decode_make_param_buffer (/ffmpeg/libavcodec/vaapi_decode.c:40)
> ==13089==    by 0xAFDBA9: vaapi_h264_start_frame (/ffmpeg/libavcodec/vaapi_h264.c:299)
> ==13089==    by 0x853E0B: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:690)
> ==13089==    by 0x853E0B: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 4,483,216 (800,640 direct, 3,682,576 indirect) bytes in 10,008 blocks are definitely lost in loss record 927 of 932
> ==13089==    at 0x4C2A898: calloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
> ==13089==    by 0xFBED2C0: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:49)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2C21: ff_vaapi_decode_make_param_buffer (/ffmpeg/libavcodec/vaapi_decode.c:40)
> ==13089==    by 0xAFDA07: vaapi_h264_start_frame (/ffmpeg/libavcodec/vaapi_h264.c:286)
> ==13089==    by 0x853E0B: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:690)
> ==13089==    by 0x853E0B: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 6,872,207 bytes in 150 blocks are possibly lost in loss record 928 of 932
> ==13089==    at 0x4C28B06: malloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:299)
> ==13089==    by 0xFBED2DE: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:56)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2DA5: ff_vaapi_decode_make_slice_buffer (/ffmpeg/libavcodec/vaapi_decode.c:100)
> ==13089==    by 0xAFCCBB: vaapi_h264_decode_slice (/ffmpeg/libavcodec/vaapi_h264.c:380)
> ==13089==    by 0x853CCD: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:703)
> ==13089==    by 0x853CCD: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 86,766,408 (3,202,560 direct, 83,563,848 indirect) bytes in 40,032 blocks are definitely lost in loss record 930 of 932
> ==13089==    at 0x4C2A898: calloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
> ==13089==    by 0xFBED2C0: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:49)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2D46: ff_vaapi_decode_make_slice_buffer (/ffmpeg/libavcodec/vaapi_decode.c:86)
> ==13089==    by 0xAFCCBB: vaapi_h264_decode_slice (/ffmpeg/libavcodec/vaapi_h264.c:380)
> ==13089==    by 0x853CCD: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:703)
> ==13089==    by 0x853CCD: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== 1,540,168,859 (3,202,560 direct, 1,536,966,299 indirect) bytes in 40,032 blocks are definitely lost in loss record 932 of 932
> ==13089==    at 0x4C2A898: calloc (/home/andy/Src2016/valgrind-3.12.0/coregrind/m_replacemalloc/vg_replace_malloc.c:711)
> ==13089==    by 0xFBED2C0: vlVaCreateBuffer (/mesa/src/gallium/state_trackers/va/buffer.c:49)
> ==13089==    by 0x603BB82: vaCreateBuffer (/libva/va/va.c:1116)
> ==13089==    by 0xED2DA5: ff_vaapi_decode_make_slice_buffer (/ffmpeg/libavcodec/vaapi_decode.c:100)
> ==13089==    by 0xAFCCBB: vaapi_h264_decode_slice (/ffmpeg/libavcodec/vaapi_h264.c:380)
> ==13089==    by 0x853CCD: decode_nal_units (/ffmpeg/libavcodec/h264dec.c:703)
> ==13089==    by 0x853CCD: h264_decode_frame (/ffmpeg/libavcodec/h264dec.c:1008)
> ==13089==    by 0xA0A54F: frame_worker_thread (/ffmpeg/libavcodec/pthread_frame.c:199)
> ==13089==    by 0x8D12433: start_thread (in /lib/libpthread-2.22.so)
> ==13089==    by 0x901106C: clone (in /lib/libc-2.22.so)
> ==13089==
> ==13089== LEAK SUMMARY:
> ==13089==    definitely lost: 8,006,400 bytes in 100,080 blocks
> ==13089==    indirectly lost: 1,626,454,515 bytes in 99,918 blocks
> ==13089==      possibly lost: 6,895,543 bytes in 162 blocks
> ==13089==    still reachable: 159,793 bytes in 971 blocks
> ==13089==         suppressed: 0 bytes in 0 blocks
> ==13089== Reachable blocks (those to which a pointer was found) are not shown.
> ==13089== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==13089==
> ==13089== For counts of detected and suppressed errors, rerun with: -v
> ==13089== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

Hmph, looks like the parameter buffer confusion strikes again.


The original VAAPI specification contained the following functions:

"""
/**
 * After this call, the buffer is deleted and this buffer_id is no longer valid
 * Only call this if the buffer is not going to be passed to vaRenderBuffer
 */
VAStatus vaDestroyBuffer (
...
/**
 * Send decode buffers to the server.
 * Buffers are automatically destroyed afterwards
 */
VAStatus vaRenderPicture (
"""

The newer lavc implementation follows this, and does not call vaDestroyBuffer() on buffers passed to vaRenderPicture().  Not doing this would cause random crashes in multithreaded programs, because another buffer with the same ID could be created between the two calls.

However, the Intel implementation never actually followed the specification - it leaked the buffers passed to vaRenderPicture().  So, a special driver quirk was added to detect that driver and additionally destroy the buffers in that case: <http://git.videolan.org/?p=ffmpeg.git;a=commit;h=4926fa9a4aa03f3b751f52e900b9efb87fea0591>.  This restored same behaviour as old lavc with the Intel driver without breaking other implementations.

That worked for a bit.

Unfortunately, Intel got tired of people complaining that their driver was broken.  Rather than fixing it, they decided to change the specification: <https://github.com/01org/libva/commit/3eb038aa13bdd785808286c0a4995bd7a1ef07e9> (this is an ABI and API break, and the SONAME was not changed).

Thus we have the state now:

VAAPI < 0.40:
A)  Destroy buffers: wrong, will crash in multithreaded programs.
B)  Don't destroy buffers: correct, driver quirk restores behaviour for Intel.

VAAPI >= 0.40:
A)  Destroy buffers: correct, but may crash old drivers which haven't been updated (not detectable, because the library allows loading old drivers without telling the user).
B)  Don't destroy buffers: wrong, driver quirk restores correct behaviour for Intel, others may leak memory.

lavc currently chooses (B) in both cases on the grounds that leaking memory is preferable to crashing randomly.  There is some thought of switching to (A) for VAAPI >= 0.40, but given that we can't detect the version at runtime and the libraries consider everything to be compatible this seems dangerous.


For Mesa, I feel like I remember it having a correct < 0.40 implementation, but looking again it doesn't seem to and there isn't any obvious change (maybe I am just wrong).  Do you know if this changed?  Adding the driver quirk like the Intel driver is probably the right thing to do if it has always been like that, if it hasn't then we need more magic to be able to distinguish versions to prevent older ones from crashing.

Thanks,

- Mark


More information about the ffmpeg-user mailing list