[FFmpeg-user] current ffmpeg creates shortened audio stream when using the amix filter
S Andreason
sandreas41 at gmail.com
Sun Sep 29 08:28:09 EEST 2019
When I include the audio filters aresample and amix, I get a shortened audio stream. That later makes it impossible to concat the clips: the audio and video streams end up with different lengths, lose sync, and concat fails with errors:
Invalid audio PTS
First, here is the output from the latest ffmpeg in the Debian package (3.2.14), which works correctly:
$ ffmpeg-3.2.14-1~deb9u1 -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV \
    -i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a \
    -filter_complex "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" \
    -filter_complex "aresample=48000,amix" \
    -s 1024x768 -c:v h264 -b:v 4700k -r 30 20190922_1532_ch5.1e-3.mov
ffmpeg version 3.2.14-1~deb9u1 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
configuration: --prefix=/usr --extra-version='1~deb9u1'
--toolchain=hardened --libdir=/usr/lib/i386-linux-gnu
--incdir=/usr/include/i386-linux-gnu --enable-gpl --disable-stripping
--enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa
--enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca
--enable-libcdio --enable-libebur128 --enable-libflite
--enable-libfontconfig --enable-libfreetype --enable-libfribidi
--enable-libgme --enable-libgsm --enable-libmp3lame --enable-libopenjpeg
--enable-libopenmpt --enable-libopus --enable-libpulse
--enable-librubberband --enable-libshine --enable-libsnappy
--enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora
--enable-libtwolame --enable-libvorbis --enable-libvpx
--enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid
--enable-libzmq --enable-libzvbi --enable-omx --enable-openal
--enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libiec61883
--enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264
--enable-shared
libavutil 55. 34.101 / 55. 34.101
libavcodec 57. 64.101 / 57. 64.101
libavformat 57. 56.101 / 57. 56.101
libavdevice 57. 1.100 / 57. 1.100
libavfilter 6. 65.100 / 6. 65.100
libavresample 3. 1. 0 / 3. 1. 0
libswscale 4. 2.100 / 4. 2.100
libswresample 2. 3.100 / 2. 3.100
libpostproc 54. 1.100 / 54. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
Metadata:
major_brand : qt
minor_version : 512
compatible_brands: qt
encoder : Lavf57.56.101
Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 /
0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97
tbr, 30k tbn, 60k tbc (default)
Metadata:
handler_name : DataHandler
Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz,
stereo, s16, 1536 kb/s (default)
Metadata:
handler_name : DataHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from
'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
Metadata:
major_brand : M4A
minor_version : 512
compatible_brands: isomiso2
encoder : Lavf57.56.101
Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
mono, fltp, 218 kb/s (default)
Metadata:
handler_name : SoundHandler
No pixel format specified, yuvj420p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[libx264 @ 0x170dc20] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x170dc20] profile High, level 3.1
[libx264 @ 0x170dc20] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC
codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html -
options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7
psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1
8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0
bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1
b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250
keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr
mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4
ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-3.mov':
Metadata:
major_brand : qt
minor_version : 512
compatible_brands: qt
encoder : Lavf57.56.101
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661),
yuvj420p(pc), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 tbn, 30 tbc
(default)
Metadata:
encoder : Lavc57.64.101 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 128 kb/s (default)
Metadata:
encoder : Lavc57.64.101 aac
Stream mapping:
Stream #0:0 (h264) -> crop (graph 0)
Stream #0:1 (pcm_s16le) -> aresample (graph 1)
Stream #1:0 (aac) -> amix:input1 (graph 1)
drawtext (graph 0) -> Stream #0:0 (libx264)
amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
frame= 420 fps= 12 q=-1.0 Lsize= 8303kB time=00:00:14.01
bitrate=4853.1kbits/s speed=0.417x
video:8063kB audio:224kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 0.198401%
[libx264 @ 0x170dc20] frame I:2 Avg QP:14.83 size:195688
[libx264 @ 0x170dc20] frame P:106 Avg QP:19.65 size: 59553
[libx264 @ 0x170dc20] frame B:312 Avg QP:25.61 size: 4972
[libx264 @ 0x170dc20] consecutive B-frames: 1.0% 0.0% 0.0% 99.0%
[libx264 @ 0x170dc20] mb I I16..4: 27.6% 29.0% 43.4%
[libx264 @ 0x170dc20] mb P I16..4: 1.1% 1.3% 0.6% P16..4: 30.5%
31.5% 22.6% 0.0% 0.0% skip:12.4%
[libx264 @ 0x170dc20] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 36.1%
7.7% 1.3% direct: 4.2% skip:50.7% L0:37.5% L1:38.6% BI:23.9%
[libx264 @ 0x170dc20] final ratefactor: 18.99
[libx264 @ 0x170dc20] 8x8 transform intra:37.9% inter:54.5%
[libx264 @ 0x170dc20] coded y,uvDC,uvAC intra: 56.2% 64.1% 51.9% inter:
27.2% 19.7% 1.0%
[libx264 @ 0x170dc20] i16 v,h,dc,p: 73% 9% 14% 4%
[libx264 @ 0x170dc20] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 11% 40% 4% 6%
6% 6% 5% 10%
[libx264 @ 0x170dc20] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 15% 6% 9%
9% 9% 8% 11%
[libx264 @ 0x170dc20] i8c dc,h,v,p: 54% 25% 17% 5%
[libx264 @ 0x170dc20] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x170dc20] ref P L0: 41.9% 11.1% 40.8% 6.0% 0.2%
[libx264 @ 0x170dc20] ref B L0: 93.5% 5.9% 0.6%
[libx264 @ 0x170dc20] ref B L1: 99.4% 0.6%
[libx264 @ 0x170dc20] kb/s:4717.36
[aac @ 0x170fac0] Qavg: 582.581
Next, ffprobe shows the output file's duration:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.mov':
encoder : Lavf57.56.101
Duration: 00:00:14.03, start: 0.000000, bitrate: 4849 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661),
yuvj420p(pc), 1024x768, 4717 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (defaul
encoder : Lavc57.64.101 libx264
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 131 kb/s (default)
To get the ACTUAL audio length, I split the audio stream into its own .m4a file with ffmpeg, then ran ffprobe on it:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.m4a':
encoder : Lavf58.33.100
Duration: 00:00:14.03, start: 0.000000, bitrate: 133 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 131 kb/s (default)
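For anyone who wants to repeat that measurement, here is a minimal Python sketch of the same two steps, assuming ffmpeg and ffprobe are on PATH; the helper names are hypothetical. It copies the audio track out with -vn -c:a copy (no re-encode) and reads the "Duration:" line from ffprobe's banner, which ffprobe writes to stderr.

```python
import re
import subprocess

# ffprobe's banner line, e.g. "Duration: 00:00:12.33, start: 0.000000, ..."
DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration(banner: str) -> float:
    """Return the first Duration in ffprobe's banner text, in seconds."""
    m = DURATION_RE.search(banner)
    if m is None:
        raise ValueError("no 'Duration:' line in ffprobe output")
    hours, minutes, seconds = m.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def extract_audio_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg argv: overwrite dst (-y), drop video (-vn), copy audio as-is."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", "copy", dst]


def audio_only_duration(src: str, dst: str) -> float:
    """Extract the audio track to dst, then read its length from ffprobe."""
    subprocess.run(extract_audio_cmd(src, dst), check=True)
    # ffprobe prints its banner (including "Duration:") on stderr.
    probe = subprocess.run(["ffprobe", dst], capture_output=True, text=True, check=True)
    return parse_duration(probe.stderr)
```

Because the extracted file contains only audio, its container Duration is the audio length, unlike the .mov, whose Duration reflects the longer stream.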
Then I repeated the above, changing only the ffmpeg binary to the current git build:
$ ffmpeg -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV \
    -i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a \
    -filter_complex "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" \
    -filter_complex "aresample=48000,amix" \
    -s 1024x768 -c:v h264 -b:v 4700k -r 30 20190922_1532_ch5.1e-g.mov
ffmpeg version N-95129-g04858650b1 Copyright (c) 2000-2019 the FFmpeg
developers
built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
configuration: --prefix=/usr/local --enable-gpl --enable-libmp3lame
--enable-libvorbis --enable-libx264 --enable-libopenjpeg
--enable-libfreetype --disable-doc --disable-htmlpages
--disable-podpages --enable-shared --enable-libvpx
--extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib/i386-linux-gnu
--enable-libass --enable-libtesseract
libavutil 56. 35.100 / 56. 35.100
libavcodec 58. 59.101 / 58. 59.101
libavformat 58. 33.100 / 58. 33.100
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 59.100 / 7. 59.100
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
Metadata:
major_brand : qt
minor_version : 512
compatible_brands: qt
encoder : Lavf57.56.101
Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 /
0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97
tbr, 30k tbn, 60k tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz,
stereo, s16, 1536 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from
'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
Metadata:
major_brand : M4A
minor_version : 512
compatible_brands: isomiso2
encoder : Lavf57.56.101
Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
mono, fltp, 218 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:0 (h264) -> crop (graph 0)
Stream #0:1 (pcm_s16le) -> aresample (graph 1)
Stream #1:0 (aac) -> amix:input1 (graph 1)
drawtext (graph 0) -> Stream #0:0 (libx264)
amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[libx264 @ 0x142f2c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x142f2c0] profile High, level 3.1
[libx264 @ 0x142f2c0] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC
codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html -
options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7
psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1
8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0
bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1
b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250
keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr
mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4
ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-g.mov':
Metadata:
major_brand : qt
minor_version : 512
compatible_brands: qt
encoder : Lavf58.33.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661),
yuvj420p(pc, progressive), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360
tbn, 30 tbc (default)
Metadata:
encoder : Lavc58.59.101 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 128 kb/s (default)
Metadata:
encoder : Lavc58.59.101 aac
frame= 420 fps= 14 q=-1.0 Lsize= 8270kB time=00:00:13.90
bitrate=4873.7kbits/s speed=0.45x
video:8061kB audio:193kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 0.185768%
[libx264 @ 0x142f2c0] frame I:2 Avg QP:14.84 size:195655
[libx264 @ 0x142f2c0] frame P:106 Avg QP:19.64 size: 59577
[libx264 @ 0x142f2c0] frame B:312 Avg QP:25.62 size: 4960
[libx264 @ 0x142f2c0] consecutive B-frames: 1.0% 0.0% 0.0% 99.0%
[libx264 @ 0x142f2c0] mb I I16..4: 27.8% 28.6% 43.6%
[libx264 @ 0x142f2c0] mb P I16..4: 1.2% 1.3% 0.6% P16..4: 30.5%
31.4% 22.6% 0.0% 0.0% skip:12.5%
[libx264 @ 0x142f2c0] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 36.0%
7.7% 1.3% direct: 4.2% skip:50.8% L0:37.6% L1:38.7% BI:23.8%
[libx264 @ 0x142f2c0] final ratefactor: 18.99
[libx264 @ 0x142f2c0] 8x8 transform intra:36.9% inter:54.6%
[libx264 @ 0x142f2c0] coded y,uvDC,uvAC intra: 56.3% 63.9% 51.8% inter:
27.2% 19.7% 1.0%
[libx264 @ 0x142f2c0] i16 v,h,dc,p: 73% 9% 14% 4%
[libx264 @ 0x142f2c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 13% 39% 4% 6%
6% 6% 5% 10%
[libx264 @ 0x142f2c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 14% 6% 9%
9% 9% 8% 11%
[libx264 @ 0x142f2c0] i8c dc,h,v,p: 54% 24% 17% 5%
[libx264 @ 0x142f2c0] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x142f2c0] ref P L0: 41.5% 11.5% 40.8% 6.0% 0.2%
[libx264 @ 0x142f2c0] ref B L0: 93.4% 6.0% 0.6%
[libx264 @ 0x142f2c0] ref B L1: 99.4% 0.6%
[libx264 @ 0x142f2c0] kb/s:4716.52
[aac @ 0x142d800] Qavg: 297.740
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.mov':
encoder : Lavf58.33.100
Duration: 00:00:14.00, start: 0.000000, bitrate: 4838 kb/s
Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc),
1024x768, 4716 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
encoder : Lavc58.59.101 libx264
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 128 kb/s (default)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.m4a':
encoder : Lavf58.33.100
Duration: 00:00:12.33, start: 0.000000, bitrate: 130 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 128 kb/s (default)
The audio is always 1.70 seconds shorter (12.33 s here, against 14.03 s from 3.2.14). Different video input lengths and different audio input lengths all lose the same 1.70 seconds.
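The 1.70 s figure is just the 3.2.14 audio length minus the git-build audio length, both taken from the ffprobe output above. A small sketch of that arithmetic, plus a sync check; the one-video-frame tolerance (1/30 s at -r 30) is an assumption for illustration, not something ffmpeg or concat defines.

```python
# Lengths (seconds) of the extracted audio tracks, from the ffprobe output above.
AUDIO_3_2_14 = 14.03  # 20190922_1532_ch5.1e-3.m4a
AUDIO_GIT = 12.33     # 20190922_1532_ch5.1e-g.m4a
FPS = 30              # output frame rate (-r 30)


def shortfall(expected_s: float, actual_s: float) -> float:
    """Missing audio in seconds, rounded to ffprobe's 10 ms display precision."""
    return round(expected_s - actual_s, 2)


def out_of_sync(video_s: float, audio_s: float, fps: float = FPS) -> bool:
    """Assumed tolerance: flag a mismatch longer than one video frame."""
    return abs(video_s - audio_s) > 1.0 / fps


if __name__ == "__main__":
    print(shortfall(AUDIO_3_2_14, AUDIO_GIT))  # 1.7
    print(out_of_sync(14.00, AUDIO_GIT))       # True: git output's container is 14.00 s
```

A gap of 1.70 s is about 51 frames at 30 fps, far beyond any reasonable tolerance, which fits the concat failures described at the top.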
Without the voice-over input and the audio filters, the output streams match in length, since both come from the same input video.
I've also tried resampling the voice-over audio to 48000 Hz stereo beforehand and dropping the aresample filter, leaving only amix. The audio is still bad.
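An untested variant, sketched in Python so the argv can be inspected before running: put video and audio in one filtergraph with explicit input labels, and tell amix which input sets the output length with duration=first (amix's duration option accepts longest, shortest and first; longest is the default). The labels [v], [a], [cam], [voice] and the output filename are hypothetical, and whether this avoids the 1.70 s loss in the git build is not known.

```python
import shlex

# Video filters from the original command; append the drawtext filter after
# pad (left out here only to keep the line short).
VIDEO_FILTERS = "crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24"


def build_mix_cmd(video_in: str, voice_in: str, out: str,
                  video_filters: str = VIDEO_FILTERS) -> list[str]:
    """Build an ffmpeg argv with one labelled filtergraph for video and audio."""
    graph = ";".join([
        f"[0:v]{video_filters}[v]",
        # Resample both audio inputs to 48 kHz before mixing.
        "[0:a]aresample=48000[cam]",
        "[1:a]aresample=48000[voice]",
        # duration=first: the mix lasts as long as the first input (camera audio).
        "[cam][voice]amix=inputs=2:duration=first[a]",
    ])
    return [
        "ffmpeg", "-i", video_in, "-i", voice_in,
        "-filter_complex", graph,
        # Labelled filtergraph outputs must be mapped explicitly.
        "-map", "[v]", "-map", "[a]",
        "-s", "1024x768", "-c:v", "h264", "-b:v", "4700k", "-r", "30",
        out,
    ]


if __name__ == "__main__":
    cmd = build_mix_cmd(
        "20190922_1532_3Kf-pan-right_3969_c2t14.MOV",
        "Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a",
        "mixtest.mov",  # hypothetical output name
    )
    # Print a shell-quoted command; pass cmd to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Keeping both audio inputs in a single labelled graph also removes any doubt about which stream feeds which amix input, which in the original command is decided implicitly (see the "Stream mapping" lines).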
Since the next step would be to mix the audio in Audacity and remux it with the video, I'll stop testing now and see what you think.
Stewart