[FFmpeg-user] current ffmpeg creates shortened audio stream when filter amix
Paul B Mahol
onemda at gmail.com
Sun Sep 29 11:38:33 EEST 2019
On 9/29/19, S Andreason <sandreas41 at gmail.com> wrote:
> I am getting a shortened audio stream when including the audio filters
> aresample and amix, which later makes it impossible to concat the clips,
> because the different stream lengths lose sync between audio and video,
> with errors:
> Invalid audio PTS
>
> First, here is the output from the latest ffmpeg in the Debian package,
> which works correctly:
>
> $ ffmpeg-3.2.14-1~deb9u1 -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV
> -i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a
> -filter_complex
> "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'"
> -filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k
> -r 30 20190922_1532_ch5.1e-3.mov
> ffmpeg version 3.2.14-1~deb9u1 Copyright (c) 2000-2019 the FFmpeg developers
> built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
> configuration: --prefix=/usr --extra-version='1~deb9u1'
> --toolchain=hardened --libdir=/usr/lib/i386-linux-gnu
> --incdir=/usr/include/i386-linux-gnu --enable-gpl --disable-stripping
> --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa
> --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca
> --enable-libcdio --enable-libebur128 --enable-libflite
> --enable-libfontconfig --enable-libfreetype --enable-libfribidi
> --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libopenjpeg
> --enable-libopenmpt --enable-libopus --enable-libpulse
> --enable-librubberband --enable-libshine --enable-libsnappy
> --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora
> --enable-libtwolame --enable-libvorbis --enable-libvpx
> --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid
> --enable-libzmq --enable-libzvbi --enable-omx --enable-openal
> --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libiec61883
> --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264
> --enable-shared
> libavutil 55. 34.101 / 55. 34.101
> libavcodec 57. 64.101 / 57. 64.101
> libavformat 57. 56.101 / 57. 56.101
> libavdevice 57. 1.100 / 57. 1.100
> libavfilter 6. 65.100 / 6. 65.100
> libavresample 3. 1. 0 / 3. 1. 0
> libswscale 4. 2.100 / 4. 2.100
> libswresample 2. 3.100 / 2. 3.100
> libpostproc 54. 1.100 / 54. 1.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> '20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
> Metadata:
> major_brand : qt
> minor_version : 512
> compatible_brands: qt
> encoder : Lavf57.56.101
> Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
> Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 /
> 0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97
> tbr, 30k tbn, 60k tbc (default)
> Metadata:
> handler_name : DataHandler
> Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz,
> stereo, s16, 1536 kb/s (default)
> Metadata:
> handler_name : DataHandler
> Input #1, mov,mp4,m4a,3gp,3g2,mj2, from
> 'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
> Metadata:
> major_brand : M4A
> minor_version : 512
> compatible_brands: isomiso2
> encoder : Lavf57.56.101
> Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
> Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
> mono, fltp, 218 kb/s (default)
> Metadata:
> handler_name : SoundHandler
> No pixel format specified, yuvj420p for H.264 encoding chosen.
> Use -pix_fmt yuv420p for compatibility with outdated media players.
> [libx264 @ 0x170dc20] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
> AVX LZCNT BMI1 SlowPshufb
> [libx264 @ 0x170dc20] profile High, level 3.1
> [libx264 @ 0x170dc20] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC
> codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html -
> options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7
> psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1
> 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
> lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0
> bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1
> b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250
> keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr
> mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4
> ip_ratio=1.40 aq=1:1.00
> Output #0, mov, to '20190922_1532_ch5.1e-3.mov':
> Metadata:
> major_brand : qt
> minor_version : 512
> compatible_brands: qt
> encoder : Lavf57.56.101
> Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661),
> yuvj420p(pc), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 tbn, 30 tbc
> (default)
> Metadata:
> encoder : Lavc57.64.101 libx264
> Side data:
> cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: -1
> Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 128 kb/s (default)
> Metadata:
> encoder : Lavc57.64.101 aac
> Stream mapping:
> Stream #0:0 (h264) -> crop (graph 0)
> Stream #0:1 (pcm_s16le) -> aresample (graph 1)
> Stream #1:0 (aac) -> amix:input1 (graph 1)
> drawtext (graph 0) -> Stream #0:0 (libx264)
> amix (graph 1) -> Stream #0:1 (aac)
> Press [q] to stop, [?] for help
> frame= 420 fps= 12 q=-1.0 Lsize= 8303kB time=00:00:14.01
> bitrate=4853.1kbits/s speed=0.417x
> video:8063kB audio:224kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 0.198401%
> [libx264 @ 0x170dc20] frame I:2 Avg QP:14.83 size:195688
> [libx264 @ 0x170dc20] frame P:106 Avg QP:19.65 size: 59553
> [libx264 @ 0x170dc20] frame B:312 Avg QP:25.61 size: 4972
> [libx264 @ 0x170dc20] consecutive B-frames: 1.0% 0.0% 0.0% 99.0%
> [libx264 @ 0x170dc20] mb I I16..4: 27.6% 29.0% 43.4%
> [libx264 @ 0x170dc20] mb P I16..4: 1.1% 1.3% 0.6% P16..4: 30.5%
> 31.5% 22.6% 0.0% 0.0% skip:12.4%
> [libx264 @ 0x170dc20] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 36.1%
> 7.7% 1.3% direct: 4.2% skip:50.7% L0:37.5% L1:38.6% BI:23.9%
> [libx264 @ 0x170dc20] final ratefactor: 18.99
> [libx264 @ 0x170dc20] 8x8 transform intra:37.9% inter:54.5%
> [libx264 @ 0x170dc20] coded y,uvDC,uvAC intra: 56.2% 64.1% 51.9% inter:
> 27.2% 19.7% 1.0%
> [libx264 @ 0x170dc20] i16 v,h,dc,p: 73% 9% 14% 4%
> [libx264 @ 0x170dc20] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 11% 40% 4% 6%
> 6% 6% 5% 10%
> [libx264 @ 0x170dc20] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 15% 6% 9%
> 9% 9% 8% 11%
> [libx264 @ 0x170dc20] i8c dc,h,v,p: 54% 25% 17% 5%
> [libx264 @ 0x170dc20] Weighted P-Frames: Y:9.4% UV:0.0%
> [libx264 @ 0x170dc20] ref P L0: 41.9% 11.1% 40.8% 6.0% 0.2%
> [libx264 @ 0x170dc20] ref B L0: 93.5% 5.9% 0.6%
> [libx264 @ 0x170dc20] ref B L1: 99.4% 0.6%
> [libx264 @ 0x170dc20] kb/s:4717.36
> [aac @ 0x170fac0] Qavg: 582.581
>
> Next, ffprobe shows the video length:
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.mov':
> encoder : Lavf57.56.101
> Duration: 00:00:14.03, start: 0.000000, bitrate: 4849 kb/s
> Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661),
> yuvj420p(pc), 1024x768, 4717 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (defaul
> encoder : Lavc57.64.101 libx264
> Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
> stereo, fltp, 131 kb/s (default)
>
> And to get the ACTUAL audio length, I split the audio stream out to its own
> file (.m4a) using ffmpeg, then ran ffprobe on it:
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.m4a':
> encoder : Lavf58.33.100
> Duration: 00:00:14.03, start: 0.000000, bitrate: 133 kb/s
> Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
> stereo, fltp, 131 kb/s (default)
>
> Then I repeat the above, changing only the binary to the current ffmpeg
> built from git:
>
> $ ffmpeg -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV -i
> Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a -filter_complex
> "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'"
> -filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k
> -r 30 20190922_1532_ch5.1e-g.mov
> ffmpeg version N-95129-g04858650b1 Copyright (c) 2000-2019 the FFmpeg
> developers
> built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
> configuration: --prefix=/usr/local --enable-gpl --enable-libmp3lame
> --enable-libvorbis --enable-libx264 --enable-libopenjpeg
> --enable-libfreetype --disable-doc --disable-htmlpages
> --disable-podpages --enable-shared --enable-libvpx
> --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib/i386-linux-gnu
> --enable-libass --enable-libtesseract
> libavutil 56. 35.100 / 56. 35.100
> libavcodec 58. 59.101 / 58. 59.101
> libavformat 58. 33.100 / 58. 33.100
> libavdevice 58. 9.100 / 58. 9.100
> libavfilter 7. 59.100 / 7. 59.100
> libswscale 5. 6.100 / 5. 6.100
> libswresample 3. 6.100 / 3. 6.100
> libpostproc 55. 6.100 / 55. 6.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> '20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
> Metadata:
> major_brand : qt
> minor_version : 512
> compatible_brands: qt
> encoder : Lavf57.56.101
> Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
> Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 /
> 0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97
> tbr, 30k tbn, 60k tbc (default)
> Metadata:
> handler_name : VideoHandler
> Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz,
> stereo, s16, 1536 kb/s (default)
> Metadata:
> handler_name : SoundHandler
> Input #1, mov,mp4,m4a,3gp,3g2,mj2, from
> 'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
> Metadata:
> major_brand : M4A
> minor_version : 512
> compatible_brands: isomiso2
> encoder : Lavf57.56.101
> Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
> Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz,
> mono, fltp, 218 kb/s (default)
> Metadata:
> handler_name : SoundHandler
> Stream mapping:
> Stream #0:0 (h264) -> crop (graph 0)
> Stream #0:1 (pcm_s16le) -> aresample (graph 1)
> Stream #1:0 (aac) -> amix:input1 (graph 1)
> drawtext (graph 0) -> Stream #0:0 (libx264)
> amix (graph 1) -> Stream #0:1 (aac)
> Press [q] to stop, [?] for help
> [libx264 @ 0x142f2c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
> AVX LZCNT BMI1 SlowPshufb
> [libx264 @ 0x142f2c0] profile High, level 3.1
> [libx264 @ 0x142f2c0] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC
> codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html -
> options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7
> psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1
> 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
> lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0
> bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1
> b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250
> keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr
> mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4
> ip_ratio=1.40 aq=1:1.00
> Output #0, mov, to '20190922_1532_ch5.1e-g.mov':
> Metadata:
> major_brand : qt
> minor_version : 512
> compatible_brands: qt
> encoder : Lavf58.33.100
> Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661),
> yuvj420p(pc, progressive), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360
> tbn, 30 tbc (default)
> Metadata:
> encoder : Lavc58.59.101 libx264
> Side data:
> cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: N/A
> Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 128 kb/s (default)
> Metadata:
> encoder : Lavc58.59.101 aac
> frame= 420 fps= 14 q=-1.0 Lsize= 8270kB time=00:00:13.90
> bitrate=4873.7kbits/s speed=0.45x
> video:8061kB audio:193kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 0.185768%
> [libx264 @ 0x142f2c0] frame I:2 Avg QP:14.84 size:195655
> [libx264 @ 0x142f2c0] frame P:106 Avg QP:19.64 size: 59577
> [libx264 @ 0x142f2c0] frame B:312 Avg QP:25.62 size: 4960
> [libx264 @ 0x142f2c0] consecutive B-frames: 1.0% 0.0% 0.0% 99.0%
> [libx264 @ 0x142f2c0] mb I I16..4: 27.8% 28.6% 43.6%
> [libx264 @ 0x142f2c0] mb P I16..4: 1.2% 1.3% 0.6% P16..4: 30.5%
> 31.4% 22.6% 0.0% 0.0% skip:12.5%
> [libx264 @ 0x142f2c0] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 36.0%
> 7.7% 1.3% direct: 4.2% skip:50.8% L0:37.6% L1:38.7% BI:23.8%
> [libx264 @ 0x142f2c0] final ratefactor: 18.99
> [libx264 @ 0x142f2c0] 8x8 transform intra:36.9% inter:54.6%
> [libx264 @ 0x142f2c0] coded y,uvDC,uvAC intra: 56.3% 63.9% 51.8% inter:
> 27.2% 19.7% 1.0%
> [libx264 @ 0x142f2c0] i16 v,h,dc,p: 73% 9% 14% 4%
> [libx264 @ 0x142f2c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 13% 39% 4% 6%
> 6% 6% 5% 10%
> [libx264 @ 0x142f2c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 14% 6% 9%
> 9% 9% 8% 11%
> [libx264 @ 0x142f2c0] i8c dc,h,v,p: 54% 24% 17% 5%
> [libx264 @ 0x142f2c0] Weighted P-Frames: Y:9.4% UV:0.0%
> [libx264 @ 0x142f2c0] ref P L0: 41.5% 11.5% 40.8% 6.0% 0.2%
> [libx264 @ 0x142f2c0] ref B L0: 93.4% 6.0% 0.6%
> [libx264 @ 0x142f2c0] ref B L1: 99.4% 0.6%
> [libx264 @ 0x142f2c0] kb/s:4716.52
> [aac @ 0x142d800] Qavg: 297.740
>
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.mov':
> encoder : Lavf58.33.100
> Duration: 00:00:14.00, start: 0.000000, bitrate: 4838 kb/s
> Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc),
> 1024x768, 4716 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
> encoder : Lavc58.59.101 libx264
> Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 128 kb/s (default)
>
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.m4a':
> encoder : Lavf58.33.100
> Duration: 00:00:12.33, start: 0.000000, bitrate: 130 kb/s
> Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
> stereo, fltp, 128 kb/s (default)
>
> The audio is 1.70 seconds shorter, always. Different video input lengths
> and different audio lengths result in the same 1.70 seconds lost.
>
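As a side note, the per-stream durations can also be read directly with ffprobe, without first splitting the audio into its own file. The following is only a sketch: the helper names stream_duration and duration_gap are made up for illustration, and some containers may report N/A for the per-stream duration field.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ffmpeg.

# Print the duration (in seconds) of one stream, e.g. v:0 or a:0.
stream_duration() {
    ffprobe -v error -select_streams "$1" \
        -show_entries stream=duration \
        -of default=noprint_wrappers=1:nokey=1 "$2"
}

# Print the difference between two durations, rounded to two decimals.
duration_gap() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# With the two extracted-audio durations reported above (14.03 s and 12.33 s):
duration_gap 14.03 12.33
```

Calling stream_duration a:0 and stream_duration v:0 on the same output file would give the two numbers to compare for that clip.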
> If I don't have any voice input and audio filter then the output streams
> match length, since they are from the same input video.
>
> I've also tried resampling the voice-over audio to 48000 Hz stereo
> first, then removing the aresample filter, leaving only the amix.
> Still bad audio.
> Since the next step would be to mix the audio in audacity and remux it
> back together, I'll stop testing now and see what you think.
There are several issues with your report. First, how is this supposed
to work at all when two -filter_complex options are given at once?
Second, amix is given only one input in your command, whereas with no
options it expects two inputs.
So your commands should not work at all.
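For illustration, here is a minimal sketch of a command that uses one filter graph and labels both audio inputs for amix explicitly. The file names are placeholders, the drawtext step is left out for brevity, and duration=longest is an assumption about the intended result (it is also amix's default), not something stated in the report.

```shell
#!/bin/sh
# Sketch only: file names are placeholders, not the reporter's actual paths.
video="input_video.MOV"
voice="voice_over.m4a"
output="mixed.mov"

# Video chain: crop and pad as in the original command (drawtext omitted).
video_chain="[0:v]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24[v]"

# Audio chain: resample both inputs to 48 kHz, then pass both labelled
# streams to amix. inputs=2 and duration=longest are amix's defaults,
# spelled out here so the intent is visible.
audio_chain="[0:a]aresample=48000[a0];[1:a]aresample=48000[a1];[a0][a1]amix=inputs=2:duration=longest[a]"

# One graph: independent chains are separated by ';'.
graph="${video_chain};${audio_chain}"

# Run the encode, mapping the labelled graph outputs explicitly.
mix_clip() {
    ffmpeg -i "$video" -i "$voice" \
        -filter_complex "$graph" \
        -map "[v]" -map "[a]" \
        -c:v libx264 -b:v 4700k -r 30 "$output"
}

# Print the graph for inspection; call mix_clip to actually encode.
printf '%s\n' "$graph"
```

Whether this changes the reported audio length is not something this sketch can show. It only removes the ambiguity about which streams reach amix.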
> Stewart