[FFmpeg-user] 5% of audio samples missing when capturing audio on a mac

Sat Sep 12 11:47:43 EEST 2020

Hi,

I am attempting to capture a webcam with audio on a MacBook Pro (Catalina
10.15.6), but I am having trouble with the audio stream. The video part is
fine, but the audio seems to be missing about 5% of the expected samples.
This simple command illustrates the problem:

ffmpeg -v 9 -loglevel 99 -y -f avfoundation -i ":0" -t 10 out.wav

The console output is below.

I expect to capture 10s of audio from the built-in microphone. However, the
resulting audio is ~9.5s with audible clicking that I think indicates
missing samples. (Note that if -t 100 is used, ~95s is captured, so this is
not a warm-up issue.) The output says 413184 samples were decoded, but I
would expect closer to 441000 (= 44100 Hz * 10s). Indeed, if I add the
-async 1 option, silence is inserted with messages "adding 4608 audio
samples of silence" throughout.
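
The arithmetic can be checked with a short Python sketch. All constants are
copied from the log in this message; the variable names are only
illustrative. Going by the decoded sample count alone, the shortfall comes
out slightly above the rough 5% estimate, and each block of inserted
silence is an exact multiple of the packet size, which may or may not be
significant.

```python
# Figures copied from the ffmpeg log in this message (names are illustrative).
SAMPLE_RATE_HZ = 44100        # input stream sample rate
REQUESTED_SECONDS = 10        # value passed to -t
DECODED_SAMPLES = 413184      # "807 frames decoded (413184 samples)"
PACKETS_READ = 807            # "807 packets read"
SILENCE_BLOCK_SAMPLES = 4608  # "adding 4608 audio samples of silence"

expected_samples = SAMPLE_RATE_HZ * REQUESTED_SECONDS
missing_samples = expected_samples - DECODED_SAMPLES
missing_fraction = missing_samples / expected_samples
captured_seconds = DECODED_SAMPLES / SAMPLE_RATE_HZ
samples_per_packet = DECODED_SAMPLES / PACKETS_READ
packets_per_silence_block = SILENCE_BLOCK_SAMPLES / samples_per_packet

print(f"expected samples:          {expected_samples}")
print(f"missing samples:           {missing_samples}")
print(f"missing fraction:          {missing_fraction:.1%}")
print(f"captured duration:         {captured_seconds:.2f} s")
print(f"samples per packet:        {samples_per_packet:g}")
print(f"packets per silence block: {packets_per_silence_block:g}")
```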

I found a bug on the bug tracker about audio issues related to capturing
the screen:
https://trac.ffmpeg.org/ticket/4513

Could someone try reproducing the issue or pointing me in the right
direction?
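
For anyone trying to reproduce this, the length of the captured file can be
measured independently of ffmpeg's own statistics. Below is a minimal
sketch using Python's standard wave module, assuming the output is a plain
16-bit PCM WAV file such as the out.wav produced by the command above. The
function name is hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return (frame_count, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()   # one frame = one sample per channel
        sample_rate = wav_file.getframerate()
    return frame_count, sample_rate, frame_count / sample_rate


# Example usage (assumes out.wav exists in the current directory):
#   frames, rate, seconds = wav_duration_seconds("out.wav")
#   print(f"{frames} frames at {rate} Hz = {seconds:.2f} s")
```

A full 10s capture at 44100 Hz would contain 441000 frames, so a noticeably
smaller frame count confirms that samples were lost before the file was
written.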

I have ffmpeg installed using Homebrew with the --HEAD option, matching the
latest commit on master. Recording using QuickTime or OBS works fine.

This is the log output of the above command:

$ ffmpeg -v 9 -loglevel 99 -y -f avfoundation -i ":0" -t 10 out.wav
ffmpeg version git-2020-09-12-1c09456 Copyright (c) 2000-2020 the FFmpeg
developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.17)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD-1c09456_1
--enable-shared --enable-pthreads --enable-version3 --enable-avresample
--cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls
--enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d
--enable-libmp3lame --enable-libopus --enable-librav1e
--enable-librubberband --enable-libsnappy --enable-libsrt
--enable-libtesseract --enable-libtheora --enable-libvidstab
--enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma
--enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass
--enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg
--enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox
--disable-libjack --disable-indev=jack
  libavutil      56. 58.100 / 56. 58.100
  libavcodec     58.105.100 / 58.105.100
  libavformat    58. 54.100 / 58. 54.100
  libavdevice    58. 11.101 / 58. 11.101
  libavfilter     7. 87.100 /  7. 87.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with
argument '9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging
level) with argument '99'.
Reading option '-y' ... matched as option 'y' (overwrite output files) with
argument '1'.
Reading option '-f' ... matched as option 'f' (force format) with argument
'avfoundation'.
Reading option '-i' ... matched as input url with argument ':0'.
Reading option '-t' ... matched as option 't' (record or transcode
"duration" seconds of audio/video) with argument '10'.
Reading option 'out.wav' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Applying option y (overwrite output files) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url :0.
Applying option f (force format) with argument avfoundation.
Successfully parsed a group of options.
Opening an input file: :0.
[avfoundation @ 0x7ff5c080a600] audio device 'Built-in Microphone' opened
[avfoundation @ 0x7ff5c080a600] All info found
[avfoundation @ 0x7ff5c080a600] stream 0: start_time: 7242.35 duration:
NOPTS
[avfoundation @ 0x7ff5c080a600] format: start_time: 7242.35 duration: NOPTS
(estimate from bit rate) bitrate=2822 kb/s
Input #0, avfoundation, from ':0':
  Duration: N/A, start: 7242.345805, bitrate: 2822 kb/s
    Stream #0:0, 1, 1/1000000: Audio: pcm_f32le, 44100 Hz, stereo, flt,
2822 kb/s
Successfully opened the file.
Parsing a group of options: output url out.wav.
Applying option t (record or transcode "duration" seconds of audio/video)
with argument 10.
Successfully parsed a group of options.
Opening an output file: out.wav.
[file @ 0x7ff5bfe438c0] Setting default whitelist 'file,crypto,data'
Successfully opened the file.
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
cur_dts is invalid st:0 (0) [init:0 i_done:0 finish:0] (this is harmless if
it occurs once at the start per stream)
detected 4 logical cores
[graph_0_in_0_0 @ 0x7ff5bfe2ac80] Setting 'time_base' to value '1/44100'
[graph_0_in_0_0 @ 0x7ff5bfe2ac80] Setting 'sample_rate' to value '44100'
[graph_0_in_0_0 @ 0x7ff5bfe2ac80] Setting 'sample_fmt' to value 'flt'
[graph_0_in_0_0 @ 0x7ff5bfe2ac80] Setting 'channel_layout' to value '0x3'
[graph_0_in_0_0 @ 0x7ff5bfe2ac80] tb:1/44100 samplefmt:flt samplerate:44100
chlayout:0x3
[format_out_0_0 @ 0x7ff5bfe7c580] Setting 'sample_fmts' to value 's16'
[format_out_0_0 @ 0x7ff5bfe7c580] auto-inserting filter 'auto_resampler_0'
between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 0x7ff5bfe54e80] query_formats: 5 queried, 9 merged, 3
already done, 0 delayed
[auto_resampler_0 @ 0x7ff5bfe523c0] [SWR @ 0x7ff5bff69000] Using fltp
internally between filters
[auto_resampler_0 @ 0x7ff5bfe523c0] ch:2 chl:stereo fmt:flt r:44100Hz ->
ch:2 chl:stereo fmt:s16 r:44100Hz
Output #0, wav, to 'out.wav':
  Metadata:
    ISFT            : Lavf58.54.100
    Stream #0:0, 0, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001),
44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.105.100 pcm_s16le
[out_0_0 @ 0x7ff5bfe7a280] EOF on sink link out_0_0:default.=   1x
No more output streams to write to, finishing.
size=    1611kB time=00:00:10.00 bitrate=1319.5kbits/s speed=0.998x
video:0kB audio:1611kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 0.004729%
Input file #0 (:0):
  Input stream #0:0 (audio): 807 packets read (3305472 bytes); 807 frames
decoded (413184 samples);
  Total: 807 packets (3305472 bytes) demuxed
Output file #0 (out.wav):
  Output stream #0:0 (audio): 806 frames encoded (412328 samples); 806
packets muxed (1649312 bytes);
  Total: 806 packets (1649312 bytes) muxed
807 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x7ff5bfe39880] Statistics: 4 seeks, 9 writeouts

Norbert