[FFmpeg-user] Combining two audio feeds from live rtsp sources into stereo is delayed

Mark Umansky umansky at torcrobotics.com
Tue Oct 28 22:15:20 CET 2014


Hello,

I have two live video feeds with audio coming in over RTSP. The audio in each feed is mono, and I would like to combine the two into a true stereo stream so that one feed plays in one ear and the other feed plays in the other ear, simultaneously.

When I use amerge as described in https://trac.ffmpeg.org/wiki/AudioChannelManipulation#a2monostereo, I get a delay between the two sides. The delay does not occur if I use two mono files as inputs instead of the two live feeds. If I swap 0 and 1 in -filter_complex "[0:a][1:a]amerge[aout]", a different side gets delayed, but the delay is still there. If I open the two video feeds in VLC, both come in at the same time, so there is no delay on the feed side.

Command used:
ffmpeg -y -i "rtsp://172.24.0.31:554/axis-media/media.amp?camera=1" -i "rtsp://172.24.0.32:554/axis-media/media.amp?camera=1" -filter_complex "[0:a][1:a]amerge[aout]" -map "[aout]" output.m4a -report
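
One detail in the log below that may or may not be related: ffmpeg opens the two -i inputs one after the other, and the session IDs in the two SDP "o=" lines look like Unix timestamps in microseconds. That reading is an assumption, not something the cameras document here. If it is right, the rough sketch below shows how far apart the two RTSP sessions were created. The variable names are only illustrative.

```shell
# Session IDs copied from the "o=" lines of the two SDP blocks in the log.
# Assumption: each ID is a Unix timestamp in microseconds.
sess_id_0=1414515911610422   # camera at 172.24.0.31 (input 0)
sess_id_1=1414515912722348   # camera at 172.24.0.32 (input 1)

# Gap between the two session start times.
gap_us=$(( sess_id_1 - sess_id_0 ))
gap_ms=$(( gap_us / 1000 ))

echo "second session starts about ${gap_ms} ms after the first"
```

Under that assumption the gap is roughly 1.1 seconds. This only says when each session was set up, not how much audio from each feed was buffered before amerge started pairing samples.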

Thanks,
Mark

Full log output:

ffmpeg started on 2014-10-28 at 17:05:14
Report written to "ffmpeg-20141028-170514.log"
Command line:
ffmpeg -y -i "rtsp://172.24.0.31:554/axis-media/media.amp?camera=1" -i "rtsp://172.24.0.32:554/axis-media/media.amp?camera=1" -filter_complex "[0:a][1:a]amerge[aout]" -map "[aout]" output.m4a -report
ffmpeg version N-66521-g3edb9aa Copyright (c) 2000-2014 the FFmpeg developers
  built on Sep 27 2014 22:10:25 with gcc 4.9.1 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      54.  7.101 / 54.  7.101
  libavcodec     56.  1.101 / 56.  1.101
  libavformat    56.  7.101 / 56.  7.101
  libavdevice    56.  1.100 / 56.  1.100
  libavfilter     5.  1.102 /  5.  1.102
  libswscale      3.  1.100 /  3.  1.100
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  1.100 / 53.  1.100
Splitting the commandline.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-i' ... matched as input file with argument 'rtsp://172.24.0.31:554/axis-media/media.amp?camera=1'.
Reading option '-i' ... matched as input file with argument 'rtsp://172.24.0.32:554/axis-media/media.amp?camera=1'.
Reading option '-filter_complex' ... matched as option 'filter_complex' (create a complex filtergraph) with argument '[0:a][1:a]amerge[aout]'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '[aout]'.
Reading option 'output.m4a' ... matched as output file.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option y (overwrite output files) with argument 1.
Applying option filter_complex (create a complex filtergraph) with argument [0:a][1:a]amerge[aout].
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input file rtsp://172.24.0.31:554/axis-media/media.amp?camera=1.
Successfully parsed a group of options.
Opening an input file: rtsp://172.24.0.31:554/axis-media/media.amp?camera=1.
[rtsp @ 0000000000331820] SDP:
v=0
o=- 1414515911610422 1414515911610422 IN IP4 172.24.0.31
s=Media Presentation
e=NONE
b=AS:50032
t=0 0
a=control:rtsp://172.24.0.31:554/axis-media/media.amp?camera=1
a=range:npt=0.000000-
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:50000
a=framerate:25.0
a=transform:0.916667,0.000000,0.000000;0.000000,1.000000,0.000000;0.000000,0.000000,1.000000
a=control:rtsp://172.24.0.31:554/axis-media/media.amp/trackID=1?camera=1
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1; profile-level-id=4D0029; sprop-parameter-sets=Z00AKeKQFgJNgScFAQXh4kRU,aO48gA==
m=audio 0 RTP/AVP 97
c=IN IP4 0.0.0.0
b=AS:32
a=control:rtsp://172.24.0.31:554/axis-media/media.amp/trackID=2?camera=1
a=rtpmap:97 mpeg4-generic/16000/1
a=fmtp:97 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1408; sizeLength=13; indexLength=3; indexDeltaLength=3; profile=1; bitrate=32000;

[rtsp @ 0000000000331820] video codec set to: h264
[rtsp @ 0000000000331820] RTP Packetization Mode: 1
[rtsp @ 0000000000331820] RTP Profile IDC: 4d Profile IOP: 0 Level: 29
[rtsp @ 0000000000331820] Extradata set to 00000000003327a0 (size: 30)!
[rtsp @ 0000000000331820] audio codec set to: aac
[rtsp @ 0000000000331820] audio samplerate set to: 16000
[rtsp @ 0000000000331820] audio channels set to: 1
[udp @ 00000000003526e0] end receive buffer size reported is 65536
[udp @ 00000000003527a0] end receive buffer size reported is 65536
[udp @ 0000000002eaf2a0] end receive buffer size reported is 65536
[udp @ 0000000002eaf360] end receive buffer size reported is 65536
[rtsp @ 0000000000331820] hello state=0
[rtsp @ 0000000000331820] All info found
rfps: 24.416667 0.018100
rfps: 24.500000 0.013298
rfps: 24.583333 0.009234
rfps: 24.583333 0.009234
rfps: 24.666667 0.005910
rfps: 24.666667 0.005910
rfps: 24.750000 0.003324
rfps: 24.750000 0.003324
rfps: 24.833333 0.001477
rfps: 24.833333 0.001477
rfps: 24.916667 0.000369
rfps: 24.916667 0.000369
rfps: 25.000000 0.000000
rfps: 25.000000 0.000000
rfps: 25.083333 0.000370
rfps: 25.083333 0.000370
rfps: 25.166667 0.001479
rfps: 25.166667 0.001479
rfps: 25.250000 0.003326
rfps: 25.250000 0.003326
rfps: 25.333333 0.005913
rfps: 25.333333 0.005913
rfps: 25.416667 0.009238
rfps: 25.416667 0.009238
rfps: 25.500000 0.013302
rfps: 25.583333 0.018105
rfps: 49.416667 0.018098
rfps: 49.500000 0.013296
rfps: 49.583333 0.009233
rfps: 49.583333 0.009233
rfps: 49.666667 0.005908
rfps: 49.666667 0.005908
rfps: 49.750000 0.003323
rfps: 49.750000 0.003323
rfps: 49.833333 0.001476
rfps: 49.833333 0.001476
rfps: 49.916667 0.000369
rfps: 49.916667 0.000369
rfps: 50.000000 0.000000
rfps: 50.083333 0.000370
rfps: 50.083333 0.000370
rfps: 50.166667 0.001479
rfps: 50.166667 0.001479
rfps: 50.250000 0.003327
rfps: 50.250000 0.003327
rfps: 50.333333 0.005914
rfps: 50.333333 0.005914
rfps: 50.416667 0.009240
rfps: 50.416667 0.009240
rfps: 50.500000 0.013305
rfps: 50.583333 0.018108
[rtsp @ 0000000000331820] Setting avg frame rate based on r frame rate
Input #0, rtsp, from 'rtsp://172.24.0.31:554/axis-media/media.amp?camera=1':
  Metadata:
    title           : Media Presentation
  Duration: N/A, start: 0.040022, bitrate: N/A
    Stream #0:0, 28, 1/90000: Video: h264 (Main), yuvj420p(pc, bt470bg), 704x576 [SAR 12:11 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 180k tbc
    Stream #0:1, 17, 1/16000: Audio: aac, 16000 Hz, mono, fltp
Successfully opened the file.
Parsing a group of options: input file rtsp://172.24.0.32:554/axis-media/media.amp?camera=1.
Successfully parsed a group of options.
Opening an input file: rtsp://172.24.0.32:554/axis-media/media.amp?camera=1.
[rtsp @ 0000000005c74120] SDP:
v=0
o=- 1414515912722348 1414515912722348 IN IP4 172.24.0.32
s=Media Presentation
e=NONE
b=AS:50032
t=0 0
a=control:rtsp://172.24.0.32:554/axis-media/media.amp?camera=1
a=range:npt=0.000000-
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:50000
a=framerate:30.0
a=transform:1.000000,0.000000,0.000000;0.000000,0.909091,0.000000;0.000000,0.000000,1.000000
a=control:rtsp://172.24.0.32:554/axis-media/media.amp/trackID=1?camera=1
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1; profile-level-id=4D0029; sprop-parameter-sets=Z00AKeKQFge2BqwYBBuHiRFQ,aO48gA==
m=audio 0 RTP/AVP 97
c=IN IP4 0.0.0.0
b=AS:32
a=control:rtsp://172.24.0.32:554/axis-media/media.amp/trackID=2?camera=1
a=rtpmap:97 mpeg4-generic/16000/1
a=fmtp:97 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1408; sizeLength=13; indexLength=3; indexDeltaLength=3; profile=1; bitrate=32000;

[rtsp @ 0000000005c74120] video codec set to: h264
[rtsp @ 0000000005c74120] RTP Packetization Mode: 1
[rtsp @ 0000000005c74120] RTP Profile IDC: 4d Profile IOP: 0 Level: 29
[rtsp @ 0000000005c74120] Extradata set to 00000000003326c0 (size: 30)!
[rtsp @ 0000000005c74120] audio codec set to: aac
[rtsp @ 0000000005c74120] audio samplerate set to: 16000
[rtsp @ 0000000005c74120] audio channels set to: 1
[udp @ 0000000005e59ec0] end receive buffer size reported is 65536
[udp @ 0000000005c24260] end receive buffer size reported is 65536
[udp @ 0000000005e55b80] end receive buffer size reported is 65536
[udp @ 0000000002eafa00] end receive buffer size reported is 65536
[rtsp @ 0000000005c74120] hello state=0
[rtsp @ 0000000005c74120] All info found
rfps: 29.250000 0.019157
rfps: 29.333333 0.014976
rfps: 29.416667 0.011309
rfps: 29.416667 0.011309
rfps: 29.500000 0.008156
rfps: 29.500000 0.008156
rfps: 29.583333 0.005517
rfps: 29.583333 0.005517
rfps: 29.666667 0.003392
rfps: 29.666667 0.003392
rfps: 29.750000 0.001781
rfps: 29.750000 0.001781
rfps: 29.833333 0.000685
rfps: 29.833333 0.000685
rfps: 29.916667 0.000103
rfps: 29.916667 0.000103
rfps: 30.000000 0.000035
rfps: 30.000000 0.000035
rfps: 30.083333 0.000481
rfps: 30.083333 0.000481
rfps: 30.166667 0.001441
rfps: 30.166667 0.001441
rfps: 30.250000 0.002916
rfps: 30.250000 0.002916
rfps: 30.333333 0.004904
rfps: 30.333333 0.004904
rfps: 30.416667 0.007407
rfps: 30.416667 0.007407
rfps: 30.500000 0.010424
rfps: 30.500000 0.010424
rfps: 30.583333 0.013955
rfps: 30.666667 0.018000
rfps: 59.250000 0.017560
rfps: 59.333333 0.013568
rfps: 59.416667 0.010090
rfps: 59.416667 0.010090
rfps: 59.500000 0.007126
rfps: 59.500000 0.007126
rfps: 59.583333 0.004676
rfps: 59.583333 0.004676
rfps: 59.666667 0.002740
rfps: 59.666667 0.002740
rfps: 59.750000 0.001319
rfps: 59.750000 0.001319
rfps: 59.833333 0.000411
rfps: 59.833333 0.000411
rfps: 59.916667 0.000018
rfps: 59.916667 0.000018
rfps: 60.000000 0.000139
rfps: 60.000000 0.000139
rfps: 29.970030 0.000000
rfps: 59.940060 0.000000
[rtsp @ 0000000005c74120] Setting avg frame rate based on r frame rate
Input #1, rtsp, from 'rtsp://172.24.0.32:554/axis-media/media.amp?camera=1':
  Metadata:
    title           : Media Presentation
  Duration: N/A, start: 0.033356, bitrate: N/A
    Stream #1:0, 28, 1/90000: Video: h264 (Main), yuvj420p(pc, smpte170m), 704x480 [SAR 10:11 DAR 4:3], 29.97 fps, 29.97 tbr, 90k tbn, 180k tbc
    Stream #1:1, 14, 1/16000: Audio: aac, 16000 Hz, mono, fltp
Successfully opened the file.
Parsing a group of options: output file output.m4a.
Applying option map (set input stream mapping) with argument [aout].
Successfully parsed a group of options.
Opening an output file: output.m4a.
detected 8 logical cores
[graph 0 input from stream 0:1 @ 0000000005d7ed00] Setting 'time_base' to value '1/16000'
[graph 0 input from stream 0:1 @ 0000000005d7ed00] Setting 'sample_rate' to value '16000'
[graph 0 input from stream 0:1 @ 0000000005d7ed00] Setting 'sample_fmt' to value 'fltp'
[graph 0 input from stream 0:1 @ 0000000005d7ed00] Setting 'channel_layout' to value '0x4'
[graph 0 input from stream 0:1 @ 0000000005d7ed00] tb:1/16000 samplefmt:fltp samplerate:16000 chlayout:0x4
[graph 0 input from stream 1:1 @ 0000000005d7eee0] Setting 'time_base' to value '1/16000'
[graph 0 input from stream 1:1 @ 0000000005d7eee0] Setting 'sample_rate' to value '16000'
[graph 0 input from stream 1:1 @ 0000000005d7eee0] Setting 'sample_fmt' to value 'fltp'
[graph 0 input from stream 1:1 @ 0000000005d7eee0] Setting 'channel_layout' to value '0x4'
[graph 0 input from stream 1:1 @ 0000000005d7eee0] tb:1/16000 samplefmt:fltp samplerate:16000 chlayout:0x4
[audio format for output stream 0:0 @ 0000000005e46120] Setting 'sample_fmts' to value 's16'
[audio format for output stream 0:0 @ 0000000005e46120] Setting 'sample_rates' to value '96000|88200|64000|48000|44100|32000|24000|22050|16000|12000|11025|8000|7350'
Successfully opened the file.
[Parsed_amerge_0 @ 0000000005b99f20] No channel layout for input 1
[AVFilterGraph @ 00000000003399e0] query_formats: 4 queried, 3 merged, 0 already done, 9 delayed
[AVFilterGraph @ 00000000003399e0] query_formats not finished
[Parsed_amerge_0 @ 0000000005b99f20] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[Parsed_amerge_0 @ 0000000005b99f20] auto-inserting filter 'auto-inserted resampler 0' between the filter 'graph 0 input from stream 0:1' and the filter 'Parsed_amerge_0'
[Parsed_amerge_0 @ 0000000005b99f20] auto-inserting filter 'auto-inserted resampler 1' between the filter 'graph 0 input from stream 1:1' and the filter 'Parsed_amerge_0'
[AVFilterGraph @ 00000000003399e0] query_formats: 1 queried, 3 merged, 9 already done, 0 delayed
[auto-inserted resampler 0 @ 0000000002ea8920] ch:1 chl:mono fmt:fltp r:16000Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
[auto-inserted resampler 1 @ 00000000003315c0] ch:1 chl:mono fmt:fltp r:16000Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
[Parsed_amerge_0 @ 0000000005b99f20] in0:mono + in1:mono -> out:stereo
Output #0, ipod, to 'output.m4a':
  Metadata:
    title           : Media Presentation
    encoder         : Lavf56.7.101
    Stream #0:0, 0, 1/16000: Audio: aac (libvo_aacenc) (mp4a / 0x6134706D), 16000 Hz, stereo, s16, 128 kb/s (default)
    Metadata:
      encoder         : Lavc56.1.101 libvo_aacenc
Stream mapping:
  Stream #0:1 (aac) -> amerge:in0
  Stream #1:1 (aac) -> amerge:in1
  amerge -> Stream #0:0 (libvo_aacenc)
Press [q] to stop, [?] for help
[NULL @ 0000000000332220] RTP: missed 21 packets
size=      22kB time=00:00:01.33 bitrate= 135.1kbits/s
size=      30kB time=00:00:01.84 bitrate= 133.1kbits/s
size=      38kB time=00:00:02.36 bitrate= 132.1kbits/s
size=      45kB time=00:00:02.80 bitrate= 131.4kbits/s
size=      53kB time=00:00:03.32 bitrate= 130.9kbits/s
size=      61kB time=00:00:03.83 bitrate= 130.5kbits/s
size=      69kB time=00:00:04.34 bitrate= 130.2kbits/s
size=      77kB time=00:00:04.85 bitrate= 130.0kbits/s
size=      85kB time=00:00:05.36 bitrate= 129.8kbits/s
size=      93kB time=00:00:05.88 bitrate= 129.6kbits/s
size=     101kB time=00:00:06.39 bitrate= 129.5kbits/s
[libvo_aacenc @ 0000000005e45c00] Trying to remove 448 more samples than there are in the queue
size=     106kB time=00:00:06.64 bitrate= 130.9kbits/s
video:0kB audio:105kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.204427%
Input file #0 (rtsp://172.24.0.31:554/axis-media/media.amp?camera=1):
  Input stream #0:0 (video): 28 packets read (86703 bytes);
  Input stream #0:1 (audio): 104 packets read (25941 bytes); 104 frames decoded (106496 samples);
  Total: 132 packets (112644 bytes) demuxed
Input file #1 (rtsp://172.24.0.32:554/axis-media/media.amp?camera=1):
  Input stream #1:0 (video): 28 packets read (40018 bytes);
  Input stream #1:1 (audio): 103 packets read (25682 bytes); 103 frames decoded (105472 samples);
  Total: 131 packets (65700 bytes) demuxed
Output file #0 (output.m4a):
  Output stream #0:0 (audio): 103 frames encoded (105472 samples); 105 packets muxed (107520 bytes);
  Total: 105 packets (107520 bytes) muxed
207 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0000000005b97300] Statistics: 30 seeks, 128 writeouts
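
As a rough check on the closing statistics above, the two audio inputs did not decode the same number of samples before I stopped ffmpeg. The sketch below converts that difference to milliseconds at the 16000 Hz sample rate reported for both streams. The numbers are copied from the log; the variable names are only illustrative.

```shell
# Decoded sample counts from the "Input stream #0:1" and "#1:1" lines.
samples_in0=106496
samples_in1=105472
sample_rate=16000            # Hz, as reported for both audio streams

# Difference in samples, then converted to milliseconds.
diff_samples=$(( samples_in0 - samples_in1 ))
diff_ms=$(( diff_samples * 1000 / sample_rate ))

echo "input 0 decoded ${diff_samples} more samples (${diff_ms} ms)"
```

The difference is 1024 samples, or 64 ms, which matches the one-frame gap in the log (104 versus 103 decoded frames). It only describes what was left unpaired when the run stopped, so it is not a measurement of the delay I hear between the two sides.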



