[FFmpeg-user] IP camera recording via RTSP: audio/video desync (dropped frames?)
Vladimir Mishonov
me at player701.ru
Tue Jul 12 23:39:21 EEST 2022
Hello everyone.
I've been having some problems with recording video from an IP camera
that transmits a live feed over RTSP. I've searched a lot for potential
solutions, but none helped so far. I've been going crazy trying to fix
this for days, so now I'm asking here as a last resort.
This is the command line I'm using to record the video. It is supposed
to continuously record the camera feed, cutting it up into 10-minute-long
segments:
> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -c:v copy -map 0:a -c:a copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"
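For reference, with "-segment_atclocktime 1" the segment muxer splits at regular clock-time intervals counted from 00:00, using "-segment_time" as the interval, so with 600 the cuts are expected at :00, :10, :20 and so on. A minimal sketch of that arithmetic follows. The function name next_boundary is made up for illustration, and it assumes no "-segment_clocktime_offset" is set:

```shell
# next_boundary SECONDS_SINCE_MIDNIGHT SEGMENT_TIME
# Prints the next expected split point, in seconds since local midnight.
# Hypothetical helper: it only mirrors the "multiples of segment_time" rule.
next_boundary() {
    now=$1
    seg=$2
    echo $(( (now / seg + 1) * seg ))
}

# 11:59:30 is 43170 s after midnight; the next cut is at 43200 s (12:00:00).
next_boundary 43170 600
```

So a segment whose file name says 12:00:00 should start with a frame showing roughly 12:00:00 on the camera clock, which is the reference point for the lag described below.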
FFmpeg prints no warnings while running, but the recorded segments
exhibit the following issues:
1) Significant video lag gets accumulated over time. E.g. the first
frame of the segment recorded at 12:00:00 may show the time 11:59:30 on
the camera clock. The delay is only a few seconds at first but gets more
and more noticeable with each recording. Note that the camera clock
itself is working correctly - restarting the recording process resets
the delay.
2) As a result of #1, audio starts skipping ahead of video over time,
and just like with the previous issue, the delay gets progressively
worse, correlating linearly with the video lag. E.g. if I knock on a
solid surface with my finger in view of the camera, the sound will come
much earlier than the video. This becomes easily noticeable after only
some 10-15 minutes of continuous recording during day-time. The delay
between audio and video is about the same as the one between the
starting time of the segment and the clock time visible on the first
frame.
Note that neither the video nor the audio is being transcoded, so this
cannot be a result of CPU overload on my end. I do not see any CPU usage
spikes while recording.
FFMPEG input and output parameters are as follows (according to its own
output):
> Input #0, rtsp, from 'rtsp://login@password@ip.ad.dre.ss:554/url':
> Metadata:
> title : RTSP Server
> Duration: N/A, start: 0.000000, bitrate: N/A
> Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive),
> 2560x1440, 25 fps, 25 tbr, 90k tbn
> Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
> Output #0, segment, to '%Y-%m-%dT%H-%M-%S.mkv':
> Metadata:
> title : RTSP Server
> encoder : Lavf59.16.100
> Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive),
> 2560x1440, q=2-31, 25 fps, 25 tbr, 1k tbn
> Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
> Stream mapping:
> Stream #0:0 -> #0:0 (copy)
> Stream #0:1 -> #0:1 (copy)
Here's what I've been able to find out about this problem so far:
=== 0. The issue only manifests with this specific camera ===
I have multiple IP cameras at home, and the problematic one is a
different make and model than the rest, so it must be something about
how it encodes and streams the video/audio. Whatever the cause is, I
know that all my cameras stream video over RTSP at 25 FPS and audio at
8000 Hz - and recording with the exact same command-line works perfectly
with all of them except for this one. Note that this is also the only
camera where "-use_wallclock_as_timestamps 1" causes the video to
stutter (see below).
The camera itself is a cheap Chinese IP camera. It doesn't have a
specific brand on it, but here's what it says in the web UI: "Device
Model: F8/IPG-9280PGS-AI". Not sure if this will help, but mentioning it
just in case.
Since recording from all other cameras works correctly, I assume this is
not a bug in FFMPEG itself. I've tested with two different versions
(5.0.1 on Windows and 4.4.2 on FreeBSD), and they both exhibit the
problem with only this specific camera.
=== 1. "-use_wallclock_as_timestamps 1" works, but is undesirable ===
If "-use_wallclock_as_timestamps 1" is added to the command line before
the input, both issues are resolved, HOWEVER, this results in the video
constantly stuttering (it looks as if it skips ahead slightly with every
keyframe). Therefore, this option cannot be used as a solution. In
addition to that, FFMPEG starts complaining about non-monotonous DTS in
output stream 0:0 (but all recordings still seem to play back normally,
except for the stuttering).
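One way to see what the stutter corresponds to in a recorded segment is to dump the video packet timestamps with ffprobe and list the places where consecutive timestamps jump or go backwards. This is only a diagnostic sketch. The helper name pts_gaps, the file name segment.mkv and the 0.05 s threshold (slightly more than one 40 ms frame at 25 FPS) are assumptions for illustration, and it assumes the stream has no B-frames, so that packets are stored in presentation order:

```shell
# pts_gaps THRESHOLD
# Reads one pts_time value per line on stdin (first CSV field) and prints
# every pair of consecutive timestamps whose difference exceeds THRESHOLD
# seconds or is negative (timestamps going backwards).
pts_gaps() {
    awk -F, -v thr="$1" '
        NR > 1 {
            d = $1 - prev
            if (d > thr || d < 0)
                printf "%.3f -> %.3f (delta %.3f)\n", prev, $1, d
        }
        { prev = $1 }
    '
}

# Usage against a recorded segment (segment.mkv is a placeholder name):
#   ffprobe -v error -select_streams v:0 \
#       -show_entries packet=pts_time -of csv=p=0 segment.mkv | pts_gaps 0.05
```

If the reported gaps line up with the keyframe interval, that would match the "skips ahead with every keyframe" impression.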
=== 2. The video lag is related to having audio input ===
If there is no audio input, then there is no video lag either - the
first frame of each segment (give or take a keyframe interval) is always
on time according to the camera clock.
> # this does not have video lag accumulating over time
> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -c:v copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"
However, removing audio is not a viable solution either, since I do want
to have audio recorded along with the video.
=== 3. The actual FPS is sometimes less than the one reported by RTSP ===
The RTSP streaming source reports a constant frame rate of 25 FPS, but
I've inspected the recordings and found that at least one segment had
only 14900 frames, so its actual frame rate was 14900/600 = 24.8(3) FPS.
But even though some frames are missing and the file is reported to have
a framerate of 25 FPS and a length of 10 minutes, it does not seem to
have any video playback issues at all. This is a significant difference
compared to recordings from my other cameras: I tried picking some at
random, and they all reported 15000 frames exactly.
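For anyone wanting to repeat the check, the frame count of a segment can be read without decoding by letting ffprobe count packets, and the effective rate is then just frames divided by duration. A small sketch follows. The helper names count_video_packets and effective_fps and the file name segment.mkv are placeholders, and it assumes one packet per frame, which is the usual case for H.264 in Matroska:

```shell
# count_video_packets FILE
# Prints the number of packets in the first video stream (no decoding).
count_video_packets() {
    ffprobe -v error -select_streams v:0 -count_packets \
        -show_entries stream=nb_read_packets -of csv=p=0 "$1"
}

# effective_fps FRAMES SECONDS
# Prints FRAMES / SECONDS with three decimals.
effective_fps() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.3f\n", n / t }'
}

# Usage on a real file (segment.mkv is a placeholder name):
#   effective_fps "$(count_video_packets segment.mkv)" 600

# The 14900-frame segment mentioned above:
effective_fps 14900 600
```

For the 14900-frame segment this prints 24.833, against 25.000 for a full 15000-frame segment.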
Some other segments recorded during my later experiments had even fewer
frames, e.g. one had only about 12000, and even though it still had a
10-minute runtime, the camera clock only counted about 9 minutes and 50
seconds from start to end, so I guess it played back a bit slower than
in realtime (but I could not perceive any difference).
What's more interesting is that all these frame count discrepancies
happened only during day-time. It appears that every segment recorded
during night-time, when the camera is in IR black-and-white mode, has
exactly 15000 frames, meaning a "real" 25 FPS, and the video lag does
not increase at that time (nor does the audio desync). Therefore, I
can conclude that the issue has something to do with the camera not
being able to encode at a constant 25 FPS at all times. I assume it is
possible that an audio desync will happen when frames are dropped.
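The arithmetic behind that suspicion can be sketched as follows, under the unconfirmed assumption that every frame that does arrive is played for exactly the nominal 1/25 s. Each missing frame would then shift the video 40 ms behind the wall clock (and behind the audio). The helper name lag_seconds is made up for illustration:

```shell
# lag_seconds FRAMES SECONDS NOMINAL_FPS
# Seconds by which the video falls behind over one segment, ASSUMING each
# received frame is shown for exactly 1/NOMINAL_FPS seconds.
lag_seconds() {
    awk -v n="$1" -v t="$2" -v fps="$3" \
        'BEGIN { printf "%.2f\n", t - n / fps }'
}

# The 14900-frame, 600-second segment: 100 missing frames at 25 FPS.
lag_seconds 14900 600 25
```

This gives 4.00 seconds for that segment, which is the right order of magnitude for a lag that becomes noticeable within 10-15 minutes. It is only a rough model, though: for the segment with about 12000 frames it would predict far more than the roughly 10 seconds the camera clock actually lost, so the real timestamps evidently do not follow it exactly.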
=== 4. Adding "-vsync cfr" doesn't help - need to re-encode? ===
This seemed like the way to go, according to the documentation. If the
camera sometimes cannot produce frames at its reported rate, then we
should duplicate them to bring the frame count up to the expected value
to match the reported FPS. Then the discrepancy will be gone, and the
sync issues along with it. Right?
> cfr (1)
> Frames will be duplicated and dropped to achieve exactly the
> requested constant frame rate.
Unfortunately, this didn't work as I'd hoped: some segments still had
fewer frames than expected, and the lag was still happening as a result.
> # This should work but doesn't
> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -vsync cfr -c:v copy -map 0:a -c:a copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"
Moving "-vsync cfr" to before the input didn't appear to change anything.
(The documentation says it's global, so I assume its position is
irrelevant.)
I guess that it's possible that -vsync simply does not work when copying
streams, since FFMPEG might need to decode the video to be able to
duplicate frames. But it didn't produce an error message, so I thought
it would work. Note that transcoding a 1440p video stream 24/7 is
completely out of the question for me (the recording server simply does
not have that much CPU power).
=== 5. Sync audio instead of video? ===
If I cannot duplicate or drop video frames, perhaps I can adjust the audio track to
match the video instead? Like I said before, I cannot transcode the
video... but transcoding audio doesn't take nearly as much processing
power, so it wouldn't be too big of a problem for me. Furthermore, it is
even required if I want to broadcast the feed to YouTube (and in this
particular case, I do) because it dictates that the audio must be
encoded with either AAC or MP3. For live streaming, I use the AAC
encoder with the following options:
> -c:a aac -ar 48000 -ac 2 -b:a 128k
With these parameters, when recording from other cameras there was
sometimes a slight delay between the audio and video streams (not
increasing over time). I managed to remove it by adding:
> -af aresample=async=1
Naturally, this didn't help with the audio sync issues for the
problematic camera. I also tried increasing the async value, up to
10000, to no avail. The documentation suggests it might be helpful in
syncing the audio track, but so far my experiments with it have been
unsuccessful.
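Put together, the variant described in this section (video copied, audio re-encoded to AAC and passed through aresample) would look roughly like the sketch below when applied to the segmented recording. This only restates the options already listed above and is not a fix. The wrapper name record_cam is made up, and combining these audio options with the segment output rather than with the live-streaming output is an assumption:

```shell
# record_cam RTSP_URL
# Hypothetical wrapper: copies the video stream as-is and re-encodes only
# the audio, using the AAC and aresample options quoted in this section.
record_cam() {
    ffmpeg -nostdin -flags low_delay \
        -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
        -rtsp_transport tcp -timeout 3000000 \
        -i "$1" \
        -map 0:v -c:v copy \
        -map 0:a -c:a aac -ar 48000 -ac 2 -b:a 128k \
        -af aresample=async=1 \
        -f segment -strftime 1 -reset_timestamps 1 \
        -segment_atclocktime 1 -segment_time 600 \
        "%Y-%m-%dT%H-%M-%S.mkv"
}

# Usage (URL as in the commands above):
#   record_cam rtsp://login@password@ip.ad.dre.ss:554/url
```

Since "-af" is an audio filter, it forces the audio to be decoded and re-encoded, while the video stays on the stream-copy path.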
=== 6. Conclusion ===
At this point, I'm a bit lost. I tried googling, but I couldn't find
much about the scenario where video is copied and not transcoded, so I
ultimately decided to ask for help. I think the way to go is to tinker
with the audio filters (see above), but I don't really have any
experience with them, so I'm not exactly sure what I should do to solve
the issue. I really hope it is possible to solve it without re-encoding
the video stream. Even better if I don't actually have to re-encode the
audio either, but it's not a big deal if I do.
If necessary, I will try to provide as much additional information as I
can that might help find a solution.
Thank you very much.
--
Kind regards,
Vladimir