[FFmpeg-user] IP camera recording via RTSP: audio/video desync (dropped frames?)

Tue Jul 12 23:39:21 EEST 2022

Hello everyone.

I've been having some problems with recording video from an IP camera 
that transmits a live feed over RTSP. I've searched a lot for potential 
solutions, but none helped so far. I've been going crazy trying to fix 
this for days, so now I'm asking here as a last resort.

This is the command line I'm using to record the video. It is supposed 
to continuously record the camera feed, cutting it up into 10-minute 
segments:

> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -c:v copy -map 0:a -c:a copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"

FFMPEG outputs no warnings while running, but the recorded segments 
exhibit the following undesired issues:

1) Significant video lag accumulates over time. E.g. the first frame of 
the segment recorded at 12:00:00 may show the time 11:59:30 on the 
camera clock. The delay is only a few seconds at first but gets more and 
more noticeable with each segment. Note that the camera clock itself is 
working correctly: restarting the recording process resets the delay.

2) As a result of #1, audio starts skipping ahead of video over time, 
and just like with the previous issue, the delay gets progressively 
worse, correlating linearly with the video lag. E.g. if I knock on a 
solid surface with my finger in view of the camera, the sound will come 
much earlier than the video. This becomes easily noticeable after only 
some 10-15 minutes of continuous recording during day-time. The delay 
between audio and video is about the same as the one between the 
starting time of the segment and the clock time visible on the first 
frame.

Note that neither the video nor the audio is being transcoded, so this 
cannot be a result of CPU overload on my end. I do not see any CPU usage 
spikes while recording.

FFMPEG input and output parameters are as follows (according to its own 
output):

> Input #0, rtsp, from 'rtsp://login@password@ip.ad.dre.ss:554/url':
>   Metadata:
>     title           : RTSP Server
>   Duration: N/A, start: 0.000000, bitrate: N/A
>   Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 
> 2560x1440, 25 fps, 25 tbr, 90k tbn
>   Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
> Output #0, segment, to '%Y-%m-%dT%H-%M-%S.mkv':
>   Metadata:
>     title           : RTSP Server
>     encoder         : Lavf59.16.100
>   Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 
> 2560x1440, q=2-31, 25 fps, 25 tbr, 1k tbn
>   Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
> Stream mapping:
>   Stream #0:0 -> #0:0 (copy)
>   Stream #0:1 -> #0:1 (copy)

Here's what I've been able to find out about this problem so far:

=== 0. The issue only manifests with this specific camera ===

I have multiple IP cameras at home, and the problematic one is a 
different make and model than the rest, so it must be something about 
how it encodes and streams the video/audio. Whatever the cause is, I 
know that all my cameras stream video over RTSP at 25 FPS and audio at 
8000 Hz - and recording with the exact same command-line works perfectly 
with all of them except for this one. Note that this is also the only 
camera where "-use_wallclock_as_timestamps 1" causes the video to 
stutter (see below).

The camera itself is a cheap Chinese IP camera. It doesn't have a 
specific brand on it, but here's what it says in the web UI: "Device 
Model: F8/IPG-9280PGS-AI". Not sure if this will help, but mentioning it 
just in case.

Since recording from all other cameras works correctly, I assume this is 
not a bug in FFMPEG itself. I've tested with two different versions 
(5.0.1 on Windows and 4.4.2 on FreeBSD), and they both exhibit the 
problem with only this specific camera.

=== 1. "-use_wallclock_as_timestamps 1" works, but is undesirable ===

If "-use_wallclock_as_timestamps 1" is added to the command line before 
the input, both issues are resolved. However, the video then stutters 
constantly (it looks as if it skips ahead slightly with every keyframe), 
so this option cannot be used as a solution. In addition, FFMPEG starts 
complaining about non-monotonous DTS in output stream 0:0 (but all 
recordings still seem to play back normally, except for the stuttering).

=== 2. The video lag is related to having audio input ===

If there is no audio input, then there is no video lag either - the 
first frame of each segment (give or take a keyframe interval) is always 
on time according to the camera clock.

> # this does not have video lag accumulating over time
> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -c:v copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"

However, removing audio is not a viable solution either, since I do want 
to have audio recorded along with the video.

=== 3. The actual FPS is sometimes less than the one reported by RTSP ===

The RTSP streaming source reports a constant frame rate of 25 FPS, but 
I've inspected the recordings and found that at least one 10-minute 
segment had only 14900 frames, so its actual average frame rate was 
14900/600 = 24.83 FPS. Even though some frames are missing, the file is 
still reported as 25 FPS and 10 minutes long, and it does not seem to 
have any video playback issues at all. This is a significant difference 
compared to recordings from my other cameras: I picked some at random, 
and they all contained exactly 15000 frames.
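
One way to obtain such a frame count without decoding the video is to 
count video packets with ffprobe. The following is only a sketch: the 
function name and the file name are made up for illustration, and it 
assumes that each packet in this H.264 stream carries exactly one frame.

```shell
# Hypothetical helper (not part of the original commands): count the video
# packets in a recorded segment without decoding it.
# Assumption: one packet corresponds to one frame in this H.264 stream.
count_video_packets() {
    ffprobe -v error -select_streams v:0 -count_packets \
        -show_entries stream=nb_read_packets -of csv=p=0 "$1"
}

# Example call; the file name is a placeholder following the segment pattern:
#   count_video_packets "2022-07-12T12-00-00.mkv"
```

A segment that really contains 25 frames per second for 600 seconds 
should report 15000 here.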

Some other segments recorded during my later experiments had even fewer 
frames; e.g. one had only about 12000. Even though it still had a 
10-minute runtime, the camera clock only advanced about 9 minutes and 50 
seconds from start to end, so I guess it played back a bit slower than 
real time (but I could not perceive any difference).

What's more interesting is that all these frame count discrepancies 
happened only during day-time. It appears that every segment recorded 
during night-time, when the camera is in IR black-and-white mode, has 
exactly 15000 frames, meaning a "real" 25 FPS, and neither the video lag 
nor the audio desync increases during that time. Therefore, I conclude 
that the issue has something to do with the camera not being able to 
encode at a constant 25 FPS at all times. I assume an audio desync can 
happen when frames are dropped.
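
The arithmetic behind these numbers can be sketched as follows. This is 
only a back-of-the-envelope helper: the function names are made up for 
illustration, and it assumes that every recorded frame ends up occupying 
exactly 1/25 of a second on the timeline, which may not be how the 
timestamps actually behave for this camera.

```shell
# Hypothetical helpers for the frame-count arithmetic (illustration only).

# Average frame rate of a segment.
#   $1 = frames actually recorded, $2 = segment duration in seconds
average_fps() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", n / d }'
}

# Seconds of video "missing" from a segment, assuming each frame lasts 1/fps.
#   $1 = frames actually recorded, $2 = segment duration in seconds,
#   $3 = nominal frame rate reported by the camera
shortfall_seconds() {
    awk -v n="$1" -v d="$2" -v f="$3" \
        'BEGIN { printf "%.2f\n", (d * f - n) / f }'
}

average_fps 14900 600            # prints 24.833
shortfall_seconds 14900 600 25   # prints 4.00
```

Under that assumption, a segment that is 100 frames short would account 
for about 4 seconds of additional lag.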

=== 4. Adding "-vsync cfr" doesn't help - need to re-encode? ===

This seemed like the way to go, according to the documentation. If the 
camera sometimes cannot produce frames at its reported rate, then we 
should duplicate them to bring the frame count up to the expected value 
to match the reported FPS. Then the discrepancy will be gone, and the 
sync issues along with it. Right?

> cfr (1)
>   Frames will be duplicated and dropped to achieve exactly the 
> requested constant frame rate.

Unfortunately, this didn't work as I'd hoped: some segments still had 
fewer frames than expected, and the lag was still happening as a result.

> # This should work but doesn't
> ffmpeg -nostdin -flags low_delay \
>   -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
>   -rtsp_transport tcp -timeout 3000000 \
>   -i rtsp://login@password@ip.ad.dre.ss:554/url \
>   -map 0:v -vsync cfr -c:v copy -map 0:a -c:a copy \
>   -f segment -strftime 1 -reset_timestamps 1 \
>   -segment_atclocktime 1 -segment_time 600 \
>   "%Y-%m-%dT%H-%M-%S.mkv"

Moving "-vsync cfr" to before input didn't appear to change anything. 
(The documentation says it's global, so I assume its position is 
irrelevant.)

I guess that it's possible that -vsync simply does not work when copying 
streams, since FFMPEG might need to decode the video to be able to 
duplicate frames. But it didn't produce an error message, so I thought 
it would work. Note that transcoding a 1440p video stream 24/7 is 
completely out of the question for me (the recording server simply does 
not have that much CPU power).

=== 5. Sync audio instead of video? ===

If I cannot duplicate video frames, perhaps I can adjust the audio track 
to match the video instead? Like I said before, I cannot transcode the 
video... but transcoding audio doesn't take nearly as much processing 
power, so it wouldn't be too big of a problem for me. Furthermore, it is 
even required if I want to broadcast the feed to YouTube (and in this 
particular case, I do) because it dictates that the audio must be 
encoded with either AAC or MP3. For live streaming, I use the AAC 
encoder with the following options:

> -c:a aac -ar 48000 -ac 2 -b:a 128k

With these parameters, when recording from other cameras there was 
sometimes a slight delay between the audio and video streams (not 
increasing over time). I managed to remove it by adding:

> -af aresample=async=1

Naturally, this didn't help with the audio sync issues for the 
problematic camera. I also tried increasing the async value, up to 
10000, to no avail. The documentation suggests it might be helpful in 
syncing the audio track, but so far my experiments with it have been 
unsuccessful.
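
For reference, combining the audio options above with the video copy 
from the original command would look roughly like the sketch below. The 
full command for this experiment is not shown in this message, so this is 
an assumed reconstruction: the function name is made up, the URL is 
passed in as a parameter, and the remaining options are simply carried 
over from the first command. This is the variant that did not fix the 
desync for the problematic camera.

```shell
# Hypothetical wrapper (assumed reconstruction, not a verified fix):
# copy the video stream, transcode audio to AAC with aresample=async=1.
#   $1 = RTSP URL of the camera
record_with_aac_async() {
    ffmpeg -nostdin -flags low_delay \
        -fflags +nobuffer+genpts+igndts+ignidx+discardcorrupt \
        -rtsp_transport tcp -timeout 3000000 \
        -i "$1" \
        -map 0:v -c:v copy \
        -map 0:a -c:a aac -ar 48000 -ac 2 -b:a 128k \
        -af aresample=async=1 \
        -f segment -strftime 1 -reset_timestamps 1 \
        -segment_atclocktime 1 -segment_time 600 \
        "%Y-%m-%dT%H-%M-%S.mkv"
}

# Example call with the placeholder URL used elsewhere in this message:
#   record_with_aac_async "rtsp://login@password@ip.ad.dre.ss:554/url"
```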

=== 6. Conclusion ===

At this point, I'm a bit lost. I tried googling, but I couldn't find 
much about the scenario where video is copied and not transcoded, so I 
ultimately decided to ask for help. I think the way to go is to tinker 
with the audio filters (see above), but I don't really have any 
experience with them, so I'm not exactly sure what I should do to solve 
the issue. I really hope it is possible to solve it without re-encoding 
the video stream. Even better if I don't actually have to re-encode the 
audio either, but it's not a big deal if I do.

If necessary, I will try to provide as much additional information as I 
can that might help find a solution.

Thank you very much.

-- 
Kind regards,
Vladimir