[FFmpeg-user] IP camera recording via RTSP: audio/video desync (dropped frames?)
Vladimir Mishonov
me at player701.ru
Sat Jul 16 10:21:21 EEST 2022
After another day of searching for solutions, I finally managed to come
up with one. Remember I said that "-use_wallclock_as_timestamps 1" fixes
the sync issues but causes the video to stutter? So I attempted to fix
the stuttering with the following filter:
> setts='max(floor(PTS/X)*X,if(N,PREV_OUTPTS+X))'
Note the X: it has to be substituted with a constant depending on the
use case (recording/streaming), and/or the stream the filter is being
applied to.
Explanation: this filter expects input timestamps to be generated from
the wallclock time but not necessarily spread out evenly. To fix
stuttering, it snaps each timestamp down to a multiple of X, where X is
the frame duration in timestamp units (for 25 FPS and millisecond
timestamps this is 0.04*1000=40): PTS/X gives a fractional frame
number, "floor" rounds it down to the nearest integer, and multiplying
by X turns it back into a timestamp. It also ensures that the
timestamps are always increasing - if the adjusted value is found to be
less than the previous output value plus one frame, then this sum is
used as the output instead. For the first packet (N=0) the "if" part
evaluates to 0, so the snapped value is used as is.
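
To make the arithmetic concrete, here is a small Python simulation of
the expression. It is only a sketch of what I understand the formula to
do, not FFmpeg code: the function names and the jittery input
timestamps are made up for illustration, and it assumes non-negative
integer timestamps counted in ticks (1000 ticks per second for
millisecond timestamps).

```python
def ticks_per_frame(ticks_per_second, fps):
    """X: the frame duration expressed in timestamp ticks."""
    x, remainder = divmod(ticks_per_second, fps)
    if remainder:
        raise ValueError("frame duration is not a whole number of ticks")
    return x

def setts_sim(pts_in, x):
    """Mimic max(floor(PTS/X)*X, if(N, PREV_OUTPTS+X)) packet by packet."""
    out = []
    for n, pts in enumerate(pts_in):
        snapped = (pts // x) * x                # floor(PTS/X)*X
        # if(N, ...) yields 0 for the first packet (N == 0)
        lower_bound = out[-1] + x if n else 0   # PREV_OUTPTS+X
        out.append(max(snapped, lower_bound))
    return out

x = ticks_per_frame(1000, 25)
print(x)                          # 40
print(ticks_per_frame(8000, 25))  # 320

# Hypothetical wallclock timestamps (ms) of a jittery 25 FPS stream
print(setts_sim([0, 47, 81, 118, 121, 205, 239], x))
# [0, 40, 80, 120, 160, 200, 240]

# A missing frame leaves a gap instead of shifting everything after it
print(setts_sim([0, 41, 125], x))
# [0, 40, 120]
```

The second call shows why the timestamps stay tied to the wallclock:
the value after the gap is still snapped from its own arrival time
rather than counted up from the previous frame.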
There is also a catch here - the camera can sometimes alter the frame
rate, which causes the formula to produce weird results. This can be
rectified by forcing a constant frame rate on the input stream ("-r 25"
placed before "-i").
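
One way the formula can go wrong when the rate changes can be shown
with a short Python sketch. This is a hypothetical scenario (frames
arriving every 33 ms while X stays at 40), not necessarily the exact
misbehaviour seen with the camera, and the function name is made up.

```python
def snap(pts_list, x):
    """Mimic max(floor(PTS/X)*X, if(N, PREV_OUTPTS+X)) packet by packet."""
    out = []
    for n, pts in enumerate(pts_list):
        candidate = (pts // x) * x
        lower_bound = out[-1] + x if n else 0
        out.append(max(candidate, lower_bound))
    return out

# Hypothetical: frames arrive every 33 ms, but X is still 40
arrivals = [n * 33 for n in range(7)]   # 0, 33, ..., 198
stamped = snap(arrivals, 40)
print(stamped)                          # [0, 40, 80, 120, 160, 200, 240]
print(stamped[-1] - arrivals[-1])       # 42
```

Because every output must be at least X later than the previous one,
the output timestamps run ahead of the wallclock (by 42 ms after only
seven frames in this example) for as long as the faster rate lasts.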
Also note that for smooth playback, the timestamps have to be adjusted
for both audio and video. When recording segments without re-encoding
audio, X should be set to 40 for both streams (assuming 25 FPS).
Streaming, however, is another matter. When streaming over RTMP via
"-f flv" (I use fifo+flv), X has to be set to 1 for the video stream (I
think in this case "floor" can be dropped since all values seem to be
integers, but I'm not entirely sure). Audio is another beast entirely:
X should be 320 because the original sampling frequency is 8000 Hz
mono, meaning 8000/25=320 samples per frame. Since the audio is
re-encoded for streaming, the expression also needs to be applied with
the "asetpts" filter via "-af" instead of "setts" via "-bsf:a", so that
it operates on the source data.
The final command-line is as follows (both recording and streaming):
> ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
> -rtsp_transport tcp -timeout 3000000 -use_wallclock_as_timestamps 1 \
> -r 25 -i rtsp://login:password@ip.ad.dre.ss:554/url \
> -map 0:v -c:v copy \
> -bsf:v setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
> -map 0:a -c:a copy \
> -bsf:a setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
> -f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 \
> -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv" \
> -map 0:v -c:v copy \
> -bsf:v setts='max(floor(PTS),if(N,PREV_OUTPTS+1))' \
> -map 0:a -c:a aac -ar 48000 -ac 2 -b:a 128k \
> -af asetpts='max(floor(PTS/320)*320,if(N,PREV_OUTPTS+320))' \
> -f fifo -fifo_format flv -drop_pkts_on_overflow 1 -attempt_recovery 1 \
> -recover_any_error 1 -format_opts flvflags=no_duration_filesize \
> rtmp://<STREAM_URL>
NB: the current version of FFmpeg in the FreeBSD ports collection
(4.4.2) needs these two patches for the proposed solution to work:
https://github.com/FFmpeg/FFmpeg/commit/301d275301d72387732ccdc526babaf984ddafe5
https://github.com/FFmpeg/FFmpeg/commit/b0b3fce3c33352a87267b6ffa51da31d5162daff
The first patch fixes the expression parser erroring out, and the second
one fixes PREV_OUTPTS always being equal to NOPTS. Also, with that
version "-timeout" has to be replaced with "-stimeout".
I'm still not sure if this solution is the proper one. So far, it's been
running for many hours, and the resulting video is smooth as butter, and
without any gradually increasing audio/video lag. But it looks extremely
overcomplicated, not to mention it took me several days of researching
and analyzing the video files to implement. Also, I don't know where the
timestamp drift actually occurs - most signs point to the camera, but
there's also the fact that some sort of conversion takes place depending
on the output (e.g. segment/mkv measures timestamps in 1/1000ths of a
second, but flv measures them in frames), and it might be possible that
there's a bug somewhere in there.
For simplicity though, let's assume there's no bug, and the fault occurs
at the source. We know that the audio is always on time, so why not use
the timestamps of the audio packets for the video too? E.g. for each
incoming video frame, assign it the timestamp of the latest audio packet
received (not the wallclock time). The problem is that "setts" filters
cannot interact with each other, so it's not possible to use them for
this purpose.
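
To show what I mean, here is the idea as a Python sketch. It is purely
illustrative and is not something FFmpeg offers as far as I know: the
function name, the ("a"/"v", pts) event tuples and the millisecond
values are all made up.

```python
def stamp_video_from_audio(events):
    """events: ("a", pts) or ("v", None) tuples in arrival order.

    Returns one timestamp per video frame, copied from the most
    recently received audio packet.
    """
    last_audio_pts = None
    video_pts = []
    for kind, pts in events:
        if kind == "a":
            last_audio_pts = pts
        elif last_audio_pts is not None:
            video_pts.append(last_audio_pts)
        # video frames that arrive before any audio are skipped here
    return video_pts

# Hypothetical arrival order; audio timestamps in ms
events = [("a", 0), ("v", None), ("a", 40), ("v", None),
          ("v", None), ("a", 80), ("v", None)]
print(stamp_video_from_audio(events))   # [0, 40, 40, 80]
```

Note that two video frames arriving between the same pair of audio
packets get identical timestamps, so something like the "previous
value plus one frame" rule would still be needed on top of this.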
Well, even though I've managed to somehow deal with this problem, I'm
still no expert. So further comments are still welcome. Until then, I
hope the information provided in this thread will be useful to anybody
who encounters a similar issue.
Thank you very much.
---
Kind regards,
Vladimir