[Libav-user] security camera app: bookmark+seek mp4 by wall-clock? low latency? visual timestamps?

Fri Aug 26 09:04:18 CEST 2011

Dear List,

I am developing a "security monitoring" type of application, that takes 
input from multiple IP cameras (a mix of Sanyo, Axis and Arecont, each 
covering a different area, with possibly different resolutions and 
framerates; all accessible through rtsp).  The application includes a 
recording part, which reads RTSP packets and writes them to a disk file; 
and a playback part which reads those files and plays them back. I also 
want to show the files while recording with minimum latency.

One of the requirements for the playback application is to show 
frame-synchronized views from different cameras (at 25fps, this requires 
40ms precision). The recorded files might need to be played on an 
independent system (using e.g. vlc or ffplay), so I would like to add a 
visual timestamp to the file, either onto the image, or as subtitles.

Some of the cameras insist that they need 4 reference frames, although 
the output stream consists entirely of I and P frames that never 
reference more than the previous ref frame (that is, the streams ARE low 
delay streams, despite the SPS/PPS flags).

Now, for the questions, with some discussion below.

(a) is there a general way to bookmark frames while writing a file in 
such a way that I can seek to them directly when playing, without 
searching? pts is NOT the answer, as explained below.

(b) is there a general way to encode *wall-clock* time into a file I 
write? e.g. I want the "pts" (or equivalent) of the first frame of the 
file to be 2011-08-26 02:13:57.917 (millisec resolution sufficient - 48 
bits more than enough precision).

(c) is there a way to force "low delay" handling of a stream despite its 
SPS/PPS description? for some cameras, video decoding lags 4 frames 
behind packet arrival, which, at 5fps, is a delay of almost 1 second 
(0.8 s). (one suggestion given below, looking for more options)

(d) is there a way, other than subtitles, to add a visual timestamp to 
the file while writing it, without decoding+overlaying+reencoding?

(e) is there a way to tell, without decoding the video stream, that a 
received packet starts a new non-key frame?

Discussion:

for (a) (bookmarking), the solution that I am using so far is:

  when recording, after I receive a packet from the RTSP stream, I note 
the exact (ntp synchronized) time, and the exact file offset using 
avio_tell(), and write them to a database, together with the 
AV_PKT_FLAG_KEY of the packet and other data. Every camera has its own 
output file.

  when playing, to show a specific time, I seek independently in each 
file to the nearest preceding key frame using 
av_seek_frame(x,y,offset,AVSEEK_FLAG_BYTE), and run the following logic:

   do {
     av_read_frame(...);
     avcodec_decode_video2(...);
   } while (avio_tell(...) < file_offset_at_time_wanted);

  it mostly works well for h264 "container" files (not really a 
container, as it has no header, footer or structure beyond the packets). 
However, if I try to do that in any structured file like avi, mp4 or 3gp, 
seeking by byte location does not seem to work properly (read_frame 
and/or decode_video2 fail).

  I'm looking for a solution that would work equally well for mp4 and 
avi files. It's possible that one does not exist -- for those formats 
that have a pts/dts, perhaps it is possible to use the pts/dts as index? 
(h264 files don't have pts or dts info at all)
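
To make the lookup side of this scheme concrete, here is a minimal sketch 
of how the database rows could be searched at playback time. It only 
covers the index lookup, not the libav calls. The Bookmark struct, its 
field names, and the functions last_at_or_before() and plan_seek() are 
hypothetical names made up for this illustration; the rows are assumed to 
be sorted by arrival time, with times in milliseconds since the Unix epoch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One database row per received packet (hypothetical layout). */
typedef struct {
    int64_t wall_ms; /* NTP-synchronized arrival time, ms since Unix epoch */
    int64_t offset;  /* avio_tell() value noted for this packet */
    int     is_key;  /* nonzero if AV_PKT_FLAG_KEY was set on the packet */
} Bookmark;

/* Index of the last row with wall_ms <= t, or -1 if there is none.
 * Rows must be sorted by wall_ms (i.e. in recording order). */
static ptrdiff_t last_at_or_before(const Bookmark *b, size_t n, int64_t t)
{
    size_t lo = 0, hi = n; /* binary search over the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b[mid].wall_ms <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (ptrdiff_t)lo - 1;
}

/* For wall-clock time t, store in *seek_off the offset of the nearest
 * preceding key frame (where to seek by byte) and in *stop_off the offset
 * noted for the wanted packet (the loop bound while decoding forward).
 * Returns 0 on success, -1 if no key frame exists at or before t. */
static int plan_seek(const Bookmark *b, size_t n, int64_t t,
                     int64_t *seek_off, int64_t *stop_off)
{
    ptrdiff_t target, k;

    assert(seek_off != NULL && stop_off != NULL);
    target = last_at_or_before(b, n, t);
    if (target < 0)
        return -1;
    /* Walk back from the wanted packet to the closest key frame. */
    for (k = target; k >= 0 && !b[k].is_key; k--)
        ;
    if (k < 0)
        return -1;
    *seek_off = b[k].offset;
    *stop_off = b[target].offset;
    return 0;
}
```

In this sketch, seek_off would be the value passed to av_seek_frame() with 
AVSEEK_FLAG_BYTE, and stop_off would play the role of 
file_offset_at_time_wanted in the loop above. Running the same lookup per 
camera file with the same t gives one seek plan per view.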

for (b) (wall clock), assume an mp4 or avi file; if I don't start 
pts/dts at 0, I get a delay at the beginning of the file (proportional 
to first frame's pts/dts); I suspect there is a way to mark a file 
"starting at a late pts", but I must have missed it in the docs? If I 
can do that, I can just av_seek_frame() by pts, relying on mp4/avi's 
frame index instead of my own. can I do that?
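
To illustrate the arithmetic involved, here is a rough sketch of one 
possible workaround, not an answer to the question itself: keep pts 
starting at 0 in the file, store the wall-clock time of the first frame 
somewhere outside the stream (for example in the database), and convert 
between the two. The TimeBase struct is a hypothetical stand-in for a 
rational time base (one tick = num/den seconds), and all function names 
are invented for this example. The sample time 2011-08-26 02:13:57.917 is 
assumed to be UTC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a rational time base: one tick = num/den s. */
typedef struct {
    int num;
    int den;
} TimeBase;

/* Convert a wall-clock time (ms since Unix epoch) to a pts relative to
 * origin_ms, the wall-clock time of the first frame. Rounds to the
 * nearest tick. Requires wall_ms >= origin_ms. */
static int64_t wall_ms_to_pts(int64_t wall_ms, int64_t origin_ms, TimeBase tb)
{
    int64_t delta_ms = wall_ms - origin_ms;

    assert(tb.num > 0 && tb.den > 0);
    assert(delta_ms >= 0);
    /* ticks = (delta_ms / 1000) * (den / num) */
    return (delta_ms * tb.den + 500 * (int64_t)tb.num)
           / (1000 * (int64_t)tb.num);
}

/* Inverse mapping: pts (>= 0) back to wall-clock ms, rounded to nearest. */
static int64_t pts_to_wall_ms(int64_t pts, int64_t origin_ms, TimeBase tb)
{
    assert(tb.num > 0 && tb.den > 0);
    assert(pts >= 0);
    return origin_ms + (pts * 1000 * tb.num + tb.den / 2) / tb.den;
}

/* True if v is non-negative and representable in 48 bits. */
static int fits_in_48_bits(int64_t v)
{
    return v >= 0 && v < ((int64_t)1 << 48);
}
```

With a 1/90000 time base, a frame arriving 40 ms after the first one maps 
to pts 3600, and back to origin + 40 ms. The sample time above is 
1314324837917 ms since the epoch under the UTC assumption, which fits in 
48 bits with plenty of room to spare. To seek to a wall-clock time, one 
would convert it to a pts first and then seek by pts.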

for (c) (low latency), I found this post: 
<http://libav-users.943685.n4.nabble.com/Libav-user-latency-of-mpegts-handling-in-libavformat-tp3678681p3695084.html> 
with a suggestion for a solution. Seems to work, but looks fragile, and 
requires 4 calls to decode for each read_frame. Perhaps there is a 
better way?

for (d) (visual timestamp) and (e) (frame boundary) I have no idea.

Thank you for your time and ideas,

Camera Man.