[FFmpeg-devel] AVCHD/H.264 decoder: further development/corrections
Sun Feb 1 00:09:47 CET 2009
On Sun, Jan 25, 2009 at 08:08:06PM +0100, Ivan Schreter wrote:
> I identified the following problems and potential solutions:
> 1. Inconsistency between packets returned via av_read_frame() and
> actually delivered full frames from avcodec_decode_video()
> 2. Key frame calculation and seeking
> 3. Reporting frame type to libavformat
> Now the details:
> *1. Inconsistency between packets and decoded frames*
> H.264 decoder returns AVPackets via av_read_frame(), which contain
> either a full frame or just a field (half frame). The former case is not
> problematic, since decoded frames are 1:1 to returned packets. It is
> problematic, though, when the decoder returns packets, which DO NOT
> correspond to a full frame. This is the case of interlaced AVCHD video
> as produced by various full-HD camcorders (at least Panasonic, Sony and
> Canon). The H.264 standard also allows coding by field, so one picture in
> H.264 terms (as currently returned as AVPacket from av_read_frame()) can
> contain either a single field, two fields (frame) or even repeated
> fields (so in total 1-3 fields per AVPacket).
> I'd concentrate first on H.264 pictures having 1 to 2 fields only, since
> the other case (3 fields per picture) is probably not that interesting
> now (it is used to quasi stretch FPS from original cinema material to
> television frame rates).
> Although the decoder itself takes this into account, the interface in
> libavformat doesn't. Thus, currently only video having full frames per
> packet decodes really correctly (and this also only with not-yet-applied
> patch concerning frame types). Reason: av_read_frame() doesn't return
> whole frames, although it is documented so.
"decoding" of fields and even field/frame mixes works perfectly, and bitexact
you can try the reference bitstreams ...
what doesn't work is the timestamps and these cause the user apps to drop and
> *Potential solution:* For field pictures, delay returning a packet from
> h264_parse(), until the second field picture is also read. The decoder
> should take then care of decoding both fields correctly and returning a
> full frame for each packet.
as mentioned in another mail this has its problems sadly
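
For readers following the thread, the first proposal (hold back a field picture until its partner arrives) can be sketched roughly as below. This is only an illustration of the idea, not code from h264_parse() or any libavformat API: the types `struct unit` and `struct pairer` and the function `pairer_push()` are hypothetical names invented for this sketch, and the payload is reduced to a size and a timestamp. It does not address the problems with this approach that the reply above refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: NOT the libavformat/libavcodec API. */
enum pic_kind { PIC_FRAME, PIC_TOP_FIELD, PIC_BOTTOM_FIELD };

struct unit {            /* one coded picture as seen by the parser */
    enum pic_kind kind;
    int64_t pts;         /* timestamp of the coded picture */
    size_t size;         /* payload size in bytes */
};

struct pairer {
    int have_pending;    /* 1 if a first field is waiting for its partner */
    struct unit pending;
};

/*
 * Feed one coded picture. Returns 1 and fills *out when a full frame is
 * ready (a frame picture, or two fields of opposite parity). Returns 0
 * when the unit was a first field and is being held back.
 */
static int pairer_push(struct pairer *p, const struct unit *in, struct unit *out)
{
    assert(p && in && out);

    if (in->kind == PIC_FRAME) {
        p->have_pending = 0;          /* drop any unpaired field */
        *out = *in;
        return 1;
    }
    if (p->have_pending && p->pending.kind != in->kind) {
        /* Second field of opposite parity: merge into one frame unit. */
        out->kind = PIC_FRAME;
        out->pts  = p->pending.pts;   /* the frame takes the first field's timestamp */
        out->size = p->pending.size + in->size;
        p->have_pending = 0;
        return 1;
    }
    /* First field, or same parity as the pending one (partner lost): restart. */
    p->pending = *in;
    p->have_pending = 1;
    return 0;
}
```

With this sketch, a caller would zero-initialize a `struct pairer` and output a packet only when `pairer_push()` returns 1. Using the first field's timestamp for the merged frame is an assumption made here for simplicity.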
> *Alternative solution:* Return field packet from h264_parse()
> immediately, but somehow tell libavformat that the packet does not
> represent a full frame and second field has to be read as well. Read it
> in libavformat, extending the existing packet. Thus, av_read_frame()
> returns then full frame.
you might want to look at
svn di -r12162:12161
> Now the question: Which solution is the "right" one? I'd go for the
> first one or possibly for the alternative. The first proposed solution
> seems to be most "compatible", since we don't need to extend AVPacket to
> address the issue.
> Your opinions? Or eventually a different idea?
The avparser for h264 should take the input timestamps from the demuxer,
decode all the relevant SEIs and headers and return the correctly
> Further, I'd propose keeping a small cache of (PTS, position,
> convergence_duration) triples for frames containing SEI recovery point
> message, so the seeking around "current" location would be faster.
> Reason: video editing software, where we often need to seek one frame
> Your opinions/suggestions?
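
The proposed cache of (PTS, position, convergence_duration) triples could look roughly like the following ring buffer. This is a sketch under assumptions, not existing libavformat code: `struct rp_cache`, `rp_cache_add()`, `rp_cache_find()` and the capacity of 8 are all hypothetical. It also assumes that convergence_duration is the time from the recovery point's PTS until decoder output is correct, so an entry is usable for a target when pts + convergence_duration <= target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RP_CACHE_SIZE 8   /* arbitrary capacity chosen for illustration */

/* Hypothetical structure: one cached SEI recovery point. */
struct rp_entry {
    int64_t pts;                   /* timestamp of the recovery point */
    int64_t pos;                   /* byte position in the stream */
    int64_t convergence_duration;  /* time until decoded output is correct */
};

struct rp_cache {
    struct rp_entry e[RP_CACHE_SIZE];
    int count;   /* number of valid entries */
    int next;    /* slot that will be overwritten next (ring buffer) */
};

/* Remember a recovery point, overwriting the oldest one when full. */
static void rp_cache_add(struct rp_cache *c, int64_t pts, int64_t pos, int64_t conv)
{
    assert(c);
    c->e[c->next] = (struct rp_entry){ pts, pos, conv };
    c->next = (c->next + 1) % RP_CACHE_SIZE;
    if (c->count < RP_CACHE_SIZE)
        c->count++;
}

/*
 * Return the latest cached recovery point from which decoding has
 * converged by target_pts, or NULL if no cached entry qualifies.
 */
static const struct rp_entry *rp_cache_find(const struct rp_cache *c, int64_t target_pts)
{
    const struct rp_entry *best = NULL;

    assert(c);
    for (int i = 0; i < c->count; i++) {
        const struct rp_entry *e = &c->e[i];
        if (e->pts + e->convergence_duration <= target_pts &&
            (!best || e->pts > best->pts))
            best = e;
    }
    return best;
}
```

A seek to a nearby frame would first call `rp_cache_find()` and fall back to the regular (slower) search only when it returns NULL.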
> *3. Reporting frame type to libavformat*
> This is a minor thing, but still important for correct computation of
> PTS/DTS and key frame flags. compute_pkt_fields() relies on having the
> information about picture type (I/P/B-frame). However, H.264 doesn't
> have strict I/P/B frames, there is even a possibility to have mixed-type
> slices inside of one frame. Indeed, my camcorder produces in interlaced
> mode top field as I-slice and bottom field as P-slice referring to the
> top field.
> So my suggestion is to report picture type I-frame for key frames (which
> frames are key frames is discussed above) and to report P-frame for all frames
> containing only P- and I- slices. Other frames containing also B-slices
> will be reported as B-frames.
this is technically correct i agree, but because it takes time and the
information is effectively useless, there is no relation between pict_type
and timestamps ...
we can take a shortcut and just use the type of the first slice
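
To compare the two rules discussed here, a minimal sketch follows. The enums and both function names are hypothetical and stand in for whatever constants the decoder actually uses. `classify_all_slices()` follows the quoted proposal (I for key frames, B if any B-slice is present, otherwise P), while `classify_first_slice()` follows the shortcut of looking only at the first slice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enums; real code would use the decoder's own constants. */
enum slice_type { SLICE_P, SLICE_B, SLICE_I };
enum pict_type  { PICT_I, PICT_P, PICT_B };

/* Quoted proposal: I for key frames, B if any B-slice, otherwise P. */
static enum pict_type classify_all_slices(const enum slice_type *s, size_t n, int is_key)
{
    assert(s && n > 0);
    if (is_key)
        return PICT_I;
    for (size_t i = 0; i < n; i++)
        if (s[i] == SLICE_B)
            return PICT_B;
    return PICT_P;   /* only P- and I-slices were seen */
}

/* Shortcut from the reply: use the type of the first slice only. */
static enum pict_type classify_first_slice(const enum slice_type *s, size_t n)
{
    assert(s && n > 0);
    switch (s[0]) {
    case SLICE_I: return PICT_I;
    case SLICE_B: return PICT_B;
    default:      return PICT_P;
    }
}
```

For the camcorder case described above (top field as I-slice, bottom field as P-slice, not a key frame), the first function reports P and the shortcut reports I, so the two rules can disagree. The shortcut avoids scanning every slice.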
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes