[FFmpeg-devel] mpegts_write_header(), the buffering and thus reordering of audio samples.

Fri Nov 4 02:22:50 CET 2011

On Thu, Nov 03, 2011 at 05:34:21PM -0700, Will Korbe wrote:
> On Thu, Nov 3, 2011 at 1:50 PM, Will Korbe <wkorbe at gmail.com> wrote:
> > On Thu, Nov 3, 2011 at 1:42 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> On Thu, Nov 03, 2011 at 01:21:54PM -0700, Will Korbe wrote:
> >>> On Thu, Nov 3, 2011 at 12:09 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>> > On Wed, Nov 02, 2011 at 04:13:57PM -0700, Will Korbe wrote:
> >>> >> On Tue, Nov 1, 2011 at 11:39 AM, Will Korbe <wkorbe at gmail.com> wrote:
> >>> >> > On Mon, Oct 31, 2011 at 7:41 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>> >> >> On Mon, Oct 31, 2011 at 06:24:07PM -0700, Will Korbe wrote:
> >>> >> >>> Hello,
> >>> >> >>>
> >>> >> >>> I hopefully have a minor question about the
> >>> >> >>> libavformat/mpegtssec.c:mpegts_write_header() implementation. My hours
> >>> >> >>> of online searching and specification reading for the answer has been
> >>> >> >>> unsuccessful.
> >>> >> >>>
> >>> >> >>> I see there is an efficiency built in to bundle multiple audio samples
> >>> >> >>> into one PES packet and was wondering if someone could point out a
> >>> >> >>> paragraph from the MPEG2 specification which indicates the reordering
> >>> >> >>> of TS Packets this causes is okay? I have a video with many 41 byte
> >>> >> >>> audio samples at the start, and this seems to push the first audio
> >>> >> >>> samples 5 seconds into the output. At this point, the PTS and DTS for
> >>> >> >>> the audio samples are beyond the last PCR value by 5 seconds. The
> >>> >> >>> video plays fine in QuickTime, the audio and video are precisely in
> >>> >> >>> sync, so I assume QuickTime has logic to buffer the video frames until
> >>> >> >>> seeming the first audio sample to account for this, but I wonder if
> >>> >> >>> most MPEG2TS players handle this the same way. If this is spelled out
> >>> >> >>> in the specification, I would sure feel more comfortable with it.
> >>> >> >>>
> >>> >> >>> Any information would be useful.
> >>> >> >>
> >>> >> >> I dont remember a tight limit on the PCR-DTS distance in the spec but
> >>> >> >> then i dont remember the spec very well ...
> >>> >> >> but having this large is bad because this means a long startup delay.
> >>> >> >> So 5 seconds is IMHO not appropriate by default.
> >>> >> >> Of course that delay would only affect actual broadcast and not playing
> >>> >> >> from a local file with a modern player.
> >>> >> >> the user parameter max_delay should be able to limit this, if not
> >>> >> >> a bugreport & patch are welcome
> >>> >> >>
> >>> >> >>
> >>> >> >> [...]
> >>> >> >> --
> >>> >> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>> >> >>
> >>> >> >> Asymptotically faster algorithms should always be preferred if you have
> >>> >> >> asymptotical amounts of data
> >>> >> >>
> >>> >> >> _______________________________________________
> >>> >> >> ffmpeg-devel mailing list
> >>> >> >> ffmpeg-devel at ffmpeg.org
> >>> >> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>> >> >>
> >>> >> >>
> >>> >> >
> >>> >> > Hi Michael,
> >>> >> >
> >>> >> > Thank you for your response. From what I can tell, max_delay does not
> >>> >> > limit the amount of audio sample buffering, rather, it is a macro,
> >>> >> > DEFAULT_PES_PAYLOAD_SIZE (2930 bytes). In my test case,
> >>> >> > AVFormatContext::max_delay is set to 50000 (50 ms). My sample video is
> >>> >> > actually a movie trailer with close to 5 seconds of silence before the
> >>> >> > audio sample byte sizes become substantial, after that point, the
> >>> >> > audio samples are large enough that the PTS/DTS values become just 500
> >>> >> > ms behind the last PCR. Given this, it seems when there are
> >>> >> > multi-second periods of silence, the PTS/DTS of an audio sample has
> >>> >> > the potential to become multi-seconds behind the last PCR in the TS
> >>> >> > Packet Stream. As you mentioned, this may be acceptable with regards
> >>> >> > to the specification and/or current playback implementations, however,
> >>> >> > to attempt to handle more cases, perhaps the buffering of audio
> >>> >> > samples should be limited by max_delay as you suggest, that would seem
> >>> >> > to make it more inline with the mpeg2 timing model, but I'm not and
> >>> >> > expert in this area. I plan to investigate further to see if a BUG
> >>> >> > report is warranted.
> >>> >> >
> >>> >> > Thank you,
> >>> >> > Will
> >>> >> >
> >>> >>
> >>> >> When I study the MPEG2 timing and buffering model, it would seem to
> >>> >> indicate the PCR value is the system-clock used to synchronize access
> >>> >> units (audio samples and video frames). For the decoding side, the
> >>> >> only delay specified is the buffers after the demux step, apparently
> >>> >> one for the video frames, and one for audio samples. If the MPEG2TS
> >>> >> stream was processed serially where the PCR value indicates the
> >>> >> current system time as it is received in the MPEGTS stream, then in
> >>> >> the audio or video buffers, if any access units have a DTS value which
> >>> >> matches the current PCR, those access units are instantly removed from
> >>> >> the buffer and decoded. If we don't have a DTS, the PTS is considered.
> >>> >> When any of the access units have a PTS equal to the current PCR, then
> >>> >> those access units are presented. Given this, it seems apparent to me
> >>> >> that this is not how some players are implemented, otherwise the audio
> >>> >> would arrive late, after the video was decoded and presented, and
> >>> >> would have been dropped. I suspect players have an additional layer of
> >>> >> logic and an initial delay built-in to handle MPEG2TS streams that are
> >>> >> not quite spec compliant, otherwise my test video stream wouldn't have
> >>> >> played correctly. If anyone knows more about this and can point out a
> >>> >> flaw in my reasoning, please let me know.
> >>> >>
> >>> >> If I compare ffmpeg's MPEG2TS output with Apple's mediastreamsegmenter
> >>> >> MPEG2TS output, the mediastreamsegmenter tool also groups audio
> >>> >> samples together to form larger PES packets, however, it places this
> >>> >> group of audio samples in the correct location in the MPEG2TS stream
> >>> >> (DTS and PTS < PCR). I suspect they buffer both audio samples and
> >>> >> video frames to be able to do this, using considerably more memory in
> >>> >> the process.
> >>> >>
> >>> >> I work on an application that doesn't have the luxury to use extra
> >>> >> memory, so I patched the mpegts_write_pes function making a trade-off
> >>> >> which increases the output overhead slightly in some cases, but it
> >>> >> does not use extra memory. It also was a minor modification. In
> >>> >> mpegts_write_pes(), when it is determined a PCR will be written, which
> >>> >> is just for video frames, I check the AVStreams associated to the
> >>> >> AVFormatContext, and if they are buffering access units where the
> >>> >> first DTS is not less than the PCR value, I write out the buffered
> >>> >> access units before continuing to write out the PCR value. This keeps
> >>> >> everything in order.
> >>> >>
> >>> >> The extra overhead bytes introduced is due to the additional stuffing
> >>> >> bytes which results from not filling the TS Packets as completely as
> >>> >> before with audio samples, and due to the additional PES headers. The
> >>> >> patch still allows audio samples to get grouped, but only while their
> >>> >> DTS value is less than the next PCR value mpegts_write_pes will add to
> >>> >> the output stream. With a max_delay of 50ms, I noticed an increase in
> >>> >> bytes of 0.8%. With a max_dealy of 1000ms, I didn't see an increase in
> >>> >> overhead since this allows more audio samples to get grouped.
> >>> >>
> >>> >> I suspect this patch isn't general purpose enough, some would rather
> >>> >> use more memory instead of increasing the potential container
> >>> >> overhead, for example. Also, this was just tested with ffmpeg 0.5
> >>> >> since that is the version we are currently working with here, but I
> >>> >> forward ported it below to the latest version and it is shown below to
> >>> >> see if anyone has comments regarding it.
> >>> >
> >>> > i have some comments
> >>> > i think write_pcr can be set later and pcr is not set then and thus
> >>> > used uninitialized
> >>> >
> >>> > also somehow it feels like the code would fit better in
> >>> > mpegts_write_packet().
> >>> > I mean calling mpegts_write_pes() for all streams out of
> >>> > mpegts_write_pes() itself feels like a recipe for overflowing the
> >>> > stack if theres some way to cause this to call too deep recursively
> >>> >
> >>> > and i think the code needs to be always called not just when
> >>> > write_pcr is 1
> >>> >
> >>> > either way the patch is a needed bugfix and if you improve it iam
> >>> > happy to include it
> >>> >
> >>> > [..]
> >>> > --
> >>> > Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>> >
> >>> > I know you won't believe me, but the highest form of Human Excellence is
> >>> > to question oneself and others. -- Socrates
> >>> >
> >>> > _______________________________________________
> >>> > ffmpeg-devel mailing list
> >>> > ffmpeg-devel at ffmpeg.org
> >>> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>> >
> >>> >
> >>>
> >>> Hi Michael,
> >>>
> >>> Thank you for your comments, I'll try to address some of them below.
> >>>
> >>> > i think write_pcr can be set later and pcr is not set then and thus
> >>> > used uninitialized
> >>>
> >>> Ah, yes, in the latest ffmpeg version that is true, in the 0.5 version
> >>> it wasn't, I missed that in the forward port due to haste in an
> >>> attempt to get some feedback sooner. This definitely needs fixed.
> >>>
> >>> > also somehow it feels like the code would fit better in
> >>> > mpegts_write_packet().
> >>>
> >>> Yes, I also would have preferred that as well, it is what I first
> >>> looked at, but it did not look trivial to attempt to predict if it was
> >>> time to write one more or more PCRs, and what their values would be
> >>> from within mpegts_write_packet(). Inside mpegts_write_pes(), there is
> >>> a while loop, and depending on certain conditions, in between one of
> >>> the many TS_Packets, a PCR is added during one of the loop iterations
> >>> and potentially during more than one of the loop iterations, though
> >>> not likely.
> >>>
> >>> > I mean calling mpegts_write_pes() for all streams out of
> >>> > mpegts_write_pes() itself feels like a recipe for overflowing the
> >>> > stack if theres some way to cause this to call too deep recursively
> >>>
> >>> I did ensure the number of recursive calls is limited by the number of
> >>> AVStreams minus 1 associated to the AVFormatContext (setting
> >>> ts_st2->payload_index to 0 before calling mpegts_write_pes() again,
> >>> and not calling mpegts_write_pes() on itself). I assumed the number of
> >>> streams associated to a context is usually relatively low, it was 2
> >>> for our use case, one video and one audio, would this get larger than
> >>> say 10? .. and yes, I agree, the recursive nature of this is tricky, I
> >>> did try to think through all of the permutations that could arise.
> >>> Since write_pcr is only true for video streams, and we don't buffer
> >>> video frames, it seems there shouldn't be any out of order issues
> >>> causing a PCR value to be written before the access units with a lower
> >>> DTS/PTS value.
> >>>
> >>
> >>> > and i think the code needs to be always called not just when
> >>> > write_pcr is 1
> >>>
> >>> I chose to do this when write_pcr is 1 since I was only concerned
> >>> about keeping the access unit's DTS below the PCR. If we are not
> >>> writing a PCR value, then this can't occur. Was there another case
> >>> that occurred to you?
> >>
> >> the way i remember the spec, the PCR increases even when its not
> >> stored. so to keep DTS in the future of when the data actually is
> >> received in a reference decoder. Checking would be needed more often.
> >>
> >> [...]
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>
> >> In fact, the RIAA has been known to suggest that students drop out
> >> of college or go to community college in order to be able to afford
> >> settlements. -- The RIAA
> >>
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel at ffmpeg.org
> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >>
> >
> >> the way i remember the spec, the PCR increases even when its not
> >> stored. so to keep DTS in the future of when the data actually is
> >> received in a reference decoder. Checking would be needed more often.
> >
> > I see, I guess we could end up with a case where the PCR increased
> > beyond the the lowest buffered audio sample's DTS, and the next PCR to
> > write/store would be greater than the audio-sample's DTS, but we are
> > not at the point to write/store it yet.
> >
> > Thank you,
> > Will
> >
> 
> I've cleaned up the patch based on the comments thus far, with one
> exception, the recursive call is still in place. I would like to come
> up with a way to remove the recursive call and take care of flushing
> the buffers in mpegts_write_packet(), but it looks like we need to
> predict from within mpegts_write_packet() what the newest PCR value
> will be after calling mpegts_write_pes() for non-buffered access units
> to see if we need to flush out buffered access units beforehand. In
> the case when mux_rate is 0, that is trivial, but when mux_rate > 0,
> it seems that each while-loop iteration in mpegts_write_pes could
> increase the PCR value, and its not a simple division of the payload
> length by the TS_PACKET_SIZE to figure out how many while-loop
> iterations there will be to calculate what the PCR value will be in
> the end since various headers are added under certain conditions that
> could increase the number of while-loop iterations. It looks possible
> to do this though, I will think about this some more.

you might also want to look at output_packet() of libavformat/mpegenc.c
which handles this in mpeg-PS simply by outputing based on the
buffer fullness. This works because the closer dts are to SCR (PCR here)
the closer the buffers are to being empty.

also indention depth should be 4 in ffmpeg.

thanks for your work on this
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111104/ace3cf5f/attachment.asc>