[FFmpeg-devel] mpegts_write_header(), the buffering and thus reordering of audio samples.
Michael Niedermayer
michaelni at gmx.at
Thu Nov 3 21:42:36 CET 2011
On Thu, Nov 03, 2011 at 01:21:54PM -0700, Will Korbe wrote:
> On Thu, Nov 3, 2011 at 12:09 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Nov 02, 2011 at 04:13:57PM -0700, Will Korbe wrote:
> >> On Tue, Nov 1, 2011 at 11:39 AM, Will Korbe <wkorbe at gmail.com> wrote:
> >> > On Mon, Oct 31, 2011 at 7:41 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> >> On Mon, Oct 31, 2011 at 06:24:07PM -0700, Will Korbe wrote:
> >> >>> Hello,
> >> >>>
> >> >>> I hopefully have a minor question about the
> >> >>> libavformat/mpegtssec.c:mpegts_write_header() implementation. My hours
> >> >>> of online searching and specification reading for the answer has been
> >> >>> unsuccessful.
> >> >>>
> >> >>> I see there is an efficiency built in to bundle multiple audio samples
> >> >>> into one PES packet and was wondering if someone could point out a
> >> >>> paragraph from the MPEG2 specification which indicates the reordering
> >> >>> of TS Packets this causes is okay? I have a video with many 41 byte
> >> >>> audio samples at the start, and this seems to push the first audio
> >> >>> samples 5 seconds into the output. At this point, the PTS and DTS for
> >> >>> the audio samples are beyond the last PCR value by 5 seconds. The
> >> >>> video plays fine in QuickTime, the audio and video are precisely in
> >> >>> sync, so I assume QuickTime has logic to buffer the video frames until
> >> >>> seeming the first audio sample to account for this, but I wonder if
> >> >>> most MPEG2TS players handle this the same way. If this is spelled out
> >> >>> in the specification, I would sure feel more comfortable with it.
> >> >>>
> >> >>> Any information would be useful.
> >> >>
> >> >> I dont remember a tight limit on the PCR-DTS distance in the spec but
> >> >> then i dont remember the spec very well ...
> >> >> but having this large is bad because this means a long startup delay.
> >> >> So 5 seconds is IMHO not appropriate by default.
> >> >> Of course that delay would only affect actual broadcast and not playing
> >> >> from a local file with a modern player.
> >> >> the user parameter max_delay should be able to limit this, if not
> >> >> a bugreport & patch are welcome
> >> >>
> >> >>
> >> >> [...]
> >> >> --
> >> >> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> >>
> >> >> Asymptotically faster algorithms should always be preferred if you have
> >> >> asymptotical amounts of data
> >> >>
> >> >> _______________________________________________
> >> >> ffmpeg-devel mailing list
> >> >> ffmpeg-devel at ffmpeg.org
> >> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >> >>
> >> >>
> >> >
> >> > Hi Michael,
> >> >
> >> > Thank you for your response. From what I can tell, max_delay does not
> >> > limit the amount of audio sample buffering, rather, it is a macro,
> >> > DEFAULT_PES_PAYLOAD_SIZE (2930 bytes). In my test case,
> >> > AVFormatContext::max_delay is set to 50000 (50 ms). My sample video is
> >> > actually a movie trailer with close to 5 seconds of silence before the
> >> > audio sample byte sizes become substantial, after that point, the
> >> > audio samples are large enough that the PTS/DTS values become just 500
> >> > ms behind the last PCR. Given this, it seems when there are
> >> > multi-second periods of silence, the PTS/DTS of an audio sample has
> >> > the potential to become multi-seconds behind the last PCR in the TS
> >> > Packet Stream. As you mentioned, this may be acceptable with regards
> >> > to the specification and/or current playback implementations, however,
> >> > to attempt to handle more cases, perhaps the buffering of audio
> >> > samples should be limited by max_delay as you suggest, that would seem
> >> > to make it more inline with the mpeg2 timing model, but I'm not and
> >> > expert in this area. I plan to investigate further to see if a BUG
> >> > report is warranted.
> >> >
> >> > Thank you,
> >> > Will
> >> >
> >>
> >> When I study the MPEG2 timing and buffering model, it would seem to
> >> indicate the PCR value is the system-clock used to synchronize access
> >> units (audio samples and video frames). For the decoding side, the
> >> only delay specified is the buffers after the demux step, apparently
> >> one for the video frames, and one for audio samples. If the MPEG2TS
> >> stream was processed serially where the PCR value indicates the
> >> current system time as it is received in the MPEGTS stream, then in
> >> the audio or video buffers, if any access units have a DTS value which
> >> matches the current PCR, those access units are instantly removed from
> >> the buffer and decoded. If we don't have a DTS, the PTS is considered.
> >> When any of the access units have a PTS equal to the current PCR, then
> >> those access units are presented. Given this, it seems apparent to me
> >> that this is not how some players are implemented, otherwise the audio
> >> would arrive late, after the video was decoded and presented, and
> >> would have been dropped. I suspect players have an additional layer of
> >> logic and an initial delay built-in to handle MPEG2TS streams that are
> >> not quite spec compliant, otherwise my test video stream wouldn't have
> >> played correctly. If anyone knows more about this and can point out a
> >> flaw in my reasoning, please let me know.
> >>
> >> If I compare ffmpeg's MPEG2TS output with Apple's mediastreamsegmenter
> >> MPEG2TS output, the mediastreamsegmenter tool also groups audio
> >> samples together to form larger PES packets, however, it places this
> >> group of audio samples in the correct location in the MPEG2TS stream
> >> (DTS and PTS < PCR). I suspect they buffer both audio samples and
> >> video frames to be able to do this, using considerably more memory in
> >> the process.
> >>
> >> I work on an application that doesn't have the luxury to use extra
> >> memory, so I patched the mpegts_write_pes function making a trade-off
> >> which increases the output overhead slightly in some cases, but it
> >> does not use extra memory. It also was a minor modification. In
> >> mpegts_write_pes(), when it is determined a PCR will be written, which
> >> is just for video frames, I check the AVStreams associated to the
> >> AVFormatContext, and if they are buffering access units where the
> >> first DTS is not less than the PCR value, I write out the buffered
> >> access units before continuing to write out the PCR value. This keeps
> >> everything in order.
> >>
> >> The extra overhead bytes introduced is due to the additional stuffing
> >> bytes which results from not filling the TS Packets as completely as
> >> before with audio samples, and due to the additional PES headers. The
> >> patch still allows audio samples to get grouped, but only while their
> >> DTS value is less than the next PCR value mpegts_write_pes will add to
> >> the output stream. With a max_delay of 50ms, I noticed an increase in
> >> bytes of 0.8%. With a max_dealy of 1000ms, I didn't see an increase in
> >> overhead since this allows more audio samples to get grouped.
> >>
> >> I suspect this patch isn't general purpose enough, some would rather
> >> use more memory instead of increasing the potential container
> >> overhead, for example. Also, this was just tested with ffmpeg 0.5
> >> since that is the version we are currently working with here, but I
> >> forward ported it below to the latest version and it is shown below to
> >> see if anyone has comments regarding it.
> >
> > i have some comments
> > i think write_pcr can be set later and pcr is not set then and thus
> > used uninitialized
> >
> > also somehow it feels like the code would fit better in
> > mpegts_write_packet().
> > I mean calling mpegts_write_pes() for all streams out of
> > mpegts_write_pes() itself feels like a recipe for overflowing the
> > stack if theres some way to cause this to call too deep recursively
> >
> > and i think the code needs to be always called not just when
> > write_pcr is 1
> >
> > either way the patch is a needed bugfix and if you improve it iam
> > happy to include it
> >
> > [..]
> > --
> > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >
> > I know you won't believe me, but the highest form of Human Excellence is
> > to question oneself and others. -- Socrates
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> >
>
> Hi Michael,
>
> Thank you for your comments, I'll try to address some of them below.
>
> > i think write_pcr can be set later and pcr is not set then and thus
> > used uninitialized
>
> Ah, yes, in the latest ffmpeg version that is true, in the 0.5 version
> it wasn't, I missed that in the forward port due to haste in an
> attempt to get some feedback sooner. This definitely needs fixed.
>
> > also somehow it feels like the code would fit better in
> > mpegts_write_packet().
>
> Yes, I also would have preferred that as well, it is what I first
> looked at, but it did not look trivial to attempt to predict if it was
> time to write one more or more PCRs, and what their values would be
> from within mpegts_write_packet(). Inside mpegts_write_pes(), there is
> a while loop, and depending on certain conditions, in between one of
> the many TS_Packets, a PCR is added during one of the loop iterations
> and potentially during more than one of the loop iterations, though
> not likely.
>
> > I mean calling mpegts_write_pes() for all streams out of
> > mpegts_write_pes() itself feels like a recipe for overflowing the
> > stack if theres some way to cause this to call too deep recursively
>
> I did ensure the number of recursive calls is limited by the number of
> AVStreams minus 1 associated to the AVFormatContext (setting
> ts_st2->payload_index to 0 before calling mpegts_write_pes() again,
> and not calling mpegts_write_pes() on itself). I assumed the number of
> streams associated to a context is usually relatively low, it was 2
> for our use case, one video and one audio, would this get larger than
> say 10? .. and yes, I agree, the recursive nature of this is tricky, I
> did try to think through all of the permutations that could arise.
> Since write_pcr is only true for video streams, and we don't buffer
> video frames, it seems there shouldn't be any out of order issues
> causing a PCR value to be written before the access units with a lower
> DTS/PTS value.
>
> > and i think the code needs to be always called not just when
> > write_pcr is 1
>
> I chose to do this when write_pcr is 1 since I was only concerned
> about keeping the access unit's DTS below the PCR. If we are not
> writing a PCR value, then this can't occur. Was there another case
> that occurred to you?
the way i remember the spec, the PCR increases even when its not
stored. so to keep DTS in the future of when the data actually is
received in a reference decoder. Checking would be needed more often.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In fact, the RIAA has been known to suggest that students drop out
of college or go to community college in order to be able to afford
settlements. -- The RIAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111103/59599e21/attachment.asc>
More information about the ffmpeg-devel
mailing list