[MPlayer-dev-eng] [PATCH] replacement for internal mpg123 fork (mp3lib)

Thu May 13 03:52:15 CEST 2010

On Thu, 2010-05-13 at 02:14 +0200, Thomas Orgis wrote:
> > You're buffering data internally and there's no particularly precise
> > tracking which input part certain output corresponds to. This means that
> > audio timestamps will be inaccurate.
> 
> The internal buffering amounts to:
> 1. Data that is skipped (ID3v2, junk)
> 2. The data for the current frame to be decoded.
> 3. The header of a following frame (checked during initial/re-sync to
> confirm that the current frame is valid).
> 4. Whatever is still left over from the bigger input chunk handed in.
> 
> So... if you can get the stream chopped in to proper packets from the
> demuxer (or whatsitsname ... I am not so experienced with video
> decoders) and essentially hand data in pieces of whole MPEG frames, we
> are talking of one frame being buffered ahead, typically 1/38th second.

You're talking about a potential different implementation that handles
things at a packet level, not the current patch? The current patch seems
to feed the decoder 1024 bytes at the time, which should make your "4."
the largest contribution and result in more buffering than you're
talking about here.

> If you don't want the ahead buffering, you just have to disable
> MPG123_SEEKBUFFER and feed the data in exact chunks ... and grab all
> samples that can be delivered before mpg123 signals MPG123_NEED_MORE.
> This can be less than a full frame's worth because of cutting leading
> silence added as encoder padding and decoder delay. Do you have a take
> on that? Should that feature rather be disabled?
> At least for the first frame(s), the relation between input bytes and
> output bytes is not that simple.

Well if some reasonably close guess is the best you can do, then that'll
have to be used... Minor inaccuracies shouldn't matter too much (and it
may not be certain how accurately the encoder assigned timestamps to the
packets in those cases either).

> For the proper feeding it would be crucial here to have proper MPEG
> stream parsing in advance, though.

Adding parsing is possible, but does add some complexity so a separate
parsing step is better avoided if reasonably easy to do.
(It's almost certain that some mp3 input streams will not be correctly
packetized, so relying on things working without parsing is not an
option.)

> But then, if we'd use the API that is large file sensitive (a read
> callback), one can make mpg123 just read exactly the frame data needed
> for decoding instead of feeding a fixed input block.

I think current ad_mp3lib does basically this, though with the second
timing method I mentioned. I haven't looked at its sync behavior in
detail, but my impression is that while status line behavior shows it's
not completely accurate (and it cannot be because it uses the
bitrate-dependent timing method with VBR) the buffering amounts stay low
enough that the inaccuracies don't cause problems in most practical
cases.

> > There are two mechanisms that can be used for audio time tracking in
> > MPlayer. The more precise one is setting (in the decoder) sh_audio->pts
> > to the pts of the last audio packet, and sh_audio->pts_bytes to how many
> > bytes of output have been produced since the start of that packet. Then
> > the precise pts at the end of audio decoded so far can be calculated as
> > sh_audio->pts + sh_audio->pts_bytes*output_seconds_per_byte.
> 
> So, one would need the notion of MPEG frames as the unit that is used
> for reading, the current count of MPEG frames since the last resync...
> and how many samples of the current one have been extracted.

Counting MPEG frames wouldn't really work I think. If you assume
properly packetized input then frames will correspond to packets, but if
you handle other kinds of input directly in the decoder then frame
counts don't help much. You need to know what _packet_ given output
corresponds to, even if the packet contained multiple MPEG frames or
only part of one. And even if you assume packetized input you at least
have to deal correctly with the case where a packet is corrupted and
doesn't really contain a frame (total counts since resync feel quite
fragile for tracking things).

Assigning the timestamps is necessarily somewhat arbitrary without
properly packetized input, as it's not clear exactly which point the
timestamp should correspond to - the MPEG frame that continues at the
start of packet or the first new frame starting after the timestamp are
the most obvious choices.

If you have a read callback function and keep track of the timestamps in
the last packet(s) read that could work, at least assuming there's no
extra buffering behavior in the decoder or you at least know what it is.

> Of course, it would be possible to code both variants into the one
> source and decide in configure with variant can be used. After the work
> I already spent on this, it wouldn't be too bad. But: In this case, I
> would rather do a cleaner cut and use the callback input via void*
> handle that is present with mpg123 version 1.12, in addition to the
> dual-mode large file support.
> So: Good sync with mpg123 >= 1.12, workable playback with mpg123 >=
> 1.0.0 .
> 
> Opinions?

Well it's hard to comment on the dual-mode idea since I'm not familiar
with the API and don't know how just how simple or complex it would be.

>  Also some impression on how bad the a/v sync actually is with
> the current patch would be nice.

I haven't tried to calculate or measure a precise amount, but it seems
the sync could vary by amounts corresponding to over 1000 input bytes,
which would be quite substantial with low-bitrate audio.