[MPlayer-dev-eng] [PATCH] replacement for internal mpg123 fork (mp3lib)

Thu May 13 16:44:36 CEST 2010

On Thu, 2010-05-13 at 15:31 +0200, Thomas Orgis wrote:
> Am Thu, 13 May 2010 04:52:15 +0300
> schrieb Uoti Urpala <uoti.urpala at pp1.inet.fi>: 
> 
> > You're talking about a potential different implementation that handles
> > things at a packet level, not the current patch?
> 
> Yes. I'm guessing ... Reimar mentioned ds_read_packet() ... that would
> give input in form of packets as they are in the a/v stream, I presume.
> But those packets are not mpeg frames.

I think the packets should be frames at least sometimes (but in practice
they're not guaranteed to always be).

> > > For the proper feeding it would be crucial here to have proper MPEG
> > > stream parsing in advance, though.
> > 
> > Adding parsing is possible, but does add some complexity so a separate
> > parsing step is better avoided if reasonably easy to do.
> 
> I wasn't sure about that yet... so there isn't any MPEG parsing going
> on in MPlayer, yes? So if a stream, be it embedded in a video or as
> standalone audio file, contains funny things like LAME header or ID3
> tags, those will still be present. All data that does not correlate to

Standalone audio mp3 can be demuxer at least by MPlayer's native
demux_audio (which handles mp3, wav and flac audio-only files) and
libavformat's mp3 demuxer. I think the latter separately parses id3
tags.

> The more I think about that part, the more I hope that nobody puts such
> "enriched" MPEG audio into videos... but then, I might not grasp what

Even if something like that caused temporarily incorrect timestamps at
the start of the video it shouldn't have much effect later.

> > (It's almost certain that some mp3 input streams will not be correctly
> > packetized, so relying on things working without parsing is not an
> > option.)
> 
> Since we settled that there is no parsing at the MPEG level going on...
> then the only thing MPlayer does is scan the data a bit to determine
> what to decoder to run on it.

The audio file demuxers do try to split frames into separate packets. My
comment above was mostly referring to video files which have the audio
data already split into packets at the video container format level,
where that split doesn't correspond to mpeg frames. That case would
require a separate parsing step after demuxing to change the packet
boundaries.

>  So... you could point me to the code that
> prevents free format MP3 streams being sent to mpg123. We could decode
> them without fuss. They're not massively relevant, but possible.
> 
> Example: http://mpg123.org/test/drum-free133.mp3 

Seems that neither the default demux_audio nor lavf recognize that as a
valid mp3 file, so MPlayer is unable to determine file format and skips
it.

> Actually, when we decide that counting MPEG frames doesn't yield
> much... I think that we don't actually need to leave the current api of
> feeding blocks of data to mpg123_decode(). There is a way to get
> packets from MPlayer, with a timestamp, right (be it real or made up)? 
> If we feed one packet and read decoded data from mpg123 until it
> signals that it needs more data, we get the data corresponding to that
> packet.

Though you said before that it can wait to receive the header of the
following frame before outputting the current one? That'd affect things
a bit (though not necessarily much).

>  There is still the question of MPEG frames being split among
> packets, but this sounds like much less of an issue than blindly
> pushing 1K blocks (or 0.5K blocks... or whatnot).
> 
> Can you show me an example of the API to use in ad_mpg123.c to get
> packetized input with pts? We might get it done properly and still keep
> the compatibility to about any libmpg123 out there.

double pts;
char *start;
int len = ds_get_packet_pts(sh_audio->ds, &start, &pts);
if (len <= 0)
    EOF/ERROR;

The packet is "len" bytes from "start". The data does not need to be
separately freed; it's valid until you read more from the stream and at
that point is automatically freed. pts may be MP_NOPTS_VALUE meaning
that the packet has no timestamp.

Assuming that a decoder immediately outputs data corresponding to the
frame, the timing code can look something like this:

if (pts != MP_NOPTS_VALUE) {
    sh_audio->pts = pts;
    sh_audio->pts_bytes = 0;
}
int decode_len = decode(start, len);
if (decode_len >= 0)
    sh_audio->pts_bytes += decode_len;

If the audio packet had a timestamp then we mark that as the latest
output timestamp to count from, and set the bytes since that point to 0.
After decoding we increase the count of bytes output since last known
timestamp.