[FFmpeg-devel] [PATCH v3 3/3] aadec: fix seeking in mp3 content
Karsten Otto
ottoka at posteo.de
Fri Jul 6 11:49:46 EEST 2018
> Am 04.07.2018 um 23:54 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>
> Signierter PGP-Teil
> On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
>>
>>> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>
>>> Signierter PGP-Teil
>>> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>>>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>>>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>>>> be used for frame or chapter detection, unfortunately.
>>>>
>>>> More details inline below.
>>>>
>>>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>>
>>>>> Signierter PGP-Teil
>>>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>>>>
>>>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>>>>
>>>>>>> Signierter PGP-Teil
>>>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>>>> also adjust timestamp information accordingly.
>>>>>>>> ---
>>>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>>>>
>>>>>>> This codec specific code in demuxers should not be needed
>>>>>>>
>>>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>>>>
>>>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>>>> painful while wearing headphones!
>>>>>
>>>>> I think you mis-diagnose this at least somewhat
>>>>> your code searches for a specific mp3 header, the parser and decoder would
>>>>> accept a wider range of mp3 variants.
>>>>> But both can choose points that are not mp3 frame starts. (if that is the
>>>>> problem you are seeing, iam not completely sure it is)
>>>>>
>>>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>>>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>>>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>>>> By looking for the more restricted header, my patch finds the real next frame at
>>>> offset 78.
>>>>
>>>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>>>> version ID?
>>>>
>>>>> Also is the more restricted header you search for always used or could
>>>>> this be failing with some files ?
>>>>>
>>>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>>>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>>>>
>>>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>>>> main header which points to chunks. The one file i found has 12 such chunks
>>>>> the first represents the whole file i suspect, the next largest the audio
>>>>> data, another one the metadata.
>>>>> I guess the remaining 2 large ones could be a cover image and an index.
>>>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>>>> order in each file. I guess thats why the original aadec.c implementation just
>>>> looks for the largest chunk to play.
>>>>
>>>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>>>> values. the first in the file i have goes from 0 to 3 the second starts
>>>>> multiple times from 0 and seems monotonly increasing and staying within
>>>>> the filesize.
>>>>> The sample i have does not store mp3 but it looks like this is a index
>>>>> maybe offsets for packets in each of the 3 chapters.
>>>>>
>>>>> Please look at the data, if it can be used. It would be much better than
>>>>> scaning the file linearly and searching for some byte sequence to find
>>>>> packet starts.
>>>>>
>>>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>>>> offsets from the index.
>>>>
>>>
>>>> Long answer:
>>>> All offsets in the index are the same, and matching the "codec_second_size"
>>>> = crypto chunk size, roughly one second of audio:
>>>> - 1045 for format 2 (SIPR 8kbps)
>>>> - 2000 for format 3 (SIPR 16kbps)
>>>> - 3982 for format 4 (MP3 32kbps)
>>>> This is different from the respective frame size, which is 19, 20, and 104/105
>>>
>>> The first 2 are exact multiples of the frame size.
>>>
>>> I dont have a sample for the mp3 case so i cannot check but are you sure
>>> the actually writen numbers dont match up multiples of mp3 frames ?
>>>
>> Yes, I have tested both with some old aa books I still had on disk, and some I
>> bought more recently. Using -fdebug ts, I can clearly see a packet size of
>> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
>> specialized header detection on every chunk, I see that the actual mp3 frame
>> header indeed starts at offset 0 in chunk 0. But then it changes for every
>> subsequent chunk, I have seen a maximum of 103, as to be expected.
>
> Is there a public "aa" file with mp3 ?
> Id like to look at this myself
> also we should add such a file to fatesamples and add some tests with it
>
Sadly no, at least not legally. aa files are always personalized to a user account.
Audible offers some free of charge titles, you can get them without a subscription,
if you sign up / use an Amazon account. To get the aa version, choose
"normal quality"; otherwise you get the successor format aax, which is mp4.
>
>
>>
>>> If not the question that remains is how does the official code seek in this?
>>> It seems a bit odd that they would get the design of the index wrong but then
>>> implement demuxer side searching for a mp3 header not noticing that this is
>>> not working fully reliable.
>>> [This would imply that while they implemented the demuxer they would have
>>> been aware that their index doesnt work, why would they not fix it at this
>>> point ...]
>>> Its hard to imagine they would not notice this bug with the index at all,
>>> not that this is impossible or anything
>>>
>>
>> Agreed. Here is how I think this is actually supposed to work: Every encrypted
>> chunk contains roughly one second of audio. So, if you want to seek to a
>> specific second, you can just look up an index entry. This tells you the chapter
>> number and the offset in the chapter. Now, if you already know the chapter
>> offsets, you can jump right to the correct position in the audio section. (I assume
>> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
>> Regarding the odd mp3 frame offsets, a more restricted header detection will
>> take care of that, as I do in patch 3/3.
>
> The more restricted header detection will fail less often. But it would be
> unexpected that it never fails. Theres not really anything that prevents
> these specific sequence of bits to occur in places other than the header
>
> All this makes it a bit hard to belive for me that this is how the format is
> intended to work.
>
As I said, I think mp3 in aa is more a retrofit than intentional design. It works if
they always use the exact same encoding (as they do from my observations).
Then they can used an *exact* header match to sync, and the chance of the
same 32 bit pattern occurring unintentionally in audio content is highly unlikely.
>
>>
>> Of course, given each chunk is the same size, you don't really need an index,
>> but you can calculate the offset instead. This is what I am doing in patch 2/3.
>> The index only makes sense if the offsets were not linear, but shuffled around.
>> Maybe they did that in the original aa format 1 to scramble the audio, and kept
>> the index for backward compatibility. This is pure speculation though, as I have
>> no information at all about format 1.
>
> or variable bitrate
>
Keep in mind this is an index of crypto blocks = seconds, not mp3 frames. This
would not work very well with VBR.
>
>>
>> So, the whole thing boils down to knowing the chapter offsets. From the
>> current aadec.c implementation I know that this information exist inline in the
>> audio section, so that's what I am using in patch 2/3. As I said, the chapter
>> information might also exist somewhere else in the format, but finding it would
>> require more reverse engineering, which I am not comfortable with.
>>
>> Yes, my current patch requires a couple dozen seek operations on the file,
>> which is not optimal. But I don't think this is too much of a performance hit, as
>> nowadays everybody has an OS with a sturdy disk block cache, or read directly
>> from RAM storage anyway. It works at least, so I can use my old aa books on
>> my Linux phone now :-)
>
> people do also play stuff remotely and we had bug reports about things like
> http doing pretty poorly with seeks.
>
Ok, I had not thought about that. But the aa format was always intended for
download, never for streaming, as you can see in the format design. You can't
really expect it to work great for something it isn't supposed to to.
Anyways, without seek support, you cannot really *use* aa format as an audio
book: You loose the ability of jumping to chapters, skip back 30 secs, quit and
resume later where you left off, etc. My intention with this patch series was to
make aa more usable in that sense, even if it is not a perfect technical solution.
I don't know if that is good enough to warrant inclusion in the ffmpeg codebase.
>
> Thanks
>
Thank *you* for the great feedback!
Cheers, Karsten
More information about the ffmpeg-devel
mailing list