[FFmpeg-devel] update data_offset field in format context

Baptiste Coudurier baptiste.coudurier
Fri Nov 7 00:56:28 CET 2008

Hi guys,

Michael Niedermayer wrote:
> On Thu, Nov 06, 2008 at 05:50:46PM +0200, Yoav Steinberg wrote:
>> Michael Niedermayer wrote:
>>> On Wed, Nov 05, 2008 at 05:03:28PM +0200, Yoav Steinberg wrote:
>>>> Hi,
>>>> I've come across some instances where the data_offset field of 
>>>> AVFormatContext isn't updated after opening a file for input 
>>>> (av_open_input_file). From the comment in the header it seems that the 
>>>> data_offset field should represent the position in the input where the 
>>>> header ends and the data begins. In some cases the header parsing done 
>>>> during file input seems to run to the end of the input and isn't restored 
>>>> to the position where the data begins, yielding an invalid data_offset 
>>>> value equal to the file size (specifically, this reproduces when calling 
>>>> av_open_input_file on a mov file).
>>>> I've added some code which attempts to provide a more accurate data_offset 
>>>> value for such files based on the index_entries table (if one is 
>>>> available). This seems to work for me. It'll be cool if this is added to 
>>>> the trunk or if someone can explain why not to add this.
>>>> (My code is attached).
>>> This is not a proper solution to the problem, it also adds an obscure and
>>> more importantly completely undocumented behavior to index entries.
>>> For a proper solution (aka anything that might be accepted into svn)
>>> the first step is a full explanation of what is wrong, basically, if
>>> it cannot be reproduced exactly it's not a full explanation.
>>> second would be the question if it's easier to fix the affected demuxers
>>> or to change the core to guess the offset. Either way all demuxers must
>>> be looked at, in the first case to find&fix them in the second to ensure
>>> the core change works with all.
>>> [...]
>> In my specific application (using libavformat) I'm interested in using 
>> the data_offset field to figure out how much of the file is used for 
>> data and how much is used for "headers". This is for some general file 
>> "rating" system which isn't relevant to our discussion. I found the 
>> data_offset field useful since it's documented as the "offset of the 
>> first packet". Problem was that some demuxers leave pb at the end of the 
>> input after calling read_header. Since I wasn't sure if changing 
>> this behavior in each rogue demuxer is a good idea I found another 
>> solution which should work (and actually does work for my tested cases) 
>> independently of whether the demuxer seeks back or not after read_header.
>> Just as a note this solution was required for "mov" demuxer since its 
>> read_header reads the file to the end (if possible).
>> Question is whether the data_offset is something I should theoretically 
>> be able to count on, or whether it's just a helper utility for any 
>> demuxer that wants some place to save the data offset (without adding a 
>> private field).
>> Currently the:
>>          if (pb && !ic->data_offset)
>>              ic->data_offset = url_ftell(ic->pb);
>> in the core attempts to use the current position if it wasn't set by the 
>> demuxer, indicating a "best guess" policy. I was attempting in the patch 
>> to improve the guessing by employing the index entries table when available.
> well i didn't write these 2 lines IIRC so i can't say for sure but not every
> piece of common code is a "best guess code"
> one very well could see it the other way around, that it's factorized code
> from demuxers and only executed when it's exactly correct.
>> I'd be willing to add a data_offset setting in the "mov" demuxer if lack 
>> of valid data_offset after reading the mov header is considered a bug. 
>> But I guess that if a valid data_offset is required only if the packet 
>> reading depends on it then having crap in the data_offset after reading 
>> the header isn't a bug. And in that case I can't complain...
>> What do you think?
> I think we should let baptiste who is mov maintainer comment but AFAICS
> data_offset has not much meaning for mov. Headers can at least be at the
> beginning or the end, and possibly even in the middle.
> Also a file with data chunks randomly shuffled and the first packet at
> the end and last one at the beginning should be valid ...

I think you are right, Michael: data_offset has little meaning for mov/mp4.
I also believe the layout you describe should be valid, especially now
that fragments allow data chunks to be sprinkled throughout the file.

However, for your problem, Yoav, you should have luck checking the first
AVIndexEntry of each AVStream. The mov demuxer currently constructs the
full index when reading the header, and this information is exported
through the libavformat API. This does not apply to every demuxer, though.
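
A rough sketch of that check follows. It is not libavformat code: the
IndexEntry and Stream types and the first_packet_offset() name are
hypothetical stand-ins that only mirror the pos, index_entries and
nb_index_entries fields, so the example compiles on its own. It takes the
smallest position among the first index entry of each stream and reports
-1 when no stream has an index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AVIndexEntry / AVStream fields used
 * here; real code would read them from the libavformat structures. */
typedef struct {
    int64_t pos;                      /* byte offset of the packet in the file */
} IndexEntry;

typedef struct {
    const IndexEntry *index_entries;  /* NULL if the demuxer built no index */
    int nb_index_entries;
} Stream;

/* Return the smallest position among the first index entry of each
 * stream, or -1 if no stream has any index entries. */
static int64_t first_packet_offset(const Stream *streams, size_t nb_streams)
{
    int64_t best = -1;

    for (size_t i = 0; i < nb_streams; i++) {
        const Stream *st = &streams[i];

        if (!st->index_entries || st->nb_index_entries <= 0)
            continue;                 /* this stream has no index */

        int64_t pos = st->index_entries[0].pos;
        if (best < 0 || pos < best)
            best = pos;
    }
    return best;
}
```

Note that with shuffled data chunks the first index entry of a stream is
not necessarily the one with the lowest file offset, so this only
approximates "where the data begins"; scanning every entry would be the
more conservative variant.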

Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
checking for life_signs in -lkenny... no
