[FFmpeg-devel] [PATCH] h264/aac in flv

Sat May 24 00:04:03 CEST 2008

Michael Niedermayer wrote:
> On Fri, May 23, 2008 at 12:20:35PM -0700, Baptiste Coudurier wrote:
>> Hi Michael,
>>
>> Michael Niedermayer wrote:
>>> On Thu, May 22, 2008 at 08:12:27PM -0700, Baptiste Coudurier wrote:
>>>> Michael Niedermayer wrote:
>>>>> On Wed, May 07, 2008 at 04:39:05PM +0200, Baptiste Coudurier wrote:
>>>>>> Michael Niedermayer wrote:
>>>>>>> On Wed, May 07, 2008 at 03:38:02PM +0200, Baptiste Coudurier wrote:
>>>>>>>> Michael Niedermayer wrote:
>>>>>>>>> On Wed, May 07, 2008 at 12:36:42PM +0200, Baptiste Coudurier wrote:
>>>>>>>>>> Hi Michael,
>>>>>>>>>>
>>>>>>>>>> Michael Niedermayer wrote:
>>>>>>>>>>> On Tue, May 06, 2008 at 07:31:53PM +0200, Baptiste Coudurier wrote:
>>>>>>>>>>>> Michael Niedermayer wrote:
>>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>  static int get_audio_flags(AVCodecContext *enc){
>>>>>>>>>>>>>>>>>>      int flags = (enc->bits_per_sample == 16) ? FLV_SAMPLESSIZE_16BIT : FLV_SAMPLESSIZE_8BIT;
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> +    if (enc->codec_id == CODEC_ID_AAC) // specs force these parameters
>>>>>>>>>>>>>>>>>> +        return FLV_CODECID_AAC | FLV_SAMPLERATE_44100HZ | FLV_SAMPLESSIZE_16BIT | FLV_STEREO;
>>>>>>>>>>>>>>>>> Is this also correct for AAC streams for which these arent true? Or are
>>>>>>>>>>>>>>>>> such streams just not supported?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Streams are supported (like mp3 48khz btw), and play well. Like written,
>>>>>>>>>>>>>>>> specs mandates these values.
>>>>>>>>>>>>>>> I know but what about lets say 48khz AAC or 22khz AAC your code would mux this
>>>>>>>>>>>>>>> with a claimed samplerate of 44khz.
>>>>>>>>>>>>>>> Is such 22khz AAC and 48khz AAC legal in flv accoridng to spec or is just
>>>>>>>>>>>>>>> 44khz AAC allowed?
>>>>>>>>>>>>>>> If later then the muxer should reject AAC with samplerates differing from 
>>>>>>>>>>>>>>> 44khz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rofl:
>>>>>>>>>>>>>> "Sampling rate
>>>>>>>>>>>>>> For AAC: always 3"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this ok for you ?
>>>>>>>>>>>>> Ive not doubted that this has to be set to 3. The question is if
>>>>>>>>>>>>> 48khz/22khz AAC is allowed in flv or not. If one takes the spec literally
>>>>>>>>>>>>> then the awnser is clearly no.
>>>>>>>>>>>> Well, I would say it is probably yes, considering aac and
>>>>>>>>>>>> AudioSpecificConfig.
>>>>>>>>>>>>
>>>>>>>>>>>>> But your muxer would store them blindly
>>>>>>>>>>>>> with a claimed sample rate of 44khz.
>>>>>>>>>>>> Well, what can I say ? It's not clearly mentioned.
>>>>>>>>>>>> I fear they used the same crap as mp4, AudioSpecificConfig mentions the
>>>>>>>>>>>> correct sample rate and channels number.
>>>>>>>>>>>> This is the case for channels too:
>>>>>>>>>>>>
>>>>>>>>>>>> "SoundType UB[1]
>>>>>>>>>>>> 0 = sndMono
>>>>>>>>>>>> 1 = sndStereo
>>>>>>>>>>>> Mono or stereo sound
>>>>>>>>>>>> For Nellymoser: always 0
>>>>>>>>>>>> For AAC: always 1"
>>>>>>>>>>>>
>>>>>>>>>>>> Im in favor of muxing this way.
>>>>>>>>>>> Iam ok with that if the official software also generates such 22/48khz AAC.
>>>>>>>>>>>
>>>>>>>>>> Updated patches considering signed ctts offset.
>>>>>>>>>>
>>>>>>>>>> Now there is a problem with timestamps, also with mov (which use signed
>>>>>>>>>> offsets), because pts can be < dts.
>>>>>>>>> dts is the time at which a packet enters the decoder
>>>>>>>>> pts is the time at which the correspoding decoded packet leaves the decoder
>>>>>>>>>
>>>>>>>>> Thus dts <= pts
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also quoting  ISO/IEC 14496-12 Second edition 2005-04-01 Corrected version 2005-10-01:
>>>>>>>>> ---
>>>>>>>>> This box provides the offset between decoding time and composition time. Since decoding time must be less
>>>>>>>>>                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> than the composition time, the offsets are expressed as unsigned numbers such that CT(n) = DT(n) +
>>>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> CTTS(n) where CTTS(n) is the (uncompressed) table entry for sample n.
>>>>>>>> FYI, this will be soon obsoleted by an amendment,
>>>>>>> link?
>>>>>> Sent the draft I have on hands.
>>>>> thanks
>>>>> I hope this draft is rejected, its insane.
>>>>> The problem is that the first dts in mov/mp4 is implicitly 0.
>>>>> Which doesnt work out because that just isnt always the case. Solution
>>>>> would be to specify this first dts.
>>>>>
>>>>> What they do is change the pts-dts offsets to signed so the pts are correct
>>>>> while the dts are no longer correct. (they also add various optional fields
>>>>> to allow one to guess the correct dts but optional == useless)
>>>>>
>>>>> The obvious implementation is to leave dts at AV_NOPTS_VALUE when there
>>>>> is a version 1 ctts. We could also attempt to guess the dts but this is
>>>>> not always possible. IP mpeg2 vs low delay IP mpeg2 being an example
>>>>> which is ambigous.
>>>>>
>>>> Well, I'll just let current code (ignore negative ctts) for mov/mp4 as is.
>>>> It seems a full H264 header parser is becoming really needed to
>>>> correctly compute pts (it seems gpac is good for raw h264 streams, it
>>>> handles bpyramid quite nicely).
>>>>
>>>> In the meantime we can still generate flv with positive dts by adding
>>>> delay.
>>> yes
>>>
>>>
>>>> Demuxer still handles cts as signed, this should cause no real harm,
>>>> except maybe some user complaining and I think Im ok with this, it will
>>>> remind me to think of a correct and good solution ;)
>>> Iam strongly against any demuxer ever returning pts < dts for undamaged
>>> streams.
>> Thing is we would loose pts for streams we would create and I don't know
> 
> I dont think so. What my suggestion is, is that we would set
> dts to AV_NOPTS_VALUE if theres any possibility that pts<dts might occur in
> a stream. After all the pts values are correct its the dts values which are
> not

Yes, I understand that, problem is current code in compute_pkt_fields,
which will set dts to pts :(
Honestly I don't feel like messing with this code right now, Im pretty
sure to screw something up.

> [...]
> 
>> Or do you want it set to
>> 64 in all cases for demuxers ?
> 
> i need to think about this more ...
> 

Ok.

> [...]
>
>> @@ -253,6 +261,38 @@
>>      url_fseek(pb, data_size + 10 - 3, SEEK_CUR);
>>      put_be32(pb, data_size + 11);
>>  
>> +    for (i = 0; i < s->nb_streams; i++) {
>> +        AVCodecContext *enc = s->streams[i]->codec;
>> +        if (enc->codec_id == CODEC_ID_AAC) {
>> +            put_byte(pb, FLV_TAG_TYPE_AUDIO);
>> +            put_be24(pb, enc->extradata_size + 2);
>> +            put_be24(pb, 0); // ts
>> +            put_byte(pb, 0); // ts ext
>> +            put_be24(pb, 0); // streamid
>> +            put_byte(pb, get_audio_flags(enc));
>> +            put_byte(pb, 0); // AAC sequence header
>> +            put_buffer(pb, enc->extradata, enc->extradata_size);
>> +            put_be32(pb, enc->extradata_size + 2 + 11); // previous tag size
>> +        } else if (enc->codec_id == CODEC_ID_H264) {
>> +            offset_t pos;
>> +            put_byte(pb, FLV_TAG_TYPE_VIDEO);
>> +            put_be24(pb, 0); // size, patched later
>> +            put_be24(pb, 0); // ts
>> +            put_byte(pb, 0); // ts ext
>> +            put_be24(pb, 0); // streamid
>> +            pos = url_ftell(pb);
>> +            put_byte(pb, enc->codec_tag | FLV_FRAME_KEY); // flags
>> +            put_byte(pb, 0); // AVC sequence header
>> +            put_be24(pb, 0); // composition time
>> +            ff_isom_write_avcc(pb, enc->extradata, enc->extradata_size);
>> +            data_size = url_ftell(pb) - pos;
>> +            url_fseek(pb, -data_size - 10, SEEK_CUR);
>> +            put_be24(pb, data_size);
>> +            url_fseek(pb, data_size + 10 - 3, SEEK_CUR);
>> +            put_be32(pb, data_size + 11); // previous tag size
>> +        }
>> +    }
> 
> I think this could be simplified with a
> if(enc->codec_id == CODEC_ID_AAC || enc->codec_id == CODEC_ID_H264)
> 
> and by using the fseek() based size patching for both cases.

Yes it seems, patch updated.

>> [...]
>>
>> +        size = pkt->size;
>> +        /* cast needed to get negative value */
>> +        if (!flv->delay && (int32_t)pkt->dts < 0)
>> +            flv->delay = -(int32_t)pkt->dts;
>> +    }
>> +
>> +    ts = pkt->dts + flv->delay; // add delay to force positive dts
>>      put_be24(pb,size + flags_size);
>
> [...]
> 
>> +        put_be24(pb,pkt->pts - ts + flv->delay);
> 
> put_be24(pb,pkt->pts - pkt->dts);
> 

Yes, cast needed though, if you prefer this way.

> 
>> +    }
>>      put_buffer(pb, pkt->data, size);
>>      put_be32(pb,size+flags_size+11); // previous tag size
>> -    flv->duration = pkt->pts + pkt->duration;
>> +    flv->duration = FFMAX(pkt->pts + pkt->duration, flv->duration);
> 
> seperate comit (this might apply to a few other things as well)
>  

Sure.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Smartjog USA Inc.                                http://www.smartjog.com
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264_flvenc.patch
Type: text/x-diff
Size: 5362 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080523/4d5c80f0/attachment.patch>