[FFmpeg-devel] [PATCH] h264/aac in flv

Fri May 23 22:56:33 CEST 2008

On Fri, May 23, 2008 at 12:20:35PM -0700, Baptiste Coudurier wrote:
> Hi Michael,
> 
> Michael Niedermayer wrote:
> > On Thu, May 22, 2008 at 08:12:27PM -0700, Baptiste Coudurier wrote:
> >> Michael Niedermayer wrote:
> >>> On Wed, May 07, 2008 at 04:39:05PM +0200, Baptiste Coudurier wrote:
> >>>> Michael Niedermayer wrote:
> >>>>> On Wed, May 07, 2008 at 03:38:02PM +0200, Baptiste Coudurier wrote:
> >>>>>> Michael Niedermayer wrote:
> >>>>>>> On Wed, May 07, 2008 at 12:36:42PM +0200, Baptiste Coudurier wrote:
> >>>>>>>> Hi Michael,
> >>>>>>>>
> >>>>>>>> Michael Niedermayer wrote:
> >>>>>>>>> On Tue, May 06, 2008 at 07:31:53PM +0200, Baptiste Coudurier wrote:
> >>>>>>>>>> Michael Niedermayer wrote:
> >>>>>>>>>>>>>>>> [...]
> >>>>>>>>>>>>>>>>  
> >>>>>>>>>>>>>>>>  static int get_audio_flags(AVCodecContext *enc){
> >>>>>>>>>>>>>>>>      int flags = (enc->bits_per_sample == 16) ? FLV_SAMPLESSIZE_16BIT : FLV_SAMPLESSIZE_8BIT;
> >>>>>>>>>>>>>>>>  
> >>>>>>>>>>>>>>>> +    if (enc->codec_id == CODEC_ID_AAC) // specs force these parameters
> >>>>>>>>>>>>>>>> +        return FLV_CODECID_AAC | FLV_SAMPLERATE_44100HZ | FLV_SAMPLESSIZE_16BIT | FLV_STEREO;
> >>>>>>>>>>>>>>> Is this also correct for AAC streams for which these arent true? Or are
> >>>>>>>>>>>>>>> such streams just not supported?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Streams are supported (like mp3 48khz btw), and play well. Like written,
> >>>>>>>>>>>>>> specs mandates these values.
> >>>>>>>>>>>>> I know but what about lets say 48khz AAC or 22khz AAC your code would mux this
> >>>>>>>>>>>>> with a claimed samplerate of 44khz.
> >>>>>>>>>>>>> Is such 22khz AAC and 48khz AAC legal in flv accoridng to spec or is just
> >>>>>>>>>>>>> 44khz AAC allowed?
> >>>>>>>>>>>>> If later then the muxer should reject AAC with samplerates differing from 
> >>>>>>>>>>>>> 44khz.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Rofl:
> >>>>>>>>>>>> "Sampling rate
> >>>>>>>>>>>> For AAC: always 3"
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is this ok for you ?
> >>>>>>>>>>> Ive not doubted that this has to be set to 3. The question is if
> >>>>>>>>>>> 48khz/22khz AAC is allowed in flv or not. If one takes the spec literally
> >>>>>>>>>>> then the awnser is clearly no.
> >>>>>>>>>> Well, I would say it is probably yes, considering aac and
> >>>>>>>>>> AudioSpecificConfig.
> >>>>>>>>>>
> >>>>>>>>>>> But your muxer would store them blindly
> >>>>>>>>>>> with a claimed sample rate of 44khz.
> >>>>>>>>>> Well, what can I say ? It's not clearly mentioned.
> >>>>>>>>>> I fear they used the same crap as mp4, AudioSpecificConfig mentions the
> >>>>>>>>>> correct sample rate and channels number.
> >>>>>>>>>> This is the case for channels too:
> >>>>>>>>>>
> >>>>>>>>>> "SoundType UB[1]
> >>>>>>>>>> 0 = sndMono
> >>>>>>>>>> 1 = sndStereo
> >>>>>>>>>> Mono or stereo sound
> >>>>>>>>>> For Nellymoser: always 0
> >>>>>>>>>> For AAC: always 1"
> >>>>>>>>>>
> >>>>>>>>>> Im in favor of muxing this way.
> >>>>>>>>> Iam ok with that if the official software also generates such 22/48khz AAC.
> >>>>>>>>>
> >>>>>>>> Updated patches considering signed ctts offset.
> >>>>>>>>
> >>>>>>>> Now there is a problem with timestamps, also with mov (which use signed
> >>>>>>>> offsets), because pts can be < dts.
> >>>>>>> dts is the time at which a packet enters the decoder
> >>>>>>> pts is the time at which the correspoding decoded packet leaves the decoder
> >>>>>>>
> >>>>>>> Thus dts <= pts
> >>>>>>>
> >>>>>>>
> >>>>>>> Also quoting  ISO/IEC 14496-12 Second edition 2005-04-01 Corrected version 2005-10-01:
> >>>>>>> ---
> >>>>>>> This box provides the offset between decoding time and composition time. Since decoding time must be less
> >>>>>>>                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>>> than the composition time, the offsets are expressed as unsigned numbers such that CT(n) = DT(n) +
> >>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>>> CTTS(n) where CTTS(n) is the (uncompressed) table entry for sample n.
> >>>>>> FYI, this will be soon obsoleted by an amendment,
> >>>>> link?
> >>>> Sent the draft I have on hands.
> >>> thanks
> >>> I hope this draft is rejected, its insane.
> >>> The problem is that the first dts in mov/mp4 is implicitly 0.
> >>> Which doesnt work out because that just isnt always the case. Solution
> >>> would be to specify this first dts.
> >>>
> >>> What they do is change the pts-dts offsets to signed so the pts are correct
> >>> while the dts are no longer correct. (they also add various optional fields
> >>> to allow one to guess the correct dts but optional == useless)
> >>>
> >>> The obvious implementation is to leave dts at AV_NOPTS_VALUE when there
> >>> is a version 1 ctts. We could also attempt to guess the dts but this is
> >>> not always possible. IP mpeg2 vs low delay IP mpeg2 being an example
> >>> which is ambigous.
> >>>
> >> Well, I'll just let current code (ignore negative ctts) for mov/mp4 as is.
> >> It seems a full H264 header parser is becoming really needed to
> >> correctly compute pts (it seems gpac is good for raw h264 streams, it
> >> handles bpyramid quite nicely).
> >>
> > 
> >> In the meantime we can still generate flv with positive dts by adding
> >> delay.
> > 
> > yes
> > 
> > 
> >> Demuxer still handles cts as signed, this should cause no real harm,
> >> except maybe some user complaining and I think Im ok with this, it will
> >> remind me to think of a correct and good solution ;)
> > 
> > Iam strongly against any demuxer ever returning pts < dts for undamaged
> > streams.
> 
> Thing is we would loose pts for streams we would create and I don't know

I dont think so. What my suggestion is, is that we would set
dts to AV_NOPTS_VALUE if theres any possibility that pts<dts might occur in
a stream. After all the pts values are correct its the dts values which are
not

> any other tool creating h264 flv, so this is sad IMHO...
> This will make impossible stream copy to .mov/.mp4 for ex ...
> 
> > Could you split the pts unrelated code out? I think we could commit that
> 
> Sure, patch attached, and patch without pts handling attached too.
> 

> Can I change av_set_pts_info to 32 bits too ? 

ok

> Or do you want it set to
> 64 in all cases for demuxers ?

i need to think about this more ...

[...]

> Index: libavformat/flvdec.c
> ===================================================================
> --- libavformat/flvdec.c	(revision 13243)
> +++ libavformat/flvdec.c	(working copy)
> @@ -51,6 +51,7 @@
> @@ -79,6 +80,9 @@
> @@ -278,20 +282,33 @@
> @@ -337,7 +354,7 @@
> @@ -369,6 +386,22 @@
> @@ -376,7 +409,7 @@

ok assuming the cosmetics get commited seperatly

> Index: libavformat/flvenc.c
> ===================================================================
> --- libavformat/flvenc.c	(revision 13243)
> +++ libavformat/flvenc.c	(working copy)
> @@ -21,6 +21,7 @@
> @@ -30,6 +31,7 @@
> @@ -39,6 +41,7 @@
> @@ -50,11 +53,15 @@
> @@ -75,6 +82,7 @@

ok

> @@ -253,6 +261,38 @@
>      url_fseek(pb, data_size + 10 - 3, SEEK_CUR);
>      put_be32(pb, data_size + 11);
>  
> +    for (i = 0; i < s->nb_streams; i++) {
> +        AVCodecContext *enc = s->streams[i]->codec;
> +        if (enc->codec_id == CODEC_ID_AAC) {
> +            put_byte(pb, FLV_TAG_TYPE_AUDIO);
> +            put_be24(pb, enc->extradata_size + 2);
> +            put_be24(pb, 0); // ts
> +            put_byte(pb, 0); // ts ext
> +            put_be24(pb, 0); // streamid
> +            put_byte(pb, get_audio_flags(enc));
> +            put_byte(pb, 0); // AAC sequence header
> +            put_buffer(pb, enc->extradata, enc->extradata_size);
> +            put_be32(pb, enc->extradata_size + 2 + 11); // previous tag size
> +        } else if (enc->codec_id == CODEC_ID_H264) {
> +            offset_t pos;
> +            put_byte(pb, FLV_TAG_TYPE_VIDEO);
> +            put_be24(pb, 0); // size, patched later
> +            put_be24(pb, 0); // ts
> +            put_byte(pb, 0); // ts ext
> +            put_be24(pb, 0); // streamid
> +            pos = url_ftell(pb);
> +            put_byte(pb, enc->codec_tag | FLV_FRAME_KEY); // flags
> +            put_byte(pb, 0); // AVC sequence header
> +            put_be24(pb, 0); // composition time
> +            ff_isom_write_avcc(pb, enc->extradata, enc->extradata_size);
> +            data_size = url_ftell(pb) - pos;
> +            url_fseek(pb, -data_size - 10, SEEK_CUR);
> +            put_be24(pb, data_size);
> +            url_fseek(pb, data_size + 10 - 3, SEEK_CUR);
> +            put_be32(pb, data_size + 11); // previous tag size
> +        }
> +    }

I think this could be simplified with a
if(enc->codec_id == CODEC_ID_AAC || enc->codec_id == CODEC_ID_H264)

and by using the fseek() based size patching for both cases.

> @@ -280,13 +320,17 @@

ok

> @@ -309,18 +353,35 @@
>          put_byte(pb, FLV_TAG_TYPE_AUDIO);
>      }
>  
> +    if (enc->codec_id == CODEC_ID_H264) {
> +        if (ff_avc_parse_nal_units(pkt->data, &pkt->data, &pkt->size) < 0)
> +            return -1;
> +        assert(pkt->size);
> +        size = pkt->size;
> +        /* cast needed to get negative value */
> +        if (!flv->delay && (int32_t)pkt->dts < 0)
> +            flv->delay = -(int32_t)pkt->dts;
> +    }
> +
> +    ts = pkt->dts + flv->delay; // add delay to force positive dts
>      put_be24(pb,size + flags_size);
> -    put_be24(pb,pkt->pts);
> -    put_byte(pb,pkt->pts >> 24);
> +    put_be24(pb,ts);
> +    put_byte(pb,ts >> 24);
>      put_be24(pb,flv->reserved);
>      put_byte(pb,flags);
>      if (enc->codec_id == CODEC_ID_VP6)
>          put_byte(pb,0);
>      if (enc->codec_id == CODEC_ID_VP6F)
>          put_byte(pb, enc->extradata_size ? enc->extradata[0] : 0);
> +    else if (enc->codec_id == CODEC_ID_AAC)
> +        put_byte(pb,1); // AAC raw
> +    else if (enc->codec_id == CODEC_ID_H264) {
> +        put_byte(pb,1); // AVC NALU

> +        put_be24(pb,pkt->pts - ts + flv->delay);

put_be24(pb,pkt->pts - pkt->dts);

> +    }
>      put_buffer(pb, pkt->data, size);
>      put_be32(pb,size+flags_size+11); // previous tag size
> -    flv->duration = pkt->pts + pkt->duration;
> +    flv->duration = FFMAX(pkt->pts + pkt->duration, flv->duration);

seperate comit (this might apply to a few other things as well)

>  
>      put_flush_packet(pb);
>      return 0;

> @@ -342,4 +403,5 @@
>      flv_write_packet,
>      flv_write_trailer,
>      .codec_tag= (const AVCodecTag*[]){flv_video_codec_ids, flv_audio_codec_ids, 0},
> +    .flags= AVFMT_GLOBALHEADER,

ok

>  };
> Index: libavformat/flv.h
> ===================================================================
> --- libavformat/flv.h	(revision 13243)
> +++ libavformat/flv.h	(working copy)

ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080523/314658e8/attachment.pgp>