[FFmpeg-devel] Microsoft Smooth Streaming
Marcus Nascimento
marcus.cps at gmail.com
Wed Oct 26 15:16:28 CEST 2011
Thanks very much for your answer. That was helpful.
I'll study a little and get back ASAP.
On Wed, Oct 26, 2011 at 12:25 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Oct 25, 2011 at 09:25:22PM -0200, Marcus Nascimento wrote:
> > Please, check the answers below.
> >
> > Thank you very much in advance.
>
> We have to thank you for the excellent explanation.
> Also, I am CCing this to Baptiste, who is our mov/mp4 expert. He probably
> can help in explaining how to best connect all the things together.
>
>
Perfect! Thanks for that.
>
> >
> >
> > On Tue, Oct 25, 2011 at 3:54 PM, Nicolas George <
> > nicolas.george at normalesup.org> wrote:
> >
> > > On quartidi 4 Brumaire, year CCXX, Marcus Nascimento wrote:
> > > > I'd like to extend FFmpeg to support Microsoft Smooth Streaming
> > > > (streaming playback), the same way it has been done by all the
> > > > available Silverlight players.
> > >
> > > Contributions are always welcome on principle.
> > >
> > > > For now I do not intend to dump data to a file to be played locally or
> > > > anything like that. And I probably never will. I just want to play it.
> > >
> > > If it can play it, then it can also dump it to a file. I hope you were
> > > not counting otherwise.
> > >
> > >
> > Definitely not. I was only worried about legal issues. I don't want to
> > cause trouble for FFmpeg or anything like that.
> >
> >
> > > > I did some research on this mailing list and found some posts that
> > > > talked about that before.
> > > > However, I couldn't find in-depth information or anything beyond the
> > > > point where I'm stuck.
> > > >
> > > > I've done a lot of research on the MS Smooth Streaming theory of
> > > > operation, studied some ISOFF (and PIFF) and some more.
> > > > It is pretty clear to me how MS Smooth Streaming works. Now it is time
> > > > to focus on how to do that in the FFmpeg way.
> > > >
> > > > First things first, I'd like to know how a stream should be processed
> > > > in order to be played by FFmpeg.
> > >
> > > I believe you would receive more relevant replies faster if you took a
> > > few minutes to describe an overview of how the protocol works.
> > >
> > >
> > Right away. I'll give as many details as necessary here. Prepare yourself
> > for some reading!
> >
> > First of all, the basic idea of Microsoft Smooth Streaming is to encode the
> > same video at multiple bitrates. The client decides which bitrate to use,
> > and at any time it can switch to another bitrate based on bandwidth
> > availability and other measurements.
> > Each encoded bitrate produces an independent ISMV file (IIS Smooth Media
> > Video, I suppose).
> > The encoding relies on the fragmented structure that ISOFF (ISO File Format,
> > the MP4 file format) offers. Keyframes are generated regularly and equally
> > spaced in all ISMV files (every 2 s).
> > This is more restrictive than regular encoding procedures, which allow some
> > flexibility in keyframe intervals (I believe so, since I'm not a specialist
> > on that).
> > It is important to note that every fragment starts with a keyframe.
> > Each ISOFF fragment is perfectly aligned across the different bitrates (in
> > terms of time, of course; data size may vary drastically). That alignment
> > allows the client to request one bitrate for a fragment and switch to
> > another bitrate for the next fragment.
> >
> > The ISMV file format is called PIFF and is based on ISOFF with a few
> > additions. There are 3 uuid box types that are dedicated to DRM purposes (I
> > won't touch them here). Hence the meaning of PIFF: Protected Interoperable
> > File Format. The PIFF brand (ftyp box value) is "piff".
> > More on PIFF format here: http://go.microsoft.com/?linkid=9682897
> >
> > The server side (in the MS implementation) is just an extension to IIS
> > called IIS Media Services.
> > That is just a web service that accepts HTTP requests with a custom
> > formatted URL.
> > The base URL is something like http://domain.com/video.ism (note that it is
> > not ISMV), which is never requested.
> >
> > When the client wants to play a video, it first requests a Manifest
> > file. The URL is <baseUrl>/Manifest.
> > The Manifest is just an XML file that describes the different streams and
> > provides other related information.
> > Here is a basic example (modified parts of the original found here:
> > http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720.ism/Manifest
> > ):
> >
> > <SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
> >   <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
> >       MaxWidth="1280" MaxHeight="720" DisplayWidth="1280" DisplayHeight="720"
> >       Url="QualityLevels({bitrate})/Fragments(video={start time})">
> >     <QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280"
> >         MaxHeight="720"
> >         CodecPrivateData="000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA12251600000000168E9093525"/>
> >     <QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992"
> >         MaxHeight="560"
> >         CodecPrivateData="000000016764001FAC2CA503E047BFF040003FC52020202800001F480005DC03030003EBE8000FAFAFE31C6060007D7D0001F5F5FC6387684894580000000168E9093525"/>
> >     <c d="20020000"/>
> >     <c d="20020000"/>
> >     <c d="20020000"/>
> >     <c d="6670001"/>
> >   </StreamIndex>
> >   <StreamIndex Type="audio" Index="0" Name="audio" Chunks="4"
> >       QualityLevels="1"
> >       Url="QualityLevels({bitrate})/Fragments(audio={start time})">
> >     <QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100"
> >         Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255"
> >         CodecPrivateData="1210"/>
> >     <c d="20201360"/>
> >     <c d="19969161"/>
> >     <c d="19969161"/>
> >     <c d="8126985"/>
> >   </StreamIndex>
> > </SmoothStreamingMedia>
> >
> > We can see it gives the version of the Smooth Streaming media and the
> > duration (measured in units of 1/10,000,000 second).
> > Next we see the video section, which says each quality level has 4 chunks
> > (fragments), with 2 quality levels available. It also gives the video
> > dimensions and the URL format.
> > Next it gives information about each bitrate, with codec information and
> > codec private data (I believe it is used to configure the codec in an
> > opaque way).
>
> CodecPrivateData looks like H.264 SPS and PPS NAL units from a quick
> look. This should be decoded hex->binary and placed in extradata or
> injected into the bitstream. FFmpeg's decoders are quite forgiving on
> where and how they get this data normally ...
>
>
Indeed, it seems you are right about CodecPrivateData. I found this on
Google:
http://social.expression.microsoft.com/Forums/en/encoder/thread/c59d5c75-e9e0-482a-ad77-37ea1668d90f
It seems the CodecPrivateData won't be a problem. I'll leave this detail
for later study.
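
To make the hex->binary step concrete, here is a minimal Python sketch (not
FFmpeg code). It assumes the CodecPrivateData string is Annex B style, with
each NAL unit preceded by the 4-byte start code 00 00 00 01, which is what
the manifest values appear to contain. The function names are made up for
illustration.

```python
# Hypothetical helpers: decode a CodecPrivateData hex string and split it
# into NAL units, assuming 4-byte Annex B start codes (00 00 00 01).
START_CODE = b"\x00\x00\x00\x01"


def split_codec_private_data(hex_string):
    """Return the list of NAL unit payloads found in the hex string."""
    raw = bytes.fromhex(hex_string)
    # Splitting on the start code leaves an empty chunk before the first
    # NAL unit; drop empty chunks.
    return [nal for nal in raw.split(START_CODE) if nal]


def nal_unit_type(nal):
    """H.264 NAL unit type is the low 5 bits of the first byte."""
    return nal[0] & 0x1F


if __name__ == "__main__":
    # Shortened, made-up payload: one SPS-like (type 7) and one PPS-like
    # (type 8) unit.
    for nal in split_codec_private_data("0000000167AABB0000000168CC"):
        print(nal_unit_type(nal), nal.hex())
```

The raw bytes (start codes included) are what would end up in the decoder's
extradata or be prepended to the bitstream, as suggested above.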
>
> > Next it lists each fragment's size. The first fragment is referenced as
> > 0 (zero), and each of the others as the sum of the previous fragments'
> > sizes. I'm not sure exactly what those values mean.
> > Next we have the same structure for the audio description.
> >
> > After getting the Manifest file, the client must decide which quality level
> > is best suited to the device and its resources.
> > It is not clear to me what parameters it bases its decisions on. I have
> > heard about screen size and resolution, computing power, download
> > bandwidth, etc.
> > As soon as the quality level is chosen, I suppose the decoder has to be
> > configured in a suitable way, using the CodecPrivateData information
> > provided.
> > The client will then start requesting fragments following the URL pattern
> > given in the Manifest.
> > To request the first fragment for the first quality level, it would use
> > <baseUrl>/QualityLevels(0)/Fragments(video=0).
> > To request the fourth fragment for the second quality level, it would use
> > <baseUrl>/QualityLevels(1)/Fragments(video=60060000).
> > It is also possible to request just the audio following the same idea. For
> > instance: <baseUrl>/QualityLevels(0)/Fragments(audio=20201360).
> >
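
The start-time arithmetic described above can be sketched in Python. This is
only an illustration under two assumptions: the d attribute of each <c>
element is read as a fragment duration in the same 1/10,000,000 s units as
Duration (which is consistent with video=60060000 being 3 x 20020000), and
the {bitrate} placeholder in the Url template is filled with the Bitrate
attribute, as its name suggests. The function names are hypothetical and the
manifest is a trimmed copy of the example.

```python
import xml.etree.ElementTree as ET

TICKS_PER_SECOND = 10_000_000  # manifest time unit, per the description above

# Trimmed copy of the example manifest (video stream only).
MANIFEST = """<SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
  <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="2962000" FourCC="H264"/>
    <QualityLevel Index="1" Bitrate="2056000" FourCC="H264"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="6670001"/>
  </StreamIndex>
</SmoothStreamingMedia>"""


def fragment_start_times(stream_index):
    """Start of fragment n = sum of the d values of fragments 0..n-1."""
    starts, current = [], 0
    for chunk in stream_index.findall("c"):
        starts.append(current)
        current += int(chunk.get("d"))
    return starts


def fragment_urls(base_url, stream_index, bitrate):
    """Fill the Url template for every fragment (assumed substitution)."""
    template = stream_index.get("Url")
    return [
        base_url + "/"
        + template.replace("{bitrate}", str(bitrate)).replace("{start time}", str(start))
        for start in fragment_start_times(stream_index)
    ]


root = ET.fromstring(MANIFEST)
video = root.find("StreamIndex[@Type='video']")
starts = fragment_start_times(video)
urls = fragment_urls("http://domain.com/video.ism", video, 2056000)

if __name__ == "__main__":
    for start, url in zip(starts, urls):
        print(f"{start / TICKS_PER_SECOND:.3f} s  {url}")
```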
> > Each fragment received is arranged in PIFF wire format. In other words, it
> > contains exactly one moof box and exactly one mdat box and nothing
> > more (check the MP4 specs for more info).
> > Of course there are boxes nested inside those where applicable. It may
> > contain custom uuid boxes designed to allow DRM protection. Let's not
> > consider them here.
> > I'm not sure what information I can get from the moof boxes, but I assume
> > it would be relevant to the demuxer only, since the codec would only work
> > on the opaque data contained in the mdat. Correct me if I'm wrong, please.
> >
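
As an illustration of the "one moof, one mdat" layout, here is a small Python
sketch that walks the top-level boxes of a fragment. It relies only on the
generic ISO base media box header (32-bit big-endian size followed by a
4-character type, with size 1 meaning a 64-bit size follows and size 0
meaning "to the end of the data"). The function names are made up, and
nested boxes inside moof are not parsed.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in a byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("truncated or malformed box")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def looks_like_fragment(data):
    """True if the data is exactly one moof box followed by one mdat box."""
    return [box_type for box_type, _ in iter_boxes(data)] == ["moof", "mdat"]


if __name__ == "__main__":
    # Synthetic fragment with dummy payloads, just to exercise the parser.
    fake = (struct.pack(">I4s", 12, b"moof") + b"abcd"
            + struct.pack(">I4s", 11, b"mdat") + b"xyz")
    print(list(iter_boxes(fake)), looks_like_fragment(fake))
```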
>
> > The client would apply some heuristics while requesting fragments, and
> > at some point it may decide to switch to another quality level. I suppose
> > it would have to reconfigure the decoder, and repeat this over and over
> > until the end of the stream.
>
> Most likely no reconfiguration is needed; simply feeding the next
> "fragment" to the decoder might work fine.
> The decoder should detect changes and reconfigure itself.
>
>
Even better. Interaction between the (custom) demuxer and the decoder is
probably simpler than I thought.
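
For the switching heuristic itself, a deliberately naive Python sketch: pick
the highest bitrate that fits within a fraction of the measured download
bandwidth. This is purely an assumption for illustration; the real
Silverlight clients are not documented here, and the function name and the
0.8 safety factor are invented.

```python
def pick_bitrate(available_bitrates, measured_bps, safety_factor=0.8):
    """Return the highest bitrate within the bandwidth budget.

    Falls back to the lowest available bitrate when none fits.
    """
    budget = measured_bps * safety_factor
    affordable = [b for b in available_bitrates if b <= budget]
    return max(affordable) if affordable else min(available_bitrates)


if __name__ == "__main__":
    levels = [2962000, 2056000]  # Bitrate values from the example manifest
    for bandwidth in (5_000_000, 3_000_000, 1_000_000):
        print(bandwidth, pick_bitrate(levels, bandwidth))
```

Because fragments are time-aligned across bitrates, such a decision could be
re-evaluated before each fragment request.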
>
> [...]
> > > > 2 - A very simple external code just asks FFmpeg to play a smooth
> > > > streaming media. FFmpeg will detect that this is HTTP-based media and
> > > > will request the manifest file for it (I believe I'd have to create a
> > > > custom HTTP-based solution for that). Once the manifest is available,
> > > > ffmpeg would configure the decoder. It then makes further HTTP
> > > > requests the same way as in 1.
> > >
> > > There is already HTTP client code, as surely you know.
> > >
> > >
> > Yes. I've seen something about it. It looks suitable for this case.
> > It may be my starting point for studying. But I still feel I need
> > a big picture of how ffmpeg works in general.
>
> What we have basically are demuxers and protocols.
> Protocols are things that (for our purpose here) provide a bytestream
> from some URL and may provide seeking support.
> Demuxers are things that, on top of a protocol (or other things),
> produce data packets for various streams.
>
> What you describe can be implemented either as a protocol that works
> on top of an HTTP protocol and which then feeds its data to an mp4
> demuxer (which possibly needs modifications to handle the data)
>
> or
>
> a demuxer that works on top of an HTTP protocol and has an instance of
> an mp4 demuxer to which it passes the data.
>
> There are other ways too ...
>
>
I'm digging into the FFmpeg code to learn about that.
Maybe Baptiste will come up with some more information regarding this.
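
To picture the first option (a "protocol" that hides the fragment requests
and exposes a plain bytestream), here is a conceptual Python sketch. It is
not FFmpeg's API: the class name and the injected fetch callable are
assumptions made for illustration, so that the idea can be tried without any
network access.

```python
class FragmentByteStream:
    """Present a sequence of fragments as one continuous bytestream."""

    def __init__(self, fragment_urls, fetch):
        # fetch is any callable mapping a URL to the fragment's bytes.
        self._urls = iter(fragment_urls)
        self._fetch = fetch
        self._buffer = b""

    def read(self, size):
        """Return up to size bytes, fetching further fragments as needed."""
        while len(self._buffer) < size:
            url = next(self._urls, None)
            if url is None:
                break  # no more fragments; return what is left
            self._buffer += self._fetch(url)
        chunk, self._buffer = self._buffer[:size], self._buffer[size:]
        return chunk


if __name__ == "__main__":
    # Fake "server": two fragments held in a dictionary.
    fake = {"frag0": b"12345", "frag1": b"678"}
    stream = FragmentByteStream(["frag0", "frag1"], fake.__getitem__)
    print(stream.read(4), stream.read(4), stream.read(4))
```

With a real server, fetch could be something like
lambda url: urllib.request.urlopen(url).read(); a demuxer sitting on top
would then only ever see the concatenated moof/mdat data.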
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> No human being will ever know the Truth, for even if they happen to say it
> by chance, they would not even know they had done so. -- Xenophanes
>
--
Marcus Nascimento