[FFmpeg-devel] Integrating the mod engine into FFmpeg - what is the best design approach?

Stefano Sabatini stefano.sabatini-lala
Thu Aug 5 02:54:07 CEST 2010

On date Wednesday 2010-08-04 15:36:25 +0200, Sebastian Vater encoded:
> Yes, we can do this later. Anyway, in the meantime I have given some
> thought to a more precise generic integration plan for lavseq.
> As you can currently see, I have a directory lavseq containing
> avsequencer.h (the connection to the rest of FFmpeg).
> To give an overview of the AVSequencer:
> At the root we have avsequencer.h, i.e. the AVSequencerContext, which is
> the only structure linking to the rest of FFmpeg.
> root:
> AVSequencerContext contains a list of modules (module.h and module
> handling is implemented in module.c), the playback handler and a list of
> available mixing engines.
> depth 1:
> AVSequencerModule contains songs, instruments, keyboard definitions,
> arpeggio definitions and envelope structures.
> depth 2:
> AVSequencerSong contains tracks and an order list which references the
> track numbers being played for each channel (the sequencer is
> internally track-based instead of pattern-based, to allow different
> speeds per channel).
> AVSequencerInstrument contains samples, which can be assigned using the
> keyboard definition. For example, I tell the instrument to use sample
> number 1 for C-5 but number 2 instead for C-6.
> Instruments also determine how envelopes are used (you can assign them
> to vibrato, tremolo, volume handling, etc.).
> AVSequencerEnvelope contains the actual envelope data and also its
> properties like loop points.
> AVSequencerKeyboard contains the octave/note -> sample mapping for all
> notes from C-0 to B-9 which are 120 entries (10 octaves * 12 notes per
> octave).
> AVSequencerArpeggio is mostly like AVSequencerEnvelope, with the
> difference that you can specify a custom arpeggio layout; the structure
> is designed for that.
> depth 3:
> AVSequencerSample contains the sample loop points, auto vibrato
> envelopes and also the PCM data (the PCM data should later be obtained
> by the lavc, so you can also directly use ogg/mp3/flac/wav/etc.). It
> also contains a reference to AVSequencerSynth if it is a programmable
> synth sound.
> depth 4:
> AVSequencerSynth contains a list of "machine code" instructions for
> programming the synth sound "DSP", a symbol table for human-readability
> and properties like initial variables (16 general purpose registers).
> Hierarchy overview:
> SequencerContext (avsequencer.[hc])
>     Module (module.[hc])
>         Song (song.[hc])
>             Track (track.[hc])
>                 TrackData
>                     TrackDataEffect
>             OrderList (order.[hc])
>                 OrderData
>         Instrument (instr.[hc])
>             Sample (sample.[hc])
>                 SynthSound (synth.[hc])
>         Envelope (instr.[hc])
>         Keyboard (instr.[hc])
>         Arpeggio (instr.[hc])
>     Mixer (allmixers.c, mixer.[hc])
>         Null mixer (null_mix.[hc])
>         Low quality PCM mixer (lq_mix.[hc])
>         High quality PCM mixer (hq_mix.[hc])
>         FUTURE: OPL2/3 (AdLib/etc.) FM synthesizer
>                 SID chip FM (as found in C64) synthesizer
>                 Floating point mixers
>     Player (player.[hc])

OK that's a nice description of the whole BSS design.
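To make the hierarchy easier to picture, here is a condensed sketch of the
nesting as C structs. All field names here are illustrative guesses, not the
real lavseq API; the actual definitions live in the lavseq headers
(avsequencer.h, module.h, song.h, instr.h, sample.h, synth.h) and carry many
more fields.

```c
#include <stdint.h>

/* Hypothetical condensed sketch of the lavseq nesting described above. */

typedef struct AVSequencerSynth {
    uint16_t *code;     /* "machine code" for the synth sound DSP */
    uint32_t  vars[16]; /* 16 general purpose registers */
} AVSequencerSynth;

typedef struct AVSequencerSample {
    uint32_t loop_start, loop_end; /* sample loop points */
    int16_t *pcm;                  /* PCM data, later obtained via lavc */
    AVSequencerSynth *synth;       /* non-NULL for programmable synth sounds */
} AVSequencerSample;

typedef struct AVSequencerKeyboard {
    /* octave/note -> sample mapping, C-0..B-9 = 10 octaves * 12 notes */
    uint16_t sample[120];
} AVSequencerKeyboard;

typedef struct AVSequencerInstrument {
    AVSequencerSample   **sample_list;
    AVSequencerKeyboard  *keyboard;
    unsigned              samples;
} AVSequencerInstrument;

typedef struct AVSequencerModule {
    struct AVSequencerSong **song_list;       /* depth 2 */
    AVSequencerInstrument  **instrument_list; /* depth 2 */
    unsigned                 songs, instruments;
} AVSequencerModule;

typedef struct AVSequencerContext {
    AVSequencerModule **module_list;          /* depth 1 */
    unsigned            modules;
} AVSequencerContext;

/* Index into the keyboard table: octave 0-9, note 0-11 (0 = C, 11 = B),
 * giving the 120 entries mentioned above. */
static inline int note_index(int octave, int note)
{
    return octave * 12 + note;
}
```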

Sebastian is currently working on this git branch:

> Open discussion points are:
> 1. Best way of integration into rest of FFmpeg

I'm summarizing some of the designs which have already been proposed:
please correct me if some information is missing / incorrect.

 The MOD decoder does just one thing: decode an AVPacket to a BSS. It
 does not know anything about the player (it doesn't even know _if_ the
 BSS will be played, converted to another format, or fed to
 visualization code).
 Libavsequencer does just one thing: transform a BSS into PCM audio.
 It knows nothing about file formats (it doesn't care or know whether the
 BSS was made from a MOD file or recorded from a MIDI keyboard).

 That's why we insist on starting with the implementation of MOD -> XM
 conversion: it is much simpler than MOD -> PCM conversion, as it doesn't
 need an implementation of libavsequencer.

                           mod file metadata                BSS + sequencer samples
 MOD file --> MOD demuxer ------------------> MOD decoder --------------------------> application

 The advantages of this approach are as follows:
 - It allows conversion from a format with more features to one with
   fewer, doing no mixing or sampling
 - It makes each file format very modular (just reading the bitstream and
   filling up the BSS)
 - Better integration with the way FFmpeg works ATM
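To make the "decode an AVPacket to a BSS" idea concrete, here is a purely
illustrative toy, not the real lavseq API: a "decoder" that only parses packet
bytes into an in-memory song structure, with no mixing and no PCM output. The
ToyBSS type and packet layout are invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the BSS; the real one is the AVSequencer hierarchy. */
typedef struct ToyBSS {
    uint8_t channels;
    uint8_t orders;
    uint8_t order_list[128];
} ToyBSS;

/* Parse a toy packet layout: [channels][orders][order_list...].
 * Pure bitstream reading and structure filling; no PCM is produced. */
static int toy_mod_decode(ToyBSS *bss, const uint8_t *buf, int size)
{
    if (size < 2)
        return -1;
    bss->channels = buf[0];
    bss->orders   = buf[1];
    if (bss->orders > 128 || size < 2 + bss->orders)
        return -1;
    memcpy(bss->order_list, buf + 2, bss->orders);
    return 0;
}
```

An application converting MOD -> XM would then only re-serialize the filled
structure; at no point does any sample data get mixed or resampled.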

 The demuxer decodes the file to a BSS and outputs it in an
 AVPacket. We would then define a CODEC_ID_SEQUENCER, and the decoder
 would be just a wrapper around libavsequencer to do the BSS -> PCM
 conversion.

 The advantage of this approach is that the concept of demuxing/decoding
 does not make much sense for these formats, so this avoids the
 artificial distinction. Moreover, it makes a nice distinction between
 transcoding from one MOD format to another (with -acodec copy) and
 decoding it to PCM. The disadvantage is that, API-wise, it's less clear
 for external applications how to get the BSS data (reading the AVPacket
 payload). Besides, all the bit-reading API is part of lavc.


 There are technical reasons for both solutions. I'll try to give more
 info tomorrow.

> 2. How to do the mixer so it finally can playback channels in a way like:
>     Channel 0: raw PCM
>     Channel 1: ogg file
>     Channel 2: mp3 file
>     [..]
>     Channel 62: ImpulseTracker instrument file
>     Channel 63: GUS patch sound
>     Channel 64: flac file
>     (while all these can be played with different loop points, volumes,
>     panning positions, etc.)
> 3. What's the best approach for writing demuxers / decoders using the
>     AVSequencer
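Regarding question 2, one possible shape (again purely a sketch with invented
names, not an answer from the thread) is to give every mixer channel its own
playback parameters, regardless of what produced the channel's PCM: raw data,
an ogg/mp3/flac file decoded via lavc, a GUS patch, and so on. The mixer then
only ever sees decoded samples plus per-channel loop points, volume and
panning.

```c
#include <stdint.h>

/* Toy mixer channel: the source of the PCM is irrelevant to the mixer. */
typedef struct ToyMixerChannel {
    const int16_t *pcm;             /* decoded source data, any origin */
    uint32_t len;                   /* number of samples in pcm */
    uint32_t pos;                   /* current playback position */
    uint32_t loop_start, loop_end;  /* per-channel loop points */
    int volume;                     /* 0..256 */
    int panning;                    /* -128 (left) .. 127 (right), unused here */
} ToyMixerChannel;

/* Mix all channels into out[], advancing each channel independently and
 * wrapping at its own loop points. Integer math only, in the spirit of a
 * "low quality PCM mixer". */
static void toy_mix(ToyMixerChannel *ch, int nb_channels,
                    int32_t *out, int nb_samples)
{
    for (int i = 0; i < nb_samples; i++) {
        int32_t acc = 0;
        for (int c = 0; c < nb_channels; c++) {
            acc += ch[c].pcm[ch[c].pos] * ch[c].volume / 256;
            if (++ch[c].pos >= ch[c].loop_end)
                ch[c].pos = ch[c].loop_start;
        }
        out[i] = acc;
    }
}
```

In this picture, "Channel 1: ogg file" just means lavc decoded the ogg data
into the channel's pcm buffer before the mixer ever ran.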

FFmpeg = Fast & Forgiving Mere Philosofic Elitarian Game
