[FFmpeg-devel] [WIP] libcodec2 wrapper + de/muxer in FFmpeg

Fri Aug 4 20:20:47 EEST 2017

On Thu, 2017-08-03 at 22:24 +0200, Reimar Döffinger wrote:
> On Thu, Aug 03, 2017 at 12:40:04AM +0200, Tomas Härdin wrote:
> 
> > 
> > +    //statically assert the size of avpriv_codec2_header
> > +    //putting it here because all codec2 things depend on
> > codec2utils
> > +    switch(0) {
> > +    case 0:
> > +    case sizeof(avpriv_codec2_header) == 7: //if false then the
> > compiler will complain
> > +        break;
> > +    }
> I was pretty sure we had some kind of static assert already,
> based on negative array size in a struct (so it can work outside
> functions)...
> But doesn't matter due to next comments...
> 
> > 
> > +        *((avpriv_codec2_header*)avctx->extradata) =
> > avpriv_codec2_make_header(mode);
> I am pretty sure this is a strict aliasing violation.
> Even ignoring the ugliness and performance issues with
> returning structs and assigning them causing lots of
> pointless copies.
> Also, since all struct elements are bytes, it isn't
> really any more readable than just assigning to the
> byte array, with the names you not have in the struct
> in comments instead.
> And that is without going into all the ways someone could
> change that struct (e.g. in case of new features) that
> would completely break this kind of code due to endianness,
> alignment, ...
> Even just supporting both the current and a potential future version
> that changes any of the fields would be hard with this kind of code.

Aliasing violation is a good point, but it also doesn't matter here.
extradata isn't used anywhere else in the function.

The struct is mostly a handy way of getting names for all the bytes, so
a enum with offsets could work just as well:

enum avpriv_codec2_header_bytes {
  AVPRIV_CODEC2_MAGIC0 = 0,
  AVPRIV_CODEC2_MAGIC1,
  AVPRIV_CODEC2_MAGIC2,
  AVPRIV_CODEC2_VERSION_MAJOR,
  ...
};

Endianness is also not an issue since it's all bytes. But: one nice way
to do this is perhaps a pair of functions for parsing / creating the
struct out of the byte array and vice versa.

> > 
> > +        if (avctx->extradata_size != sizeof(avpriv_codec2_header))
> > {
> > +            av_log(avctx, AV_LOG_ERROR, "must have exactly %zu
> > bytes of extradata (got %i)\n",
> > +                   sizeof(avpriv_codec2_header), avctx-
> > >extradata_size);
> > +        }
> I would think at least if it is less you wouldn't want to just
> continue? Even if extradata is required to be padded (is it?),
> blindly selecting 0 as mode doesn't seem very likely to be right.

This has been fixed in the updated patchset

> > 
> > +    output = (int16_t *)frame->data[0];
> The codec2 documentation seems pretty non-existant, so I
> assume you are right that this is always in native endianness.
> However from what I can see codec2 actually uses the "short" type,
> not int16_t.
> While it shouldn't make a difference in practice, if casting anyway
> I'd suggest using the type matching the API.

Some kind of static assert for sizeof(short) == sizeof(int16_t) would
be nicest I think. If someone's on a platform with non-16-bit shorts
and they want --enable-libcodec2 to work then they can probably write
and submit a patch.

> > 
> > +    if (nframes > 0) {
> > +        *got_frame_ptr = 1;
> > +    }
> Not just *got_frame_ptr = nframes > 0; ?

Sure.

> > 
> > +    int16_t *samples = (int16_t *)frame->data[0];
> You are casting the const away.
> Did they just forget to add the const to the API?
> If so, can you suggest it to be added?
> Otherwise if it's intentional you need to make a copy.

No const in the API, but adding it is of course easy enough. That won't
get into Debian for some time however, so users would have to build
their own libcodec2.

Version #defines in <codec2/codec2.h> could help with this, currently
there are none so that's one way to detect the non-const API. I'll have
to run that past the other guys.

> > +    int ret;
> > +
> > +    if ((ret = ff_alloc_packet2(avctx, avpkt, avctx->block_align,
> > 0)) < 0) {
> If you merge the declaration and assignment it you would get
> for free not having the assignment inside the if, which I
> still think is just horrible style. :)

Fair enough. I use this style in a few places when stacking a bunch of
inits and checks for ENOMEM and such. doc/developer.html doesn't seem
to have a preference, and I've seen the style used elsewhere in FFmpeg.
I went with splitting the assignments and testing in all the ifs in the
updated patchset

> > 
> > +    //file starts wih 0xC0DEC2
> > +    if (p->buf[0] == 0xC0 && p->buf[1] == 0xDE && p->buf[2] ==
> > 0xC2) {
> > +        return AVPROBE_SCORE_MAX;
> > +    }
> As mentioned, try to find a few more bits and reduce the score.
> If only these 3 bytes match, I would suggest a rather low score.
> (I doubt there are enough useful bits in this header to allow
> reaching MAX score, it's a shame they didn't invest a byte or
> 2 more, especially considering that while 0xC0DEC2 might look
> clever it doesn't exactly get maximum entropy out of those 3 bytes).

Maximum entropy by what measure? :)
The probing logic has been expanded, see the updated patchset I sent as
reply to Nicolas. A file extension check might also be in order, and
being one or two points away from MAX in all cases.

I'm also curious what format it could conceivably mis-probed as, beyond
purposeful polyglots like MXF. .au?

We also toyed with the idea of a very long format string on IRC
yesterday since things are still very much experimental and may change
rapidly

> > 
> > +    AVStream *st;
> > +
> > +    if (!(st = avformat_new_stream(s, NULL)) ||
> Can merge allocation and declaration again.

Fixed

> > 
> > +    //Read roughly 0.1 seconds worth of data.
> > +    n = st->codecpar->bit_rate / ((int)(8/0.1) * block_align) + 1;
> That doesn't seem overly readable to me...
> // Read about 1/10th of a second worth of data
> st->codecpar->bit_rate / 10 / 8 / block_align + 1
> 
> Seems not really worse and doesn't have doubles and casts...
> Otherwise
> blocks_per_second = st->codecpar->bit_rate / 8 / block_align;
> n = blocks_per_second / 10;
> might be even clearer.

Actually, I might go with an option and default to just demuxing one
frame at a time. So maximum seek fidelity and minimum delay unless the
user wants to transcode large amounts of speech

> > 
> > +    if (av_new_packet(pkt, size) < 0)
> > +        return AVERROR(ENOMEM);
> > +    //try to read desired number of bytes, recompute n from to
> > actual number of bytes read
> > +    pkt->pos= avio_tell(s->pb);
> > +    pkt->stream_index = 0;
> > +    ret = ffio_read_partial(s->pb, pkt->data, size);
> > +    if (ret < 0) {
> > +        av_packet_unref(pkt);
> > +        return ret;
> > +    }
> > +    av_shrink_packet(pkt, ret);
> Ouch, why are you not just using av_get_packet?

Well, I nicked it from ff_raw_read_partial_packet(). av_get_packet()
does not seem to respect block_align, which is worrying. Went with it
either way for now.

> > 
> > +    avio_write(s->pb, st->codecpar->extradata,
> > sizeof(avpriv_codec2_header));
> Error check?

avio_write() returns void :)

> > 
> > +    if (!(st = avformat_new_stream(s, NULL)) ||
> > +        ff_alloc_extradata(st->codecpar,
> > sizeof(avpriv_codec2_header))) {
> > +        return AVERROR(ENOMEM);
> Here and in the other read_header, this would preferably preserve the
> error that
> ff_alloc_extradata returned.

Fixed.

---

When setting up a minimal build for afl-fuzz:

  ./configure --cc=/usr/bin/afl-clang-fast \
    --extra-cflags='-DAFL_HARDEN=1' --disable-everything \
    --enable-demuxer=codec2 --enable-libcodec2 \
    --enable-decoder=libcodec2 --enable-protocol=file \
    --disable-demuxer=rtsp --disable-avfilter

I got AR complaining about not finding libavformat/codec2utils.o. So I
had to change the OBJS to explicitly point out lavc like so:

OBJS-$(CONFIG_CODEC2_DEMUXER)            += ../libavcodec/codec2utils.o codec2.o rawdec.o pcm.o
OBJS-$(CONFIG_CODEC2_MUXER)              += ../libavcodec/codec2utils.o codec2.o rawenc.o
OBJS-$(CONFIG_CODEC2RAW_DEMUXER)         += ../libavcodec/codec2utils.o codec2.o rawdec.o pcm.o

Am I missing something there? Surely more demuxer make use of lavc..
dv.c has this only, despite #including "libavcodec/dv.h":

OBJS-$(CONFIG_DV_DEMUXER)                += dv.o

Despite depending on lavc-- ohhh! I may need a codec2utils target after
all, for it to be built into lavc which lavf already depends on :) Or?

TODO:

* have -mode be an integer and use the CONST system Nicolas mentioned
* option for demuxing multiple frames at a time
* sort the extradata aliasing thing
* address API issues in libcodec2, possibly modify the format (there
should still be time)

New patchset attached. Has the new probe logic and all that in it. 85
cycles of afl-fuzz without hangs or crashes.

/Tomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Don-t-complain-about-codec2-s-700-bit-s-modes-in-ffm.patch
Type: text/x-patch
Size: 1127 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170804/a03b675c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-codec2-muxers-demuxers-and-en-decoder-via-libcod.patch
Type: text/x-patch
Size: 34345 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170804/a03b675c/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170804/a03b675c/attachment.sig>