[FFmpeg-devel] [PATCH] lavf/md5enc: New streammd5 muxer

Fri Feb 19 17:51:00 CET 2016

Le primidi 1er ventôse, an CCXXIV, Remko Tronçon a écrit :
> Signed-off-by: Remko Tronçon <remko at el-tramo.be>
> ---
> This patch introduces a new 'streammd5' muxer.
> Similarly to the 'md5' and 'framemd5' muxer, it is used to compute
> the hash of media. However, I needed something in between both, to
> be able to quickly see which streams are identical across different
> files.
> 
> I did not add any tests yet, because I was not sure if there was
> general interest in this patch.

Thanks for the patch.

I have no objection theoretically, but I am rather dubious about the
usefulness of that: you could mux the streams independently to the md5
muxer. Can you share your exact use case in more details?

Below comments about the code:

> 
>  Changelog                |  1 +
>  doc/muxers.texi          | 40 ++++++++++++++++++++--
>  libavformat/Makefile     |  1 +
>  libavformat/allformats.c |  1 +
>  libavformat/md5enc.c     | 87 ++++++++++++++++++++++++++++++++++++++++++++++++
>  libavformat/version.h    |  4 +--
>  6 files changed, 130 insertions(+), 4 deletions(-)
> 
> diff --git a/Changelog b/Changelog
> index 7d672fb..42ee96a 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -6,6 +6,7 @@ version <next>:
>  - fieldhint filter
>  - loop video filter and aloop audio filter
>  - Bob Weaver deinterlacing filter

> +- streammd5 format

Changing the name of existing muxers is annoying, but for new features I
would like much better if we did not wave around the name of a hash function
that has known vulnerabilities since 20 years: "streamhash" would be an
obvious choice.

>  
>  
>  version 3.0:
> diff --git a/doc/muxers.texi b/doc/muxers.texi
> index f2ad7fe..1521f37 100644
> --- a/doc/muxers.texi
> +++ b/doc/muxers.texi
> @@ -207,7 +207,7 @@ To print the information to stdout, use the command:
>  ffmpeg -i INPUT -f framemd5 -
>  @end example
>  
> -See also the @ref{md5} muxer.
> +See also the @ref{md5} and @ref{streammd5} muxers.
>  
>  @anchor{gif}
>  @section gif
> @@ -641,7 +641,7 @@ You can print the MD5 to stdout with the command:
>  ffmpeg -i INPUT -f md5 -
>  @end example
>  
> -See also the @ref{framemd5} muxer.
> +See also the @ref{framemd5} and @ref{streammd5} muxers.
>  
>  @section mov, mp4, ismv
>  
> @@ -1307,6 +1307,42 @@ Specify whether to remove all fragments when finished. Default 0 (do not remove)
>  
>  @end table
>  
> + at anchor{streammd5}
> + at section streammd5
> +

> +Per-stream MD5 testing format.
> +
> +This muxer computes and prints the MD5 hash for each stream.
> +By default audio streams are converted to signed
> +16-bit raw audio and video streams to raw video before computing the
> +hash.

This was copied more or less as is from existing documentation, but it is
misleading. First, these muxers can compute any supported hash function.
Second, there is no default conversion in these muxers, just codecs chosen
by default by the command line tool.

> +
> +The output of the muxer consists of a line for each stream of the form:
> + at example
> + at var{stream_index}, @var{MD5}
> + at end example
> +
> + at var{MD5} is a hexadecimal number representing the computed MD5 hash
> +for the packet.
> +
> + at subsection Examples
> +
> +For example to compute the MD5 of all the audio and video streams in
> + at file{INPUT}, converted to raw audio and video packets, and store it
> +in the file @file{out.md5}:
> + at example
> +ffmpeg -i INPUT -f streammd5 out.md5
> + at end example
> +
> +To compute the MD5 of all the streams in @file{INPUT} without any
> +conversion, and print it to stdout:
> + at example
> +ffmpeg -i INPUT -f streammd5 -codec copy -map 0 -
> + at end example
> +
> +See also the @ref{md5} and @ref{framemd5} muxers.
> +
> +
>  @section tee
>  
>  The tee muxer can be used to write the same data to several files or any
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index 001b3f1..fde64dc 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -430,6 +430,7 @@ OBJS-$(CONFIG_SRT_DEMUXER)               += srtdec.o subtitles.o
>  OBJS-$(CONFIG_SRT_MUXER)                 += srtenc.o
>  OBJS-$(CONFIG_STL_DEMUXER)               += stldec.o subtitles.o
>  OBJS-$(CONFIG_STR_DEMUXER)               += psxstr.o
> +OBJS-$(CONFIG_STREAMMD5_MUXER)           += md5enc.o
>  OBJS-$(CONFIG_SUBVIEWER1_DEMUXER)        += subviewer1dec.o subtitles.o
>  OBJS-$(CONFIG_SUBVIEWER_DEMUXER)         += subviewerdec.o subtitles.o
>  OBJS-$(CONFIG_SUP_DEMUXER)               += supdec.o
> diff --git a/libavformat/allformats.c b/libavformat/allformats.c
> index 02bb16a..df4c8eb 100644
> --- a/libavformat/allformats.c
> +++ b/libavformat/allformats.c
> @@ -295,6 +295,7 @@ void av_register_all(void)
>      REGISTER_MUXDEMUX(SPDIF,            spdif);
>      REGISTER_MUXDEMUX(SRT,              srt);
>      REGISTER_DEMUXER (STR,              str);
> +    REGISTER_MUXER   (STREAMMD5,        streammd5);
>      REGISTER_DEMUXER (STL,              stl);
>      REGISTER_DEMUXER (SUBVIEWER1,       subviewer1);
>      REGISTER_DEMUXER (SUBVIEWER,        subviewer);
> diff --git a/libavformat/md5enc.c b/libavformat/md5enc.c
> index 8433be4..b114a9b 100644
> --- a/libavformat/md5enc.c
> +++ b/libavformat/md5enc.c
> @@ -169,3 +169,90 @@ AVOutputFormat ff_framemd5_muxer = {
>      .priv_class        = &framemd5_class,
>  };
>  #endif
> +
> +
> +#if CONFIG_STREAMMD5_MUXER
> +
> +struct StreamMD5Context {
> +    const AVClass *avclass;
> +    struct AVHashContext** hashes;
> +    char *hash_name;
> +    int format_version;
> +};
> +
> +static const AVClass streammd5enc_class = {
> +    .class_name = "stream hash encoder class",
> +    .item_name  = av_default_item_name,

> +    .option     = hash_options,

I am rather surprised to see that the existing code uses the same options
array for both md5 and framemd5, I thought it was not allowed.

But anyways, md5 and framemd5 both use the same MD5Context. Using the same
options array with a different context structure is a big no.

> +    .version    = LIBAVUTIL_VERSION_INT,
> +};
> +
> +static int streammd5_write_header(struct AVFormatContext *s)
> +{

> +    unsigned int i;

Nit: the code base usually uses "unsigned" without "int". Same below.

> +    struct StreamMD5Context *ctx = s->priv_data;

> +    ctx->hashes = av_calloc(s->nb_streams, sizeof(struct AVHashContext*));
> +    for (i = 0; i < s->nb_streams; ++i) {
> +        int res = av_hash_alloc(&ctx->hashes[i], ctx->hash_name);
> +        if (res < 0) {
> +            return res;
> +        }
> +        av_hash_init(ctx->hashes[i]);

This leaks in case of error.

> +    }
> +    return 0;
> +}
> +
> +static int streammd5_write_packet(struct AVFormatContext *s, AVPacket *pkt)
> +{
> +    struct StreamMD5Context *ctx = s->priv_data;
> +    av_hash_update(ctx->hashes[pkt->stream_index], pkt->data, pkt->size);
> +    return 0;
> +}
> +
> +static int streammd5_write_trailer(struct AVFormatContext *s)
> +{
> +    struct StreamMD5Context *ctx = s->priv_data;
> +    unsigned int i;

> +    for (i = 0; i < s->nb_streams; ++i) {

Nit: i++ for consistency.

> +        int j = 0;
> +        int len = 0;
> +        int offset = 0;
> +        char buf[256];
> +        uint8_t md5[AV_HASH_MAX_SIZE];
> +
> +        snprintf(buf, sizeof(buf), "%d, ", i);
> +
> +        offset = strlen(buf);
> +        len = av_hash_get_size(ctx->hashes[i]);

> +        av_assert0(len > 0 && len <= sizeof(md5));

Should substract offset and the size required for the final octets.

> +        av_hash_final(ctx->hashes[i], md5);
> +        for (j = 0; j < len; j++) {
> +            snprintf(buf + offset, 3, "%02"PRIx8, md5[j]);
> +            offset += 2;
> +        }

There is a av_hash_final_hex() to do that.

> +        buf[offset] = '\n';
> +        buf[offset+1] = 0;
> +
> +        avio_write(s->pb, buf, strlen(buf));

> +        avio_flush(s->pb);

Flushing in a tight loop like that seems unnecessary.

> +
> +        av_hash_freep(&ctx->hashes[i]);
> +    }
> +    av_freep(ctx->hashes);
> +    return 0;
> +}
> +
> +AVOutputFormat ff_streammd5_muxer = {
> +    .name              = "streammd5",
> +    .long_name         = NULL_IF_CONFIG_SMALL("Per-stream MD5 testing"),
> +    .priv_data_size    = sizeof(struct StreamMD5Context),
> +    .audio_codec       = AV_CODEC_ID_PCM_S16LE,
> +    .video_codec       = AV_CODEC_ID_RAWVIDEO,
> +    .write_header      = streammd5_write_header,
> +    .write_packet      = streammd5_write_packet,
> +    .write_trailer     = streammd5_write_trailer,
> +    .flags             = AVFMT_VARIABLE_FPS | AVFMT_TS_NONSTRICT |
> +                         AVFMT_TS_NEGATIVE,
> +    .priv_class        = &streammd5enc_class,
> +};
> +#endif
> diff --git a/libavformat/version.h b/libavformat/version.h
> index 62050a2..8fc8e77 100644
> --- a/libavformat/version.h
> +++ b/libavformat/version.h
> @@ -30,8 +30,8 @@
>  #include "libavutil/version.h"
>  
>  #define LIBAVFORMAT_VERSION_MAJOR  57
> -#define LIBAVFORMAT_VERSION_MINOR  25
> -#define LIBAVFORMAT_VERSION_MICRO 101
> +#define LIBAVFORMAT_VERSION_MINOR  26
> +#define LIBAVFORMAT_VERSION_MICRO 100 
>  
>  #define LIBAVFORMAT_VERSION_INT AV_VERSION_INT(LIBAVFORMAT_VERSION_MAJOR, \
>                                                 LIBAVFORMAT_VERSION_MINOR, \

Regards,

-- 
  Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160219/4dc41039/attachment.sig>