[FFmpeg-devel] [PATCH] Extract QP from h264 encoded videos

Mark Thompson sw at jkqxz.net
Wed Aug 7 02:59:33 EEST 2019


On 05/08/2019 20:20, Juan De León wrote:
> AVQuantizationParams data structure for extracting qp and storing as AV_FRAME_DATA_QUANTIZATION_PARAMS AVFrameSideDataType
> design doc: https://docs.google.com/document/d/1WClt3EqhjwdGXhEw386O0wfn3IBQ1Ib-_5emVM1gbnA/edit?usp=sharing
> 
> Signed-off-by: Juan De León <juandl at google.com>
> ---
>  libavutil/Makefile              |   2 +
>  libavutil/frame.h               |   6 ++
>  libavutil/quantization_params.c |  83 ++++++++++++++++++++++++
>  libavutil/quantization_params.h | 110 ++++++++++++++++++++++++++++++++
>  4 files changed, 201 insertions(+)
>  create mode 100644 libavutil/quantization_params.c
>  create mode 100644 libavutil/quantization_params.h

This should be in libavcodec, not libavutil - it relates directly to codecs.  (Indeed, you've ended up having to define a special non-libavcodec enum of codecs below to make it work in libavutil at all.)

> diff --git a/libavutil/quantization_params.h b/libavutil/quantization_params.h
> new file mode 100644
> index 0000000000..1c1b474dca
> --- /dev/null
> +++ b/libavutil/quantization_params.h
> @@ -0,0 +1,110 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_QUANTIZATION_PARAMS_H
> +#define AVUTIL_QUANTIZATION_PARAMS_H
> +
> +/**
> + * Supported decoders for extraction and filter
> + */
> +enum AVExtractQPSupportedCodecs {
> +    AV_EXTRACT_QP_CODEC_ID_H264 = 0,
> +    AV_EXTRACT_QP_CODEC_ID_VP9,
> +    AV_EXTRACT_QP_CODEC_ID_AV1,
> +};
> +
> +/**
> + * Enums for different codecs to store qp in the type array
> + * Each enum must have an array of strings describing each field
> + * declared in libavutil/quantization_params.c
> + */
> +
> +enum AVQPArrIndexesH264 {  // varaible names in spec document

I don't think giving these enums a tag has any value?

> +    AV_QP_Y_H264 = 0,      // QPy
> +    AV_QP_U_H264,          // QPcb

There is no variable in the standard with this name, which will confuse attempts to search for its meaning.  I think you mean QPc for the Cb component, as described in section 8.5.8.

> +    AV_QP_V_H264,          // QPcr

Likewise here.

> +    AV_QP_ARR_SIZE_H264
> +};
> +
> +enum AVQPArrIndexesVP9 {   // variable names in spec document
> +    AV_QP_YAC_VP9 = 0,     // get_dc_quant[][base_q_idx]

This isn't right - get_dc_quant() is a function of one variable, not a two-dimensional array (confused with dc_qlookup[][] somehow?).

> +    AV_QP_YDC_VP9,         // get_dc_quant[][base_q_idx+delta_q_y_dc]
> +    AV_QP_UVDC_VP9,        // get_dc_quant[][base_q_idx+delta_q_uv_dc]
> +    AV_QP_UVAC_VP9,        // get_ac_quant[][base_q_idx+delta_q_uv_ac]
> +    AV_QP_INDEX_YAC_VP9,   // base_q_idx
> +    AV_QP_INDEX_YDC_VP9,   // base_q_idx+delta_q_y_dc
> +    AV_QP_INDEX_UVDC_VP9,  // base_q_idx+delta_q_uv_dc
> +    AV_QP_INDEX_UVAC_VP9,  // base_q_idx+delta_q_uv_ac

Why are you including higher-level frame values for VP9 and AV1, but not including similar ones for H.264?

> +    AV_QP_ARR_SIZE_VP9
> +};
> +
> +enum AVQPArrIndexesAV1 {  // variable names in spec document
> +    AV_QP_YAC_AV1 = 0,    // dc_q(base_q_idx)

What is the motivation for AV1 returning the exponential coefficient scaling values (range 4..29247) rather than the linear parameter (range 0..255) as you do for H.264?

> +    AV_QP_YDC_AV1,        // dc_q(base_q_idx+DeltaQYDc)
> +    AV_QP_UDC_AV1,        // dc_q(base_q_idx+DeltaQUDc)
> +    AV_QP_UAC_AV1,        // dc_q(base_q_idx+DeltaQUAc)
> +    AV_QP_VDC_AV1,        // dc_q(base_q_idx+DeltaQVDc)
> +    AV_QP_VAC_AV1,        // dc_q(base_q_idx+DeltaQVAc)
> +    AV_QP_INDEX_YAC_AV1,  // base_q_idx
> +    AV_QP_INDEX_YDC_AV1,  // base_q_idx+DeltaQYDc
> +    AV_QP_INDEX_UDC_AV1,  // base_q_idx+DeltaQUDc
> +    AV_QP_INDEX_UAC_AV1,  // base_q_idx+DeltaQUAc
> +    AV_QP_INDEX_VDC_AV1,  // base_q_idx+DeltaQVDc
> +    AV_QP_INDEX_VAC_AV1,  // base_q_idx+DeltaQVAc
> +    AV_QP_ARR_SIZE_AV1
> +};

The term "QP" is never used in VP9 or AV1.  Maybe these could have a less H.26[45]-oriented name?

> +
> +/**
> + * Update AV_QP_ARR_MAX_SIZE when a new enum is defined that
> + * exceeds the current max size.
> + */
> +
> +#define AV_QP_ARR_MAX_SIZE AV_QP_ARR_SIZE_AV1

Fixing this for all time for a particular codec which happens to need the most space when it is defined doesn't seem like a good idea.  E.g. you can't support JPEG with only this number (it would need all entries in up to four tables).

It might be better if the structure size wasn't fixed forever by the first version of the API/ABI.  Perhaps an approach something like that used for AVRegionOfInterest would work?

> +
> +/**
> + * Data structure for extracting Quantization Parameters, codec independent
> + */
> +typedef struct AVQuantizationParams {
> +    /**
> +     * x and y coordinates of the block in pixels
> +     */
> +    int x, y;

Don't call these x/y coordinates because it not clear exactly what that means (what is the scale, where is the origin, which direction is positive, where in the block is being referred to, etc.).

Instead follow the same convention as other structures in FFmpeg and define them as the distance in pixels from the left or the top edge of the picture to the top-left corner of the block.

On 30/07/2019 03:19, Juan De León wrote:
> On Mon, Jul 29, 2019 at 12:48 PM Mark Thompson <sw at jkqxz.net> wrote: 
>>
>> How do these values interact with cropping?
> 
> I'm not sure I understand, could you elaborate?

For codecs which include cropping such as H.26[45], the decoder may directly apply cropping from the stream (controlled by AVCodecContext.apply_cropping), possibly modified by alignment (with AV_CODEC_FLAG_UNALIGNED), and then sets the AVFrame cropping fields to reflect the remainder.

For example, in H.264 try setting the frame_crop_left_offset/frame_crop_top_offset fields in a stream to large values (h264_metadata can do this for an existing stream).  What do your x/y values then refer to in the result?  They could be negative to indicate macroblocks which are off the edges of the cropped picture, or they might be relative to the uncropped picture in which case you would need additional information to reconstruct which blocks they refer to in the frame you actually have.

> +    /**
> +     * width and height of the block in pixels
> +     * set to 0 for the last block in the array
> +     */
> +    int w, h;

Just call them width, height - there's no benefit to brevity here.

> +    /**
> +     * qp_type array indexed using the enum corresponding
> +     * to the codec extracting the QP
> +     * AV_QP_ARR_MAX_SIZE sould always be set to
> +     * the largest size of the supported codecs
> +     */
> +    int qp_type[AV_QP_ARR_MAX_SIZE];

Give this a less misleading name.  It's not an array of QP types, rather it's an array of quantiser values which is indexed by the codec-dependent type.

> +    /**
> +     * Stores an id corresponding to one of the supported codecs
> +     */
> +    enum AVExtractQPSupportedCodecs codec_id;

enum AVCodecID, with this in libavcodec.

> +} AVQuantizationParams;
> +
> +/**
> + * Get the string describing the qp type for the given codec
> + */
> +const char* av_get_qp_type_string(enum AVExtractQPSupportedCodecs codec_id, int index);

I'm not sure there is a good reason to embed this in the public API - what user is ever going to call this function?  Anyone using the enum values must already know exactly what each of them mean to do anything with them at all, so if they need string names they'll already have clearer ones than the cryptic short names you provide here.

I think it would probably be better to just include your string names in the showinfo filter (or some other) and not have it in the public API.

> +
> +#endif /* AVUTIL_QUANTIZATION_PARAMS_H */
> 

- Mark


More information about the ffmpeg-devel mailing list