[FFmpeg-devel] [PATCH v9 03/13] vvcdec: add cabac decoder
Nuo Mi
nuomi2021 at gmail.com
Tue Jan 2 15:44:08 EET 2024
On Tue, Jan 2, 2024 at 1:35 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:
> On Mon, Jan 01, 2024 at 10:12:29PM +0800, Nuo Mi wrote:
> > add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder
> >
> > Co-authored-by: Xu Mu <toxumu at outlook.com>
> > Co-authored-by: Frank Plowman <post at frankplowman.com>
> > Co-authored-by: Shaun Loo <shaunloo10 at gmail.com>
> > Co-authored-by: Wu Jianhua <toqsxw at outlook.com>
> > ---
> > libavcodec/vvc/Makefile | 4 +-
> > libavcodec/vvc/vvc_cabac.c | 2478 ++++++++++++++++++++++++++++++++++++
> > libavcodec/vvc/vvc_cabac.h | 126 ++
> > libavcodec/vvc/vvc_ctu.c | 32 +
> > libavcodec/vvc/vvc_ctu.h | 464 +++++++
> > libavcodec/vvc/vvcdec.h | 7 +
> > 6 files changed, 3110 insertions(+), 1 deletion(-)
> > create mode 100644 libavcodec/vvc/vvc_cabac.c
> > create mode 100644 libavcodec/vvc/vvc_cabac.h
> > create mode 100644 libavcodec/vvc/vvc_ctu.c
> > create mode 100644 libavcodec/vvc/vvc_ctu.h
>
> [...]
>
> > +static int residual_ts_coding_subblock(VVCLocalContext *lc, ResidualCoding* rc, const int i)
> > +{
> > + const CodingUnit *cu = lc->cu;
> > + TransformBlock *tb = rc->tb;
> > + const int bdpcm_flag = cu->bdpcm_flag[tb->c_idx];
> > + const int xs = rc->sb_scan_x_off[i];
> > + const int ys = rc->sb_scan_y_off[i];
> > + uint8_t *sb_coded_flag = rc->sb_coded_flag + ys * rc->width_in_sbs + xs;
> > + int infer_sb_sig_coeff_flag = 1;
> > + int last_scan_pos_pass1 = -1, last_scan_pos_pass2 = -1, n;
> > + int abs_level_gtx_flag[MAX_SUB_BLOCK_SIZE * MAX_SUB_BLOCK_SIZE];
> > + int abs_level_pass2[MAX_SUB_BLOCK_SIZE * MAX_SUB_BLOCK_SIZE]; ///< AbsLevelPass2
> > +
> > + if (i != rc->last_sub_block || !rc->infer_sb_cbf)
> > + *sb_coded_flag = sb_coded_flag_decode(lc, sb_coded_flag, rc, xs, ys);
> > + else
> > + *sb_coded_flag = 1;
> > + if (*sb_coded_flag && i < rc->last_sub_block)
> > + rc->infer_sb_cbf = 0;
> > +
> > + //first scan pass
> > + for (n = 0; n < rc->num_sb_coeff && rc->rem_bins_pass1 >= 4; n++) {
> > + const int xc = (xs << rc->log2_sb_w) + rc->scan_x_off[n];
> > + const int yc = (ys << rc->log2_sb_h) + rc->scan_y_off[n];
> > + const int off = yc * tb->tb_width + xc;
> > + int *sig_coeff_flag = rc->sig_coeff_flag + off;
> > + int *abs_level_pass1 = rc->abs_level_pass1 + off;
> > + int *coeff_sign_level = rc->coeff_sign_level + off;
> > + int par_level_flag = 0;
> > +
> > + abs_level_gtx_flag[n] = 0;
> > + last_scan_pos_pass1 = n;
> > + if (*sb_coded_flag && (n != rc->num_sb_coeff - 1 || !infer_sb_sig_coeff_flag)) {
> > + *sig_coeff_flag = sig_coeff_flag_decode(lc, rc, xc, yc);
> > + rc->rem_bins_pass1--;
> > + if (*sig_coeff_flag)
> > + infer_sb_sig_coeff_flag = 0;
> > + } else {
> > + *sig_coeff_flag = (n == rc->num_sb_coeff - 1) && infer_sb_sig_coeff_flag && *sb_coded_flag;
> > + }
> > + *coeff_sign_level = 0;
> > + if (*sig_coeff_flag) {
> > + *coeff_sign_level = 1 - 2 * coeff_sign_flag_ts_decode(lc, cu, rc, xc, yc);
> > + abs_level_gtx_flag[n] = abs_level_gt1_flag_ts_decode(lc, cu, rc, xc, yc);
> > + rc->rem_bins_pass1 -= 2;
> > + if (abs_level_gtx_flag[n]) {
> > + par_level_flag = par_level_flag_ts_decode(lc);
> > + rc->rem_bins_pass1--;
> > + }
> > + }
> > + *abs_level_pass1 = *sig_coeff_flag + par_level_flag + abs_level_gtx_flag[n];
> > + }
> > +
> > + //greater than x scan pass
> > + for (n = 0; n < rc->num_sb_coeff && rc->rem_bins_pass1 >= 4; n++) {
> > + const int xc = (xs << rc->log2_sb_w) + rc->scan_x_off[n];
> > + const int yc = (ys << rc->log2_sb_h) + rc->scan_y_off[n];
> > + const int off = yc * tb->tb_width + xc;
> > +
> > + abs_level_pass2[n] = rc->abs_level_pass1[off];
> > + for (int j = 1; j < 5 && abs_level_gtx_flag[n]; j++) {
> > + abs_level_gtx_flag[n] = abs_level_gtx_flag_ts_decode(lc, j);
> > + abs_level_pass2[n] += abs_level_gtx_flag[n] << 1;
> > + rc->rem_bins_pass1--;
> > + }
> > + last_scan_pos_pass2 = n;
> > + }
> > +
> > + /* remainder scan pass */
> > + for (n = 0; n < rc->num_sb_coeff; n++) {
> > + const int xc = (xs << rc->log2_sb_w) + rc->scan_x_off[n];
> > + const int yc = (ys << rc->log2_sb_h) + rc->scan_y_off[n];
> > + const int off = yc * tb->tb_width + xc;
> > + const int *abs_level_pass1 = rc->abs_level_pass1 + off;
> > + int *abs_level = rc->abs_level + off;
> > + int *coeff_sign_level = rc->coeff_sign_level + off;
> > + int abs_remainder = 0;
> > +
> > + if ((n <= last_scan_pos_pass2 && abs_level_pass2[n] >= 10) ||
> > + (n > last_scan_pos_pass2 && n <= last_scan_pos_pass1 &&
> > + *abs_level_pass1 >= 2) ||
> > + (n > last_scan_pos_pass1 && *sb_coded_flag))
> > + abs_remainder = abs_remainder_ts_decode(lc, rc, xc, yc);
> > + if (n <= last_scan_pos_pass2) {
> > + *abs_level = abs_level_pass2[n] + 2 * abs_remainder;
> > + } else if (n <= last_scan_pos_pass1) {
> > + *abs_level = *abs_level_pass1 + 2 * abs_remainder;
> > + } else {
> > + *abs_level = abs_remainder;
> > + if (abs_remainder) {
> > + //n > lastScanPosPass1
> > + *coeff_sign_level = 1 - 2 * coeff_sign_flag_decode(lc);
> > + }
> > + }
> > + if (!bdpcm_flag && n <= last_scan_pos_pass1) {
> > + const int left = xc > 0 ? abs_level[-1] : 0;
> > + const int above = yc > 0 ? abs_level[-tb->tb_width] : 0;
> > + const int pred = FFMAX(left, above);
> > +
> > + if (*abs_level == 1 && pred > 0)
> > + *abs_level = pred;
> > + else if (*abs_level > 0 && *abs_level <= pred)
> > + (*abs_level)--;
> > + }
>
> > + if (*abs_level) {
> > + tb->coeffs[off] = *coeff_sign_level * *abs_level;
> > + tb->max_scan_x = FFMAX(xc, tb->max_scan_x);
> > + tb->max_scan_y = FFMAX(yc, tb->max_scan_y);
> > + tb->min_scan_x = FFMIN(xc, tb->min_scan_x);
> > + tb->min_scan_y = FFMIN(yc, tb->min_scan_y);
> > + } else {
> > + tb->coeffs[off] = 0;
> > + }
>
> Is this just for optimization ?
>
Yes, see
https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc/vvc_itx_1d.c#L66
>
> Computing the max/min x/y indexes of the non-zero coeffs, to later process
> only those, is likely more expensive than just doing the dequantization
> here, where it is known what is non-zero. Also, the non-zero coeffs
> probably do not cluster well in a rectangle, so there will likely still be
> a lot of zeros inside it.
>
> If this is just for optimization, it's a strange direction at such an early
> stage. Dequantization can be done directly here, since we already have a
> separate branch for non-zero coefficients.
>
Good idea.
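A minimal sketch of what dequantizing in the non-zero branch could look
like. The helper names (dequant_level, store_coeff), the flat scale/shift
parameters and the 16-bit clip range are assumptions for illustration only,
not code from this patch; the real scaling depends on QP, block size and
scaling lists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: scale one decoded level.
 * Rounding is done on the magnitude, so it is symmetric around zero; the
 * real dequantizer may round differently. */
static int dequant_level(int sign, int abs_level, int scale, int shift)
{
    int64_t v;

    assert(sign == 1 || sign == -1);
    assert(abs_level >= 0 && scale >= 0);
    assert(shift >= 1 && shift < 31);

    /* scale the magnitude with rounding, then apply the sign */
    v = ((int64_t)abs_level * scale + ((int64_t)1 << (shift - 1))) >> shift;
    v *= sign;

    /* assumed 16-bit coefficient range */
    if (v >  32767) v =  32767;
    if (v < -32768) v = -32768;
    return (int)v;
}

/* Store one coefficient the way the existing non-zero branch could do it,
 * without tracking min/max scan positions. */
static void store_coeff(int *coeffs, int off, int sign, int abs_level,
                        int scale, int shift)
{
    if (abs_level)
        coeffs[off] = dequant_level(sign, abs_level, scale, shift);
    else
        coeffs[off] = 0;
}
```

With something like this, the zero branch stays a plain store and only
non-zero levels pay for the multiply.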
>
> And for the transform, knowing for example that rows 1 and 3 are all 0 is
> probably more useful than knowing that all non-zero elements are in rows 0-2.
>
Min may not be as useful since we usually have DC.
Max is important for transforms since we can skip multiplication for all
tail zeros.
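To illustrate the tail-zero point: a generic matrix-based 1D inverse
transform can stop its inner loop at the last non-zero input. This is only
a sketch under assumptions; itx_1d_partial, its parameters and the
row-major matrix layout are hypothetical and are not the code in
vvc_itx_1d.c. Here nz would be something like max_scan_x + 1 for rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D inverse transform of size n.
 * mat is row-major, n x n: row j holds basis function j.
 * Inputs in[nz..n-1] are assumed to be zero and are never read. */
static void itx_1d_partial(int *out, const int *in, const int *mat,
                           size_t n, size_t nz)
{
    assert(nz <= n);

    for (size_t i = 0; i < n; i++) {
        int sum = 0;

        /* only the first nz inputs can contribute */
        for (size_t j = 0; j < nz; j++)
            sum += mat[j * n + i] * in[j];
        out[i] = sum;
    }
}
```

With nz == n this is the full transform; with a smaller nz the output is
the same as long as the skipped inputs really are zero, and the number of
multiplications drops from n * n to n * nz.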
Perhaps we don't need to compare every coefficient to get Max.
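One possible way to do that, as a rough sketch: update the bound once per
coded sub-block instead of once per coefficient. ScanBounds and
bounds_add_subblock are hypothetical names, and the result is a
conservative bound (it may include some trailing zeros inside the last
coded sub-block), so this is an assumption about a possible direction, not
a decided design.

```c
#include <assert.h>

/* Hypothetical container for the upper bounds; -1 means "nothing coded". */
typedef struct ScanBounds {
    int max_x;
    int max_y;
} ScanBounds;

/* Extend the bounds to cover sub-block (xs, ys) if it is coded.
 * Uses the sub-block's last column/row, so it never underestimates. */
static void bounds_add_subblock(ScanBounds *b, int xs, int ys,
                                int log2_sb_w, int log2_sb_h, int sb_coded)
{
    int x_end, y_end;

    assert(xs >= 0 && ys >= 0);
    if (!sb_coded)
        return;

    x_end = ((xs + 1) << log2_sb_w) - 1;
    y_end = ((ys + 1) << log2_sb_h) - 1;
    if (x_end > b->max_x)
        b->max_x = x_end;
    if (y_end > b->max_y)
        b->max_y = y_end;
}
```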
Let me use https://github.com/ffvvc/FFmpeg/issues/179 to track and revisit
it later.
Thank you.
>
> thx
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Many things microsoft did are stupid, but not doing something just because
> microsoft did it is even more stupid. If everything ms did were stupid they
> would be bankrupt already.