[FFmpeg-devel] [PATCH v6 14/14] vvcdec: add full vvc decoder

Sun Dec 10 14:54:06 EET 2023

On Sat, Dec 9, 2023 at 1:13 PM Andreas Rheinhardt <
andreas.rheinhardt at outlook.com> wrote:

> Nuo Mi:
> > Hi Andreas,
> > thank you for the review.
> > On Fri, Dec 8, 2023 at 8:17 PM Andreas Rheinhardt <
> > andreas.rheinhardt at outlook.com> wrote:
> >
> >>
> >>> +
> >>> +static int min_pu_arrays_init(VVCFrameContext *fc, const int
> >> pic_size_in_min_pu)
> >>> +{
> >>> +    if (fc->tab.pic_size_in_min_pu != pic_size_in_min_pu) {
> >>> +        min_pu_arrays_free(fc);
> >>> +        fc->tab.msf  = av_mallocz(pic_size_in_min_pu);
> >>> +        fc->tab.iaf  = av_mallocz(pic_size_in_min_pu);
> >>> +        fc->tab.mmi  = av_mallocz(pic_size_in_min_pu);
> >>> +        fc->tab.mvf  = av_mallocz(pic_size_in_min_pu *
> >> sizeof(*fc->tab.mvf));
> >>
> >> Do these have to be separate allocations? If there were allocated
> >> jointly, one memset below would suffice.
> >>
> > They are separate flags, if we combine them. We can't use memset to set
> > flags for a block.
> >
>
> I disagree: You would still be able to use different pointers for
> different parts of the large allocated block, it is just that you also
> save some unnecessary allocations (and frees and errors checks for the
> allocations) and also gain the ability to memset them via one memset
> call in case one wants to set them to the same value.
>
Good idea. done

>
> >>
> >>> +
> >>> +static int init_slice_context(SliceContext *sc, VVCFrameContext *fc,
> >> const H2645NAL *nal, const CodedBitstreamUnit *unit)
> >>> +{
> >>> +    const VVCSH *sh             = &sc->sh;
> >>> +    const H266RawSlice *slice   = (const H266RawSlice *)unit->content;
> >>
> >> Please no pointless casts. Also, why is there unnecessary whitespace in
> >> front of '='?
> >>
> > Fix here and serval other places
> > The whitespace will make all = in a col.
> >
>
> But there is nothing that needs that much whitespace.
>
> >>> +
> >>> +static av_cold int frame_context_init(VVCFrameContext *fc,
> >> AVCodecContext *avctx)
> >>> +{
> >>> +
> >>> +    fc->avctx = av_memdup(avctx, sizeof(*avctx));
> >>
> >> When I read this, I presumed you are using multiple AVCodecContexts to
> >> store the ever changing state of the AVCodecContext fields similarly to
> >> update_context_from_thread() in pthread_frame.c. But it seems you don't.
> >> These contexts are only used as a) logcontexts (where the actual
> >> user-facing AVCodecContext should be used, so that the user can make
> >> sense of the logmessages!), b) in ff_thread_get_buffer() and c) in
> >> export_frame_params() where only some basic fields
> >> (dimension-related+pix_fmt) is set. Presumably c) is done for b).
> >>
> > I remember if i did not use a local AVCodecContext  it would trigger some
> > assert when resolution changed.
> >
>
> Can you be more specific about what assert has been triggered? And have
> you set the AVFrame fields directly before ff_thread_get_buffer()?
>
hmm, this has not happened now.
Let us remove the memdup

>
> >>
> >> But the user is allowed to change the provided callbacks in the master
> >> context at any time. E.g. the call to ff_thread_get_buffer() in
> >> vvc_refs.c currently uses the VVCFrameContext and therefore uses the
> >> get_buffer2 callback in place now (during av_memdup()). This is wrong.
> >>
> > This will not happen. av_memdup only happens in vvc_decode_init.
> > Nobody will call ff_thread_get_buffer at this time
> >
>
> You missed the point: If the user changes the get_buffer2 callback after
> init, the new callback will not be used at all.
>
fixed.

>
> >>
> >> I think you can just remove VVCFrameContext.avctx and use the
> >> user-facing AVCodecContext if you set the AVFrame properties that are
> >> normally derived from the AVCodecContext directly on the AVFrame before
> >> ff_thread_get_buffer().
> >
> > Could you explain more about how to create a user-facing  AVCodecContext?
> >
>
> You do not create a user-facing AVCodecContext, the user does (and calls
> avcodec_send_packet()/avcodec_receive_frame() with it).

> >>
> >>> +
> >>> +static int decode_nal_unit(VVCContext *s, VVCFrameContext *fc, const
> >> H2645NAL *nal, const CodedBitstreamUnit *unit)
> >>> +{
> >>> +    int  ret;
> >>> +
> >>> +    s->temporal_id   = nal->temporal_id;
> >>> +
> >>> +    switch (unit->type) {
> >>> +    case VVC_VPS_NUT:
> >>> +    case VVC_SPS_NUT:
> >>> +    case VVC_PPS_NUT:
> >>> +        /* vps, sps, sps cached by s->cbc */
> >>> +        break;
> >>> +    case VVC_TRAIL_NUT:
> >>> +    case VVC_STSA_NUT:
> >>> +    case VVC_RADL_NUT:
> >>> +    case VVC_RASL_NUT:
> >>> +    case VVC_IDR_W_RADL:
> >>> +    case VVC_IDR_N_LP:
> >>> +    case VVC_CRA_NUT:
> >>> +    case VVC_GDR_NUT:
> >>> +        ret = decode_slice(s, fc, nal, unit);
> >>> +        if (ret < 0)
> >>> +            goto fail;
> >>> +        break;
> >>> +    case VVC_PREFIX_APS_NUT:
> >>> +    case VVC_SUFFIX_APS_NUT:
> >>> +        ret = ff_vvc_decode_aps(&s->ps, unit);
> >>> +        if (ret < 0)
> >>> +            goto fail;
> >>> +        break;
> >>> +    default:
> >>> +        av_log(s->avctx, AV_LOG_INFO,
> >>> +               "Skipping NAL unit %d\n", unit->type);
> >>
> >> This will probably be very noisy (and warn for every SEI). I don't think
> >> it is even needed, as h2645_parse.c already contains debug log messages
> >> to display the unit type.
> >>
> > It's copied from hevcdec. It means we did not handle the nal diffrent
> than
> > h2645_parser.c messages
> >
>
> 1. The message is unnecessary, because a user who wants to know which
> NAL units have been handled or not can get the info about which units
> are present from h2645_parse.c and then look up in this list whether
> this type is processed.
> 2. hevcdec.c does "handle" quite a lot more NAL units; e.g. it actually
> handles SEI messages and it ignores e.g. Access unit delimiters as well
> as HEVC_NAL_UNSPEC62. Whereas you do not.
>
removed

>
> > A fail that is only "return ret" is pointless (not only here).
> >>
> > At someday if we need to add some cleanup code. we do not need to change
> > all returns to goto.
> >
>
> IMO a goto fail should be added if and when it is actually beneficial.
>
fixed

>
> >>> +
> >>> +    if (output) {
> >>> +        while (s->nb_delayed) {
> >>> +            wait_delayed_frame(s, output, &got_output);
> >>> +            if (got_output) {
> >>> +                av_frame_unref(output);
> >>> +            }
> >>> +        }
> >>> +        av_frame_free(&output);
> >>> +    }
> >>>  }
> >>>
> >>>  static av_cold int vvc_decode_free(AVCodecContext *avctx)
> >>>  {
> >>> +    VVCContext *s = avctx->priv_data;
> >>> +    int i;
> >>> +
> >>> +    ff_cbs_fragment_free(&s->current_frame);
> >>
> >> Is it sure that the fragment is not in use (given that other threads may
> >> be running now before vvc_decode_flush())?
> >>
> > Do you mean the executor threads? If they want to use some data, they
> will
> > take their own hip.
> > see ff_refstruct_replace(&sps->r, rsps);
> >
>
> "hip"?
>
:)

>
> I have now noticed that the SliceContexts contain a reference to the
> packet's data via VVCSH->H266RawSliceHeader, which in reality points to
> a H266RawSlice which contains a reference to the actual data, so this
> point is moot. But using H266RawSliceHeader* for a H266RawSlice* is not
> nice.
>
Ok, let me take a  reference to H266RawSlice.

>
> >>
> >>> +
> >>> +    s->nb_fcs = (avctx->flags & AV_CODEC_FLAG_LOW_DELAY) ? 1 :
> >> FFMIN(av_cpu_count(), VVC_MAX_FRMAE_DELAY);
> >>
> >> This may evaluate av_cpu_count() multiple times. Furthermore I don't
> >> know why this define is used here at all: With frame threading, the
> >> number of frame threads is not limited by the delay/number of reordering
> >> frames at all (we even have frame-threading for decoders without
> >> frame-reordering at all).
> >>
> >  vvc_decode_frame only allows 1 frame in 1 frame out. We can remove the
> > delay if we switch to FFCodec->receive_frame,
> >
>
> I do not get how this is supposed to address my point.
>
> >>
> >> But worst of this is that you do not check avctx->thread_count at all.
> >>
> > we do not use avctx->thread_count.  we use the executor to manage
> threads.
> >
>
> This is complete nonsense: It is the user who specifies how many threads
> to use, regardless of which mechanism is used to manage threads.
>
fixed

>
> >>
> >>> +    s->fcs = av_calloc(s->nb_fcs, sizeof(*s->fcs));
> >>> +    if (!s->fcs)
> >>> +        goto fail;
> >>> +
> >>> +    for (int i = 0; i < s->nb_fcs; i++) {
> >>> +        VVCFrameContext *fc = s->fcs + i;
> >>> +        ret = frame_context_init(fc, avctx);
> >>> +        if (ret < 0)
> >>> +            goto fail;
> >>> +    }
> >>> +
> >>> +    s->executor = ff_vvc_executor_alloc(s, s->nb_fcs);
> >>> +    if (!s->executor)
> >>> +        goto fail;
> >>> +
> >>> +    s->eos = 1;
> >>> +    GDR_SET_RECOVERED(s);
> >>> +    memset(&ff_vvc_default_scale_m, 16,
> sizeof(ff_vvc_default_scale_m));
> >>
> >> This needs to be done once (i.e. protected by an AVOnce) and not every
> >> time a decoder is set up. Otherwise there might be data races.
> >>
> > It's not read and set, it will no data races:), I can change it to
> AVOnce
> > .
>
> This is wrong: It is set here and presumably it will be read somewhere,
> so there absolutely can be data races.
> If you believe that there is no data race because every memset sets it
> to the same value, then you should be aware that the C specification
> disagrees with you (all references are to the C11 spec):
>
> a) 5.1.2.4 25: "The execution of a program contains a data race if it
> contains two conflicting actions in different threads, at least one of
> which is not atomic, and neither happens before the other. Any such data
> race results in undefined behavior."
> b) 5.1.2.4 4: "Two expression evaluations conflict if one of them
> modifies a memory location and the other one reads or modifies the same
> memory location."
> c) Note 2 in 3.1 (to the definition of "access"):
> "‘‘Modify’’ includes the case where the new value being stored is the
> same as the previous value."
>
> With the current code data races will happen if a) two different decoder
> instances are initialized without synchronisation (given that lavc does
> not serialize initialization of codecs (except in rare cases based upon
> a flag which this decoder does not set), this synchronization would have
> to be performed by the user, but we do not require our users to do this)
> or b) a decoder is initialized while another decoder runs and reads from
> ff_vvc_default_scale_m:
>
> Because the accesses performed by the initing thread are always
> modifications according to c), the accesses by the different threads
> conflict by definition b). memset() is not required to perform atomic
> modifications (and according to the standard atomic modifications can
> only happen with atomic objects, which ff_vvc_default_scale_m is not)
> and by our assumption there is no synchronisation between these actions,
> so it is a data race according to a). And data races are undefined
> behaviour.
>
> This clause allows compilers to optimize lots of code as if the program
> were single-threaded (because concurrent accesses were UB, so presumably
> they don't happen). In particular, speculative writes are legal (and
> happen sometimes, but probably not in memset). In fact, it would be
> legal for memset to always zero the memory it is supposed to set and
> then overwrite it with the final value.
>

Thanks for the explanation. fixed

>
> - Andreas
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>