[FFmpeg-devel] [PATCH WIP 0/9] Refactor DNN

Guo, Yejun yejun.guo at intel.com
Tue Apr 30 17:07:14 EEST 2024



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Chen,
> Wenbin
> Sent: Tuesday, April 30, 2024 10:55 AM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH WIP 0/9] Refactor DNN
> 
> > > On Apr 29, 2024, at 18:29, Guo, Yejun
> > > <yejun.guo-at-intel.com at ffmpeg.org>
> > wrote:
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> > Zhao
> > >> Zhili
> > >> Sent: Sunday, April 28, 2024 6:55 PM
> > >> To: FFmpeg development discussions and patches <ffmpeg-
> > >> devel at ffmpeg.org>
> > >> Subject: Re: [FFmpeg-devel] [PATCH WIP 0/9] Refactor DNN
> > >>
> > >>
> > >>
> > >>> On Apr 28, 2024, at 18:34, Guo, Yejun <yejun.guo-at-
> > >> intel.com at ffmpeg.org> wrote:
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org
> > >>>> <mailto:ffmpeg-devel-bounces at ffmpeg.org>> On Behalf Of Zhao Zhili
> > >>>> Sent: Sunday, April 28, 2024 12:42 AM
> > >>>> To: ffmpeg-devel at ffmpeg.org <mailto:ffmpeg-devel at ffmpeg.org>
> > >>>> Cc: Zhao Zhili <zhilizhao at tencent.com
> > <mailto:zhilizhao at tencent.com>>
> > >>>> Subject: [FFmpeg-devel] [PATCH WIP 0/9] Refactor DNN
> > >>>>
> > >>>> From: Zhao Zhili <zhilizhao at tencent.com>
> > >>>>
> > >>>> During the refactoring, I found some serious issues
> > >>>> which are not resolved by this patchset:
> > >>>>
> > >>>> 1. Tensorflow backend is broken.
> > >>>>
> > >>>> I think it hasn't worked since at least 2021. For example, it
> > >>>> destroys a thread and creates a new thread for each frame, and
> > >>>> it destroys an invalid thread at the first frame:
> > >>>
> > >>> It has worked from the day that code was merged until today. It is
> > >>> by design, keeping the code simple by relying on the fact that
> > >>> pthread_join accepts a parameter that is not a joinable thread.
> > >>>
> > >>> Please share more info if you have encountered a real case where it
> > >>> does not work.
> > >>
> > >> It will abort if ASSERT_LEVEL > 1.
> > >>
> > >> #define ASSERT_PTHREAD_ABORT(func, ret) do {                            \
> > >>    char errbuf[AV_ERROR_MAX_STRING_SIZE] = "";                         \
> > >>    av_log(NULL, AV_LOG_FATAL, AV_STRINGIFY(func)                       \
> > >>           " failed with error: %s\n",                                  \
> > >>           av_make_error_string(errbuf, AV_ERROR_MAX_STRING_SIZE,       \
> > >>                                AVERROR(ret)));                         \
> > >>    abort();                                                            \
> > >> } while (0)
> > >>
> > >> I think the check is there just to prevent calling pthread_join(0,
> > >> &ptr) by accident, so we shouldn’t do that on purpose.
> > >>
> > > Nice catch with configure assert level > 1; will fix, and a patch is
> > > also welcome, thanks.
> >
> > If I read the code correctly, it destroys a thread and creates a new
> > thread for each frame. I think this “async” mode isn’t common in
> > ffmpeg’s design. Creating a new thread for each frame can be heavy on
> > some platforms. We use slice threading to improve parallelism, and a
> > thread with a command queue to improve throughput. In this case, with
> > tensorflow doing the heavy lifting, if it doesn’t support async
> > operation, simple synchronous operation with the tensorflow backend
> > should be fine. The “async” mode is unnecessary and uses more
> > resources than the benefit it provides.
> 
> I think we need to keep async support.
> 1. Some models cannot make full use of resources. This may be caused by
> the tensorflow implementation or by the model design. Async has a benefit
> in this situation.
> 2. Async helps to build a pipeline. You don't need to wait for the
> output. If a "synchronous" filter is followed by another "synchronous"
> filter, it can be the bottleneck of the whole pipeline.
> 
> The benefit in these two situations will be more obvious if the model is
> running on a GPU. (Tensorflow has not added device support yet.)

Yes, the async mode (even with the current vanilla implementation) helps
performance through the overlap of CPU and GPU work. By offloading the dnn
filter to the GPU, the CPU can do other things at the same time.

For the tensorflow backend, to run on GPU, just download and use the GPU
version of the tensorflow C lib; there is no need to set any option.
