[FFmpeg-devel] [PATCH 1/3][GSoC] Add mutithread function for dnn_backend_native_layer_conv2d.c

Paul B Mahol onemda at gmail.com
Tue Sep 1 17:46:54 EEST 2020


On 9/1/20, Xu Jun <xujunzz at sjtu.edu.cn> wrote:
> Hi, Mark
>
> ----- Original Message -----
>> From: "Mark Thompson" <sw at jkqxz.net>
>> To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org>
>> Sent: Tuesday, September 1, 2020 4:41:06 AM
>> Subject: Re: [FFmpeg-devel] [PATCH 1/3][GSoC] Add mutithread function for
>> dnn_backend_native_layer_conv2d.c
>
>> On 31/08/2020 18:03, xujunzz at sjtu.edu.cn wrote:
>>> From: Xu Jun <xujunzz at sjtu.edu.cn>
>>>
>>> Use pthread to multithread dnn_execute_layer_conv2d.
>>> Can be tested with command "./ffmpeg_g -i input.png -vf \
>>> format=yuvj420p,dnn_processing=dnn_backend=native:model= \
>>> espcn.model:input=x:output=y -y sr_native.jpg -benchmark"
>>>
>>> before patch: utime=11.238s stime=0.005s rtime=11.248s
>>> after patch:  utime=20.817s stime=0.047s rtime=1.051s
>>
>> Can you explain why it uses almost twice as much total CPU time after the
>> patch?
>> That seems rather more than can be explained away as scheduling overhead.
>>
>> If it's actually doing significantly more then maybe you want to document
>> somewhere that enabling threading will improve latency at the cost of
>> throughput.
>
> I have run some tests and found that utime is strongly correlated with
> whether CPU HyperThreading is enabled.
>
> When I turn off HyperThreading with the command "echo off >
> /sys/devices/system/cpu/smt/control" as root, utime stays stable
> regardless of how many threads I create, and is the same as before the
> patch.
>
> When HyperThreading is on, once the number of threads I create
> approaches or exceeds the number of physical cores my CPU has, utime
> grows with it. When I use as many threads as my CPU has logical
> cores, utime is twice what it was before the patch.
>
> Therefore, I think HyperThreading doubles the number of logical cores
> relative to physical cores, while the computing power is not doubled.
> ffmpeg's utime sums the runtime of all logical cores, so it appears to
> be twice what it was before the patch.
>
> In the next version, I will expose an option for the user to choose how
> many threads to use in the native backend. I'm going to set the default
> thread count to the number of physical cores - 1, to get better
> performance without increasing utime much on platforms that support
> HyperThreading.

The -threads option is already available for filters that use slice threads.

Make sure that your threads do not share the same memory for reading/writing.
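
To illustrate what I mean, here is a rough sketch, not the patch's code and
not FFmpeg API. RowJob, process_rows(), process_image_threaded() and
MAX_THREADS are all made-up names, and the "multiply by two" loop is only a
stand-in for the convolution. Each thread reads the shared input but writes
only to its own band of output rows, so no locking is needed. The row split
is the usual slice arithmetic, assumed here rather than taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16 /* arbitrary limit chosen for this sketch */

/* Hypothetical per-thread job description (not an FFmpeg structure). */
typedef struct RowJob {
    const float *src; /* shared input, only ever read */
    float *dst;       /* shared buffer, but each thread writes only its rows */
    int width;
    int row_start;    /* first row owned by this thread (inclusive) */
    int row_end;      /* one past the last row owned by this thread */
} RowJob;

static void *process_rows(void *arg)
{
    RowJob *job = arg;

    for (int y = job->row_start; y < job->row_end; y++) {
        for (int x = 0; x < job->width; x++) {
            size_t i = (size_t)y * job->width + x;
            job->dst[i] = 2.0f * job->src[i]; /* stand-in for the real work */
        }
    }
    return NULL;
}

/* Returns 0 on success, or the pthread_create() error code. */
static int process_image_threaded(const float *src, float *dst,
                                  int width, int height, int nb_threads)
{
    pthread_t threads[MAX_THREADS];
    RowJob jobs[MAX_THREADS];
    int started = 0, ret = 0;

    assert(nb_threads > 0 && nb_threads <= MAX_THREADS);

    for (int i = 0; i < nb_threads; i++) {
        /* Disjoint, contiguous row ranges that together cover [0, height). */
        jobs[i] = (RowJob){
            .src = src, .dst = dst, .width = width,
            .row_start = height * i / nb_threads,
            .row_end   = height * (i + 1) / nb_threads,
        };
        ret = pthread_create(&threads[i], NULL, process_rows, &jobs[i]);
        if (ret)
            break;
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return ret;
}
```

Build with -pthread. If two threads ever need to write the same output
element (for example an accumulator), give each thread a private buffer and
merge after the join instead of adding a lock in the inner loop.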

>
> As for rtime, setting the number of threads to logical cores - 1 gives
> about a 20%-30% performance improvement over setting it to physical
> cores - 1 in my tests.
>
> - Xu Jun
>
>>
>> - Mark
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".

