[FFmpeg-devel] [PATCH 1/3][GSoC] Add mutithread function for dnn_backend_native_layer_conv2d.c

Xu Jun xujunzz at sjtu.edu.cn
Tue Sep 1 17:35:39 EEST 2020


Hi, Mark

----- Original Message -----
> From: "Mark Thompson" <sw at jkqxz.net>
> To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org>
> Sent: Tuesday, September 1, 2020 4:41:06 AM
> Subject: Re: [FFmpeg-devel] [PATCH 1/3][GSoC] Add mutithread function for dnn_backend_native_layer_conv2d.c

> On 31/08/2020 18:03, xujunzz at sjtu.edu.cn wrote:
>> From: Xu Jun <xujunzz at sjtu.edu.cn>
>> 
>> Use pthread to multithread dnn_execute_layer_conv2d.
>> Can be tested with command "./ffmpeg_g -i input.png -vf \
>> format=yuvj420p,dnn_processing=dnn_backend=native:model= \
>> espcn.model:input=x:output=y -y sr_native.jpg -benchmark"
>> 
>> before patch: utime=11.238s stime=0.005s rtime=11.248s
>> after patch:  utime=20.817s stime=0.047s rtime=1.051s
> 
> Can you explain why it uses almost twice as much total CPU time after the patch?
> That seems rather more than can be explained away as scheduling overhead.
> 
> If it's actually doing significantly more then maybe you want to document
> somewhere that enabling threading will improve latency at the cost of
> throughput.

I ran some tests and found that utime is strongly correlated with whether CPU Hyper-Threading is enabled.

When I turn off Hyper-Threading by running "echo off > /sys/devices/system/cpu/smt/control" as root, utime stays stable regardless of how many threads I create, and is the same as before the patch.
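To confirm which state a given measurement was taken in, the same sysfs file can be read back. This is a minimal sketch, assuming a Linux kernel that exposes the SMT control file at the path used above; read_smt_state is a hypothetical helper for illustration, not part of the patch or of FFmpeg.

```c
#include <stdio.h>
#include <string.h>

/* sysfs location of the SMT control file on Linux (same path as above). */
#define SMT_CONTROL_PATH "/sys/devices/system/cpu/smt/control"

/*
 * Hypothetical helper: read the SMT state string (for example "on" or
 * "off") from `path` into `buf`.
 * Returns 0 on success, -1 if the arguments are invalid or the file
 * cannot be opened or read.
 */
int read_smt_state(const char *path, char *buf, size_t size)
{
    FILE *f;

    if (!path || !buf || size == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(buf, (int)size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

Calling read_smt_state(SMT_CONTROL_PATH, buf, sizeof(buf)) before each benchmark run and logging the result next to the timings makes it harder to mix up numbers taken with Hyper-Threading on and off.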

When Hyper-Threading is on, utime starts to grow once the number of threads I create approaches or exceeds the number of physical cores on my CPU. When I use as many threads as there are logical cores, utime is about twice what it was before the patch.

Therefore, I think Hyper-Threading doubles the number of logical cores relative to physical cores without doubling the computing power. Since the utime reported by ffmpeg sums the run time across all logical cores, it appears to be twice the value from before the patch.
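The effect can be reproduced outside ffmpeg. The following is a rough sketch, assuming a POSIX system with pthreads and getrusage; measure_threads and burn_cpu are hypothetical names, and the busy loop only stands in for the convolution work. It splits a fixed amount of work across a given number of threads and reports both wall-clock time and process-wide user CPU time, which is accumulated over all threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

#define MAX_THREADS 64

/* CPU-bound loop; volatile keeps the compiler from removing it. */
static void *burn_cpu(void *arg)
{
    volatile unsigned long acc = 0;
    unsigned long n = *(const unsigned long *)arg;

    for (unsigned long i = 0; i < n; i++)
        acc += i;
    (void)acc;
    return NULL;
}

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* User CPU time of the whole process, summed over all its threads. */
static double user_seconds(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/*
 * Hypothetical helper: split `total_iters` loop iterations evenly across
 * `nb_threads` threads, then store the elapsed wall-clock time in *rtime
 * and the consumed user CPU time in *utime.
 * Returns 0 on success, -1 on invalid arguments or thread creation failure.
 */
int measure_threads(int nb_threads, unsigned long total_iters,
                    double *rtime, double *utime)
{
    pthread_t threads[MAX_THREADS];
    unsigned long per_thread;
    double r0, u0;
    int started = 0, ret = 0;

    if (nb_threads < 1 || nb_threads > MAX_THREADS || !rtime || !utime)
        return -1;

    per_thread = total_iters / (unsigned long)nb_threads;
    r0 = wall_seconds();
    u0 = user_seconds();

    for (int i = 0; i < nb_threads; i++) {
        if (pthread_create(&threads[i], NULL, burn_cpu, &per_thread) != 0) {
            ret = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    *rtime = wall_seconds() - r0;
    *utime = user_seconds() - u0;
    return ret;
}
```

Running it with thread counts from 1 up to the number of logical cores, once with Hyper-Threading on and once with it off, should show whether utime stays flat or grows on a given machine. The exact numbers will depend on the CPU and the workload.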

In the next version, I will add an option that lets the user choose how many threads the native backend uses. I am going to set the default number of threads to the number of physical cores - 1, to get better performance without increasing utime much on platforms that support Hyper-Threading.
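As an illustration of that default only (not the code that will be in the next version), here is a minimal sketch. It assumes the number of hardware threads per physical core is known to the caller (2 with Hyper-Threading on, 1 otherwise), since there is no portable call for the physical core count. default_thread_count and logical_core_count are hypothetical names.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of logical cores currently online.
 * _SC_NPROCESSORS_ONLN is a common extension (glibc, macOS, BSDs),
 * not part of POSIX itself. Falls back to 1 if the query fails.
 */
long logical_core_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}

/*
 * Hypothetical helper: default worker thread count, computed as
 * "physical cores - 1" and never less than 1.
 * `threads_per_core` is an assumption supplied by the caller.
 */
int default_thread_count(long logical_cores, int threads_per_core)
{
    long physical;

    if (logical_cores < 1 || threads_per_core < 1)
        return 1;

    physical = logical_cores / threads_per_core;
    if (physical < 2)
        return 1;

    return (int)(physical - 1);
}
```

For example, default_thread_count(logical_core_count(), 2) on a machine with 8 logical cores and Hyper-Threading enabled yields 3 threads.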

As for rtime, in my tests setting the number of threads to logical cores - 1 gives about a 20%-30% performance improvement over setting it to physical cores - 1.

- Xu Jun

> 
> - Mark


