[FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

Tue Jan 12 23:13:01 EET 2021

That's great! Any way for me to pull that branch or otherwise 
contribute?

I have been using FFmpeg for a few years now, so I'm hoping to be able 
to give back.

On Tue, Jan 12, 2021 at 5:55 am, Lynne <dev at lynne.ee> wrote:
> Jan 11, 2021, 23:27 by felix.leclair123 at hotmail.com 
> <mailto:felix.leclair123 at hotmail.com>:
> 
>>  Hi guys and gals, first post on this mailing list, apologies for 
>> any formatting/stylistic snafus
>> 
>>  TL;DR: we currently have tone mapping filters (typically used to map 
>> content from a 10-bit HDR source to an 8-bit SDR output) that run on 
>> the CPU with zscale (from zimg), or in hardware via VAAPI or OpenCL. 
>> Having a version implemented in CUDA would round out the main 
>> hwaccel types.
>> 
>>  Context:
>>   I'm a computer engineering student up in Canada with an interest 
>> in high-efficiency distributed processing. As a personal project I'm 
>> trying to build a cluster of Nvidia Jetson Nanos to be able to 
>> handle a few dozen streams (mix of SD, HD, FHD, UHD, 4K HDR) at once 
>> while drawing south of 100W at peak. These little devices can do 
>> anywhere from 1 to 9 streams of content at a time in hardware, 
>> depending on resolution/framerate, in any mix of HEVC or H.264, so 3 
>> of them should get me most of the way to where I want to go (this 
>> would be a 30W package capable of ~12 2160p30 10-bit -> 1080p30 8-bit 
>> streams).
>> 
>>  The issue is that four little arm64 cores are just not going to be 
>> able to tonemap using zscale in real time, even with the encoder and 
>> decoders sharing memory with the CPU (so no PCIe memcpy penalty). 
>> On the other hand, the built-in GPU and the relative simplicity of 
>> most tone mapping algorithms (say, Hable) should make quick work of 
>> this. Unfortunately (or fortunately for me to learn with?) there 
>> isn't a CUDA version of the filter.
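
For reference, the Hable curve mentioned above is only a handful of
multiply-adds per sample, which is why it should map well onto a GPU.
Below is a minimal sketch in plain C using the commonly published
Uncharted 2 constants. The function names are made up for illustration,
and `peak` is assumed to be the signal peak in the same linear units as
the input sample. In a CUDA port the same arithmetic would presumably
sit in a __device__ function.

```c
/* Hable (Uncharted 2) filmic curve with the commonly published constants. */
static float hable_curve(float x)
{
    const float a = 0.15f, b = 0.50f, c = 0.10f;
    const float d = 0.20f, e = 0.02f, f = 0.30f;

    return (x * (x * a + b * c) + d * e) /
           (x * (x * a + b) + d * f) - e / f;
}

/*
 * Hypothetical helper: map a linear-light sample so that `peak` lands
 * on 1.0. Assumes sig >= 0 and peak > 0.
 */
static float hable_tonemap(float sig, float peak)
{
    return hable_curve(sig) / hable_curve(peak);
}
```

hable_tonemap() returns approximately 0 for black and 1 at the peak, so
the result can go straight into the usual 8-bit quantization step.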
>> 
>>  Question/guidance:
>>  I've read through the doc on how to write filters, and looked at 
>> the other CUDA filters currently in the source, and have a general 
>> idea of where I'm going, but I haven't been able to fully nail down 
>> how to access frames from hwupload_cuda that are passed to 
>> vf_tonemap_cuda.c, which in turn passes each frame to 
>> vf_tonemap_cuda.cu for processing. I have a repo with everything 
>> I've been pulling together for my project, but the piece of interest 
>> is under */cuda_filter/ in the source tree. 
>> <https://github.com/Camofelix/Jetson_ffmpeg_trancode_cluster/>
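
On the frame-access part: as far as I understand it, frames coming out
of hwupload_cuda have the pixel format AV_PIX_FMT_CUDA, with data[]
holding device pointers and linesize[] holding the pitch in bytes, laid
out the same way as a system-memory frame. The existing CUDA filters in
the tree are the real reference for the plumbing. The sketch below only
shows the pitched addressing a kernel would do, written as host C so it
can be run anywhere. It assumes a P010-style plane (10-bit samples in
the high bits of 16-bit words) and merely narrows to 8 bits, without
any tone mapping. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Process one sample at (x, y). In a CUDA kernel, x and y would be
 * derived from blockIdx/blockDim/threadIdx instead of loop counters.
 * Pitches are in bytes, like AVFrame.linesize[].
 */
static void narrow_sample(uint8_t *dst, ptrdiff_t dst_pitch,
                          const uint8_t *src, ptrdiff_t src_pitch,
                          int x, int y)
{
    const uint16_t *src_row = (const uint16_t *)(src + y * src_pitch);
    uint8_t *dst_row = dst + y * dst_pitch;

    /* Keep the top 8 bits of the 16-bit container. */
    dst_row[x] = (uint8_t)(src_row[x] >> 8);
}

/* Host-side stand-in for the kernel launch grid. */
static void narrow_plane(uint8_t *dst, ptrdiff_t dst_pitch,
                         const uint8_t *src, ptrdiff_t src_pitch,
                         int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            narrow_sample(dst, dst_pitch, src, src_pitch, x, y);
}
```

The point is that rows are stepped by the pitch rather than by
width * bytes-per-sample, since the allocator may pad each row.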
>> 
>>  Would anyone mind helping me out with how to architect this?
>> 
> 
> The tonemap filter is just a (very old by now) copy of libplacebo's 
> tonemapping. No one has bothered to keep it in sync.
> I'm working on a libplacebo wrapper currently, so once that's merged 
> there will be up-to-date hardware tonemapping.