[FFmpeg-devel] [PATCH v1] lavfi: add nlmeans CUDA filter

Timo Rothenpieler timo at rothenpieler.org
Sat Aug 14 16:02:51 EEST 2021


On 14.08.2021 07:49, Dylan Fernando wrote:
> On Sat, Aug 14, 2021 at 9:11 AM Timo Rothenpieler <timo at rothenpieler.org>
> wrote:
> 
>> On 13.08.2021 10:42, Dylan Fernando wrote:
>>> Any update on this?
>>>
>>> Kind Regards,
>>> Dylan
>>
>> Also, are you sure that exp() function is correct?
>>
>> The CUDA-Function exp() is defined as "double exp(double x)" and
>> calculates the base e exponential.
>>
>> While __nvvm_ex2_approx_f reads to me like it does so for floats, and
>> for base 2. For which the CUDA equivalent would be "float exp2f(float)".
>>
> 
> I wasn't sure about the exp() function. Is there a function like
> __nvvm_exp_approx_d? I can't seem to find a function for this.

Looking into it some more, that's simply because there is no fast
approximate exp instruction other than ex2.
If I use __expf() with nvcc, it generates the following PTX:

	ld.param.f32 	%f1, [param];
	mul.f32 	%f2, %f1, 0f3FB8AA3B;
	ex2.approx.f32 	%f3, %f2;

So it multiplies the input value by a constant, and then passes the
product to ex2.approx.
Since exp(x) = 2^(x * log2(e)), that constant must be log2(e);
0f3FB8AA3B is its float bit pattern, roughly 1.442695. I write it as
log2(exp(1)) for lack of a predefined constant.

So the implementation of __expf() would look like this:

> static inline __device__ float __expf(float a) { return __nvvm_ex2_approx_f(a * (float)__builtin_log2(__builtin_exp(1))); }

With llvm, this now generates the exact same code:

	ld.param.f32 	%f1, [param];
	mul.f32 	%f2, %f1, 0f3FB8AA3B;
	ex2.approx.f32 	%f3, %f2;


I will push that function soon, so you can just use __expf() in your
code, assuming you want exp to base e.
