[FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
Timo Rothenpieler
timo at rothenpieler.org
Wed Jul 30 01:59:48 EEST 2025
On 7/30/2025 12:39 AM, James Almer wrote:
> On 7/29/2025 5:56 PM, Kacper Michajlow wrote:
>> On Tue, 29 Jul 2025 at 22:11, James Almer <jamrial at gmail.com> wrote:
>>>
>>> On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
>>>> Hello,
>>>>
>>>> It seems there is strong evidence that AI wrote TLS code as part of the
>>>> WHIP patch. It goes without saying why this is bad. Further discussion
>>>> here:
>>>> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
>>>>
>>>> This patch was pushed without ML review.
>>>>
>>>> I think this code should be removed before the FFmpeg release. I
>>>> include TC in this email for that reason.
>>>
>>> The UTF8 dashes are not so much an indication of LLM output but one that
>>> it was written with an unusual locale, I'd say.
>>
>> I disagree. I wouldn't call out AI if there weren't a good
>> indication that this is where those hyphens came from. I tested many
>> LLMs to evaluate their usefulness, and this is the kind of thing that
>> they love to insert, even in code. I would expect any developer (even
>> one natively using a different locale) to use - in the .c file; after
>> all, this is a common token in the code too.
>>
>> Additionally, I now see there is also an ’ (0x2019) a few lines below in
>> `to a av_malloc’d PEM string.`, which is also something that LLMs love
>> to insert. I can even remove those comments right now and ask one of
>> the biggest LLMs to comment on the code, and reproduce the same 0x2019
>> being inserted.
>
> Alright, I was not aware this was common behavior of LLMs.
>
>>
>> Lastly, a strong indication of an LLM is dummy comments for every
>> operation. LLMs love to explain themselves. Comments in code are very
>> useful tools, but you don't have to comment every function call and
>> every label. IMHO it adds more noise than information; SNR is
>> important. It's harmless, but look at pkey_to_pem_string() and tell me
>> it really is organic to add `// Copy data & NUL-terminate` to a memcpy
>> call. Again, I can reproduce this by querying an LLM to do so.
>
> This one I know is common.
>
>>
>> I'm not saying we should revert this code, but a good review would be
>> in order to ensure we are not shipping something bad in there.
> I, however, am saying we should revert it in the release/8.0 branch after
> it's made and before the release is tagged. A proper review can happen
> in the master branch without the risk of realizing we shipped dubious
> code in a tarball.
It has been heavily modified since, so I'm not sure reverting it is even
realistic.
There have been at least two major patch-sets on top of it by now.
I cleaned up tls_openssl, and then ePirat did as well (potentially
introducing a bug with the index exhaustion).