[FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch

James Almer jamrial at gmail.com
Wed Jul 30 01:39:18 EEST 2025


On 7/29/2025 5:56 PM, Kacper Michajlow wrote:
> On Tue, 29 Jul 2025 at 22:11, James Almer <jamrial at gmail.com> wrote:
>>
>> On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
>>> Hello,
>>>
>>> It seems there is strong evidence that AI wrote TLS code as part of the
>>> WHIP patch. It goes without saying why this is bad. Further discussion
>>> here:
>>> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
>>>
>>> This patch was pushed without ML review.
>>>
>>> I think this code should be removed before the FFmpeg release. I
>>> include TC in this email for that reason.
>>
>> The UTF8 dashes are not so much an indication of LLM output but one that
>> it was written with an unusual locale, I'd say.
> 
> I disagree. I wouldn't call out AI if there weren't a good
> indication that this is where those hyphens came from. I have tested many
> LLMs to evaluate their usefulness, and this is the kind of thing that
> they love to insert even in code. I would expect any developer (even
> one natively using a different locale) to use - in a .c file; after all,
> this is a common token in the code too.
> 
> Additionally, I now see there is also a ’ (0x2019) a few lines below in
> `to a av_malloc’d PEM string.`, which is also something that LLMs love
> to insert. I can even remove those comments right now and ask one of
> the biggest LLMs to comment the code, and reproduce the same 0x2019
> being inserted.

Alright, I was not aware this was common behavior of LLMs.

> 
> Lastly, a strong indication of an LLM is the dummy comments on every
> operation. LLMs love to explain themselves. Comments in code are very
> useful tools, but you don't have to comment every function call and
> every label. IMHO it adds more noise than information, and SNR is
> important. It's harmless, but look at pkey_to_pem_string() and tell me
> it really is organic to add `// Copy data & NUL-terminate` to a memcpy
> call. Again, I can reproduce this by querying an LLM to do so.

This one I know is common.

> 
> I'm not saying we should revert this code, but a good review would be
> in order to ensure we are not shipping something bad in there.
I however am saying we should revert it in the release/8.0 branch after 
it's made and before the release is tagged. A proper review can happen 
in the master branch without the risk of realizing we shipped dubious 
code in a tarball.

> 
> Note that my intention was not to start some big discussion, just to
> clean the file of unnecessary, similar-looking UTF-8 characters. I'm
> not opposed to AI/LLM use, but their output should be heavily
> sanitized, as they are not reliable on their own.
> 
> - Kacper
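
The characters discussed in this thread (the UTF-8 dashes and the 0x2019
apostrophe) can be found mechanically. What follows is only a minimal sketch
and not part of the patch or of any FFmpeg tooling: the function name
find_non_ascii and the sample string are hypothetical, and the check simply
flags every character above 0x7F rather than a curated list.

```python
def find_non_ascii(text):
    """Return (line, column, char, codepoint) for each non-ASCII character.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, ch, f"U+{ord(ch):04X}"))
    return hits


# Sample input modelled on the comment quoted in this thread.
sample = "/* to a av_malloc\u2019d PEM string. */\nmemcpy(dst, src, len);\n"
for hit in find_non_ascii(sample):
    print(hit)
```

To scan a real source file, read it with open(path, encoding="utf-8") and
pass the contents to the function. Each hit still needs a human look, since
non-ASCII text can be legitimate (for example in author names).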