[FFmpeg-devel] [PATCH] Whisper audio filter

Vittorio Palmisano vpalmisano at gmail.com
Fri Jul 11 11:41:04 EEST 2025


> > +
> > +    memcpy(wctx->audio_buffer, wctx->audio_buffer + end_pos,
> > +           end_pos * sizeof(float));
>
> sizeof(*wctx->audio_buffer) is more robust than float

But end_pos is not necessarily equal to the audio_buffer size; it
could be lower.
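
For context, the sizeof suggestion above only concerns the per-element size; the element count remains whatever the caller passes, whether that is end_pos or something smaller. A minimal sketch of that idea, assuming a hypothetical helper drop_consumed() (not part of the patch) that discards the already-processed samples and moves the rest to the front of a float buffer:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only, not taken from the patch.
 * Drops the first `consumed` samples from `buf`, which currently holds
 * `fill` valid samples, and returns the new fill level.
 */
static size_t drop_consumed(float *buf, size_t fill, size_t consumed)
{
    size_t remaining;

    if (consumed > fill)
        consumed = fill;
    remaining = fill - consumed;

    /* sizeof(*buf) follows the element type if it ever changes.
     * memmove is used because the source and destination ranges
     * overlap whenever remaining > consumed. */
    memmove(buf, buf + consumed, remaining * sizeof(*buf));
    return remaining;
}
```

The count passed to memmove here is the number of samples left after the consumed part, which is an assumption about the intended behaviour and may differ from what the filter needs.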

>
> not sure how others think of this, but i would ignore the 80 char limit and format this like:
>
> static const AVOption whisper_options[] = {
>     { "model"   , "Path to the whisper.cpp model file"                 , OFFSET(model_path), AV_OPT_TYPE_STRING,.flags = FLAGS },
>     { "language", "Language for transcription ('auto' for auto-detect)", OFFSET(language)  , AV_OPT_TYPE_STRING, {.str = "auto"},             .flags = FLAGS },

I've used `indent -i4 -kr -nut` to format the code.

>
> Also it seems, this is a lot slower than whisper-cli
>
> time whisper-cli  matrix.wav -m ~/whisper.cpp/models/ggml-base.en.bin  --output-srt
> real    0m16,283s
> user    1m3,644s
> sys     0m0,581s
>
>
> time ./ffmpeg -v 99 -i matrix.wav -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=/home/michael/whisper.cpp/models/ggml-base.en.bin:language=en:queue=3000:destination=output.srt:format=srt" -f null - 2> /tmp/log
> real    1m30,827s
> user    6m0,590s
> sys     0m0,756s
>

Tested with: https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/kt.mp4
(and you need to increase the queue param to obtain a fair
comparison):

ffmpeg -loglevel info -i ~/Videos/kt.mp4 -vn -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=../whisper.cpp/models/ggml-medium.bin:language=en:queue=60000:destination=/tmp/output.srt:format=srt" -f null -
real    0m7.998s
user    0m7.552s
sys     0m0.776s

whisper-cli ~/Videos/kt.mp4 -m ../whisper.cpp/models/ggml-medium.bin --output-srt
real    0m8.067s
user    0m8.282s
sys     0m0.887s

