[FFmpeg-devel] [PATCH] Whisper audio filter
Vittorio Palmisano
vpalmisano at gmail.com
Fri Jul 11 11:41:04 EEST 2025
> > +
> > + memcpy(wctx->audio_buffer, wctx->audio_buffer + end_pos,
> > + end_pos * sizeof(float));
>
> sizeof(*wctx->audio_buffer) is more robust than sizeof(float)
But end_pos is not necessarily equal to the audio_buffer size; it
could be smaller.
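For reference, here is a minimal sketch of moving the unprocessed tail of a sample buffer to its front. It assumes the buffer currently holds `fill` samples and that the first `consumed` of them have already been transcribed. `drop_consumed`, `fill` and `consumed` are hypothetical names and do not come from the patch. The sketch copies the `fill - consumed` remaining samples and uses sizeof(*buf), so the element size follows the buffer's type. It calls memmove because the source and destination overlap whenever more than half of the buffer remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: drop the first `consumed` samples from `buf`,
 * which currently holds `fill` samples, and return the new fill level.
 */
static size_t drop_consumed(float *buf, size_t fill, size_t consumed)
{
    size_t remaining;

    assert(consumed <= fill);
    remaining = fill - consumed;
    /* The regions overlap when remaining > consumed, so memcpy would be
     * undefined behaviour there; memmove handles overlap correctly. */
    memmove(buf, buf + consumed, remaining * sizeof(*buf));
    return remaining;
}
```

Usage: with `float buf[6] = {1, 2, 3, 4, 5, 6}`, `drop_consumed(buf, 6, 2)` returns 4 and leaves `3, 4, 5, 6` at the front of the buffer.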
>
> Not sure what others think of this, but I would ignore the 80-character limit and format this like:
>
> static const AVOption whisper_options[] = {
> { "model" , "Path to the whisper.cpp model file" , OFFSET(model_path), AV_OPT_TYPE_STRING,.flags = FLAGS },
> { "language", "Language for transcription ('auto' for auto-detect)", OFFSET(language) , AV_OPT_TYPE_STRING, {.str = "auto"}, .flags = FLAGS },
I've used `indent -i4 -kr -nut` to format the code.
>
> Also, it seems this is a lot slower than whisper-cli.
>
> time whisper-cli matrix.wav -m ~/whisper.cpp/models/ggml-base.en.bin --output-srt
> real 0m16,283s
> user 1m3,644s
> sys 0m0,581s
>
>
> time ./ffmpeg -v 99 -i matrix.wav -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=/home/michael/whisper.cpp/models/ggml-base.en.bin:language=en:queue=3000:destination=output.srt:format=srt" -f null - 2> /tmp/log
> real 1m30,827s
> user 6m0,590s
> sys 0m0,756s
>
Tested with https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/kt.mp4
(note that you need to increase the queue parameter to obtain a fair
comparison):
ffmpeg -loglevel info -i ~/Videos/kt.mp4 -vn \
  -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=../whisper.cpp/models/ggml-medium.bin:language=en:queue=60000:destination=/tmp/output.srt:format=srt" \
  -f null -
real 0m7.998s
user 0m7.552s
sys 0m0.776s
whisper-cli ~/Videos/kt.mp4 -m ../whisper.cpp/models/ggml-medium.bin --output-srt
real 0m8.067s
user 0m8.282s
sys 0m0.887s
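One plausible reason why a small queue slows the filter down can be sketched as a back-of-the-envelope model. It rests on three assumptions. First, `queue` is a duration in milliseconds. Second, each queued chunk is transcribed by a separate whisper call. Third, Whisper's encoder always works on whole 30-second windows, so a 3-second chunk costs roughly as much encoder time as a 30-second one. The function names and the 10-minute input are hypothetical; this is a model, not a measurement.

```c
#include <assert.h>

/* Assumed length of one Whisper encoder window, in milliseconds. */
#define WINDOW_MS 30000L

/* Number of separate whisper calls when total_ms of audio is flushed
 * every queue_ms (the last chunk may be shorter). */
static long chunk_count(long total_ms, long queue_ms)
{
    assert(total_ms >= 0 && queue_ms > 0);
    return (total_ms + queue_ms - 1) / queue_ms;
}

/* Encoder windows for one chunk, rounded up to whole windows. */
static long windows_for(long chunk_ms)
{
    return (chunk_ms + WINDOW_MS - 1) / WINDOW_MS;
}

/* Total encoder windows over the whole input, assuming every chunk
 * is padded up to a whole number of WINDOW_MS windows. */
static long encoder_windows(long total_ms, long queue_ms)
{
    long full, rest;

    assert(total_ms >= 0 && queue_ms > 0);
    full = total_ms / queue_ms;
    rest = total_ms % queue_ms;
    return full * windows_for(queue_ms) + windows_for(rest);
}
```

Under these assumptions, 10 minutes of audio (600000 ms) needs 200 encoder windows at queue=3000. At queue=60000 it needs 20, the same as transcribing the whole file in a single call.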