[FFmpeg-user] Programmatically detecting 'busiest' parts of a video

inhahe inhahe at gmail.com
Tue Aug 6 18:19:44 EEST 2024


On Tue, Aug 6, 2024 at 8:41 AM Rob Hallam <ffmpeg at roberthallam.com> wrote:

> Hi,
>
> I'd like to programmatically detect the 'busiest' parts of a video- ie
> the most visually active areas. I am leaving audio aside for the
> purposes of considering this.
>
> I figured it might be possible by looking at one / more of:
>
>  - the bitrate for VBR videos -- a higher bitrate for a given segment
> will tend to be associated with more activity (other things being
> equal)
>  - frame differences -- count pixels/blocks which differ, average over
> a time segment
>  - optical flow? (higher flow values = more activity) I'm not too
> familiar with this
>
> I'm not sure if there are better approaches, or if here is even the
> right place to ask but figured the expertise of folks on the list
> would be a good place to start.
>
> Are there ways of doing this efficiently with ffmpeg or libav? I would
> trade precision for speed!
>
> Thanks for your time,
> Rob

The first thing that came to mind when I read your question was your first
suggestion: observing the bitrate of the VBR-encoded video over time. That
seems to me the simplest approach that would still be reasonably accurate.
(I know nothing about this field of programming, though.)

The second thing I thought of was your second suggestion, frame
differences, but I think it's inadequate on its own because "superficial"
changes can make every pixel, or a large proportion of pixels, differ from
one frame to the next. For example, if the camera moves by a tenth of a
degree, or an object moves half an inch, all of the affected pixels can
come out different even though very little actually happened. I wouldn't
go this way.

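
To illustrate the point about superficial changes, here is a toy sketch in
Python. It has nothing to do with ffmpeg itself: the "frame" is a made-up
2D list of pixel values and the helper names are hypothetical. It only
shows that shifting a textured frame sideways by a single pixel makes a
naive per-pixel comparison report that everything changed.

```python
def make_frame(width, height):
    # Hypothetical synthetic "textured" frame: neighbouring pixels always differ.
    return [[(x * 7 + y * 13) % 256 for x in range(width)] for y in range(height)]


def shift_right(frame, pixels=1):
    # Simulate a tiny camera pan by rotating every row to the right.
    return [row[-pixels:] + row[:-pixels] for row in frame]


def changed_fraction(a, b, threshold=0):
    # Fraction of pixels whose absolute difference exceeds the threshold.
    total = sum(len(row) for row in a)
    changed = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > threshold
    )
    return changed / total


frame = make_frame(64, 48)
shifted = shift_right(frame)
print(changed_fraction(frame, shifted))  # prints 1.0: every pixel "changed"
```

Real footage is less extreme than this synthetic pattern, but the same
effect is why a raw differing-pixel count tends to overstate activity
whenever the camera or a textured object moves slightly.
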
I just looked up "optical flow," and it seems to describe more or less the
effect I just mentioned, the reason the second approach wouldn't work: the
apparent movement of pixels from one frame to the next. That seems to be
the very thing you'd want to remove from the equation when determining how
busy a part of a video is. As far as I understand it, it is also the
effect you'd largely eliminate with the first method, observing the
encoded bitrate, since the encoder's motion compensation already accounts
for simple movement.
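
In case it helps, here is a rough sketch of the bitrate approach in
Python. It assumes ffprobe is on the PATH and asks it for the timestamp
and size of every video packet, then sums the sizes per time window. The
function names, the 5-second window and the "top 3" default are my own
choices for illustration, not anything ffmpeg prescribes.

```python
import subprocess
from collections import defaultdict


def parse_packets(text):
    # Parse ffprobe CSV lines of the form "pts_time,size".
    packets = []
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 2:
            continue
        try:
            packets.append((float(fields[0]), int(fields[1])))
        except ValueError:
            continue  # skip packets without a usable timestamp, e.g. "N/A"
    return packets


def packet_sizes(path):
    # Read (pts_time, size) for every packet of the first video stream.
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "packet=pts_time,size", "-of", "csv=p=0", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_packets(out)


def bitrate_per_window(packets, window=5.0):
    # Map each window's start time (seconds) to its average bits per second.
    totals = defaultdict(int)
    for pts, size in packets:
        totals[int(pts // window)] += size
    return {idx * window: total * 8 / window for idx, total in sorted(totals.items())}


def busiest(packets, window=5.0, top=3):
    # Start times of the windows with the highest average bitrate.
    rates = bitrate_per_window(packets, window)
    return sorted(rates, key=rates.get, reverse=True)[:top]


# Example use (the file name is a placeholder):
#   print(busiest(packet_sizes("input.mp4")))
```

This only reads packet headers rather than decoding frames, so it should
be fast, which fits trading precision for speed. One caveat: keyframes
are much larger than the frames around them, so the window probably needs
to span several keyframe intervals, otherwise the result mostly shows
where the keyframes are.
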

