[FFmpeg-devel] [PATCH] avfilter/alimiter: Add "flush_buffer" option to flush the remaining valid data to the output

Paul B Mahol onemda at gmail.com
Tue Apr 12 22:42:16 EEST 2022


On Mon, Apr 11, 2022 at 10:59 PM Wang Cao <wangcao-at-google.com at ffmpeg.org> wrote:

> On Sat, Apr 9, 2022 at 5:35 AM Paul B Mahol <onemda at gmail.com> wrote:
>
> > On Fri, Apr 8, 2022 at 10:41 PM Wang Cao <wangcao-at-google.com at ffmpeg.org> wrote:
> >
> > > On Fri, Apr 8, 2022 at 11:40 AM Paul B Mahol <onemda at gmail.com> wrote:
> > >
> > > > On Thu, Apr 7, 2022 at 11:56 PM Wang Cao <wangcao-at-google.com at ffmpeg.org> wrote:
> > > >
> > > > > On Thu, Apr 7, 2022 at 12:44 AM Paul B Mahol <onemda at gmail.com> wrote:
> > > > >
> > > > > > On Wed, Apr 6, 2022 at 1:49 PM Paul B Mahol <onemda at gmail.com> wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 5, 2022 at 8:57 PM Wang Cao <wangcao-at-google.com at ffmpeg.org> wrote:
> > > > > > >
> > > > > > >> On Mon, Apr 4, 2022 at 3:28 PM Marton Balint <cus at passwd.hu> wrote:
> > > > > > >>
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Mon, 4 Apr 2022, Paul B Mahol wrote:
> > > > > > >> >
> > > > > > >> > > On Sun, Mar 27, 2022 at 11:41 PM Marton Balint <cus at passwd.hu> wrote:
> > > > > > >> > >
> > > > > > >> > >>
> > > > > > >> > >>
> > > > > > >> > >> On Sat, 26 Mar 2022, Wang Cao wrote:
> > > > > > >> > >>
> > > > > > >> > >>> The change in the commit will add some samples to the
> > > > > > >> > >>> end of the audio stream. The intention is to eventually
> > > > > > >> > >>> add a "zero_delay" option so that the output from
> > > > > > >> > >>> alimiter does not start with the delay caused by the
> > > > > > >> > >>> lookahead.
> > > > > > >> > >>
> > > > > > >> > >> I was very much surprised to see that the alimiter filter
> > > > > > >> > >> actually delays the audio - as in, extra samples are
> > > > > > >> > >> inserted in the beginning and some samples are cut at the
> > > > > > >> > >> end. This trashes A-V sync, so it is a bug IMHO.
> > > > > > >> > >>
> > > > > > >> > >> So unless somebody has a valid use case for the legacy
> > > > > > >> > >> mode of operation, I'd simply change it to be "zero delay"
> > > > > > >> > >> without any additional user option, in a single patch.
> > > > > > >> > >>
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > This patch does it in a very complicated way, and it
> > > > > > >> > > really should be optional.
> > > > > > >> >
> > > > > > >> > But why does it make sense to keep the current (IMHO buggy)
> > > > > > >> > operational mode, which adds silence in the beginning and
> > > > > > >> > trims the end? I understand that the original implementation
> > > > > > >> > worked like this, but libavfilter has packet timestamps and
> > > > > > >> > N:M filtering, so there is absolutely no reason to use a 1:1
> > > > > > >> > implementation and live with its limitations.
> > > > > > >> >
> > > > > > >> Hello Paul and Marton, thank you so much for taking the time
> > > > > > >> to review my patch. I understand that my patch may seem a
> > > > > > >> little complicated, but I can show with a FATE test that if we
> > > > > > >> set the alimiter to behave as a passthrough filter, the
> > > > > > >> "framecrc" output frames will be the same with my patch. The
> > > > > > >> existing behavior will not work for all gapless audio
> > > > > > >> processing.
> > > > > > >>
> > > > > > >> The complete patch to fix this issue is at
> > > > > > >> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220330210314.2055201-1-wangcao@google.com/
> > > > > > >>
> > > > > > >> Regarding Paul's concern, I personally have no preference
> > > > > > >> whether or not to expose the patch as an extra option. With
> > > > > > >> respect to the implementation, the patch is the best I can
> > > > > > >> think of, preserving as much information as possible from the
> > > > > > >> input frames. I also understand it may break the concept that
> > > > > > >> "filter_frame" outputs one frame at a time. For alimiter with
> > > > > > >> my patch, depending on the size of the lookahead buffer, it
> > > > > > >> may take a few input frames before one output frame can be
> > > > > > >> generated. This is inevitable to compensate for the delay of
> > > > > > >> the lookahead buffer.
> > > > > > >>
> > > > > > >> Thanks again for reviewing my patch and I'm looking forward to
> > > > > > >> hearing from you :)
> > > > > > >>
> > > > > > >
> > > > > > > It would be better (because it is no longer 1 frame of X
> > > > > > > nb_samples in, 1 frame of X nb_samples out) to replace the
> > > > > > > .filter_frame/.request_frame callbacks with .activate logic.
> > > > > > >
> > > > > > > And make this output delay compensation filtering optional.
> > > > > > >
> > > > > > > In the process, make sure that output frame PTS timestamps are
> > > > > > > unchanged from the input ones, by keeping references to the
> > > > > > > needed frames in a filter queue.
> > > > > > >
> > > > > > > Look at how speechnorm/dynaudnorm does it.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > Alternatively, use the current logic in the ladspa filter (and
> > > > > > also add an option with the same name).
> > > > > >
> > > > > > This would need less code and is probably the better approach, as
> > > > > > no extra latency is introduced.
> > > > > >
> > > > > > Then the 1:1 mapping between frames with the same number of
> > > > > > samples no longer holds, but I think that is not very important
> > > > > > any more.
> > > > > >
> > > > > Thank you for replying with your valuable feedback! I have checked
> > > > > af_ladspa, and the "request_frame" function in af_ladspa looks
> > > > > similar to what I'm doing. The difference comes from the fact that
> > > > > I need an internal frame buffer to keep track of output frames.
> > > > > Essentially, I add a frame to the internal buffer when an input
> > > > > frame comes in. The frames in this buffer will be the future output
> > > > > frames. We start writing these output frame data buffers only when
> > > > > the internal lookahead buffer has been filled. When there are no
> > > > > more input frames, we flush all the remaining data in the internal
> > > > > frame buffers and lookahead buffers. Can you advise on my approach
> > > > > here? Thank you!
> > > > >
> > > > > I can port my current implementation of "filter_frame" and
> > > > > "request_frame" to the "activate" approach as you suggested, as in
> > > > > speechnorm/dynaudnorm.
> > > > >
> > > >
> > > > No need to wait for all buffers to fill up, only the lookahead
> > > > buffer.
> > > >
> > > > Just trim the initial samples equal to the lookahead size, and then
> > > > start outputting samples once you have the current frame's number of
> > > > samples available.
> > > >
> > > > So there should be no need for extra buffers to keep audio samples -
> > > > just buffers to keep the input pts and number of samples of previous
> > > > input frames, like in the ladspa filter.
> > > >
> > > Thank you for the suggestion! From my understanding, we have two ways
> > > to achieve the "zero_delay" functionality here.
> > >
> > > Option 1: as you mentioned, we can trim the initial samples of
> > > lookahead size. The lookahead buffer can span multiple frames. For
> > > example, when the attack is 0.08 seconds, the sample rate is 44100 and
> > > the frame size is 1024, the lookahead buffer is about 3.5 frames, so
> > > the filter needs to see at least 4 input audio frames before it can
> > > output any audio. We also need to make assumptions about the size of
> > > the audio frame (meaning the number of audio samples per frame) when
> > > flushing. The frame size is conventionally 1024, but I think it is
> > > better to make as few assumptions as possible so that the filter can
> > > be used as flexibly as possible.
> > >
> > > Option 2: this is what I proposed before. We basically map the same
> > > number of input frames to the output, and we also keep everything
> > > about the frame the same as the input except for the audio signal
> > > data itself, which will be changed by whatever processing alimiter
> > > has to do. I think it is safer to make the filter work only on the
> > > signal itself. It helps other users of this filter not to worry about
> > > any unexpected change to the frame. The downside is that the filter
> > > internally needs to store some empty frames, which will be written to
> > > as the lookahead buffer is filled.
> > >
> > > I don't see any performance difference between these two options, but
> > > option 2 looks better to me because it works solely on the signal,
> > > without any changes to the frame.
> > >
> >
> > Option 1 does not add any extra delay to the processing chain at all,
> > while option 2 adds extra delay.
> >
> > Just look at the latest version of the af_ladspa.c filter code.
> >
> Hi Paul, I have checked the af_ladspa filter. Option 1 certainly doesn't
> introduce any delay. Option 2 will introduce at most one audio frame of
> delay in the processing chain: it needs to fill the lookahead buffer,
> plus some frames of samples to push the data in the lookahead buffer
> out as one output audio frame. The processing delay is probably not a
> big concern.
>
> The reason I proposed option 2 is the sync of metadata for the output
> frames:
>
> It appears to me that the only metadata we need to worry about in
> af_ladspa is the PTS and the number of samples. However, AVFrame has
> other fields that also seem to specify some metadata for the frame:
> 1. AVDictionary *metadata
> 2. void *opaque (this seems to be client-specific)
> 3. AVFrameSideData **side_data (this is not handled by
> av_frame_copy_props).
> If we go for option 1, these fields of the input frames will likely not
> be mapped to the corresponding output frames. I believe this is also
> unexpected behavior for other users who rely on these fields. I
> understand other filters may not consider this important, but
> conceptually I believe it is better to make the filter focus on changing
> only what it is supposed to, which is the audio signal itself.
>
> Looking forward to hearing your opinion on this, thanks!
>


AVFrame metadata is mostly useful when a frame, after passing through a
filter that adds metadata, is no longer changed by filters later in the
filter graph. Thus using something like astats=metadata=1,alimiter is
neither very useful nor logical.

The reason why I am lightly insisting on introducing no extra
delay/latency, and prefer option 1 for the alimiter filter, is mostly
that in the audio domain latency is a relevant and important subject for
most audio processing, especially for limiters. So it is very important
to keep it at a minimum if possible.


> --
>
> Wang Cao |  Software Engineer |  wangcao at google.com |  650-203-7807
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>


More information about the ffmpeg-devel mailing list