[FFmpeg-devel] soundtouch filter?

Tue May 15 08:30:32 CEST 2012

On 4/29/12 2:56 PM, Pavel Koshevoy wrote:
> On 04/29/2012 09:50 AM, Reimar Döffinger wrote:
>> Hello Pavel,
>> it seems I can't send anything to the list at the moment, so direct
>> answer instead...
>>
>> On Sun, Apr 29, 2012 at 08:36:02AM -0600, Pavel Koshevoy wrote:
>
> <snip>
>
>>> SoundTouch is LGPL.  However, I am fine with GPL.  What existing
>>> avfilter code is a good reference for me to use for scaletempo port?
>> I don't really know, but af_aresample does the kind of pts fiddling you
>> asked about and is not that large since most of the "heavy lifting"
>> code is in libswresample.
>> So that might be a good starting point.
>
>
> OK, I had a quick look at scaletempo; it implements WSOLA.  I had a
> look at SoundTouch; it appears to implement SOLA.  I looked at the WSOLA
> paper, and it's not terribly enlightening.  This article
> http://www.surina.net/article/time-and-pitch-scaling.html by the
> SoundTouch author is much easier to understand.  I am thinking I may
> not need to port this filter from mplayer; I'll try to implement it
> from scratch first, and maybe use the ffmpeg rdft functions for the
> cross-correlation calculation.
>
> It also occurs to me that this filter needs to modify timestamps not 
> only for audio but also for video and subtitles, otherwise they'll go 
> out of sync.  Is this going to be a problem?
>
> Thank you,
>     Pavel.

OK, I now have my own implementation of a WSOLA filter.  It doesn't use
cross-correlation for audio fragment alignment; for performance reasons
I've used a multi-resolution pyramid registration approach instead, which
runs in O(N).
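
For anyone unfamiliar with coarse-to-fine registration, here is a minimal
sketch of the general idea.  It is not the code from
yaeAudioTempoFilter.h: the function names, the pairwise-average
downsampling and the mean-squared-difference metric are all assumptions
made for illustration.  The full search happens only at the coarsest
level; each finer level just checks three candidate shifts around the
doubled coarse estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Halve the resolution by averaging adjacent sample pairs (assumed scheme).
static std::vector<float> downsample2(const std::vector<float>& x)
{
    std::vector<float> y(x.size() / 2);
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = 0.5f * (x[2 * i] + x[2 * i + 1]);
    return y;
}

// Mean squared difference between a[i] and b[i + shift] over their overlap.
static double mismatch(const std::vector<float>& a,
                       const std::vector<float>& b, int shift)
{
    const int na = static_cast<int>(a.size());
    const int nb = static_cast<int>(b.size());
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < na; ++i) {
        const int j = i + shift;
        if (j < 0 || j >= nb) continue;
        const double d = static_cast<double>(a[i]) - b[j];
        sum += d * d;
        ++count;
    }
    return count > 0 ? sum / count : std::numeric_limits<double>::max();
}

// Exhaustive search for the shift in [lo, hi] with the smallest mismatch.
static int bestShift(const std::vector<float>& a,
                     const std::vector<float>& b, int lo, int hi)
{
    int best = lo;
    double bestErr = std::numeric_limits<double>::max();
    for (int s = lo; s <= hi; ++s) {
        const double e = mismatch(a, b, s);
        if (e < bestErr) { bestErr = e; best = s; }
    }
    return best;
}

// Hypothetical entry point: returns s such that ref[i] ~ cand[i + s].
int alignCoarseToFine(const std::vector<float>& ref,
                      const std::vector<float>& cand,
                      int maxShift, int levels)
{
    assert(maxShift >= 0 && levels >= 0);
    assert((ref.size() >> levels) > 0 && (cand.size() >> levels) > 0);

    // Build both pyramids; index 0 is full resolution.
    std::vector<std::vector<float> > pr(1, ref), pc(1, cand);
    for (int l = 0; l < levels; ++l) {
        pr.push_back(downsample2(pr.back()));
        pc.push_back(downsample2(pc.back()));
    }

    // Full search at the coarsest level, over a proportionally smaller window.
    const int range = (maxShift >> levels) + 1;
    int shift = bestShift(pr[levels], pc[levels], -range, range);

    // Refine: double the estimate and test its immediate neighbours.
    for (int l = levels - 1; l >= 0; --l)
        shift = bestShift(pr[l], pc[l], 2 * shift - 1, 2 * shift + 1);

    return std::max(-maxShift, std::min(maxShift, shift));
}
```

A sketch like this can lock onto the wrong offset when the signal has
little low-frequency content, since the coarse levels then carry almost
no information to align on.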

My implementation is just a couple of C++ template classes parameterized
by the sample type (unsigned char, short int, int, float).  The filter
supports multi-channel audio.  I've already integrated it into my
ffmpeg-based video player, but not as an avfilter (because I don't want
that overhead).
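
To illustrate what "parameterized by the sample type" can look like,
here is a small sketch.  SampleTraits and downmixToMono are hypothetical
names, not the classes in the linked headers, and averaging the channels
into one mono signal is only one plausible way of handling multi-channel
input.  It assumes interleaved samples, and that unsigned 8-bit audio is
centred on 128.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical traits: map a sample of type T into roughly [-1, 1).
// The primary template covers signed integer types (short int, int).
template <typename T>
struct SampleTraits {
    static float toFloat(T v) {
        // -min() is the magnitude of the most negative value,
        // e.g. 32768 for a 16-bit short.
        return static_cast<float>(v) /
               -static_cast<float>(std::numeric_limits<T>::min());
    }
};

// Unsigned 8-bit samples: silence sits at 128 (assumption stated above).
template <>
struct SampleTraits<unsigned char> {
    static float toFloat(unsigned char v) {
        return (static_cast<int>(v) - 128) / 128.0f;
    }
};

// Float samples are assumed to be normalized already.
template <>
struct SampleTraits<float> {
    static float toFloat(float v) { return v; }
};

// Average all channels of each interleaved frame into one float sample.
template <typename T>
std::vector<float> downmixToMono(const T* interleaved,
                                 std::size_t frames,
                                 std::size_t channels)
{
    assert(interleaved != 0 || frames == 0);
    std::vector<float> mono(frames, 0.0f);
    if (channels == 0) return mono;
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < channels; ++c)
            sum += SampleTraits<T>::toFloat(interleaved[f * channels + c]);
        mono[f] = sum / static_cast<float>(channels);
    }
    return mono;
}
```

The sample type is deduced from the pointer, so the same call works for
any of the four types, e.g. downmixToMono(buffer, frames, channels).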

The filter files are here:
http://apprenticevideo.svn.sourceforge.net/viewvc/apprenticevideo/trunk/apprenticevideo/yaeAudioTempoFilter.h
http://apprenticevideo.svn.sourceforge.net/viewvc/apprenticevideo/trunk/apprenticevideo/yaeAudioFragment.h

So, is it worth trying to wrap it as an avfilter and add it to ffmpeg?

BTW, my filter doesn't sound the same as mplayer's scaletempo.  At 0.5
tempo I think mine sounds better (that's subjective), but at 0.9
scaletempo sounds better, so that's something else to consider.  The
difference may be due to the choice of segment alignment algorithm.  I
may implement cross-correlation via FFT some time later to make it a
fair comparison.
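
For reference, the cross-correlation alignment can be sketched directly
in the time domain as below.  This is a brute-force baseline, not an FFT
implementation and not scaletempo's code; the function names and the
normalization are assumptions.  An FFT-based version would produce the
same raw correlation sums faster, with the normalization handled
separately.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation of a[i] with b[i + lag] over their overlap.
// Returns a value in [-1, 1], or 0 if either overlap segment is silent.
double normalizedCorrelation(const std::vector<float>& a,
                             const std::vector<float>& b, int lag)
{
    const int na = static_cast<int>(a.size());
    const int nb = static_cast<int>(b.size());
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (int i = 0; i < na; ++i) {
        const int j = i + lag;
        if (j < 0 || j >= nb) continue;
        const double x = a[i];
        const double y = b[j];
        ab += x * y;
        aa += x * x;
        bb += y * y;
    }
    const double denom = std::sqrt(aa * bb);
    return denom > 0.0 ? ab / denom : 0.0;
}

// Hypothetical helper: the lag in [-maxLag, maxLag] with the highest
// normalized correlation.  Cost is O(N * maxLag), which is what an FFT
// approach would reduce.
int bestLagByCorrelation(const std::vector<float>& a,
                         const std::vector<float>& b, int maxLag)
{
    assert(maxLag >= 0);
    // Keep the overlap long enough for the score to be meaningful.
    assert(static_cast<std::size_t>(maxLag) < a.size());
    assert(static_cast<std::size_t>(maxLag) < b.size());

    int best = 0;
    double bestScore = -2.0;  // below any reachable score
    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        const double score = normalizedCorrelation(a, b, lag);
        if (score > bestScore) { bestScore = score; best = lag; }
    }
    return best;
}
```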

     Pavel.