Following up on a question on StackOverflow, which explains the "problem" in
more detail:

Short version: extracting audio from a non-interleaved AVI is slow because
the whole file (i.e. the video stream) has to be read to get to the audio (I
assume this is what's happening).
On a properly interleaved AVI it's nearly instantaneous.
The question revolves around the fact that AviSynth can do it almost 
immediately, even on non-interleaved files.
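
Some background on why an index might matter. A classic AVI 1.0 file keeps all stream data in a 'movi' LIST and usually ends with an 'idx1' index. Each 16-byte entry in that index gives a chunk's FOURCC, flags, offset and size. My guess (unverified) is that a reader which trusts that index can jump straight to the audio chunks instead of scanning through the video. Below is a rough Python sketch of that idea. It is not how AviSynth or FFmpeg actually implement it. It assumes a plain AVI 1.0 file (no OpenDML 'indx'/'AVIX' extensions) and audio in stream 1 (chunk id '01wb', a hypothetical default). It also assumes idx1 offsets are relative to the 'movi' FOURCC, although some writers use absolute offsets instead. The function name is made up for illustration.

```python
import struct


def read_audio_chunks_via_idx1(path, audio_ckid=b"01wb"):
    """Hypothetical helper: return the raw audio payload of a plain AVI 1.0 file
    by following its idx1 index, without reading the 'movi' (video) data.

    Assumptions: no OpenDML ('indx'/'AVIX') extensions, audio is the stream
    whose chunks are tagged `audio_ckid`, and idx1 offsets are relative to
    the position of the 'movi' FOURCC.
    """
    with open(path, "rb") as f:
        riff, riff_size, form = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or form != b"AVI ":
            raise ValueError("not a RIFF AVI file")
        end = 8 + riff_size
        movi_pos = None
        idx1 = None
        # Walk the top-level chunks; LISTs (including 'movi') are skipped with seek().
        while f.tell() + 8 <= end:
            ckid, cksize = struct.unpack("<4sI", f.read(8))
            start = f.tell()
            if ckid == b"LIST" and f.read(4) == b"movi":
                movi_pos = start  # file offset of the 'movi' FOURCC
            elif ckid == b"idx1":
                idx1 = f.read(cksize)
            f.seek(start + cksize + (cksize & 1))  # RIFF chunks are word-aligned
        if movi_pos is None or idx1 is None:
            raise ValueError("no 'movi' list or no idx1 index")
        parts = []
        # Each idx1 entry: FOURCC, flags, offset, size (16 bytes, little-endian).
        for i in range(0, len(idx1) - 15, 16):
            ckid, _flags, offset, size = struct.unpack_from("<4sIII", idx1, i)
            if ckid == audio_ckid:
                f.seek(movi_pos + offset + 8)  # skip the chunk's own 8-byte header
                parts.append(f.read(size))
        return b"".join(parts)
```

For example, `pcm = read_audio_chunks_via_idx1("input.avi")` would return the concatenated audio chunk payloads. The number of seeks depends on how many audio chunks there are, not on how much video sits in front of them. The sketch doesn't parse the stream's 'strf' format, so wrapping the result in a WAV header is left out.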

I'm curious whether anyone knows more about this, and whether it would
be conceivable for FFmpeg to do it as quickly as AviSynth.
Of course, I know non-interleaved AVIs are not ideal, but I have no
control over the source of my files.

