[MPlayer-dev-eng] [PATCH] Use posix_fadvise in stream_file if available
Reimar Döffinger
Reimar.Doeffinger at gmx.de
Mon Nov 9 15:26:16 CET 2009
On Mon, Nov 09, 2009 at 12:15:17PM +0100, Tobias Diedrich wrote:
> Benchmark:
> IO bound actions like stream copy or dumpstream are candidates for
> negative effects due to extra syscall overhead, so:
>
> dumpvideo with patch:
> Run1: 1m37.205s real, 1m20.357s system, 0m2.746s user
> Run2: 1m31.223s real, 1m19.923s system, 0m2.726s user
> This one is actually CPU-bound on this system.
>
> dumpvideo without patch:
> Run1: 0m54.749s real, 0m7.221s system, 0m4.676s user
> Run2: 0m55.752s real, 0m7.122s system, 0m4.811s user
> IO-bound
>
> So, the "don't call fadvise every time with the whole 1MB area" I
> had in it at first was not premature.
Yes it was: the fadvise at most doubles the number of syscalls, so
that alone could at most double the system time.
Anything beyond that obviously _must_ be inherently due to fadvise
itself.
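To put numbers on that, taking the system times from the first benchmark
above and assuming each read is paired with exactly one fadvise (so pure
syscall-entry overhead could explain at most a factor of 2):

```latex
\frac{t_{\text{sys, patched}}}{t_{\text{sys, unpatched}}}
  = \frac{80.357\,\text{s}}{7.221\,\text{s}} \approx 11.1 \quad (\text{Run1}),
\qquad
  \frac{79.923\,\text{s}}{7.122\,\text{s}} \approx 11.2 \quad (\text{Run2})
```

Both ratios are far above 2, so the extra system time has to come from what
the fadvise call does in the kernel, not from the number of calls.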
> Let's try a slightly modified patch that looks ahead PREFETCH_LEN
> and uses max_len for the length...
Is there a _really_ good reason for that, and a complete explanation
for the numbers, or are you just unknowingly optimizing for the interleaving
pattern of the specific file you tested?
Does the Linux kernel possibly handle overlapping fadvise calls by re-reading
the data over and over again, or something equally stupid?
Or had you not yet added the fadvise after seek when you got those numbers?
Because I can't really see why the sequence
fadvise(start, PREFETCH_LEN)
fadvise(start + 2048, PREFETCH_LEN)
fadvise(start + 4096, PREFETCH_LEN)
....
should behave much worse than
fadvise(start, PREFETCH_LEN)
fadvise(start + PREFETCH_LEN, 2048)
fadvise(start + PREFETCH_LEN + 2048, 2048)
...
I guess increasing STREAM_BUFFER_SIZE to 4kB and using that instead of
max_len in fadvise would be ok and possibly more readable, too.
The name PREFETCH_LEN is bad though, it should probably be READAHEAD_DISTANCE.
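
For reference, a minimal sketch of the two call sequences compared above,
written out as code. This is not the patch itself: the struct, the function
names, and the 1MB value for READAHEAD_DISTANCE (taken from the "whole 1MB
area" remark) are assumptions made for illustration, and POSIX_FADV_WILLNEED
is assumed to be the advice in use. It shows that after the n-th call both
variants have advised up to the same file offset; they differ only in how
much already-advised data each call covers again.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Assumed values: 1MB look-ahead, 2048-byte reads as in the sequences above. */
#define READAHEAD_DISTANCE (1024 * 1024)
#define CHUNK 2048

/* Hypothetical helper type: one fadvise request. */
struct advice {
    off_t offset;
    off_t len;
};

/* Variant 1: every call advises the full window from the current position. */
struct advice advise_overlapping(off_t start, int n)
{
    struct advice a;
    assert(n >= 0);
    a.offset = start + (off_t)n * CHUNK;
    a.len = READAHEAD_DISTANCE;
    return a;
}

/* Variant 2: the first call advises the full window, every later call
 * advises only the chunk that newly entered the window. */
struct advice advise_incremental(off_t start, int n)
{
    struct advice a;
    assert(n >= 0);
    if (n == 0) {
        a.offset = start;
        a.len = READAHEAD_DISTANCE;
    } else {
        a.offset = start + READAHEAD_DISTANCE + (off_t)(n - 1) * CHUNK;
        a.len = CHUNK;
    }
    return a;
}

/* First file offset that is NOT covered by the request. */
off_t advice_end(struct advice a)
{
    return a.offset + a.len;
}

/* Issue the request; returns 0 on success or an error number. */
int issue_advice(int fd, struct advice a)
{
    return posix_fadvise(fd, a.offset, a.len, POSIX_FADV_WILLNEED);
}
```

Calling issue_advice(fd, advise_overlapping(start, n)) before the n-th read
gives the first sequence, and swapping in advise_incremental gives the second.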
> Unpatched:
> Run1: 0m52.213s real, 0m4.361s system, 0m2.019s user
> Run2: 0m51.481s real, 0m4.258s system, 0m2.093s user
>
> Patched:
> Run1: 0m48.986s real, 0m4.970s system, 0m2.056s user
> Run2: 0m48.376s real, 0m5.005s system, 0m1.975s user
>
> Yep, that looks more reasonable.
It makes the patch look unneeded.
And how does this compare to -cache? If it isn't better in
all cases I suspect there's something wrong...
Playing something from a slow disk/device while also decoding
might give more useful numbers (sshfs over wireless maybe?).