[FFmpeg-devel] [RFC] FFmpeg libavcodec/crystalhd.c: Optimize for reduced latency

Philip Langdale philipl at overt.org
Wed Jan 30 18:06:00 CET 2013


On Wed, 30 Jan 2013 00:25:14 +0100
thomas schorpp <thomas.schorpp at gmail.com> wrote:

> Hello,
> 
> 1.
> Are You all with vdpau now or was it the kernel crashing driver bug
> why most crystal HD projects dropped now for about a year? Patch
> attached. Seems to work, no more kernel crashes so far. Broadcom
> authors and owner of crystalhd @git.linuxtv.org (latest codebase) not
> responding, google group left, anybody out there?

Hi Tom,

Yes, it's a bit of a dead community. I haven't had the time to look
seriously at CrystalHD work for a while. It's very time consuming and
frustrating at this point - as most of what you're doing is trying to
work around lack of information. You can't trust the driver/library to
tell you when a frame a ready, or when a new input packet is needed, or
whether an output frame is interlaced or not. 80%+ of crystalhd.c is
interlaced heuristics!

I had some contact with one of the Broadcom engineers back in 2011 and
he had said he'd try and get help for all this stuff but he never got
back to me and I had less and less time to work on it, so my efforts
tailed off. I got the 70015 working well enough for my needs so my
personal motivation to fix 70012, which has such different behaviour,
was low. I know it can be made to work (see gstreamer plugin, vlc, etc)
but it would be a lot of work to adapt the multi-threaded design to
an ffmpeg codec.

> 2.
> BCM70012 is running at ~30...40% cpu load with Latencytop measured up
> to 70% "Application requested delay" using ffmpeg but only at ~10%
> cpu load under winxp here , so the device is not the problem, not
> much better with BCM70015, so the current performance of crystal HD
> support is only about 20% above the ffmpeg software h.264 decoder on
> a SSE2/3 2 core cpu and ~50% *below* at transcoding and filtering and
> does not work with Bino, VLC, gstreamer or the NOAD VDR addon, VDR
> markad seems to work.

This is not too surprisingly, the sleep based approach I took will lead
to that kind of profile, but as I've discussed before, and somewhat in
the thread you quote, I was never able to get a delay-free
implementation to work due to not being able to trust the hardware to
say when input is needed or output is ready.

> 3.
> http://article.gmane.org/gmane.comp.video.ffmpeg.devel/126546 (PATCH
> V3 is too outdated, ~2 years, new thread, cc'd Philip and Benoit on
> responding their public comments requests in that old thread.)
> 
> 
> > From: Philip Langdale <philipl <at> overt.org>
> > Subject: Re: [PATCH] CrystalHD decoder support v3
> > Newsgroups: gmane.comp.video.ffmpeg.devel
> > Date: 2011-02-08 05:44:18 GMT (1 year, 50 weeks, 6 days, 9 hours
> > and 7 minutes ago)
> >
> > On Mon, 7 Feb 2011 09:01:23 +0100
> > Benoit Fouet <benoit.fouet <at> free.fr> wrote:
> >
> [...]
> >>
> >> I'd like to hear from other people about those waits.
> 
> Responded.
> 
> >
> > As would I. One thing worth mentioning is that the main challenge
> > here is keeping the pipeline depth constant. That means it's really
> > important to be able to return a field/frame for every decode()
> > call that consumes a packet, once the pipeline has been established.
> 
> Achieved(?) yesterday
> 
> [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay:
> 19000 us ReadyListCount: 11 output_ready: 2 [h264_crystalhd @
> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 11
> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
> delay: 19000 us ReadyListCount: 12 output_ready: 2
> 
> by
> 
> if (priv->output_ready < 29) {
> 
> moving sleeps around, trying conditional wait for ReadyListCount and
> a few more changes, like
> 
> /** Timeout parameter passed to DtsProcOutput() in ms */
> #define OUTPUT_PROC_TIMEOUT <16(driver doc suggested, but breaks
> every use case)...50> /** Step between fake timestamps passed to
> hardware in units of 100ns */ #define TIMESTAMP_UNIT 100000
> /** Initial value in us of the wait in decode() */
> #define BASE_WAIT <1000-10000>
> /** Increment in us to adjust wait in decode() */
> #define WAIT_UNIT <10...100>
> 
> but this breaks MPlayer(1) by "Too many buffered pts."

That means the pipeline is getting too long - MPlayer can handle
waiting 32 pictures before output (which is theoretically what you
will have with interlaced h.264 and 16 b-frames). Setting your
start point to 29 is why its hitting this.


> 
> > If we don't use the
> > waits as they currently are, we'd probably need to wait in a loop
> > for a frame to become ready, so it ultimately adds up to the same
> > thing.
> 
> I didn't see that in verbose logs with extra traces, looks like the
> code is likely very out of sync with the driver, xrun it and cause
> unessecary latency and first trying to "close" the pipeline "control
> loop" trivial with
> 
>      if (priv->decode_wait > BASE_WAIT)
>          priv->decode_wait -= (WAIT_UNIT/<20...100>);
>      return len;
> }
> 
> #define BASE_WAIT 1000
> /** Increment in us to adjust wait in decode() */
> #define WAIT_UNIT <10...100>
> 
> I believed first to have found out that the Broadcom kernel
> driver/lib needs a call to DtsGetDriverStatus for every subsequent
> usage of DtsProcInput/Output, etc, otherwise it would not update
> ReadyListCount, but the above log should show ReadyListCount does not
> work reliable with
> 
> if (priv->output_ready < 5 ) {
> 
> > The
> > one advantage of the current approach is that it doesn't require an
> > explicit decision to be made as to when the pipeline is
> > established, it naturally stabilises.
> 
> At the price of far too high latency, the control loop needs not only
> to be closed we need something from the driver to sync "in phase"
> like a PLL in electronics.
> 
> Then I've tried
> 
> -        if (len < tx_free - 1024 ) {
> +       if ( FreeListCount > 0 ) {
>             /*
>               * Despite being notionally opaque, either libcrystalhd
> or
>               * the hardware itself will mangle pts values that are
> too
> 
> 	uint8_t		FreeListCount;	/* Number of
> frame buffers free.  (reported by driver) */
> 
> 
> but failed too.
> 
> I'll try my luck with one of
> 
> 	uint32_t	InputBusyCount;	/* Times compressed
> video has attempted to be sent to the HW
> 					 * but the input FIFO was
> full. (reported by DIL) */
> 
> 	uint32_t	PIBMissCount;	/* Amount of times a PIB
> is invalid. (reported by DIL) */
> 
> 	uint32_t	cpbEmptySize;	/* supported only for
> H.264, specifically changed for
> 					 * SingleThreadedAppMode.
> Report size of CPB buffer available.
> 					 * Reported by DIL */
> 
> and looking at the output of
> 
> grep -nir sleep /usr/local/src/crystalhd
> 
> shows a similar "design", so further "analysis" may result in that
> I'm possibly getting this issue on from the wrong end. I fear we've
> to port the Broadcom driver to the new staging kernel driver and
> possibly rewrite/optimize libcrystalhd3, too,

Honestly, it's a little hard for me to understand all the various
things you've tried and the results of your experiments, but at least
you're looking at the right kind of areas to try and make the driver
work more reasonably.

> If someone wants more details on my workflow, tell my how to get the
> redo/undo history from gedit, I'm not really "used" to git yet.
> 
> Is something like pthread_cond_timedwait() or more efficient thread
> control supported in FFmpeg?

You can instantiate threads and try to do work on them, but the codec
API is synchronous - a single call passes in an input packet and expects
an output picture to be returned. Your wiggle room is limited to picture
reordering and non 1:1 due to interlacing. I'm pretty sure this means
that any attempt at multi-threading will not really lead very far
because you'll have to synchronize very aggressively, and can't do what
you really want.

> My current "MPlayer/Bino/ffmpeg, etc, compromise" version ist
> attached, intended mismatched patch header so noone (auto-)applies it
> accidently.

I'll try it out but I don't expect I'll have much to say any time soon,
given my other commitments.

> 4.
> Is the LGPL issue still resolved? But never for this kernel driver
> (besides possible "quality" and coding style issues for kernel
> maintainers?)?

The LGPL issue is resolved. Broadcom agreed to adjust the licence header
and I replaced the code I wanted to copy from them with calls into
existing ffmpeg code.

Thanks,

--phil


More information about the ffmpeg-devel mailing list