[FFmpeg-devel] [RFC] FFmpeg libavcodec/crystalhd.c: Optimize for reduced latency

thomas schorpp thomas.schorpp at gmail.com
Wed Jan 30 00:25:14 CET 2013


Hello,

1.
Are You all with vdpau now or was it the kernel crashing driver bug why most crystal HD projects dropped now for about a year?
Patch attached. Seems to work, no more kernel crashes so far.
Broadcom authors and owner of crystalhd @git.linuxtv.org (latest codebase) not responding, google group left, anybody out there?

2.
BCM70012 is running at ~30...40% cpu load with Latencytop measured up to 70% "Application requested delay" using ffmpeg but
only at ~10% cpu load under winxp here , so the device is not the problem, not much better with BCM70015,
so the current performance of crystal HD support is only about 20% above the ffmpeg software h.264 decoder on a SSE2/3 2 core cpu
and ~50% *below* at transcoding and filtering and does not work with Bino, VLC, gstreamer or the NOAD VDR addon, VDR markad seems to work.

3.
http://article.gmane.org/gmane.comp.video.ffmpeg.devel/126546 (PATCH V3 is too outdated, ~2 years, new thread, cc'd Philip and Benoit
on responding their public comments requests in that old thread.)


> From: Philip Langdale <philipl <at> overt.org>
> Subject: Re: [PATCH] CrystalHD decoder support v3
> Newsgroups: gmane.comp.video.ffmpeg.devel
> Date: 2011-02-08 05:44:18 GMT (1 year, 50 weeks, 6 days, 9 hours and 7 minutes ago)
>
> On Mon, 7 Feb 2011 09:01:23 +0100
> Benoit Fouet <benoit.fouet <at> free.fr> wrote:
>
[...]
>>
>> I'd like to hear from other people about those waits.

Responded.

>
> As would I. One thing worth mentioning is that the main challenge here
> is keeping the pipeline depth constant. That means it's really important
> to be able to return a field/frame for every decode() call that consumes
> a packet, once the pipeline has been established.

Achieved(?) yesterday

[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 11 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 11 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2
[h264_crystalhd @ 0xa78160] CrystalHD: Pipeline length: 27
[h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12 output_ready: 2

by

if (priv->output_ready < 29) {

moving sleeps around, trying conditional wait for ReadyListCount and a few more changes, like

/** Timeout parameter passed to DtsProcOutput() in ms */
#define OUTPUT_PROC_TIMEOUT <16(driver doc suggested, but breaks every use case)...50>
/** Step between fake timestamps passed to hardware in units of 100ns */
#define TIMESTAMP_UNIT 100000
/** Initial value in us of the wait in decode() */
#define BASE_WAIT <1000-10000>
/** Increment in us to adjust wait in decode() */
#define WAIT_UNIT <10...100>

but this breaks MPlayer(1) by "Too many buffered pts."


> If we don't use the
> waits as they currently are, we'd probably need to wait in a loop for a
> frame to become ready, so it ultimately adds up to the same thing.

I didn't see that in verbose logs with extra traces, looks like the code is likely very out of sync with the driver, xrun it and cause unessecary
latency and first trying to "close" the pipeline "control loop" trivial with

     if (priv->decode_wait > BASE_WAIT)
         priv->decode_wait -= (WAIT_UNIT/<20...100>);
     return len;
}

#define BASE_WAIT 1000
/** Increment in us to adjust wait in decode() */
#define WAIT_UNIT <10...100>

I believed first to have found out that the Broadcom kernel driver/lib needs a call to DtsGetDriverStatus for every subsequent usage of
DtsProcInput/Output, etc, otherwise it would not update ReadyListCount, but the above log should show ReadyListCount does not work reliable with

if (priv->output_ready < 5 ) {

> The
> one advantage of the current approach is that it doesn't require an
> explicit decision to be made as to when the pipeline is established, it
> naturally stabilises.

At the price of far too high latency, the control loop needs not only to be closed we need something from the driver to sync "in phase" like a PLL in electronics.

Then I've tried

-        if (len < tx_free - 1024 ) {
+       if ( FreeListCount > 0 ) {
            /*
              * Despite being notionally opaque, either libcrystalhd or
              * the hardware itself will mangle pts values that are too

	uint8_t		FreeListCount;	/* Number of frame buffers free.  (reported by driver) */


but failed too.

I'll try my luck with one of

	uint32_t	InputBusyCount;	/* Times compressed video has attempted to be sent to the HW
					 * but the input FIFO was full. (reported by DIL) */

	uint32_t	PIBMissCount;	/* Amount of times a PIB is invalid. (reported by DIL) */

	uint32_t	cpbEmptySize;	/* supported only for H.264, specifically changed for
					 * SingleThreadedAppMode. Report size of CPB buffer available.
					 * Reported by DIL */

and looking at the output of

grep -nir sleep /usr/local/src/crystalhd

shows a similar "design", so further "analysis" may result in that I'm possibly getting this issue on from the wrong end.
I fear we've to port the Broadcom driver to the new staging kernel driver and possibly rewrite/optimize libcrystalhd3, too,

If someone wants more details on my workflow, tell my how to get the redo/undo history from gedit, I'm not really "used" to git yet.

Is something like pthread_cond_timedwait() or more efficient thread control supported in FFmpeg?

My current "MPlayer/Bino/ffmpeg, etc, compromise" version ist attached, intended mismatched patch header so noone (auto-)applies it accidently.

4.
Is the LGPL issue still resolved? But never for this kernel driver (besides possible "quality" and coding style issues for kernel maintainers?)?

If someone (Media kernel driver or ffmpeg skilled main profession "C- coder") want's to lead but got no hardware, I've got 1 BCM70012 left over,
get it at registered postage cost (Europe only), or get one for ~10US$ at Ebay China/HK, *none* of 3 ordered arrived DOA here, someone claimed on this list, BTW.

y
tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crystalhd-nullpointer-bugfix.schorpp.01.patch
Type: text/x-diff
Size: 819 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130130/8232dce9/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crystalhd-latencyopt-experimental.01.schorpp.patch
Type: text/x-diff
Size: 6019 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130130/8232dce9/attachment-0001.bin>


More information about the ffmpeg-devel mailing list