[FFmpeg-devel] [RFC] FFmpeg libavcodec/crystalhd.c: Optimize for reduced latency

thomas schorpp thomas.schorpp at gmail.com
Thu Jan 31 13:39:22 CET 2013


Hi Philip,

On 30.01.2013 18:06, Philip Langdale wrote:
> On Wed, 30 Jan 2013 00:25:14 +0100
> thomas schorpp <thomas.schorpp at gmail.com> wrote:
>
>> Hello,
>>
>> 1.
>> Are You all with vdpau now or was it the kernel crashing driver bug
>> why most crystal HD projects dropped now for about a year? Patch
>> attached. Seems to work, no more kernel crashes so far. Broadcom
>> authors and owner of crystalhd @git.linuxtv.org (latest codebase) not
>> responding, google group left, anybody out there?
>
> Hi Tom,
>
> Yes, it's a bit of a dead community. I haven't had the time to look
> seriously at CrystalHD work for a while. It's very time consuming and
> frustrating at this point - as most of what you're doing is trying to
> work around lack of information. You can't trust the driver/library to
> tell you when a frame a ready, or when a new input packet is needed, or
> whether an output frame is interlaced or not. 80%+ of crystalhd.c is
> interlaced heuristics!

Yes, looks like You can fill the pipeline with 100s of frames but ReadyListCount
does not increase >1 unless DtsProcOutputNoCopy is called.

My last control loop experiment is attached. -patch-

I've managed to raise transcoding performance with -vf mp=stereo3d from
15 up to 25 in a few cases with some settings here

//    if (priv->output_ready < 2) {
     if (decoder_status.FreeListCount > 2 && decoder_status.ReadyListCount < 1) {
//        if (decoder_status.ReadyListCount != 0)
//            priv->output_ready++;
         usleep(BASE_WAIT);
     av_log(avctx, AV_LOG_WARNING, "Filling pipeline. Current delay: %"PRIu64" us  FreeListCount %d ReadyListCount: %d \n", priv->decode_wait,  decoder_status.FreeListCount, decoder_status.ReadyListCount);
         return len;
     } else if (decoder_status.FreeListCount > 0 && decoder_status.ReadyListCount < 1) {
         /*
          * After the pipeline is established, if we encounter a lack of frames
          * that probably means we're not giving the hardware enough time to
          * decode them, so start increasing the wait time at the end of a
          * decode call.
          */
         usleep(BASE_WAIT);
         priv->decode_wait += WAIT_UNIT;
         av_log(avctx, AV_LOG_WARNING, "No frames ready. Current delay: %"PRIu64" us  FreeListCount %d ReadyListCount: %d \n", priv->decode_wait,  decoder_status.FreeListCount, decoder_status.ReadyListCount);
         return len;
     }

but video output stream was broken by interrupts and picture "salad" then in most cases.

>
> I had some contact with one of the Broadcom engineers back in 2011 and
> he had said he'd try and get help for all this stuff but he never got
> back to me and I had less and less time to work on it, so my efforts
> tailed off. I got the 70015 working well enough for my needs so my
> personal motivation to fix 70012, which has such different behaviour,
> was low. I know it can be made to work (see gstreamer plugin, vlc, etc)
> but it would be a lot of work to adapt the multi-threaded design to
> an ffmpeg codec.

All the Ebay Chinese sell Azurewave models, but nothing on their website, looks like off-market.

>
>> 2.
>> BCM70012 is running at ~30...40% cpu load with Latencytop measured up
>> to 70% "Application requested delay" using ffmpeg but only at ~10%
>> cpu load under winxp here , so the device is not the problem, not
>> much better with BCM70015, so the current performance of crystal HD
>> support is only about 20% above the ffmpeg software h.264 decoder on
>> a SSE2/3 2 core cpu and ~50% *below* at transcoding and filtering and
>> does not work with Bino, VLC, gstreamer or the NOAD VDR addon, VDR
>> markad seems to work.
>
> This is not too surprisingly, the sleep based approach I took will lead
> to that kind of profile, but as I've discussed before, and somewhat in
> the thread you quote, I was never able to get a delay-free
> implementation to work due to not being able to trust the hardware to
> say when input is needed or output is ready.

Yeah, I should read more in the driver and lib code, we'll see.

>
>> 3.
>> http://article.gmane.org/gmane.comp.video.ffmpeg.devel/126546 (PATCH
>> V3 is too outdated, ~2 years, new thread, cc'd Philip and Benoit on
>> responding their public comments requests in that old thread.)
>>
>>
>>> From: Philip Langdale <philipl <at> overt.org>
>>> Subject: Re: [PATCH] CrystalHD decoder support v3
>>> Newsgroups: gmane.comp.video.ffmpeg.devel
>>> Date: 2011-02-08 05:44:18 GMT (1 year, 50 weeks, 6 days, 9 hours
>>> and 7 minutes ago)
>>>
>>> On Mon, 7 Feb 2011 09:01:23 +0100
>>> Benoit Fouet <benoit.fouet <at> free.fr> wrote:
>>>
>> [...]
>>>>
>>>> I'd like to hear from other people about those waits.
>>
>> Responded.
>>
>>>
>>> As would I. One thing worth mentioning is that the main challenge
>>> here is keeping the pipeline depth constant. That means it's really
>>> important to be able to return a field/frame for every decode()
>>> call that consumes a packet, once the pipeline has been established.
>>
>> Achieved(?) yesterday
>>
>> [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current delay:
>> 19000 us ReadyListCount: 11 output_ready: 2 [h264_crystalhd @
>> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
>> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
>> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
>> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
>> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
>> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
>> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
>> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
>> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
>> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
>> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
>> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 11
>> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
>> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
>> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
>> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
>> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
>> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
>> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
>> delay: 19000 us ReadyListCount: 12 output_ready: 2 [h264_crystalhd @
>> 0xa78160] CrystalHD: Pipeline length: 27 [h264_crystalhd @ 0xa78160]
>> CrystalHD: Decoded OK. Current delay: 19000 us ReadyListCount: 12
>> output_ready: 2 [h264_crystalhd @ 0xa78160] CrystalHD: Pipeline
>> length: 27 [h264_crystalhd @ 0xa78160] CrystalHD: Decoded OK. Current
>> delay: 19000 us ReadyListCount: 12 output_ready: 2
>>
>> by
>>
>> if (priv->output_ready < 29) {
>>
>> moving sleeps around, trying conditional wait for ReadyListCount and
>> a few more changes, like
>>
>> /** Timeout parameter passed to DtsProcOutput() in ms */
>> #define OUTPUT_PROC_TIMEOUT <16(driver doc suggested, but breaks
>> every use case)...50> /** Step between fake timestamps passed to
>> hardware in units of 100ns */ #define TIMESTAMP_UNIT 100000
>> /** Initial value in us of the wait in decode() */
>> #define BASE_WAIT <1000-10000>
>> /** Increment in us to adjust wait in decode() */
>> #define WAIT_UNIT <10...100>
>>
>> but this breaks MPlayer(1) by "Too many buffered pts."
>
> That means the pipeline is getting too long - MPlayer can handle
> waiting 32 pictures before output (which is theoretically what you
> will have with interlaced h.264 and 16 b-frames). Setting your
> start point to 29 is why its hitting this.

I know. I've read all Your history on gmane.

>
>
>>
>>> If we don't use the
>>> waits as they currently are, we'd probably need to wait in a loop
>>> for a frame to become ready, so it ultimately adds up to the same
>>> thing.
>>
>> I didn't see that in verbose logs with extra traces, looks like the
>> code is likely very out of sync with the driver, xrun it and cause
>> unessecary latency and first trying to "close" the pipeline "control
>> loop" trivial with
>>
>>       if (priv->decode_wait > BASE_WAIT)
>>           priv->decode_wait -= (WAIT_UNIT/<20...100>);
>>       return len;
>> }
>>
>> #define BASE_WAIT 1000
>> /** Increment in us to adjust wait in decode() */
>> #define WAIT_UNIT <10...100>
>>
>> I believed first to have found out that the Broadcom kernel
>> driver/lib needs a call to DtsGetDriverStatus for every subsequent
>> usage of DtsProcInput/Output, etc, otherwise it would not update
>> ReadyListCount, but the above log should show ReadyListCount does not
>> work reliable with
>>
>> if (priv->output_ready < 5 ) {
>>
>>> The
>>> one advantage of the current approach is that it doesn't require an
>>> explicit decision to be made as to when the pipeline is
>>> established, it naturally stabilises.
>>
>> At the price of far too high latency, the control loop needs not only
>> to be closed we need something from the driver to sync "in phase"
>> like a PLL in electronics.
>>
>> Then I've tried
>>
>> -        if (len < tx_free - 1024 ) {
>> +       if ( FreeListCount > 0 ) {
>>              /*
>>                * Despite being notionally opaque, either libcrystalhd
>> or
>>                * the hardware itself will mangle pts values that are
>> too
>>
>> 	uint8_t		FreeListCount;	/* Number of
>> frame buffers free.  (reported by driver) */
>>
>>
>> but failed too.
>>
>> I'll try my luck with one of
>>
>> 	uint32_t	InputBusyCount;	/* Times compressed
>> video has attempted to be sent to the HW
>> 					 * but the input FIFO was
>> full. (reported by DIL) */
>>
>> 	uint32_t	PIBMissCount;	/* Amount of times a PIB
>> is invalid. (reported by DIL) */
>>
>> 	uint32_t	cpbEmptySize;	/* supported only for
>> H.264, specifically changed for
>> 					 * SingleThreadedAppMode.
>> Report size of CPB buffer available.
>> 					 * Reported by DIL */
>>
>> and looking at the output of
>>
>> grep -nir sleep /usr/local/src/crystalhd
>>
>> shows a similar "design", so further "analysis" may result in that
>> I'm possibly getting this issue on from the wrong end. I fear we've
>> to port the Broadcom driver to the new staging kernel driver and
>> possibly rewrite/optimize libcrystalhd3, too,
>
> Honestly, it's a little hard for me to understand all the various
> things you've tried and the results of your experiments, but at least
> you're looking at the right kind of areas to try and make the driver
> work more reasonably.

I've got lost in it myself already^^, but I'm on the way to more reasoned "trial+error".

Target 1: Transcoding performance
Target 2: Compromise or switch beetwen transcoding and mplayer etc playback optimization.

>
>> If someone wants more details on my workflow, tell my how to get the
>> redo/undo history from gedit, I'm not really "used" to git yet.
>>
>> Is something like pthread_cond_timedwait() or more efficient thread
>> control supported in FFmpeg?
>
> You can instantiate threads and try to do work on them, but the codec
> API is synchronous - a single call passes in an input packet and expects
> an output picture to be returned. Your wiggle room is limited to picture
> reordering and non 1:1 due to interlacing. I'm pretty sure this means
> that any attempt at multi-threading will not really lead very far
> because you'll have to synchronize very aggressively, and can't do what
> you really want.

OK, and when we have to be synchronous, buffer "pipes" are hard to sync.

>
>> My current "MPlayer/Bino/ffmpeg, etc, compromise" version ist
>> attached, intended mismatched patch header so noone (auto-)applies it
>> accidently.
>
> I'll try it out but I don't expect I'll have much to say any time soon,
> given my other commitments.
>

Forget it, pure trash. Last version attached.

>> 4.
>> Is the LGPL issue still resolved? But never for this kernel driver
>> (besides possible "quality" and coding style issues for kernel
>> maintainers?)?
>
> The LGPL issue is resolved. Broadcom agreed to adjust the licence header
> and I replaced the code I wanted to copy from them with calls into
> existing ffmpeg code.
>
> Thanks,
>
> --phil
>

I've to thank You,
y
tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: crystalhd-latencyopt-experimental.02.schorpp.patch
Type: text/x-diff
Size: 8000 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130131/47fd4b86/attachment.bin>


More information about the ffmpeg-devel mailing list