[FFmpeg-devel] [PATCH 1/6] Frame-based multithreading framework using pthreads
Alexander Strange
astrange
Fri Jan 21 11:51:34 CET 2011
On Mon, Dec 27, 2010 at 10:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Nov 15, 2010 at 08:37:01AM -0500, Alexander Strange wrote:
>> See doc/multithreading.txt for details on use in codecs.
>> ---
>> doc/APIchanges | 5 +
>> doc/multithreading.txt | 63 +++++
>> libavcodec/avcodec.h | 80 ++++++-
>> libavcodec/options.c | 4 +
>> libavcodec/pthread.c | 656 +++++++++++++++++++++++++++++++++++++++++++++++-
>> libavcodec/thread.h | 98 +++++++
>> libavcodec/utils.c | 64 +++++-
>> libavcodec/w32thread.c | 6 +
>> libavformat/utils.c | 5 +
>> libavutil/internal.h | 11 +
>> 10 files changed, 979 insertions(+), 13 deletions(-)
>> create mode 100644 doc/multithreading.txt
>> create mode 100644 libavcodec/thread.h
>>
>> diff --git a/doc/APIchanges b/doc/APIchanges
>> index b6806f8..c8892d9 100644
>> --- a/doc/APIchanges
>> +++ b/doc/APIchanges
>> @@ -13,6 +13,11 @@ libavutil: 2009-03-08
>>
>> API changes, most recent first:
>>
>> +2010-11-XX - rX - lavc 52.xx.0 - threading API
>> + Add CODEC_CAP_FRAME_THREADS with new restrictions on get_buffer()/
>> + release_buffer()/draw_horiz_band() callbacks for appropriate codecs.
>> + Add thread_type and active_thread_type fields to AVCodecContext.
>> +
>> 2010-11-13 - r25745 - lavc 52.95.0 - AVCodecContext
>> Add AVCodecContext.subtitle_header and AVCodecContext.subtitle_header_size
>> fields.
>> diff --git a/doc/multithreading.txt b/doc/multithreading.txt
>> new file mode 100644
>> index 0000000..65bdfb1
>> --- /dev/null
>> +++ b/doc/multithreading.txt
>> @@ -0,0 +1,63 @@
>> +FFmpeg multithreading methods
>> +==============================================
>> +
>> +FFmpeg provides two methods for multithreading codecs.
>> +
>> +Slice threading decodes multiple parts of a frame at the same time, using
>> +AVCodecContext execute() and execute2().
>> +
>> +Frame threading decodes multiple frames at the same time.
>> +It accepts N future frames and delays decoded pictures by N-1 frames.
>> +The later frames are decoded in separate threads while the user is
>> +displaying the current one.
>> +
>> +Restrictions on clients
>> +==============================================
>> +
>> +Slice threading -
>> +* The client's draw_horiz_band() must be thread-safe according to the comment
>> + in avcodec.h.
>> +
>> +Frame threading -
>> +* Restrictions with slice threading also apply.
>> +* The client's get_buffer() and release_buffer() must be thread-safe as well.
>> +* There is one frame of delay added for every thread beyond the first one.
>> + Clients using dts must account for the delay; pts sent through reordered_opaque
>> + will work as usual.
>> +
>> +Restrictions on codec implementations
>> +==============================================
>> +
>> +Slice threading -
>> + None except that there must be something worth executing in parallel.
>> +
>> +Frame threading -
>> +* Codecs can only accept entire pictures per packet.
>> +* Codecs similar to ffv1, whose streams don't reset across frames,
>> + will not work because their bitstreams cannot be decoded in parallel.
>> +
>
>> +* The contents of buffers must not be read before ff_thread_await_progress()
>> + has been called on them. reget_buffer() and buffer age optimizations no longer work.
>
> why does the buffer age optimization not work anymore?
> if an MB is skipped in 100 frames and you have 10 frame-decoding threads, then
> the optimization should still allow some skipping. And actually this should be easy.
> And it does happen in reality (the black bars on the top and bottom of films).
>
> [...]
>> @@ -1203,7 +1219,8 @@ typedef struct AVCodecContext {
>> * If non NULL, 'draw_horiz_band' is called by the libavcodec
>> * decoder to draw a horizontal band. It improves cache usage. Not
>> * all codecs can do that. You must check the codec capabilities
>> - * beforehand.
>> + * beforehand. May be called by different threads at the same time,
>> + * so implementations must be reentrant.
>
> thread-safe or reentrant?
> for many video filters, slice N has to be done before slice N+1 starts, and they
> have to be called in order even if you mix calls from several frames.
>
>
>> * The function is also used by hardware acceleration APIs.
>> * It is called at least once during frame decoding to pass
>> * the data needed for hardware render.
>> @@ -1457,6 +1474,9 @@ typedef struct AVCodecContext {
>> * if CODEC_CAP_DR1 is not set then get_buffer() must call
>> * avcodec_default_get_buffer() instead of providing buffers allocated by
>> * some other means.
>> + * May be called from a different thread if thread_type==FF_THREAD_FRAME
>> + * is set, but not by more than one thread at once, so does not need to be
>> + * reentrant.
>
> isn't it thread_type & FF_THREAD_FRAME?
> and I think it can be called from more than one thread at once
>
>
> [...]
>> @@ -2811,6 +2863,26 @@ typedef struct AVCodec {
>> const int64_t *channel_layouts; ///< array of support channel layouts, or NULL if unknown. array is terminated by 0
>> uint8_t max_lowres; ///< maximum value for lowres supported by the decoder
>> AVClass *priv_class; ///< AVClass for the private context
>> +
>> + /**
>> + * @defgroup framethreading Frame-level threading support functions.
>> + * @{
>> + */
>> + /**
>> + * If defined, called on thread contexts when they are created.
>> + * If the codec allocates writable tables in init(), re-allocate them here.
>> + * priv_data will be set to a copy of the original.
>> + */
>> + int (*init_thread_copy)(AVCodecContext *);
>
>> + /**
>> + * Copy necessary context variables from a previous thread context to the current one.
>> + * If not defined, the next thread will start automatically; otherwise, the codec
>> + * must call ff_thread_finish_setup().
>> + *
>> + * dst and src will (rarely) point to the same context, in which case memcpy should be skipped.
>> + */
>> + int (*update_thread_context)(AVCodecContext *dst, AVCodecContext *src);
>
> src could be const
>
>
> [...]
>> +int ff_thread_decode_frame(AVCodecContext *avctx,
>> + AVFrame *picture, int *got_picture_ptr,
>> + AVPacket *avpkt)
>> +{
>> + FrameThreadContext *fctx = avctx->thread_opaque;
>> + int finished = fctx->next_finished;
>> + PerThreadContext *p;
>> + int err;
>> +
>> + /*
>> + * Submit a frame to the next decoding thread.
>> + */
>> +
>> + p = &fctx->threads[fctx->next_decoding];
>> + update_context_from_user(p->avctx, avctx);
>> + err = submit_frame(p, avpkt);
>> + if (err) return err;
>> +
>> + fctx->next_decoding++;
>> +
>> + /*
>> + * If we're still receiving the initial frames, don't return a picture.
>> + */
>> +
>> + if (fctx->delaying && avpkt->size) {
>> + if (fctx->next_decoding >= (avctx->thread_count-1)) fctx->delaying = 0;
>> +
>> + *got_picture_ptr=0;
>> + return 0;
>> + }
>> +
>> + /*
>> + * Return the next available picture from the oldest thread.
>> + * If we're at the end of the stream, then we have to skip threads that
>> + * didn't output a picture, because we don't want to accidentally signal
>> + * EOF (avpkt->size == 0 && *got_picture_ptr == 0).
>> + */
>> +
>> + do {
>> + p = &fctx->threads[finished++];
>> +
>> + if (p->state != STATE_INPUT_READY) {
>> + pthread_mutex_lock(&p->progress_mutex);
>> + while (p->state != STATE_INPUT_READY)
>> + pthread_cond_wait(&p->output_cond, &p->progress_mutex);
>> + pthread_mutex_unlock(&p->progress_mutex);
>> + }
>> +
>> + *picture = p->picture;
>> + *got_picture_ptr = p->got_picture;
>> +
>> + avcodec_get_frame_defaults(&p->picture);
>> + p->got_picture = 0;
>> +
>> + if (finished >= avctx->thread_count) finished = 0;
>> + } while (!avpkt->size && !*got_picture_ptr && finished != fctx->next_finished);
>> +
>> + update_context_from_thread(avctx, p->avctx, 1);
>> +
>> + if (fctx->next_decoding >= avctx->thread_count) fctx->next_decoding = 0;
>> +
>> + fctx->next_finished = finished;
>> +
>> + return p->result;
>> +}
>
> I think this design has some issues.
> 1. (minor) I don't see why you don't return frames when they are ready at the
>    beginning
> 2. Frames can have variable duration if the picture doesn't change. A timespan
>    of 2 seconds without changes is quite realistic, so 32 delayed frames make
>    about one minute of delay; this will kill A/V sync with some players and
>    could also underflow video buffers if they are time-based (I am thinking of
>    RT*P here). Thus I think frames should be returned as soon as they are ready.
> 3. This code blocks when no frame is available to be returned. That's OK, but
>    we should also try to support a non-blocking form that returns EAGAIN;
>    it's useful for players that try to do other things in the decoding
>    thread too and don't want to be blocked for half a second in unlucky cases.
Forgot to reply to this part.
It really simplified development to do it like this, because it made the frame delay reliable and ensured that all of the threads that were allocated actually got started. When you're decoding data read off disk, multiple calls to avcodec_decode_video are practically instantaneous, so returning frames sooner doesn't help speed. The only case I can think of where a shorter decode delay _really_ helps is when new encoded frames arrive rarely.
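
To illustrate, here's a minimal sketch of what a client sees while the pipeline fills up. This isn't from the patch; avctx is assumed to be an already-opened decoder context, and read_next_packet() and display() are hypothetical helpers:

    AVPacket pkt;
    AVFrame *frame = avcodec_alloc_frame();
    int got_picture;

    /* With thread_count == N, the first N-1 calls each consume a packet
     * but return no picture; after that, every call returns one. */
    while (read_next_packet(&pkt)) {
        int ret = avcodec_decode_video2(avctx, frame, &got_picture, &pkt);
        av_free_packet(&pkt);
        if (ret < 0)
            break;
        if (got_picture)
            display(frame); /* the first picture arrives N-1 packets "late" */
    }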
And of course the important part is how quickly the encoded frames arrive, not how quickly they're displayed, so I'm not sure of any case where this happens - do streams really transmit that slowly?
It would be possible to do EAGAIN, but I think it would almost always return something, since most frames take less than 1/30 of a second to decode.
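
If we did add it, the caller side of a hypothetical non-blocking form might look like this (sketch only - nothing in this patch actually returns AVERROR(EAGAIN), and do_other_work() is a placeholder):

    int ret = avcodec_decode_video2(avctx, frame, &got_picture, &pkt);
    if (ret == AVERROR(EAGAIN)) {
        /* No decoded picture ready yet; don't block, come back later. */
        do_other_work();
    } else if (ret < 0) {
        /* real decode error */
    } else if (got_picture) {
        display(frame);
    }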
It's also possible to request decoded frames without sending new ones in the current API, just by not passing a packet, and I think it would be good to support this for codecs even without threading on top. But I think most codecs destroy their internal state when you do this, so fixing them is a new project?
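
For reference, the drain pattern with the current API looks like this (a sketch; it only works for codecs that set CODEC_CAP_DELAY and survive being fed empty packets):

    AVPacket pkt;
    int got_picture;

    av_init_packet(&pkt);
    pkt.data = NULL; /* an empty packet asks for buffered pictures */
    pkt.size = 0;

    do {
        if (avcodec_decode_video2(avctx, frame, &got_picture, &pkt) < 0)
            break;
        if (got_picture)
            display(frame); /* display() is a placeholder */
    } while (got_picture);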