[FFmpeg-devel] [PATCH 1/6] Frame-based multithreading framework using pthreads
Alexander Strange
astrange
Fri Jan 21 12:30:45 CET 2011
On Jan 21, 2011, at 5:29 AM, Alexander Strange wrote:
>
> On Mon, Dec 27, 2010 at 10:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Mon, Nov 15, 2010 at 08:37:01AM -0500, Alexander Strange wrote:
>>> See doc/multithreading.txt for details on use in codecs.
>>> ---
>>> doc/APIchanges | 5 +
>>> doc/multithreading.txt | 63 +++++
>>> libavcodec/avcodec.h | 80 ++++++-
>>> libavcodec/options.c | 4 +
>>> libavcodec/pthread.c | 656 +++++++++++++++++++++++++++++++++++++++++++++++-
>>> libavcodec/thread.h | 98 +++++++
>>> libavcodec/utils.c | 64 +++++-
>>> libavcodec/w32thread.c | 6 +
>>> libavformat/utils.c | 5 +
>>> libavutil/internal.h | 11 +
>>> 10 files changed, 979 insertions(+), 13 deletions(-)
>>> create mode 100644 doc/multithreading.txt
>>> create mode 100644 libavcodec/thread.h
>>>
>>> diff --git a/doc/APIchanges b/doc/APIchanges
>>> index b6806f8..c8892d9 100644
>>> --- a/doc/APIchanges
>>> +++ b/doc/APIchanges
>>> @@ -13,6 +13,11 @@ libavutil: 2009-03-08
>>>
>>> API changes, most recent first:
>>>
>>> +2010-11-XX - rX - lavc 52.xx.0 - threading API
>>> + Add CODEC_CAP_FRAME_THREADS with new restrictions on get_buffer()/
>>> + release_buffer()/draw_horiz_band() callbacks for appropriate codecs.
>>> + Add thread_type and active_thread_type fields to AVCodecContext.
>>> +
>>> 2010-11-13 - r25745 - lavc 52.95.0 - AVCodecContext
>>> Add AVCodecContext.subtitle_header and AVCodecContext.subtitle_header_size
>>> fields.
>>> diff --git a/doc/multithreading.txt b/doc/multithreading.txt
>>> new file mode 100644
>>> index 0000000..65bdfb1
>>> --- /dev/null
>>> +++ b/doc/multithreading.txt
>>> @@ -0,0 +1,63 @@
>>> +FFmpeg multithreading methods
>>> +==============================================
>>> +
>>> +FFmpeg provides two methods for multithreading codecs.
>>> +
>>> +Slice threading decodes multiple parts of a frame at the same time, using
>>> +AVCodecContext execute() and execute2().
>>> +
>>> +Frame threading decodes multiple frames at the same time.
>>> +It accepts N future frames and delays decoded pictures by N-1 frames.
>>> +The later frames are decoded in separate threads while the user is
>>> +displaying the current one.
>>> +
>>> +Restrictions on clients
>>> +==============================================
>>> +
>>> +Slice threading -
>>> +* The client's draw_horiz_band() must be thread-safe according to the comment
>>> + in avcodec.h.
>>> +
>>> +Frame threading -
>>> +* Restrictions with slice threading also apply.
>>> +* The client's get_buffer() and release_buffer() must be thread-safe as well.
>>> +* There is one frame of delay added for every thread beyond the first one.
>>> + Clients using dts must account for the delay; pts sent through reordered_opaque
>>> + will work as usual.
>>> +
>>> +Restrictions on codec implementations
>>> +==============================================
>>> +
>>> +Slice threading -
>>> + None except that there must be something worth executing in parallel.
>>> +
>>> +Frame threading -
>>> +* Codecs can only accept entire pictures per packet.
>>> +* Codecs similar to ffv1, whose streams don't reset across frames,
>>> + will not work because their bitstreams cannot be decoded in parallel.
>>> +
>>
>>> +* The contents of buffers must not be read before ff_thread_await_progress()
>>> + has been called on them. reget_buffer() and buffer age optimizations no longer work.
>>
>> why does the buffer age optimization not work anymore?
>> If a MB is skipped in 100 frames and you have 10 frame decoding threads, the
>> optimization should still allow some skipping. And actually this should be easy.
>> And it does happen in reality (the black bars on the top and bottom of films).
>
> The problem is that each decoding thread has its own copy of the AVCodecContext, and when get_buffer() is called it uses that copy. So they end up with their own internal_buffer, and the ages don't make sense when the AVFrames travel across threads.
>
> This can be fixed by sharing the same internal_buffer between threads. But I tried it and everything broke; I spent too much time looking at it so I've punted it into a branch:
>
> http://gitorious.org/~astrange/ffmpeg/ffmpeg-mt/commit/e5622e99c9608e612e9f386c1711d76ce08268e1
>
> The nice thing is that mt adds so much more speed (clock time) than skipping does that I found it didn't really matter on the mpeg4 files I tried, so I think this can be put off. VP3 and H264 don't use it at all.
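To make the problem concrete, here is a toy model of it (this is not the actual lavc code; ThreadCtx and the field names are invented for illustration). Buffer "age" is frames elapsed in one context since the buffer was last filled, so an age computed against a different thread's counter is meaningless:

```c
#include <assert.h>

/* Hypothetical stand-in for a per-thread AVCodecContext copy: each frame
 * thread carries its own picture_number counter and internal_buffer. */
typedef struct { int picture_number; } ThreadCtx;

/* age = frames decoded in *this* context since the buffer was filled */
static int buffer_age(const ThreadCtx *ctx, int filled_at)
{
    return ctx->picture_number - filled_at;
}

static int demo_broken_age(void)
{
    ThreadCtx t0 = { .picture_number = 100 };
    ThreadCtx t1 = { .picture_number = 3 };   /* a fresher thread copy */

    int filled_at = 99;                        /* buffer filled on thread 0 */
    int age_ok    = buffer_age(&t0, filled_at); /* 1: meaningful            */
    int age_bad   = buffer_age(&t1, filled_at); /* negative: nonsense       */
    return age_ok == 1 && age_bad < 0;
}
```

Sharing one internal_buffer between the threads would fix the counters, which is what the branch linked above attempts.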
>
> The new patch works with mplayer dr + custom get_buffer() out of the box, because it calls get_buffer() on the client's thread by default.
> Setting thread_safe_callbacks restores the old behavior, and ffplay uses this.
> !thread_safe_callbacks is slower, and dr is not any faster than !dr for me, so it could probably improve a little by disabling dr.
>
>> [...]
>>> @@ -1203,7 +1219,8 @@ typedef struct AVCodecContext {
>>> * If non NULL, 'draw_horiz_band' is called by the libavcodec
>>> * decoder to draw a horizontal band. It improves cache usage. Not
>>> * all codecs can do that. You must check the codec capabilities
>>> - * beforehand.
>>> + * beforehand. May be called by different threads at the same time,
>>> + * so implementations must be reentrant.
>>
>> thread-safe or reentrant?
>> For many video filters, slice N has to be done before N+1 starts, and they have
>> to be called in order even if you mix calls from several frames.
>
> Unfortunately the patch doesn't enforce this. I looked at it and I believe this was already not enforced (just decode mpeg2 with threads on and it will be called out of order), but nothing ever seemed to use draw_horiz_band() with 4:2:0 video, so nobody noticed.
>
> I wrote a longer comment about what you can expect (which is that anything at all can happen).
>
> Note that playing 4:2:2 Theora video with this patch series will break in mplayer because of this, but it's better to turn it off in mplayer than to fix it in here IMO.
>
>>>
>>> * The function is also used by hardware acceleration APIs.
>>> * It is called at least once during frame decoding to pass
>>> * the data needed for hardware render.
>>> @@ -1457,6 +1474,9 @@ typedef struct AVCodecContext {
>>> * if CODEC_CAP_DR1 is not set then get_buffer() must call
>>> * avcodec_default_get_buffer() instead of providing buffers allocated by
>>> * some other means.
>>> + * May be called from a different thread if thread_type==FF_THREAD_FRAME
>>> + * is set, but not by more than one thread at once, so does not need to be
>>> + * reentrant.
>>
>> isn't it thread_type & FF_THREAD_FRAME?
>
> Yes.
>
>> and it can be called from more than one thread at once, I think
>
> There's a mutex around it (buffer_mutex) so it can't be.
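Roughly what that mutex buys, as a standalone sketch (the names here are made up, not the patch's code): a single lock around the callback means a non-reentrant get_buffer() is safe even with several frame threads, because at most one thread is ever inside it.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;
static int in_callback; /* threads currently inside the callback */
static int max_seen;    /* highest concurrency ever observed     */

static void fake_get_buffer(void)
{
    pthread_mutex_lock(&buffer_mutex);
    if (++in_callback > max_seen)
        max_seen = in_callback;
    /* ... a non-reentrant allocation would go here ... */
    --in_callback;
    pthread_mutex_unlock(&buffer_mutex);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        fake_get_buffer();
    return 0;
}

/* Returns the peak concurrency seen inside the callback (1 == serialized). */
static int run_get_buffer_demo(void)
{
    pthread_t th[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&th[i], 0, worker, 0);
    for (int i = 0; i < 4; i++)
        pthread_join(th[i], 0);
    return max_seen;
}
```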
>
>> [...]
>>> @@ -2811,6 +2863,26 @@ typedef struct AVCodec {
>>> const int64_t *channel_layouts; ///< array of support channel layouts, or NULL if unknown. array is terminated by 0
>>> uint8_t max_lowres; ///< maximum value for lowres supported by the decoder
>>> AVClass *priv_class; ///< AVClass for the private context
>>> +
>>> + /**
>>> + * @defgroup framethreading Frame-level threading support functions.
>>> + * @{
>>> + */
>>> + /**
>>> + * If defined, called on thread contexts when they are created.
>>> + * If the codec allocates writable tables in init(), re-allocate them here.
>>> + * priv_data will be set to a copy of the original.
>>> + */
>>> + int (*init_thread_copy)(AVCodecContext *);
>>
>>> + /**
>>> + * Copy necessary context variables from a previous thread context to the current one.
>>> + * If not defined, the next thread will start automatically; otherwise, the codec
>>> + * must call ff_thread_finish_setup().
>>> + *
>>> + * dst and src will (rarely) point to the same context, in which case memcpy should be skipped.
>>> + */
>>> + int (*update_thread_context)(AVCodecContext *dst, AVCodecContext *src);
>>
>> src could be const
>
> Done.
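For reference, the dst == src rule from the update_thread_context() doxygen above looks like this in a hypothetical codec (MyCodecContext and its fields are invented; only the control flow is the point):

```c
#include <assert.h>
#include <string.h>

typedef struct { int qscale; int frame_num; } MyCodecContext;

static int my_update_thread_context(MyCodecContext *dst, const MyCodecContext *src)
{
    if (dst != src)                /* dst and src may (rarely) be the same */
        memcpy(dst, src, sizeof(*dst));
    /* per-thread fixups that must run even when dst == src would go here */
    return 0;
}

static int run_update_demo(void)
{
    MyCodecContext src = { .qscale = 3, .frame_num = 7 };
    MyCodecContext dst = { 0 };

    my_update_thread_context(&dst, &src);
    if (dst.qscale != 3 || dst.frame_num != 7)
        return 0;
    my_update_thread_context(&src, &src);  /* dst == src: memcpy skipped */
    return src.qscale == 3;
}
```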
>
>>
>>
>> [...]
>>> +int ff_thread_decode_frame(AVCodecContext *avctx,
>>> + AVFrame *picture, int *got_picture_ptr,
>>> + AVPacket *avpkt)
>>> +{
>>> + FrameThreadContext *fctx = avctx->thread_opaque;
>>> + int finished = fctx->next_finished;
>>> + PerThreadContext *p;
>>> + int err;
>>> +
>>> + /*
>>> + * Submit a frame to the next decoding thread.
>>> + */
>>> +
>>> + p = &fctx->threads[fctx->next_decoding];
>>> + update_context_from_user(p->avctx, avctx);
>>> + err = submit_frame(p, avpkt);
>>> + if (err) return err;
>>> +
>>> + fctx->next_decoding++;
>>> +
>>> + /*
>>> + * If we're still receiving the initial frames, don't return a picture.
>>> + */
>>> +
>>> + if (fctx->delaying && avpkt->size) {
>>> + if (fctx->next_decoding >= (avctx->thread_count-1)) fctx->delaying = 0;
>>> +
>>> + *got_picture_ptr=0;
>>> + return 0;
>>> + }
>>> +
>>> + /*
>>> + * Return the next available picture from the oldest thread.
>>> + * If we're at the end of the stream, then we have to skip threads that
>>> + * didn't output a picture, because we don't want to accidentally signal
>>> + * EOF (avpkt->size == 0 && *got_picture_ptr == 0).
>>> + */
>>> +
>>> + do {
>>> + p = &fctx->threads[finished++];
>>> +
>>> + if (p->state != STATE_INPUT_READY) {
>>> + pthread_mutex_lock(&p->progress_mutex);
>>> + while (p->state != STATE_INPUT_READY)
>>> + pthread_cond_wait(&p->output_cond, &p->progress_mutex);
>>> + pthread_mutex_unlock(&p->progress_mutex);
>>> + }
>>> +
>>> + *picture = p->picture;
>>> + *got_picture_ptr = p->got_picture;
>>> +
>>> + avcodec_get_frame_defaults(&p->picture);
>>> + p->got_picture = 0;
>>> +
>>> + if (finished >= avctx->thread_count) finished = 0;
>>> + } while (!avpkt->size && !*got_picture_ptr && finished != fctx->next_finished);
>>> +
>>> + update_context_from_thread(avctx, p->avctx, 1);
>>> +
>>> + if (fctx->next_decoding >= avctx->thread_count) fctx->next_decoding = 0;
>>> +
>>> + fctx->next_finished = finished;
>>> +
>>> + return p->result;
>>> +}
>>
>> I think this design has some issues.
>> 1. (minor) I don't see why you don't return frames when they are ready at the
>> beginning.
>> 2. Frames can have variable duration if the picture doesn't change. A timespan
>> of 2 seconds without changes is quite realistic; 32 such frames make one minute
>> of delay, which will kill A/V sync with some players and could also underflow
>> video buffers if they are time based (I am thinking of RT*P here).
>> Thus I think frames should be returned when they are ready.
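For scale, the arithmetic behind point 2 (a sketch of the worst case, not code from the patch): the design adds one frame of delay per thread beyond the first, so the output delay in seconds is bounded by that count times the longest frame duration.

```c
#include <assert.h>

/* Worst-case output delay of the frame-threading design, in seconds,
 * given the thread count and the longest frame duration in the stream. */
static double worst_case_delay(int thread_count, double max_frame_duration)
{
    /* one frame of delay per thread beyond the first */
    return (thread_count - 1) * max_frame_duration;
}
```

With 32 threads and 2-second frames that is 62 seconds of delay, i.e. the minute Michael describes.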
>
>
>
>> 3. This code blocks when no frame is available to be returned; that's OK, but
>> we should also try to support a non-blocking form that returns EAGAIN.
>> It's useful for players that try to also do other things in the decoding
>> thread and don't want to be blocked for half a second in unlucky cases.
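A sketch of what such a non-blocking variant could look like (names and types are made up; the real ff_thread_decode_frame() above waits on output_cond instead): peek at the oldest thread's state and bail out with -EAGAIN when its picture is not ready yet, so the caller can retry later.

```c
#include <assert.h>
#include <errno.h>

enum { STATE_INPUT_READY, STATE_SETTING_UP };

typedef struct { int state; int got_picture; } FakeThread;

static int try_receive_picture(FakeThread *p, int *got_picture_ptr)
{
    if (p->state != STATE_INPUT_READY)
        return -EAGAIN;            /* no blocking: caller retries later */
    *got_picture_ptr = p->got_picture;
    return 0;
}

static int run_eagain_demo(void)
{
    FakeThread busy = { STATE_SETTING_UP, 0 };
    FakeThread done = { STATE_INPUT_READY, 1 };
    int got = 0;

    if (try_receive_picture(&busy, &got) != -EAGAIN)
        return 0;
    if (try_receive_picture(&done, &got) != 0 || got != 1)
        return 0;
    return 1;
}
```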
>
>
>
>
>> [...]
>>> + /*
>>> + * Buffer age is difficult to keep track of between
>>> + * multiple threads, and the optimizations it allows
>>> + * are not worth the effort. It is disabled for now.
>>> + */
>>> + f->age = INT_MAX;
>>
>> I see nothing hard about age in relation to threads.
>> Please elaborate on what is difficult.
>
> Commented up there.
>
>> [...]
>>> +/**
>>> + * Set the threading algorithms used.
>>> + *
>>> + * Threading requires more than one thread.
>>> + * Frame threading requires entire frames to be passed to the codec,
>>> + * and is incompatible with low_delay.
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +static void validate_thread_parameters(AVCodecContext *avctx)
>>> +{
>>> + int frame_threading_supported = (avctx->codec->capabilities & CODEC_CAP_FRAME_THREADS)
>>> + && !(avctx->flags & CODEC_FLAG_TRUNCATED)
>>> + && !(avctx->flags & CODEC_FLAG_LOW_DELAY)
>>> + && !(avctx->flags2 & CODEC_FLAG2_CHUNKS);
>>> + if (avctx->thread_count == 1)
>>> + avctx->active_thread_type = 0;
>>> + else if (frame_threading_supported && (avctx->thread_type & FF_THREAD_FRAME))
>>> + avctx->active_thread_type = FF_THREAD_FRAME;
>>> + else
>>> + avctx->active_thread_type = FF_THREAD_SLICE;
>>
>> style nitpick if{}else if{}else
>
> Done.
>
>
>> [...]
>>> +/**
>>> + * Waits for decoding threads to finish and resets internal
>>> + * state. Called by avcodec_flush_buffers().
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +void ff_thread_flush(AVCodecContext *avctx);
>>> +
>>> +/**
>>> + * Submits a new frame to a decoding thread.
>>> + * Returns the next available frame in picture. *got_picture_ptr
>>> + * will be 0 if none is available.
>>> + *
>>> + * Parameters are the same as avcodec_decode_video2().
>>> + */
>>> +int ff_thread_decode_frame(AVCodecContext *avctx, AVFrame *picture,
>>> + int *got_picture_ptr, AVPacket *avpkt);
>>> +
>>> +/**
>>> + * Codecs which define update_thread_context should call this
>>> + * when they are ready for the next thread to start decoding
>>> + * the next frame. After calling it, do not change any variables
>>> + * read by the update_thread_context method.
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +void ff_thread_finish_setup(AVCodecContext *avctx);
>>> +
>>
>>
>>
>>> +/**
>>> + * Call this when some part of the picture is finished decoding.
>>> + * Later calls with lower values of progress have no effect.
>>> + *
>>> + * @param f The picture being decoded.
>>> + * @param progress Value, in arbitrary units, of how much of the picture has decoded.
>>> + * @param field The field being decoded, for field-picture codecs.
>>> + * 0 for top field or frame pictures, 1 for bottom field.
>>> + */
>>> +void ff_thread_report_progress(AVFrame *f, int progress, int field);
>>> +
>>> +/**
>>> + * Call this before accessing some part of a picture.
>>> + *
>>> + * @param f The picture being referenced.
>>> + * @param progress Value, in arbitrary units, to wait for.
>>> + * @param field The field being referenced, for field-picture codecs.
>>> + * 0 for top field or frame pictures, 1 for bottom field.
>>> + */
>>> +void ff_thread_await_progress(AVFrame *f, int progress, int field);
>>> +
>>> +/**
>>> + * Call this function instead of avctx->get_buffer(f).
>>> + *
>>> + * @param avctx The current context.
>>> + * @param f The frame to write into.
>>> + */
>>> +int ff_thread_get_buffer(AVCodecContext *avctx, AVFrame *f);
>>> +
>>> +/**
>>> + * Call this function instead of avctx->release_buffer(f).
>>> + *
>>> + * @param avctx The current context.
>>> + * @param f The picture being released.
>>> + */
>>> +void ff_thread_release_buffer(AVCodecContext *avctx, AVFrame *f);
>>
>> You should document what the functions "do", not just when to call them.
>> Well, you should document when to call them too, but what they do is more
>> important IMHO. And I mean from a high-level view, not low-level implementation
>> dox, of course.
>
> Wrote more comments.
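At a high level, report/await behave like a monotonic progress counter guarded by a condition variable. The following is a standalone toy model of that contract (it is not the pthread.c implementation): report publishes a value and wakes waiters, lower values have no effect, and await sleeps until the counter reaches the requested point.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int progress;                  /* how far decoding of this frame has got */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} FakeFrame;

static void report_progress(FakeFrame *f, int n)
{
    pthread_mutex_lock(&f->mutex);
    if (n > f->progress) {         /* later calls with lower values: no-op */
        f->progress = n;
        pthread_cond_broadcast(&f->cond);
    }
    pthread_mutex_unlock(&f->mutex);
}

static void await_progress(FakeFrame *f, int n)
{
    pthread_mutex_lock(&f->mutex);
    while (f->progress < n)
        pthread_cond_wait(&f->cond, &f->mutex);
    pthread_mutex_unlock(&f->mutex);
}

static void *producer(void *arg)
{
    FakeFrame *f = arg;
    for (int row = 1; row <= 100; row++)  /* "decode" 100 rows */
        report_progress(f, row);
    return 0;
}

static int run_progress_demo(void)
{
    FakeFrame f = { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
    pthread_t th;

    pthread_create(&th, 0, producer, &f);
    await_progress(&f, 100);       /* consumer blocks until row 100 is done */
    pthread_join(th, 0);
    return f.progress;
}
```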
>
>>>
>>> +
>>> +#endif /* AVCODEC_THREAD_H */
>>> diff --git a/libavcodec/utils.c b/libavcodec/utils.c
>>> index 3875140..c931484 100644
>>> --- a/libavcodec/utils.c
>>> +++ b/libavcodec/utils.c
>>> @@ -36,6 +36,7 @@
>>> #include "dsputil.h"
>>> #include "libavutil/opt.h"
>>> #include "imgconvert.h"
>>> +#include "thread.h"
>>> #include "audioconvert.h"
>>> #include "internal.h"
>>> #include <stdlib.h>
>>> @@ -256,6 +257,11 @@ int avcodec_default_get_buffer(AVCodecContext *s, AVFrame *pic){
>>> (*picture_number)++;
>>>
>>> if(buf->base[0] && (buf->width != w || buf->height != h || buf->pix_fmt != s->pix_fmt)){
>>> + if(s->active_thread_type&FF_THREAD_FRAME) {
>>> + av_log_missing_feature(s, "Width/height changing with frame threads is", 0);
>>> + return -1;
>>> + }
>>> +
>>> for(i=0; i<4; i++){
>>> av_freep(&buf->base[i]);
>>> buf->data[i]= NULL;
>>> @@ -517,18 +523,29 @@ int attribute_align_arg avcodec_open(AVCodecContext *avctx, AVCodec *codec)
>>> goto free_and_end;
>>> }
>>> avctx->frame_number = 0;
>>> +
>>> + if (HAVE_THREADS && !avctx->thread_opaque) {
>>> + ret = avcodec_thread_init(avctx, avctx->thread_count);
>>> + if (ret < 0) {
>>> + goto free_and_end;
>>> + }
>>> + }
>>> +
>>> if (avctx->codec->max_lowres < avctx->lowres) {
>>> av_log(avctx, AV_LOG_ERROR, "The maximum value for lowres supported by the decoder is %d\n",
>>> avctx->codec->max_lowres);
>>> goto free_and_end;
>>> }
>>>
>>> - if(avctx->codec->init){
>>> - ret = avctx->codec->init(avctx);
>>> - if (ret < 0) {
>>> - goto free_and_end;
>>> + if(avctx->codec->init && !(avctx->active_thread_type&FF_THREAD_FRAME)){
>>> + if(avctx->codec->init){
>>> + ret = avctx->codec->init(avctx);
>>> + if (ret < 0) {
>>> + goto free_and_end;
>>> + }
>>> }
>>
>>> }
>>> +
>>> ret=0;
>>
>> cosmetic
>
> Removed. (there was a funny merge glitch there too)
>
>>
>>
>> [...]
>>> @@ -737,7 +757,7 @@ av_cold int avcodec_close(AVCodecContext *avctx)
>>>
>>> if (HAVE_THREADS && avctx->thread_opaque)
>>> avcodec_thread_free(avctx);
>>> - if (avctx->codec && avctx->codec->close)
>>> + if (avctx->codec && avctx->codec->close && !(avctx->active_thread_type&FF_THREAD_FRAME))
>>> avctx->codec->close(avctx);
>>> avcodec_default_free_buffers(avctx);
>>> avctx->coded_frame = NULL;
>>
>> Moving init/close elsewhere is a bit ugly; is this unavoidable?
>
> close: avoidable, removed.
> init: not avoidable, I just rediscovered why it was and then forgot because it's 5 AM.
>
> If it is avoidable, the change would involve more than one line.
>
>
>>>
>>> @@ -745,6 +765,7 @@ av_cold int avcodec_close(AVCodecContext *avctx)
>>> if(avctx->codec && avctx->codec->encode)
>>> av_freep(&avctx->extradata);
>>> avctx->codec = NULL;
>>> + avctx->active_thread_type = 0;
>>> entangled_thread_counter--;
>>>
>>> /* Release any user-supplied mutex. */
>>> @@ -985,6 +1006,8 @@ void avcodec_init(void)
>>>
>>> void avcodec_flush_buffers(AVCodecContext *avctx)
>>> {
>>> + if(HAVE_PTHREADS && avctx->active_thread_type&FF_THREAD_FRAME)
>>> + ff_thread_flush(avctx);
>>> if(avctx->codec->flush)
>>> avctx->codec->flush(avctx);
>>> }
>>> @@ -1185,3 +1208,30 @@ unsigned int ff_toupper4(unsigned int x)
>>> + (toupper((x>>16)&0xFF)<<16)
>>> + (toupper((x>>24)&0xFF)<<24);
>>> }
>>> +
>> [...]
>>
>>> diff --git a/libavformat/utils.c b/libavformat/utils.c
>>> index 8fcbe96..0b5d89c 100644
>>> --- a/libavformat/utils.c
>>> +++ b/libavformat/utils.c
>>> @@ -913,6 +913,11 @@ static void compute_pkt_fields(AVFormatContext *s, AVStream *st,
>>> /* do we have a video B-frame ? */
>>> delay= st->codec->has_b_frames;
>>> presentation_delayed = 0;
>>> +
>>> + // ignore delay caused by frame threading
>>> + if (delay && st->codec->active_thread_type&FF_THREAD_FRAME)
>>> + delay -= st->codec->thread_count-1;
>>> +
>>> /* XXX: need has_b_frame, but cannot get it if the codec is
>>> not initialized */
>>> if (delay &&
>>
>> This looks wrong
>> Please elaborate on what you are trying to do here
>
> I removed the change and everything still appears to work.
> I think it may have been fixed by using pkt_dts?
>
> The problem before was that threading increased has_b_frames, which caused the "invalid dts/pts combination" warning to be printed for codecs like vp3 that have dts==pts all the time. But I don't see it now.
Turns out it only happened on some Ogg files: big_buck_bunny_720p_stereo.ogg still has the problem, whereas controlled_60.ogv didn't. I've added the change back with a slightly longer comment.
This is only used to avoid the dts/pts combination warning, not for anything else in that function, so the warning could be turned off instead. But I do think setting has_b_frames is the right thing for threading to do.
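The bookkeeping under discussion, as a standalone sketch (field names simplified; the real code lives in compute_pkt_fields()): frame threading inflates the reorder delay by thread_count-1, and the dts/pts sanity check has to subtract that share back out.

```c
#include <assert.h>

/* Reorder delay as seen by the dts/pts check, with the frame-threading
 * share removed.  has_b_frames already includes thread_count-1 extra
 * frames of delay when frame threading is active. */
static int reorder_delay(int has_b_frames, int thread_count, int frame_threaded)
{
    int delay = has_b_frames;
    if (delay && frame_threaded)
        delay -= thread_count - 1;  /* ignore delay caused by threading */
    return delay;
}
```

So a vp3-style stream (no real B-frame delay) decoded with 4 frame threads reports has_b_frames == 3 but an effective delay of 0, and the warning stays quiet.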
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Frame-based-multithreading-framework-using-pthreads.patch
Type: application/octet-stream
Size: 47606 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110121/2253412f/attachment.obj>