[FFmpeg-devel] [PATCH] Generic part of frame multithreading
Michael Niedermayer
michaelni
Tue Aug 26 17:26:49 CEST 2008
On Tue, Aug 26, 2008 at 03:23:06AM -0400, Alexander Strange wrote:
>
> On Aug 20, 2008, at 2:26 PM, Michael Niedermayer wrote:
>
>> On Tue, Aug 19, 2008 at 10:55:21PM -0400, Alexander Strange wrote:
[...]
>> * your decoder returns "no frames" while it actually still has frames
>> at EOF
>> -> If i understand this correctly its a bug in the affected decoder ...
>
> In this case, it returns whatever the codec's decode() would have returned
> 3 frames ago (assuming 3 threads). Since h264 returns "no picture" if
> passed a top field, it returns that. I think "no picture" isn't really the
> same thing as "no frames", and if the threading dropped "no picture"
> results from the codec it might confuse clients.
Frame threading can already confuse clients and clearly breaks the API/ABI
by introducing extra delay.
For example current API _requires_ DTS from the demuxer to match the
timestamps of each outputted frame from the decoder.
a quick grep shows no match for "dts" in your patch so it seems this hasnt
been considered or documented at all.
If decode_video would take AVPackets and AVFrame has a AVPacket field then
this may be trivial to solve ...
[...]
>>
>>
>> [...]
>>> Index: libavcodec/w32thread.c
>>> ===================================================================
>>> --- libavcodec/w32thread.c (revision 14856)
>>> +++ libavcodec/w32thread.c (working copy)
>>> @@ -104,7 +104,10 @@
>>> ThreadContext *c;
>>> uint32_t threadid;
>>>
>>> + if(s->thread_opaque) return 0;
>>> +
>>
>> why is this needed?
>
> avcodec_thread_init needs to see the codec capabilities field, so I call it
> from avcodec_open. This means it's called twice, so it needs to handle
sounds like a hack ....
> that. Maybe this could be removed when slice+frame threading is supported,
> but I've never understood why clients need to call avcodec_thread_init
> instead of just setting thread_count.
Feel free to change it but dont hack it please!
Especially not with something like
if(opaque) return 0
and no comment at all, why such hack would be needed and be unavoidable
>
>>> s->thread_count= thread_count;
>>> + s->thread_algorithm= FF_THREAD_MULTISLICE;
>>
>> i suspect this should rather be a if( != ) error, at least when the user
>> explicitly requested frame threading
>
> Done.
>
>> [...]
>>> @@ -847,7 +852,17 @@
>>> avctx->codec = codec;
>>> avctx->codec_id = codec->id;
>>> avctx->frame_number = 0;
>>> - if(avctx->codec->init){
>>> +
>>> + if (ENABLE_THREADS) {
>>> + ret = avcodec_thread_init(avctx, avctx->thread_count);
>>> + if (ret < 0) {
>>> + av_freep(&avctx->priv_data);
>>> + avctx->codec= NULL;
>>> + goto end;
>>> + }
>>> + }
>>> +
>>
>>> + if(avctx->codec->init && !USE_FRAME_THREADING(avctx)){
>>
>> iam against the use of weird macros, exceptions to this are only
>> when it improves readablity or speed. (it cannot improve speed in
>> functions that are irrelevant to speed)
>> And considering that this macro is not defined anywhere or a simple
>> grep is not enough to find it, as is it significnatly reduces
>> readablity as well.
>
> This expands to (ENABLE_THREADS && avctx->active_thread_algorithm ==
> FF_THREAD_MULTIFRAME). The first part isn't needed, but leads to a very
> small optimization since we can have any codec threading code under if (0)
> when threads aren't compiled in. This is only about 3% size reduction for
> h264.o, so it's not really important - the only real reason is because the
> real code is a bit long. I think the name of this macro is pretty clear,
> but I can remove it if you want.
iam somewhat against the macro you can write
if(ENABLE_THREADS && avctx->frame_threads)
its almost as terse and much cleaner
It would even allow to control the exact amount of frame and slice threading
though thats seperate of course and not critical ...
>
>> [...]
>>> @@ -980,6 +998,7 @@
>>>
>>> int avcodec_close(AVCodecContext *avctx)
>>> {
>>> + int threaded = USE_FRAME_THREADING(avctx);
>>> entangled_thread_counter++;
>>> if(entangled_thread_counter != 1){
>>> av_log(avctx, AV_LOG_ERROR, "insufficient thread locking around
>>> avcodec_open/close()\n");
>>> @@ -989,7 +1008,7 @@
>>>
>>> if (ENABLE_THREADS && avctx->thread_opaque)
>>> avcodec_thread_free(avctx);
>>> - if (avctx->codec->close)
>>> + if (avctx->codec->close && !threaded)
>>> avctx->codec->close(avctx);
>>> avcodec_default_free_buffers(avctx);
>>> av_freep(&avctx->priv_data);
>>
>> why?
>
> codec close has to be called inside frame thread close. If it's called
> before, another thread might be accessing it; if it's called after, it
> might call some threading support function after the threading context has
> been freed. This is avoidable, but I think the current way is simplest.
The current way is ugly, but when the alternaives are less simple then lets
leave it for now, maybe i can think of some solution later ...
>
>> and the the threaded var is redundant
>
> It was needed there (not obvious because of the missing header); I
> simplified it.
>
>>
>>> @@ -1251,7 +1270,9 @@
>>>
>>> void avcodec_flush_buffers(AVCodecContext *avctx)
>>> {
>>> - if(avctx->codec->flush)
>>> + if(USE_FRAME_THREADING(avctx))
>>> + ff_frame_thread_flush(avctx);
>>> + else if(avctx->codec->flush)
>>> avctx->codec->flush(avctx);
>>> }
>>
>> why not
>> if(frame_thraading)
>> ff_frame_thread_flush(avctx);
>> if(avctx->codec->flush)
>> avctx->codec->flush(avctx);
>> ?
>>
>> having ff_frame_thread_flush call avctx->codec->flush behind is not good
>> for
>> sake of readablity
>
> Same reason as above; doing it that way will work, but because of how
> release_buffer works with threading, the frames won't end up actually being
> released for a while, which is surprising. Now that I look at it, maybe I
> should comment that.
I really dislike this kind of special casing of the frame thread code
all over the place
Besides i do not understand why buffer releases are delayed
When each frame is finished it should be signalled to the user app with a
callback similar to release_buffer/get_buffer, the user app can then if it
needs to, place it in a que/mark it to prevent get_buffer from reusing it.
The default return_buffer() implementation could do something similar,
this wouldnt work with user apps that override get_buffer but not
return_buffer and enable frame threads, but then i really would say the
user app should be fixed instead of hacks being added to lavc.
Especially if these hacks cause performance loss ...
[...]
>>
>>> * - encoding: unused
>>> * - decoding: Set by user.
>>> * @param height the height of the slice
>>> @@ -951,7 +969,7 @@
>>> * Samples per packet, initialized when calling 'init'.
>>> */
>>> int frame_size;
>>> - int frame_number; ///< audio or video frame number
>>> + int frame_number; ///< Number of audio or video frames returned so
>>> far
>>> int real_pict_num; ///< Returns the real picture number of previous
>>> encoded frame.
>>>
>>> /**
>>
>>> @@ -1166,6 +1184,7 @@
>>> * If pic.reference is set then the frame will be read later by
>>> libavcodec.
>>> * avcodec_align_dimensions() should be used to find the required
>>> width and
>>> * height, as they normally need to be rounded up to the next
>>> multiple of 16.
>>> + * May be called by different threads, but not more than one at the
>>> same time.
>>> * - encoding: unused
>>> * - decoding: Set by libavcodec., user can override.
>>> */
>>> @@ -1175,6 +1194,7 @@
>>> * Called to release buffers which were allocated with get_buffer.
>>> * A released buffer can be reused in get_buffer().
>>> * pic.data[*] must be set to NULL.
>>> + * May be called by different threads, but not more than one at the
>>> same time.
>>> * - encoding: unused
>>> * - decoding: Set by libavcodec., user can override.
>>> */
>>
>> did i miss the similar change to reget_buffer() ?
>
> It seems like the purpose of reget_buffer() is to reuse the buffer with the
> data from the last frame, for codecs that draw on top of it? If that's the
> case, it's not compatible with frame threading.
hmm probably it wouldnt be trivial, lets forget about it ...
[...]
>>
>>> +static int submit_frame(PerThreadContext * volatile p, const uint8_t
>>> *buf, int buf_size)
>>> +{
>>> + FrameThreadContext *fctx = p->parent;
>>> + PerThreadContext *prev_thread = fctx->prev_thread;
>>> + AVCodec *codec = p->context->codec;
>>> + int err = 0;
>>> +
>>> + if (!buf_size && !(codec->capabilities & CODEC_CAP_DELAY)) return 0;
>>> +
>>> + pthread_mutex_lock(&p->mutex);
>>> + if (prev_thread) err = update_context_from_copy(p->context,
>>> prev_thread->context, 0);
>>> + if (err) return err;
>>> +
>>> + p->buf = av_fast_realloc(p->buf, &p->allocated_buf_size, buf_size);
>>> + memcpy(p->buf, buf, buf_size);
>>
>> copying every frame does not seem particularly reasonable to me,
>> considering
>> that it should be easy for most user apps to keep a few buffers around or
>> if they cant they can still copy ...
>>
>> This also reminds me that the lavc API should be changed to use AVPackets
>> as input to the decode functions. This would allow av_packet_dup() to be
>> used which would just copy packets when needed ...
>
> Well, with the current API I can't guarantee anything about buffers so I
> did it the safe way. CODEC_FLAG_INPUT_PRESERVED exists but doesn't cover
> quite enough, and I don't see many users for it (only vlc) in google code
> search. I can fix it if the API is better or if more clients can guarantee
> not freeing them.
well if this isnt fixed than it should at least be very clearly marked as
something that needs to be fixed.
[...]
> @@ -775,6 +779,8 @@
>
> s->palctrl = NULL;
> s->reget_buffer= avcodec_default_reget_buffer;
> +
> + s->active_thread_algorithm= FF_THREAD_DEFAULT;
> }
>
> AVCodecContext *avcodec_alloc_context2(enum CodecType codec_type){
This can be set through AVOptions default i think
[...]
> +int ff_get_buffer(AVCodecContext *avctx, AVFrame *f)
> +{
> + int ret, *progress;
> + PerThreadContext *p = avctx->thread_opaque;
> +
> + f->owner = avctx;
> + f->thread_opaque = progress = av_malloc(sizeof(int));
What is the sense behind this malloc?
also thread_opaque is clearly not the correct field semantically to keep
track of decoding progress of each frame/field
[...]
> +/// Set the threading algorithm used, or none if an algorithm was set but no thread count.
> +static void ff_validate_thread_parameters(AVCodecContext *avctx)
> +{
when its static then it doesnt need a ff_
[..]
> Restrictions on clients
> ==============================================
>
> Slice threading -
> * If the client uses draw_horiz_band, it must handle it being called from
> separate threads.
>
> Frame threading -
> * Restrictions with slice threading also apply.
> * get_buffer and release_buffer will be called by separate threads, but are
> protected by a mutex.
... "so the functions do not need to be reentrant."
> * There is one frame of delay added for every thread. Use of reordered_opaque
> will help with A/V sync problems, and clients should not assume that no frame
> being returned after all frames have been submitted means there are no frames
> left.
this one needs more discussion. As is it breaks API/ABI and actual applications
so it will have to be changed or you will have to provide some solid evidence
why the other way around would be worse or tricky.
>
> Restrictions on codecs
> ==============================================
>
> Slice threading -
> None.
>
> Frame threading -
> * Relying on previous contents of buffers no longer works. This includes using
> reget_buffer() and not copying skipped MBs.
elaborate on skipped MBs please.
I do not see how they would be affected by frame threading
> * Error resilience may not work as well, or deterministically.
It has to be fixed to work (yes this is a reason to reject the patch),
about not deterministically that can be discussed, i dont see immedeatly
why it would loose deterministicity but i woundnt mind if fixing it is hard
> * Accepting randomly truncated packets (CODEC_FLAG_TRUNCATED) no longer works.
thats ok
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I have often repented speaking, but never of holding my tongue.
-- Xenocrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080826/8c25f8dc/attachment.pgp>
More information about the ffmpeg-devel
mailing list