[FFmpeg-devel] [PATCH] avcodec/libmp3lame: properly handle unaligned frame data

Thu Apr 27 11:51:12 EEST 2017

On octidi 8 Floréal, year CCXXV, Michael Niedermayer wrote:
> I agree
> in fact i added such a flag in 2011 (4d34b6c1a1254850e39a36f08f4d2730092a54db)
> within the API of that time to avfilter

It was not a bad idea, but it should not be limited to filters. A few
comments.

* First, the framequeue framework does not produce unaligned data: as
far as the C standard is concerned, the data it handles stay aligned.
The alignment problems come from non-standard requirements of special
processor features used by some filters and codecs, but not all.

* That means a lot of the most useful codecs and filters will suffer
from it, but not all. For many tasks, the alignment is just fine, and
the extra copy would be wasteful.

* The alignment requirements increase. Before AVX, it was up to 16, now
it can be 32, and I have no doubt future processors will at some point
require 64 or 128. But realigning buffers used with SSE to 32 would be
wasteful too. Thus, what we need is not a flag but a full integer.
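
To illustrate why an integer carries more information than a flag, here
is a minimal sketch in plain C. The helper names ptr_is_aligned() and
ptr_alignment() are hypothetical and are not part of any FFmpeg API; the
sketch only assumes that alignments are powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function: returns 1 if ptr is a
 * multiple of align bytes. align must be a nonzero power of two. */
static int ptr_is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical helper: largest power-of-two alignment satisfied by ptr,
 * capped at max_align (itself a power of two). */
static size_t ptr_alignment(const void *ptr, size_t max_align)
{
    size_t a = 1;

    while (a < max_align && ptr_is_aligned(ptr, a * 2))
        a *= 2;
    return a;
}
```

A buffer whose address is 16 modulo 32 is good enough for a consumer that
asks for 16 and not for one that asks for 32; a single "aligned" flag
cannot express that difference.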

* The code that does the actual work of realigning a buffer should be
available as a stand-alone API, to be used by applications that work at
a low level. I suppose something like this would be in order:

	int av_frame_realign(AVFrame *frame, unsigned align);

Or maybe:

	int av_frame_realign(AVFrame *frame, unsigned align,
	                     AVBufferAllocator *alloc);

where AVBufferAllocator is there to allocate possibly hardware or
mmapped buffers.
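
As a rough idea of what such a function would do, here is a minimal
sketch in plain C. It does not use the real AVFrame: SketchFrame and
sketch_frame_realign() are hypothetical stand-ins with a single plane, no
linesize, no refcounting and no custom allocator, so it only shows the
"copy if needed, otherwise do nothing" behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a single-plane frame; NOT the real AVFrame. */
typedef struct SketchFrame {
    uint8_t *base;  /* allocation to hand back to free() */
    uint8_t *data;  /* start of the payload, possibly unaligned */
    size_t   size;  /* number of valid bytes at data */
} SketchFrame;

/* Hypothetical analogue of the proposed av_frame_realign(): if data is
 * not a multiple of align (a power of two), copy the payload into a
 * freshly allocated, suitably aligned buffer.
 * Returns 0 on success, -1 on allocation failure. */
static int sketch_frame_realign(SketchFrame *f, unsigned align)
{
    uint8_t *base, *aligned;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (((uintptr_t)f->data & (align - 1)) == 0)
        return 0;                   /* already aligned: no copy at all */

    /* Over-allocate, then round the pointer up to a multiple of align. */
    base = malloc(f->size + align - 1);
    if (!base)
        return -1;
    aligned = (uint8_t *)(((uintptr_t)base + align - 1) &
                          ~(uintptr_t)(align - 1));
    memcpy(aligned, f->data, f->size);

    free(f->base);
    f->base = base;
    f->data = aligned;
    return 0;
}
```

The early return is the important part: a frame that already satisfies
the requested alignment costs one test and no malloc() or memcpy().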

* It is another argument for my leitmotiv that filters and codecs are
actually the same and should be merged API-wise.

* It would be better to have the API just work for everything rather
than documenting the alignment needs.

As for the actual implementation, I see a lot of different approaches:

- have the framework realign the frame before submitting it to the
  filters and codecs: costly in malloc() and memcpy() but simple;

- have each filter or codec call av_frame_realign() as needed; it may
  seem less elegant than the previous proposal, but it may prove a
  better choice in the light of what follows;

- have each filter or codec copy the unaligned data into a buffer
  allocated once and for all or on the stack, possibly by small chunks:
  less costly in malloc() and refcounting overhead, and possibly better
  cache-locality, but more complex code;

- run the plain C version of the code on unaligned data rather than the
  vectorized version, or the less-vectorized version (SSE vs AVX) on
  insufficiently aligned data.
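
The third approach above, copying by small chunks through a buffer on the
stack, can be sketched as follows. Everything here is hypothetical:
sum_aligned32() merely stands in for a kernel that insists on 32-byte
alignment, and the chunk size of 256 bytes is an arbitrary assumption,
not a tuned value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 256  /* arbitrary chunk size chosen for the sketch */

/* Stand-in for a vectorized kernel that requires 32-byte-aligned input
 * (hypothetical; here it is just a plain loop that checks its input). */
static uint32_t sum_aligned32(const uint8_t *p, size_t n)
{
    uint32_t s = 0;

    assert(((uintptr_t)p & 31) == 0);
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Accepts any pointer: aligned input goes straight to the kernel,
 * unaligned input is fed through an aligned bounce buffer on the stack. */
static uint32_t sum_any(const uint8_t *src, size_t n)
{
    _Alignas(32) uint8_t bounce[CHUNK];
    uint32_t s = 0;

    if (((uintptr_t)src & 31) == 0)
        return sum_aligned32(src, n);

    while (n > 0) {
        size_t len = n < CHUNK ? n : CHUNK;

        memcpy(bounce, src, len);
        s += sum_aligned32(bounce, len);
        src += len;
        n   -= len;
    }
    return s;
}
```

There is no malloc() and no refcounting here, and the bounce buffer stays
hot in cache, at the price of a more convoluted call path.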

Since all this boils down to a matter of performance and is related to
the core task of FFmpeg, I think the choice between the various options
should be made on a case-by-case basis using real benchmarks.
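
For the last approach in the list, falling back to a less-vectorized or
plain C version, the selection logic could look like the sketch below.
The names pick_impl() and scale() are hypothetical, the 16 and 32 byte
thresholds are the SSE and AVX figures quoted earlier, and the SSE and
AVX kernels are left as placeholders that fall through to the C loop. A
real dispatcher would also have to check what the CPU supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum impl { IMPL_C, IMPL_SSE, IMPL_AVX };

/* Pick the widest implementation that both pointers are aligned for.
 * CPU capability checks are deliberately left out of this sketch. */
static enum impl pick_impl(const void *dst, const void *src)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

    if ((bits & 31) == 0)
        return IMPL_AVX;
    if ((bits & 15) == 0)
        return IMPL_SSE;
    return IMPL_C;
}

/* Plain C reference version: works whatever the alignment. */
static void scale_c(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

static void scale(float *dst, const float *src, size_t n, float k)
{
    assert(dst && src);
    switch (pick_impl(dst, src)) {
    case IMPL_AVX:  /* placeholder: an AVX kernel would be called here */
    case IMPL_SSE:  /* placeholder: an SSE kernel would be called here */
    case IMPL_C:
        scale_c(dst, src, n, k);
        break;
    }
}
```

This variant never copies anything, so benchmarking it against the
copying variants is exactly the kind of case-by-case comparison meant
above.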

Regards,

-- 
  Nicolas George