[MPlayer-dev-eng] Some question regarding H264 and Direct Rendering.

Thu Nov 6 08:44:32 CET 2014

On 06.11.2014, at 05:24, Kjetil Hvalstrand <kjetil.hvalstrand at gmail.com> wrote:
> Hi
> 
> Thanks for replying.
> 
>> If the speed difference is major that usually hints that something went
> 
> Hans de Ruiter has been investigating the H264 codec, and found that the
> x86 version has better support for CPU vector extensions than the PowerPC
> version has. Well, it might have something to do with Apple switching to
> Intel in 2006.
> 
> http://www.amigans.net/modules/xforum/viewtopic.php?topic_id=6723&forum=25

I don't see the connection with direct rendering being faster.

>> wrong/is badly implemented. It can at best save you one memcpy.
> 
> It’s not as simple as using memcpy to copy a buffer from A to B; the
> buffer has to be reformatted into a bitmap format that the video drivers
> support.
> 
> This means I have to convert the video slice from non-interleaved
> yuv420p into interleaved yuv420p.

If you need to convert, direct rendering in the codec is pointless. If _codec_ support for direct rendering makes _any_ speed difference _whatsoever_, that means you implemented something incorrectly in the VO.
The only point of direct rendering (on codec side) is for the codec to write directly into GPU or DMA memory.

> So the more the VO has to work, the less CPU time for VC.

I know that. The point is that in the situation you described, lack of codec-side direct rendering should not cause any additional overhead in the VO.
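
To illustrate why the conversion cost is the same either way, here is a minimal sketch in C of such a conversion. It assumes the target is an NV12-style layout (one Y plane followed by interleaved U/V samples); the actual format the AmigaOS driver needs is not stated in this thread, and the function name planar420_to_nv12 is hypothetical. The loop reads the decoder output and writes the destination bitmap once, regardless of whether the source planes were allocated by the codec or handed to it by the VO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert planar YUV 4:2:0 (separate Y, U and V planes)
 * into an NV12-style layout (Y plane plus one plane of interleaved U/V).
 * NV12 is an assumption standing in for whatever the driver expects.
 * Strides are in bytes; width and height must be even. */
static void planar420_to_nv12(const uint8_t *src_y, int src_y_stride,
                              const uint8_t *src_u, const uint8_t *src_v,
                              int src_c_stride,
                              uint8_t *dst_y, int dst_y_stride,
                              uint8_t *dst_uv, int dst_uv_stride,
                              int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma is already contiguous per line: one memcpy per row. */
    for (int y = 0; y < height; y++)
        memcpy(dst_y + (size_t)y * dst_y_stride,
               src_y + (size_t)y * src_y_stride, (size_t)width);

    /* Chroma: weave the U and V rows together as U0 V0 U1 V1 ... */
    for (int y = 0; y < height / 2; y++) {
        const uint8_t *u = src_u + (size_t)y * src_c_stride;
        const uint8_t *v = src_v + (size_t)y * src_c_stride;
        uint8_t *uv = dst_uv + (size_t)y * dst_uv_stride;
        for (int x = 0; x < width / 2; x++) {
            uv[2 * x]     = u[x];
            uv[2 * x + 1] = v[x];
        }
    }
}
```

If dst_y and dst_uv point into the driver's bitmap, this is the only pass over the picture that the VO needs; codec-side direct rendering into malloc'd memory would not remove it.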

>> “Reference frames are a problem for several reasons, though the one that
>> makes them kind of useless for direct rendering is that with direct
>> rendering they might be stored in uncached memory, completely breaking
>> performance.”
> 
> You mean swappable memory. AmigaOS4 native AllocVecTags() does allow
> the user to define what memory should be swappable or not; the
> standard clib malloc() function under AmigaOS4.1 always returns swappable
> memory I think, and I’m not sure it's aligned to prevent cache misses.

No, I mean non-cacheable memory. If you always use malloc'd buffers, direct rendering is pointless; in fact it will be slightly slower since we have to force edge emulation in the decoder.
I think (though without code to look at it is easy to be wrong) you misunderstood how direct rendering works and what purpose it serves (decoding directly into GPU memory basically) and you have a bug in your non-direct-rendering code that makes it slow.
Or maybe you confuse this with slice rendering (-noslices option to disable)? That can change speed significantly on older CPUs due to cache effects, but FFmpeg implements it only for older codecs (it is not really possible the same way for H.264 due to in-loop filtering).

>> “Plus, we need to make a copy when drawing the OSD. Also, newer codecs like H.264 are problematic since they do reordering”
> 
> “reordering” you mean that graphic is moved around,

No, I mean B-frames. I.e. frames that are decoded in order 0 1 2 3 4 5 must be shown as e.g. 5 2 3 4 1 or similar.
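
As a rough illustration of what reordering means for buffers, here is a small C sketch. It assumes a hypothetical I P B B B stream where each frame already carries its display position; the Frame struct and both function names are invented for this example and are not MPlayer or FFmpeg APIs. It shows that a decoded frame may have to be kept around while later-decoded frames are shown first, which is what makes handing the decoder a VO buffer awkward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame record: position in decode order and in display order. */
typedef struct {
    int decode_idx;
    int display_idx;
} Frame;

/* Fill out[] with decode indices listed in display order.
 * Assumes the display_idx values form a permutation of 0..n-1. */
static void display_order(const Frame *frames, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(frames[i].display_idx >= 0);
        assert((size_t)frames[i].display_idx < n);
        out[frames[i].display_idx] = frames[i].decode_idx;
    }
}

/* Simulate decoding one frame at a time and showing every frame as soon as
 * it is next in display order. Returns the largest number of decoded frames
 * that were held at once (at most 64 frames, to keep the sketch simple). */
static size_t max_buffered(const Frame *frames, size_t n)
{
    unsigned char ready[64] = {0};
    size_t next = 0, held = 0, max = 0;

    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        assert(frames[i].display_idx >= 0);
        assert((size_t)frames[i].display_idx < n);
        ready[frames[i].display_idx] = 1;
        held++;
        if (held > max)
            max = held;
        /* Show everything that is now displayable. */
        while (next < n && ready[next]) {
            next++;
            held--;
        }
    }
    return max;
}
```

For frames decoded as 0 1 2 3 4 with display positions 0 4 1 2 3, display_order yields 0 2 3 4 1, and the frame decoded second stays buffered until the three B-frames after it have been shown.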

>> “but it seems more reasonable to fix the system to support 64 bit off_t Yes I agree, I’m just having bit “
> 
> I'm having a problem convincing Steven Solie of Hyperion (the current
> owners of AmigaOS) that this should fix it,
> 
>> “(but if the system doesn't support 64 bit off_t seeking
>> in files > 4 GB will not work anyway, so which use cases does such a change fix?)”
> 
> Well, the OS has support for seek64() and off64_t and so on, but these
> are not used in mplayer; instead you're using the __USE_FILE_OFFSET64
> precompile switch. This is essentially the problem.

That is fairly idiotic, incompatibility for no good reason.
However in most places I would like off_t to be replaced by int64_t or such.
It just must be done carefully to not break things.
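
A minimal sketch in C of one way to be careful here: keep positions as int64_t internally and check that a value is representable before it is narrowed to the system's off_t. The helper names fits_in_signed_bits and fits_in_off_t are hypothetical, and the sketch assumes a POSIX-style <sys/types.h> that provides a signed off_t; it is not how MPlayer currently handles this.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: can pos be stored in a signed integer that is
 * `bits` wide? Valid for widths from 2 to 64 bits. */
static int fits_in_signed_bits(int64_t pos, int bits)
{
    assert(bits >= 2 && bits <= 64);
    if (bits == 64)
        return 1;
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    int64_t min = -max - 1;
    return pos >= min && pos <= max;
}

/* Hypothetical helper: can pos be passed to an off_t based call on this
 * system without truncation? Assumes off_t is a signed integer type. */
static int fits_in_off_t(int64_t pos)
{
    return fits_in_signed_bits(pos, (int)(sizeof(off_t) * CHAR_BIT));
}
```

With a 32-bit off_t, a 5 GiB position fails the check instead of silently wrapping, so the caller can report an error or take a 64-bit seek path where the system offers one.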