[MPlayer-dev-eng] Some question regarding H264 and Direct Rendering.

Kjetil Hvalstrand kjetil.hvalstrand at gmail.com
Fri Nov 7 00:09:59 CET 2014


>If you need to convert, direct rendering in the codec is pointless

No, I simply feed it y, u, v data where u and v are at an offset of
(PictureWidth/2), basically.

Bytes per row is double the pixel width of one u plane (or
v plane); the codec does not know the difference. The benefit is that
u and v are automatically stored as an interleaved U/V plane thanks to DR.
Yes, I know I may be using DR for something it was not intended for,
but it works.
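
A minimal sketch of that buffer layout, assuming even dimensions and no extra
row padding. The struct and function names are hypothetical stand-ins, not
MPlayer's API; they only show where the three plane pointers and strides end up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the plane pointers and strides handed to the codec. */
typedef struct {
    uint8_t *planes[3];  /* Y, U, V */
    int      stride[3];  /* bytes per row for each plane */
} dr_layout;

/* buf must hold at least w*h (luma) + w*(h/2) (shared chroma rows) bytes. */
void dr_layout_init(dr_layout *l, uint8_t *buf, int w, int h)
{
    assert(l != NULL && buf != NULL);
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);

    int cw = w / 2;                        /* width of one chroma plane */

    l->planes[0] = buf;
    l->stride[0] = w;

    l->planes[1] = buf + (size_t)w * h;    /* U starts each shared chroma row */
    l->planes[2] = l->planes[1] + cw;      /* V starts PictureWidth/2 bytes later */
    l->stride[1] = 2 * cw;                 /* double the width of one chroma plane */
    l->stride[2] = 2 * cw;
}
```

A decoder that writes row y of U at planes[1] + y * stride[1] and row y of V at
planes[2] + y * stride[2] then fills each chroma row as UUUVVV without knowing
that the two planes share a buffer.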

>The only point of direct rendering (on codec side) is for the codec to write directly into GPU or DMA memory.

Yes, it does write into memory that will be DMA-copied to the graphics card.

>and you have a bug in your non-direct-rendering code that makes it slow.

Well, it is designed to do non-interleaved to interleaved conversion, so
it just has to be slower.

>if _codec_ support for direct rendering makes _any_ speed difference _whatsoever_ that means you implemented something incorrectly in the vo.

Of course, if there is a way to tell MPlayer or the FFmpeg codec that my
VO only supports padded interleaved yuv420p as video output, and that
all draw_slice() calls should only give me interleaved yuv that is padded
for DMA, then there should only be 3 memcpy's, one for each plane.

And I would not need to convert from non-interleaved YUV420P to
interleaved YUV420P.

MPlayer gives me:

YYYYYY
YYYYYY
YYYYYY
YYYYYY


UUU
UUU

VVV
VVV

But I need:

YYYYYY
YYYYYY
YYYYYY
YYYYYY

UUUVVV
UUUVVV
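
The conversion between those two layouts, for the non-DR path, could look
roughly like the sketch below. The function name and parameters are
hypothetical, and DMA padding is represented only by dst_stride:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy separate U and V planes into shared rows: each destination row holds
 * cw bytes of U followed by cw bytes of V (UUUVVV). cw and ch are the chroma
 * width and height; all strides are in bytes.
 */
void pack_chroma_rows(uint8_t *dst, int dst_stride,
                      const uint8_t *src_u, int u_stride,
                      const uint8_t *src_v, int v_stride,
                      int cw, int ch)
{
    assert(cw > 0 && ch > 0);
    assert(dst_stride >= 2 * cw);   /* room for U and V side by side */

    for (int y = 0; y < ch; y++) {
        memcpy(dst,      src_u, (size_t)cw);  /* left half of the row: U */
        memcpy(dst + cw, src_v, (size_t)cw);  /* right half of the row: V */
        dst   += dst_stride;
        src_u += u_stride;
        src_v += v_stride;
    }
}
```

This costs two small memcpy calls per chroma row, which is the extra work the
DR setup avoids by letting the codec write straight into the shared rows.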

>No, I mean B-trees. I.e. frames that are decoded in order 0 1 2 3 4 5 must be shown as e.g. 5 2 3 4 1 or similar.

Well, right now the images are drawn when mplayer calls
control(VOCTRL_DRAW_IMAGE, mpi).

I then call draw_image(mpi), and if mpi->flags has
MP_IMGFLAG_DIRECT set, then mpi->priv gives me the image buffer
to draw.

On the next flip_page I actually draw that buffer.

Basically I'm doing it the same way as vo_xv.c does it (I think);
this seems to work correctly.
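
A self-contained sketch of that draw/flip sequence. The types and the flag
constant are hypothetical stand-ins for mp_image_t and MP_IMGFLAG_DIRECT, so
this models the control flow only and is not the actual VO code:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_IMGFLAG_DIRECT 0x01u  /* stand-in for MP_IMGFLAG_DIRECT */

/* Stand-in for the two mp_image_t fields used here. */
typedef struct {
    unsigned flags;
    void    *priv;      /* the VO's own buffer when direct rendering was used */
} sketch_image;

typedef struct {
    void *pending;      /* buffer queued by draw_image, shown on the next flip */
    void *on_screen;    /* buffer currently displayed */
} sketch_vo;

/* Returns 1 if the image was queued, 0 if the caller must copy/convert instead. */
int sketch_draw_image(sketch_vo *vo, const sketch_image *img)
{
    assert(vo != NULL && img != NULL);
    if (!(img->flags & SKETCH_IMGFLAG_DIRECT))
        return 0;
    vo->pending = img->priv;    /* remember the buffer, do not display it yet */
    return 1;
}

/* Display whatever draw_image queued most recently. */
void sketch_flip_page(sketch_vo *vo)
{
    assert(vo != NULL);
    if (vo->pending != NULL) {
        vo->on_screen = vo->pending;
        vo->pending = NULL;
    }
}
```

Because the buffer is only shown at flip time, frames appear in the order in
which draw_image is called, whatever order the decoder filled them in.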

>That is fairly idiotic, incompatibility for no good reason.

Yes, I agree, it does not make sense; just pure laziness from the AmigaOS4
developers
(or maybe they don't want to promise anything).

>However in most places I would like off_t to be replaced by int64_t or such.
>It just must be done carefully to not break things.

uint64_t or off64_t, it makes no difference to me, but off_t seems
like something historic.

I understand that using __USE_FILE_OFFSET64 is more flexible for
supporting operating systems that do not support 64-bit file offsets,
though I'm not sure which modern OS / CPU does not.
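
For illustration only, a sketch of keeping file positions in int64_t while
still going through off_t. It assumes a POSIX-like libc where defining
_FILE_OFFSET_BITS to 64 makes off_t 64 bits wide (in glibc,
__USE_FILE_OFFSET64 is the internal macro derived from it). The helper name is
hypothetical, and on a system that only offers off64_t the compile-time check
below would fail:

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Fail at compile time if the libc did not provide a 64-bit off_t. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits wide");

/* Hypothetical helper: size of a file as int64_t, or -1 on error. */
int64_t file_size64(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);  /* 64-bit seek through plain off_t */
    close(fd);
    return (int64_t)end;                 /* callers only ever see int64_t */
}
```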

Best Regards
Kjetil.

2014-11-06 8:44 GMT+01:00 Reimar Döffinger <Reimar.Doeffinger at gmx.de>:
> On 06.11.2014, at 05:24, Kjetil Hvalstrand <kjetil.hvalstrand at gmail.com> wrote:
>> Hi
>>
>> Thanks for replying.
>>
>>> If the speed difference is major that usually hints that something went
>>
>> Hans de Ruiter has been investigating the H264 codec, and found that the x86
>> version has better support for CPU vector extensions than the PowerPC
>> version has. Well, it might have something to do with Apple switching to Intel
>> in 2004.
>>
>> http://www.amigans.net/modules/xforum/viewtopic.php?topic_id=6723&forum=25
>
> I don't see the connection with direct rendering being faster.
>
>>> wrong/is badly implemented. It can at best save you one memcpy.
>>
>> It’s not as simple as using memcpy to copy a buffer from A to B; the
>> buffer has to be reformatted for the Bitmap format that the video drivers
>> support.
>>
>> This means I have to convert the video slice from non-interleaved
>> yuv420p into interleaved yuv420p.
>
> If you need to convert, direct rendering in the codec is pointless, if _codec_ support for direct rendering makes _any_ speed difference _whatsoever_ that means you implemented something incorrectly in the vo.
> The only point of direct rendering (on codec side) is for the codec to write directly into GPU or DMA memory.
>
>> So the more the VO has to work, the less CPU time there is for the VC.
>
> I know that, the point is that in the situation you described, lack of codec-side direct rendering should not cause any additional overhead in the VO.
>
>>> “Reference frames are a problem for several reasons, though the one that
>>> makes them kind of useless for direct rendering is that with direct
>>> rendering they might be stored in uncached memory, completely breaking
>>> performance.”
>>
>> You mean swappable memory. AmigaOS4's native AllocVecTags() does allow
>> the user to define which memory should be swappable or not; the
>> standard clib malloc() function under AmigaOS4.1 always returns swappable
>> memory, I think, and I’m not sure it's aligned to prevent cache misses.
>
> No, I mean non-cacheable memory. If you always use malloc'd buffers, direct rendering is pointless, in fact it will be slightly slower since we have to force edge emulation in the decoder.
> I think (though without code to look at it is easy to be wrong) you misunderstood how direct rendering works and what purpose it serves (decoding directly into GPU memory basically) and you have a bug in your non-direct-rendering code that makes it slow.
> Or maybe you confuse this with slice rendering (-noslices option to disable)? That can change speed significantly on older CPUs due to cache effects, but FFmpeg implements it only for older codecs (it is not really possible the same way for H.264 due to in-loop filtering).
>
>>> “Plus, we need to make a copy when drawing the OSD. Also, newer codecs like H.264 are problematic since they do reordering”
>>
>> “reordering” you mean that graphic is moved around,
>
> No, I mean B-trees. I.e. frames that are decoded in order 0 1 2 3 4 5 must be shown as e.g. 5 2 3 4 1 or similar.
>
>>> “but it seems more reasonable to fix the system to support 64 bit off_t”
>>
>> Yes, I agree. I'm having a problem convincing Steven Solie of Hyperion (the current
>> owners of AmigaOS) that this should fix it,
>>
>>> “(but if the system doesn't support 64 bit off_t seeking
>>> in files > 4 GB will not work anyway, so which use cases does such a change fix?)”
>>
>> Well, the OS has support for seek64() and off64_t and so on, but these
>> are not used in mplayer; instead you're using the __USE_FILE_OFFSET64
>> precompile switch, and this is essentially the problem.
>
> That is fairly idiotic, incompatibility for no good reason.
> However in most places I would like off_t to be replaced by int64_t or such.
> It just must be done carefully to not break things.
> _______________________________________________
> MPlayer-dev-eng mailing list
> MPlayer-dev-eng at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/mplayer-dev-eng

