[FFmpeg-devel] [PATCH v2] fbdetile cpu based framebuffer layout detiling v02

C Hanish Menon hanishkvc at gmail.com
Mon Jun 29 01:17:08 EEST 2020


Hi Mark,

A small additional clarification to my last email, in which I responded
to your queries and thoughts.

The additional, flexible generic logic that I am currently experimenting
with can detile the more complex Tile-Yf layout with around 50% overhead
compared to the targeted Tile-X or Tile-Y implementation, while it handles
Tile-X with only about 3% additional overhead compared to the targeted
Tile-X implementation. So in that sense, the generic logic I am currently
experimenting with seems to do reasonably well. The idea is to use the
targeted logic for Tile-X and Tile-Y, and the flexible / configurable
generic detile logic for the more intricate tiled layouts.
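
For reference, below is a minimal sketch of what a targeted Tile-X detile
loop can look like. It assumes the legacy Intel Tile-X geometry (4 KiB
tiles, each 512 bytes wide and 8 rows tall, with tiles laid out row-major
across the surface), a tiled pitch that is a multiple of 512 bytes, and no
bit-6 address swizzling. detile_tilex() is a hypothetical name; this is not
the code from vf_fbdetile.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed legacy Intel Tile-X geometry: each 4 KiB tile is 512 bytes wide
 * and 8 rows tall, stored row-major within the tile. */
#define TILEX_W_BYTES 512
#define TILEX_H_ROWS  8
#define TILEX_BYTES   (TILEX_W_BYTES * TILEX_H_ROWS)

/* Hypothetical helper: copy a Tile-X surface into a linear buffer.
 * src_pitch:    bytes per row of the tiled surface (multiple of 512)
 * dst_linesize: bytes per row of the linear output
 * width_bytes:  visible bytes per row (width * bytes per pixel)
 * height:       visible rows */
static void detile_tilex(const uint8_t *src, size_t src_pitch,
                         uint8_t *dst, size_t dst_linesize,
                         size_t width_bytes, size_t height)
{
    size_t tiles_per_row = src_pitch / TILEX_W_BYTES;

    for (size_t y = 0; y < height; y++) {
        size_t tile_row    = y / TILEX_H_ROWS; /* band of tiles */
        size_t row_in_tile = y % TILEX_H_ROWS; /* row inside each tile */
        for (size_t x = 0; x < width_bytes; x += TILEX_W_BYTES) {
            size_t tile_col = x / TILEX_W_BYTES;
            size_t n = width_bytes - x;
            if (n > TILEX_W_BYTES)
                n = TILEX_W_BYTES; /* last tile may be partly visible */
            /* Each 512-byte tile row is contiguous: copy it in one go. */
            const uint8_t *s = src
                + (tile_row * tiles_per_row + tile_col) * TILEX_BYTES
                + row_in_tile * TILEX_W_BYTES;
            memcpy(dst + y * dst_linesize + x, s, n);
        }
    }
}
```

Because each 512-byte row segment of a Tile-X tile is contiguous, the inner
step reduces to one memcpy per tile row, which is part of why a targeted
path can stay cheap compared with a generic per-byte address mapping.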


On Mon, Jun 29, 2020 at 3:10 AM C Hanish Menon <hanishkvc at gmail.com> wrote:

> Hi Mark,
>
> **** hwdownload vs separate filter
>
> True, for the kmsgrab use case one could potentially do this transform as
> part of the drm_transfer_data logic (which currently mmaps and does a
> linear copy, if I remember correctly). But as I mentioned in my previous
> email, since this is done on the CPU side, capturing very large
> framebuffers (say 4K or 8K at high fps) could impact performance to some
> extent. In such a situation, decoupling the capture from the detiling
> allows one to capture the screen at a very high resolution without
> worrying about detiling, and then handle the detiling offline, as a
> separate pass.
>
> NOTE1: As a side note, I don't think the existing logic currently fetches
> the format modifier of the actual framebuffer. I think it is set to NONE
> by default and stays that way unless the user passes the format_modifier
> argument. I could be wrong here, though, as I have only gone through the
> code flow quickly once, and this is somewhat unfamiliar territory for me.
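
To illustrate NOTE1, here is a minimal sketch of how a detile path could
pick its layout from the DRM format modifier when a real one is available,
and fall back to a user-selected default when the modifier is unknown. The
values mirror the definitions in the kernel's drm_fourcc.h
(DRM_FORMAT_MOD_LINEAR, DRM_FORMAT_MOD_INVALID, I915_FORMAT_MOD_X_TILED,
I915_FORMAT_MOD_Y_TILED, I915_FORMAT_MOD_Yf_TILED) and are redefined locally
so the example compiles without libdrm headers. The enum and
choose_layout() are hypothetical, not existing FFmpeg API.

```c
#include <stdint.h>

/* Local copies of modifier values from the kernel's drm_fourcc.h, where
 * fourcc_mod_code(vendor, val) = (vendor << 56) | val and Intel = 0x01. */
#define MOD_LINEAR         UINT64_C(0x0000000000000000) /* DRM_FORMAT_MOD_LINEAR */
#define MOD_INVALID        UINT64_C(0x00ffffffffffffff) /* DRM_FORMAT_MOD_INVALID */
#define MOD_INTEL_X_TILED  UINT64_C(0x0100000000000001) /* I915_FORMAT_MOD_X_TILED */
#define MOD_INTEL_Y_TILED  UINT64_C(0x0100000000000002) /* I915_FORMAT_MOD_Y_TILED */
#define MOD_INTEL_YF_TILED UINT64_C(0x0100000000000003) /* I915_FORMAT_MOD_Yf_TILED */

/* Hypothetical layout selector for a detile filter. */
enum tile_layout {
    LAYOUT_LINEAR,
    LAYOUT_TILEX,
    LAYOUT_TILEY,
    LAYOUT_TILEYF,
    LAYOUT_UNSUPPORTED,
};

static enum tile_layout choose_layout(uint64_t modifier,
                                      enum tile_layout user_default)
{
    switch (modifier) {
    case MOD_INVALID:        return user_default; /* modifier not known */
    case MOD_LINEAR:         return LAYOUT_LINEAR;
    case MOD_INTEL_X_TILED:  return LAYOUT_TILEX;
    case MOD_INTEL_Y_TILED:  return LAYOUT_TILEY;
    case MOD_INTEL_YF_TILED: return LAYOUT_TILEYF;
    default:                 return LAYOUT_UNSUPPORTED; /* other vendors etc. */
    }
}
```

If the modifier really is left at a default rather than queried from the
framebuffer, a caller would have to treat that default as "unknown" and
rely on the filter's own layout option instead.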
>
> **** Tile layouts
>
> As the filter mainly supports Intel tile layouts for now, and as older
> Intel GPUs didn't support the Tile-Y format for scan-out, I think most
> setups currently set the framebuffer layout to Tile-X for display. So in
> that sense, the filter's default of Tile-X should be fine for most cases.
> However, if required, one can change the tile conversion to Tile-Y by
> passing an argument to the filter. Also, as I wasn't very sure that the
> format modifier is picked up by default, I used the most likely case as
> the default and in turn provided an option to change the layout
> conversion if required.
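
For comparison with Tile-X, here is a sketch of the legacy Intel Tile-Y
byte mapping used by the Tile-Y option. It assumes the commonly documented
geometry: 4 KiB tiles of 128 bytes x 32 rows, where each tile consists of
eight 16-byte-wide columns of 32 rows and each column is stored
contiguously (512 bytes), with no bit-6 swizzling. tiley_offset() is a
hypothetical helper that maps a linear byte position to its offset in the
tiled buffer; a detile loop would then copy 16-byte runs from these offsets.

```c
#include <stddef.h>

/* Assumed legacy Intel Tile-Y geometry (no bit-6 swizzling). */
#define TILEY_W_BYTES 128 /* tile width in bytes */
#define TILEY_H_ROWS  32  /* tile height in rows */
#define TILEY_COL_W   16  /* 8 columns of 16 bytes x 32 rows per tile */
#define TILEY_BYTES   (TILEY_W_BYTES * TILEY_H_ROWS)

/* Hypothetical: map linear (x_bytes, y) to a byte offset in a Tile-Y
 * surface whose pitch is pitch_bytes (assumed a multiple of 128). */
static size_t tiley_offset(size_t x_bytes, size_t y, size_t pitch_bytes)
{
    size_t tiles_per_row = pitch_bytes / TILEY_W_BYTES;
    size_t tile  = (y / TILEY_H_ROWS) * tiles_per_row + x_bytes / TILEY_W_BYTES;
    size_t xin   = x_bytes % TILEY_W_BYTES; /* byte within the tile row */
    size_t yin   = y % TILEY_H_ROWS;        /* row within the tile */
    size_t col   = xin / TILEY_COL_W;       /* which 16-byte column */
    size_t inner = xin % TILEY_COL_W;       /* byte within that column row */

    /* Columns are stored one after another; inside a column the 16-byte
     * rows are stored top to bottom. */
    return tile * TILEY_BYTES
         + col * (TILEY_COL_W * TILEY_H_ROWS)
         + yin * TILEY_COL_W
         + inner;
}
```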
>
> NOTE2: Tile-X being the default is my understanding, based on a quick
> glance through the Intel GPU documents and possibly some things I have
> seen online.
>
> NOTE3: I am not very familiar with this domain in general, nor do I track
> it closely. Rather, as I had an issue with some capturing I wanted to do,
> I went through the ffmpeg kmsgrab + hwupload/hwdownload and hwcontext
> code paths a bit, and quickly through some documents and headers. Then,
> based on a rough logical understanding, I wanted to implement a quick and
> flexible solution to solve my problem, as well as potentially help others
> who might have a similar issue. That is how this filter came about.
>
> I am also planning to add an additional generic detile logic later, where
> the user can configure the tile format as a list of direction changes and
> a few other constraints, so that the same logic can handle TileX, TileY,
> TileYs, TileYf, and so on. This will be slower (based on some initial
> tests, the generic logic seems to be around 50% slower than the specific,
> targeted conversion logic I have currently implemented), but it should
> allow one to try to detile any (or, more correctly, many) kinds of tile
> layouts, as the case may be. Again, the idea is to use this generic path
> as an offline / second pass.
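
One possible shape for such a configurable description, sketched under my
own assumptions (it is not necessarily the "list of direction changes"
format described above): describe a tile as a run of bytes stored
contiguously within one row, followed by levels that each arrange the
previous level's block as a cols x rows grid in row-major order. With that,
Tile-X is a 512-byte run stacked 8 rows high, and legacy Tile-Y is a
16-byte run stacked 32 rows high (one column), with 8 such columns side by
side. struct tile_desc and the functions below are hypothetical names.

```c
#include <stddef.h>

#define MAX_LEVELS 4

/* Hypothetical generic tile description.
 * run_bytes: bytes stored contiguously within one row (innermost unit).
 * levels[i]: the previous level's block repeated cols x rows, row-major. */
struct tile_desc {
    size_t run_bytes;
    int    nlevels;
    struct { size_t cols, rows; } levels[MAX_LEVELS];
};

/* Tile width in bytes and height in rows implied by the description. */
static void tile_dims(const struct tile_desc *d, size_t *w, size_t *h)
{
    *w = d->run_bytes;
    *h = 1;
    for (int i = 0; i < d->nlevels; i++) {
        *w *= d->levels[i].cols;
        *h *= d->levels[i].rows;
    }
}

/* Offset of byte (xin, yin) inside one tile, walking from the outermost
 * level down to the innermost run. */
static size_t generic_offset_in_tile(const struct tile_desc *d,
                                     size_t xin, size_t yin)
{
    size_t bw = 0, bh = 0, off = 0;
    tile_dims(d, &bw, &bh);
    for (int i = d->nlevels - 1; i >= 0; i--) {
        size_t sw = bw / d->levels[i].cols; /* sub-block width (bytes) */
        size_t sh = bh / d->levels[i].rows; /* sub-block height (rows) */
        size_t sub = (yin / sh) * d->levels[i].cols + xin / sw;
        off += sub * sw * sh; /* skip the sub-blocks stored before it */
        xin %= sw;
        yin %= sh;
        bw = sw;
        bh = sh;
    }
    return off + xin; /* only the byte within the contiguous run remains */
}

/* Offset of linear (x_bytes, y) in a surface of row-major tiles. */
static size_t generic_offset(const struct tile_desc *d, size_t x_bytes,
                             size_t y, size_t pitch_bytes)
{
    size_t tw = 0, th = 0;
    tile_dims(d, &tw, &th);
    size_t tile = (y / th) * (pitch_bytes / tw) + x_bytes / tw;
    return tile * tw * th + generic_offset_in_tile(d, x_bytes % tw, y % th);
}
```

For example, `{ 512, 1, { { 1, 8 } } }` describes Tile-X and
`{ 16, 2, { { 1, 32 }, { 8, 1 } } }` describes legacy Tile-Y. The extra
per-level index arithmetic is the kind of cost a generic path pays compared
with the targeted loops.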
>
>
> On Mon, Jun 29, 2020 at 2:28 AM Mark Thompson <sw at jkqxz.net> wrote:
>
>> On 27/06/2020 20:57, hanishkvc wrote:
>> > v02-20200627IST2331
>> >
>> > Unrolled Intel Legacy Tile-Y detiling logic.
>> >
>> > Also a single consolidated patch file, instead of the multiple patch
>> > files from the previous development flow.
>> >
>> > v01-20200627IST1308
>> >
>> > Implemented Intel Legacy Tile-X and Tile-Y detiling logic
>> >
>> > NOTES:
>> >
>> > This video filter detiles tiled framebuffers into a linear layout,
>> > using logic running on the CPU.
>> >
>> > Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
>> > This should help one work with frames captured (say, using kmsgrab)
>> > on laptops with an Intel GPU.
>> >
>> > The Tile-X conversion logic has been explicitly cross-checked with
>> > Tile-X based frames. However, the Tile-Y conversion logic hasn't been
>> > tested with Tile-Y based frames, but it should potentially do the job,
>> > based on my current understanding of the Tile-Y layout format.
>> >
>> > TODO1: Later, generate Tile-Y based frames and then explicitly
>> > cross-check the corresponding logic.
>> >
>> > TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
>> > layout conversion. However, some online discussions from some time
>> > back seem to indicate that this path is currently not fully bug-free.
>> > ---
>> >   Changelog                 |   1 +
>> >   doc/filters.texi          |  62 ++++++++
>> >   libavfilter/Makefile      |   1 +
>> >   libavfilter/allfilters.c  |   1 +
>> >   libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
>> >   5 files changed, 374 insertions(+)
>> >   create mode 100644 libavfilter/vf_fbdetile.c
>>
>> For your kmsgrab use-case I think you are doing this in the wrong place.
>> There is already a copy during the download step (the hwdownload filter
>> before this), and that does know what the tiling mode is, such that it
>> could detile transparently without a need for an extra filter doing
>> another copy.  See drm_transfer_data_from() in
>> libavutil/hwcontext_drm.c, which currently just does the linear copy
>> you observe regardless of the format modifier on the input buffer.
>>
>> Unrelated to the previous point, does the dependence of the actual layout
>> of the X and Y tiled formats on the exact model of GPU in use cause any
>> problems here?  If the layout is actually the same on
>> everything people might use nowadays then it's probably fine; if that
>> isn't true then maybe it needs some extra check.
>>
>> - Mark
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>
>
>
> --
> Keep ;-)
> HanishKVC
>


-- 
Keep ;-)
HanishKVC

