[FFmpeg-devel] [PATCH 0/4] avdevice/dshow: implement capabilities API

Thu Jun 10 16:29:57 EEST 2021

Let me respond on two levels.

Before exploring the design space of a separation of libavdevice and
libavformat below, I think it is important to first comment on the
current state (and whether the AVDevice Capabilities part of my patch
series should be blocked by this discussion).

Importantly, I would suppose that any reorganization of libavdevice
and libavformat and redesign of the libavdevice API must aim to offer
at least the same functionality as the current API, that is, an
avdevice should be able to be queried for what devices it offers
(get_device_list), should for each device provide information about
what formats it accepts/can provide
(create_device_capabilities/free_device_capabilities) and should be
able to be controlled through the API (control_message). Perhaps these
take different forms, but the same functionality should be offered. As
such, having the AVDevice Capabilities API implemented for one of the
devices should help, not hamper, redesign efforts because it shows how
this API would actually be used in practice. Fundamental changes such
as a new avdevice API will be backwards incompatible no matter what,
so having one more bit of important functionality
(create_device_capabilities/free_device_capabilities) implemented
doesn't raise the threshold for initiating such a redesign effort.
Instead, it forces all of the current API functionality to be thought
through during the redesign, so that nothing is forgotten. I thus
argue that it's a good thing to bring back the AVDevice Capabilities
API, since it helps, not hinders, the redesign effort. And let's not
forget that it offers users of the current API (me at least)
functionality they need now, not at some indeterminate point in the
future.

On Wed, Jun 9, 2021 at 10:33 PM Anton Khirnov <anton at khirnov.net> wrote:
> Look through the threads
> [...]

Thanks for the pointers!

> The problem is that libavdevice is a separate library from libavformat,
> but fundamentally depends on accessing libavformat internals.

Ah ok, so this is in the first instance about cleanup/separation, not
necessarily about adding new functionality (I do see Mark's list of
opportunities that a new API offers, copied below). I see Nicolas argue
that this entanglement of internals is not a problem in practice, and I
suppose there is a certain amount of taste involved here. Nothing
wrong with that. For me personally, it is a little funky to have to
add/change things in AVFormat when changing the AVDevice API, and it
may be good in the longer term to look at disentangling them. I will
get back to that below, in response to some quotes of Mark's messages
from last January.

Mark's (non-exhaustive) list of opportunities a libavdevice API
redesign offers (numbered by me):
On 20/01/2021 12:41, Mark Thompson wrote:
> 1. Handle frames as well as packets.
>    1a. Including hardware frames - DRM objects from KMS/V4L2, D3D
>        surfaces from Windows desktop duplication (which doesn't
>        currently exist but should).
> 2. Clear core option set - currently almost everything is set by
>    inconsistent private options; things like pixel/sample format,
>    frame/sample rate, geometry and hardware device should be common
>    options to all.
> 3. Asynchronicity - a big annoyance in current recording scenarios
>    with the ffmpeg utility is that both audio and video capture
>    block, and do so on the same thread which results in skipped
>    frames.
> 4. Capability probing - the existing method of options which log
>    the capabilities are not very useful for API users.

1 and 3 I cannot speak to, but 4 is indeed what I ran into: the
current state of most avdevices is not useful at all for an API user
like me when it comes to capability probing. That is not a reason to
get rid of the whole API, though, but rather to wonder why it wasn't
implemented. While nobody apparently bothered to do it before me, I
think more people than just me will actually use it. Currently I'd
have to issue device-specific options on a not-yet-opened device,
listen to the log output, parse it, etc. But the current API already
solves this, if only it were implemented. A clear core option set would
be nice indeed. And the AVDevice Capabilities API actually offers a
start at that, since it lists a bunch of options that should be
relevant to query (and set) for each device in the form of
ff_device_capabilities (in my patchset), or av_device_capabilities
before Andreas' patch removing it in January. I don't think it's
complete, but it's a good starting point.
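
To illustrate why structured capability data is so much nicer for an
API user than parsing log output, here is a minimal, self-contained
sketch. Everything in it is hypothetical: DemoVideoCaps,
demo_caps_supports and demo_caps_find are names I made up for this
mail, and the fields are not the ones of the actual capabilities
struct. The point is only that once a device reports ranges as data,
checking a requested mode is a few comparisons.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one range of capture modes, as a device might report it.
 * This is NOT the real libavdevice capabilities struct. */
typedef struct DemoVideoCaps {
    int    min_width,  max_width;
    int    min_height, max_height;
    double min_fps,    max_fps;
} DemoVideoCaps;

/* Returns 1 if the requested mode lies within the reported ranges, else 0. */
int demo_caps_supports(const DemoVideoCaps *c, int w, int h, double fps)
{
    assert(c != NULL);
    return w   >= c->min_width  && w   <= c->max_width  &&
           h   >= c->min_height && h   <= c->max_height &&
           fps >= c->min_fps    && fps <= c->max_fps;
}

/* Returns the index of the first entry supporting the request, or -1. */
int demo_caps_find(const DemoVideoCaps *list, size_t n,
                   int w, int h, double fps)
{
    for (size_t i = 0; i < n; i++)
        if (demo_caps_supports(&list[i], w, h, fps))
            return (int)i;
    return -1;
}
```

A UI could call something like demo_caps_find() for each mode it
wants to offer and grey out the ones that return -1, without ever
touching the log.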

Mark Thompson (2021-01-25):
> * Many of those are using it via the ffmpeg utility, but not all.

Indeed, I am an (aspiring) API user, of the dshow device specifically,
and possibly v4l2 later (but my project is Windows-only right now).
I am currently hampered by some APIs not being implemented for dshow,
hence my patch set.

> * The libavdevice API is the libavformat API because it was originally
> split out from libavformat, and it has the nice property that devices
> and files end up being interchangable in some contexts.

I can't underline enough how nice this is. My situation is simple:
devices such as webcams (but plenty of others too) may deliver video
in various formats, including encoded ones. I would have to decode
those to use them, so the output provided by the devices has to go
through much the same pipeline as data from video files. I already had
code for reading in video files, so the changes to also support
webcams were absolutely minimal. However, I needed some APIs
implemented to really round things off and make things both convenient
(already the case) and flexible (my patch set).

> * The libavdevice API, being the libavformat API for files, is not
> particularly well-suited in other contexts, because devices may not
> have the same properties as files.

Yeah, not every field in the AVFormatxxx structs is relevant for an
AVDevice. And some are a bit funkily named (like url to stuff the
device name of my webcam into). But are there specific fields one
would wish to provide for an avdevice that are currently not
available?

> * Some odd things like the completely-unused capabilities API and the
> almost-never-used message API are hacked on top of that to try to
> avoid some libavformat issues, but are not actually useful to anyone
> (hence the lack of use).

They certainly are useful! As are the avdevices themselves. I
was surprised that these APIs are not/hardly implemented. My patch set
makes my webcam much more useful to me: I am now able to pause and
restart capture (so no buffer fills up while I am not interested in
the output!), and to discover what devices the user has attached and
what formats these expose, so I can make a proper UI (like e.g. OBS
Studio has). And making this UI is minimal effort, as I do not first
have to learn how to work with DirectShow, or add yet another
dependency to my application (again, ffmpeg would be needed anyway, as
I'd need to decode incoming video). It makes ffmpeg a tool that allows
you to move fast, something you can really build upon, without losing
out on device-specific config/access.

> * To implement devices as AVInputFormat/AVOutputFormat instances,
> libavdevice currently needs access to the internals of libavformat.
> * Many developers want to get rid of that dependency on libavformat
> internals, because it creates a corresponding ugliness on the
> libavformat side which has to leave those parts exposed in an
> ABI-constrained way.

What specific internals does libavdevice depend on? Is it only the
various function pointers in AVInputFormat and AVOutputFormat which
are specific to devices, not all formats? Or is there more? I also
understand that avdevices need to implement some of the other function
pointers to be functional (e.g. read_header, read_packet and
read_close), but that seems unavoidable if we'd want avdevices to be
usable where avformats are (and again: that's a huge plus in my view).
I also understand that the AVDevice API being exposed in libavformat
makes it harder to evolve the AVDevice API.

Let me make an observation though: if we do not want to lose the
possibility of using avdevices as drop-in replacements for AVFormats,
some kind of component that has access to the internals of both seems
unavoidable. To me, the logical way to keep AVDevices interchangeable
with AVFormats while separating out the AVDevice API would be to
provide some kind of generic avdevice wrapper/adapter format that
translates between the AVFormat and AVDevice APIs. This wrapper
would presumably be an AVFormat. If it lives in libavformat, it needs
access to AVDevice internals to work (if only to remap function
pointers). If it lives in the avdevice library instead, it needs
access to AVFormat internals. So some entanglement involving internals
is unavoidable, and a bullet that has to be bitten. Agreed?
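
To make the "remap function pointers" idea concrete, here is a small
self-contained sketch of such an adapter. It deliberately uses no
FFmpeg types: DemoDevice, DemoFormat, DemoPacket, FakeCam and all
functions are hypothetical stand-ins that I invented for
illustration, and a real adapter would have to deal with streams,
options and threading as well. What it shows is only that the adapter
has to know the layout of both sides.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not real FFmpeg types. */
typedef struct DemoPacket { int size; } DemoPacket;

/* Device side: what a standalone device API might expose. */
typedef struct DemoDevice {
    const char *name;
    int  (*start)(void *priv);
    int  (*grab)(void *priv, DemoPacket *pkt);
    void (*stop)(void *priv);
    void *priv;
} DemoDevice;

/* Format side: the demuxer-style callbacks a format user relies on. */
typedef struct DemoFormat {
    const char *name;
    int (*read_header)(struct DemoFormat *f);
    int (*read_packet)(struct DemoFormat *f, DemoPacket *pkt);
    int (*read_close)(struct DemoFormat *f);
    void *opaque;               /* the wrapped DemoDevice */
} DemoFormat;

/* Adapter callbacks: each one forwards to the matching device callback. */
static int adapter_read_header(DemoFormat *f)
{
    DemoDevice *d = f->opaque;
    return d->start(d->priv);
}

static int adapter_read_packet(DemoFormat *f, DemoPacket *pkt)
{
    DemoDevice *d = f->opaque;
    return d->grab(d->priv, pkt);
}

static int adapter_read_close(DemoFormat *f)
{
    DemoDevice *d = f->opaque;
    d->stop(d->priv);
    return 0;
}

/* Fill in a format whose callbacks are remapped onto the given device. */
void demo_adapt_device(DemoFormat *f, DemoDevice *d)
{
    assert(f != NULL && d != NULL);
    f->name        = d->name;
    f->read_header = adapter_read_header;
    f->read_packet = adapter_read_packet;
    f->read_close  = adapter_read_close;
    f->opaque      = d;
}

/* A fake camera so the sketch can be exercised without hardware. */
typedef struct FakeCam { int running; int frames; } FakeCam;

static int  fake_start(void *p) { ((FakeCam *)p)->running = 1; return 0; }
static void fake_stop(void *p)  { ((FakeCam *)p)->running = 0; }

static int fake_grab(void *p, DemoPacket *pkt)
{
    FakeCam *c = p;
    if (!c->running)
        return -1;              /* device not started */
    pkt->size = ++c->frames;    /* pretend frame N has N bytes */
    return 0;
}

DemoDevice fake_cam_device(FakeCam *cam)
{
    DemoDevice d = { "fakecam", fake_start, fake_grab, fake_stop, cam };
    return d;
}
```

Code that only knows DemoFormat can then call read_header,
read_packet and read_close on the adapted device exactly as it would
on a file format, which is the interchangeability I would not want to
lose.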

Anyway, out of Mark's options I'd vote for a separate new AVDevice
API, and an adapter component to expose/plug in AVDevices as formats.
This general adapter can expose the generic options (and
device-specific options as child options), handle any threading as
needed, map device names to the url field, etc. Workflow could then be
something like (rough proposal to get this started):
AVDeviceContext* dev_ctx = avdevice_alloc_context();
// or av_input_device_next(AV_DEVICE_VIDEO), or
// av_device_next(AV_DEVICE_VIDEO | AV_DEVICE_INPUT) for any
AVInputDevice* dev_inp_ctx = av_find_input_device("dshow");
avdevice_open_input(dev_ctx, dev_inp_ctx, options);
// or (e.g. dev_name="dshow"):
AVDeviceContext* dev_ctx = avdevice_alloc_input_context(
    AVInputDevice* device, const char* dev_name);
avdevice_open_input(dev_ctx, NULL, options);
// and similar for output.
// to start capture, discovers stream parameters if not yet known
avdevice_start();
// to just discover stream parameters without starting
avdevice_probe(); // after open
// NB: need to provide a way for devices to provide multiple streams
// (e.g. dshow can provide video and audio simultaneously). Should
// AVDeviceContexts have AVStreams? Then you introduce a bunch of extra
// entanglement again...

// then
AVFormatContext* fmt_ctx = avformat_adapt_avdevice(dev_ctx);
// and use format like usual (except it's already opened!)

What this does not offer is av_find_input_format being able to find
devices (some user code may depend on that!), which is a nice part of
the current situation as well. Code like
AVFormatContext* fmt_ctx = NULL;
AVInputFormat* fmt = av_find_input_format("something");
avformat_open_input(&fmt_ctx, "url", fmt, &opts);
works for devices as well: you just need to call
avdevice_register_all() first, use a device name like "dshow" in
av_find_input_format, and use a special url such as "video=Integrated
Webcam". Without such functionality you'd need a bunch of special
cases in your app to allow users to use devices as well. Perhaps this
can still be provided the way it is now. We should then
also implement an avformat_get_avdevice() function to get the avdevice
from the avdevice adapter format.

As seen above, and argued earlier, complete separation appears to me
impossible without losing most of the benefits of having avdevices in
the first place, and their current ease of use. But a happy middle
ground allowing an advanced and flexible libavdevice API and a
cleaned-up libavformat API does seem possible. There is a sweet spot
there.

All that said, let's not stop work on the current avdevice component
(my patch set) while figuring out the way forward.

Cheers,
Dee