[FFmpeg-devel] Race condition in libavformat/libsrt.c causes EOF to be missed

Belabox Project belaboxproject at gmail.com
Fri Jan 29 05:29:56 EET 2021


Hi,

I've ran into a bug in libavformat's libsrt while using OBS. It's
still present as of commit ed51890191. I've tracked down the issue and
implemented a workaround, but a possible misunderstanding of SRT's
asynchronous functionality which caused this issue seems to be
permeating the implementation.

First, the initial issue: I have a Media Source (this uses libavcodec)
in an OBS [0] scene which takes an SRT input. This is configured with
the 'close file when inactive' option and with a 2s reconnect delay.
The expectation is that if the stream stops / is disconnected (more
specifically if it receives an SRT UMSG_SHUTDOWN packet which
terminates the connection), this source will be hidden. In practice, I
verified that the UMSG_SHUTDOWN packet was always delivered
immediately, however most times the source would end up displaying the
last frame for around a minute, then I'd get a message along the lines
of 'remove_usock: @193058704 not found as either socket or group.
Removing only from epoll system' in the terminal and then Media Source
would finally close. Rarely, it worked as expected. Then I found that
if I modified the SRT server feeding this Media Source to stop sending
data, wait for a couple of seconds, and the close the connection, then
it almost always worked as expected - but not quite every time.

Long story short, I've tracked the issue to a race condition in
libavformat/libsrt.c. The way SRT's async / epoll-based IO works is
that the event status is stored in the epoll containers. That is, if
an event occurs on an SRT socket, and that SRT socket isn't part of
any epoll container, then that event is lost. If we later add that
socket to an epoll container, we still won't receive the event even if
it wasn't serviced. The documentation references this [1].

So what's happening in libavformat/libsrt.c is that libsrt_read()
calls into libsrt_network_wait_fd_timeout() ->
libsrt_network_wait_fd(), which adds the socket to an epoll container
and waits for data, an error or a POLLING_TIME timeout. Then when
libsrt_network_wait_fd() returns, it *removes* the socket from the
epoll container. However, for the OBS use case to work correctly, the
relevant event which must be captured by libsrt_network_wait_fd() is
the SRT_EPOLL_ERR, which would cause AVERROR(EIO) to be returned. The
event is only delivered once when the connection is closed. While data
arrives and is serviced by the rest of the code, this event was
typically lost. But if I waited for a few seconds after I stopped
sending data before closing the connection, then it was typically
captured as this thread was idle, which explains the behavior I've
observed initially.

This does raise the question of how come libsrt_read() doesn't lose
data since it only reads a single packet per SRT_EPOLL_IN event. My
guess is that some internal SRT logic repeatedly raises the
SRT_EPOLL_IN event as long as there's data available in the read
buffer. Not sure if this is deliberate or a coincidence, as it
contradicts the documentation.

I also wonder whether this means there are race conditions in
libsrt_listen() and libsrt_listen_connect(), as they start
asynchronous calls before the socket is added to the epoll container.

My quick and dirty fix was to remove the call to
srt_epoll_remove_usock() from libsrt_network_wait_fd(), so the socket
is always in the epoll container with SRT_EPOLL_ET enabled. Then it
works as expected. Internally in SRT, calls to srt_epoll_add_usock()
and srt_epoll_update_usock() both end up calling
s_UDTUnited.epoll_add_usock(), so for now the srt_epoll_add_usock() in
libsrt_network_wait_fd() can both add the socket to the epoll
container on the first call, and then update the flags on subsequent
calls. But obviously this doesn't use the API correctly according to
the documentation.

Thanks and sorry about the wall of text

[0] https://github.com/obsproject/obs-studio
[1] https://github.com/Haivision/srt/blob/master/docs/API-functions.md#srt_epoll_update_usock


More information about the ffmpeg-devel mailing list