[FFmpeg-devel] [PATCH 1/1] avcodec/libopusdec: Enable FEC/PLC

Wed Feb 17 13:11:09 EET 2021

> Could you elaborate?
> I would have expected that the normal use case is not have a
> lossy input and that the new feature is always useful if data
> was lost.

The use case for FEC is typically an RTP stream whose audio is
compressed with Opus. There, depending on network conditions, packets
can be lost. On the decoder side, this shows up as a packet whose pts
has advanced by more than expected.

With this patch, if no discontinuity is detected, the packet is decoded
as usual. If a discontinuity is detected, its duration is checked to
deduce how many samples should be reconstructed. One limitation of Opus
FEC is that packets can be reconstructed with a granularity of 2.5ms
(120 samples at 48 kHz), so libopus cannot just reconstruct an
arbitrary number of samples. This patch manages this by "rounding" the
number of lost samples to the closest number of samples that libopus
can reconstruct.
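
To illustrate the rounding, here is a minimal sketch. It is not the
code from the patch: round_lost_samples() and RECOVERY_STEP are
hypothetical names, and I assume 48 kHz output and that ties round up.

```c
#include <assert.h>
#include <stdint.h>

/* 2.5 ms at 48 kHz, the granularity mentioned above (assumed constant). */
#define RECOVERY_STEP 120

/* Hypothetical helper: round a non-negative count of lost samples to the
 * nearest multiple of RECOVERY_STEP. Exact ties round up. */
static int64_t round_lost_samples(int64_t lost)
{
    assert(lost >= 0);
    return (lost + RECOVERY_STEP / 2) / RECOVERY_STEP * RECOVERY_STEP;
}
```

For example, a gap of 1000 samples would be reconstructed as 960
samples, and a gap of 1030 samples as 1080.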

> Or is there no way to distinguish between a stream with
> unusual timestamps and a stream with packet loss?

From my understanding, the only information available on the decoder
side is the metadata associated with the packet to decode, notably the
pts, dts and packet duration. The decoder has no information about how
the packet was received (RTP, SDP, other...). Packet loss can typically
be detected at higher layers (RTCP packets for RTP) but not at the
decoder layer.

From observation, a pts that differs from the last pts + packet
duration seems to indicate that some samples were lost. The decoder
then tries to restore as many samples as it can.
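
A rough sketch of that check, under my own assumptions: GapState,
gap_before_packet() and NO_PTS are hypothetical names (NO_PTS plays the
role of AV_NOPTS_VALUE), and pts and duration are already in the same
unit. The patch itself may do this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel for "no timestamp", standing in for AV_NOPTS_VALUE. */
#define NO_PTS INT64_MIN

/* Hypothetical per-stream state: what the previous packet told us. */
typedef struct GapState {
    int64_t last_pts;
    int64_t last_duration;
} GapState;

/* Return how far the new pts is ahead of last_pts + last_duration,
 * or 0 if there is no forward gap or a timestamp is missing.
 * The state is then updated with the new packet. */
static int64_t gap_before_packet(GapState *s, int64_t pts, int64_t duration)
{
    int64_t gap = 0;

    assert(duration >= 0);
    if (s->last_pts != NO_PTS && pts != NO_PTS) {
        int64_t expected = s->last_pts + s->last_duration;
        if (pts > expected)
            gap = pts - expected;
    }
    if (pts != NO_PTS) {
        s->last_pts      = pts;
        s->last_duration = duration;
    }
    return gap;
}
```

With 960-sample packets at pts 0 and 960, a following packet at pts
2880 gives a gap of 960, i.e. one missing packet.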

Note that the packet duration seems to change units depending on how
the file is encapsulated: the same file had a packet->duration of 20
(ms) in a .mkv container but 960 (samples) in a .opus container. Since
the decoder is not aware of which container is used, it has to "guess"
which unit is used. As FEC is only used with SILK frames, which have a
minimum duration of 10ms (480 samples), and an Opus packet has a
maximum duration of 120ms, the two ranges do not overlap (10 to 120 in
ms, 480 or more in samples), so a threshold can tell in which unit the
duration and pts are expressed.
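
As a sketch of that guess (hypothetical names again, not the patch
code), assuming the duration is either in milliseconds or in 48 kHz
samples and nothing else:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PACKET_MS  120  /* longest Opus packet, in ms */
#define SAMPLES_PER_MS 48   /* 48 kHz output assumed */

/* Hypothetical helper: a SILK packet lasts 10..120 ms, i.e. 480..5760
 * samples, so any value up to 120 is taken to be milliseconds and
 * anything larger is taken to be samples already. */
static int64_t duration_to_samples(int64_t duration)
{
    assert(duration > 0);
    if (duration <= MAX_PACKET_MS)
        return duration * SAMPLES_PER_MS;
    return duration;
}
```

Both the 20 (ms) seen in the .mkv file and the 960 (samples) seen in
the .opus file then map to 960 samples. A time base other than these
two would defeat the heuristic.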

An alternative could be to simply provide a new function to decode FEC
data and let the higher layers decide whether FEC should be decoded,
but this implies more modifications to the stack to make FEC work,
while here only an AVOption has to be set.