[FFmpeg-devel] TR-03 implementation
eloi.bail at savoirfairelinux.com
Thu Feb 16 21:22:04 EET 2017
In November, we wrote to the mailing list about implementing support for TR-03 in ffmpeg.
There were some doubts in the ffmpeg community about whether or not ffmpeg
could handle demuxing 3gbps of RTP input without significantly modifying the
RTP demuxer and/or doing kernel bypassing.
CBC/Radio Canada contracted us to test what was possible and to try to implement TR-03
in ffmpeg. Using 2 servers connected by a 10gbps fibre optic link and a switch, we
performed several tests with various tools, which showed that it should be possible to
receive and demux 3gbps of raw RTP video with a large enough RX queue in the
NIC and the socket. We then patched ffmpeg to support depayloading 8 and 10 bit
raw video and to process the input stream on a separate thread. This allowed us
to successfully receive a 3gbps raw video stream in ffmpeg and write the raw video to
disk. We were also able to transcode it into h264.
Thus it seems to us that ffmpeg should be able to support TR-03 without significant
modifications or kernel bypassing.
Below is a more detailed description of our testing and development process:
1. In the Linux kernel: using the iperf tool, we verified that the Linux
kernel is able to handle 3gbps of udp streams with a payload size of 800 to
2. Using a simple RTP demuxer, we ensured that a user space program is able
to handle a 3gbps stream without dropping packets. As we added an
increasing amount of processing per packet, packets eventually started
to be dropped. We concluded that processing per packet must be kept
minimal in order to receive a 3gbps video stream.
3. We experimented with GStreamer, which already implements an RTP raw video
muxer / demuxer. We were able to send a 3gbps video stream without dropping any
packets. On the receiving side, we saw around 20% packet loss with a 3gbps
video stream because the thread in charge of reading the socket uses 100%
CPU. The GStreamer team is aware of this and has ideas for significantly
reducing CPU usage by batching the per-packet processing with the recvmmsg syscall.
4. We implemented an RTP demuxer compatible with RFC 4175 and the pixel
formats 4:2:2 8-bit and 4:2:2 10-bit.
* Looking at the FFmpeg tool code, we saw that separate input threads are used
only if there is more than one input. With a minimal pipeline which reads
an RTP stream from a socket and writes the raw video into a file, we
observed that packets were dropped because too much time was used for
We modified the FFmpeg tool to force the use of a dedicated input thread.
5. Several queues are involved between packet reception and packet processing.
Tuning each queue allowed us to reach zero dropped packets:
* In the NIC queue: using ethtool, we increased the queue size from 453
to its maximum (4078) to avoid packets being dropped in the NIC queue
* In the kernel queue: we observed no dropped packets after increasing the
queue size to 16 MB
* In the jitter buffer queue (FFmpeg): by default the jitter buffer is
sized for 500 packets. With 1080p raw video (RFC 4175), we calculated
that one video frame amounts to around 3000 packets.
To be more resilient to packet reordering, we could increase the size of
the jitter buffer, but we observed that a large jitter buffer adds
significant processing per packet and thus leads to packets being dropped
in the kernel. In addition, RFC 4175 provides a mechanism to be resilient to
packet reordering on a per-frame basis.
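
The figure of around 3000 packets per frame can be reproduced with a rough
estimate. The sketch below is only an illustration: the 1400-byte payload size
is an assumption (it is not the value used in our tests), and the RTP and
RFC 4175 payload headers are ignored. The pgroup sizes are those defined by
RFC 4175 for 4:2:2 (2 pixels in 4 bytes for 8-bit, 2 pixels in 5 bytes for 10-bit).

```python
def packets_per_frame(width, height, pgroup_bytes, pgroup_pixels, payload_bytes):
    """Estimate how many RTP packets carry one raw video frame.

    Headers are ignored, so this slightly underestimates the real count.
    """
    frame_bytes = width * height * pgroup_bytes // pgroup_pixels
    # Integer ceiling division: a partially filled last packet still counts.
    return -(-frame_bytes // payload_bytes)


if __name__ == "__main__":
    PAYLOAD = 1400  # assumed video bytes per packet, for illustration only
    print("4:2:2 8-bit :", packets_per_frame(1920, 1080, 4, 2, PAYLOAD))
    print("4:2:2 10-bit:", packets_per_frame(1920, 1080, 5, 2, PAYLOAD))
```

With these assumptions a single 1080p frame is several times larger than a
500-packet jitter buffer, which is why the default size does not fit raw video.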
* With:
- our test setup composed of 2 servers running Centos 7 linked by a 10gbps
- our modified FFmpeg to handle RFC4175 and to improve the reading
- NIC and kernel queues tuned and FFmpeg jitter buffer disabled
we were able to:
- send a 3gbps video stream with GStreamer
- receive a 3gbps 4:2:2 8-bit video stream with FFmpeg without dropping any
packets or showing any video artifacts.
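
One simple way to check for dropped packets on the receiving side is to watch
for gaps in the RTP sequence numbers. The sketch below is a hypothetical
illustration, not the tool we used: the function names are made up, and it only
relies on the RTP fixed header layout from RFC 3550 (version in the top 2 bits
of byte 0, 16-bit sequence number at bytes 2-3).

```python
import struct


def rtp_sequence(packet: bytes) -> int:
    """Return the 16-bit sequence number from an RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return struct.unpack_from("!H", packet, 2)[0]


def count_gaps(sequences) -> int:
    """Count sequence numbers skipped between packets, with 16-bit wrap-around.

    Duplicates and late (reordered) packets are ignored, so a late packet
    that was already counted as missing is not subtracted again.
    """
    missing = 0
    prev = None
    for seq in sequences:
        if prev is None:
            prev = seq
            continue
        gap = (seq - prev) % 65536
        if gap == 0 or gap >= 0x8000:
            continue  # duplicate or packet from the past
        missing += gap - 1
        prev = seq
    return missing


if __name__ == "__main__":
    # Synthetic headers: V=2, payload type 96, dummy timestamp and SSRC.
    packets = [struct.pack("!BBHII", 0x80, 96, s, 0, 0x1234)
               for s in (65534, 65535, 0, 3)]
    print("missing packets:", count_gaps(rtp_sequence(p) for p in packets))
```

The per-packet work here is deliberately tiny, in line with the observation
above that processing per packet has to stay minimal at this bitrate.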
* However, using the 4:2:2 10-bit (packed) pixel format, we encountered a
performance degradation. 4:2:2 10-bit (packed) is not supported in
FFmpeg, so we decided to convert it into a 4:2:2 10-bit planar format. We
believe that this conversion adds too much processing per packet and thus
leads to dropped packets.
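
To give an idea of what the packed to planar conversion involves, here is a
minimal sketch. It is not our FFmpeg patch: the function names are hypothetical,
and it assumes the input is only the pgroup data (RTP and RFC 4175 payload
headers already removed). It follows the RFC 4175 layout for 4:2:2 10-bit, where
each 5-byte pgroup holds Cb, Y0, Cr, Y1 as four consecutive 10-bit values.

```python
def unpack_pgroup_422_10(pgroup: bytes):
    """Split one 5-byte 4:2:2 10-bit pgroup into (cb, y0, cr, y1)."""
    if len(pgroup) != 5:
        raise ValueError("a 4:2:2 10-bit pgroup is exactly 5 bytes")
    bits = int.from_bytes(pgroup, "big")  # 40 bits, most significant first
    cb = (bits >> 30) & 0x3FF
    y0 = (bits >> 20) & 0x3FF
    cr = (bits >> 10) & 0x3FF
    y1 = bits & 0x3FF
    return cb, y0, cr, y1


def packed_to_planar(data: bytes):
    """Convert packed pgroups into separate Y, Cb and Cr sample lists."""
    if len(data) % 5:
        raise ValueError("data length must be a multiple of 5 bytes")
    y_plane, cb_plane, cr_plane = [], [], []
    for off in range(0, len(data), 5):
        cb, y0, cr, y1 = unpack_pgroup_422_10(data[off:off + 5])
        y_plane.extend((y0, y1))
        cb_plane.append(cb)
        cr_plane.append(cr)
    return y_plane, cb_plane, cr_plane


if __name__ == "__main__":
    # Build one pgroup by hand: Cb=0x200, Y0=0x040, Cr=0x3FF, Y1=0x001.
    packed = ((0x200 << 30) | (0x040 << 20) | (0x3FF << 10) | 0x001)
    print(packed_to_planar(packed.to_bytes(5, "big")))
```

Every sample has to be shifted and masked out of the bitstream individually,
which is the kind of per-packet work that we suspect causes the drops.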
We are able to stream (and live transcode) 1080p 60fps 4:2:2 10-bit without dropping packets. On the
receiving side, the bandwidth is around 2.2 gbps.