[FFmpeg-user] Full Hardware Acceleration for USB Webcam 1080p at 30fps to H.264 RTSP Stream on Raspberry Pi 4
Peter Kálóczy
peter.kaloczy at gmail.com
Wed Feb 12 15:04:15 EET 2025
Description:
I’m trying to achieve full GPU hardware acceleration for encoding video
from a USB webcam (1920x1080 at 30fps) to an H.264 RTSP stream, using the
ffmpeg bundled with the MediaMTX Docker image on a Raspberry Pi 4 Model B
(4 GB, with PoE+ HAT and fan).
Current Setup:
Raspberry Pi 4B 4GB with PoE+ HAT
OS: Debian GNU/Linux 12 (bookworm) aarch64
Kernel: 6.6.74-v8+ #1844 SMP PREEMPT Mon Jan 27 11:41:19 GMT 2025
Docker Compose:
services:
  mediamtx:
    image: bluenviron/mediamtx:latest-ffmpeg-rpi
    restart: always
    environment:
      - MTX_RTSPTRANSPORTS=tcp
      - MTX_WEBRTCADDITIONALHOSTS=192.168.1.1XX
    ports:
      - '8559:8559'
    devices:
      - "/dev/snd:/dev/snd"
      - "/dev/video11:/dev/video11"
      - "/dev/video0:/dev/video0"
    volumes:
      - /home/user/Documents/mediamtx/mediamtx_cam3.yml:/mediamtx.yml
      - /etc/localtime:/etc/localtime:ro
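For reference, the device mappings can be sanity-checked from inside the container (assuming the image ships a POSIX shell; the device paths are the ones from the compose file above):

```shell
# Verify the camera and the bcm2835 encoder node are visible in the container
docker compose exec mediamtx ls -l /dev/video0 /dev/video11
```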
MediaMTX Configuration (mediamtx.yml):
rtspAddress: :8559
writeQueueSize: 8192
protocols: [tcp]
paths:
  swissten:
    runOnInit: >
      ffmpeg -hide_banner -hwaccel drm -thread_queue_size 64 -f v4l2
      -input_format mjpeg -video_size 1920x1080 -framerate 30 -i /dev/video0
      -f alsa -thread_queue_size 1024 -ac 2 -ar 48000 -i plughw:webcam
      -vsync 2 -b:v 4000k -maxrate 8000k -bufsize 7000k -acodec aac -b:a 226k
      -vf "format=yuv420p,vflip,hflip,drawtext=fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf:text='%{localtime:%d.%m.%Y - %H\:%M\:%S}':fontcolor=white:fontsize=20:borderw=2:bordercolor=black:x=w-tw-10:y=h-th-10"
      -c:v h264_v4l2m2m -rtsp_transport tcp -f rtsp rtsp://localhost:8559/swissten
    runOnInitRestart: yes
Problem:
MediaMTX, ffmpeg streaming, and hardware-accelerated H.264 encoding via
h264_v4l2m2m are all working. However, CPU usage spikes because the MJPEG
stream is decoded in software and then converted to yuv420p, the pixel
format required by the h264_v4l2m2m encoder. Both steps run on the CPU and
become the bottleneck.
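To show where the CPU time goes, here is a rough benchmark (my own sketch, not part of the production pipeline) that runs capture, MJPEG software decode, and the yuv420p conversion with no encoder attached:

```shell
# Measure MJPEG decode + yuv420p conversion alone for 10 seconds;
# if CPU is already pegged here, the cost is in decode/conversion,
# not in the h264_v4l2m2m encoder.
ffmpeg -hide_banner -f v4l2 -input_format mjpeg -video_size 1920x1080 \
  -framerate 30 -i /dev/video0 -t 10 -vf format=yuv420p -f null -
```

Note that vflip, hflip, and drawtext are software filters and will run on the CPU regardless of how the decode is done.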
What I’m Looking For:
I’m seeking a method to achieve full GPU hardware acceleration, either by:
- encoding the MJPEG stream directly to H.264 in hardware, or
- hardware-decoding the MJPEG to yuv420p and then hardware-encoding to
H.264, bypassing the CPU for the pixel format conversion.
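One avenue I have not ruled out (hypothetical, depends entirely on the camera): some UVC webcams expose a native H.264 stream, which would let me skip encoding altogether, at the cost of losing the drawtext overlay and flips:

```shell
# Check whether the webcam offers H.264 output natively (look for 'H264')
v4l2-ctl -d /dev/video0 --list-formats-ext

# If it does, the stream could be relayed without any re-encoding
# (no filters are possible with -c:v copy):
ffmpeg -hide_banner -f v4l2 -input_format h264 -video_size 1920x1080 \
  -framerate 30 -i /dev/video0 -c:v copy -rtsp_transport tcp -f rtsp \
  rtsp://localhost:8559/swissten
```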
Additional Information:
The /dev/video11 device is the hardware encoder (bcm2835-codec) on the
Raspberry Pi:
[h264_v4l2m2m @ 0x5594dd9d20] Using device /dev/video11
[h264_v4l2m2m @ 0x5594dd9d20] driver 'bcm2835-codec' on card
'bcm2835-codec-encode' in mplane mode
[h264_v4l2m2m @ 0x5594dd9d20] requesting formats: output=YU12 capture=H264
I had to add /dev/video11 in Docker to access the hardware encoder, and
modified the MediaMTX configuration accordingly. Despite this, the pixel
format conversion remains a CPU-bound process.
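Two diagnostics I can run (hedged: I am not sure the rpi ffmpeg build exposes a V4L2 M2M MJPEG decoder at all, nor whether the decoder node accepts MJPEG input):

```shell
# Does the bcm2835 decoder node advertise MJPG on its compressed input queue?
v4l2-ctl -d /dev/video10 --list-formats-out

# Does this ffmpeg build include any V4L2 M2M or other hardware MJPEG decoder?
ffmpeg -hide_banner -decoders | grep -i -e v4l2m2m -e mjpeg
```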
Any suggestions or tools that could help pipe the 1080p at 30fps stream from
/dev/video0 to the HW encoder and output in yuv420p format without CPU
bottleneck would be greatly appreciated.
Thanks in advance!