[FFmpeg-user] Encoding MP4/AAC audio from pcm: issues with packets, duration, and pts/dts (especially when using -movflags empty_moov)
Eric Amram
eric.amram at gmail.com
Thu Apr 6 23:12:28 EEST 2017
Hello,
I've encountered several issues trying to encode audio PCM into MP4/AAC.
I've recompiled the latest nightly to make sure it was not already solved.
Here are the FFmpeg command lines I ran to encode an 8192-byte raw s16le PCM file (4096 samples) into MP4/AAC:
ffmpeg -nostdin -hide_banner -loglevel debug \
-f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
-f mp4 -acodec aac -movflags empty_moov -ac 1 -ar 44100 -b:a 128000 \
result.mp4
(and the same command without empty_moov)
ffmpeg -nostdin -hide_banner -loglevel debug \
-f s16le -channel_layout mono -vn -ac 1 -i test-8192.raw \
-f mp4 -acodec aac -ac 1 -ar 44100 -b:a 128000 \
result.mp4
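For anyone who wants to reproduce this, here is a minimal sketch that builds an input of the same size (4096 mono s16le samples, 8192 bytes). The 440 Hz tone, the amplitude, and the helper names are assumptions for illustration only; the contents of the original test-8192.raw are not described beyond containing actual sound.

```python
import math
import struct

SAMPLE_RATE = 44100   # same rate as -ar 44100 in the commands above
NUM_SAMPLES = 4096    # 4096 mono samples * 2 bytes each = 8192 bytes


def make_s16le_tone(num_samples=NUM_SAMPLES, sample_rate=SAMPLE_RATE,
                    freq_hz=440.0, amplitude=0.5):
    """Return raw signed 16-bit little-endian mono PCM for a sine tone.

    The tone parameters are hypothetical; any non-silent content would do.
    """
    chunks = []
    for n in range(num_samples):
        value = amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        chunks.append(struct.pack("<h", int(value)))  # "<h" = little-endian int16
    return b"".join(chunks)


def write_raw(path, data):
    """Write the PCM bytes to a headerless file, as expected by -f s16le."""
    with open(path, "wb") as f:
        f.write(data)
```

Calling write_raw("test-8192.raw", make_s16le_tone()) should give a file that the two commands above accept as input.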
1/ Why is there an empty packet added to the MP4?
When I run ffmpeg, I get the following logs:
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 50.496140%
Input file #0 (test-8192.raw):
Input stream #0:0 (audio): 4 packets read (8192 bytes); 4 frames decoded (4096 samples);
Total: 4 packets (8192 bytes) demuxed
Output file #0 (result.mp4):
Output stream #0:0 (audio): 4 frames encoded (4096 samples); 5 packets muxed (1814 bytes);
Total: 5 packets (1814 bytes) muxed
4 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x25eb300] Statistics: 0 seeks, 4 writeouts
--> There are *5* packets (and 5 frames) instead of the 4 frames from the input file.
When decoded, this additional packet is a series of 2048 bytes of pure zeros (1024 samples of 0).
However, it does use 536 bytes in the mp4 file. Why such a waste??
Moreover, with the empty_moov flag, players report a LONGER DURATION for the mp4 file,
and playback starts with 23ms of silence.
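The numbers quoted above can be checked with a little arithmetic. This is only a sketch: it assumes 1024 samples per AAC packet (the duration ffprobe reports per packet) and takes the byte counts from the log; the function names are made up for illustration.

```python
SAMPLE_RATE = 44100        # Hz, from -ar 44100
AAC_FRAME_SAMPLES = 1024   # samples per packet, as reported by ffprobe (duration=1024)
BYTES_PER_SAMPLE = 2       # s16le


def pcm_samples(num_bytes, channels=1):
    """Number of samples per channel in a raw s16le buffer."""
    return num_bytes // (BYTES_PER_SAMPLE * channels)


def frame_count(samples, frame_samples=AAC_FRAME_SAMPLES):
    """Number of packets needed to hold the samples (rounded up)."""
    return -(-samples // frame_samples)


def seconds(samples, sample_rate=SAMPLE_RATE):
    """Duration of a run of samples in seconds."""
    return samples / sample_rate


def share(part_bytes, total_bytes):
    """Fraction of the muxed payload taken by one packet."""
    return part_bytes / total_bytes
```

With the figures from this report, 8192 bytes is 4096 samples, which fits in 4 packets; one packet lasts 1024/44100 s, about 23.22 ms, which matches the extra silence; and the 536-byte extra packet is roughly 29.5% of the 1814 bytes muxed.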
2/ PTS/DTS bug with EMPTY_MOOV on this first packet
Running ffprobe on result.mp4 shows pts/dts values that seem wrong when using -movflags empty_moov.
# ffprobe -hide_banner -pretty -show_packets result.mp4
WITHOUT empty_moov, the first packet (the empty one with pure zeros) has negative
pts/dts values, so that the next packet with actual sound starts at 0:00:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'result.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf57.72.100
Duration: 00:00:00.12, start: 0.000000, bitrate: 176 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 124 kb/s (default)
Metadata:
handler_name : SoundHandler
[PACKET]
codec_type=audio
stream_index=0
pts=-1024
pts_time=0:00:-0.023220
dts=-1024
dts_time=0:00:-0.023220
duration=1024
duration_time=0:00:00.023220
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=44
flags=KD
[SIDE_DATA]
side_data_type=Skip Samples
skip_samples=1024
discard_padding=0
skip_reason=0
discard_reason=0
[/SIDE_DATA]
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=1024
duration_time=0:00:00.023220
...
But WITH -movflags empty_moov, the first packet starts at pts/dts 0:00, and therefore
mp4 players see a LONGER file, with 23ms of silence at the start:
[PACKET]
codec_type=audio
stream_index=0
pts=0
pts_time=0:00:00.000000
dts=0
dts_time=0:00:00.000000
duration=N/A
duration_time=N/A
convergence_duration=N/A
convergence_duration_time=N/A
size=536 byte
pos=849
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=0
pts=1024
pts_time=0:00:00.023220
dts=1024
dts_time=0:00:00.023220
duration=1024
duration_time=0:00:00.023220
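To compare the two outputs mechanically, here is a rough sketch that reads the first packet from ffprobe's JSON output and estimates how many of its samples a player would actually present. The estimate is a heuristic of my own, not something FFmpeg computes: it treats samples as hidden if they sit before pts 0 or are covered by the Skip Samples side data. The function names and the "side_data_list" key layout are assumptions about the JSON writer's output.

```python
import json
import subprocess

AAC_FRAME_SAMPLES = 1024  # assumed packet length; empty_moov output shows duration=N/A


def probe_packets(path):
    """Run ffprobe and return its list of packet dictionaries."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_packets", "-of", "json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("packets", [])


def skip_samples(packet):
    """Return skip_samples from the Skip Samples side data, or 0 if absent."""
    for side_data in packet.get("side_data_list", []):
        if side_data.get("side_data_type") == "Skip Samples":
            return int(side_data.get("skip_samples", 0))
    return 0


def presented_samples(first_packet, frame_samples=AAC_FRAME_SAMPLES):
    """Heuristic: samples of the first packet that end up being played."""
    pts = int(first_packet["pts"])
    hidden = max(skip_samples(first_packet), -pts, 0)
    return max(0, frame_samples - hidden)
```

Applied to the two first packets shown above, this gives 0 presented samples without empty_moov (pts=-1024, skip_samples=1024) and 1024 presented samples with empty_moov (pts=0, no side data), i.e. the 23ms of silence.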
Here are the details of my FFmpeg version:
ffmpeg version N-85272-gc901ae9 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-11)
configuration: --prefix=/opt/ffmpeg_build --extra-cflags=-I/opt/ffmpeg_build/include --extra-ldflags='-L/opt/ffmpeg_build/lib -ldl' --bindir=/usr/local/bin --pkg-config-flags=--static --enable-gpl --enable-libfreetype
libavutil 55. 59.100 / 55. 59.100
libavcodec 57. 91.100 / 57. 91.100
libavformat 57. 72.100 / 57. 72.100
libavdevice 57. 7.100 / 57. 7.100
libavfilter 6. 84.100 / 6. 84.100
libswscale 4. 7.100 / 4. 7.100
libswresample 2. 8.100 / 2. 8.100
libpostproc 54. 6.100 / 54. 6.100
Any help understanding why there is an additional first packet filled with zeros,
and why the timing goes wrong with empty_moov, would be much appreciated!
Thank you!