[FFmpeg-devel] Timestamp problems when transcoding asf/wmav2/wmv3 to ts/aac/h264

Fri Dec 6 18:03:22 CET 2013

After transcoding an ASF file with wmav2/wmv3 to a TS file with aac/h264
using the ffmpeg executable, the audio packet timestamps are wrong when
I demux the resulting file with libavformat. I've tested with both the
ffmpeg aac encoder and libfdk_aac, and the problem is the same.

When I transcode I get the following warning log messages:
[aac @ 0xa55ee0] Queue input is backward in time
[mpegts @ 0xa56be0] Non-monotonous DTS in output stream 0:1; previous: 
1089818, current: 1087511; changing to 1089819. This may result in 
incorrect timestamps in the output file.
[mpegts @ 0xa56be0] Non-monotonous DTS in output stream 0:1; previous: 
1089819, current: 1089431; changing to 1089820. This may result in 
incorrect timestamps in the output file.

As far as I understand, the wmav2 packets have slightly wrong
timestamps, so that sometimes the dts gap between two packets is much
smaller than the actual sample duration of the packet. wmav2 also has
much larger frames than what is sent to the aac encoder. When FFmpeg
uses ff_filter_frame_needs_framing to divide the big audio frames into
smaller frames for the aac encoder, the smaller frames at the end of a
large frame get timestamps larger than that of the next big frame from
the asf demuxer.
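
To illustrate the effect described above, here is a minimal Python
sketch. It is only a model of the arithmetic, not FFmpeg code: the
function name split_frame, the 4096-sample input frames and the
3000-tick gap are made up for the example, and timestamps are counted
in samples. Only the 1024-sample output size corresponds to a real AAC
frame.

```python
# Hypothetical numbers for illustration; none come from the file in question.

def split_frame(pts, nb_samples, chunk):
    """Split one large audio frame into chunk-sized frames.

    Each sub-frame's timestamp is the large frame's pts plus the number
    of samples that precede it (timestamps are counted in samples here).
    """
    return [pts + off for off in range(0, nb_samples, chunk)]

# Two consecutive large frames of 4096 samples each. The second one is
# stamped only 3000 ticks after the first, although the first frame
# actually holds 4096 samples.
big_frames = [(0, 4096), (3000, 4096)]

out = []
for pts, n in big_frames:
    out.extend(split_frame(pts, n, 1024))

# Pairs of neighbouring timestamps where time goes backward.
backward = [(a, b) for a, b in zip(out, out[1:]) if b < a]

print(out)       # [0, 1024, 2048, 3072, 3000, 4024, 5048, 6072]
print(backward)  # [(3072, 3000)]
```

The last small frame cut from the first big frame (3072) ends up later
than the first small frame cut from the second big frame (3000), which
is the kind of input that would make an encoder queue complain about
going backward in time.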

The ffmpeg executable handles this in write_frame (ffmpeg.c:545) by
setting the dts of the offending packet to 1 tick after the dts of the
previous packet. This is when the "Non-monotonous DTS" warning shows up.
This works pretty well, and I can play the file afterwards. The problem
comes when I afterwards run the file through my own software that uses
libavformat. Some of the packets that I then read from the file have
dts that are not monotonically increasing. Putting a log line in the
mpegts demuxer shows that the actual dts in the file is correct (only
increasing), however somewhere in parse_packet
(libavformat/utils.c:1201) the timestamps are corrupted by
compute_pkt_fields.
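
The dts adjustment made when muxing can be modelled in a few lines of
Python. This is a simplified sketch of the behaviour visible in the
warning messages quoted earlier, not the actual code in ffmpeg.c; the
function name enforce_monotonic_dts is made up for the example.

```python
def enforce_monotonic_dts(dts_list):
    """Return a copy of dts_list that is strictly increasing.

    Any dts that is not larger than the previous output dts is replaced
    by previous + 1, as the "changing to ..." warnings suggest.
    """
    fixed = []
    last = None
    for dts in dts_list:
        if last is not None and dts <= last:
            dts = last + 1  # move the packet 1 tick after the previous one
        fixed.append(dts)
        last = dts
    return fixed

# The "previous" and "current" values taken from the two warnings.
fixed = enforce_monotonic_dts([1089818, 1087511, 1089431])
print(fixed)  # [1089818, 1089819, 1089820]
```

The result reproduces the "changing to 1089819" and "changing to
1089820" values from the log. Note that packets fixed this way sit only
1 tick apart, which is far less than the real duration of an audio
frame.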

I don't fully understand what is going on in parse_packet, but it seems
that, with the help of the ac3 parser, the packet is split into several
smaller packets and, except for the first sub-packet, the timestamps are
calculated using the duration. This calculation ends up giving dts that
are not monotonically increasing in the packets returned to the public
API. Can anyone help shed some light on what is going on here?
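
To make the suspicion above concrete, here is a small Python model of
it. It is only my reading of the symptom, not a description of what
parse_packet or compute_pkt_fields actually do. The names
rebuild_timestamps and first_non_monotonic, the stored dts values, the
3 frames per stored packet and the 2090-tick frame duration are all
assumptions chosen for the example.

```python
def rebuild_timestamps(container_packets, frame_duration):
    """Model of duration-based timestamps for split packets.

    container_packets is a list of (dts, frames_in_packet). The first
    sub-packet keeps the stored dts; each following sub-packet gets the
    previous sub-packet's dts plus frame_duration.
    """
    out = []
    for dts, nframes in container_packets:
        for i in range(nframes):
            out.append(dts + i * frame_duration)
    return out

def first_non_monotonic(dts_list):
    """Return the index of the first dts that is not larger than its
    predecessor, or None if the list is strictly increasing."""
    for i in range(1, len(dts_list)):
        if dts_list[i] <= dts_list[i - 1]:
            return i
    return None

# Stored dts values are strictly increasing but only 1 tick apart
# (hypothetical values), as after the adjustment made when muxing.
stored = [(9000, 3), (9001, 3)]
rebuilt = rebuild_timestamps(stored, 2090)

print(rebuilt)  # [9000, 11090, 13180, 9001, 11091, 13181]
print(first_non_monotonic(rebuilt))  # 3
```

In this model the dts stored in the file are fine, yet the sub-packets
handed to the caller go backward as soon as the next stored dts is
smaller than the last dts derived from the duration. A check like
first_non_monotonic can be run over the dts values collected from the
demuxer to find where the sequence first breaks.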