[FFmpeg-user] audio artefacts after segment and transcode

Tue May 7 16:06:56 EEST 2019

> I am transcoding larger videos on a set of computers in parallel. I do this by segmenting an input file at key-frames (ffmpeg -i ... -f segment), then transcode parts using GNU parallel, then recombine parts into one output file using ffmpeg -f concat -i ...). This works well, but I had issues with audio being not in sync with videos or having audio "artefacts". I solved that by transcoding audio separately, but I would prefer the more direct solution to transcode both audio and video in one step.

Transcoding video and audio in one step, from segments that were cut while stream copying, is probably what's causing this.
If you can live with decoding while segmenting, so that the parallel step only has to encode, you might get better results.
Of course you'll then need to decode the whole file from start to finish in one place, but decoding isn't nearly as CPU intensive as encoding, and stream-copy segmenting isn't reliable, as you've seen.

> Input #0, avi, from 'input.avi':
>  Metadata:
>    IAS1            : Deutsch
>    IAS2            : English
>    encoder         : Lavf58.20.100
>  Duration: 00:00:30.04, start: 0.000000, bitrate: 861 kb/s
>    Stream #0:0: Video: mpeg4 (Simple Profile) (xvid / 0x64697678), yuv420p, 576x432 [SAR 1:1 DAR 4:3], 719 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
>    Stream #0:1: Audio: mp3 (U[0][0][0] / 0x0055), 48000 Hz, stereo, fltp, 128 kb/s

yuv4 and pcm_f32le/be should fit as intermediate codecs for this, I think. So from this,

> # step 2: create segments
> 
> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v? -map a? -map s? /tmp/input_part_%06d.mp4

try changing it to 

ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -segment_list /tmp/input_part.list -segment_list_type ffconcat -map 0? -c copy -c:v yuv4 -c:a pcm_f32le /tmp/input_part_%06d.mov

Segment sizes should be longer though; at 0.5 seconds the overhead would not be insignificant. I’m guessing that was just for the demo?

> # step 3: process each "segment" (parts created in step 2)
> 
> for f in `seq -f %06g 1 59`; do ffmpeg -y -hide_banner -i input_part_$f.mp4 -c:v libx265 -map v? -map a? -map s? /tmp/output_part_$f.mp4; done

And encode the segments in your distributed/parallel setup.

ffmpeg -y -hide_banner -i /tmp/input_part_$INDEX.mov -c:v libx265 -c:a eac3 /tmp/output_part_$INDEX.mov
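
As a rough sketch of how the per-segment command above could be generated for every segment: the helper name encode_cmd and the sample paths below are made up for illustration, and file names are assumed to contain no spaces. It only prints the commands, so it can be checked as a dry run before anything is encoded.

```shell
# Hypothetical helper: print the encode command for one input segment.
# The output name is derived by swapping the "input_part_" prefix.
encode_cmd() {
  src=$1
  dst=$(printf '%s\n' "$src" | sed 's/input_part_/output_part_/')
  printf 'ffmpeg -y -hide_banner -i %s -c:v libx265 -c:a eac3 %s\n' "$src" "$dst"
}

# Dry run: print one command per segment (paths are illustrative).
for f in /tmp/input_part_000000.mov /tmp/input_part_000001.mov; do
  encode_cmd "$f"
done

# To actually run them, the same lines could be piped into GNU parallel,
# which executes each input line as a command when no command is given:
#   for f in /tmp/input_part_*.mov; do encode_cmd "$f"; done | parallel
```

Printing the commands first keeps the naming logic separate from the job distribution, so the same list can be handed to whatever parallel setup you already have.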

> # step 4: create a ffconcat file for the output file
> 
> for f in /tmp/output_part_*.mp4; do echo "file '$f'" >>/tmp/output_part.list; done

Having "ffconcat version 1.0" as the first line of the ffconcat file seems to help. You should probably just use the generated ffconcat segment list as the template:

sed 's/input/output/g' /tmp/input_part.list > /tmp/output_part.list
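
A small self-contained sketch of that substitution, run against a stand-in list in a temporary directory. The list contents are illustrative, not copied from a real run. The pattern is narrowed to the input_part_ prefix here, on the assumption that a plain s/input/output/g could also rewrite an unrelated "input" elsewhere in a path.

```shell
# Stand-in for /tmp/input_part.list (contents are illustrative).
tmp=$(mktemp -d)
cat > "$tmp/input_part.list" <<'EOF'
ffconcat version 1.0
file input_part_000000.mov
file input_part_000001.mov
EOF

# Rewrite only the segment prefix; the header line is left untouched.
sed 's/input_part_/output_part_/' "$tmp/input_part.list" > "$tmp/output_part.list"

# Sanity checks before concatenating: header present, entry count unchanged.
first_line=$(head -n 1 "$tmp/output_part.list")
n_in=$(grep -c '^file ' "$tmp/input_part.list")
n_out=$(grep -c '^file ' "$tmp/output_part.list")
echo "header: $first_line"
echo "entries: $n_in in, $n_out out"
```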

> # step 5: create output file
> 
> ffmpeg -y -hide_banner -safe 0 -f concat -i /tmp/output_part.list -c:v copy -c:a copy -c:s copy -map v? -map a? -map s? /tmp/output.mov

And putting it all together should be the same.

> Do you have an explanation or do you know how this audio artefacts can be solved? Can it be that it's just an issue with codec timebases or because libx265 is using a variable frame rate (ffprobe of output.mov has an effective fps of 23.94 while input.avi has a constant frame rate of 25 fps)? I would very much appreciate some help.

The timebase explanation could make sense: rounding issues when segmenting, timestamps ending up unaligned, that type of thing. But I don’t think x265 does variable frame rates (not sure), and regardless, in an mp4 it’s most definitely constant. Set the frame rate during the encoding step if that’s important. For the “normal” rates you can use abbreviations (ntsc, pal, film, ntsc-film, etc.) to pass the exact rate instead of rounding the decimals.

25fps is pal
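
To illustrate why the abbreviations beat a rounded decimal, here is a small sketch. The rate_of helper is hypothetical; the fractions are the exact rates FFmpeg documents for these names. The awk line compares the number of frames in one hour at exact ntsc against a hand-rounded 29.97.

```shell
# Hypothetical lookup: exact rational rate behind each abbreviation.
rate_of() {
  case "$1" in
    pal)       echo "25/1" ;;
    film)      echo "24/1" ;;
    ntsc)      echo "30000/1001" ;;
    ntsc-film) echo "24000/1001" ;;
    *)         return 1 ;;
  esac
}

for name in pal film ntsc ntsc-film; do
  echo "$name = $(rate_of "$name")"
done

# Frames per hour: exact ntsc rate versus the rounded decimal 29.97.
awk 'BEGIN { printf "%.3f %.3f\n", 30000/1001*3600, 29.97*3600 }'

# In the encoding step the name can be passed directly as an output
# option, for example:
#   ffmpeg -i /tmp/input_part_$INDEX.mov -r pal -c:v libx265 -c:a eac3 /tmp/output_part_$INDEX.mov
```

For a 25 fps source like yours there is nothing to round, so this matters mostly for the 1001-based rates.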