[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame\n for subtitle handling

Mon Dec 13 00:03:23 EET 2021

On Sat, Dec 11, 2021 at 06:03:39PM +0000, Soft Works wrote:
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Michael
> > Niedermayer
> > Sent: Saturday, December 11, 2021 6:21 PM
> > To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame\n
> > for subtitle handling
> > 
> > On Fri, Dec 10, 2021 at 03:02:32PM +0000, Soft Works wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel
> > > > Cantarín
> > > > Sent: Thursday, December 9, 2021 10:33 PM
> > > > To: ffmpeg-devel at ffmpeg.org
> > > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame\n
> > > > for subtitle handling
> > > >
> > > > Hi there.
> > > > This is my first message to this list, so please excuse me if I
> > > > unintentionally break some rule.
> > > >
> > > > I've read the debate between Soft Works and others, and would like to
> > > > add something to it.
> > > > I don't have a deep knowledge of the libs as other people here show. My
> > > > knowledge comes from working with live streams for some years now. And I
> > > > do understand the issue about modifying a public API for some use case
> > > > under debate: I believe it's a legit line of questioning to Soft Works
> > > > patches. However, I also feel we live streaming people are often let
> > > > aside as "border case" when it comes to ffmpeg/libav usage, and this
> > > > bias is present in many subtitles/captions debates.
> > > >
> > > > I work with Digital TV signals as input, and several different target
> > > > outputs more related to live streaming (mobiles, PCs, and so on). The
> > > > target location is Latin America, and thus I need subtitles/captions for
> > > > when we use English-language audio (we speak mostly Spanish in LATAM). TV
> > > > people send you TV subtitle formats: SCTE-27, DVB subs, and so on. And
> > > > live streaming people use other subtitle formats, mostly VTT and TTML.
> > > > I've found that CEA-608 captions are the most compatible caption format,
> > > > as it's understood natively by smart tvs and other devices, as well as
> > > > non-natively by any other device using popular player-side libraries.
> > > > So, I've made my own filter for generating CEA-608 captions for live
> > > > streams, using ffmpeg with the previously available OCR filter. Tried
> > > > VTT first, but it was problematic for live-streaming packaging, and with
> > > > CEA-608 I could just ignore that part of the process.
> > > >
> > > > While doing those filters, besides the whole deal of implementing the
> > > > conversion from text to CEA-608, I struggled with stuff like this:
> > > > - the sparseness of input subtitles, leading to OOM in servers and
> > > > stalled players.
> > > > - the "libavfilter doesn't take subtitle frames" and "it's all ASS
> > > > internally" issues.
> > > > - the "captions timings vs video frame timings vs audio timings"
> > > > problems (people talk a lot about syncing subs with video frames, but
> > > > rarely against actual dialogue audio).
> > > > - other (meta)data problems, like screen positioning or text encoding.
> > > >
> > > > These are all problems Soft Works seems to have faced as well.
> > > >
> > > > But of all the problems regarding live streaming subtitles with ffmpeg
> > > > (and there are LOTS of them), the most annoying problem is always this:
> > > > almost every time someone talked about implementing subtitles in filters
> > > > (in mailing lists, in tickets, in other places like Stack Overflow,
> > > > etcetera), they always assumed input files. When people specifically
> > > > talked about live streams, their peers always reasoned with a file
> > > > mindset, and dismissed live streaming subtitles/captions as a "border case".
> > > >
> > > > Let me be clear: these are not "border case" issues, but actually appear
> > > > in the most common use cases of live streaming transcoding. They all
> > > > appear *immediately* when you try to use subtitles/captions in live
> > > > streams.
> > > >
> > > > I got here (I mean this thread) while looking for ways to fix some
> > > > issues in my setup. I was reconsidering VTT/TTML generation instead of
> > > > CEA-608 (as rendering behaves significantly differently from device to
> > > > device), and thus I was about to generate subtitle type output from some
> > > > filter, was about to create my own standalone "heartbeat" filter to
> > > > normalize the sparseness, and so on and so on: again, all stuff Soft
> > > > Works seems to be handling as well. So I was quite happy to find someone
> > > > working on this again; last time I've seen it in ffmpeg's
> > > > mailing/patchwork
> > > > (https://patchwork.ffmpeg.org/project/ffmpeg/patch/20161102220934.26010-1-u@pkh.me)
> > > > the code there seemed to die, and I was already late to say anything
> > > > about it. However, reading the other devs' reaction to Soft Works' work
> > > > was worrying, as it felt as if history wanted to repeat itself (take a look
> > > > at discussions back then).
> > > >
> > > > It has been years so far of this situation. This time I wanted to
> > > > annotate this, as this conversation is still warm, in order to help Soft
> > > > Works's code survive. So, dear devs: I love and respect your work, and
> > > > your opinion is very important to me. I do not claim to know better than
> > > > you do ffmpeg's code. I do not claim to know better what to do with
> > > > libavfilter's API. Please understand: I'm not here to be right, but to
> > > > note my point of view. I'm not better than you; quite on the contrary
> > > > most likely. But I also need to solve some very real problems, and can't
> > > > wait until everything else is in wonderful shape to do it. I can't also
> > > > add lots of conditions in order to just fix the most immediate issues;
> > > > like it's the case with sparseness and heartbeat frames, which was a
> > > > heated debate years ago and seems to still be one, while I find it to be
> > > > the most obvious common sense backwards-compatible solution
> > > > implementation. Stuff like "clean" or "well designed" can't be more
> > > > important than actually working use cases while not breaking previously
> > > > implemented ones: because it's far easier to fix little blocks of "bad"
> > > > code rather than design something everybody's happy with (and history of
> > > > the project seems to be quite eloquent about that, especially when it
> > > > comes to these particular use cases). Also, I have my own patches (which
> > > > I would like to upstream some day), and I can tell the API does change
> > > > quite routinely: I understand that should be a curated process, but
> > > > adding a single property for live-streaming subtitles isn't also
> > > > anybody's death, and thus that shouldn't be the kind of issues that
> > > > blocks big and important code implementations like the ones Soft Works
> > > > is working on; I just don't have the time to do myself all that work
> > > > he/she's doing, and it could be another bunch of years until someone
> > > > else has it. I can't tell if Soft Works' code is good enough for you, or
> > > > if the ideas behind it are the best there are, but I can tell you the
> > > > implementations are on the right track: as a live streaming worker, I
> > > > know the problems he/she mentions in their exchanges with you all, and I
> > > > can tell you they're all blocking issues when dealing with live
> > > > streaming. Soft Works is not "forcing it" into the API, and these are not
> > > > "border cases" but normal and frequent live streaming issues. So,
> > > > please, if you don't have the time Soft Works has, or the will to
> > > > tackle the issues he/she's tackling, I beg you at least don't kill the
> > > > code this time if it does not break working use cases.
> > > >
> > > >
> > > > Thanks,
> > > > Daniel.
> > >
> > > Hi Daniel,
> > >
> > > thanks a lot for your kind words. I'm a "He-Man", and if I could turn
> > > back time, I would have used my real name. Yet I started off as softworkz
> > > and I can't change anymore without compromising the pseudonym.
> > >
> > > As you have realized, the ML can be a pool of sharks at times,
> > > everybody following different motivations, sometimes personal, sometimes
> > > commercial, you'll hardly ever know. From my side, I have benefitted
> > > a lot from ffmpeg and it has always been a plan to contribute something
> > > in return, with the subtitles subject finally being chosen.
> > > The conclusion is that I have spent more time on ML interaction than
> > > on the development itself, so it hasn't really been an economically
> > > effective kind of work load.
> > > Nonetheless, I have patiently applied all requested changes going over
> > > many iterations so far.
> > > Of the remaining change requests, there's a major one that I'm declining
> > > to make (the duality of the frame.pts and frame.subtitle_pts fields), and I don't
> > > know whether I haven't explained the requirement for that duality
> > > sufficiently well, or whether no attempt was made to understand it and it was
> > > just blindly objected to as being a "gray" spot regarding the frame API.
> > > The duality doesn't serve just edge cases, it is an important element
> > > of the heartbeat mechanisms for dealing with sparse subtitles and also
> > > important to retain muxing offsets (often subtitles are muxed a few
> > > seconds ahead of time).
> > 
> > > The other point that I'm declining to change is the time bases of the
> > > involved fields. I have projected the existing subtitles functionality
> > > to the new API in a direct and transparent way, to achieve a high
> > > level of compatibility and stability for the transition.
> > > Being able to use the result as an instant replacement in production
> > > scenarios is a top-level requirement from my side and I cannot take
> > > the risk of needing to fix regressions all over the place which
> > > would be introduced by a change like making those fields adhere
> > > to a non-constant time-base.
> > 
> > This sounds a bit like you expect the majority of cases not to
> > change? I am asking because
> > most cases I tried do change with the part of the patchset which
> > cleanly applies. In fact, about half of the changes are the failure I already
> > posted previously. I think you said it's an issue elsewhere. Still, that needs
> > to be fixed before this patchset can be used as an
> > "instant replacement in production scenarios".
> 
> You had posted two cases that were failing. 
> 
> 1. > ./ffmpeg -i ~/tickets/1332/Starship_Troopers.vob -scodec xsub -qscale 2 -an file1332.avi
> 
> ==> Fixed since V18
> 
> 
> 2. > This breaks:
> > ./ffmpeg -i ~/tickets/153/bbc_small.ts -filter_complex '[0:v][0:s]overlay' -qscale 2 -t 3 -y file.avi
> 
> ==> It wasn't actually a regression. It was a bug in dvbsubdec that just got
> covered up earlier by some sub2video hacks.
> 
> I have submitted this fix for the error:
> 
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/DM8P223MB03655DEE6FF0228743117178BA6A9@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM/

That doesn't fix it.

The v23_plus set still fails:

./ffmpeg -ss 20 -i dvbsubtest.ts  -filter_complex "[0:v][0:s]overlay[v]" -map '[v]' -map 0:a -acodec copy -vcodec mpeg4 -t 5 -bitexact /tmp/file.avi

Input #0, mpegts, from 'dvbsubtest.ts':
  Duration: 00:00:34.64, start: 79677.098467, bitrate: 4842 kb/s
  Program 1 
  Stream #0:0[0x1901](eng): Video: mpeg2video (Main) ([2][0][0][0] / 0x0002), yuv420p(tv, top first), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn
    Side data:
      cpb: bitrate max/min/avg: 15000000/0/0 buffer size: 1835008 vbv_delay: N/A
  Stream #0:1[0x19a1](eng): Audio: mp2 ([4][0][0][0] / 0x0004), 48000 Hz, stereo, s16p, 256 kb/s
  Stream #0:2[0x19b1](eng): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006)
Stream mapping:
  Stream #0:0 (mpeg2video) -> overlay
  Stream #0:2 (dvbsub) -> overlay
  overlay:default -> Stream #0:0 (mpeg4)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
subtitle input filter: decoding size 0x0
Auto-inserting graphicsub2video filter
[swscaler @ 0x5608969b4940] Value 0.000000 for parameter 'srcw' out of range [1 - 2.14748e+09]
[swscaler @ 0x5608969b4940] Value 0.000000 for parameter 'srch' out of range [1 - 2.14748e+09]
[swscaler @ 0x5608969b4940] Value 0.000000 for parameter 'dstw' out of range [1 - 2.14748e+09]
[swscaler @ 0x5608969b4940] Value 0.000000 for parameter 'dsth' out of range [1 - 2.14748e+09]
[graphicsub2video @ 0x560896bd7540] [IMGUTILS @ 0x7fff37fbd6d0] Picture size 0x0 is invalid
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

Same failure with a different input:

./ffmpeg -i ~/tickets/4062/negative_pts_sub.ts -copyts -filter_complex '[0:0][0:3]overlay=shortest=1[outv0]' -map 0:1 -map '[outv0]' -bitexact /tmp/sadlybroken.ts

Input #0, mpegts, from 'tickets//4062/negative_pts_sub.ts':
  Duration: 00:00:04.89, start: -47.631967, bitrate: 6154 kb/s
  Program 1601 
    Metadata:
      service_name    : Yle TV1 HD 7M
      service_provider: Yle
  Stream #0:0[0x137]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, top first), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 50 tbr, 90k tbn
  Stream #0:1[0x3b6](fin): Audio: ac3 ([6][0][0][0] / 0x0006), 48000 Hz, stereo, fltp, 448 kb/s
  Stream #0:2[0x3b9](dut): Audio: ac3 ([6][0][0][0] / 0x0006), 48000 Hz, stereo, fltp, 192 kb/s (visual impaired) (descriptions)
  Stream #0:3[0x4cb](fin): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006)
  Stream #0:4[0x4e2](fin): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006) (hearing impaired)
  Stream #0:5[0x1450](fin): Subtitle: dvb_teletext ([6][0][0][0] / 0x0006), 492x250
Stream mapping:
  Stream #0:0 (h264) -> overlay (graph 0)
  Stream #0:3 (dvbsub) -> overlay (graph 0)
  Stream #0:1 -> #0:0 (ac3 (native) -> mp2 (native))
  overlay:default (graph 0) -> Stream #0:1 (mpeg2video)
Press [q] to stop, [?] for help
[h264 @ 0x55f22e15d840] reference picture missing during reorder
[h264 @ 0x55f22e15d840] Missing reference picture, default is 2147483647
[h264 @ 0x55f22df57d00] mmco: unref short failure
[h264 @ 0x55f22df7fb00] reference picture missing during reorder
    Last message repeated 1 times
[h264 @ 0x55f22df7fb00] Missing reference picture, default is 65774
    Last message repeated 1 times
[h264 @ 0x55f22e0cbb00] mmco: unref short failure
[h264 @ 0x55f22de22840] mmco: unref short failure
[h264 @ 0x55f22df89a00] reference picture missing during reorder
[h264 @ 0x55f22df89a00] Missing reference picture, default is 65782
[h264 @ 0x55f22e15d840] mmco: unref short failure
subtitle input filter: decoding size 0x0
Auto-inserting graphicsub2video filter
[swscaler @ 0x55f22fabad80] Value 0.000000 for parameter 'srcw' out of range [1 - 2.14748e+09]
[swscaler @ 0x55f22fabad80] Value 0.000000 for parameter 'srch' out of range [1 - 2.14748e+09]
[swscaler @ 0x55f22fabad80] Value 0.000000 for parameter 'dstw' out of range [1 - 2.14748e+09]
[swscaler @ 0x55f22fabad80] Value 0.000000 for parameter 'dsth' out of range [1 - 2.14748e+09]
[graphicsub2video @ 0x55f22fb10b40] [IMGUTILS @ 0x7fff903e52b0] Picture size 0x0 is invalid
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

and another one:

./ffmpeg -i ~/tickets/4752/dump_dvbsubtitles.mp4 -ss 5 -t 1 -filter_complex '[0:v][0:s]overlay' -bitexact /tmp/withsubtitles.ts

Input #0, mpegts, from '/home/michael/tickets//4752/dump_dvbsubtitles.mp4':
  Duration: 00:01:05.45, start: 57364.369689, bitrate: 6849 kb/s
  Program 1163 
  Stream #0:0[0xcb]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, top first), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 50 tbr, 90k tbn
  Stream #0:1[0x193](eng): Audio: ac3 ([6][0][0][0] / 0x0006), 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:2[0x25b](ara): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006)
  Stream #0:3[0x265](eng): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006)
Stream mapping:
  Stream #0:0 (h264) -> overlay (graph 0)
  Stream #0:2 (dvbsub) -> overlay (graph 0)
  overlay:default (graph 0) -> Stream #0:0 (mpeg2video)
  Stream #0:1 -> #0:1 (ac3 (native) -> mp2 (native))
Press [q] to stop, [?] for help
[h264 @ 0x5561c71b23c0] reference picture missing during reorder
[h264 @ 0x5561c71b23c0] Missing reference picture, default is 2147483647
[h264 @ 0x5561c7383340] mmco: unref short failure
    Last message repeated 1 times
[h264 @ 0x5561c7383340] number of reference frames (0+4) exceeds max (3; probably corrupt input), discarding one
[h264 @ 0x5561c7383340] reference picture missing during reorder
[h264 @ 0x5561c7383340] Missing reference picture, default is 66008
subtitle input filter: decoding size 0x0
Auto-inserting graphicsub2video filter
[swscaler @ 0x5561c87a0e80] Value 0.000000 for parameter 'srcw' out of range [1 - 2.14748e+09]
[swscaler @ 0x5561c87a0e80] Value 0.000000 for parameter 'srch' out of range [1 - 2.14748e+09]
[swscaler @ 0x5561c87a0e80] Value 0.000000 for parameter 'dstw' out of range [1 - 2.14748e+09]
[swscaler @ 0x5561c87a0e80] Value 0.000000 for parameter 'dsth' out of range [1 - 2.14748e+09]
[graphicsub2video @ 0x5561c7194340] [IMGUTILS @ 0x7ffce3e4ef80] Picture size 0x0 is invalid
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!
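
All three logs above share the same signature: the subtitle input filter
reports a decoding size of 0x0, and graphicsub2video then rejects the picture
size. A minimal sketch for flagging that signature in captured stderr when
checking further samples; the helper name and the sample string are
hypothetical and not part of the patchset or of FATE:

```python
import re

# Patterns copied from the log lines quoted above.
# The helper name below is hypothetical, for illustration only.
SIZE_RE = re.compile(r"subtitle input filter: decoding size (\d+)x(\d+)")
INVALID_RE = re.compile(r"Picture size (\d+)x(\d+) is invalid")


def has_zero_size_failure(log_text):
    """Return True if the log shows the 0x0 subtitle-size failure."""
    # The subtitle input filter announced a zero width or height ...
    zero_decode = any(
        int(w) == 0 or int(h) == 0 for w, h in SIZE_RE.findall(log_text)
    )
    # ... and a later component rejected the picture size as invalid.
    invalid_picture = INVALID_RE.search(log_text) is not None
    return zero_decode and invalid_picture


sample = """subtitle input filter: decoding size 0x0
Auto-inserting graphicsub2video filter
[graphicsub2video @ 0x560896bd7540] [IMGUTILS @ 0x7fff37fbd6d0] Picture size 0x0 is invalid
Error reinitializing filters!
"""
print(has_zero_size_failure(sample))
```

This only matches the text of the messages; it says nothing about why the
size is 0x0 in the first place.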

Here's one that generates different output (I have not checked at all whether this is a bug, I'm just seeing that it's different):

./ffmpeg -i ~/tickets/679/oversized_pgs_subtitles.mkv -filter_complex '[0:s:1]scale=848x480,[0:v]overlay=shortest=1' -bitexact /tmp/old-pgstest.avi

-rw-r----- 1 michael michael 657582 Dez 12 22:53 /tmp/new-pgstest.avi
-rw-r----- 1 michael michael 773460 Dez 12 22:54 /tmp/old-pgstest.avi

Similar differences occur with (again, I did not debug why there is a difference):
./ffmpeg -i \[a-s\]_full_metal_panic_fumoffu_-_01_-_the_man_from_the_south_-_a_hostage_with_no_compromises__rs2_\[1080p_bd-rip\]\[BBB48A25\].mkv  -filter_complex '[0:s:1]scale=800:600' -t 15 -qscale 2 -bitexact /tmp/pgstest2.avi

./ffmpeg -i ~/tickets/2397/242_4.mkv -filter_complex '[0:v][0:s:1]overlay' -qscale 2 -bitexact /tmp/file2397.avi

./ffmpeg -f lavfi -i 'movie=/home/michael/videos/Closedcaption_rollup.ts[out0+subcc]' /tmp/rollup.srt

as well as others

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.