[FFmpeg-devel] [PATCH 2/3] textdec: Rename all generic parts from srt to text.

Wed Aug 1 20:25:24 CEST 2012

On Wed, Aug 01, 2012 at 10:53:28AM -0700, Philip Langdale wrote:
> On Wed, 1 Aug 2012 18:51:02 +0200
> Nicolas George <nicolas.george at normalesup.org> wrote:
> 
> > 
> > This is not really text -> ass. It could be called "pseudohtml_to_ass"
> 
> Sure.
>  
> > >  
> > > -static int srt_decode_frame(AVCodecContext *avctx,
> > > -                            void *data, int *got_sub_ptr, AVPacket *avpkt)
> > > +static int text_decode_frame(AVCodecContext *avctx,
> > > +                             void *data, int *got_sub_ptr, AVPacket *avpkt)
> > >  {
> > >      AVSubtitle *sub = data;
> > >      int ts_start, ts_end, x1 = -1, y1 = -1, x2 = -1, y2 = -1;
> > > @@ -220,8 +220,8 @@ static int srt_decode_frame(AVCodecContext *avctx,
> > >          ptr = read_ts(ptr, &ts_start, &ts_end, &x1, &y1, &x2, &y2);
> > >          if (!ptr)
> > >              break;
> > > -        ptr = srt_to_ass(avctx, buffer, buffer+sizeof(buffer), ptr,
> > > -                         x1, y1, x2, y2);
> > > +        ptr = text_to_ass(avctx, buffer, buffer+sizeof(buffer), ptr,
> > > +                          x1, y1, x2, y2);
> > 
> > After some thought, I am not comfortable with that. If the codec is
> > text, it should have nothing to do with ASS, especially since ASS is
> > still a mess of temporary hacks.
> 
> So, the problem here is matroska. When you put SRT into matroska, it
> gets tagged as TEXT, but all the formatting remains, and should be
> respected. You may recall the change I proposed a couple of months ago
> to identify matroska TEXT tracks as SRT, as a way to make things line
> up again. You and Clément felt that was abusive as the track is text
> in the sense of not including SRT timing information. Fair enough, hence
> I made this change.
> 
> But the pseudohtml styling is still present in the track and needs to
> be respected, so any decoder that wants to decode CODEC_ID_TEXT and
> work correctly with SRT-in-MKV must behave as written.
> 
> Either we do this, or we identify mkv text tracks as srt. I don't see
> another solution (unless you want a third decoder just for
> srt-in-mkv...)
> 

Before I comment on this, I'd like to restate a few things:

At the moment, the design for "pure" subtitle demuxers (i.e. not
video containers like mkv) is to split the text file into chunks and
*NOT* discard the timing information (the decoders just skip it).
This was done for a few reasons:

 - I was told a long time ago that demuxers should not drop arbitrary data
 - it would allow subtitle muxers to be "raw" muxers
 - if we now change that behaviour in lavf, some incompatibilities might
   occur with a different lavc version

For the first point, I don't know if that really makes sense for subtitles,
or if there is any strong reason behind it.

For the second point, it is actually an issue when dealing with -ss and
-t, and would be again if we ever do some timestamp scaling.
I think we agreed that muxers should handle the timing using the packet
info rather than parsing the packet data again.
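
To illustrate the second point: a minimal sketch, assuming the muxer is
handed a start time and a duration in milliseconds taken from the packet.
The function name write_srt_timing and its parameters are hypothetical and
are not lavf API; the only thing shown is that the timing line can be
produced without looking at the payload at all.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper (not lavf API): format a SubRip timing line from
 * packet-level timing only. pts_ms and duration_ms are assumed to be
 * non-negative and already expressed in milliseconds. */
static int write_srt_timing(char *buf, size_t size,
                            int64_t pts_ms, int64_t duration_ms)
{
    int64_t end_ms = pts_ms + duration_ms;

    /* Split each timestamp into hours, minutes, seconds, milliseconds. */
    return snprintf(buf, size,
                    "%02d:%02d:%02d,%03d --> %02d:%02d:%02d,%03d\n",
                    (int)(pts_ms / 3600000), (int)(pts_ms / 60000 % 60),
                    (int)(pts_ms / 1000 % 60), (int)(pts_ms % 1000),
                    (int)(end_ms / 3600000), (int)(end_ms / 60000 % 60),
                    (int)(end_ms / 1000 % 60), (int)(end_ms % 1000));
}
```

With this approach -ss, -t or any rescaling only has to touch the packet
timing; the text payload is written out unchanged.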

The last point is a problem: we can't decide now that lavc/srtdec
will never receive a packet with timing in it.
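
As a rough sketch of what tolerating both kinds of packets could look like,
assuming the timing line (when present) is the first line of the payload:
the helper below skips it if it is there and otherwise returns the payload
untouched. skip_srt_timing is a hypothetical name, and this is not the
read_ts() code from srtdec.

```c
#include <stdio.h>

/* Hypothetical helper, not part of lavc: if the payload starts with a
 * SubRip timing line ("HH:MM:SS,mmm --> HH:MM:SS,mmm"), return a pointer
 * to the text after that line; otherwise return the payload as-is. */
static const char *skip_srt_timing(const char *payload)
{
    int h1, m1, s1, ms1, h2, m2, s2, ms2;
    int consumed = 0;

    if (sscanf(payload, "%d:%d:%d,%d --> %d:%d:%d,%d%n",
               &h1, &m1, &s1, &ms1, &h2, &m2, &s2, &ms2, &consumed) < 8)
        return payload;   /* no timing line, e.g. a packet from Matroska */

    payload += consumed;
    /* Skip the rest of the line (optional coordinates, line ending). */
    while (*payload && *payload != '\n')
        payload++;
    if (*payload == '\n')
        payload++;
    return payload;
}
```

A decoder written this way would accept packets from an old lavf (timing
still in the payload) as well as packets from a demuxer that dropped it.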

So why am I talking about this here? Well, I think ideally the subtitle
decoders should *not* receive the timing information. This way, we would
have:

 - Matroska demuxer outputting SubRip packets the same way the SubRip
   demuxer would output SubRip packets (both without any timing
   information). Both would use CODEC_ID_SRT, and the codec will just
   honor the markup [note: we might need to put the coords in the side
   data or something].

   Maybe we could just introduce CODEC_ID_SUBRIP and deprecate
   CODEC_ID_SRT for that purpose and avoid the incompatibility I mentioned
   in the 3rd point.

 - Now, CODEC_ID_TEXT will be available for any format using text
   *without* SubRip markup (contrary to Matroska): this might be needed
   for some video codec, but it means this codec will also be available
   for various subtitle formats where we don't need any markup. Right now
   MicroDVD, JacoSUB, SAMI, SubViewer etc. all have their own decoder for
   special markups, but I'm sure various subtitle formats don't have any
   markup system, and here CODEC_ID_TEXT would make sense: that cannot
   work if CODEC_ID_TEXT is considered to be "html"/SubRip markup.
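
To make the distinction between the two codecs concrete, here is a minimal
sketch of what "honoring the markup" means for the SubRip case, assuming
only <b>, <i> and <u> tags are handled. markup_to_ass is a hypothetical
name; this is not the srt_to_ass() function from the patch, which is not
reproduced here. A plain CODEC_ID_TEXT decoder, by contrast, would pass the
same bytes through without interpreting them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: map <b>, <i>, <u> and their closing tags to ASS
 * override tags ({\b1}, {\b0}, ...) and copy everything else verbatim.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int markup_to_ass(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (!out_size)
        return -1;

    while (*in) {
        char piece[5];
        size_t len;
        int closing = in[0] == '<' && in[1] == '/';
        const char *tag = in + 1 + closing;

        if (in[0] == '<' &&
            (tag[0] == 'b' || tag[0] == 'i' || tag[0] == 'u') &&
            tag[1] == '>') {
            piece[0] = '{';
            piece[1] = '\\';
            piece[2] = tag[0];
            piece[3] = closing ? '0' : '1';
            piece[4] = '}';
            len = 5;
            in  = tag + 2;        /* skip past the closing '>' */
        } else {
            piece[0] = *in++;     /* ordinary character: copy as-is */
            len = 1;
        }

        if (n + len >= out_size)  /* keep room for the terminating NUL */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Text without any of these tags comes out unchanged, which is the behaviour
a markup-free format would want from CODEC_ID_TEXT.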

If we decide to make the demuxers drop the timestamping, I will gladly
update jacosub/sami/subviewer/etc demuxers since I don't think anyone is
using them yet.

BTW, you might have noticed I don't much like this tight link between
SRT/SubRip and TEXT, which exists only because Matroska happened to play
on the confusion.

-- 
Clément B.