[MPlayer-dev-eng] Small changes to subreader.c file

Adam Tlałka atlka at pg.gda.pl
Tue Oct 11 23:42:57 CEST 2005


On Tue, 11 Oct 2005 11:47:15 +0200, Ivan Kalvachev
<ikalvachev at gmail.com> wrote:
> - SRT is a text format and should use the OS's native text format.
> - SRT-writing programs write it in Windows text format (all Windows ones
> and most Unix ones).
> - some hardware players/vendors require it to be in Windows format.
>
> Basically, this flamewar is a reflection of a much older war, the one over
> text files. As nobody has won it, we can keep fighting forever.
> And to make the confusion complete, I think Macs still use the
> <cr>-only convention.

Mac OS X is a more Unix-like system, but I don't know exactly which
line-ending format it uses. Previous versions used CR only.
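To tell the three conventions (LF, CR/LF, CR) apart when reading a subtitle
file, it is enough to count the terminators. A minimal C sketch, assuming the
whole file is already in memory; detect_eol and enum eol_style are
hypothetical names, not anything in subreader.c:

```c
#include <stddef.h>

/* Line-ending conventions discussed in this thread. */
enum eol_style { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Classify the line endings in buf[0..len).
 * Hypothetical helper for illustration; not part of subreader.c. */
enum eol_style detect_eol(const char *buf, size_t len)
{
    size_t lf = 0, crlf = 0, cr = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                crlf++;         /* CR LF pair: Windows/DOS */
                i++;            /* skip the LF of the pair */
            } else {
                cr++;           /* lone CR: classic Mac OS */
            }
        } else if (buf[i] == '\n') {
            lf++;               /* lone LF: Unix */
        }
    }

    int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0)
        return EOL_NONE;        /* no line breaks at all */
    if (kinds > 1)
        return EOL_MIXED;       /* more than one convention in one file */
    if (crlf)
        return EOL_CRLF;
    return lf ? EOL_LF : EOL_CR;
}
```

For parsing alone the style rarely matters, since all three forms can be
normalized to one; a check like this is mostly useful for diagnostics.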

> Anyway, I have sent mails to the SubRip author(s) requesting
> specifications or comments.
> I have no doubt about the answer, as there is explicit <lf><cr> code in
> the SubRip program (as shown above).

OK.
If we are talking about standards: CR/LF text files are widely used for text
data exchange between different systems, and this is stated in many documents,
for example:
In RFC 2646 ("The Text/Plain Format Parameter"), section 3, "The Problem":

    The Text/Plain media type is the lowest common denominator of
    Internet email, with lines of no more than 997 characters (by
    convention usually no more than 80), and where the CRLF sequence
    represents a line break [MIME-IMT].

In RFC 2854 ("The 'text/html' Media Type"), section 4, "Encoding considerations":

    As with all MIME text subtypes, the canonical form of "text/html"
    must always represent a line break as a sequence of a CR byte value
    (0x0D) followed by an LF (0x0A) byte value.  Similarly, any
    occurrence of such a CRLF sequence in "text/html" must represent a
    line break.  Use of CR byte values and LF byte values outside of line
    break sequences is also forbidden. This rule applies regardless of
    the character encoding ('charset') involved.
It's only a proposal ;-)
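Checking for the canonical MIME text form quoted from RFC 2854 only requires
looking for bare CR or bare LF bytes. A minimal C sketch; is_canonical_crlf is
a hypothetical helper, not an MPlayer function:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is in canonical CRLF form: every CR is
 * immediately followed by LF and every LF is immediately preceded by CR,
 * i.e. no bare CR or bare LF bytes occur anywhere.
 * Hypothetical helper name, for illustration only. */
bool is_canonical_crlf(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && (i + 1 >= len || buf[i + 1] != '\n'))
            return false;       /* bare CR */
        if (buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
            return false;       /* bare LF */
    }
    return true;
}
```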

In the XML specification, Extensible Markup Language (XML) 1.0 (Third
Edition), section 2.11, "End-of-Line Handling":

    XML parsed entities are often stored in computer files which, for
    editing convenience, are organized into lines. These lines are
    typically separated by some combination of the characters CARRIAGE
    RETURN (#xD) and LINE FEED (#xA).

    To simplify the tasks of applications, the XML processor MUST behave
    as if it normalized all line breaks in external parsed entities
    (including the document entity) on input, before parsing, by
    translating both the two-character sequence #xD #xA and any #xD that
    is not followed by #xA to a single #xA character.
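The XML rule maps directly onto a single in-place pass over a buffer. A
minimal C sketch, assuming the data is already loaded into a writable buffer;
normalize_eol is a hypothetical name, not the code of the patch:

```c
#include <stddef.h>

/* Normalize line breaks in place as XML 1.0 section 2.11 describes:
 * CR LF -> LF, and any CR not followed by LF -> LF.
 * Returns the new length, which is never larger than len.
 * Hypothetical helper; a sketch of the technique, not MPlayer code. */
size_t normalize_eol(char *buf, size_t len)
{
    size_t out = 0;   /* write position; always <= read position i */

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;            /* swallow the LF of a CR LF pair */
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

Because the output never grows, the same buffer can be reused. If the file is
read in chunks instead, a CR at the very end of one chunk has to be held back
until the next chunk shows whether an LF follows it.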

My goal is to make the subtitle output format as compatible as possible.
If I write subtitles to a CD-R, I just want a file that is usable with any
player, hardware or software, regardless of the OS. There is no RFC for the
subtitle file format, of course, and nobody wrote down which line ending it
has to use; the format appeared on the Windows platform, in the Delphi
(Pascal) environment, where CR/LF was the standard.
So I vote for writing CR/LF files, and for parsing them using the XML
line-handling method.
Patch in progress.
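On the writing side, producing CR/LF just means terminating every line with
"\r\n" explicitly. A minimal C sketch of writing one SRT entry, assuming the
usual index / timing / text / blank-line layout and text lines without their
own terminators; srt_write_entry and srt_time are hypothetical helpers, not
the actual MPlayer dumping code:

```c
#include <stdio.h>

/* Print milliseconds as the SRT timestamp HH:MM:SS,mmm. */
static void srt_time(FILE *f, long ms)
{
    fprintf(f, "%02ld:%02ld:%02ld,%03ld",
            ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
}

/* Write one SRT entry with explicit CR LF line endings.
 * lines[] holds nlines text lines without line terminators.
 * Returns 0 on success, -1 if the stream reports an error.
 * Hypothetical helper; not the actual MPlayer code. */
int srt_write_entry(FILE *f, int index, long start_ms, long end_ms,
                    const char *const *lines, int nlines)
{
    fprintf(f, "%d\r\n", index);            /* entry number */
    srt_time(f, start_ms);
    fputs(" --> ", f);
    srt_time(f, end_ms);
    fputs("\r\n", f);                       /* end of timing line */
    for (int i = 0; i < nlines; i++)
        fprintf(f, "%s\r\n", lines[i]);     /* subtitle text */
    fputs("\r\n", f);                       /* blank line ends the entry */
    return ferror(f) ? -1 : 0;
}
```

The output file should be opened in binary mode ("wb"): in text mode on
Windows the C library translates each "\n" into "\r\n", so every line would
end up terminated by "\r\r\n".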

Best regards,
-- 
Adam Tlałka      mailto:atlka at pg.gda.pl    ^v^ ^v^ ^v^
System  & Network Administration Group           ~~~~~~
Computer Center,  Gdańsk University of Technology, Poland
PGP public key:   finger atlka at sunrise.pg.gda.pl
