[MPlayer-dev-eng] [PATCH] ftp charset selection support

Alban Bedel albeu at free.fr
Sun Apr 13 15:12:59 CEST 2008


On Sat, 12 Apr 2008 22:29:14 +0400
Andrew Savchenko <Bircoph at list.ru> wrote:

> Hi,
> 
> On Saturday 12 April 2008 14:30, Alban Bedel wrote:
> > Andrew Savchenko <Bircoph at list.ru> wrote:
> > Generally I'm OK with the idea of this patch, but we might want
> > to also take a look at the URL/filename encoding problem in
> > general. The problem have 2 side, source encoding and
> > destination encoding. As source we currently have:
> >
> > * command line: the terminal encoding will do
> > * slave command: terminal encoding or fs encoding?
> 
> IMHO if mplayer reads slave commands from stdin, it should use 
> terminal encoding. If it reads them from file it should use FS 
> encoding. Maybe it will be better to introduce some input_charset 
> internal variable to make input-encoding-dependent functions to be 
> independent of type of input: where we use -slave from file or 
> normal stdin, or from command line. Perhaps it is worthy to make 
> this variable configurable by the user.

Yeah, the encoding used for slave commands should be configurable.
The problem is the default, I dunno if it would be too wise to have
different default depending on the fd type. It might lead to quiet some
confusions.

> > * playlist: some format/transport protocol might specify it,
> >             fallback on local fs encoding?
> 
> Perhaps in case of transport protocol it will be reasonable to use 
> server's encoding as fallback, and for local playlist use local 
> encoding.

That what I was thinking, although again we probably want to have the
"local encoding" configurable.

> Also, ENCA seems to be a good choice as fallback for this and 
> previous issue. Currently it is used in subtitle routines and 
> saves me a lot of time. Only Russian subtitles I own are present 
> in 5 different encodings and it would be a pain to deal with them 
> if not ENCA. Of course this approach in not ideal and may fail, 
> but this is better than nothing and seems to work perfectly in the 
> most of cases.

The problem is that most path are relatively short, it make guessing a
lot harder.

> > On the destination side only the stream itself can tell what is
> > needed. Using a common format, like UTF-8, would be tempting;
> 
> Are you talking about network streams or about an internal streams 
> inside mplayer? Most likely that network streams are the subject. 
> I dunno exactly, but I'm not sure that all type of streams support 
> encoding information and all network servers set it properly.

I'm talking about the MPlayer stream types. For ex. the local fs stream
need a path in the local encoding, the ftp stream need to use the
remote server encoding, some protocol might mandate UTF-8, etc

> > but AFAIK back and forth conversion is not reliable with all
> > encoding.
> 
> At least it seems to be reliable for cyrillic and CJK charsets, I 
> can't tell about the rest.

AFAIK CJK are the problem, bcs with the Han unification different chars
in the legacy encodings give the same unicode char. Obviously you can't
then convert back from unicode to legacy without some loss.

> > The best would then probably be to simply pass the 
> > encoding along with the URL/filename when opening the stream.
> > Suggestions welcome :)
> 
> And why not introduce new command line option? Inventing new URL 
> schemes seems not good from my standpoint, but this may be 
> implemented optionally. And do not forget about ENCA. We can use 
> the same approach as in subcp: 
> enca[:fallback language[:fallbach charset]].

That's what I meant. To be more precise I thought about adding an extra
parameter to open_stream() to indicate the filename encoding.

> > > +  //recode filename if needed
> > > +  if (ftp_charset) {
> > > +    mp_msg(MSGT_OPEN, MSGL_V, "[ftp] recoding filename %s
> > > from %s to %s\n", +            p->filename,
> > > get_term_charset(), ftp_charset); +
> > > +    iconv_t iconv_filename = (iconv_t)(-1);
> > > +    /* There is no guarantee that filename_recode_common
> > > wouldn't +     * be called during playback. Thus buffer must
> > > be copied to +     * p->filename with appropriate memory
> > > reallocation. */ +    char *buf =
> > > filename_recode_common(iconv_filename, +           
> > > p->filename, ftp_charset, get_term_charset());
> >
> > Don't use declarations in middle of code blocks.
> 
> Will be enough to place declaration just after "{" clause after 
> "if" statement?

Sure.
 
> > > +    size_t len = strlen(buf) + 1;
> > > +    p->filename = realloc(p->filename, len);
> > > +    memcpy(p->filename, buf, len);
> > > +  }
> > > +#endif /* USE_ICONV */
> >
> > This is leaking memory.
> 
> Why?! This is realloc(), not malloc(). It will reuse (or free() and 
> malloc()) memory of p->filename. At the end, close_f() will free 
> this memory inside m_struct_free() call in the same way as it will 
> be freed in the case of original p->filename pointer. From the 
> point of m_option_free() it should not be any difference between 
> handling the old and the new pointer.

The iconv context is not freed AFAICT.

	Albeu




More information about the MPlayer-dev-eng mailing list