[MPlayer-dev-eng] [PATCH] ftp charset selection support

Alban Bedel albeu at free.fr
Sat Apr 12 12:30:33 CEST 2008


On Sat, 12 Apr 2008 05:45:59 +0400
Andrew Savchenko <Bircoph at list.ru> wrote:

> Hello,
> 
> Currently there are a lot of ftp servers with different charsets 
> and it is a problem to work with these hosts if local encoding 
> differs, only for Russian there are over 10 different encoding and 
> 3 of them are widely used. So I made a patch which allows to 
> select remote server's encoding and recode URL into it.

Just out of interest, how is this problem handled in HTML? Is it
needed to write the document in the encoding used by the FTP server
one link to?

Generally I'm OK with the idea of this patch, but we might want to
also take a look at the URL/filename encoding problem in general.
The problem have 2 side, source encoding and destination encoding.
As source we currently have:

* command line: the terminal encoding will do
* slave command: terminal encoding or fs encoding?
* playlist: some format/transport protocol might specify it,
            fallback on local fs encoding?

On the destination side only the stream itself can tell what is
needed. Using a common format, like UTF-8, would be tempting; but
AFAIK back and forth conversion is not reliable with all encoding.
The best would then probably be to simply pass the encoding along
with the URL/filename when opening the stream. Suggestions welcome :)

> I've found that function for filename recoding already exists: 
> filename_recode() in mp_msg.c, so I changed it in order to recode 
> from arbitrary initial encoding to arbitrary final one. I 
> extracted common part to filename_recode_common() and added iconv 
> error handling routines (otherwise iconv silently fails if e.g. 
> incorrect encoding is specified.). iconv_t handlers are separate 
> for each "branch" of convertios, because both strcmp() and 
> iconv_open()/iconv_close() are expensive operations to be 
> performed each time. This is what the first patch supposed to do.

> +#ifdef USE_ICONV
> +#include <iconv.h>
> +#endif /* USE_ICONV */
> +
> +/* maximum message length of mp_msg */
> +#define MSGSIZE_MAX 3072
> +
> +const char* filename_recode_common(iconv_t inv_msgiconv, 
> +    const char* filename,const char *tocode, const char *fromcode);
> +

This is going to break if iconv is unavailable. The function name could
also be better. This function is not specific to filenames and _common
say very little. mp_recode_to()?

I would suggest making this function static and adding a wrapper that
hide iconv. That would allow mp_msg to keep using a static iconv context
and make things simpler for other uses.

> -/* maximum message length of mp_msg */
> -#define MSGSIZE_MAX 3072
> -

Why do you move this define out of mp_msg.c?

> The second patch is just cosmetics required after the first patch.

Fine.

> The third patch is ftp URL recoding support itself utilizing the 
> first patch.

> +/* defined in stream_ftp.c */
> +#ifdef USE_ICONV
> +extern char *ftp_charset;
> +#endif /* USE_ICONV */

No need to put extern declarations under #ifdef.

> +#ifdef USE_ICONV
> +#include <iconv.h>
> +#include <string.h>

The need for iconv should be abstracted away.

> +#ifdef USE_ICONV

Get rid of the #ifdef.

> +  //recode filename if needed
> +  if (ftp_charset) {
> +    mp_msg(MSGT_OPEN, MSGL_V, "[ftp] recoding filename %s from %s to %s\n",
> +            p->filename, get_term_charset(), ftp_charset);
> +    
> +    iconv_t iconv_filename = (iconv_t)(-1);
> +    /* There is no guarantee that filename_recode_common wouldn't
> +     * be called during playback. Thus buffer must be copied to
> +     * p->filename with appropriate memory reallocation. */
> +    char *buf = filename_recode_common(iconv_filename, 
> +            p->filename, ftp_charset, get_term_charset());

Don't use declarations in middle of code blocks.

> +    size_t len = strlen(buf) + 1;
> +    p->filename = realloc(p->filename, len);
> +    memcpy(p->filename, buf, len);
> +  }
> +#endif /* USE_ICONV */

This is leaking memory.

> And the fourth patch is man page update describes new -ftp-charset 
> option.
> 
> Currently user must specify encoding on his/her own, but in several 
> cases it is possible to autodetect encoding:
> 1) Negotiation with server using LANG feature (rfc2640) if server 
> supports it.
> 2) Alternatively try to server's guess encoding by descendingly 
> parsing directories in URL (starting from /) in order to identify 
> encoding using ENCA.
> 
> This will be implemented later, I hope.

Patch welcome :)

	Albeu




More information about the MPlayer-dev-eng mailing list