[FFmpeg-devel] [PATCH] "Mojibake" in Japanese

compn tempn at twmi.rr.com
Mon Feb 6 18:39:08 CET 2012


On Mon, 6 Feb 2012 18:17:28 +0100, Clément Bœsch wrote:
>On Tue, Feb 07, 2012 at 02:12:15AM +0900, Tetsuya Yoshida wrote:
>> Hi Carl!
>> 
>> > Please provide a sample.
>> 
>> For example, take the character 'あ' (a Japanese multibyte character).
>> In Shift JIS it is encoded as the bytes '0x82 0xa0'.
>> In UTF-8 it is encoded as the bytes '0xe3 0x81 0x82'.
>> When '0x82 0xa0' is stored in an ID3 tag marked as ISO-8859-1,
>> libavformat converts it to UTF-8 byte by byte.
>> But '0x82 0xa0' does not correspond to 'あ' in UTF-8 (it is not even a
>> valid UTF-8 sequence), so Mojibake occurs.
>> Also, since PUT_UTF8 changes the byte length ('0x82 0xa0' becomes
>> '0xc2 0x82 0xc2 0xa0'), the output bytes match neither the Shift JIS
>> nor the UTF-8 encoding.
>> 
>> libiconv converts between character encodings.
>> In this case, '0x82 0xa0' would be converted to '0xe3 0x81 0x82',
>> so libavformat would be able to read the ID3 tags correctly.
>> 
>> I have prepared an mp3 file in which the Mojibake occurs in the title.
>> The title contains 'あ' written in Shift JIS but marked as ISO-8859-1.
>> 
>> I have also written a libiconv sample program.
>> Running it should make the problem easy to see.
>> If your computer does not have Japanese fonts installed, the characters
>> will not be displayed correctly.
>> Please run the program on your computer.
>> 
>> Line 1 of the output file is written in Shift JIS.
>> Opened as Shift JIS, line 1 displays correctly.
>> 
>> Line 2 of the output file is written in UTF-8.
>> Opened as UTF-8, line 2 displays correctly.
>> 
>> ==============================
>> $ gcc mojibake.c -liconv
>> $ ./a.out > mojibake.txt
>> 
>> $ vim -c 'e ++enc=sjis' mojibake.txt
>> ShiftJIS: あ ( 0x82 0xa0 )
>> UTF8: 縺? ( 0xe3 0x81 0x82 )
>> 
>> $ vim -c 'e ++enc=utf8' mojibake.txt
>> ShiftJIS: ?? ( 0x82 0xa0 )
>> UTF8: あ ( 0xe3 0x81 0x82 )
>> ==============================
>> 
>> Does that explain the Mojibake?
>> 
>
>By "sample", Carl meant a file so we can test your patch :-)
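The two conversion paths described in the quoted message can be compared with a small C sketch. `latin1_to_utf8()` performs the per-byte ISO-8859-1 to UTF-8 widening that the message attributes to PUT_UTF8 (it is an equivalent conversion, not FFmpeg's macro), and `sjis_to_utf8()` uses the POSIX `iconv()` API. Both function names are hypothetical, and the encoding name "SHIFT_JIS" is an assumption that holds for glibc and GNU libiconv.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: treat every input byte as an ISO-8859-1 code point
 * and encode it as UTF-8, one character per byte (the widening described
 * above). out must hold at least 2 * len bytes.
 * Returns the number of bytes written to out. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                 /* ASCII stays one byte */
        } else {
            out[n++] = 0xC0 | (in[i] >> 6);   /* lead byte: 0xC2 or 0xC3 */
            out[n++] = 0x80 | (in[i] & 0x3F); /* continuation byte */
        }
    }
    return n;
}

/* Hypothetical helper: convert Shift JIS bytes to UTF-8 with iconv(3).
 * Returns the number of bytes written, or (size_t)-1 on failure. */
size_t sjis_to_utf8(const char *in, size_t len, char *out, size_t outsize)
{
    /* Assumption: the iconv implementation accepts the name "SHIFT_JIS". */
    iconv_t cd = iconv_open("UTF-8", "SHIFT_JIS");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv() takes non-const input pointers */
    char *outp = out;
    size_t inleft = len, outleft = outsize;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return outsize - outleft; /* bytes actually produced */
}
```

With the bytes 0x82 0xa0 as input, `latin1_to_utf8()` produces 0xc2 0x82 0xc2 0xa0, while `sjis_to_utf8()` produces 0xe3 0x81 0x82, the UTF-8 encoding of 'あ'.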

we need an mp3 file with that id3 tag which shows the problem that is.
not a text file.
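A test file of the kind requested could be produced along these lines. This is a minimal sketch, assuming an ID3v2.3 tag with a single TIT2 (title) frame whose text-encoding byte is 0x00 (ISO-8859-1) but whose text is the Shift JIS bytes 0x82 0xa0. The function name `build_sjis_title_tag()` is hypothetical, and the tag would still have to be prepended to an MP3 that has no ID3v2 tag of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes an ID3v2.3 tag containing one TIT2 frame
 * that is marked as ISO-8859-1 but actually holds Shift JIS text.
 * buf must hold at least 10 + 10 + 1 + textlen bytes.
 * Returns the total tag length in bytes. */
size_t build_sjis_title_tag(const unsigned char *text, size_t textlen,
                            unsigned char *buf)
{
    size_t frame_data = 1 + textlen;    /* encoding byte + text */
    size_t tag_body = 10 + frame_data;  /* frame header + frame data */
    unsigned char *p = buf;

    /* Tag header: "ID3", version 2.3.0, no flags. */
    memcpy(p, "ID3", 3);
    p[3] = 0x03; p[4] = 0x00; p[5] = 0x00;
    /* Tag size excludes the 10-byte header and is synchsafe:
     * 4 bytes carrying 7 bits each. */
    p[6] = (tag_body >> 21) & 0x7F;
    p[7] = (tag_body >> 14) & 0x7F;
    p[8] = (tag_body >> 7) & 0x7F;
    p[9] = tag_body & 0x7F;
    p += 10;

    /* Frame header: ID, plain big-endian size (ID3v2.3), no flags. */
    memcpy(p, "TIT2", 4);
    p[4] = (frame_data >> 24) & 0xFF;
    p[5] = (frame_data >> 16) & 0xFF;
    p[6] = (frame_data >> 8) & 0xFF;
    p[7] = frame_data & 0xFF;
    p[8] = 0x00; p[9] = 0x00;
    p += 10;

    /* Frame data: encoding byte 0x00 claims ISO-8859-1 ... */
    *p++ = 0x00;
    /* ... but the text bytes are Shift JIS. */
    memcpy(p, text, textlen);
    p += textlen;

    return (size_t)(p - buf);
}
```

For the two bytes of 'あ' the function returns a 23-byte tag. Writing those bytes to a file and appending an untagged MP3 (for example `cat tag.bin plain.mp3 > sample.mp3`, where both names are placeholders) should give a file whose title exhibits the problem.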

-compn

