[FFmpeg-devel] [PATCH] "Mojibake" in Japanese
ubitux at gmail.com
Mon Feb 6 18:17:28 CET 2012
On Tue, Feb 07, 2012 at 02:12:15AM +0900, Tetsuya Yoshida wrote:
> Hi Carl!
> > Please provide a sample.
> For example, take the character 'あ' (a Japanese multibyte character).
> Its byte sequence in Shift JIS is '0x82 0xa0'.
> Its byte sequence in UTF-8 is '0xe3 0x81 0x82'.
> When '0x82 0xa0' is written as ISO-8859-1, libavformat reads it as UTF-8.
> But '0x82 0xa0' does not correspond to any character in UTF-8,
> so Mojibake occurs.
> Also, since PUT_UTF8 changes the byte length,
> the output bytes match neither the Shift JIS nor the UTF-8 encoding.
> libiconv converts between different character encodings.
> In this case, '0x82 0xa0' will be converted to '0xe3 0x81 0x82'.
> So libavformat will be able to read the ID3 tags correctly.
> I have prepared an mp3 file.
> Mojibake occurs in its title.
> The title contains 'あ' encoded in Shift JIS but written as ISO-8859-1.
> I have also written a libiconv sample program.
> You will get the picture easily if you run it.
> If your computer does not have Japanese fonts, the output will not be
> displayed correctly.
> Please run the program on your computer.
> Line 1 of the output file is written in Shift JIS.
> If you open the file as Shift JIS, line 1 is displayed correctly.
> Line 2 of the output file is written in UTF-8.
> If you open the file as UTF-8, line 2 is displayed correctly.
> $ gcc mojibake.c -liconv
> $ ./a.out > mojibake.txt
> $ vim -c 'e ++enc=sjis' mojibake.txt
> ShiftJIS: あ ( 0x82 0xa0 )
> UTF8: 縺? ( 0xe3 0x81 0x82 )
> $ vim -c 'e ++enc=utf8' mojibake.txt
> ShiftJIS: ?? ( 0x82 0xa0 )
> UTF8: あ ( 0xe3 0x81 0x82 )
> Do you understand Mojibake now?
By "sample", Carl meant a file so we can test your patch :-)
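
For readers following the byte values quoted above, here is a minimal sketch
of the two conversions being compared. It is neither the attached mojibake.c
nor libavformat's actual code. The helper names latin1_to_utf8 and
sjis_to_utf8 are made up for illustration, and the encoding name "SHIFT_JIS"
is an assumption about what the local iconv implementation accepts. On
systems where iconv is not part of libc, link with -liconv as in the quoted
gcc command.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: treat every input byte as an ISO-8859-1 code point
 * and write that code point out as UTF-8. Bytes >= 0x80 grow to two bytes.
 * Returns the number of bytes written to out. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_size)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            if (n + 1 > out_size)
                break;
            out[n++] = in[i];
        } else {
            if (n + 2 > out_size)
                break;
            out[n++] = 0xc0 | (in[i] >> 6);   /* lead byte */
            out[n++] = 0x80 | (in[i] & 0x3f); /* continuation byte */
        }
    }
    return n;
}

/* Hypothetical helper: convert Shift JIS bytes to UTF-8 with iconv.
 * Returns the number of bytes written, or (size_t)-1 on failure. */
size_t sjis_to_utf8(const char *in, size_t in_len, char *out, size_t out_size)
{
    /* Assumption: this iconv knows the name "SHIFT_JIS". */
    iconv_t cd = iconv_open("UTF-8", "SHIFT_JIS");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in; /* iconv takes a non-const input pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size;
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_size - dst_left;
}
```

Given the input '0x82 0xa0', latin1_to_utf8 produces the four bytes
'0xc2 0x82 0xc2 0xa0', which is neither the Shift JIS nor the UTF-8 form of
'あ', while sjis_to_utf8 produces '0xe3 0x81 0x82'.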