Archive

Posts Tagged ‘utf8’

FileWriter, XML and UTF-8

November 15th, 2009 RaftaMan No comments

Deep down in the Java-API:

http://java.sun.com/javase/6/docs/api/java/io/FileWriter.html

Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.

So, if you want to write your XML document to a file, for the love of god, don’t use FileWriter like this:

        BufferedWriter bufout = new BufferedWriter(new FileWriter(OUTFILE));
        bufout.write(out);
        bufout.close();

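or you might end up with an XML file whose header declares UTF-16 (encoding="UTF-16") but whose bytes are encoded completely differently. FileWriter always writes in the platform's default character encoding, whatever the XML declaration says, so the two can easily disagree.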

Instead, use:

        OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(OUTFILE), "UTF-16");
        writer.write(out);  // out is the XML string, as in the snippet above
        writer.close();
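For completeness, here is a minimal self-contained sketch of the same idea. The class name XmlFileWriter, the write helper and the file name out.xml are made up for illustration, and it uses try-with-resources, so it assumes Java 7 or later. The point is that the charset handed to OutputStreamWriter is the same one named in the XML declaration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of any library.
class XmlFileWriter {

    // Writes the XML string to the given path using an explicit charset
    // instead of the platform default that FileWriter would pick.
    public static void write(String path, String xml, String charsetName) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(path), Charset.forName(charsetName))) {
            writer.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        // One variable feeds both the XML declaration and the writer,
        // so header and actual byte encoding cannot drift apart.
        String encoding = "UTF-8";
        String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n"
                + "<note>Gr\u00fc\u00dfe</note>\n";
        write("out.xml", xml, encoding); // assumed output file name
    }
}
```

Deriving the declaration and the writer's charset from one variable is just one way to keep them in sync; if the XML string comes from elsewhere, you have to check its declaration yourself.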

Resources:
http://www.malcolmhardie.com/weblogs/angus/2004/10/23/java-filewriter-xml-and-utf-8/

Categories: Uncategorized Tags: ,

Convert filenames from iso-8859-1 to utf-8

June 28th, 2009 RaftaMan No comments

Just as you can convert entire files from one charset to another, you can convert the filenames. For example:

convmv -f iso-8859-1 -t utf-8 -r .

would recursively convert the names of all files in the current directory from the iso-8859-1 charset into utf-8. Well, not exactly. To actually rename the files you need the --notest flag. Without it, convmv only performs a dry run and changes nothing.


Convert files from iso-8859-1 to utf-8

June 26th, 2009 RaftaMan No comments

How do you convert iso-8859-1 charset files into utf-8? Simple:

iconv --from-code=ISO-8859-1 --to-code=UTF-8 oldfile > newfile

Of course, your values for --from-code and --to-code may vary. For a list of available encodings, use iconv --list.
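If you would rather do the same conversion from Java, a rough sketch could look like the following. The class name Transcode and its convert method are made up for illustration, and it assumes Java 7 or later (java.nio.file). It reads the whole file into memory, so it is only meant for reasonably small files.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
class Transcode {

    // Decodes the source bytes with the 'from' charset and writes them
    // back out encoded with the 'to' charset.
    public static void convert(Path source, Path target, Charset from, Charset to)
            throws IOException {
        String text = new String(Files.readAllBytes(source), from);
        Files.write(target, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Transcode oldfile newfile");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }
}
```

Unlike iconv, this sketch does not stop on input that is invalid in the source charset; the String constructor silently substitutes a replacement character. With ISO-8859-1 as the source that cannot happen, since every byte value is a valid character there.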
