Author Topic:   character encoding
mags
unregistered
posted March 16, 2000 09:01 AM           
How can we be sure about the encoding being used for character reading/writing? Have a look at the following question.

Which method implementations will write the given string to a file named "file", using UTF8 encoding?

IMPLEMENTATION A:
public void write(String msg) throws IOException {
FileWriter fw = new FileWriter(new File("file"));
fw.write(msg);
fw.close();
}

IMPLEMENTATION B:
public void write(String msg) throws IOException {
OutputStreamWriter osw =
new OutputStreamWriter(new FileOutputStream("file"), "UTF8");
osw.write(msg);
osw.close();
}

IMPLEMENTATION C:
public void write(String msg) throws IOException {
FileWriter fw = new FileWriter(new File("file"));
fw.setEncoding("UTF8");
fw.write(msg);
fw.close();
}

IMPLEMENTATION D:
public void write(String msg) throws IOException {
FilterWriter fw = FilterWriter(new FileWriter("file"), "UTF8");
fw.write(msg);
fw.close();
}

IMPLEMENTATION E:
public void write(String msg) throws IOException {
OutputStreamWriter osw = new OutputStreamWriter(
new OutputStream(new File("file")), "UTF8"
);
osw.write(msg);
osw.close();
}

I think readers and writers always use 16-bit Unicode for characters. For FileOutputStream, do we have to specify the encoding?
Any thoughts will be appreciated.
Thanks.

Tony Alicea
sheriff
posted March 16, 2000 11:08 AM             
FileOutputStream is a stream, so it works with bytes. In Java, it is characters that get encoded, not bytes. So a Reader or Writer of some sort has to be involved in the encoding:

OutputStreamWriter(OutputStream out, String enc)
Create an OutputStreamWriter that uses the named character encoding.
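
To make the constructor above concrete, here is a minimal sketch, assuming Java 7 or later. It passes StandardCharsets.UTF_8 instead of an encoding name string (my substitution, not something from the question), and the class name Utf8WriteDemo and the temp file are made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demo class; the name is an assumption for illustration only.
class Utf8WriteDemo {
    // Writes msg to the given file as UTF-8, regardless of the platform default.
    static void write(File file, String msg) throws IOException {
        // The Writer does the char-to-byte encoding; the stream only moves bytes.
        try (Writer osw = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            osw.write(msg);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        write(file, "caf\u00e9"); // "café"
        byte[] bytes = Files.readAllBytes(file.toPath());
        System.out.println(bytes.length); // 5: 'é' takes two bytes in UTF-8
    }
}
```

Four characters go in and five bytes come out, which is the encoding step happening inside the Writer.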

maha anna
bartender
posted March 16, 2000 12:16 PM             
IMPLEMENTATION A:

Not correct. This kind of implementation may or may not use UTF-8 encoding. Every Java platform has a default encoding, which may or may not be set to UTF-8, and Readers and Writers that convert to or from bytes use that default unless we explicitly specify a different one during construction. For example, a Chinese machine, whose input and output are in Chinese characters, would probably be set to a different default encoding.
This is what I think. Others, please add to this.
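
A quick way to see why the default matters is to encode the same character under two charsets. This is only a sketch, assuming Java 7 or later; the class name DefaultEncodingDemo and the helper encodedLength are invented for the example. Note that on JDK 18 and later the default charset is UTF-8 unless overridden, so the first line printed depends on your JDK and platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are assumptions for illustration only.
class DefaultEncodingDemo {
    // Number of bytes produced when text is encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Varies from machine to machine, which is why implementation A is unsafe.
        System.out.println("Platform default: " + Charset.defaultCharset());

        String text = "\u00e9"; // the single character 'é'
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 1
    }
}
```

The same one-character string becomes a different byte sequence depending on the charset, so a file written with an unspecified default is not guaranteed to be UTF-8.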

IMPLEMENTATION B:
This is the ONLY CORRECT implementation among the given answers. Please ignore just the words 'Not Correct' in my original answer below; the rest of it is only extra information. This was pointed out by Betty in this post

Not Correct. The byte stream classes themselves have nothing to do with the encoding scheme. Logically, all 'stream' classes read and write raw bytes. Some higher-level methods can, at most, convert these raw bytes to primitives. Since the number of bytes in a primitive is the same on all platforms, these stream classes' read/write methods work consistently on all platforms. There is no need for an encoding scheme when you read in terms of bytes. If you want to verify this, note that the basic stream classes such as FileOutputStream have no constructor that takes an encoding scheme as an argument. In implementation B, it is the OutputStreamWriter wrapped around the FileOutputStream that applies the "UTF8" encoding.
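
The point about primitives and raw bytes can be checked with DataOutputStream. A small sketch, assuming Java 7 or later; the class name RawBytesDemo and the helper intToBytes are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class; names are assumptions for illustration only.
class RawBytesDemo {
    // Serializes an int through a byte stream; no character encoding is involved.
    static byte[] intToBytes(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // always four bytes, high byte first
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = intToBytes(0x01020304);
        System.out.println(Arrays.toString(bytes)); // [1, 2, 3, 4]
    }
}
```

Nowhere in this code is an encoding named, and none is needed: the int is written as the same four bytes on every platform.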

IMPLEMENTATION C:

Not Correct. FileWriter has no method called 'setEncoding(String scheme)'.

IMPLEMENTATION D:

This is also NOT CORRECT because of the incorrect construction of the FilterWriter object. There is no FilterWriter constructor taking an encoding scheme as an argument. FilterWriter is also an abstract class, and the code is missing the 'new' keyword.

IMPLEMENTATION E:

public void write(String msg) throws IOException {
OutputStreamWriter osw = new OutputStreamWriter(
new OutputStream(new File("file")), "UTF8"
);
osw.write(msg);
osw.close();
}
At first this implementation looks fine. But if you look closely, you CANNOT instantiate the InputStream/OutputStream classes; they are abstract. If this mistake were corrected to 'new FileOutputStream(...)' or any other valid OutputStream object, then this would be CORRECT.

[This message has been edited by maha anna (edited April 06, 2000).]

mags
unregistered
posted March 16, 2000 12:46 PM           
Maha and Tony,
Thanks a bunch for your responses. Maha, your detailed explanation made the point very clear.
