Source Code encoding in UNICODE version

Started by mauser, January 01, 2006, 07:32:29 PM

mauser

I have the Unicode version of revision 1635. I write Win32 apps, and it seems that the source files are saved UTF-8 encoded, but MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What can you advise? Thanks in advance.

mauser

Also, C::B didn't show anything when I tried to open a file with Russian characters in the cp1251 charset that had been edited in another editor. It behaved as if the file were simply empty.
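As a small illustration of why a cp1251 file can trip up a loader that expects UTF-8: the bytes of Russian text in cp1251 are generally not valid UTF-8, so a strict decode fails. The Python sketch below is only an illustration and not how Code::Blocks loads files; the helper name `decode_with_fallback` and the fallback order are assumptions made for this example.

```python
def decode_with_fallback(raw: bytes, fallbacks=("cp1251",)) -> tuple:
    """Try strict UTF-8 first, then each fallback codec.

    Returns (text, codec_used). The fallback list is a hypothetical
    default chosen for this example.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for codec in fallbacks:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so the
    # file content stays visible instead of appearing empty.
    return raw.decode("latin-1"), "latin-1"
```

Calling it on `"Привет".encode("cp1251")` falls through to the cp1251 branch, because those bytes are rejected by the strict UTF-8 decode, while plain ASCII input is accepted as UTF-8 directly.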

Michael

Quote from: mauser on January 01, 2006, 07:32:29 PM
MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What can you advise? Thanks in advance.

What do you mean by "understand"? Is the problem with loading, displaying, or compiling the sources?

Michael

anonuser

UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.


Der Meister

MSVC seems to need a signature for Unicode source files. You can set this in the dialog "file->Extra save options" (or something similar to that). The required option is "UTF-8 with signature". If you save your file that way, MSVC recognizes it as a Unicode source file and works with it without problems.

But: the signature consists of two or three bytes at the beginning of the file (three for UTF-8: 0xEF 0xBB 0xBF). Some editors show them (or at least some cryptic characters for them), some don't, and some won't even open such a file. Code::Blocks opens it (at least it did the last time I tried) and seems to have no problems with the signature (it doesn't even show it). But the compilers I tested (gcc and icc, both on Linux) refused to compile such a file. They complain about invalid characters in the file. Unfortunately, MSVC (the IDE as well as the compiler) seems to need this signature to handle Unicode source files properly.
The only solution I can give you here: don't use Unicode source files if you want to use them with both MSVC and other editors/compilers. In strings you can still use Unicode characters if you write their code instead of the character itself, i.e. write '\x00E4' instead of 'ä'.
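A rough sketch of how that replacement could be automated, in Python. It rewrites every non-ASCII character as a `\uXXXX` (or `\UXXXXXXXX`) universal character name, which is the standard C99/C++ spelling, rather than the `\x` form used above; whether a given compiler (MSVC 2003 in particular) treats these the way you want is not confirmed anywhere in this thread. The function name `escape_non_ascii` is made up for this example.

```python
def escape_non_ascii(text: str) -> str:
    """Return text with each non-ASCII character replaced by a
    universal character name, leaving ASCII characters untouched."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)               # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)   # four hex digits
        else:
            parts.append("\\U%08X" % cp)   # eight hex digits
    return "".join(parts)
```

The result is pure ASCII, so the source file needs no signature and should look the same to every editor and compiler.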

mauser

Quote from: anonuser on January 01, 2006, 07:54:05 PM
UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.



Well, UTF-8 is not ASCII; it is only ASCII-compatible, in that every ASCII file is also valid UTF-8.
And I think I just need to switch to GCC, which assumes UTF-8 input by default.
Thanks

Leviathan

ASCII is a subset of UTF-8. All symbols in ASCII are present in UTF-8 with the same value.
But UTF-8 is a multi-byte encoding, so a symbol may take anywhere from 1 to 4 bytes, and obviously ASCII doesn't contain all the symbols UTF-8 does.
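To make the byte counts concrete, here is a short Python sketch that encodes a few sample characters and prints how many bytes each one takes in UTF-8. The sample characters and the helper name `utf8_bytes` are picked just for illustration.

```python
def utf8_bytes(ch: str) -> bytes:
    """Encode a single character as UTF-8."""
    return ch.encode("utf-8")


# One sample each for the 1-, 2-, 3- and 4-byte ranges.
samples = ["A", "\u00e4", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join("%02X" % b for b in encoded)
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded), hex_bytes))
```

For these four samples the lengths come out as 1, 2, 3 and 4 bytes, and the single byte for "A" is the same 0x41 it has in ASCII.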

Now, more on topic: what "Der Meister" said is absolutely correct. Windows uses UTF-16 internally, therefore its support for UTF-8 is limited. Also, it expects a BOM (byte order mark) at the beginning of a Unicode file. All other text files are assumed to be (extended) ASCII.
Unix, on the other hand, doesn't expect a BOM, so the BOM bytes at the start of the file are interpreted as ordinary symbols.

You have 2 choices: either stick to ASCII like "Der Meister" suggested, or write a (very simple) program to quickly add or remove the signature (0xEF 0xBB 0xBF) to/from files.
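Such a program could look roughly like the Python sketch below. It is a minimal version that assumes the files are already UTF-8 and small enough to read into memory; the function names (`has_bom`, `add_bom`, `strip_bom`, `convert_file`) are invented for this example.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def has_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 signature."""
    return data.startswith(BOM)


def add_bom(data: bytes) -> bytes:
    """Prepend the signature unless it is already there."""
    return data if has_bom(data) else BOM + data


def strip_bom(data: bytes) -> bytes:
    """Drop the signature if present."""
    return data[len(BOM):] if has_bom(data) else data


def convert_file(path: str, add: bool) -> bool:
    """Add (add=True) or remove (add=False) the signature in place.

    Returns True if the file was changed.
    """
    with open(path, "rb") as f:
        data = f.read()
    new_data = add_bom(data) if add else strip_bom(data)
    if new_data == data:
        return False
    with open(path, "wb") as f:
        f.write(new_data)
    return True
```

Run `convert_file(path, add=True)` before handing a file to MSVC and `convert_file(path, add=False)` before handing it to a compiler that rejects the signature. Both directions leave a file unchanged if it is already in the requested state.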

mandrav

You should try with r1648. This should be fixed (for now).
Be patient!
This bug will be fixed soon...

killerbot


mandrav

Quote from: killerbot on January 02, 2006, 06:55:56 PM
?? why for now ??

otherwise this bug can be closed.

http://sourceforge.net/tracker/index.php?func=detail&aid=1384513&group_id=126998&atid=707416

Well, because I didn't add any code to handle "strange" encodings; I just asked for no charset conversion to be done. I believe it is fixed now, but I'll wait a while until more people have tested it.

killerbot

I built it and tested on some as/* files and Scintilla's editor.cxx, and for those it worked. Let's hope there are no side effects. (WinXP SP2 system)

tiwag

It works for me now with the files that previously didn't open.
We'll see what happens in the future; I don't care too much for now ...

killerbot

BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it out on:
editor.cxx
as/* files

:-(

Lieven

mandrav

Quote from: killerbot on January 02, 2006, 09:17:59 PM
BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it out on:
editor.cxx
as/* files

:-(

Lieven

It will be fixed now that I pinpointed the error :)
Just show some patience ;)

Ceniza

Quote from: mandrav
Just show some patience ;)

Heh, you should consider adding something like that to your signature now :P