Wrong file encode detected.

Started by edison, October 15, 2014, 04:58:56 AM


edison

This is a bug that has existed for a long time.
Test code:
#include <stdio.h>

int main(void)
{
    printf("Hello World! 测试");

    return 0;
}


[attachment deleted by admin]

stahta01

I suggest posting a link to the file or attaching the file.

Also state the correct encoding and the wrong encoding value detected.

NOTE: If this is a program run-time issue search for the solution because it is NOT a CB issue.

It is posted somewhere on this board.

Tim S.
C Programmer working to learn more about C++.
On Windows 10 64 bit and Windows 11 64 bit.
--
When in doubt, read the CB WiKi FAQ. http://wiki.codeblocks.org

edison

Quote from: stahta01 on October 15, 2014, 05:36:42 AM
I suggest posting a link to the file or attaching the file.
Also state the correct encoding and the wrong encoding value detected.
NOTE: If this is a program run-time issue search for the solution because it is NOT a CB issue.
It is posted somewhere on this board.
Tim S.

?
I have uploaded a screenshot that shows Notepad++ and CB with the same file open. Notepad++ displays it correctly.
Choosing to bypass the encoding detection is not a good solution.

MortenMacFly

Quote from: edison on October 15, 2014, 04:58:56 AM
This is a bug that has existed for a long time.
Sorry, but I can't reproduce this. I created a new file "main.c" and copied/pasted your code snippet into it, and it looks exactly as it does in the forum and in Notepad.
My Settings are:
- Encoding: Windows 1252
- Use this encoding "as fallback"
- Try to detect...: OFF
- If conversion fails... : ON

However, are you sure you've saved your file in a proper file format like UTF-8?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

edison

I have created a video to demonstrate this issue:
https://vimeo.com/108988215

CB was run with default settings.

You can reproduce this problem by adding a language in the Windows Control Panel. Here it is Simplified Chinese (the code page should be Windows-936, also known as GBK or cp936).

MortenMacFly

Quote from: edison on October 15, 2014, 11:13:02 AM
I have created a video to demonstrate this issue:
I've seen this video. I am asking again:
Quote from: MortenMacFly on October 15, 2014, 08:52:39 AM
However, are you sure you've saved your file in a proper file format like UTF-8?
From your video it seems not. It is also strange that you are not being warned about that issue. Usually C::B does so.

edison

Quote from: MortenMacFly on October 16, 2014, 08:33:49 AM
From your video it seems not. It is also strange that you are not being warned about that issue. Usually C::B does so.
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854

MortenMacFly

Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. As you created a UTF-8 file without BOM and have set up windows-936 as the default encoding, that encoding will be used when opening the file. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ASCII characters.

So either use UTF-8 with BOM, or just start typing your Chinese (or whatever) text into the file. :)

edison

Quote from: MortenMacFly on October 17, 2014, 07:45:26 AM
Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. As you created a UTF-8 file without BOM and have set up windows-936 as the default encoding, that encoding will be used when opening the file. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ASCII characters.

So either use UTF-8 with BOM, or just start typing your Chinese (or whatever) text into the file. :)

But why, if I save the file with the default encoding (windows-936), does CB detect it as a different encoding? Is that normal? Why don't other editors (for example Notepad++) have this problem?

MortenMacFly

Because with the content you have in the file, there are multiple options for a valid encoding. There is no single solution. Editors handle that differently. That's why I said to enter some characters that make it easier for the detection engine to identify your language. We are using the same mechanism Mozilla uses, btw...
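
A rough sketch of why several encodings can be "valid" for the same bytes: the UTF-8 bytes of "测试" (E6 B5 8B E8 AF 95) also pass a structural check for GBK double-byte pairs. The function `looks_like_gbk` below is a hypothetical helper, not the Mozilla-derived detector that C::B uses, and the byte ranges (lead byte 0x81 to 0xFE, trail byte 0x40 to 0xFE except 0x7F) are a simplified assumption about cp936.

```c
#include <stdbool.h>
#include <stddef.h>

/* Structural check for GBK/cp936 text: ASCII bytes stand alone, anything
 * else must be a lead byte followed by a trail byte in the assumed ranges.
 * It does not verify that the pair maps to an assigned character. */
static bool looks_like_gbk(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char lead = buf[i];

        if (lead <= 0x7F) {                 /* single-byte ASCII */
            ++i;
            continue;
        }
        if (lead < 0x81 || lead > 0xFE)
            return false;                   /* not a valid lead byte */
        if (i + 1 >= len)
            return false;                   /* lead byte without trail byte */

        unsigned char trail = buf[i + 1];
        if (trail < 0x40 || trail > 0xFE || trail == 0x7F)
            return false;                   /* not a valid trail byte */
        i += 2;
    }
    return true;
}
```

Because the UTF-8 bytes pair up as E6 B5, 8B E8, AF 95, this check accepts them as GBK too. A detector therefore has to fall back on statistics or on the configured default, which is one reason editors can disagree on short files.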

MortenMacFly

...not to forget that another perfect solution is to use a file with BOM, if the target compiler supports this.
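
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file. A minimal sketch of a check for it follows. The names `has_utf8_bom` and `skip_utf8_bom` are hypothetical helpers for illustration, not C::B functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the buffer starts with the UTF-8 byte order mark EF BB BF. */
static bool has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Offset of the first content byte: 3 if a BOM is present, otherwise 0. */
static size_t skip_utf8_bom(const unsigned char *buf, size_t len)
{
    return has_utf8_bom(buf, len) ? 3 : 0;
}
```

A reader that finds the BOM can treat the file as UTF-8 without guessing. A tool that does not strip it will pass those three bytes on as if they were text at the start of the first line.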

edison

Quote from: MortenMacFly on October 21, 2014, 10:50:15 PM
...not to forget that another perfect solution is to use a file with BOM, if the target compiler supports this.

But I have encountered a problem when using UTF-8 with BOM:
There are some unreadable characters at the start of the first line. For example, the first line should be #include xxxx, but with UTF-8 with BOM it is shown as ("??")#include xxxx in the CB editor.

MortenMacFly

I don't know exactly what you are doing wrong, but it works perfectly here:

Steps:
- Create a new file
- enable to use BOM
- save as UTF-8
- close file
- re-open file
-> Result: UTF-8, regardless of whether I added only ASCII characters or the Unicode characters from your example.