
Wrong file encoding detected.

Started by edison, October 15, 2014, 04:58:56 AM


edison

This bug has existed for a long time.
Test code:
#include <stdio.h>

int main(void)
{
    printf("Hello World! 测试");

    return 0;
}


[attachment deleted by admin]

stahta01

I suggest posting a link to the file or attaching the file.

Also state the correct encoding and the wrong encoding value detected.

NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.

It is posted somewhere on this board.

Tim S.
C Programmer working to learn more about C++.
On Windows 10 64 bit and Windows 11 64 bit.
--
When in doubt, read the CB Wiki FAQ. http://wiki.codeblocks.org

edison

Quote from: stahta01 on October 15, 2014, 05:36:42 AM
I suggest posting a link to the file or attaching the file.
Also state the correct encoding and the wrong encoding value detected.
NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.
It is posted somewhere on this board.
Tim S.

I have uploaded a screenshot showing Notepad++ and CB with the same file open. Notepad++ displays it correctly.
Choosing to bypass encoding detection is not a good solution.

MortenMacFly

Quote from: edison on October 15, 2014, 04:58:56 AM
This bug has existed for a long time.
Sorry, but I can't reproduce this. I created a new file "main.c", copied and pasted your code snippet into it, and it looks exactly as it does in the forum and in Notepad...?!
My settings are:
- Encoding: Windows 1252
- Use this encoding "as fallback"
- Try to detect...: OFF
- If conversion fails...: ON

However, are you sure you've saved your file in a proper encoding such as UTF-8?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

edison

I have created a video to demonstrate this issue:
https://vimeo.com/108988215

CB was run with default settings.

You can reproduce this problem by adding a language in the Windows Control Panel; here it is Simplified Chinese (the code page should be Windows-936, also known as GBK or cp936).

MortenMacFly

Quote from: edison on October 15, 2014, 11:13:02 AM
I have created a video to demonstrate this issue:
I've seen this video. I am asking again:
Quote from: MortenMacFly on October 15, 2014, 08:52:39 AM
However, are you sure you've saved your file in a proper encoding such as UTF-8?
From your video, it seems not. It is also strange that you are not warned about that issue; C::B usually does so.

edison

Quote from: MortenMacFly on October 16, 2014, 08:33:49 AM
From your video, it seems not. It is also strange that you are not warned about that issue; C::B usually does so.
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854

MortenMacFly

Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. Since you created a UTF-8 file without a BOM and have set windows-936 as the default encoding, that encoding is used when opening the file. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ANSI (plain ASCII) characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)

edison

Quote from: MortenMacFly on October 17, 2014, 07:45:26 AM
Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. Since you created a UTF-8 file without a BOM and have set windows-936 as the default encoding, that encoding is used when opening the file. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ANSI (plain ASCII) characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)

But why, if I use the default encoding (windows-936) to save the file, does CB detect it as a different encoding? Is that normal? Why don't other editors (for example Notepad++) have this problem?

MortenMacFly

Because with the content in your file, there are multiple valid encodings; there is no single solution, and editors handle this differently. That's why I suggested entering some characters that make it easier for the detection engine to identify your language. We are using the same mechanism Mozilla uses, btw...

MortenMacFly

...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports it.

edison

Quote from: MortenMacFly on October 21, 2014, 10:50:15 PM
...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports it.

But I ran into a problem when using UTF-8 with BOM:
There are some unreadable characters in the first line (for example, the first line should be #include xxxx, but with UTF-8 with BOM it was changed to ("??")#include xxxx in the CB editor).

MortenMacFly

I don't know exactly what you are doing wrong, but it works perfectly here:

Steps:
- Create a new file
- enable the BOM option
- save as UTF-8
- close file
- re-open file
-> Result: UTF-8, no matter whether I had added ANSI or Unicode characters from your example.