Title: Wrong file encode detected.
Post by: edison on October 15, 2014, 04:58:56 AM

This bug has existed for a long time.
Test code:

#include <stdio.h>

int main(void)
{
    printf("Hello World! 测试");

    return 0;
}

[attachment deleted by admin]

Title: Re: Wrong file encode detected.
Post by: stahta01 on October 15, 2014, 05:36:42 AM

I suggest posting a link to the file or attaching the file.

Also state the correct encoding and the wrong encoding value detected.

NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.

It is posted somewhere on this board.

Tim S.

Title: Re: Wrong file encode detected.
Post by: edison on October 15, 2014, 05:48:50 AM

Quote from: stahta01 on October 15, 2014, 05:36:42 AM
I suggest posting a link to the file or attaching the file.
Also state the correct encoding and the wrong encoding value detected.
NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.
It is posted somewhere on this board.
Tim S.

?
I have uploaded a screenshot showing Notepad++ and CB with the same file open. Notepad++ displays it correctly.
Choosing to bypass encoding detection is not a good solution.

Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 15, 2014, 08:52:39 AM

Quote from: edison on October 15, 2014, 04:58:56 AM
This bug has existed for a long time.

Sorry, but I can't reproduce this. I created a new file "main.c", copied and pasted your code snippet into it, and it looks exactly as it does in the forum and in Notepad...?!
My settings are:
- Encoding: Windows 1252
- Use this encoding "as fallback"
- Try to detect...: OFF
- If conversion fails... : ON

However, are you sure you've saved your file in a proper file format like UTF-8?

Title: Re: Wrong file encode detected.
Post by: edison on October 15, 2014, 11:13:02 AM

I have created a video to demonstrate this issue:
https://vimeo.com/108988215

CB was run with default settings.

You can reproduce this problem by adding a language in the Windows Control Panel; here it is Simplified Chinese (the code page should be Windows-936, GBK, or cp936).

Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 16, 2014, 08:33:49 AM

Quote from: edison on October 15, 2014, 11:13:02 AM
I have created a video to demonstrate this issue:

I've seen this video. I am asking again:

Quote from: MortenMacFly on October 15, 2014, 08:52:39 AM
However, are you sure you've saved your file in a proper file format like UTF-8?

From your video it seems not. It is also strange that you are not warned about that issue; C::B usually does so.

Title: Re: Wrong file encode detected.
Post by: edison on October 17, 2014, 06:27:23 AM

Quote from: MortenMacFly on October 16, 2014, 08:33:49 AM
From your video it seems not. It is also strange that you are not warned about that issue; C::B usually does so.

I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854

Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 17, 2014, 07:45:26 AM

Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854

Well, what happens is perfectly OK. Since you create a UTF-8 file without a BOM and have set up windows-936 as the default encoding, that encoding is used when the file is opened. There is no way to distinguish reliably between UTF-8 and windows-936 if the file contains only ANSI (plain ASCII) characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)
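
The ambiguity described above can be illustrated with a minimal C sketch. This is not the detection code Code::Blocks or Mozilla uses; the function names are hypothetical, and both checks are purely structural (they test byte patterns only, not whether a sequence maps to an assigned character). The GBK check is also simplified: it treats every byte of 0x80 and above as the start of a two-byte sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte is below 0x80 (plain ASCII). */
bool is_ascii_only(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        if (buf[i] >= 0x80)
            return false;
    return true;
}

/* Hypothetical helper: structural UTF-8 check. Each lead byte must be
 * followed by the right number of continuation bytes (10xxxxxx).
 * Overlong forms and surrogates are NOT rejected in this sketch. */
bool looks_like_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;           /* stray continuation or invalid lead */
        if (extra > len - i - 1)
            return false;            /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= extra; ++k)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

/* Hypothetical helper: structural GBK check (simplified). A byte below
 * 0x80 stands alone; otherwise a lead byte 0x81..0xFE must be followed
 * by a trail byte 0x40..0xFE other than 0x7F. */
bool looks_like_gbk(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        if (c < 0x80) { ++i; continue; }
        if (c < 0x81 || c > 0xFE || i + 1 >= len)
            return false;
        unsigned char t = buf[i + 1];
        if (t < 0x40 || t > 0xFE || t == 0x7F)
            return false;
        i += 2;
    }
    return true;
}
```

For the bytes of "Hello World!" all three functions return true, so the content alone cannot settle the encoding. The six UTF-8 bytes of 测试 (E6 B5 8B E8 AF 95) pass both structural checks as well, which is why a detector has to fall back on heuristics or on a BOM.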

Title: Re: Wrong file encode detected.
Post by: edison on October 17, 2014, 08:28:17 AM

Quote from: MortenMacFly on October 17, 2014, 07:45:26 AM
Quote from: edison on October 17, 2014, 06:27:23 AM
I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. Since you create a UTF-8 file without a BOM and have set up windows-936 as the default encoding, that encoding is used when the file is opened. There is no way to distinguish reliably between UTF-8 and windows-936 if the file contains only ANSI (plain ASCII) characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)

But why, if I save the file with the default encoding (windows-936), does CB detect it as a different encoding? Is that normal? Why do other editors (for example Notepad++) not have this problem?

Title: Re:
Post by: MortenMacFly on October 21, 2014, 10:48:19 PM

Because with the content you have in the file, there are multiple options for a valid encoding. There is no single solution, and editors handle that differently. That's why I said to enter some characters that make it easier for the detection engine to identify your language. We are using the same mechanism Mozilla uses, btw...

Title: Re:
Post by: MortenMacFly on October 21, 2014, 10:50:15 PM

...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports this.

Title: Re:
Post by: edison on October 22, 2014, 04:58:56 AM

Quote from: MortenMacFly on October 21, 2014, 10:50:15 PM
...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports this.

But I have encountered a problem when using UTF-8 with BOM:
There are some unreadable character(s) in the first line (for example, the first line should be #include xxxx, but with UTF-8 with BOM it is shown as ("??")#include xxxx in the CB editor).

Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 22, 2014, 07:34:52 AM

I don't know what exactly you are doing wrong, but it works perfectly here:

Steps:
- Create a new file
- Enable use of the BOM
- Save as UTF-8
- Close the file
- Re-open the file
-> Result: UTF-8, regardless of whether I added ANSI or Unicode characters from your example.
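
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file. A tool that reads the file without recognising those bytes will show stray characters before the first real character, which may be what appeared in front of #include above. The following is a minimal C sketch of a BOM check, not Code::Blocks' own code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the buffer starts with the UTF-8 BOM
 * (bytes EF BB BF). */
bool has_utf8_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Hypothetical helper: number of leading bytes to skip before the
 * actual text begins (3 if a BOM is present, otherwise 0). */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    return has_utf8_bom(buf, len) ? 3 : 0;
}
```

A reader would call utf8_bom_length on the first bytes of the file and start decoding at that offset, so the BOM marks the file as UTF-8 without showing up in the text.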

Code::Blocks Forums