Spellchecker Issues

Started by Khram, March 10, 2015, 01:22:29 AM

oBFusCATed

Can someone try the ru-ru dictionary that is coming with libre office to spellcheck some of the files in the attached project on windows?

@Khram: It will be easier if you post the files for the dictionary yourself and so others can use them to debug the issue.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

MortenMacFly

Quote from: oBFusCATed on April 14, 2015, 02:16:52 AM
Can someone try the ru-ru dictionary that is coming with libre office to spellcheck some of the files in the attached project on windows?
Well I picked just one ru_RU dictionary I found and they are not correctly spell-checked. Maybe I picked the wrong one?

@Khram: What dictionary do you use exactly?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Alpha

If making a guess, the issue might be here:
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
    return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}

oBFusCATed

Quote from: Khram on May 13, 2015, 10:47:20 PM
In the new nightly build (10253) the spellchecker is not working.
Of course it is not working - no one has fixed it, because they can't reproduce it.

Please post a source file and a dictionary file that should be used to reproduce the problem.
Also (probably) post a screenshot with your regional settings.

Alatar

The dictionary is on Yandex.Disk: https://yadi.sk/d/glyHPzRKgsZkQ
A test file and a screenshot of the wrong behaviour are attached.

Here are the C::B version string and the locale settings:

Code::Blocks svn build  rev 10309 May 25 2015, 10:02:04 - wx2.8.12 (Linux, unicode) - 64 bit

alatar@al_work:~% locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

alatar@al_work:~% uname -a
Linux al_work 3.17.7-gentoo #1 SMP PREEMPT Mon Mar 30 18:24:07 MSK 2015 x86_64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz GenuineIntel GNU/Linux

oBFusCATed

@Alatar:
I've tried the hunspell binary and it cannot correctly spell-check the Russian part of the file using your dictionary.
I've tried something like

# first copy the dictionary files to /usr/share/hunspell
$ hunspell -d Russian-English  -i utf-8 /tmp/spellcheck_check.txt


I guess this is the problem:

error: unknown encoding Windows-1251: using iso88591 as fallback


Please keep in mind that hunspell uses iconv to do the conversions.
If you can reproduce the problem with hunspell in a console, then you should talk to either the hunspell devs or the vendor of your dictionary.

I'm running this test on gentoo linux.
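
Since the fallback message is about the encoding name, one quick thing to check is which encoding the dictionary's .aff file declares. Hunspell affix files name their encoding on a `SET` line. Below is a minimal sketch that extracts that name; `ReadAffEncoding` is a hypothetical helper made up for this illustration and is not part of Hunspell or Code::Blocks.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the encoding named on the first
// "SET <encoding>" line of an affix (.aff) stream, or an empty string
// if no such line is found.
std::string ReadAffEncoding(std::istream& aff)
{
    std::string line;
    bool firstLine = true;
    while (std::getline(aff, line))
    {
        // Skip a UTF-8 byte order mark at the very start of the file, if any.
        if (firstLine && line.compare(0, 3, "\xEF\xBB\xBF") == 0)
            line.erase(0, 3);
        firstLine = false;

        // Split the line on whitespace; a trailing '\r' is dropped as well.
        std::istringstream fields(line);
        std::string keyword, encoding;
        if (fields >> keyword >> encoding && keyword == "SET")
            return encoding;
    }
    return std::string();
}
```

To use it on a real dictionary, open the .aff file with `std::ifstream` and pass the stream in. The returned name can then be compared with what the console hunspell reports as unknown.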

White-Tiger

I also have problems with SpellChecker in CB r10341.
Basically it's not working at all.. the only thing that works is the "user dictionary".

I'm not even using any kind of weird language.. I only need the English spell checking to work as that's the main language used by developers.
Not sure what's wrong here though.. I don't see any errors in the CB consoles (only when I delete the th_* files as they can't be found, or if I switch to the GB dictionary because it's then loading the US one)

Not only does it highlight everything that is not in the custom dictionary, but Edit->Spelling... also doesn't provide suggestions or anything like that..
The source files I've checked with aren't even UTF-8 yet, they are still plain ASCII without special chars in them

Here are the dicts I've tried to use on my Windows machine with dictionary path set to %AppData%\codeblocks\SpellChecker : https://db.tt/pSVUEisr
Windoze 8.1 x86_64 16GiB RAM, wxWidgets-2.8x (latest,trunk), MinGW-builds (latest, posix-threads)
Code::Blocks (x86 , latest , selection length patch , build option fixes/additions , toggle comments)

oBFusCATed

@White-Tiger:
Just tried them and they work as expected in both r10333 and r10358 on linux.
Do you have any other hunspell based apps that you can try if they work correctly?

Also is there a nightly that just works with this dictionary?

White-Tiger

well... yes those dictionaries work in Miranda NG (IM)
I've also just tried the last stable of Code::Blocks, that one didn't work as well... going to boot up my XP VM now and try it there

edit:
tried my XP VM with the r10341 nightly, SpellChecker seemed to work at first.. and I've now found the reason. The problem lies in the path... "%AppData%\codeblocks\SpellChecker" by itself is fully functional, but my user name includes a special character: "é"
So as soon as there's any non-ASCII char in the path, it fails to work.
Normally I wouldn't choose such a Windows user name.. but Windows simply used my real name the moment I signed in with my Microsoft account... And so far, I haven't had a program that couldn't handle it. (and Code::Blocks works in most cases)

oBFusCATed

Interesting. I guess someone running Windows will have to debug this.

raynebc

One insight I can offer is that the traditional file I/O C functions like fopen (at least in Windows with MinGW) tend to not support file paths containing Unicode or extended ASCII characters.  It's been a huge thorn in my side for some time now.  Third party I/O functions (like the ones in the Allegro game library) can open such files with absolutely no problem.  Non cross-platform implementations like the ones in Visual Studio also probably support such file paths because I've never run into any Windows-specific application with that limitation.
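
To illustrate the point about fopen, here is a minimal sketch of a wrapper that accepts a UTF-8 path and, on Windows, converts it to UTF-16 and calls `_wfopen()` instead of `fopen()`. `OpenUtf8Path` is a hypothetical name used only for this example; it is not taken from Code::Blocks or Hunspell, and the Windows branch assumes a toolchain that declares `_wfopen()`.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <cwchar>
#include <windows.h>
#endif

// Hypothetical helper: opens a file whose path is given as UTF-8.
// On Windows both path and mode are converted to UTF-16 and passed to
// _wfopen(); elsewhere the bytes are handed to fopen() unchanged.
std::FILE* OpenUtf8Path(const std::string& utf8Path, const std::string& mode)
{
#ifdef _WIN32
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty())
            return std::wstring();
        const int srcLen = static_cast<int>(s.size());
        // First call asks for the required length, second call converts.
        const int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), srcLen, nullptr, 0);
        std::wstring w(static_cast<std::size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(), srcLen, &w[0], len);
        return w;
    };
    return _wfopen(widen(utf8Path).c_str(), widen(mode).c_str());
#else
    return std::fopen(utf8Path.c_str(), mode.c_str());
#endif
}
```

The returned `FILE*` is used and closed exactly like one from `fopen()`; a null pointer means the file could not be opened.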

White-Tiger

well.. the way Code::Blocks opens files seems to be fine... not sure if Code::Blocks uses Unicode / wchar_t / TCHAR on Windows, but the thing is that SpellChecker finds and successfully opens the dictionaries... otherwise this shouldn't work:
SpellChecker: Thesaurus files 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_GB.idx' not found!
SpellChecker: Loading 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_US.idx' instead...


So parts of SpellChecker seem to work, while others don't.

edit: My bet:
HunspellInterface.cpp:61-62: should both prefix the path with "\\?\" to let Hunspell handle UTF-8 paths on Windows.. (Windows only)
see:
Quote from: hunspell/hunspell.hxx
  /* Hunspell(aff, dic) - constructor of Hunspell class
   * input: path of affix file and dictionary file
   *
   * In WIN32 environment, use UTF-8 encoded paths started with the long path
   * prefix \\\\?\\ to handle system-independent character encoding and very
   * long path names (without the long path prefix Hunspell will use fopen()
   * with system-dependent character encoding instead of _wfopen()).
   */
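
A minimal sketch of what that prefixing could look like, assuming the dictionary and affix paths are already available as UTF-8 strings. `WithLongPathPrefix` is a hypothetical helper invented for this example, not code from HunspellInterface.cpp.

```cpp
#include <string>

// Hypothetical helper: returns the given UTF-8 path with the Windows long
// path prefix (backslash, backslash, question mark, backslash) prepended,
// unless the path already starts with it.
std::string WithLongPathPrefix(const std::string& utf8Path)
{
    static const std::string prefix = "\\\\?\\";
    if (utf8Path.compare(0, prefix.size(), prefix) == 0)
        return utf8Path;
    return prefix + utf8Path;
}
```

The results would then be passed as the affix and dictionary arguments of the Hunspell constructor on Windows builds only. Keep in mind that Windows does not normalise paths that carry this prefix, so they need to be absolute and use backslashes.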

White-Tiger

Quote from: Alpha on May 14, 2015, 04:22:37 AM
If making a guess, the issue might be here:
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
    return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}

This was actually the root of a problem I encountered after fixing my file path issue above.
I had to change the "wxIspunct(ch)" part into "(wxIspunct(ch) && ch!='\'')" because words such as "doesn't" also showed up as misspelled..
I suggest unifying the source code and using something like what is seen in HunspellInterface.cpp:130 (it uses a list of known "non-word" chars)
  wxString strDelimiters = _T(" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789");
  wxStringTokenizer tkz(strText, strDelimiters);
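
The apostrophe change described above can be sketched without wxWidgets as follows. This is an illustration under assumptions, not the actual patch: the standard `<cwctype>` classifiers stand in for `wxIsspace`/`wxIspunct`/`wxIsdigit`, and `IsWordDelimiter` and `SplitWords` are hypothetical names.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Stand-in for SpellCheckHelper::IsWhiteSpace: whitespace, punctuation and
// digits end a word, but the ASCII apostrophe is treated as part of the word.
bool IsWordDelimiter(wchar_t ch)
{
    if (ch == L'\'')
        return false;
    return std::iswspace(ch) || std::iswpunct(ch) || std::iswdigit(ch);
}

// Splits text into the words that would be handed to the spell checker.
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : text)
    {
        if (!IsWordDelimiter(ch))
        {
            current += ch;
        }
        else if (!current.empty())
        {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With this rule "doesn't" stays a single word. The trade-off is that a word wrapped in single quotes keeps its quotes, so a real fix would probably need to trim leading and trailing apostrophes before the lookup.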


I've further noticed that SpellChecker doesn't seem to handle UTF-8 at all.. at least when I try to correct the word "doesn¾" and use the suggested "doesn't", I'll end up with "doesn'txBE"
The menu item also only showed "doesn" without any visible char thereafter. (so only the first half of the UTF-8 char)

oBFusCATed

Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

stahta01

Quote from: oBFusCATed on August 06, 2015, 08:56:05 PM
Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

Maybe the wrong single quote is used?

Tim S.
C Programmer working to learn more about C++.
On Windows 10 64 bit and Windows 11 64 bit.
--
When in doubt, read the CB WiKi FAQ. http://wiki.codeblocks.org