News:

Accounts with zero posts and zero activity during the last months will be deleted periodically to fight SPAM!

Main Menu

patch : from std::map replace to hashmap

Started by Loaden, March 21, 2010, 01:21:02 PM

Previous topic - Next topic

ollydbg

Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:


stdMap 1 : 2953
stdMap 2 : 2953
stdMap 3 : 2953
stdMap 4 : 2969
stdMap 5 : 2953

---------------

hashMap 1 : 3000
hashMap 2 : 2968
hashMap 3 : 2969
hashMap 4 : 2969
hashMap 5 : 2969


with 16 map entries. the  result is listed below:

stdMap 1 : 3079
stdMap 2 : 3062
stdMap 3 : 3078
stdMap 4 : 3078
stdMap 5 : 3063

---------------

hashMap 1 : 2953
hashMap 2 : 2953
hashMap 3 : 2969
hashMap 4 : 2953
hashMap 5 : 2984


The conclusion is: We should use gnu hash wxString map.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Loaden

Quote from: ollydbg on March 22, 2010, 05:36:29 AM
Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:
Try change the hash function TO:
namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

Loaden

#17
Please test this demo, -O2:
#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
   
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
   
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

stdMap 1 : 3375
stdMap 2 : 3407
stdMap 3 : 3390
stdMap 4 : 3375
stdMap 5 : 3407

stdMap Total Time: 16954
stdMap Average Time: 3390
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.520649

Process returned 0 (0x0)   execution time : 25.219 s
Press any key to continue.


18 replacement:
#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    stdMap[_T("_STD_END2")] = _T("2}");
   
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    hashMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    hashMap[_T("_STD_END2")] = _T("2}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
   
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

stdMap 1 : 4265
stdMap 2 : 4266
stdMap 3 : 4265
stdMap 4 : 4297
stdMap 5 : 4266

stdMap Total Time: 21359
stdMap Average Time: 4271
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.619527

Process returned 0 (0x0)   execution time : 29.609 s
Press any key to continue.



Jenna

I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.

Loaden

Quote from: jens on March 22, 2010, 07:08:53 AM
I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.
OK, I think we can use unordered_map.
This file includes at least one deprecated or antiquated header which \ may be removed without further notice at a future date. Please use a \ non-deprecated interface with equivalent functionality instead. For a \ listing of replacement headers and interfaces, consult the file \ backward_warning.h. To disable this warning use -Wno-deprecated.


/* A list of valid replacements is as follows:
Use: Instead of:
<sstream>, basic_stringbuf <strstream>, strstreambuf
<sstream>, basic_istringstream <strstream>, istrstream
<sstream>, basic_ostringstream <strstream>, ostrstream
<sstream>, basic_stringstream <strstream>, strstream
<unordered_set>, unordered_set <ext/hash_set>, hash_set
<unordered_set>, unordered_multiset <ext/hash_set>, hash_multiset
<unordered_map>, unordered_map <ext/hash_set>, hash_map
<unordered_map>, unordered_multimap <ext/hash_set>, hash_multimap
<functional>, bind <functional>, binder1st
<functional>, bind <functional>, binder2nd
<functional>, bind <functional>, bind1st
<functional>, bind <functional>, bind2nd
<memory>, unique_ptr <memory>, auto_ptr
*/


Here is the demo, need -std=gnu++0x option
#include <iostream>
#include <map>
#include <unordered_map>

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
   
    std::unordered_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
   
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

stdMap 1 : 1828
stdMap 2 : 1828
stdMap 3 : 1812
stdMap 4 : 1829
stdMap 5 : 1812

stdMap Total Time: 9109
stdMap Average Time: 1821
------------------------------

hashMap 1 : 703
hashMap 2 : 703
hashMap 3 : 703
hashMap 4 : 688
hashMap 5 : 703

hashMap Total Time: 3500
hashMap Average Time: 700
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.615596

Process returned 0 (0x0)   execution time : 12.797 s
Press any key to continue.


ollydbg

@loaden, I have tested and find that there is nearly about 60% speed improvement, really cool!!!!

stdMap 1 : 2203
stdMap 2 : 2188
stdMap 3 : 2203
stdMap 4 : 2203
stdMap 5 : 2203

stdMap Total Time: 11000
stdMap Average Time: 2200
------------------------------

hashMap 1 : 860
hashMap 2 : 875
hashMap 3 : 890
hashMap 4 : 875
hashMap 5 : 891

hashMap Total Time: 4391
hashMap Average Time: 878
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.600909

Process returned 0 (0x0)   execution time : 15.500 s
Press any key to continue.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

koso

I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang  gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
unsigned long len = __h  = 128 ^ x.Len();

const wxChar* p = x.c_str();

for (long i = 1; i < len; i+=5)
            __h = __h ^ p[i];

        return size_t(__h);
    }
};
}

ollydbg

#22
Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang  gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.
en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Loaden

Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang  gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
unsigned long len = __h  = 128 ^ x.Len();

const wxChar* p = x.c_str();

for (long i = 1; i < len; i+=5)
            __h = __h ^ p[i];

        return size_t(__h);
    }
};
}


Thanks! What mean is unsigned long len = __h  = 128 ^ x.Len();
This code will crash in my demo.
I changed to :
namespace std
{
template<> struct hash<wxString>
{
    size_t operator()(const wxString& x) const
    {
        size_t len = x.length();
        unsigned long __h = 128 ^ len;
        const wxChar* p = x.c_str();
        for (size_t i = 0; i < len; i += 5)
            __h = __h ^ p[i];
        return size_t(__h);
    }
};
}


This is the lastest report:
stdMap 1 : 1812
stdMap 2 : 1797
stdMap 3 : 1828
stdMap 4 : 1797
stdMap 5 : 1812

stdMap Total Time: 9046
stdMap Average Time: 1809
------------------------------

hashMap 1 : 469
hashMap 2 : 453
hashMap 3 : 469
hashMap 4 : 468
hashMap 5 : 454

hashMap Total Time: 2313
hashMap Average Time: 462
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.74461

koso

Yes, there was typo :) Btw. it can be even better, but you should analyze, how it iss mostly used in code:

1. when you test this code with some string, which is "mapped", std::map will be slower. But once you try it with some random string, std::map can be faster (depends how well is writen hash function) - So you should find out, how often is it searching for known string, and how often for some other...

2. hash function should be faster, that string comparision used in std::map, so for example, when you try to search for string XXX, wxString comparator will soon find out, that there is no string beggining with XXX and will end -> hash function should use the same system (i.e. check, if first char is "_" and if not, then add to h some penalty value and imediately end hashing)

Loaden

Quote from: ollydbg on March 22, 2010, 09:15:58 AM
Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang  gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.
en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.
#if __GNUC_MINOR__ < 4 || (__GNUC_MINOR__ >= 4 && __GNUC_PATCHLEVEL__ < 3)
   typedef std::map<wxString, wxString> wxStringHashMap;
#else
   typedef std::unordered_map<wxString, wxString> wxStringHashMap;
#endif

oBFusCATed

unordered_map is available on every gcc >= 4.0 (I think), 4.1.2 have it for sure!
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

koso

Littlee modification. This version is slower, but should be better when searching mostly for not mapped strings. This version has lower probability for collision -> but it needs to be tweaked using full "key" set. Each mapped key should have different hash value.

namespace std
{
template<> struct hash< wxString > {
    size_t operator()( const wxString& x ) const {
unsigned long len  = x.Len();
size_t __h  = len;

const wxChar* p = x.c_str();

for (unsigned long i = 1; i < len; i+=6) {
  __h  ^= p[i];
  __h <<= 4;
}
        return __h;
    }
};
}


PS: before cycle can be added this code:

if ( p[0]!=wxChar('_') ) {
  return size_t(128 ^ __h);
}


this will rapidly decrease search time for random strings, BUT only if there are no mapped strings without "_" at the begginning. This conforms your test example, but probably wont real usage. 

ollydbg

@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

    if (c == '_' || wxIsalpha(c))
    {
        // keywords, identifiers, etc.

        // operator== is cheaper than wxIsalnum, also MoveToNextChar already includes IsEOF
        while (    ( (c == '_') || (wxIsalnum(c)) )
               &&  MoveToNextChar() )
            c = CurrentChar(); // repeat

        if (IsEOF())
            return wxEmptyString;

        needReplace = true;
        str = m_Buffer.Mid(start, m_TokenIndex - start);
    }
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

koso

Quote from: ollydbg on March 22, 2010, 12:30:20 PM
@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

So if I understand it good, you are checking all words in source code (which can potentially be macro) for replacement -> so there will be many checks, but only small amount of them will lead to success.

More interesting question is, it the list/set of macros beeing replaced is constant, or it is dependent on user config? (Is this related to CC  plugin configuration, where user can add custom replacement rules?). First case would simplify many things, but second will make it little complicated => you won`t be able to add there some "super-fast" filter functions, which will eliminate most false checks. (example of this is check for "_" at the begining -> it is very fast, but once user can add custom macros, it will be probably even slower than without it). 

Little theory:

1. searching in <map> is based on string comparision in R-B trees. Count of operation depends on height of tree, which will be soomething like log2 (number of macros) - in your case not more than 5 comparisions  of wxString.

2. <unordered_map> with theoreticaly good hash function will run hasher function, and then 0 or 1 string comparission.

So if hash function is faster than aprox. 3 or 4 wxString comparision, you should get better results with hashing. This looks like no problem, but effectivity of string comparision is depedent on string length -> if you are checking many short words,  it will be realy hard to design better hash function. And also, functions presented in this topic where more faster than good hashing function, so <unordered_map> will have to use more comparision operations...