patch : from std::map replace to hashmap

ollydbg · March 22, 2010, 05:36:29 AM

Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:

Code Select


stdMap 1 : 2953
stdMap 2 : 2953
stdMap 3 : 2953
stdMap 4 : 2969
stdMap 5 : 2953

---------------

hashMap 1 : 3000
hashMap 2 : 2968
hashMap 3 : 2969
hashMap 4 : 2969
hashMap 5 : 2969

with 16 map entries. the result is listed below:

Code Select


stdMap 1 : 3079
stdMap 2 : 3062
stdMap 3 : 3078
stdMap 4 : 3078
stdMap 5 : 3063

---------------

hashMap 1 : 2953
hashMap 2 : 2953
hashMap 3 : 2969
hashMap 4 : 2953
hashMap 5 : 2984

The conclusion is: We should use gnu hash wxString map.

Loaden · March 22, 2010, 05:53:52 AM

Quote from: ollydbg on March 22, 2010, 05:36:29 AM
Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:

Try change the hash function TO:

Code Select

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

Loaden · March 22, 2010, 06:22:11 AM

Please test this demo, -O2:

Code Select

#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code Select

stdMap 1 : 3375
stdMap 2 : 3407
stdMap 3 : 3390
stdMap 4 : 3375
stdMap 5 : 3407

stdMap Total Time: 16954
stdMap Average Time: 3390
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.520649

Process returned 0 (0x0)   execution time : 25.219 s
Press any key to continue.

18 replacement:

Code Select

#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    stdMap[_T("_STD_END2")] = _T("2}");
    
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    hashMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    hashMap[_T("_STD_END2")] = _T("2}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code Select

stdMap 1 : 4265
stdMap 2 : 4266
stdMap 3 : 4265
stdMap 4 : 4297
stdMap 5 : 4266

stdMap Total Time: 21359
stdMap Average Time: 4271
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.619527

Process returned 0 (0x0)   execution time : 29.609 s
Press any key to continue.

Jenna · March 22, 2010, 07:08:53 AM

I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.

Loaden · March 22, 2010, 07:36:15 AM

Quote from: jens on March 22, 2010, 07:08:53 AM
I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.

OK, I think we can use unordered_map.

Code Select

This file includes at least one deprecated or antiquated header which \ may be removed without further notice at a future date. Please use a \ non-deprecated interface with equivalent functionality instead. For a \ listing of replacement headers and interfaces, consult the file \ backward_warning.h. To disable this warning use -Wno-deprecated. 


/* A list of valid replacements is as follows: 
Use: Instead of: 
<sstream>, basic_stringbuf <strstream>, strstreambuf
<sstream>, basic_istringstream <strstream>, istrstream
<sstream>, basic_ostringstream <strstream>, ostrstream
<sstream>, basic_stringstream <strstream>, strstream
<unordered_set>, unordered_set <ext/hash_set>, hash_set
<unordered_set>, unordered_multiset <ext/hash_set>, hash_multiset
<unordered_map>, unordered_map <ext/hash_set>, hash_map
<unordered_map>, unordered_multimap <ext/hash_set>, hash_multimap
<functional>, bind <functional>, binder1st
<functional>, bind <functional>, binder2nd
<functional>, bind <functional>, bind1st
<functional>, bind <functional>, bind2nd
<memory>, unique_ptr <memory>, auto_ptr
*/

Here is the demo, need -std=gnu++0x option

Code Select

#include <iostream>
#include <map>
#include <unordered_map>

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    
    std::unordered_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code Select

stdMap 1 : 1828
stdMap 2 : 1828
stdMap 3 : 1812
stdMap 4 : 1829
stdMap 5 : 1812

stdMap Total Time: 9109
stdMap Average Time: 1821
------------------------------

hashMap 1 : 703
hashMap 2 : 703
hashMap 3 : 703
hashMap 4 : 688
hashMap 5 : 703

hashMap Total Time: 3500
hashMap Average Time: 700
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.615596

Process returned 0 (0x0)   execution time : 12.797 s
Press any key to continue.

ollydbg · March 22, 2010, 08:38:30 AM

@loaden, I have tested and find that there is nearly about 60% speed improvement, really cool!!!!

Code Select

stdMap 1 : 2203
stdMap 2 : 2188
stdMap 3 : 2203
stdMap 4 : 2203
stdMap 5 : 2203

stdMap Total Time: 11000
stdMap Average Time: 2200
------------------------------

hashMap 1 : 860
hashMap 2 : 875
hashMap 3 : 890
hashMap 4 : 875
hashMap 5 : 891

hashMap Total Time: 4391
hashMap Average Time: 878
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.600909

Process returned 0 (0x0)   execution time : 15.500 s
Press any key to continue.

koso · March 22, 2010, 09:08:49 AM

I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

Code Select

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
	unsigned long len = __h  = 128 ^ x.Len();
	
	const wxChar* p = x.c_str();
	
	for (long i = 1; i < len; i+=5)
            __h = __h ^ p[i];
	
        return size_t(__h);
    }
};
}

ollydbg · March 22, 2010, 09:15:58 AM

Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.

Loaden · March 22, 2010, 09:32:25 AM

Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

Code Select Expand
namespace std { template<> struct hash< wxString > { size_t operator()( const wxString& x ) const { unsigned long __h = 0; unsigned long len = __h = 128 ^ x.Len(); const wxChar* p = x.c_str(); for (long i = 1; i < len; i+=5) __h = __h ^ p[i]; return size_t(__h); } }; }

Thanks! What mean is unsigned long len = __h = 128 ^ x.Len();
This code will crash in my demo.
I changed to :

Code Select

namespace std
{
template<> struct hash<wxString>
{
    size_t operator()(const wxString& x) const
    {
        size_t len = x.length();
        unsigned long __h = 128 ^ len;
        const wxChar* p = x.c_str();
        for (size_t i = 0; i < len; i += 5)
            __h = __h ^ p[i];
        return size_t(__h);
    }
};
}

This is the lastest report:

Code Select

stdMap 1 : 1812
stdMap 2 : 1797
stdMap 3 : 1828
stdMap 4 : 1797
stdMap 5 : 1812

stdMap Total Time: 9046
stdMap Average Time: 1809
------------------------------

hashMap 1 : 469
hashMap 2 : 453
hashMap 3 : 469
hashMap 4 : 468
hashMap 5 : 454

hashMap Total Time: 2313
hashMap Average Time: 462
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.74461

koso · March 22, 2010, 09:47:15 AM

Yes, there was typo

Btw. it can be even better, but you should analyze, how it iss mostly used in code:

1. when you test this code with some string, which is "mapped", std::map will be slower. But once you try it with some random string, std::map can be faster (depends how well is writen hash function) - So you should find out, how often is it searching for known string, and how often for some other...

2. hash function should be faster, that string comparision used in std::map, so for example, when you try to search for string XXX, wxString comparator will soon find out, that there is no string beggining with XXX and will end -> hash function should use the same system (i.e. check, if first char is "_" and if not, then add to h some penalty value and imediately end hashing)

Loaden · March 22, 2010, 09:48:09 AM

Quote from: ollydbg on March 22, 2010, 09:15:58 AM
Quote from: koso on March 22, 2010, 09:08:49 AM
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.
en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.

Code Select

#if __GNUC_MINOR__ < 4 || (__GNUC_MINOR__ >= 4 && __GNUC_PATCHLEVEL__ < 3)
    typedef std::map<wxString, wxString> wxStringHashMap;
#else
    typedef std::unordered_map<wxString, wxString> wxStringHashMap;
#endif

oBFusCATed · March 22, 2010, 10:18:32 AM

unordered_map is available on every gcc >= 4.0 (I think), 4.1.2 have it for sure!

koso · March 22, 2010, 11:53:47 AM

Littlee modification. This version is slower, but should be better when searching mostly for not mapped strings. This version has lower probability for collision -> but it needs to be tweaked using full "key" set. Each mapped key should have different hash value.

Code Select

namespace std
{
template<> struct hash< wxString > {
    size_t operator()( const wxString& x ) const {
	unsigned long len  = x.Len();
	size_t __h  = len;
	
	const wxChar* p = x.c_str();
	
	for (unsigned long i = 1; i < len; i+=6) {
	  __h  ^= p[i];
	  __h <<= 4;
	}
        return __h;
    }
};
}

PS: before cycle can be added this code:

Code Select

if ( p[0]!=wxChar('_') ) {
	  return size_t(128 ^ __h);
	}

this will rapidly decrease search time for random strings, BUT only if there are no mapped strings without "_" at the begginning. This conforms your test example, but probably wont real usage.

ollydbg · March 22, 2010, 12:30:20 PM

@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

Code Select

    if (c == '_' || wxIsalpha(c))
    {
        // keywords, identifiers, etc.

        // operator== is cheaper than wxIsalnum, also MoveToNextChar already includes IsEOF
        while (    ( (c == '_') || (wxIsalnum(c)) )
               &&  MoveToNextChar() )
            c = CurrentChar(); // repeat

        if (IsEOF())
            return wxEmptyString;

        needReplace = true;
        str = m_Buffer.Mid(start, m_TokenIndex - start);
    }

koso · March 22, 2010, 03:11:01 PM

Quote from: ollydbg on March 22, 2010, 12:30:20 PM
@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

So if I understand it good, you are checking all words in source code (which can potentially be macro) for replacement -> so there will be many checks, but only small amount of them will lead to success.

More interesting question is, it the list/set of macros beeing replaced is constant, or it is dependent on user config? (Is this related to CC plugin configuration, where user can add custom replacement rules?). First case would simplify many things, but second will make it little complicated => you won`t be able to add there some "super-fast" filter functions, which will eliminate most false checks. (example of this is check for "_" at the begining -> it is very fast, but once user can add custom macros, it will be probably even slower than without it).

Little theory:

1. searching in <map> is based on string comparision in R-B trees. Count of operation depends on height of tree, which will be soomething like log₂ (number of macros) - in your case not more than 5 comparisions of wxString.

2. <unordered_map> with theoreticaly good hash function will run hasher function, and then 0 or 1 string comparission.

So if hash function is faster than aprox. 3 or 4 wxString comparision, you should get better results with hashing. This looks like no problem, but effectivity of string comparision is depedent on string length -> if you are checking many short words, it will be realy hard to design better hash function. And also, functions presented in this topic where more faster than good hashing function, so <unordered_map> will have to use more comparision operations...

Code::Blocks Forums

News:

patch : from std::map replace to hashmap

ollydbg

Loaden

Loaden

Jenna

Loaden

ollydbg

koso

ollydbg

Loaden

koso

Loaden

oBFusCATed

koso

ollydbg

koso