Hi, in the current architecture, the function ThisOrReplacement(m_Token) is only called in Tokenizer::GetToken():
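wxString Tokenizer::GetToken()
{
    // save the current state so that it can be restored (undo)
    m_UndoTokenIndex = m_TokenIndex;
    m_UndoLineNumber = m_LineNumber;
    m_UndoNestLevel  = m_NestLevel;

    if (m_PeekAvailable)
    {
        // a token was already peeked: reuse it instead of scanning again
        m_TokenIndex = m_PeekTokenIndex;
        m_LineNumber = m_PeekLineNumber;
        m_NestLevel  = m_PeekNestLevel;
        m_Token      = m_PeekToken;
    }
    else
        m_Token = DoGetToken();   // scan a fresh token from the buffer

    m_PeekAvailable = false;

    // every token, including '{' and empty strings, goes through the macro map
    return ThisOrReplacement(m_Token);
}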
To speed up this "macro replacement", I think the call should be moved into DoGetToken().
Here are the reasons:
1. ThisOrReplacement(m_Token) internally uses a wxString-to-wxString map container, so every call performs a search in this map (normally a search on a balanced BST), which takes a lot of time.
2. We can avoid calling this function in many situations, for example when m_Token is '{', wxEmptyString, or many other strings that never need macro expansion.
Any comments?
Thanks
Also, I suggest that when the Tokenizer returns a token (a wxString), it should be combined with a "type", so that the parser can use this type information during syntax analysis.
If I remember correctly, Ceniza calls this a "smart lexer" :D
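A minimal sketch of such a "smart lexer" token, assuming std::string instead of wxString. TokenKind, Token, Classify and MakeToken are hypothetical names, not part of the Code::Blocks Tokenizer; the classification simply mirrors the first-character tests a tokenizer like DoGetToken() performs.

```cpp
#include <cctype>
#include <string>

// Hypothetical token categories (the real Tokenizer returns a bare wxString).
enum class TokenKind { Identifier, Number, String, Punctuation, End };

// A token carries its text plus the kind the scanner already determined.
struct Token {
    TokenKind kind;
    std::string text;
};

// Classify a token by its first character.
TokenKind Classify(const std::string& text) {
    if (text.empty())
        return TokenKind::End;
    const unsigned char c = static_cast<unsigned char>(text[0]);
    if (c == '_' || std::isalpha(c))
        return TokenKind::Identifier;
    if (std::isdigit(c))
        return TokenKind::Number;
    if (c == '"' || c == '\'')
        return TokenKind::String;
    return TokenKind::Punctuation;
}

Token MakeToken(const std::string& text) {
    return Token{Classify(text), text};
}
```

With the kind attached, the parser (and the macro-replacement step) can test `tok.kind == TokenKind::Identifier` instead of re-inspecting the string.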
Quote from: ollydbg on June 29, 2009, 04:10:29 AM
...internally uses a wxString-to-wxString map container, so every call performs a search in this map (normally a search on a balanced BST), which takes a lot of time.
2. We can avoid calling this function in many situations, for example when m_Token is '{', wxEmptyString, or many other strings that never need macro expansion.
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
Quote from: thomas on June 29, 2009, 09:40:37 AM
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
I don't fully understand your comment :(
I mean: if DoGetToken() returns a wxString such as '{', we already know that '{' certainly doesn't need macro replacement, so we can avoid calling ThisOrReplacement('{').
And there are many strings like '{' :D
Quote from: ollydbg on June 29, 2009, 09:57:35 AM
Quote from: thomas on June 29, 2009, 09:40:37 AM
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
I don't fully understand your comment :(
I understood that your idea is to catch special cases which cannot possibly be macros, so they need not be looked up in the map<wxString,wxString>.
In other words, replace code that looks like:
return the_map.find(token);
with something like:
if(token.IsEmpty() || (token == one_constant) || (token == two_constant) || (token == three_constant))
    return token;
else
    return the_map.find(token);
My point is that maps have O(log(n)) lookup, so unless a source has 20 billion preprocessor defines, it is really nothing to worry about. For "normal" amounts, log(n) will be something like 4, maybe 5. Let's assume the worst case of 5. One "operation" is a compare and a branch.
Adding a line like the above will remove 5 operations done by the map lookup in the best case, at a cost of 1-4 additional operations (average 2). So we save 5-1 = 4 operations in the best, and 5-2 = 3 operations in the average case.
In the worst case, it will add 4 operations to the existing 5, almost doubling the work.
This scenario might still be advantageous, but it is unlikely to be a big win.
Now, you were talking of "many" cases. Let's say "many" means 10. In this case, we will do 1-10 operations (5 average) to eliminate the 5 lookups done by the map.
So, on average, we replace 5 operations with 5 operations (zero win, but more complicated code), and in the worst case we add 10 operations, tripling the amount of work done.
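To put numbers on the "operations per lookup" argument, the sketch below counts how many key comparisons std::map::find performs. It uses a hypothetical counting comparator and made-up macro names (MACRO0, MACRO1, ...). std::map is typically implemented as a red-black tree, whose height is at most 2*log2(n+1), so the count grows with log(n). The sketch can be used to check the 4-5 estimate against a realistic number of defines. Note that each counted "operation" is a full string comparison, not a single machine compare.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Comparator that counts how often std::map calls it.
struct CountingLess {
    std::size_t* counter;
    bool operator()(const std::string& a, const std::string& b) const {
        ++*counter;
        return a < b;
    }
};

// Build a map with n hypothetical macro names ("MACRO0", "MACRO1", ...) and
// return the number of key comparisons a single find() performs for `name`.
std::size_t ComparisonsForLookup(std::size_t n, const std::string& name) {
    std::size_t count = 0;
    std::map<std::string, std::string, CountingLess> macros(CountingLess{&count});
    for (std::size_t i = 0; i < n; ++i)
        macros["MACRO" + std::to_string(i)] = "replacement";
    count = 0;                  // ignore the comparisons done while inserting
    (void)macros.find(name);    // one lookup, as ThisOrReplacement() would do
    return count;
}
```

For example, `ComparisonsForLookup(4096, "{")` reports the cost of looking up a '{' token in a table of 4096 macros. The exact figure depends on the library's tree layout, so no fixed value is given here.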
Another option is to replace std::map with std::tr1::unordered_map, if it is available (gcc > 4.x.y, not sure what the minimal x is, though).
std::tr1::unordered_map is a hash map, so lookup is an O(1) operation on average.
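The following is a minimal sketch of the same lookup on a hash map. It assumes std::string keys instead of wxString and uses C++11 std::unordered_map, the standardised successor of std::tr1::unordered_map. MacroMap and this free-standing ThisOrReplacement are illustrative only, not the Code::Blocks implementation.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical macro table: macro name -> replacement text.
using MacroMap = std::unordered_map<std::string, std::string>;

// Return the replacement if `token` names a known macro, otherwise the token
// itself (the behaviour described for ThisOrReplacement in this thread).
std::string ThisOrReplacement(const MacroMap& macros, const std::string& token) {
    MacroMap::const_iterator it = macros.find(token);  // hash + bucket scan
    return it != macros.end() ? it->second : token;
}
```

The O(1) is an average. The token is still hashed in full, which costs about as much as one string comparison. The saving is that roughly log2(n) comparisons are replaced by one hash and, usually, one comparison.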
Quote from: thomas on June 30, 2009, 12:23:17 PM
with something like:
if(token.IsEmpty() || (token == one_constant) || (token == two_constant) || (token == three_constant))
return token;
else
return the_map.find(token);
Hi thomas, thanks for explaining your point in full.
But I think you still misunderstand what I mean. :(
Look at the code in DoGetToken(); I have sketched it as pseudocode below. We can add a bool variable:
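bool NeedReplacement = false;   // default: operators and the like are never macros
if (c == '_' || wxIsalpha(c))
{
    ..........
    NeedReplacement = true;     // identifier: may name a macro
}
else if (wxIsdigit(CurrentChar()))
{
    .........
    NeedReplacement = false;    // number
}
else if (CurrentChar() == '"' || CurrentChar() == '\'')
{
    ......
    NeedReplacement = false;    // string or character literal
}
else if (CurrentChar() == '(')
{
    ......
    NeedReplacement = false;    // token starting with '('
}
else
{
    .......                     // '{', ';', operators, ...
}
if (!NeedReplacement)
    return token;                       // no map lookup at all
else
    return ThisOrReplacement(token);    // only identifiers reach the map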
Because we already know what kind of token we have, we can still save quite a lot of time. :D
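A runnable sketch of the NeedReplacement idea follows. It assumes std::string and std::map in place of wxString and the real macro table. SimpleTokenizer, Lookups() and the simplified scanning rules (no string literals, single-character punctuation only) are hypothetical. The branch that recognises an identifier is the only one that reaches the map lookup.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tokenizer: decides during scanning whether a token needs a macro lookup.
class SimpleTokenizer {
public:
    SimpleTokenizer(const std::string& src, const std::map<std::string, std::string>& macros)
        : m_Src(src), m_Pos(0), m_Macros(macros), m_Lookups(0) {}

    // Returns the next token after macro replacement, or "" at end of input.
    std::string GetToken() {
        while (m_Pos < m_Src.size() && std::isspace(static_cast<unsigned char>(m_Src[m_Pos])))
            ++m_Pos;
        if (m_Pos >= m_Src.size())
            return std::string();

        const std::size_t start = m_Pos;
        const unsigned char c = static_cast<unsigned char>(m_Src[m_Pos]);
        bool needReplacement = false;

        if (c == '_' || std::isalpha(c)) {
            // identifier: the only kind of token that can name a macro
            while (m_Pos < m_Src.size() && IsIdentChar(m_Src[m_Pos]))
                ++m_Pos;
            needReplacement = true;
        } else if (std::isdigit(c)) {
            // number (letters accepted for suffixes such as 10UL): never replaced
            while (m_Pos < m_Src.size() && IsIdentChar(m_Src[m_Pos]))
                ++m_Pos;
        } else {
            // single-character punctuation such as '{', '(' or ';'
            ++m_Pos;
        }

        std::string token = m_Src.substr(start, m_Pos - start);
        if (!needReplacement)
            return token;                       // no map lookup at all
        ++m_Lookups;
        auto it = m_Macros.find(token);
        return it != m_Macros.end() ? it->second : token;
    }

    // Number of map lookups performed so far.
    std::size_t Lookups() const { return m_Lookups; }

private:
    static bool IsIdentChar(char ch) {
        return ch == '_' || std::isalnum(static_cast<unsigned char>(ch));
    }

    std::string m_Src;
    std::size_t m_Pos;
    std::map<std::string, std::string> m_Macros;
    std::size_t m_Lookups;
};
```

Scanning `MY_INT x = 42 ; { }` with MY_INT defined as `int` yields int, x, =, 42, ;, { and }, and performs only two map lookups, one per identifier.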
Quote from: oBFusCATed on June 30, 2009, 01:37:44 PM
Another option is to replace std::map with std::tr1::unordered_map, if it is available (gcc > 4.x.y, not sure what the minimal x is, though).
std::tr1::unordered_map is a hash map, so lookup is an O(1) operation on average.
This is great news.
If a hash map really is faster, I suggest switching to std::tr1::unordered_map. Thanks for the suggestion!!!
Quote from: oBFusCATed on June 30, 2009, 01:37:44 PM
Another option is to replace std::map with std::tr1::unordered_map, if it is available (gcc > 4.x.y, not sure what the minimal x is, though).
std::tr1::unordered_map is a hash map, so lookup is an O(1) operation on average.
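I have personally been using std::tr1::unordered_map for everything for ages (at least a year), and I definitely think it should be used wherever it's available. Apparently GCC has a __GXX_EXPERIMENTAL_CXX0X__ macro, e.g.:
#include <map>
#ifdef __GXX_EXPERIMENTAL_CXX0X__
    #include <tr1/unordered_map>
    #include <wx/hashmap.h>   // wxStringHash, wxStringEqual: std::tr1::hash is not specialised for wxString
    typedef std::tr1::unordered_map<wxString, wxString, wxStringHash, wxStringEqual> CBMap;
#else
    typedef std::map<wxString, wxString> CBMap;
#endif
CBMap myMap;
or something like that.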
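Whichever hash container is chosen, the key type needs a hash function. The standard library provides none for wxString, which is why wxWidgets' own hash maps use the wxStringHash and wxStringEqual functors from <wx/hashmap.h>. Below is a self-contained sketch using C++11 std::unordered_map, with a hypothetical MyString class standing in for wxString and a hypothetical MyStringHash functor.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical string class standing in for wxString: std::hash knows nothing about it.
struct MyString {
    std::string data;
    bool operator==(const MyString& other) const { return data == other.data; }
};

// Hash functor: delegate to std::hash<std::string> on the underlying characters.
struct MyStringHash {
    std::size_t operator()(const MyString& s) const {
        return std::hash<std::string>()(s.data);
    }
};

// The hash functor is passed as the third template argument; equality uses operator==.
typedef std::unordered_map<MyString, MyString, MyStringHash> CBMap;
```

Without the third template argument the typedef would still compile, but any use of the map would fail, because std::hash<MyString> is not defined.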
Kazade: the unordered containers do not require experimental C++0x support; they are part of TR1.