I have made some improvements to the preprocessor in our legacy Code completion plugin, see
https://github.com/asmwarrior/codeblocks_sfmirror/tree/master
What I want is to use "id compare" instead of "string compare" in the high-level parser.
Comments are welcome, thanks.
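To make the "id compare" idea concrete, here is a minimal sketch. It is not the code from the commit: TokenKind, ClassifyLexeme, Token, MakeToken and IsIfKeyword are hypothetical names, and std::string stands in for wxString. The lexeme is classified once when the token is produced, so the parser can later compare an enum value instead of a string.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds; the real PPTokenKind enumerators may differ.
enum class TokenKind { Identifier, KeywordIf, KeywordElse, EndOfFile };

// Classify a lexeme once, at the point where the tokenizer produces it.
inline TokenKind ClassifyLexeme(const std::string& lexeme)
{
    static const std::unordered_map<std::string, TokenKind> keywords = {
        {"if",   TokenKind::KeywordIf},
        {"else", TokenKind::KeywordElse},
    };
    const auto it = keywords.find(lexeme);
    return it != keywords.end() ? it->second : TokenKind::Identifier;
}

// Assumed token layout: the text plus its precomputed id.
struct Token
{
    std::string lexeme;
    TokenKind   kind = TokenKind::Identifier;
};

inline Token MakeToken(const std::string& lexeme)
{
    Token tok;
    tok.lexeme = lexeme;
    tok.kind   = ClassifyLexeme(lexeme);
    return tok;
}

// Parser side: an enum comparison replaces lexeme == "if".
inline bool IsIfKeyword(const Token& tok)
{
    return tok.kind == TokenKind::KeywordIf;
}
```

The string lookup still happens, but only once per token instead of at every place where the parser inspects it.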
I have been looking at this commit (https://github.com/asmwarrior/codeblocks_sfmirror/commit/9dc5777ddf05aeb599c72dc8bbfd8d03e14b68f0) (the exact location of the changes was not specified).
PPToken looks to me just a way to pack token information, so I assume the real benefit is using the deque afterwards.
I just suggest changing
if (m_PPTokenStream.size() > 0)
to this
if (!m_PPTokenStream.empty())
and removing this part
else
;// peekToken.Clear();
Quote from: Miguel Gimenez on September 17, 2024, 05:58:45 PM
I have been looking at this commit (https://github.com/asmwarrior/codeblocks_sfmirror/commit/9dc5777ddf05aeb599c72dc8bbfd8d03e14b68f0) (the exact location of the changes was not specified).
PPToken looks to me just a way to pack token information, so I assume the real benefit is using the deque afterwards.
Thanks for the comment.
The deque is used here to move the token cursor forward or backward: we have interfaces to "peek token" (look ahead) and "undo token" (move the cursor backward), so I think a deque is a good structure to use.
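A rough sketch of that peek/undo pattern follows. It is an illustration under assumptions, not the plugin's implementation: TokenStream, Peek, Get and Unget are hypothetical names, tokens are plain std::string values, and only one consumed token is kept as history.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token stream: a deque buffer plus a cursor into it.
class TokenStream
{
public:
    explicit TokenStream(std::vector<std::string> source)
        : m_Source(std::move(source)) {}

    // Look ahead without consuming. Returns an empty string at end of input.
    std::string Peek()
    {
        Fill();
        return m_Cursor < m_Buffer.size() ? m_Buffer[m_Cursor] : std::string();
    }

    // Consume and return the next token.
    std::string Get()
    {
        std::string tok = Peek();
        if (m_Cursor < m_Buffer.size())
            ++m_Cursor;
        // Keep a single consumed token in front of the cursor for Unget().
        while (m_Cursor > 1)
        {
            m_Buffer.pop_front();
            --m_Cursor;
        }
        return tok;
    }

    // Move the cursor back by one token; false if there is no history left.
    bool Unget()
    {
        if (m_Cursor == 0)
            return false;
        --m_Cursor;
        return true;
    }

private:
    // Pull one token from the source when the cursor has reached the buffer end.
    void Fill()
    {
        if (m_Cursor >= m_Buffer.size() && m_Next < m_Source.size())
            m_Buffer.push_back(m_Source[m_Next++]);
    }

    std::vector<std::string> m_Source;  // stands in for the real tokenizer
    std::size_t              m_Next = 0;
    std::deque<std::string>  m_Buffer;
    std::size_t              m_Cursor = 0;
};
```

The deque fits because tokens are appended at the back for look-ahead and dropped from the front once they can no longer be undone, and both operations are cheap.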
Quote
I just suggest changing
if (m_PPTokenStream.size() > 0)
to this
if (!m_PPTokenStream.empty())
Thanks, but what's the difference? Maybe the "empty()" function runs much faster?
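On the speed question: for std::deque both calls are constant time since C++11, so the difference is mostly about readability and generality. empty() states the intent directly and is available on every standard container, including std::forward_list, which has no size() at all. A small sketch (HasPending is a hypothetical helper, not part of the plugin):

```cpp
#include <deque>
#include <forward_list>

// Works for any standard container because it only relies on empty().
template <typename Container>
bool HasPending(const Container& c)
{
    return !c.empty();
}
```

Writing the same helper with c.size() > 0 would fail to compile for std::forward_list.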
Quote
and removing this part
else
;// peekToken.Clear();
I will read this part of the code later.
Thanks.
I very briefly glanced across the commit pointed to by the link.
findings:
- class PToken: data member m_Kind is left uninitialized when PToken is default-initialized and when it is initialized via the ctor with 4 args (while having 5 data members). The call to the latter is also a little error-prone: it might easily be used incorrectly because it has 3 int args. I'd consider member initializers for the 4 integral data members and deleting the default ctor.
- The cctor of PToken is unnecessary and breaks the rule of 0 without need. It might also copy m_Kind from an uninitialized data member. That is undefined behaviour. IMHO the cctor should be removed.
- The compound statement after if (IsEOF()) is repeated. It sets 2 data members of PToken and should be delegated to PToken.
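As a sketch of the first two findings, default member initializers could look roughly like this. The layout is assumed, not taken from the commit: std::string stands in for wxString, the member names m_CharIndex, m_LineIndex and m_NestLevel are guessed from the ctor arguments, and the PPTokenKind enumerators other than EndOfFile are hypothetical.

```cpp
#include <string>

// Enumerators other than EndOfFile are assumptions for this sketch.
enum class PPTokenKind { Unknown, Identifier, EndOfFile };

struct PToken
{
    std::string m_Lexeme;                       // std::string stands in for wxString
    int         m_CharIndex = 0;                // assumed member name
    int         m_LineIndex = 0;                // assumed member name
    int         m_NestLevel = 0;                // assumed member name
    PPTokenKind m_Kind      = PPTokenKind::Unknown;

    // No user-written default ctor or cctor: every member already has a
    // defined value, and the compiler-generated copy is member-wise.
};
```

With this shape a default-constructed PToken never carries an indeterminate m_Kind, so copying it is well defined.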
Cheers
Quote from: blauzahn on September 18, 2024, 09:08:10 AM
I very briefly glanced across the commit pointed to by the link.
findings:
Thanks for the comment.
Quote
class PToken: data member m_Kind is left uninitialized when PToken is default-initialized and when it is initialized via the ctor with 4 args (while having 5 data members). The call to the latter is also a little error-prone: it might easily be used incorrectly because it has 3 int args. I'd consider member initializers for the 4 integral data members and deleting the default ctor.
Oh, yes, I should initialize the m_Kind member variable in the default constructor and other constructors.
About the argument list "PPToken(wxString lexeme, int charIndex, int lineIndex, int nestLevel)": I really don't know where "charIndex" comes from. I will look into it.
Quote
The cctor of PToken is unnecessary and breaks the rule of 0 without need. It might also copy m_Kind from an uninitialized data member. That is undefined behaviour. IMHO the cctor should be removed.
My initial thought was that PPToken's copy constructor is needed because elements have to be constructed in the deque; in some cases a PPToken gets copied into the deque. Am I wrong?
Oh, you are correct:
Quote
In C++, the "Rule of Zero" is a guideline that suggests avoiding writing custom constructors, destructors, or copy/move assignment operators if the default compiler-generated versions will suffice. The rule states that if a class does not need custom resource management (like dynamic memory allocation), it can rely on the compiler-generated special member functions.
So, the copy constructor is not needed here, because the compiler will generate the same one if I remove it.
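A quick way to check this is a sketch like the following, where Tok is a hypothetical stand-in for PPToken with std::string instead of wxString. A type without any user-written special members can still be copied into a std::deque, and it additionally gets an implicit move constructor, which a user-declared copy constructor would suppress.

```cpp
#include <deque>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for PPToken; no special member functions written.
struct Tok
{
    std::string lexeme;
    int         line = 0;
};

// The compiler generates both operations on its own.
static_assert(std::is_copy_constructible<Tok>::value, "implicit copy ctor");
static_assert(std::is_nothrow_move_constructible<Tok>::value, "implicit move ctor");

inline std::deque<Tok> MakeStream()
{
    std::deque<Tok> stream;
    Tok t;
    t.lexeme = "foo";
    t.line   = 3;
    stream.push_back(t);             // copies via the implicit copy constructor
    stream.push_back(std::move(t));  // moves via the implicit move constructor
    return stream;
}
```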
Quote
The compound statement after if (IsEOF()) is repeated. It sets 2 data-members of PToken and should be delegated to PToken.
Do you mean that the
/** Check whether the Tokenizer reaches the end of the buffer (file) */
bool IsEOF() const
{
return m_TokenIndex >= m_BufferLen;
}
should be removed from the high-level parser, and that we can instead return a PPToken whose m_Kind field is set to "EOF"?
Thanks.
No, I mean this:
if (IsEOF())
{
m_Lex = wxEmptyString;
m_Lex.m_Lexeme = wxEmptyString;
m_Lex.m_Kind = PPTokenKind::EndOfFile;
I haven't looked into the context, but setting several data members is usually none of the caller's business. In addition, the data member m_Lexeme was unnecessarily set twice: once through the implicit operator, once directly. Although I mostly avoid setters, a primitive one here might look like this:
class PToken
{
// ...
void setEof()
{
m_Lexeme = wxEmptyString;
m_Kind = PPTokenKind::EndOfFile;
}
// ...
};
if (IsEOF())
{
m_Lex.setEof();
return false;
}
It's PToken's responsibility to decide what to do with its data members when it is tagged as EOF. Granted, they are public, so it cannot maintain an invariant anyway.
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.
Quote from: blauzahn on September 18, 2024, 04:47:04 PM
No, I mean this:
...
...
It's PToken's responsibility to decide what to do with its data members when it is tagged as EOF. Granted, they are public, so it cannot maintain an invariant anyway.
Thanks, I understand your idea now.
Quote from: blauzahn on September 18, 2024, 04:49:27 PM
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.
I think that under Windows there is no such tool in the msys2/gcc environment. Am I correct?
The only tool I know of is ssbssa/heob: Detects buffer overruns and memory leaks. (https://github.com/ssbssa/heob)
But its log output is hard to read, because the log is always long.
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:
https://github.com/msys2/MSYS2-packages/discussions/3020
Quote from: blauzahn on September 19, 2024, 09:45:40 AM
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:
https://github.com/msys2/MSYS2-packages/discussions/3020
Thanks, but after reading that discussion, I think the feature is not implemented yet, at least for the mingw64/gcc platform in msys2. :(
FYI:
I have added the fix commits to the branch: https://github.com/asmwarrior/codeblocks_sfmirror/tree/master
The GitHub Actions build of my master branch (Windows 64-bit version) is now available here: main64 (https://github.com/asmwarrior/x86-codeblocks-builds/actions/runs/10955598063)
Happy coding. ;)
I found a more detailed answer about how AddressSanitizer-like tools work under Windows, but sadly mingw64/gcc is not included.
See here:
Compilers that support sanitizers (address, UB etc.) on Windows (https://stackoverflow.com/questions/55480333/clang-8-with-mingw-w64-how-do-i-use-address-ub-sanitizers)