Check macro usage on every identifier like token: issues and discussions

ollydbg · November 10, 2014, 07:41:56 AM

We have already enabling some kind of macro usage check in our parser(see rev9932), but that's not fully.
My idea was try to enable the macro checking on each identifier like tokens, I did some tests, and it won't slow down much of the parsing speed, the result is quite good.
But we still have some issues to solve
1, we have different build target, and different build target have different C Preprocessor definitions, if we still parse the whole C::B project, we can get bad result.
E.g. In Codeblocks.cbp, we have one macro definition in one target

Code Select

#define type(obj) ((obj)._type)
or even bad

Code Select

#define type xxx
And we have some other target which can have some code like:

Code Select


        GdbCmd_SetCatch(DebuggerDriver *driver, const wxString &type, int *resultIndex)

In this case, the "type" was replaced to un-wanted tokens.

2, we have some code like:

Code Select


// Assumes the buffer bitmap only covers the client area;
// does not prepare the window DC
#define wxBUFFER_CLIENT_AREA        0x02

class WXDLLEXPORT wxBufferedDC : public wxMemoryDC
{
public:
    // Default ctor, must subsequently call Init for two stage construction.
    wxBufferedDC()
        : m_dc(NULL),
          m_buffer(NULL),
          m_style(0)
    {
    }

    // Construct a wxBufferedDC using a user supplied buffer.
    wxBufferedDC(wxDC *dc,
                 wxBitmap& buffer = wxNullBitmap,
                 int style = wxBUFFER_CLIENT_AREA)
        : m_dc(NULL), m_buffer(NULL)
    {
        Init(dc, buffer, style);
    }

When parsing the function arguments(parameters), if macro expansion is enabled on every tokens, we will get

Code Select

(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = 0x02)
instead of

Code Select

(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = wxBUFFER_CLIENT_ARE)
What would you like to see? The former or the later?

3, In the further, I would like to see that all the macro handling was put in the Tokenizer class. (Currently, we have macro handling both in Parserthread and Tokenizer class).

4, Too speed up the macro definition search, I think a hash table (e.g. wxWidgets: wxHashMap Class Reference) is preferred. (Currently, the macro definition is stored in the TokenTree, which is actually a Trie - Wikipedia, the free encyclopedia)

EDIT: Even in the same build target, we can still have different C preprocessor definitions for different translation file, so this seems not easy to solve. One method is that we try to maintain macro definition for each file, or create a include file hierarchy.

EDIT2:
Patches (git patches serial) attached, comments were welcome.

EDIT3:
About issue 1, I think we have discussed here, see Huki's comments Re: Several improvements to Code Completion plugin and Re: Several improvements to Code Completion plugin

Huki · April 09, 2015, 07:11:18 PM

Quote from: ollydbg on April 06, 2015, 02:51:59 AM
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome

I've attached some bug fix patches against your last cc_macro serials. Some description:

Fixed = [] , handling when parsing variable declaration. i.e.,

Code (assignment parsing) Select

int var1 = 0; // this var has an assignment
int var2 = 0; // mouse on var2 -> we get "intint var2"
int var3 = 0; // mouse on var3 -> we get "intintint var3"

Code (comma parsing) Select

int num=0, num2=0, num3=0;
// mouse on each variable and we get:
// num -> "int num"
// num2 -> "int, num2"
// num3 -> "int,, num3"

Code (arrays parsing) Select

int array[5]; // vars with [] are not parsed

The rest of the patches should already be clear.

ollydbg · April 11, 2015, 04:44:20 PM

Quote from: Huki on April 09, 2015, 07:11:18 PM
Quote from: ollydbg on April 06, 2015, 02:51:59 AM
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome
I've attached some bug fix patches against your last cc_macro serials. Some description:
...

Very nice work, Huki, thanks. I have them applied in my local copy, it fixes some errors of my patch serials.
BTW: I see a trailing spaces in a patch when I run the "git am" command

Code Select


Applying: CC: fix variable parsing with "=" or "[]"
f:/cb_sf_git/trunk/.git/rebase-apply/patch:52: trailing whitespace.
                         || peek==ParserConsts::oparray_chr
warning: 1 line adds whitespace errors.

Do you have the "remove trailing space" option enabled in C::B?

Any suggestion is that the commit message should have a "*" if it is a major function change or "-" if it is a minor change.
So, it should be:

Code Select


* CC: xxxxxxx

detailed descriptions

You can wrote as many detailed text as possible in the commit log messages, E.g. see my commit r10157 as an example. I just follow Linux's commit message style, which means a simple line as a subject, and after an empty line, write details, like this: Git Commit Messages : 50/72 Formatting.

EDIT: I have fixed all the mentioned issue above in my local copy, so you don't need to post the patches again.

MortenMacFly · April 11, 2015, 10:13:48 PM

Well guys it would be nice to follow but the patches don't apply in svn. Any chance you make svn compatible patches or a svn branch or work together on a git repo? The repo that I was referring to is terribly old... Or am I missing something?

ollydbg · April 14, 2015, 06:25:23 AM

Quote from: MortenMacFly on April 11, 2015, 10:13:48 PM
Well guys it would be nice to follow but the patches don't apply in svn. Any chance you make svn compatible patches or a svn branch or work together on a git repo? The repo that I was referring to is terribly old... Or am I missing something?

Hi, Morten, I attach the svn style patch which is created from this script.

MortenMacFly · April 14, 2015, 08:22:39 AM

Quote from: ollydbg on April 14, 2015, 06:25:23 AM
Hi, Morten, I attach the svn style patch which is created from this script.

That is nice, but why don't you (two) just work on a GIT repo? I could check this out via SVN and create patches myself. However, it should need to stay in sync with trunk though - so a bit more work is needed. On the pro side, Huki can commit directly and you can sync yourself with him easily.

It thought thats what the branch in the "OBF repo" was for...?! ???

ollydbg · June 19, 2015, 04:23:23 PM

Hi, Morten, this is a reply after two months... How time flies.

I think not much people are testing my patch serials. I'm considering what is the time to let those patches into our SVN repo. I don't have a clean plan, maybe, we can let them in, and we may have more testers through nightly build versions.

What's your or other guy's idea?

oBFusCATed · June 19, 2015, 08:30:18 PM

If you add some tests to the cctest project I am totally fine if you push these in trunk.
Then you can fix bugs and add more tests. So you won't re-introduce old problems by fixing the new ones.

If you're adding tests it should be relatively safe. Also you should be around to be able to fix if something comes up.

ollydbg · June 20, 2015, 08:13:24 AM

Quote from: oBFusCATed on June 19, 2015, 08:30:18 PM
If you add some tests to the cctest project I am totally fine if you push these in trunk.
Then you can fix bugs and add more tests. So you won't re-introduce old problems by fixing the new ones.

If you're adding tests it should be relatively safe. Also you should be around to be able to fix if something comes up.

OK, thanks for the suggestion, I will take two steps.
1, add more tests these days.
2, commit my patch serials only if they don't cause any regression.

ollydbg · June 20, 2015, 08:40:54 AM

Currently, I see only one regression (as of revision 10341, before my patch and after applying my patch) is here:

Code Select


--- C:/before.txt	Sat Jun 20 14:14:40 2015
+++ C:/after.txt	Sat Jun 20 14:09:01 2015
@@ -56,14 +56,14 @@
 Total 9 tests, 9 PASS, 0 FAIL
 --------------------------------------------------------
 ********************************************************
   Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr.cpp
 ********************************************************
--PASS: Fun  FuncArray
+*FAIL: Fun  FuncArray
 -PASS: foo  foo
 --------------------------------------------------------
-Total 2 tests, 2 PASS, 0 FAIL
+Total 2 tests, 1 PASS, 1 FAIL
 --------------------------------------------------------
 ********************************************************
   Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr_com_interface.cpp
 ********************************************************
 -PASS: factory->  QueryInterface

Simplify the test case, I get such test file: ccc_bug.cpp

Code Select



int (*FuncArray[10][20]) ();

//Fun  //FuncArray

I will look into it.

ollydbg · June 20, 2015, 09:13:29 AM

OK, I found the cause of this regression, I get the parenthesis is L"(* FuncArray [ 10 ] [ 20 ])", and if without my patch, it don't have a space after the "FuncArray", this is because in my patch serials, I have change the function of reading the parenthesis. (The old code is hard to understand for me, and using raw wxChar pointers).

Then in the function:

Code Select


        // pattern: m_Str AAA (*BBB) (...);
        if (peek == ParserConsts::semicolon && pos != wxNOT_FOUND)
        {
            name.RemoveLast();  // remove ")"
            name.Remove(0, pos+1).Trim(false); // remove "(* "

            // pattern: m_Str AAA (*BBB[X][Y]) (...);
            pos = name.find(ParserConsts::oparray_chr);
            if (pos != wxNOT_FOUND)
                name.Remove(pos);

            TRACE(_T("HandleFunction() : Add token name='")+name+_T("', args='")+args+_T("', return type='") + m_Str+ _T("'"));
            Token* newToken =  DoAddToken(tkFunction, name, lineNr, 0, 0, args);

We pass the name as "FuncArray " (note a space after "FuncArray".

Now, I think I need to return a more compact form like "(*FuncArray[10][20])".

EDIT: the old code for reading parenthesis dose not consider the macro expansion in its content, it just find the "(" and ")" pair in the raw wxString, while in my patch serials, I do macro expansion on every token if possible.

ollydbg · June 20, 2015, 11:28:25 AM

Now, here comes the function I need to implement, it looks like "space" are added only in some very special cases.

Code Select


void Tokenizer::ReadParentheses(wxString& str)
{
    // we create a local buffer here, so the data is copied from m_Buffer to buffer
    // once the local buffer is full, we append the content in local buffer to the result str
    static const size_t maxBufferLen = 4093;
    // local buffer
    wxChar buffer[maxBufferLen + 3];
    buffer[0] = _T('$'); // avoid segfault error, because we have some *(p - 1) = x in the code
    wxChar* realBuffer = buffer + 1;
    // p points to the local buffer
    wxChar* p = realBuffer;

    int level = 0; // brace level of '(' and ')'

    while (NotEOF())
    {
        while (SkipComment())
            ;
        wxChar ch = CurrentChar();

        while (ch == _T('#')) // do not use if
        {
            const PreprocessorType type = GetPreprocessorType();
            if (type == ptOthers)
                break;
            HandleConditionPreprocessor(type);
            ch = CurrentChar();
        }

        const unsigned int startIndex = m_TokenIndex;

        switch(ch)
        {
        case _T('('):
            {
                ++level;
                *p = ch;
                ++p;
            }
            break;

        case _T(')'):
            {
                if (*(p - 1) <= _T(' '))
                    --p; // if previous char is a space, we overwrite the space with a ')', so we save one char
                --level;
                *p = ch;
                ++p;
            }
            break;

        case _T('\''):
        case _T('"'):
            {
                MoveToNextChar();     // skip the leading '"' or '\''
                SkipToStringEnd(ch);  // m_TokenIndex point to the trailing '"' or '\'' now
                MoveToNextChar();     // move to the char after trailing char
                const size_t writeLen = m_TokenIndex - startIndex; // the string length
                const size_t usedLen = p - realBuffer;
                if (usedLen + writeLen > maxBufferLen) // there is not enough space left to write the string
                {
                    if (writeLen > maxBufferLen) // to big string?
                    {
                        TRACE(_T("ReadParentheses(): Catch exception 1: %lu"), static_cast<unsigned long>(writeLen));
                        return;
                    }

                    if (p != realBuffer) // add the used string to the result str
                    {
                        str.Append(realBuffer, usedLen);
                        p = realBuffer;  // adjust the p, as it was an empty buffer
                    }
                    // append the string to str (not write the string to the buffer)
                    str.Append((const wxChar*)m_Buffer + startIndex, writeLen);
                }
                else
                {   // the buffer can hold the string, so copy it to the buffer
                    memcpy(p, (const wxChar*)m_Buffer + startIndex, writeLen * sizeof(wxChar));
                    p += writeLen;
                }

                continue;
            }
            break;

        case _T(','): // if there is a space before the comma, we just remove it, but we should add a space after the comma
        case _T('*'):
        case _T('&'):
            {
                if (*(p - 1) <= _T(' '))
                    --p;

                *p = ch;
                *++p = _T(' ');
                ++p;
            }
            break;

        case _T('='):
            {
                if (*(p - 1) <= _T(' '))
                {
                    *p = _T('=');
                    // Don't add a space after '=' sign, in case another '=' follows it
                    // (see how the 'else' block below works).
                    //*++p = _T(' ');
                    ++p;
                }
                else // special handling of "==" "=!" "=>" and "=<"
                {
                    switch (*(p - 1))
                    {
                    case _T('='):
                    case _T('!'):
                    case _T('>'):
                    case _T('<'):
                        {
                            *p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;

                    default:
                        {
                            *p = _T(' ');
                            *++p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;
                    }
                }
            }
            break;

        case _T(' '): // we only add a space if the former char is not a space, also not a '('
        case _T('\r'): // the tab char are regard as space
        case _T('\t'):
            {
                if (*(p - 1) != _T(' ') && *(p - 1) != _T('('))
                {
                    *p = _T(' ');
                    ++p;
                }
            }
            break;

        case _T('\n'): // we need keep the \n for records paras correct position
            if (*(p - 1) == _T(' '))
                --p;
            if (*(p - 1) != _T('('))
            {
                *p = ch;
                ++p;
            }
            break;

        default: // any other case, we can directly copy the char
            {
                *p = ch;
                ++p;
            }
            break;
        }

        if (p >= realBuffer + maxBufferLen)
        {
            str.Append(realBuffer, p - realBuffer);
            p = realBuffer;
        }

        MoveToNextChar();

        if (level == 0)
            break;
    }//while (NotEOF())

    if (p > realBuffer)
        str.Append(realBuffer, p - realBuffer);
    TRACE(_T("ReadParentheses(): %s, line=%u"), str.wx_str(), m_LineNumber);
    if (str.Len() > 512)
    {   TRACE(_T("ReadParentheses(): Catch exception 2: %lu"), static_cast<unsigned long>(str.Len())); }
}

So, I need to follow this way.

ollydbg · June 20, 2015, 04:04:08 PM

OK, the only regression is fixed now.
I have attach the full patches (include git style patch serials, or a single svn patch)

ollydbg · July 01, 2015, 05:11:52 PM

One issue for macro replacement parsing is that if you have such code:

Code Select


#define uint8 unsigned char
void f(uint8 aaa);

Then the parser will see this function declaration:

Code Select

void f(unsigned char aaa);

This is because we try to expand every macro usage.

Does this cause some issue?

oBFusCATed · July 01, 2015, 09:41:14 PM

Would you this problem affect the autocompletion, calltip and symbol browser?

Code::Blocks Forums

News:

Check macro usage on every identifier like token: issues and discussions

ollydbg

Huki

ollydbg

MortenMacFly

ollydbg

MortenMacFly

ollydbg

oBFusCATed

ollydbg

ollydbg

ollydbg

ollydbg

ollydbg

ollydbg

oBFusCATed