New parser model for Code completion

takeshimiya · December 08, 2005, 09:15:31 AM

Don't worry, this is what opensource is, you code on your free-time when you want, but if you don't have time, no problem.

If you're going to improve the code completion plugin it will be great, as I consider it's the part of C::B that most needs a lot of improvement.

But given that writting a complete C++ parser is difficult and can take years, I wonder if you did gave a look at the opensource ones I researched here.

Good luck! :: Suerte!

rickg22 · December 08, 2005, 04:39:03 PM

It's not really hard. Right now I'll only improve the parsing time by elliminating the token-adding overhead. That is, if Yiannis doesn't find out what's wrong first and beats me to a quick-hack

(Alright, alright, here's the answer, you could use a hash table on the tokens' names. There's your quick hack

)

Regarding the parser... that's easy, too. It's just matter of designing a finite state machine that can be programmed in XML, so we can add custom languages. And no, i'm NOT being sarcastic! I actually had thought of this since i looked at codecompletion for the first time.

cyberkoa · December 08, 2005, 05:55:45 PM

remember to come back later, dun disappear

takeshimiya · December 08, 2005, 06:30:20 PM

You're the first person I hear saying that writting a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

A parser like the one that haves Visual Assist X:
-Parses all files in the code, and the ones outside the project too.
-Parses the current file without need to be saved.
-Can parse non valid code.
-The comments surrounding functions is keeped with functions (important).
-The parser running in a background thread.

It would be great if the C++ parser could be separated from the CodeCompletion code, so it could be used for other purposes (better syntax highlight, code analysing).
For example, having with the same style the keyword int, float, CMyClass, wxWindow, etc. without requiering user to specify user-keywords.
Or having brace match highlighting, but when you are inside of that block (not like now that only higlights if you're at the brace).

Basically I would want CodeCompletion plugin becoming Visual Assist X. It really makes a difference, no joke. Once you get used to it's features, it's very difficult to use anything else.

If you ask what features are essential to me, those are:

Hovering Tooltips
Enhanced Listboxes
Better Parameter Info
Enhanced Syntax Coloring
Local Symbols in Bold
Suggestion Lists
Shorthand
Repair Case
Convert Dot to ->
Context Field
Definition Field
Navigate Back and Forward

Hovering tooltips is almost the most important feature to me:

The point of this is that you don't need any documentation, you get straight the comments from any function. It's simply too adictive.

All of this would be feasible with the current C::B parser, or it would requiere a very major rewrite?

rickg22 · December 08, 2005, 07:22:54 PM

Quote from: Takeshi Miya on December 08, 2005, 06:30:20 PM
You're the first person I hear saying that writting a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

Oh no, i don't mean a full C++ parser... that would require Yacc / Bison / etc / eew. Actually, i'm only replicating (and generalizing) the C::B parser's functionality. But I'm gonna rewrite it using FSA's so we can extend it for other languages. And I'm gonna use my work-in-progress search tree, so we end up with something like this:

the current_token_id would be something like enum, like
token_if,token_then,token_class, etc. (I'll have two search trees: One for C++ keywords, like "class", "typedef", "public", "for","if", etc;, and another for identifiers "myvar","myclass" <- these are the ones added to the parser.)

Code Select


if(is_keyword)
{
  appropriate_action = thisparser->lookup_action_for_keywords(current_state,current_token_id);
  next_state = thisparser->lookup_state_for_keywords(current_state,current_token_id);
}
else
{
  appropriate_action = thisparser->takeaction_for_identifiers(current_token_id);
  next_state = thisparser->lookup_state_for_identifiers(current_state,current_token_id);
}

switch(appropriate_action)
{
 ...
 case id_add_function_declaration: add_function_declaration(thistoken->params) /* or something */
 
}

current_state = next_state;

See, a finite state machine changes state depending on the current state and keyword. (FSA's or FSM's as they're also called They're able to recognize regular expressions) Additionally, I'm adding an appropriate action which would modify the parser.

See:

http://www.cs.brown.edu/~jes/book/BOOK/node10.html

Quote
(insert lotsa suggestions here)

All of this would be feasible with the current C::B parser, or it would requiere a very major rewrite?

I don't know, I haven't thought of the details. But the theory is there

rickg22 · December 08, 2005, 07:27:30 PM

Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.

http://en.wikipedia.org/wiki/Pushdown_automaton

Michael · December 08, 2005, 07:43:58 PM

Quote from: rickg22 on December 08, 2005, 07:27:30 PM
Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.
http://en.wikipedia.org/wiki/Pushdown_automaton

Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

Michael

takeshimiya · December 08, 2005, 07:54:39 PM

Great

How fits in that parser: attaching to an identifier (myfunction, myclass, myvariable) the surrounding comments?

ie.
// Save editor contents. Returns true on success, false otherwise.
virtual bool Save() { return true; }

wxString desc; // title of the regex

So when you are, let's say in the Symbols window (or completion window, or a tooltip), you can see the identifier with the surrounding comments.
Then automagically you won't requiere references manual of any library anymore. You'll have the API documentation right there

Once attached the comments to their respective identifier, they could be parsed also (ie. doxygen, javadoc style of comments).

Urxae · December 08, 2005, 07:55:44 PM

Quote from: Michael on December 08, 2005, 07:43:58 PM
Quote from: rickg22 on December 08, 2005, 07:27:30 PM
http://en.wikipedia.org/wiki/Pushdown_automaton
Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed

...

Michael · December 08, 2005, 07:57:12 PM

Quote from: Urxae on December 08, 2005, 07:55:44 PM
That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed ...

Oops...you're right :oops:

rickg22 · December 08, 2005, 08:01:15 PM

AH HAH! Here's what I wanted to show you! Take special look at the "State Transition table".

http://en.wikipedia.org/wiki/Event_driven_finite_state_machine

This is the kind of automaton i'll be building. Fits like a shoe to the current C::B parser model.

takeshimiya · December 08, 2005, 08:26:46 PM

Yeah, lot's of computer science theory

But how fits in the current parser, attaching to an identifier (myfunction, myclass, myvariable) the surrounding comments..?

I just hope it wouldn't become very hard to make more the parser more generalized (currently everything's hard coded).
And I hope by generalizing it, you don't reinvent the well, there are lot's of parser generators out there.

takeshimiya · December 08, 2005, 08:52:12 PM

lol :lol:
http://www.w3.org/TR/2005/WD-scxml-20050705/

rickg22 · December 08, 2005, 10:16:50 PM

Wow, thanks! That'll save me a lot of time!

rickg22 · December 09, 2005, 12:19:32 AM

Some help, guys!

While I'm busy making my SearchTree class, i'd appreciate if someone of you can examine the tokenizer and parser classes (perhaps even parserthread?) and make a state transition (and actions?) table, like which token goes to which state... specifically, Tokenizer::DoGetToken(), Tokenizer::SkipUnwanted(), etc.

Generally, each Skip(char n) function is a state whose only exit transition is the one that breaks out of the loop.

Code::Blocks Forums

News:

New parser model for Code completion

takeshimiya

rickg22

cyberkoa

takeshimiya

rickg22

rickg22

Michael

takeshimiya

Urxae

Michael

rickg22

takeshimiya

takeshimiya

rickg22

rickg22