News:

When registered with our forums, feel free to send a "here I am" post here to differ human beings from SPAM bots.

Main Menu

New parser model for Code completion

Started by takeshimiya, December 08, 2005, 09:15:31 AM

Previous topic - Next topic

takeshimiya

Don't worry, this is what opensource is, you code on your free-time when you want, but if you don't have time, no problem.

If you're going to improve the code completion plugin it will be great, as I consider it's the part of C::B that most needs a lot of improvement. :)
But given that writting a complete C++ parser is difficult and can take years, I wonder if you did gave a look at the opensource ones I researched here.

Good luck! :: Suerte! :D

rickg22

#1
It's not really hard. Right now I'll only improve the parsing time by elliminating the token-adding overhead. That is, if Yiannis doesn't find out what's wrong first and beats me to a quick-hack :P

(Alright, alright, here's the answer, you could use a hash table on the tokens' names. There's your quick hack :P )

Regarding the parser... that's easy, too. It's just matter of designing a finite state machine that can be programmed in XML, so we can add custom languages. And no, i'm NOT being sarcastic! I actually had thought of this since i looked at codecompletion for the first time.

cyberkoa


takeshimiya

You're the first person I hear saying that writting a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

A parser like the one that haves Visual Assist X:
-Parses all files in the code, and the ones outside the project too.
-Parses the current file without need to be saved.
-Can parse non valid code.
-The comments surrounding functions is keeped with functions (important).
-The parser running in a background thread.

It would be great if the C++ parser could be separated from the CodeCompletion code, so it could be used for other purposes (better syntax highlight, code analysing).
For example, having with the same style the keyword int, float, CMyClass, wxWindow, etc. without requiering user to specify user-keywords.
Or having brace match highlighting, but when you are inside of that block (not like now that only higlights if you're at the brace).

Basically I would want CodeCompletion plugin becoming Visual Assist X. It really makes a difference, no joke. Once you get used to it's features, it's very difficult to use anything else.

If you ask what features are essential to me, those are:

Hovering Tooltips
Enhanced Listboxes
Better Parameter Info
Enhanced Syntax Coloring
Local Symbols in Bold
Suggestion Lists
Shorthand
Repair Case
Convert Dot to ->
Context Field
Definition Field
Navigate Back and Forward

Hovering tooltips is almost the most important feature to me:

The point of this is that you don't need any documentation, you get straight the comments from any function. It's simply too adictive. :D


All of this would be feasible with the current C::B parser, or it would requiere a very major rewrite?

rickg22

Quote from: Takeshi Miya on December 08, 2005, 06:30:20 PM
You're the first person I hear saying that writting a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

Oh no, i don't mean a full C++ parser... that would require Yacc / Bison / etc / eew. Actually, i'm only replicating (and generalizing) the C::B parser's functionality. But I'm gonna rewrite it using FSA's so we can extend it for other languages. And I'm gonna use my work-in-progress search tree, so we end up with something like this:

the current_token_id would be something like enum, like
token_if,token_then,token_class, etc. (I'll have two search trees: One for C++ keywords, like "class", "typedef", "public", "for","if", etc;, and another for identifiers "myvar","myclass" <- these are the ones added to the parser.)


if(is_keyword)
{
  appropriate_action = thisparser->lookup_action_for_keywords(current_state,current_token_id);
  next_state = thisparser->lookup_state_for_keywords(current_state,current_token_id);
}
else
{
  appropriate_action = thisparser->takeaction_for_identifiers(current_token_id);
  next_state = thisparser->lookup_state_for_identifiers(current_state,current_token_id);
}

switch(appropriate_action)
{
...
case id_add_function_declaration: add_function_declaration(thistoken->params) /* or something */

}

current_state = next_state;


See, a finite state machine changes state depending on the current state and keyword. (FSA's or FSM's as they're also called They're able to recognize regular expressions) Additionally, I'm adding an appropriate action which would modify the parser.

See:

http://www.cs.brown.edu/~jes/book/BOOK/node10.html

Quote
(insert lotsa suggestions here)

All of this would be feasible with the current C::B parser, or it would requiere a very major rewrite?
I don't know, I haven't thought of the details. But the theory is there :)

rickg22

Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.

http://en.wikipedia.org/wiki/Pushdown_automaton


Michael

Quote from: rickg22 on December 08, 2005, 07:27:30 PM
Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.
http://en.wikipedia.org/wiki/Pushdown_automaton
Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

Michael
[url="http://img207.imageshack.us/img207/9728/411948picture4em.png"]http://img207.imageshack.us/img207/9728/411948picture4em.png[/url]

takeshimiya

#7
Great :)

How fits in that parser: attaching to an identifier (myfunction, myclass, myvariable) the surrounding comments?

ie.
// Save editor contents. Returns true on success, false otherwise.
virtual bool Save() { return true; }

wxString desc; // title of the regex


So when you are, let's say in the Symbols window (or completion window, or a tooltip), you can see the identifier with the surrounding comments.
Then automagically you won't requiere references manual of any library anymore. You'll have the API documentation right there :)

Once attached the comments to their respective identifier, they could be parsed also (ie. doxygen, javadoc style of comments).

Urxae

Quote from: Michael on December 08, 2005, 07:43:58 PM
Quote from: rickg22 on December 08, 2005, 07:27:30 PM
http://en.wikipedia.org/wiki/Pushdown_automaton
Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed ;)...

Michael

Quote from: Urxae on December 08, 2005, 07:55:44 PM
That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed ;)...
Oops...you're right :oops:
[url="http://img207.imageshack.us/img207/9728/411948picture4em.png"]http://img207.imageshack.us/img207/9728/411948picture4em.png[/url]

rickg22

AH HAH! Here's what I wanted to show you! Take special look at the "State Transition table".

http://en.wikipedia.org/wiki/Event_driven_finite_state_machine

This is the kind of automaton i'll be building. Fits like a shoe to the current C::B parser model.

takeshimiya

Yeah, lot's of computer science theory :)

But how fits in the current parser, attaching to an identifier (myfunction, myclass, myvariable) the surrounding comments..?

I just hope it wouldn't become very hard to make more the parser more generalized (currently everything's hard coded).
And I hope by generalizing it, you don't reinvent the well, there are lot's of parser generators out there.


rickg22

Wow, thanks! That'll save me a lot of time! :)

rickg22

#14
Some help, guys!

While I'm busy making my SearchTree class, i'd appreciate if someone of you can examine the tokenizer and parser classes (perhaps even parserthread?) and make a state transition (and actions?) table, like which token goes to which state... specifically, Tokenizer::DoGetToken(), Tokenizer::SkipUnwanted(), etc.

Generally, each Skip(char n) function is a state whose only exit transition is the one that breaks out of the loop.