News:

Accounts with zero posts and zero activity during the last months will be deleted periodically to fight SPAM!

Main Menu

EOL issue, the EOL should be automatically detected, and make consistent

Started by ollydbg, August 24, 2012, 08:07:23 AM

Previous topic - Next topic

ollydbg

I just want to have a feature request:
When I open a LF file, when I hit enters, it should add many lines of LF.
When I open a CRLF file, when I hit enters, it should add many lines of CRLF.
But currently, I see that I can't do that, the EOL is fixed, and either I result a mixed EOF file, or I need to set the EOL in the editor when I open each projects.

This is really annoying.

I have see such report 6 years ago, see: Feature Request: Auto EOL Mode, but I don't think it was implemented.

BTW: I see that Notepad++ have such feature. Will C::B finally implement it?

EDIT: I also find on in the feature request:
Leave EOL format be

EDIT: I just tested the SciTE 3.2.1, and it works as what I want "leave as it is".
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

ollydbg

I dig the code a little, and see that the EOL is set in the editor before the file is loaded.

sdk\cbeditor.cpp


// static
void cbEditor::InternalSetEditorStyleBeforeFileOpen(cbStyledTextCtrl* control)
{
   if (!control)
       return;

   ......
   ......


   // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
   control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
   control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
   control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
   control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
   if (mgr->ReadBool(_T("/selection/use_vspace"), false))
       control->SetVirtualSpaceOptions(wxSCI_SCVS_RECTANGULARSELECTION | wxSCI_SCVS_USERACCESSIBLE);
   else
       control->SetVirtualSpaceOptions(wxSCI_SCVS_NONE);
}

Here, it just read the "eol/eolmode" from the configure file.

So, my question is: If we have add an option like "leave the eol as it is", then we need to Set the EOL after the file is loaded, right?

If I remove the line "control->SetEOLMode....", I think the control should have a eol which is consistent to the opened file?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

ollydbg

Hi, guys, I dig it further, and I found that the SciTE use this code:

In its source: C:\scite321\scite\src\SciTEIO.cxx


void SciTEBase::DiscoverEOLSetting() {
SetEol();
if (props.GetInt("eol.auto")) {
int linesCR;
int linesLF;
int linesCRLF;
CountLineEnds(linesCR, linesLF, linesCRLF);
if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_LF);
else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_CR);
else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_CRLF);
}
}


So, it change the EOL dynamically.

BTW: currently, by default, under Windows, it use CRLF, see:
[scintilla] default eol mode on OS X - Google Groups
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

thomas

I'm confused (though I think you want the correct thing, but brain is not working well during the last weeks, so I'm not sure).

To clarify, with "add many LF" do you mean "change editor's EOL mode to file's EOL mode" or do you mean "change all line endings in the file"? You basically want anything typed into a file to be whatever the file was before, correct?

Also, what if the file, like many files from OSS projects is mixed already?
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

ollydbg

Quote from: thomas on August 24, 2012, 03:29:45 PM
I'm confused (though I think you want the correct thing, but brain is not working well during the last weeks, so I'm not sure).
Hi, thomas, sorry about the confusion, I'm not a English native speaker, so my English may be poor.

Quote
To clarify, with "add many LF" do you mean "change editor's EOL mode to file's EOL mode" or do you mean "change all line endings in the file"?
I say "add many LF" which means when I hit the enter key many times, the editor add many lines which have LF styled EOL.

Here is an example, when I open a file, which has LF styled EOL, by default, if I have the configuration with "CR LF" in the editor setting, the new added line in the editor will have "CR LF" ending, so, the file becomes "LF" and "CR LF" mixed.

See the image below: (The new added empty line have CRLF styled EOL, but the original file have all LF styled EOL)


On the other side, if you set the editor as "LF" format, then you will have the same issue that an original CRLF styled file will contains many LF styled lines.
QuoteYou basically want anything typed into a file to be whatever the file was before, correct?
Yes, What I want is to let the editor automatically detect which EOL is used in the original file, then the new added line should still use same type of EOL.

Currently, both SciTE and NotePad++ has this feature, so I believe C::B should have this feature too.
QuoteAlso, what if the file, like many files from OSS projects is mixed already?
For how to detect a files EOL, I think it is simple, just scan the first lines of the file, and set the EOL of the whole file as it is, I think SciTe use this kind of detection.

OK about my brain?  :)

EDIT:
It looks like to determine which EOL is used for a mixed EOL files, SciTe use a voting

void SciTEBase::CountLineEnds(int &linesCR, int &linesLF, int &linesCRLF) {
linesCR = 0;
linesLF = 0;
linesCRLF = 0;
int lengthDoc = LengthDocument();
char chPrev = ' ';
TextReader acc(wEditor);
char chNext = acc.SafeGetCharAt(0);
for (int i = 0; i < lengthDoc; i++) {
char ch = chNext;
chNext = acc.SafeGetCharAt(i + 1);
if (ch == '\r') {
if (chNext == '\n')
linesCRLF++;
else
linesCR++;
} else if (ch == '\n') {
if (chPrev != '\r') {
linesLF++;
}
} else if (i > 1000000) {
return;
}
chPrev = ch;
}
}



If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

dmoore

Looks like you are on the right track. Why don't you make a patch? For mixed files, voting makes sense.

Do we have room on the status bar for the EOL mode? (Or is it there already) Some sort of warning, especially for mixed sources would be nice (though not everyone likes warnings and they shouldn't be intrusive).
Python plugins: [url="https://github.com/spillz/codeblocks-python"]https://github.com/spillz/codeblocks-python[/url]
Code::Blocks Daily Builds -- Ubuntu PPA: [url="https://launchpad.net/~damien-moore/+archive/codeblocks"]https://launchpad.net/~damien-moore/+archive/codeblocks[/url]

thomas

Quote from: ollydbg on August 24, 2012, 04:13:31 PMOK about my brain?  :)
Yours is hopefully ok, it's mine that isn't good.

The approach sounds very good, though I would like to also reserve a way to always have a particular encoding and line ending (for example I'm using UTF-8 and LF everywhere, for everything, even under Windows -- don't want to have the editor interfere with this in any way, even hypothetically).

Maybe these options would be a good solution:

  • use system default
  • use XYZ line ending
  • use file's line ending (and fall back to system default)

To detect the encoding, I'd just scan over all lines (does not really take that much longer!) and take whichever encoding occurs more often. On a tie, fall back to system default.
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

ollydbg

Thank you thomas and dmoore for the constructive suggestion.

OK, I will take the job to implement this, but before that, I need to firstly comprehend what the current way do. (Now, I even do not understand what does "Ensure consistent EOLs" used for, so it will take some time to read the source)
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

dmoore

Quote from: ollydbg on August 24, 2012, 05:41:10 PM
(Now, I even do not understand what does "Ensure consistent EOLs" used for, so it will take some time to read the source)

Converts all EOLs in the doc to whatever is the currently set EOL mode of that editor when you save.

Note that I put in some features related to EOLs in the editortweaks plugin so users could temporarly change from the default manually, per editor. Would be nice not to break that, but I suspect what you are doing will be fine. (If installed, those options are in Edit->Editor tweaks)
Python plugins: [url="https://github.com/spillz/codeblocks-python"]https://github.com/spillz/codeblocks-python[/url]
Code::Blocks Daily Builds -- Ubuntu PPA: [url="https://launchpad.net/~damien-moore/+archive/codeblocks"]https://launchpad.net/~damien-moore/+archive/codeblocks[/url]

ollydbg

Here is the draft patch, I add an item "AUTO" in the editor setting, this has the value "3".
The changed file are: src\sdk\cbeditor.cpp and src\sdk\resources\editor_configuration.xrc
Note, I have problems to generate the patch file, because wxsmith change the resource file a lot(strange, it convert the spaces to tabs, I manually change them back to spaces), so I add the xrc file. Both of the files were generated by tortoiseGit, which have LF format.
I added a static function, which I think is not a good idea, right?
Comments are welcome. Thanks.

EDIT: I remove the old attachment, and the patch against the latest svn revision is below:

Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,39 @@

END_EVENT_TABLE()

+
+// Count the EOL in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    //int GetCharAt(int pos)
+    //int GetLength() const;
+    int lengthDoc = control->GetLength();
+    char chPrev = ' ';
+
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++) {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r') {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        } else if (ch == '\n') {
+            if (chPrev != '\r') {
+                linesLF++;
+            }
+        } else if (i > 1000000) {
+            return;
+        }
+        chPrev = ch;
+    }
+}
+
// class constructor
cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
    : EditorBase(parent, filename),
@@ -1557,7 +1590,6 @@
        control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);

    // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
    control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
    control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
    control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1607,23 @@

    ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));

+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+    if (eolMode == 3) //auto detect the eol
+    {
+        int linesCR;
+        int linesLF;
+        int linesCRLF;
+        CountLineEnds(control, linesCR, linesLF, linesCRLF);
+        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
+            eolMode = wxSCI_EOL_LF;
+        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
+            eolMode = wxSCI_EOL_CR;
+        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
+            eolMode = wxSCI_EOL_CRLF;
+    }
+    control->SetEOLMode(eolMode);
+
    // Interpret #if/#else/#endif to grey out code that is not active
    control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));

Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                <item>CR LF</item>
                                                <item>CR</item>
                                                <item>LF</item>
+                                                <item>AUTO</item>
                                              </content>
                                              <style>wxCB_READONLY</style>
                                            </object>



If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

ollydbg

Damn, I found is that my local git clone was merged around 2012-08-10 (it is the code base my git patch was against on), and after that day( in 2012-08-14) there is a new svn commit 8230 in the xrc file, see:
WebSVN - codeblocks - Path Comparison - /trunk/src/sdk/resources/editor_configuration.xrc Rev 8142 and /trunk/src/sdk/resources/editor_configuration.xrc Rev 8230

So, I think lots of the "extra change"(I believed the bug of wxsmith) in my attachment file in the previous post did the same thing as the commit 8230.

Sorry about the noise. :-[
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

ironhead

I've manually applied the patch against r.8250 (some of the offsets were wrong) and it's working really well!  Thank you for implementing this feature, I hope to see it applied to the official repository. ;)

dmoore

I don't have time to test right now, but the code looks good. I have one small suggestion to use the info window to alert the user to the outcome of the voting logic if there are mixed line endings. In your logic that sets the line ending, you could modify it as follows:


       if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
       {
           eolMode = wxSCI_EOL_LF;
           if((linesCR>0) || (linesCRLF>0))
           {
               wxBell();
               InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"LF\""), 1000);
           }
       }
       else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
       {
           eolMode = wxSCI_EOL_CR;
           if((linesLF>0) || (linesCRLF>0))
           {
               wxBell();
               InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR\""), 1000);
           }
       }
       else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
       {
           eolMode = wxSCI_EOL_CRLF;
           if((linesCR>0) || (linesLF>0))
           {
               wxBell();
               InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR-LF\""), 1000);
           }
       }
Python plugins: [url="https://github.com/spillz/codeblocks-python"]https://github.com/spillz/codeblocks-python[/url]
Code::Blocks Daily Builds -- Ubuntu PPA: [url="https://launchpad.net/~damien-moore/+archive/codeblocks"]https://launchpad.net/~damien-moore/+archive/codeblocks[/url]

ollydbg

Thanks ironhead for the test.
Thanks dmoore for the suggestion, below is the refined patch. (combined your suggestion, code style and comments refined)

Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,42 @@

END_EVENT_TABLE()

+// Count lines of EOL style in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    int lengthDoc = control->GetLength();
+    const int maxLineNumber = 1000000;
+    char chPrev = ' ';
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++)
+    {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r')
+        {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        }
+        else if (ch == '\n')
+        {
+            if (chPrev != '\r')
+            {
+                linesLF++;
+            }
+        }
+        else if (i > maxLineNumber)     // stop the loop if the file contains too many lines
+            return;
+
+        chPrev = ch;
+    }
+}
+
// class constructor
cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
     : EditorBase(parent, filename),
@@ -1557,7 +1593,6 @@
         control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);

     // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
     control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
     control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
     control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1610,53 @@

     ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));

+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+
+    // The code snippet of auto detect EOL is copied from SciTE's source, CountLineEnds() function
+    // scans the current source file, and counts lines of each EOL style, then finally sets the EOL
+    // by voting logic. In the case of mixed EOL files, we give the user a beep and InfoWindow notification.
+    if (eolMode == 3) //auto detect the EOL
+    {
+        int linesCR;
+        int linesLF;
+        int linesCRLF;
+
+        // count lines of each EOL style
+        CountLineEnds(control, linesCR, linesLF, linesCRLF);
+
+        //voting logic
+        unsigned int delay = 2000;
+        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_LF;
+            if((linesCR>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"LF\""), delay);
+            }
+        }
+        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_CR;
+            if((linesLF>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR\""), delay);
+            }
+        }
+        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
+        {
+            eolMode = wxSCI_EOL_CRLF;
+            if((linesCR>0) || (linesLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR-LF\""), delay);
+            }
+        }
+    }
+    control->SetEOLMode(eolMode);
+
     // Interpret #if/#else/#endif to grey out code that is not active
     control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));

Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                 <item>CR LF</item>
                                                 <item>CR</item>
                                                 <item>LF</item>
+                                                <item>AUTO</item>
                                               </content>
                                               <style>wxCB_READONLY</style>
                                             </object>



I would like to show the filename of the control in the InfoWindow::Display(), I see cbEditor(editorbase) have a member function:

virtual const wxString& GetShortName() const { return m_Shortname; }

What a pity is that
// static
void cbEditor::InternalSetEditorStyleAfterFileOpen(cbStyledTextCtrl* control)

This is a static member function, so no access to the filename.  :(
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

MortenMacFly

... you shouldn't not forget to do nothing (!) if the file dos not contain a line-feed, btw. For this to work, you can return true/false in the method you've introduced.
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: [url="https://www.codeblocks.org/docs/main_codeblocks_en.html"]https://www.codeblocks.org/docs/main_codeblocks_en.html[/url]
C::B FAQ: [url="https://wiki.codeblocks.org/index.php?title=FAQ"]https://wiki.codeblocks.org/index.php?title=FAQ[/url]