hi everybody,
took a look into file globals.cpp and I prose some refactoring of three functions:
- general code clean up
- improved readability
- simplification of logic
- removed some duplication
- speed up measured to be about 14% due, I suppose, to more cache locality (all is relative: on my machine, with my test case,... )
regards
frithjof
GetVectorFromString should not call wxArrayString, because there is an additional copy!
because first I have a const wxArrayString and then return a std::vector ?
my thoughts were: in the original function I have two vector's, one for the values in the function and the other as a copy during the return.
Now I have one wxArrayString and because the function is so "short" the unnamed object which is returned gets the benefit of RVO. so there are at most the same number of containers, at best even one less. the original function cannot count on RVO imho.
but maybe I get something wrong here...
thx for he feedback
nah... I think you are right... there is an extra copy of the array... damm
so maybe make a template function out of it in order to not textually duplicate the code... what do you think?
never mind... i am too tired to think strait
A template is fine, but generally doesn't make much difference.
not for the size of executable, but to avoid the duplicate
factory recall !
found a strange error in my patch. sorry, haven't tested it enough, I think.
will post new patch these days
here the new version of the patch.
discovered, that assign() for wxArrayString does something else than it should. it appends the given values instead of replacing the contents of the wxArrayString...
if anyone is interested, here my measuring in msec for the functions returning an array/vector from a string:
item count - | current arry - | new array - | current std::vec - | new std::vec |
512 | 1 | 0 | 1 | 0 |
1.200 | 7 | 0 | 7 | 0 |
2.500 | 24 | 0 | 26 | 1 |
5.000 | 93 | 2 | 106 | 3 |
10.000 | 402 | 2 | 395 | 6 |
regards
reworked the next function: EscapeSpaces
speed up in msec:
words in text | | current version | | new version |
32 | 0 | 0 |
64 | 1 | 0 |
128 | 6 | 0 |
256 | 28 | 0 |
512 | 97 | 0 |
1024 | 388 | 0 |
2048 | 1570 | 0 |
4096 | 6264 | 1 |
removed the comment about speeding this function up.
some minor changes
regards
Have not tested, but it looks like GetStringFromArray() will have problems with an empty array.
Why are you doing 3 iterations on the array in GetArrayFromString?
Why are you doing additional copy of the whole array?
And why are you doing Mid(a, npos-something) (hint happens when the array doesn't have a separator at the end)?
- if (!ret.IsEmpty() && ret[0] != _T('"') && ret[0] != _T('\''))
+ if (str.IsEmpty() || str.StartsWith(_T("\"")) || str.StartsWith(_T("\'")))
Why is this change needed? Why have you replaced the operator[] with startswith?
@alpha
thx, you are right. I will add a test for empty array.
@obfuscated
doing three separate iterations on the array results in the code running much faster due to more cache locality. my measurements confirmed the speedup. the original version continuously jumped from one context to another. putting each action in a separate loop is just faster.
for instance, the original version repeated the test for the value of trimspaces in each iteration. now it is done only once.
same for the third iteration of removing empty entries.
the part of the extra whole copy I don't understand. do you mean in the return statement? if so, that would happen in the original version too. or is there another extra copy that slipped me? are you looking at the last version of the patch? the one before had the problem of the extra copy in the std:vector version of the function, but I already changed that.
and if there would be an extra copy, wouldn't the gained speed justify it?
as to your last point: hit and sunk. my functions have a problem with the case where there is no separator at the end. it just omits the part after the last separator. I will correct it, slipped me, thx for the find.
@obfuscated
the change to StartsWith() was merely visual. i tried both versions and the StartsWith() seemed to me to result in a cleaner image to the eye, making it easier and faster to grasp that there are three separate conditions...
as I said, only a personal impression... blame my eyesight ;)
please revise and test attached corrected patch...
Quote from: frithjofh on February 05, 2016, 12:51:20 PM
please revise and test attached corrected patch...
Sorry, but for me, this patch does not apply on SVN head, unfortunately.
i will correct it...
Quote from: frithjofh on February 05, 2016, 11:41:39 AM
doing three separate iterations on the array results in the code running much faster due to more cache locality.
The code runs faster because you've removed the calls to wxString::Remove.
This call is removing characters from the beginning of the string and copying the rest of the string to the front.
So the improvement is not because of better cache locality, but because there is less copying.
Quote from: frithjofh on February 05, 2016, 11:41:39 AM
for instance, the original version repeated the test for the value of trimspaces in each iteration. now it is done only once.
Probably the compiler has re-written the loop and has extracted the test outside of it.
I'm sure that if you remove the check from the original version it won't change its performance!
Quote from: frithjofh on February 05, 2016, 11:41:39 AM
same for the third iteration of removing empty entries.
Why do you need to add elements and then remove them?
Quote from: frithjofh on February 05, 2016, 11:41:39 AM
the part of the extra whole copy I don't understand. do you mean in the return statement?
The return statement is doing a redundant copy.
p.p. this post is made before I've looked at the latest patch.
here the corrected patch. search and destroy please... :)
@obfuscated:
QuoteThe code runs faster because you've removed the calls to wxString::Remove.
This call is removing characters from the beginning of the string and copying the rest of the string to the front.
So the improvement is not because of better cache locality, but because there is less copying.
agree about the removing of wxString::Remove. the most gain lies here in avoiding the copying and also in the avoiding of modification of the wxString in place. building the output piece by piece is faster. as to cache locality: I was just making an assumption, nothing more. I am no professional programmer, I did not measure cache misses.
QuoteProbably the compiler has re-written the loop and has extracted the test outside of it.
I'm sure that if you remove the check from the original version it won't change its performance!
I wouldn't know, as I did not measure this particular part of the code before and after. perhaps the compiler does this optimization on its own, perhaps not, I didn't look at the generated machine code. if there is a performance gain here, i think it will stand back behind the gain from avoiding the copying above anyway. by lengths. I find the new version easier to read though and easier to reason about.
QuoteWhy do you need to add elements and then remove them?
because as above, I think it makes it a lot easier to read and reason about. and strictly spoken, the elements are not removed from the container locale to the function, just copied around (and in some future only moved around). the return value is an unnamed object constructed directly from the resulting new range of valid entries.
QuoteThe return statement is doing a redundant copy.
correct, that is what some former version of the patch did. I took up the hint given and changed it accordingly.
@all:
thx for all the input. thx for not ignoring my efforts all together. apologies for my beginner mistakes.
please give me more bashing, so I will eventually learn and improve ... ;)
I've done some benchmarking and profiling.
orig - current version in codeblocks
forum - your version + a fix when find returns npos
mine - obviously this is my implementation
The results printed by google's benchmark lib:
> bin/release/wx_routine_bench
Success: 13 tests passed.
Test time: 0.00 seconds.
Run on (4 X 3599.88 MHz CPU s)
2016-02-06 17:15:06
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------
BM_orig/1 742 741 958904
BM_orig/8 7456 7457 93333
BM_orig/64 85563 85504 8140
BM_orig/512 2997799 3000000 233
BM_orig/4k 173072815 173000000 4
BM_orig/32k 28505832434 28504000000 1
BM_orig/64k 139696512699 139686000000 1
BM_forum/1 774 774 886076
BM_forum/8 5031 5029 140000
BM_forum/64 33001 32993 21277
BM_forum/512 256982 256831 2745
BM_forum/4k 2011854 2014368 348
BM_forum/32k 19614743 19611111 36
BM_forum/64k 43754816 43764706 17
BM_mine/1 246 246 2834008
BM_mine/8 1321 1320 530303
BM_mine/64 9476 9481 74468
BM_mine/512 69313 69295 10145
BM_mine/4k 537183 536715 1321
BM_mine/32k 6133614 6135135 111
BM_mine/64k 12912220 12907407 54
BM_mine_vector/1 217 217 3225806
BM_mine_vector/8 1447 1445 479452
BM_mine_vector/64 9060 9064 77778
BM_mine_vector/512 68974 68972 10294
BM_mine_vector/4k 539424 538986 1321
BM_mine_vector/32k 6113907 6115044 113
BM_mine_vector/64k 13649380 13666667 51
BM_no_trim_orig/1 383 383 1861702
BM_no_trim_orig/8 4165 4166 168269
BM_no_trim_orig/64 61783 61825 11290
BM_no_trim_orig/512 2833573 2829960 247
BM_no_trim_orig/4k 167948902 168000000 4
BM_no_trim_orig/32k 29410523891 29410000000 1
BM_no_trim_orig/64k 137438740492 137432000000 1
BM_no_trim_forum/1 357 357 1949861
BM_no_trim_forum/8 1751 1753 400000
BM_no_trim_forum/64 11392 11384 61404
BM_no_trim_forum/512 85726 85749 8140
BM_no_trim_forum/4k 676759 676385 1029
BM_no_trim_forum/32k 8598947 8597561 82
BM_no_trim_forum/64k 19704975 19714286 35
BM_no_trim_mine/1 250 250 2788845
BM_no_trim_mine/8 1527 1528 457516
BM_no_trim_mine/64 10476 10470 66667
BM_no_trim_mine/512 78441 78449 8974
BM_no_trim_mine/4k 606494 606684 1167
BM_no_trim_mine/32k 7209706 7218750 96
BM_no_trim_mine/64k 15034847 15021739 46
BM_no_trim_mine_vector/1 224 224 3139013
BM_no_trim_mine_vector/8 1649 1649 419162
BM_no_trim_mine_vector/64 9681 9672 72165
BM_no_trim_mine_vector/512 74088 74147 9589
BM_no_trim_mine_vector/4k 585973 585750 1207
BM_no_trim_mine_vector/32k 7304332 7305263 95
BM_no_trim_mine_vector/64k 15878234 15883721 43
See the attached file for the source code.
Notes:
1. There are cases where the original version is faster than the forum version.
2. Mine version is always faster.
3. When doing trimming it is 3x or so faster. :)
4. Obviously faster code is uglier :)
wow, you really took this to heart...
well, yes, you are right, there is one case, where the original version is faster... slipped me... ;)
and yes, your version is faster, in every regard. it is nice to see what a real programmer can do. a little bit of competitive spirit always works wonders... three iterations over the vector == three times slower than only one :o
I tried to think about the sense of my changes to this particular function when benchmarking. I wondered, if testing the function for 5.000 entries would really make sense, or even be realistic. perhaps the interesting range is from 64 to 4.000 entries, at least if the string that gets tokenized is some normal file. but I don't really know.
I like your version. post a patch. it is a bit a pain in the neck to read, and it replaces 2 functions with 6, but the speed gain is great and those helper functions surely could find use in other functions too...
have you checked the user functions too which I changed with the last patch?
Quote from: frithjofh on February 06, 2016, 05:43:46 PM
wow, you really took this to heart...
Not really, I'm just playing with perf and also wanted to play with google.benchmark a bit.
And for sure I wanted to verify that what I'm talking about is correct.
And it turned out that the difference is not that big that I thought it will be.
The reason is that most of the time is wasted constructing wxString/std::string objects.
BTW the wx3.0 version is quite a lot slower, because it uses std::string
Quote from: frithjofh on February 06, 2016, 05:43:46 PM
have you checked the user functions too which I changed with the last patch?
Not yet.
For the getstringfromarray one my guess is that there won't be any difference:)
Heh, it turned out that your version of GetStringFromArray is a lot slower than the original, because you're using operator+. ::)
QuoteHeh, it turned out that your version of GetStringFromArray is a lot slower than the original, because you're using operator+. ::)
instead of two times append or += ?
corrected...
QuoteAnd it turned out that the difference is not that big that I thought it will be.
the difference between which versions? original and my version? my version and your version? I mean between your version and my version there is a factor of three, I think that is much, although in absolute numbers it is only msec scale.
corrected patch. still contains my version of GetStringFromArray() until better version from obfuscated arrives, or what ever the decision may be...
please test and comment on EscapeSpaces()
I'm talking about the non-trimming case. With my version it is slightly faster than your version.
The trimming version is way faster, because it doesn't do trimming, which is slow.
I've done more testing on the EscapeSpaces function and it seems that your fix is an improvement only in wx2.8. With wx3.0 it doesn't make a difference.
ok. so just leave it as is, right?
@obfuscated
made another attempt on GetArrayFromString(). if you don't mind, could you give it a run in your testing harness to see its performance?
wxArrayString GetArrayFromString_3(const wxString& text, const wxString& separator, bool trimSpaces)
{
if ( text.empty() )
return wxArrayString();
const size_t seplen = separator.length();
size_t start_pos = 0;
size_t sep_pos = 0;
wxArrayString out;
if (trimSpaces)
{
while (sep_pos != wxString::npos)
{
start_pos = text.find_first_not_of(_T(" \t\n\r"), start_pos);
if (start_pos == wxString::npos)
break;
sep_pos = text.find(separator, start_pos);
if (sep_pos != start_pos)
{
const size_t end_pos = text.find_last_not_of(_T(" \t\n\r"), (sep_pos == wxString::npos) ? wxString::npos : sep_pos - 1);
out.push_back(text.substr(start_pos, end_pos - start_pos + 1));
}
start_pos = sep_pos + seplen;
}
}
else
{
while (sep_pos != wxString::npos)
{
sep_pos = text.find(separator, start_pos);
if ( sep_pos != start_pos )
out.push_back(text.substr(start_pos, (sep_pos == wxString::npos) ? wxString::npos : sep_pos - start_pos));
start_pos = sep_pos + seplen;
if (start_pos >= text.length())
break;
}
}
return out;
}
No time now. You can try it for yourself -
https://github.com/google/benchmark
https://youtu.be/nXaxk27zwlk?t=691