Re: My Explode function(s) are too slow.
* FFMG:
Hi,
I need the PHP equivalent of explode in one of my apps.
I read a very big file and "explode" each line to fill a structure.
Reading the 19 MB file takes about 10 seconds (I guess I will also need to
streamline the way I read each line), but when I 'explode' each line the
whole process takes about 140 seconds.
This is what I have tried so far...
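#include <cstring>   // strtok
#include <string>
#include <vector>

//-----------------------function A--------------------------
// Splits s at every occurrence of separator; empty tokens are skipped.
// Parameters are const references, so the line is not copied on each call.
std::vector<std::string> explode(
    const std::string& s,
    const std::string& separator
    )
{
    std::vector<std::string> ret;
    if(separator.empty()){           // an empty separator would never advance iStart
        if(!s.empty()) ret.push_back(s);
        return ret;
    }
    const std::string::size_type iPit = separator.length();
    std::string::size_type iStart = 0;
    std::string::size_type iPos = s.find(separator, iStart);
    while(iPos != std::string::npos)
    {
        if(iPos != iStart){          // skip empty tokens
            ret.push_back(s.substr(iStart, iPos - iStart));
        }
        iStart = iPos + iPit;        // always move past the separator, otherwise
                                     // a leading separator loops forever
        iPos = s.find(separator, iStart);
    } // end while
    // add the last item if need be.
    if(iStart < s.length()){
        ret.push_back(s.substr(iStart));
    }
    return ret;
}
//-----------------------function B--------------------------
// strtok writes '\0' into s, so s must be a modifiable buffer (not a
// string literal or std::string::c_str()); strtok is also not thread-safe.
// Like function A, empty tokens are skipped.
std::vector<std::string> explode(
    char* s,
    const char separator
    )
{
    std::vector<std::string> ret;
    const char seps[] = {separator, '\0'};  // strtok needs a NUL-terminated set
    char* token = std::strtok(s, seps);
    while(token != NULL)
    {
        ret.push_back(token);
        token = std::strtok(NULL, seps);
    }
    return ret;
}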
//--------------------------------------------------------------------
Function B is slightly faster than function A.
How could I speed up my Explode?
I'd try the following, in order:
* Read the complete file into a buffer in one or a very few large
gulps -- that typically improves the reading by at least one
order of magnitude.
* Analyze whether an /explicit representation/ of the complete token
set is really required, or whether you can just proceed by handing
tokens one at a time up to calling code or down to code that you call.
* If an explicit representation is required and performance still
suffers, I'd first try the obvious: checking whether compiler options
(e.g. turning on optimization) fix the performance. Second, I'd check
whether rewriting it as a "get" function (returning the result not as
the function result but via a reference argument) fixes it. Third, I'd
consider things such as a vector of StringSpan objects, each containing
just pointers to the start and end of a substring.
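For the first suggestion, reading the file in one large gulp might look roughly like this minimal sketch. It assumes the whole file fits in memory; the name `readWholeFile` is hypothetical and not from the original post.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Reads the entire file into one std::string with a single large read,
// instead of many small line-by-line reads.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if(!in) throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);                 // find the file size
    const std::streamoff size = in.tellg();
    if(size < 0) throw std::runtime_error("cannot size " + path);
    in.seekg(0, std::ios::beg);

    std::string buffer(static_cast<std::string::size_type>(size), '\0');
    if(size > 0) in.read(&buffer[0], size);     // one gulp
    return buffer;
}
```

The lines (and their tokens) can then be located inside `buffer` without any further I/O.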
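The second suggestion, handing tokens one at a time to calling code instead of building a vector, could be sketched as below. This assumes C++17 for `std::string_view`, a single-character separator, and that empty tokens are skipped as `strtok` does; `forEachToken` is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Calls onToken(token) for each non-empty token of s delimited by separator.
// No container is built and no characters are copied: each token is a view into s.
template<class Callback>
void forEachToken(std::string_view s, char separator, Callback onToken)
{
    std::size_t start = 0;
    while(start < s.size())
    {
        std::size_t pos = s.find(separator, start);
        if(pos == std::string_view::npos) pos = s.size();  // last token
        if(pos > start) onToken(s.substr(start, pos - start));  // skip empty tokens
        start = pos + 1;
    }
}
```

Because each token only views `s`, the callback must copy anything it wants to keep after the underlying buffer goes away.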
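The third suggestion combines a "get"-style function with a vector of StringSpan objects. Here is a minimal sketch, assuming a single-character separator and a buffer (for example, the whole file read in one gulp) that outlives the spans. `explodeInto` is a hypothetical name, and `StringSpan` is simply the two-pointer struct the suggestion describes.

```cpp
#include <cstddef>
#include <vector>

// Records where a token starts and ends inside a buffer; copies no characters.
struct StringSpan
{
    const char* begin;
    const char* end;
    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
};

// "get"-style: the result goes into a caller-owned vector passed by reference,
// so its capacity can be reused from one line to the next.
void explodeInto(const char* first, const char* last, char separator,
                 std::vector<StringSpan>& result)
{
    result.clear();  // capacity is kept
    const char* tokenStart = first;
    for(const char* p = first; p != last; ++p)
    {
        if(*p == separator)
        {
            if(p != tokenStart) result.push_back(StringSpan{tokenStart, p});  // skip empty tokens
            tokenStart = p + 1;
        }
    }
    if(tokenStart != last) result.push_back(StringSpan{tokenStart, last});
}
```

Reusing the same `result` vector for every line avoids repeated allocation. In C++17, `std::string_view` could play the role of `StringSpan`.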
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?