Re: Splitting a string into an array of words

From:

"Daniel T." <daniel_t@earthlink.net>

Newsgroups:

comp.lang.c++

Date:

Fri, 21 Jul 2006 04:35:46 GMT

Message-ID:

<daniel_t-B6D82B.00354221072006@news.west.earthlink.net>

In article <VOVvg.128148$H71.111533@newssvr13.news.prodigy.com>,
Mark P <usenet@fall2005REMOVE.fastmailCAPS.fm> wrote:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ")
{
   string::size_type start = str.find_first_not_of( delims );
   while ( start != string::npos ) {
      string::size_type end = str.find_first_of( delims, start );
      *os++ = str.substr( start, end - start );
      start = str.find_first_not_of( delims, end );
   }
}

Looks good. In my case it was a bit more complicated because I also
have an additional parameter for a comment character. When a comment
character is encountered at the beginning of a token, that token is
discarded and the loop breaks. (So in my original implementation there
were multiple breakpoints out of the loop, although I hastily trimmed
these before I posted my code, thereby leaving some unattractive vestiges.)

In any event, I appreciate your comments and don't mean to simply make
excuses and argue all of your points.

No problem. Your code was rather good in general; I only saw a few nits
to pick.

The only significant hitch to my
adopting your cleaner implementation is that I really do need support
for the comment character break. Luckily this is just a bit of a little
file parser I use for testing, so I don't stress too much about these
details, but feel free to propose a svelte implementation that supports
a comment char. :)

If I understand what you mean then:

template <typename OutIt>
void tokenize( const string& str, OutIt os, const string& delims = " ",
               char comment = '\0' )
{
   string::size_type start = str.find_first_not_of( delims );
   // stop at end of input, or when a token begins with the comment char
   while ( start != string::npos && str[start] != comment ) {
      string::size_type end = str.find_first_of( delims, start );
      *os++ = str.substr( start, end - start );
      start = str.find_first_not_of( delims, end );
   }
}

Of course you should probably change the defaults to whatever is most
common in your code...
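
In case it helps, here is a self-contained sketch of the same idea that
should compile on its own. The split_words helper and its '#' default
are my own assumptions for illustration, not anything from your parser;
the loop tests str[start], since start is an index into str.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Writes each delimiter-separated token of str to os. Stops at the end
// of the string, or at the first token that begins with the comment
// character (that token and everything after it are discarded).
template <typename OutIt>
void tokenize(const std::string& str, OutIt os,
              const std::string& delims = " ", char comment = '\0')
{
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos && str[start] != comment) {
        std::string::size_type end = str.find_first_of(delims, start);
        // When end is npos, substr simply takes the rest of the string.
        *os++ = str.substr(start, end - start);
        start = str.find_first_not_of(delims, end);
    }
}

// Hypothetical convenience wrapper (an assumption, not from the thread):
// collects the tokens into a vector, treating '#' as the comment char.
std::vector<std::string> split_words(const std::string& line,
                                     const std::string& delims = " ",
                                     char comment = '#')
{
    std::vector<std::string> words;
    tokenize(line, std::back_inserter(words), delims, comment);
    return words;
}
```

With these assumptions, split_words("alpha beta  # gamma delta") should
yield just "alpha" and "beta". Note that a comment character in the
middle of a token (as in "a#b") does not end the scan; only one at the
start of a token does, which is how I read your description.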