Re: C/C++ question about dynamic "static struct"
ImpalerCore <jadill33@gmail.com> wrote in
news:fd1e7b35-3d5d-40b4-91b5-92e6b1b66fc2@d17g2000vbv.googlegroups.com:
> On Oct 27, 12:13 pm, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
>> ImpalerCore <jadil...@gmail.com> wrote in
>> news:906d2617-80f6-408e-bcf2-2adc990a0e63@p22g2000vby.googlegroups.com:
>>> Agreed, but I also think the "RAII for memory" is also encapsulated
>>> in 'c_levenshtein', unless I misunderstand what you're saying by
>>> "encapsulation". By that I mean that the c_levenshtein just takes
>>> two strings;
>> That's true, but the usability of this encapsulation is on a
>> different level.
> Okay, can you enumerate what "levels of encapsulation" you associate
> with std::string and c_levenshtein? Are you saying 'class' is a higher
> level of encapsulation than 'function'?
No, I talked about the level of "usability". I just wanted to say that
the frequency of anyone using c_levenshtein is several orders of magnitude
lower than that of anyone using std::string (in C++), just because
c_levenshtein is a much more specialized thing. So, if I introduce an
optimization like the small-string optimization in std::string, it has
several orders of magnitude more impact than introducing it in
c_levenshtein (and introducing it in all functions similar to
c_levenshtein is a lot of work, which can be avoided by introducing it in
std::string instead).
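For illustration only, here is a toy sketch of what a small-string
optimization boils down to (made up on the spot, not the layout of any
real std::string implementation):

\code
#include <cstring>
#include <cstdlib>
#include <cstddef>

// Toy small-string optimization: short contents live in an inline
// buffer, longer ones go to the heap. Copying, assignment and error
// handling are omitted for brevity.
class sso_string {
    enum { SSO_CAPACITY = 15 };
    char small_[SSO_CAPACITY + 1]; // used when the string fits here
    char* heap_;                   // NULL while the small buffer is used
    std::size_t len_;
public:
    explicit sso_string(const char* s): heap_(0), len_(std::strlen(s)) {
        if (len_ <= SSO_CAPACITY) {
            std::memcpy(small_, s, len_ + 1);  // no allocation at all
        } else {
            heap_ = static_cast<char*>(std::malloc(len_ + 1));
            std::memcpy(heap_, s, len_ + 1);
        }
    }
    ~sso_string() { std::free(heap_); }
    const char* c_str() const { return heap_ ? heap_ : small_; }
    std::size_t length() const { return len_; }
};
\endcode

Every string of up to SSO_CAPACITY characters avoids the allocator
completely, and that is where the payoff comes from: most strings in a
typical program are short.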
>> , but I am making use of std::string every
>> day (as well as my home-grown variant class which is also using
>> small-string optimization).
> Are you trying to point out that 'std::string (high) > char* (low)'?
Yes, std::string inevitably uses char* internally, so it is the higher
level of the two. But on the other hand, in C++ std::string is itself
pretty much the lowest level, and anything using it sits at a yet higher
level. The fact that the C version of c_levenshtein contains char[]
handling drags it down to the same low level where std::string resides,
where it does not actually belong.
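To illustrate the "yet higher level" (a made-up example): code written on
top of std::string never needs to see the char* bookkeeping underneath it:

\code
#include <string>
#include <vector>
#include <cstddef>

// Hypothetical higher-level code: it builds on std::string and lets
// the string handle all growth and char* management internally.
std::string join(const std::vector<std::string>& parts,
                 const std::string& separator)
{
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) result += separator;
        result += parts[i];
    }
    return result;
}
\endcode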
>> Encapsulation means I can build my own abstractions, and then build
>> other stuff on top of them, ad infinitum. Using abstractions means
>> the upper level code is not concerned with lower-level details. In
>> contrast, your c_levenshtein() function is containing lower-level
>> code (like setting up pm_workspace) which has absolutely nothing to
>> do with the actual purpose of this function. I understand this is
>> kind of inevitable in C.
> I'm a bit confused. Evaluating the levenshtein distance in the
> classical method requires a matrix that you have to get memory for
> from somewhere. Even if you have 'int c_levenshtein( const string&
> s1, const string& s2 )' or some levenshtein member function, you still
> have to provide memory for the matrix; it's not innate to 's1' and
> 's2'. I agree that 'pm_workspace' is a kind of small string
> optimization for malloc that is not directly related to the algorithm,
> but you still need to get the memory from somewhere. Can you
> pseudocode your own version of levenshtein using the std::string
> framework, so I can better understand where you're getting the memory
> for the matrix from, and classify what parts of the function are
> "high" and "low" level?
There are no two levels "high" and "low"; there is a potentially open-
ended hierarchy of levels. I am very much oriented to actual working
code, so for me the notions "lower" and "higher" just mean the order of
loading dynamic-link libraries into the process space, assuming that each
feature is implemented in its own dynamic-link library.
And I said the technique just reminded me of the short-string optimization
of C++, not that it would be applicable for this very function. Anyway,
here is the translation of the levenshtein function into C++:
\code
#include <string>
#include <vector>

// gc_compute_levenshtein_matrix() is the C helper from earlier in the
// thread.
int c_levenshtein( const std::string& s1, const std::string& s2 )
{
    // Note: the try-catch is probably unneeded here; it is just to
    // replicate the original function's way of reporting any error
    // by returning a non-informative -1 error code.
    try {
        /* If one of the strings is empty "", the edit distance is equal
           to the length of the non-empty string. */
        if (s1.empty() || s2.empty()) {
            return s1.length() + s2.length();
        }
        int m = s1.length()+1;
        int n = s2.length()+1;
        std::vector<int> proximity_matrix(n*m);
        gc_compute_levenshtein_matrix(
            s1.c_str(), s2.c_str(), &m, &n, &proximity_matrix[0] );
        return proximity_matrix[m*n-1];
    } catch(...) {
        return -1;
    }
}
\endcode
The intermediate array uses std::vector here instead of std::string, and
I have not heard of any implementation of std::vector that uses any kind
of "small-string" optimization. So this C++ version probably involves a
dynamic memory allocation even in the case of short input strings, and so
probably runs slower than the C version for short input strings. Making
it faster would involve more work, and I would not be convinced this is
needed unless the profiler told me so. On the other hand, if more work
were needed, I could encapsulate it in a class and just replace the name
std::vector with my class name.
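Something along these lines, as a rough sketch (the class name and the
buffer size are arbitrary, and only the few operations c_levenshtein
needs are provided):

\code
#include <cstddef>
#include <vector>
#include <algorithm>

// Rough sketch of a drop-in replacement for std::vector<int> in
// c_levenshtein: small matrices live in an internal buffer, larger
// ones fall back to a real std::vector.
class small_int_buffer {
    enum { SMALL_SIZE = 256 };
    int small_[SMALL_SIZE];
    std::vector<int> big_;     // used only when n > SMALL_SIZE
    int* data_;
public:
    explicit small_int_buffer(std::size_t n) {
        if (n <= SMALL_SIZE) {
            data_ = small_;
            std::fill(data_, data_ + n, 0); // mimic std::vector<int>(n)
        } else {
            big_.resize(n);                 // resize() zero-fills the ints
            data_ = &big_[0];
        }
    }
    int& operator[](std::size_t i) { return data_[i]; }
};
\endcode

In c_levenshtein above, the only change would then be declaring the
matrix as small_int_buffer proximity_matrix(n*m); instead of
std::vector<int> proximity_matrix(n*m);.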
> If you're basically saying C++ > C for encapsulation, I agree with
> you.
Yeah, I guess this is mostly what I wanted to say.
> However, one can still build a C interface to a resizing string
> to have a kind of std::string equivalent in functionality, but the
> code won't look as pretty, especially to someone accustomed to
> class-based design. But that doesn't mean that it can't be done, and
> that it wouldn't be useful for someone using C.
Sure, C is Turing complete so one can do anything in it. It just takes
more care and discipline.
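Just as an illustration (hypothetical names, no error handling), the
skeleton of such a resizing string behind a C-style interface might start
out like this:

\code
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a growable string behind a C-style interface.
   Error checking and most operations are left out. */
typedef struct c_string {
    char*  data;
    size_t len;
    size_t cap;
} c_string;

c_string* c_string_create(const char* s)
{
    c_string* str = (c_string*)malloc(sizeof *str);
    str->len  = strlen(s);
    str->cap  = str->len + 1;
    str->data = (char*)malloc(str->cap);
    memcpy(str->data, s, str->cap);
    return str;
}

void c_string_append(c_string* str, const char* s)
{
    size_t n = strlen(s);
    if (str->len + n + 1 > str->cap) {        /* grow geometrically */
        str->cap = 2 * (str->len + n + 1);
        str->data = (char*)realloc(str->data, str->cap);
    }
    memcpy(str->data + str->len, s, n + 1);
    str->len += n;
}

void c_string_destroy(c_string* str)
{
    free(str->data);
    free(str);
}
\endcode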
Best regards
Paavo