Re: Looking for a design pattern

From:

"Alf P. Steinbach" <alfps@start.no>

Newsgroups:

comp.lang.c++

Date:

Tue, 11 Aug 2009 15:17:55 +0200

Message-ID:

<h5rrhp$ov3$1@news.eternal-september.org>

* James Kanze:

On Aug 11, 11:15 am, "Alf P. Steinbach" <al...@start.no> wrote:

* Michael Doubez:

On 10 ao?t, 23:52, Jerry Coffin <jerryvcof...@yahoo.com> wrote:

In article <4a808b83$0$426$426a7...@news.free.fr>,
michael.dou...@free.fr says...

[ ... ]

The idiomatic choice for those who have made some editors
from scratch (not just using an edit control/widget).Look
up "cursor gap" in Wikipedia or wherever.

I had a look on wikipedia ... as clear as the bottom of my
socks.

The idea is pretty simple, really. When you're editing a
text file, you separate the file into two pieces:
everything that's before the cursor, and everything that's
after the cursor.

That I understood but what do you gain except an amortized
copy ? I guess it is memory cost effective.

It's a *simple* structure that yields O(1) insertion and
deletion at the cursor position and O(1) random access read
and write, where the former is the same as with a linked list,
and the latter is much better than a linked list.

But nobody uses a linked list.

Not sure about that.

And it's O(n) for random cursor
positionning, or going to a specific line.

Well, you got that right. :-)

Making that operation O(1) generally goes at the cost of O(n) insertion (e.g.
inserting a line). Old Basic implementations provided a possibility to work
around that (although I don't think they actually exploited the possibility they
created) by having a line number explicitly specified for, that is within, each
line. Then if you exhausted the "gaps" that you'd inserted in the line numbering
you had to "renumber" the lines.

Compromise is O(log n) for random cursor positioning.

It is, in fact, O(n)
when you go to the end of the file, which is a frequent
operation (since in many cases, that's where you're inserting
text).

Above is incorrect. Cursor gap is O(1) for moving to other end. With any
reasonable implementation, since going to the ends are frequent operations.

And I don't know how it works in today's environment,
when you have several windows open on the file, each with its
own cursor. (I think, but I'm very far from sure---maybe
someone who has the sources to emacs handy could check---that
emacs moves the cursor in the file to correspond to its position
in the active window. So that if you have one window open with
the local class definitions, at the top of the file, and another
where you're coding the implementations, at the bottom, each
time you move the cursor between the windows, you copy most of
the text.)

I've never seen or made a cursor gap structure that supported multiple cursors.
Possibly it can be done for small number of cursors. I'm not sure.

For a program editor, I'd definitely use something line
oriented, along the lines of vi, if only because one of the most
common operations is going to a specific line, with the line
number coming from an error message. The fact that emacs does
this almost instantly is, in fact, proof that on today's
machines, it really doesn't matter---just use
std::vector< char >, and be done with it.

I think you mean std::vector< std::wstring >. And that's OK, at least with a COW
string implementation or C++0x move semantics, or a with any half-decent DIY
immutable string class. But as I recall, the makers of the main free C# IDE
started out with that but had to switch to a more sophisticated scheme (they
wrote a book about it, available free online).

The main cost is that, without reallocation and copying of all
the data, the maximum size of the buffer is determined up
front, and it uses that much memory regardless of the amount
of data (for an editor, text), so no, it isn't especially
memory cost effective. :-)

The real cost is when you start using OS's which penalize
programs for excessive memory use. On the earliest version of
Unix I used (no virtual memory), when the OS had to swap out a
process, it choose the biggest, because that recovered the most
memory. (Back then, vi was the biggest program on the machine.
And everytime someone started a compile, vi stopped reacting for
three to five seconds.) Under Linux, you have to worry about
the famous OOM killer, which logically might behave
similarly---at the worst, the fact that you allocate a lot of
memory without touching it immediately does create a situation
where the OOM killer will be released. (Under Windows or
Solaris, it just works, thank you, and there's no OOM killer to
be worried about.)

Of course, in C++ any complexity for a more complicated
structure can be hidden under a simpler programmatic
interface. So I guess that my comment reflected that since C++
displaced C for such tasks, implementing editors from scratch
hasn't been that much in vogue. And the only such editor I
made that I still have the source code for, on magnetic tape,
an editor for the HP3000 from 198x!, wasn't written in C but
in Pascal, and it didn't use cursor gap but a more
sophisticated list of blocks, sort of like a C++ std::deque
implementation.

Sounds like vi:-).

No, it didn't have modes, except overwrite/insert.

It was also one of my first small-language implementations, because I "had" to
invent a language to store configuration of the terminal function keys. The HP
terminals had very nice function keys, with small displays at the bottom of the
screen showing the current assignment of any function key. Oh it was nice.

Cheers,

- Alf