Re: find the largest 1000 values

From:

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam>

Newsgroups:

microsoft.public.vc.language

Date:

Mon, 12 Nov 2007 08:54:14 -0600

Message-ID:

<urJmGvTJIHA.5928@TK2MSFTNGP05.phx.gbl>

"George" <George@discussions.microsoft.com> wrote in message
news:22B66E7E-C9B9-40BB-B08D-34DF49F5E968@microsoft.com...

Sorry for sending another message to you Anthony,

I have studied your reply again, you menioned,

Worst case, the algorithm could be O(N*M) if you were startlingly unlucky

M is for what? I have read and searched all your replies in this
discussion
thread
but can not the M sign. :-)

N: the total number of elements in the array
M: the size of the group being extracted

i.e. M = 1000 in your case because you wanted the first 1000 values

regards,
George

"Anthony Wieser" wrote:

On average, the performance will be linear.
Worst case, the algorithm could be O(N*M) if you were startlingly unlucky
and constructed a sequence such that the partition always managed to find
the minimum element at the middle of the sequence, where M is the number
of
items you're selecting (1000 in this case).

Because most partition algorithms choose the middle element to partition
around, a collection that has previously had nth_element run on it with
the
same comparison operator will approach linear performance as described
earlier.

"George" <George@discussions.microsoft.com> wrote in message
news:73E3FFAA-2EBE-4128-9EAE-D3DB09773E03@microsoft.com...

Thanks for your great reply, Anthony!

I have one more comment.

Given that partion will always put the first half of the data to the
left
and the second half to the right of the middle,
and we only apply the algorithm to the half that our nth element is
in,
we
end up with
n/2 + n/4 + n/8 + ...

Even with very large N, the sum above is never greater than N,
therefore
the
algorithm is linear.

From your analysis, I agree that partition itself is linear,
i.e. one round of partition algorithm is linear.

But each time, we get the middle is an optimum and random result.
How could we ensure that each time, the pivot key is in the middle?
Maybe it is 1/3n, maybe it is 7/8n... (if correct me if I am wrong and
such
situation
can not happen), and if such situation exists, is your above analysis
and
conclusion
still working?

have a nice weekend,
George

"Anthony Wieser" wrote:

"George" <George@discussions.microsoft.com> wrote in message
news:BB67AA52-2D38-4867-B0BB-5275B5CEB258@microsoft.com...

Thanks Anthony,

Here is what I find an open source STL implementation of nth_element
algorithm. And I do not think it is linear. Could you help to review
and
comment please?

http://www.google.com/codesearch?hl=zh-CN&q=+nth_element+show:_LKSEhOaCKY:4U8FyRrKZWA:fOIA67E1HTo&sa=N&cd=1&ct=rc&cs_p=http://standards.iso.org/ittf/PubliclyAvailableStandards/c043931_ISO_IEC_14496-5_2001_Amd_9_2007_Reference_Software.zip&cs_f=C043931e_Electronic_inserts/Systems/Systems/IM1/IM1Decoders/AFX/WaveSurf/stlport/stl/_algo.c#first
// nth_element() and its auxiliary functions.

template <class _RandomAccessIter, class _Tp, class _Compare>
void __nth_element(_RandomAccessIter __first, _RandomAccessIter
__nth,
                  _RandomAccessIter __last, _Tp*, _Compare __comp) {
while (__last - __first > 3) {
   _RandomAccessIter __cut =
     __unguarded_partition(__first, __last,
                           _Tp(__median(*__first,
                                        *(__first + (__last -
__first)/2),
                                        *(__last - 1),
                                        __comp)),
                           __comp);

The partition algorithm is linear, there are (_Last - _First)
applications
of _Comp and at most (_Last - _First)/2 swaps.

The 3 is an arbitrary constant, which is for efficiency or because of
guard
conditions inside the algorithm, because once you've got that close,
insertion sort is as good as linear too.

Given that partion will always put the first half of the data to the
left
and the second half to the right of the middle,
and we only apply the algorithm to the half that our nth element is
in,
we
end up with
n/2 + n/4 + n/8 + ...

Even with very large N, the sum above is never greater than N,
therefore
the
algorithm is linear.

   if (__cut <= __nth)
     __first = __cut;
   else
     __last = __cut;
}
__insertion_sort(__first, __last, __comp);
}

regards,
George

"Anthony Wieser" wrote:

Nth element can be implemented in linear time using a modified
version
of
quicksort's partition phase.

Anthony Wieser
Wieser Software Ltd