Submitted Successfully!
To reward your contribution, here is a gift for you: A free trial for our video production service.
Thank you for your contribution! You can also upload a video entry or images related to this topic.
Version Summary Created by Modification Content Size Created at Operation
1 handwiki -- 1459 2022-11-07 01:33:27

Video Upload Options

We provide professional Video Production Services to translate complex research into visually appealing presentations. Would you like to try it?

Confirm

Are you sure to Delete?
Cite
If you have any further questions, please contact Encyclopedia Editorial Office.
HandWiki. Measure of Presortedness. Encyclopedia. Available online: https://encyclopedia.pub/entry/33292 (accessed on 18 November 2024).
HandWiki. Measure of Presortedness. Encyclopedia. Available at: https://encyclopedia.pub/entry/33292. Accessed November 18, 2024.
HandWiki. "Measure of Presortedness" Encyclopedia, https://encyclopedia.pub/entry/33292 (accessed November 18, 2024).
HandWiki. (2022, November 07). Measure of Presortedness. In Encyclopedia. https://encyclopedia.pub/entry/33292
HandWiki. "Measure of Presortedness." Encyclopedia. Web. 07 November, 2022.
Measure of Presortedness
Edit

In computer science, a measure of presortedness of a sequence represents how much work is required to sort the sequence. If the sequence is pre-sorted, sorting the sequence entirely require little work, hence it is expected to have a small measure of presortedness. In particular, the measure of a sorted sequence is 0. Some sorting algorithms are more efficient on pre-sorted list, as they can use this pre-work into account to avoid duplicate work. The measure of presortedness allows to formalize the notion that an algorithm is optimal for a certain measure.

sorting algorithms presortedness sorting

1. Definition

Let [math]\displaystyle{ \mathbb N^{*} }[/math] represents finite sequence of natural numbers. Let [math]\displaystyle{ m:\mathbb N^*\to\mathbb N }[/math] a function sending sequence of numbers to non-negative numbers. This function is said to be a measure of presortedness if it satisfies the following axioms:

  • if [math]\displaystyle{ X=\langle x_1,\dots,x_n\rangle }[/math] is in ascending order, then [math]\displaystyle{ m(X)=0 }[/math], that is, the measure of presortedness of sorted sequences is 0.
  • let [math]\displaystyle{ X=\langle x_1,\dots,x_n\rangle }[/math] and [math]\displaystyle{ Y=\langle y_1,\dots,y_n\rangle }[/math]. If [math]\displaystyle{ x_i\lt x_j }[/math] iff [math]\displaystyle{ y_i\lt y_j }[/math], then [math]\displaystyle{ m(X)=m(Y) }[/math]. That is, the measure do not depends on the actual numbers in the sequence, but only on their order compared to other elements of the sequences.
  • If [math]\displaystyle{ Y }[/math] is a subsequence of [math]\displaystyle{ X }[/math] then [math]\displaystyle{ m(Y)\le m(X) }[/math], that is, [math]\displaystyle{ m }[/math] is monotonic for the subsequence relation
  • let [math]\displaystyle{ X=\langle x_1,\dots,x_n\rangle }[/math] and [math]\displaystyle{ Y=\langle y_1,\dots,y_n\rangle }[/math]. If for each [math]\displaystyle{ i,j }[/math], [math]\displaystyle{ x_i\lt y_j }[/math] then [math]\displaystyle{ m(XY)\le m(X)+m(Y) }[/math], that is, the measure is subadditive
  • for each [math]\displaystyle{ x\in\mathbb N }[/math] and [math]\displaystyle{ X\in\mathbb N^{*} }[/math], [math]\displaystyle{ m(\langle x\rangle X)\le |X|+m(X) }[/math]. Intuitively, if you add an element [math]\displaystyle{ x }[/math], it can be in the wrong order compared to at most every other element of [math]\displaystyle{ X }[/math], and so the pre-sortedness can be increased by at most the length of [math]\displaystyle{ X }[/math].

Those axioms are similar to the axioms of measure theory. However, a measure of presortedness is not a measure as in measure theory, since it is not defined over a set but over sequence.

2. Optimal Algorithm

It is well known that an optimal comparison sort requires to compare [math]\displaystyle{ O(n\log n) }[/math] pairs of elements in a sequence. That is, the decision tree complexity is [math]\displaystyle{ O(n\log n) }[/math]. However, when more informations is known about how much a sequence is presorted, more optimal bounds can be devised. This notion is now formalized.

Given a measure of pre-sortedness [math]\displaystyle{ m }[/math], a sequence [math]\displaystyle{ X=\langle x_1,\dots,x_n\rangle }[/math], [math]\displaystyle{ z\ge m(X) }[/math], let [math]\displaystyle{ \mathtt{below}(z,X, m) }[/math] be the set of permutations of [math]\displaystyle{ X }[/math] whose measure is at most [math]\displaystyle{ Z }[/math]. Then let [math]\displaystyle{ \mathtt{below}(X, m)=\mathtt{below}(m(X), X, m) }[/math] the set of permutations whose measure is at most the measure of [math]\displaystyle{ X }[/math].

A sorting algorithm [math]\displaystyle{ S }[/math] which takes as input [math]\displaystyle{ X }[/math] and [math]\displaystyle{ z }[/math] is said to be weakly [math]\displaystyle{ m }[/math]-optimal if its time complexity is [math]\displaystyle{ O(\max(n, \log|\mathtt{below}(z, X, m)|)) }[/math]. That is, the number of operations is logarithmic in the number of possible permutations whose measure is at most [math]\displaystyle{ z }[/math]. This is optimal in the sens that simply selecting an element in a set of size [math]\displaystyle{ N }[/math] requires [math]\displaystyle{ \log(N) }[/math] bits of information.

Similarly, a sorting algorithm which takes as input [math]\displaystyle{ X }[/math] is [math]\displaystyle{ m }[/math]-optimal if its time complexit is [math]\displaystyle{ O(\max(n, \log|\mathtt{below}(z, X, m)|)) }[/math]. When [math]\displaystyle{ m(X) }[/math] is computable in linear time, a weakly [math]\displaystyle{ m }[/math]-optimal algorithm is automatically [math]\displaystyle{ m }[/math]-optimal. However, if [math]\displaystyle{ m }[/math] is more complex, it is possible that computing [math]\displaystyle{ m(X) }[/math] is actully more costly than sorting the sequence in the first place, and thus only an upper bound [math]\displaystyle{ z }[/math] of [math]\displaystyle{ m(X) }[/math] can be used.

Similarly, a decision tree is weakly [math]\displaystyle{ m }[/math]-optimal if its complexity is [math]\displaystyle{ O(\max(n, \log|\mathtt{below}(z, X, m)|)) }[/math] and it is [math]\displaystyle{ m }[/math]-optimal if its complexity [math]\displaystyle{ O(\max(n, \log|\mathtt{below}(X, m)|)) }[/math].

2.1. Existence of Decision Tree

For each measure [math]\displaystyle{ m }[/math], there exists a weakly [math]\displaystyle{ m }[/math]-optimal decision tree.

If the decision tree complexity of [math]\displaystyle{ m }[/math] is [math]\displaystyle{ O(\max(|X|, \log|\mathtt{below}(X, m)|)) }[/math], there exists an [math]\displaystyle{ m }[/math]-optimal decision tree. This decision tree's root compute [math]\displaystyle{ m(X) }[/math]. Once [math]\displaystyle{ m(X) }[/math] is computed in a node, the weakly [math]\displaystyle{ m }[/math]-optimal decision tree for [math]\displaystyle{ m(X) }[/math] is appended at this node.

The existence of the tree does not implie that there exists (weakly) [math]\displaystyle{ m }[/math]-optimal sorting algorithm, as computing the decision tree may be costly.

3. Examples

Here is a list of measures of presortedness [math]\displaystyle{ m }[/math] and related (weakly) [math]\displaystyle{ m }[/math]-optimal algorithms. Let [math]\displaystyle{ X=\langle x_1,\dots,x_n\rangle }[/math] be fixed for the remaining of the section.

Let us introduce a sorting algorithm that will be optimal for multiple measure. Let's call local insertion sort the algorithm that consists in iterating on the sequence and adding each element in a finger tree.

3.1. The Zero Measure

The constant function 0 is the most trivial measure of presortedness. Any sorting function running in [math]\displaystyle{ O(n\log n) }[/math]-time, such as heapsort and mergesort is [math]\displaystyle{ 0 }[/math]-optimal.

3.2. The Indicator Function

The indicator function [math]\displaystyle{ m_{01} }[/math] of sorted sequence is a measure of sortedness. This function sends sorted sequences to 0 and all other sequences to 1. Any algorithm that check whether a list is sorted, and if it is not sorted apply a [math]\displaystyle{ 0 }[/math]-optimal sorting algorithm is [math]\displaystyle{ m_{01} }[/math]-optimal.

3.3. Number of Runs

The number of runs, [math]\displaystyle{ \mathtt{Runs}(X) }[/math], is the number of increasing sequences in [math]\displaystyle{ X }[/math] minus one. It is equivalently defined as the number of [math]\displaystyle{ i }[/math] such that [math]\displaystyle{ x_{i+1}\lt x_i }[/math]. is a measure of presortedness.

Natural merge sort, which consists in using existing runs and merging them, is a [math]\displaystyle{ \mathtt{Runs} }[/math]-optimal algorithm. The local insertion sort is also Runs-optimal.

3.4. Number of Inversions

The number of inversion in [math]\displaystyle{ X }[/math], [math]\displaystyle{ \mathtt{Inv}(X) }[/math], is the number of pairs [math]\displaystyle{ i\lt j }[/math] such that [math]\displaystyle{ x_i\gt x_j }[/math]. It is a measure of presortedness and the local insertion sort is Inv-optimal.

3.5. Number of Deletions

The mininimum number [math]\displaystyle{ \mathtt{Rem}(X) }[/math] of elements that need to be removed from [math]\displaystyle{ X }[/math] to obtain a sorted sequences. It is a measure of presortedness and the local insertion sort is Rem-optimal.

3.6. Radius

The radius [math]\displaystyle{ \mathtt{Rad}(X) }[/math] of [math]\displaystyle{ X }[/math][1] is a measure of presortdness.

The following algorithm is Rad-optimal. It consists in iteratively halving the radius until it reaches 0. Halving the radius can be done by three merges of sub-sequences of length [math]\displaystyle{ \mathtt{Rad}(X) }[/math].

3.7. Minimum of Two Measures

Given two measures of presortdness [math]\displaystyle{ m_1 }[/math] and [math]\displaystyle{ m_2 }[/math], the measure [math]\displaystyle{ X\mapsto\min(m_1(X), m_2(X)) }[/math] is also a measure of presortdness. Given two [math]\displaystyle{ m_i }[/math]-optimal sorting algorithms [math]\displaystyle{ S_i }[/math], the algorithm consisting in executing [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math] in parallel and returning the first output is a [math]\displaystyle{ \min(m_1,m_2) }[/math]-optimal algorithm.

References

  1. Estivill-Castro, Vladimir; Wood, Derick (October 1989). "A new measure of presortdness". Information andComputation 83 (1): 111–119. doi:10.1016/0890-5401(89)90050-3.  https://dx.doi.org/10.1016%2F0890-5401%2889%2990050-3
More
Information
Contributor MDPI registered users' name will be linked to their SciProfiles pages. To register with us, please refer to https://encyclopedia.pub/register :
View Times: 442
Entry Collection: HandWiki
Revision: 1 time (View History)
Update Date: 07 Nov 2022
1000/1000
ScholarVision Creations