June 5, 2015 - algorithms cpp

# Sorting Algorithm - Merge Sort

Xavier Geerinck

Merge Sort is a comparison algorithm that tries to sort the dataset by a **Divide And Conquer** method. The performance of this algorithm is $O(n*log(n))$.

## How

We start by splitting the array up into separate elements, then we start by merging pairs together and we make sure that the smallest element is always on the left (this makes sure that our first index on the end is correct too). If the array start getting bigger then 2 elements, then we join those larger arrays by going through their elements one by one and putting the smallest one first.

For the implementation check paragraph `§5.4`

.

## Advantages and Disadvantages

### Advantages

- Stable
- $\Theta(n*lg(n))$ Performance
- Doesn't need random access to data

### Disadvantages

- Not adaptive
- $\Theta(n)$ extra space normal
- $\Theta(lg(n))$ Extra space for LinkedLists

## Performance

Worst Case | Average Case | Best Case |
---|---|---|

O(n log n) | O(n log n) | O(n log n) |

For the performance of merge sort we will go out from the assumption that $n = 2^k$. This is because we can then split the table in 2 every time.

$T(n) = 2T(n / 2) + cn$

cn stands for the number of operations needed for merging. This is because merging 2 sub tables is $\Theta(n_1 + n_2)$. Now to solve this equation we will just divide both terms by n.

$$\frac{T(n)}{n} = \frac{T(n / 2)}{n / 2} + c$$

And we keep on doing this (recursion) because we need the table to become 1 element big.

$$\frac{T(n/2)}{n/2} = \frac{T(n / 4)}{n / 4} + c$$ $$\frac{T(n/4)}{n/4} = \frac{T(n / 8)}{n / 8} + c$$ $$\frac{T(2)}{2} = \frac{T(1)}{1} + c$$

On the end we just add them both together and we get:

$$\frac{T(n)}{n} = T(1) + ck$$

Because $$T(1)$$ is a constant we can say that $T(n) = O(n*lg(n))$. Also when n is not a power of 2, then we get the same result.

## Implementation

Implementing this algorithm is not so hard if you know how to, you need to keep 2 methods in mind:

- Merge (Combines 2 arrays with each other)
- MergeSort (Main method, accepts a dataset, splits it and calls this method for those 2 splitted parts)

Then just recursively call MergeSort and you are done.

### C++ Code

void merge(vector &v, const vector &left, const vector &right) const {// We will save our merged array herevector result;// Create indexes for the 2 arraysint idxLeft = 0;int idxRight = 0;// Make sure we are not at the end of one of the arrays, else we can't comparewhile (idxLeft < left.size() && idxRight < right.size()) {if (left[idxLeft] < right[idxRight]) {result.push_back(left[idxLeft]);idxLeft++;} else {result.push_back(right[idxRight]);idxRight++;}}// Add remaining elementswhile (idxLeft < left.size()) {result.push_back(left[idxLeft]);idxLeft++;}while (idxRight < right.size()) {result.push_back(right[idxRight]);idxRight++;}// Set our vector on the resultv = result;};void merge_sort(vector &v) const {// Stop running recursively once we have 1 element!if (v.size() == 1) {return;}// Find the middle elementvector::iterator middle = v.begin() + (v.size() / 2);// Create our 2 sub arraysvector left(v.begin(), middle);vector right(middle, v.end());// Recursively call MergeSort again, needed for the left and rightmerge_sort(left);merge_sort(right);// Merge the left and the right arraymerge(v, left, right);}

## 1.5.5. Benchmark

100 | 1.000 | 10.000 | 100.000 | 1.000.000 | |
---|---|---|---|---|---|

Random Elements | 0.000163 | 0.001637 | 0.018098 | 0.19313 | 2.01314 |

Ascending Elements | 0.000145 | 0.001474 | 0.016712 | 0.172718 | 1.85952 |

Descending Elements | 0.00015 | 0.001472 | 0.016825 | 0.173975 | 1.84874 |

## 1.5.6. Conclusion

When sorting a Linked List Mergesort is the best choice, however for normal arrays it is better to use other algorithms such as heap sort (which requires less space) or quicksort.