APS105 Lab 5

Objective

The objective of this lab is to give you practice understanding and implementing a new sorting algorithm.

Selection Sort: the Inefficiency

Please read section 9.4 of the textbook for an explanation of selection sort. Here, I describe the algorithm just enough to contrast it with the variant you will implement.

When using selection sort, we conceptually divide our array into two parts: an unsorted part, and a sorted part. At first, the unsorted part consists of the entire array and the sorted part consists of nothing. Then, on each major iteration (i.e. each iteration of the outermost loop), we find the largest element in the unsorted part, and swap it with the rightmost element in the unsorted part. Thus, each major iteration adds just one element to the sorted part.

For example, consider the array:
7 1 9 3 5 4 |
(We use the vertical bar to separate the unsorted part from the sorted part.) The largest element in the unsorted part (keeping in mind that currently the entire array is the unsorted part) is 9. Swapping this with the rightmost element of the unsorted part gives:
7 1 4 3 5 | 9
Now, the largest element in the unsorted part is 7. Swapping this with the rightmost element of the unsorted part gives:
5 1 4 3 | 7 9
We proceed in this way until the entire array is sorted.

Notice that each iteration of the outer loop gives us only one more element in the correct place. In other words, we do all of the work to scan the unsorted part, but then use that only to place one more element in sorted order. If we could remember more information about each scan of the unsorted part, we might be able to place multiple elements in sorted order each time, thereby reducing the number of times the outer loop must run.

Improved Selection Sort: Addressing the Problem

Rather than remembering only the maximum, let us remember the minimum and maximum values on each scan of the unsorted part. This will let us place two elements at a time in sorted order.

For example, again consider the array:
| 7 1 9 3 5 4 |
This time, we have two vertical bars. To the left of the first vertical bar will be the small values in sorted order; between the vertical bars will be the unsorted part of the array; to the right of the second vertical bar will be the large values in sorted order.

We begin by finding the minimum and maximum values in the unsorted part of the array (currently the entire array). The smallest value is 1, and the largest value is 9. We swap the 1 with the leftmost value in the unsorted part, and swap the 9 with the rightmost value in the unsorted part, giving:
1 | 7 4 3 5 | 9
Repeating the process, we again find the smallest and largest values in the unsorted part. The smallest is now 3; the largest is now 7. Swapping 3 with the leftmost value in the unsorted part and swapping 7 with the rightmost value in the unsorted part gives:
1 3 | 4 5 | 7 9
We have two elements left to sort: 4 and 5. The smallest is 4; the largest is 5. Swapping 4 with the leftmost value in the unsorted part and swapping 5 with the rightmost value in the unsorted part physically does nothing:
1 3 4 | | 5 7 9
but it does let us conclude that the entire array is sorted, since the middle part is now empty.

Notice how rapidly we made progress here. We took only three steps to sort a six-element array, rather than the five steps of the naive selection sort. In general, the approach of finding both the minimum and maximum will take half as many major iterations as the naive selection sort. Of course, if our major iterations now take twice as long as before, we have not made a net improvement. In particular, what we cannot do is scan once to find the minimum and scan again to find the maximum. We will discuss below how to find both the minimum and maximum faster than this.

Implementing the Improved Selection Sort

Download the files twosort.c and twosort.h. In this part, you will write the findMinMax and twoSort functions in twosort.c. Do not modify twosort.h. Also download swap.c and swap.h; you should use this swap function whenever you require a swap to be performed.

Part 1: Implementing findMinMax

The findMinMax function finds the indices of the minimum and maximum values in the subarray beginning at a[lower] and ending at a[upper] (including both endpoints). You are to create a minMax structure, fill its two members with the correct values, and then return it.

Here is an example function call:

int vals[] = {4, 2, 6, 7, 6, 4};
struct minmax mm = findMinMax (0, 5, vals);
At this point, mm.minIndex = 1 and mm.maxIndex = 3.

You must implement the following algorithm for finding the minimum and maximum values in an array. When we refer to the array's elements, we mean the elements of the subarray passed to the function.

  1. Declare min and max variables.
  2. If the first array element is smaller than the second, give min the value 0 and give max the value 1. Otherwise, the first element is greater than or equal to the second; give min the value 1 and max the value 0.
  3. Assume the next two values to check are at indices i and i+1. Figure out which of these indices holds the smaller value and which holds the larger value. If the smaller value is less than the value at index min, set min to the index of the smaller value. If the larger value is greater than the value at index max, set max to the index of the larger value.
  4. Repeat step 3 until you have gone through the entire array.

Here is how the function would operate on array [4, 2, 6, 7, 6, 4].

Warning: be very careful with arrays having an odd number of elements (including one element), as well as empty arrays! You will go off the end of the array if you indiscriminately move two-by-two without regard to the number of elements remaining. Be sure to test with both even- and odd-length arrays. It is a good idea to thoroughly test this function before continuing; it's much easier to debug twoSort if you are confident in your findMinMax.

Part 2: Implementing twoSort

findMinMax in hand, you are ready to write twoSort itself. This algorithm was described above. Here is an example function call:

int vals[] = {4, 2, 6, 7, 6, 4};
twoSort (6, vals);
At this point, vals is sorted. Notice that twoSort takes the length of the array, not the lower and upper indices as with findMinMax

Important: You must call findMinMax from within your twoSort function!

Testing twoSort

In this lab, you are submitting only your findMinMax and twoSort functions, no main function. However, you'll certainly want to stress-test your functions against a wide variety of arrays. For example:

To complement your manual testing, download the following three files:

The file test_twosort.c contains a main function that generates and sorts arrays of random values (50 arrays by default). If a particular array fails to be sorted, the tester prints that array and halts. If all sorts are successful, the program compares the elapsed time for your twosort to the textbook's selection sort.

To test your twoSort function, compile your code with the following command:
gcc -o twosort twosort.c selsort.c test_twosort.c swap.c

Follow our Specifications, Please

For all labs, but particularly here, it is very important that you follow instructions. You have not correctly completed the lab if you use a different algorithm for finding the minimum and maximum, or if you use a different sorting algorithm, even if the tests are passing.

Part 3: Timing twoSort

Once you have finished your twosort implementation and all manual and random tests pass, you will compare the efficiency of your implementation to the textbook selection sort. To do this, return to test_twosort.c.

test_twosort.c contains several constants that influence the number and size of the tests that it runs:

Increase the MOST_ELEMENTS constant to several large values, and take note of the time taken by your sort and the standard selection sort. What can you say about the time of both sorting algorithms as you increase MOST_ELEMENTS?

Now, increase LARGEST_ELEMENT. Does this affect the time taken by your sort or the standard selection sort?

Summarize your timing results in file timing.txt.

Submitting your Lab

When you have completed the lab, use the command submitaps105s 5 twosort.c timing.txt to submit your files. Make sure you name your files exactly as stated (including lower/upper case letters). You may check the status of your submission using the command submitaps105s -l 5, where -l is a hyphen followed by the letter 'ell'.