COMP225 Assignment 1, Part 1

Semester 1, 2009


Programming with pointers:

Sorting and shuffling on linked lists.

Worth 8%

Due: There are TWO DEADLINES for submitting work: see below for details. These dates are firm -- no submissions will be accepted after the stated deadlines.


Objectives

This assignment will give you practice in programming with pointers. You will also learn how to use sorting for problem solving.

Deadlines for submitting work

There are two deadlines for submitting work for this assignment.

The formats you should use for your files are specified below.

If you do not use the filenames stated above, or you do not use the file format specified below, then you will receive 0 (zero) marks. This rule is absolute - there are no exceptions. Why? Because if a student asks to resubmit his or her assignment “with the correct names this time”, there is no way to check that the files have not been altered after the cut-off date, and that is unfair to the others. Use the automarker to make sure that this does not happen to you.

Overview

We are all used to the idea of sorting data in order to process it more effectively. But for some applications the opposite is required: to have data presented to us randomly jumbled. Music players like iTunes, iPods and iPhones have a “shuffle” setting for example: this could be called an anti-“boredom” device.

It turns out that randomly shuffling data is not an easy thing to get right on a computer even if using an accurate random number generator: in fact it's easy to get wrong! For more information on why that is so have a look at the resources here , attend the lectures, and read some old reviews of the iTunes shuffle facility here.

The original shuffle algorithm was popularised by Donald Knuth. It runs in O(n) in the worst case, but uses direct addressing on an array. In this assignment you'll learn how to shuffle a list of items using sorting. This is rather slower (O(nlog(n)) rather than O(n) Why?) but conceptually much easier to understand and to get right. We can also adapt this method easily to do shuffling on a linked list.

This Assignment

In this assignment you will implement sorting and shuffling on a pointer-based linked-list, adapting the basic algorithms to program a simple “myTunes” application. You have been provided with a number of program and specification files, and with one executable. The executable called myTunesTester will run on titanic and gives you an idea of what your “myTunes” programs should do when they are correctly implemented, compiled and executed. The other files are as follows:
  1. merge.cpp This is the file which deals with implementing a basic “mergesort” for sorting a linked list. It contains the utility function merge which implements an in-place merge of two portions of a list, with the assumption that each portion is itself sorted. This function is used inside of the function mergesort, which has been implemented for you (it only needs merge to run). Task 1 asks you to implement merge, to answer some questions about it, and to compile a set of performance tests to be handed in with your written answers.
  2. myTunes.cpp This contains the functions for a small “myTunes” application. It provides the basic data structure for storing tracks in a linked list. Each node in the linked list contains the name of the track, together with a field played which is incremented each time the track is played. An additional field shuffleTag is used for assigning a random number for use by the shuffle functions.

    In Task 2 you are asked to implement the functions shuffle and smartShuffle as set out in the specifications below. To do this you will need to complete the implementations of the two merge functions alphaSort and playedSort which sort the list according to the track and played fields respectively. You are also asked to answer some questions about your implementations.

  3. myTunesTester.cpp implements the program which will be used to test your implementations. It builds a list of tracks, and assigns random play counts to the play field (by playing them a few times). It then calls the function shuffle and prints the result, and finally calls the function smartShuffle and prints the result. The executable myTunesTester is the result.

Task 1 (8 marks total: 4 for the correct implementation, and 4 for the written part.)

Your first task is to fill-in the function merge in merge.cpp in order to complete the implementation of mergesort, and to answer the questions set out below. Make sure that your implementation has worst-case linear complexity and uses O(1) extra “scratch space”.

void void merge(ptrList*&l, ptrList* mid, ptrList*&r); 
   // PRE: l,mid and r are pointers to list nodes; l, mid and r are all "connected"; 
   // Nodes conncting l and mid inclusive are sorted; nodes connecting
   // mid->next and r inclusive are sorted.
   // POST: the list portion connecting l and r are sorted.
You will have correctly implemented this function if, when it is used inside of mergeSort, the result is a sorted linked list of items. Use the testing program testMerge to test your function.

Remember when you compile your code, to have all the files merge.cpp, testMerge.cpp and mersenne.cpp in the same folder and to use g++ testMerge.cpp.

To help you with the implementation, and to understand its performance, answer the following questions. These answers should be posted in the assignment box by the due date.

  1. Describe briefly what you used in your design to make sure that the implementation used no more than O(1) scratch space, and has O(n) time complexity.
  2. (1 mark) Your program merge should contain a loop for processing the nodes in the portion of the list to be merged. Use temporary variables pf and f and ps and s to distinguish portions of the list as the loop iterates. Write down a useful invariant which can be used to show that your implementation is correct for all cases. How must pf and f and ps and s be initialised to establish the invariant? On termination, show that the invariant implies that the merge is correct.

  3. (1 mark) What are the advantages and/or disadvantages for implementing quicksort on linked lists as compared to mergesort? Mention issues such as time, space and difficulty of the coding.

  4. (1 mark) Compare experimentally the time for mergesort on lists and for mergesort on arrays. (You will need to use the implementation for mergesort on arrays given in lectures.) Check for lists and arrays of lengths 10, 100, 1000, 1000, 10000, 100000, and 1000000. Plot your results on a graph, and comment. In particular note any differences or similarities and try to explain them. What tests did you use, how did you create them and why were they appropriate? (For the best results you should plot both curves on the same graph.)

  5. (1 mark) Validate experimentally the correctness of your implementation of mergeSort. What tests did you use, how did you create them and why were they appropriate?

Task 2 (8 marks total: 4 for the correct implementation, and 4 for the written part.)

The second task is to complete the various sorting and shuffle functions (sortTrack, shuffle and smartShuffle) in the file myTunes.cpp. To do it you will need to complete the merge functions corresponding to the sorting functions mergeTrack, mergeRandom and mergePlayed. The informal specifications of the class myTunes and shuffle functions are given below. More detailed specifications are given in the myTunes header file.

To check that your code is producing the correct result, compile it with the myTunesTester.cpp file, which will generate an ordered list of track names with different played times.

Remember when you compile your code, to put all the files myTunes.cpp, myTunesTester.cpp and mersenne.cpp in the same folder and to use g++ myTunesTester.cpp.

To help you with the implementation, and to understand its performance, answer the following questions. These answers should be posted in the assignment box by the due date.

  1. (1 mark) Give a short description of your design of the function shuffle and smartShuffle, and explain why they work correctly. In your explanation you should comment on how your solutions only use O(1) scratch space, and why they have worst-case running time O(nlog(n).
  2. (1 mark) Suppose now that the specification of smartShuffle is relaxed so that the requirement is only for tracks with low play counts to be more likely to appear higher in the shuffled list than tracks with higher play counts. Can you suggest a more efficient design, using fewer sorts than were used in your implementation of Task 2?

  3. (1 mark) What about the design of the class and functionality of myTunes could be improved by using more advanced features of the C++ programming language? How exactly would you use them? (Be brief.)

  4. (1 mark) What tests did you use to check the correctness of your solutions? What issues can you identify in the implementation of the shuffling algorithms which might cause your solutions to produce weak randomisation?

The files

Template for mergeSort (partially implemented): merge.cpp

Example test program to test mergeSort: testMerge.cpp

myTunes header file: myTunes.h

myTunes class file (partially implemented): myTunes.cpp

Random number generator: mersenne.cpp

Example test code: myTunesTester.cpp

Example illustrating how to use the random number generator: random.cpp

The executable: myTunesTester

Submission instructions

Submit your electronic files for Tasks 1 and 2 using the on-line Blackboard system by the due dates.

Submit the hardcopy of your answers to the written parts of Tasks 1 amd 2 to the COMP225 Assignment Box by the due date.

At the same time as you submit your written parts, also submit the graphs you produced in your practical portfolio for weeks 2 and 4.

Assessment

This assignment will be marked both electronically, and by-hand. Task 1 and Task 2 will be marked separately; for the written answers you will be awarded marks according to the rubric scheme set out below.

You will also be "peer reviewing" your fellow students' work during the practical class, after the submission date. Details for how that will be organised will be available nearer the time. Additional credit will be awarded for participating in this exercise.

The electronic part

You may have your program “trialed” by the electronic marker before the due date. Trials will begin from Monday 16 March onwards (for Task 1) and Monday 30 March onwards (for Task 2). Any files submitted after that time and before the deadline will be presented to the electronic marker every alternate day until the submission date.

The results of the trial runs, including some feedback about your programs will be emailed to you at your student account. These trial results will be completely disregarded for marking, so do not worry that mistakes in the trials will count against you: they will not. The feedback is to help you correct errors that may be lurking in your programs, and to make sure that your code is efficient enough.

You will receive the actual results for the electronically marked part of Assignment 1 after the final deadlines for submission.

Hints for obtaining full marks in this assignment

There are a number of common mistakes that students make which prevent them receiving full marks, even if they think they have thoroughly tested their programs!

Remember that there will be several opportunities to check that your programs survive the electronic marker before the final submission date.

In spite of all these warnings, sometimes programs are submitted that, astonishingly, don’t even compile with our departmental electronic marker. This means that their authors not only ignored all the warnings above, but they did not even once use the trial-marking system.

Try not to be one of those people.

Standards-based assessment.

This table describes how the marks for the written part will be assigned.

Criterion Unsatisfactory Basic Good Excellent
Problem solving Unable to articulate a solution (in English). Able to identify (in English) known techniques and use them to design an outline for a solution which solves the problem for some cases. Able to identify known techniques, identify special cases and produce a general design to solve the problem for all special cases. Able to give a full technical account of the effectiveness of their solution.
Technical mastery Unable to provide adequate code to solve the problem. Able to provide a working solution for some of the test cases. Able to provide a working solution, and able to articulate exactly how the various parts of the program worked. Able to provide some additional evidence (besides testing, for example program invariants and big O-analysis) to support the correctness and performance of code.
Issues Unable to identify issues regarding performance and correctness of chosen solution. Able to identify (in English) the issues, and to comment. Able to identify the issues, discuss them in respect of chosen solution, and make comparisons with other approaches. Able to give a full critique of the solution given what has been learned in this exercise, and its appropriateness in other scenarios.
Self assessment Unable to give an account of why the solution is correct. Able to give a basic account of the correctness and performance. Able to state why their program is a good solution, including which test cases were used and why; able to make some statements about the performance. Able to give a well-informed assessment of the performance and programming issues, and suggestions for how the underlying datastructure and or programming structures could be changed to improve the overall solution.