Clarifications and Hints for Data Structures Problem

Problem 7 - Top Ten Words

ERROR:
The handout for this problem accidentally showed only 8 of the 10 words. The two additional lines in the Expected Output should be:

    1     another
    1     assume

NOTE:
We did the same alteration for the "auto judge" ON THIS PROBLEM as we did for Problem 5. Therefore, you need not worry about how many spaces there are between the value and the word. The two lines for the heading and title, just above the two columns of output should be printed as you see in the Expected Output sample.

HINTS (by Dr. Ron Smith):
The Decorate-Sort-Undecorate algorithm that I talked about in class can be used to solve this problem, but it requires a rather clever decoration. There is an alternate straightforward approach that is more elegant. (Of course it is not required--you may do the problem any way you wish.) First I will describe the difficulty, then outline three possible approaches to the solution.

Difficulty: Once the dictionary of word frequencies is constructed, you can make a list of (word,n) pairs. Suppose the list is named frequency. For example,

   frequency = [('a', 4), ('c', 5), ('b', 7), ('e', 4), ('d', 7), ('f', 5)]
   
We want to sort frequency so that the numbers are in one order (descending), but where there are ties, the strings are to be in the opposite order (ascending). If you decorate with just the numbers and sort in either order, the strings where ties occur will be in the same order as the numbers, and will never have the opposite order that is required.
  1. An elegant solution is to alter the way that comparisons are made so that sorting frequency puts everything in the right order. To do this, we first write our own comparison routine that replaces the routine cmp that normally does comparisons. Then we can use it in the sorting process.
    def compare(x,y):
         """
         Compares ordered pairs x,y so the second slot is decreasing and if unequal, 
         otherwise the first slot is increasing.
         """
         
         if x[1] != y[1]: 
               return -cmp(x[1],y[1])  #if different, the second slot is in decreasing order
         else:
              return cmp(x[0],y[0])    #otherwise first slot is increasing
    
    To sort using this comparison in place of the default (which is named 'cmp'), use:
         frequency.sort(cmp=compare)
    
    This gives the desired order:
         frequency = [('b', 7), ('d', 7), ('c', 5), ('f', 5), ('a', 4), ('e', 4)]
    
  2. Here is the best Decorate-Sort-Undecorate solution I could find:

    First sort frequency in descending alphabetic order.
    frequency = [('f', 5), ('e', 4), ('d', 7), ('c', 5), ('b', 7), ('a', 4)]
    
    Decorate with pairs (n,i) where i is the index of (word,n) in this list.
    decoration  = [(5, 0), (4, 1), (7, 2), (5, 3), (7, 4), (4, 5)]
    
    Sorting the decorated list in descending order and then Undecorating (and converting from a tuple to a list) will give you the desired result:
    frequency = [('b', 7), ('d', 7), ('c', 5), ('f', 5), ('a', 4), ('e', 4)]
    
  3. A third possible solution is this:

    Sort frequency in descending order of the second slot For each frequency that occurs, make a list of all words with that frequency, and sort them into increasing order.