Due October 24, 1996

Heuristics have wide applicability. Attached are three different problems. Your group will either volunteer for, or be assigned, one of the three problems. Please be prepare a short written report on the problem and be prepared for a 10 minute presentation on October 24. One group will be chosen to present their results with other groups who worked on the same problem being given the chance to comment.

The purpose of this project is to get you thinking about heuristics. While most heuristics are run by computer, you will probably solve these problems manually, perhaps with a spreadsheet to aid you.

Clustering is an activity that occurs in many fields. For instance, in the design of a flexible manufacturing system, products that require similar resources (machines, parts, and so on) are clustered into work cells. In marketing, given a universe of hundreds of products, it is useful to cluster them into a number of near-substitutes and then analyze each cluster independently.

We formalize the clustering problem as follows: given n items, we have a measure between every pair of items. means that the items are similar with a higher value meaning a stronger similarity. means the items are dissimilar, with a more negative value denoting higher dissimilarity.

To fix ideas, here is a table of information about some automobiles:

``` 1 AMC_Ambassador_8    1  0  0  0  0  0  0  1  0   0   0   0   0
2 Buick_Special_6     0  0  0  0  0  0  1  0  1   0   0   0   1
3 Buick_Special_8     0  0  0  0  0  0  1  0  0   1   0   1   1
4 Buick_8_Full        0  0  0  1  0  1  1  0  1   1   0   1   0
5 Buick_Riviera       0  0  1  1  0  0  0  0  0   1   0   0   0
6 Cadillac_Chevy_II   0  1  0  0  1  0  1  1  1   0   0   1   0
7 Chevelle_6          0  0  0  0  0  1  1  0  1   0   0   0   0
8 Chevelle_8          0  1  0  1  1  0  1  0  1   1   0   1   0
9 Chevrolet_Full      0  1  1  1  1  0  1  1  1   1   1   1   0
10 Corvair_6           0  1  0  0  1  1  0  1  0   1   1   1   1
11 Corvette            0  0  0  1  0  0  1  1  0   0   1   0   0
12 Chrysler_Newport    1  0  0  0  0  0  0  0  0   0   0   0   0
13 New_Yorker          1  0  0  0  0  0  0  1  0   0   0   0   1
14 Dodge_Full_Size     1  0  0  0  0  0  1  0  0   1   0   0   0
15 Falcon_6            0  0  0  0  0  0  1  0  0   0   1   1   0
16 Fairlane_6          0  0  0  0  0  0  1  0  0   0   0   0   0```

Each column represents an automobile feature (like power steering) and each row represents a type of car. A ``1'' says that the car has that feature while a ``0'' says the car does not. We are interested in grouping the cars into classifications that have similar characteristics (from which we can then do further analysis).

As a measure between the cars, we will take the value 13 minus 2 times the number of characteristics in which the cars differ. For instance, the first two cars differ in 5 places, so the measure of similarity is 13 - 10 equals 3. The first and ninth car differ in nine places so its similarity measure is -5.

Our goal is to cluster the cars so as to maximize the total value within the clusters. So, for instance, if 1, 2 and 9 are together, the value of that cluster is 3 (for 1 and 2) plus -5 (for 1 and 9) plus -5 (2 and 9 differ in 9 places) for a value of -7 (which is a terrible cluster). One feasible solution is to place each item in a cluster by itself. Such a solution has value 0. Another solution would be to place all of the items in one cluster.

Problem 1. Give a greedy heuristic for solving the clustering problem. Describe your heuristic in sufficient detail that a reader can unambiguously run the heuristic on their own data). This heuristic should add items to clusters one at a time and never take out an item once it is assigned to a cluster. Illustrate your heuristic on the above data. What is the resulting solution value?

Problem 2. Give an exchange heuristic for this problem. Illustrate the heuristic first by starting with the solution found in problem 1. Then illustrate your heuristic by starting with the solution where each item is in a cluster by itself. Finally, illustrate your heuristic starting from the solution where all items are in a single cluster. Note that your exchange heuristic need not find an improving exchange for all three starting points, but it must be applicable to all three.

Problem 3. How would your greedy heuristic and exchange heuristic be modified in each of the following cases:

1. Each cluster can have no more than 5 cars.
2. There must be at least 6 nonempty clusters.
3. There must be exactly 4 nonempty clusters.

A telecommunications company designs systems for client installations. These installations consist of a large number of computers that must be interconnected. One common design is to place the computers in a number of rings. Communication within a ring is very cheap and fast. Communication between rings is expensive. Unfortunately, there are limits on how big a ring can be.

More formally, there are K computers, each with a communications requirement . Between computers i and j, it is necessary to send messages. Computers are placed in a number of rings. If two computers are placed in the same ring, each message that passes from one to the other is free. Messages that are passed between rings cost \$1 each. There is a limit on the amount of traffic on a ring. If S represents the set of computers on a ring, then represents the intensity of the ring. There is generally an upper bound on intensity.

To complicate matters, it is permissable to put a computer on more than one ring, so communication with that computer is free for any computer on any ring the computer is on. It costs \$75 for each ring a computer is on (so if one computer is on three rings, it costs \$225 for that computer alone).

Your goal is to design a heuristic solution technique for finding a minimum cost solution to this problem. To illustrate your techniques, you have been given the following sample data:

The intensity of each ring must be no more than 230.

1. Provide a greedy heuristic for this problem. Illustrate your heuristic on the above sample problem.
2. Provide an improvement heuristic for this problem. Illustrate your heuristic on the above sample problem using your solution in the previous question as the starting solution.

The Twelfth Regional Bank of Topeka, Kansas has asked us to look at their check sorting machine. Any large bank must sort the checks it has cashed and return them to the banks on which they were written. In the interest of profit, it is important that these checks be returned as quickly as possible (as seen in the paper on p. 286 of the text). Checks are sorted using a personal computer based check sorting machine. Such a machine has one input hopper and perhaps twenty output pockets. Before a batch of checks is run through the machine, it may be ``programmed'' in a simple and rapid fashion so as to specify a destination pocket for each check according to its source bank. As each check is processed, the check sorter reads the encoded bank number and directs the check to its proper pocket. Obviously, if there are more than 20 source banks in a batch, then some pockets must contain checks from several banks. A pocket with multiple banks must be run through the sorter again for a second pass. If the second pass has multiple banks in a pocket, those must be run through a third pass, and so on. Clearly, if a significant fraction of the checks in a batch are from one bank, then one pocket should be programmed to receive checks only from that bank. These checks are said to be ``killed'' in that pass.

The Twelfth Regional has collected statistics over the past year and has identified the 25 banks with which it does the most business. The average number of checks processed each day is in table 1.

Table 1: Daily Checks

This bank has a four pocket check sorter which processes 200 checks per minute. The current sorting method assigns 6 banks to each of three pockets and 7 banks to the fourth pocket in the first pass. In the second pass, each of the four pockets is processed in a separate batch with three out of the 6 or 7 banks being ``killed'' during this pass. In the third pass, four batches must again be processed and all remaining banks are ``killed'' in this phase. Since setting up the sorter takes very little time, the total sorting time is roughly proportional to the number of checks which pass through the machine during all of the passes.

Once it is determined how many banks to kill on a pass, it is a simple matter to assign banks to positions: simply make certain that larger banks are killed quicker than smaller banks.

The person currently running the machine admits that this sort pattern might not be the best possible, and that perhaps she should be killing some large banks on the first or second pass. She also points out that if she knows on which pass a bank is killed, it is simple to work out the details of which bank goes into which pocket during all the passes. The only concern is that you cannot kill more banks in a pass than 4 times the number of nonkill pockets in all patches of the previous pass.

1. Derive an improvement heuristic that takes any sorting pattern and, making just a few changes, tries to find a better pattern. Your heuristic must be stated in a general, but precise, form so that it can be applied to any number of banks and to a check sorter with any number of pockets. How does this heuristic work on the sample problem and current solution?
2. Derive a solution construction heuristic that seems likely to give reasonable answers. Some concepts you might want to keep in mind include decomposition, greedy, and/or special cases. You might also want to try to formulate an integer program and use that as a starting point. How does your heuristic work on the sample data?

A consultant would be particularly interested in how this can be applied to much larger problems. For instance, the post office uses a similar machine to sort mail. In that case, there might be 10,000 different codes and 100 output pockets. How well will your method work in that case?