Ariel Kemp - Independent Research - CS490

My goal for this independent research was to find out as much as I could, in 2 credits' time, about emergent processes in computer modeling and bottom-up computation. Both refer to a type of computation in which the final behavior of a program, or system, is not thoroughly specified by the programmer. Rather, the final behavior depends on an (often small) set of simple interactions which are specified, and which usually have little to do with the desired overall behavior. When objects interact as specified, the resultant behavior is usually much more complex and interesting than the specified interactive behavior.

'Emergent processes' refers to complex behavior that arises when very many instances of a certain object interact in often simple ways. A good example of this is the classical motion of the planets (ignoring the negligible forces other than gravity). Even though each particle in the solar system obeys a simple gravitational law (the gravitational force between any two objects is proportional to both their masses and inversely proportional to the square of the distance between them), the result of all those interactions is the orbiting behavior of planets and moons, which is in no way encoded in the interaction between any two particles. That is to say, no interaction has the 'intent' of producing the outcome we perceive, yet that behavior emerges from the system. Other good examples of bottom-up processes include the schooling of fish, the flocking of birds, and fluctuations in economies. Craig Reynolds has an excellent page on "Boids", his exploration of bottom-up processes in the form of flocking birds.
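For concreteness, the pairwise law can be written out in a few lines of Python (a small illustrative sketch; the constant is the standard one, the function name is mine):

```python
# Newton's law of universal gravitation: F = G * m1 * m2 / r**2.
# The orbits of planets and moons 'emerge' from summing these simple
# pairwise interactions -- no single interaction encodes an orbit.
G = 6.674e-11  # gravitational constant, N*m^2/kg^2

def gravitational_force(m1, m2, r):
    """Magnitude of the attraction between two point masses, in newtons."""
    return G * m1 * m2 / r**2

# Example: Earth (5.97e24 kg) and Moon (7.35e22 kg), 3.84e8 m apart.
f = gravitational_force(5.97e24, 7.35e22, 3.84e8)  # roughly 2e20 N
```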

The emergent process I implemented is self-reproduction in patterns on a grid. Many people are familiar with Conway's game of Life, which isn't really a game but rather a simple model of emergent processes. This model consists of a regular grid of square cells, each of which is either 'alive' or 'dead'. Each cell can have either state in any one 'generation', that is, in any one state of the grid as a whole. In each generation, the 'alive' or 'dead' property of each cell depends on how many of its eight grid-neighbors were alive or dead in the previous generation. When this grid is viewed graphically, with 'alive' and 'dead' as separate colors, interesting, regular, and often beautiful patterns arise. The individual cells of the grid are often referred to as cellular automata. See this page for a good running example of the game of Life.
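The update rule of Life can be sketched in a few lines of Python (a minimal illustration, not my actual implementation; cells off the edge of the grid are treated as dead):

```python
# One generation of Conway's Life. Cells are 1 (alive) or 0 (dead).
def life_step(grid):
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbors, skipping the cell itself.
            n = sum(grid[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                    and 0 <= r + dr < rows and 0 <= c + dc < cols)
            # Survival with 2 or 3 live neighbors; birth with exactly 3.
            nxt[r][c] = 1 if (n == 3 or (n == 2 and grid[r][c])) else 0
    return nxt

# A 'blinker': three live cells in a row oscillate between a
# horizontal and a vertical bar, with period two.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
```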

Christopher Langton (editor of the Artificial Life conference proceedings) described a grid of cellular automata whose states and prespecified rules permit the existence of a pattern which reproduces itself in a neighboring area of the grid. The states of cells in this grid, instead of being 'alive' or 'dead', are numbered 0-7 and are visualized as colors. Here is what the first 'seed' generation looks like:

Each cell's state in subsequent generations relies on a set of rules much like the game of Life's, with the exception that only the four compass-direction neighbors' states are examined. The list of rules for which color a cell of a certain color will become in the next generation is rather long, but very simple. An example of one such rule is "if I'm green, and my North neighbor is black, and my South neighbor is blue, and my West neighbor is green, and my East neighbor is purple, then I'll be black in the next generation." It's not difficult to see that with eight states there are 8*8*8*8*8 possible situations, over 32,000. I didn't code all of these in; rather, any situation for which I did not encode a rule produces a cell whose next state is 0 (black). After many generations, the grid is filled with many such patterns:
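The rule-table approach can be sketched as follows (an illustrative Python version with made-up example rules, not the actual table; as in my implementation, any situation not listed defaults to state 0, and here off-grid neighbors are also taken as 0):

```python
# Each rule maps (self, North, South, West, East) -- states 0-7 --
# to the cell's state in the next generation.
RULES = {
    # e.g. "green (2), N black, S blue, W green, E purple -> black"
    (2, 0, 3, 2, 5): 0,
    (0, 0, 0, 0, 1): 1,
}

def next_state(grid, r, c):
    rows, cols = len(grid), len(grid[0])
    key = (grid[r][c],
           grid[r - 1][c] if r > 0 else 0,         # North
           grid[r + 1][c] if r < rows - 1 else 0,  # South
           grid[r][c - 1] if c > 0 else 0,         # West
           grid[r][c + 1] if c < cols - 1 else 0)  # East
    return RULES.get(key, 0)  # unencoded situations produce state 0
```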
The rules may thus be thought of as the 'physics' underlying the system that supports reproduction. Even though the initial pattern capable of reproduction was hard-coded, it isn't unreasonable to expect that, given a large enough board and a random initial pattern, such structures could emerge on their own. In fact, some attention from computation theory has turned to such grids: it has been proven that Conway's game of Life is a universal Turing machine (it can do anything regarded as 'computationally possible'). Von Neumann spent a lot of time proving that the actions of life, including reproduction, are completely encompassed by computationally possible processes, and himself devised a grid on which patterns could self-reproduce, although the design of that grid is rather ungainly, requiring 29 states and a reproducing structure that can create any pattern at all at an adjacent part of the grid (including itself). (Interestingly, von Neumann gave this pattern a description separate from its structure on the grid, on which it would rely to perform the building of a pattern, before biology discovered the genotype/phenotype distinction, which serves the same purpose.)

I used the Microsoft Foundation Classes (MFC) to do the windowing; the computation itself had to be performed in a separate thread to be free of dependence on Windows messages. It was quite a chore working through MFC. You can download the executable, code, and MSDev workspace to check it out.

The second topic I set out to explore is bottom-up processes. In this type of computation, the programmer sets out to solve a problem using a method that does not attempt to embody what the programmer feels he/she knows about the problem, but instead works through the interaction of prespecified, purely local processes. The problem I chose is the recognition of handwritten digits, and the bottom-up process I chose is a neural network. It would be difficult to enter into an introductory discussion of neural networks here; I would refer the interested reader to Judith Dayhoff's book, Neural Network Architectures: An Introduction.

In addition, I attempted to use a genetic algorithm in conjunction with the neural network. Genetic algorithms concern themselves with optimizing the value of certain parameters, through a competitive search through parameter-space, when an exhaustive search is impossible because of the size of the space. An implementation of a genetic algorithm instantiates several candidate "solutions," or points in the parameter-space. I will refer to these instances (rather standardly) as "individuals." The parameters can be referred to as a "genotype," completely describing the individual as far as the implementation is concerned. Each individual in the population is assigned a certain fitness, which corresponds to how well the parameters embodied by the individual's genotype optimize the desired value. The performance of the parameters encoded in the genotype can be thought of as the individual's phenotype. After the fitness of every individual (every member of the "population") is evaluated, some individuals are selected and others are not (according to a prescribed scheme), paralleling natural selection in evolutionary theory. Using the selected individuals, a new generation of individuals (a new population) is created, again using one of a variety of schemes. Some individuals may have their genotype mutated.
Then the cycle begins again with evaluating fitnesses, and continues until an individual of sufficient fitness is found. The parameters embodied in that individual's genotype are considered a solution to the posed optimization problem. For a more thorough introduction to genetic algorithms, see Melanie Mitchell's book, An Introduction to Genetic Algorithms.
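The evaluate-select-reproduce-mutate cycle can be sketched as follows (an illustrative Python sketch, not my actual code; the fitness function, genotype length, population size, and rates are all placeholders, and crossover here is single-point for brevity):

```python
import random

def evolve(fitness, genome_len, pop_size=50, generations=100,
           mutation_rate=0.01):
    # Start from a random population of bit-string genotypes.
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness; better-ranked individuals are likelier parents.
        pop.sort(key=fitness, reverse=True)
        weights = [pop_size - i for i in range(pop_size)]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]  # single-point crossover
            # Occasionally flip a bit (mutation of the genotype).
            child = [g ^ 1 if random.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy problem: maximize the number of 1s in a 20-bit genotype.
best = evolve(sum, 20)
```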

The neural network I implemented (at first, without a genetic algorithm) uses digitized images of handwritten digits as input. Collecting this data was rather laborious and required a totally separate program, which I had to write to assist me in the task. Since I could not find a ready source of such data, I went around campus asking various people to fill in a sheet of paper with digits. The following is the top half of such a sheet (the colors are reversed):

Each of the 77 sheets I collected contained 144 samples, making for a total of over 11,000 sample data images. These images had to be turned into a form useful as input to the neural net. I wrote a program to take each sheet image, align it properly using primitive vision techniques, and extract the image of each handwritten digit as well as the digit the writers were asked to match. Since I knew the approximate location of each box on the paper in terms of digitized pixels (by inspecting several such sheets), as well as the size of a box in pixels (about 39x39), the program first locates each box. Next, the best 34x34 sub-box is searched for, 'best' meaning the one containing the fewest pixels that are 'on' (white in the preceding image); this keeps the edges of the box (which aren't part of the handwritten digit) out of the input to the net. Then, within that 34x34 box, the smallest box bounding the 'on' pixels is found. This eliminates differences in size between different samples. Finally, that bounding box is re-sampled (as opposed to sub-sampled) to create an image the same size as the input layer of the net, which has 144 neurons (a 12x12 image). By re-sampling I mean that each pixel in the 12x12 image is turned on or off by thresholding a weighted average of the several source pixels it covers, each source pixel weighted by how much of it corresponds to the 12x12 image pixel in consideration. This avoids the nasty property of sub-sampling, which drops pixels entirely if they happen not to be sampled, even if they are important to the image. The result of those efforts produced the following images:
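The re-sampling step can be sketched as follows (an illustrative Python version, not my actual program; images are lists of 0/1 rows, and the threshold value is a placeholder):

```python
# Area-averaged re-sampling: each target pixel is a weighted average
# of every source pixel it overlaps (weighted by overlap area), then
# thresholded. Unlike sub-sampling, no source pixel is ever skipped.
def resample(src, out_w, out_h, threshold=0.5):
    src_h, src_w = len(src), len(src[0])
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Source-space rectangle covered by this target pixel.
            x0, x1 = x * src_w / out_w, (x + 1) * src_w / out_w
            y0, y1 = y * src_h / out_h, (y + 1) * src_h / out_h
            total = area = 0.0
            for sy in range(int(y0), min(int(y1) + 1, src_h)):
                for sx in range(int(x0), min(int(x1) + 1, src_w)):
                    # Overlap of source pixel (sx, sy) with the rectangle.
                    w = min(sx + 1, x1) - max(sx, x0)
                    h = min(sy + 1, y1) - max(sy, y0)
                    if w > 0 and h > 0:
                        total += w * h * src[sy][sx]
                        area += w * h
            out[y][x] = 1 if total / area >= threshold else 0
    return out
```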
Then all the images were encoded as a series of two-digit hex numbers followed by a 4-digit binary number representing the expected value that the net should output (0-9). A text file containing all that information is used as the input to the neural network program. The neural network itself uses back-propagation to train the weights between its layers, of which there are three, having 144, 144, and 4 neurons, respectively: 144 corresponds to the number of pixels in the input "image", and 4 corresponds to the "answer" the net provides, the four-bit representation of the digit to be recognized. I also added a graphical view of the neurons' levels of activation as well as of the values of the weights between the layers (this time using an easy-to-implement DirectX library to handle the windowing):

The windows, from left to right, visualize: the input layer; the weights between the input and the hidden layer; the hidden layer; the weights between the hidden layer and the output layer; the output layer.
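The 144-144-4 structure amounts to two weight matrices and a sigmoid squashing at each layer; a minimal forward-pass sketch in Python (with random, untrained weights, purely for illustration) looks like this:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_ih, w_ho):
    # Each hidden neuron squashes a weighted sum of all inputs;
    # each output neuron squashes a weighted sum of all hidden values.
    hidden = [sigmoid(sum(i * w for i, w in zip(inputs, col)))
              for col in w_ih]
    return [sigmoid(sum(h * w for h, w in zip(hidden, col)))
            for col in w_ho]

# 144 inputs (the 12x12 image), 144 hidden neurons, 4 outputs
# (the binary-coded digit). Weights here are random placeholders.
w_ih = [[random.uniform(-1, 1) for _ in range(144)] for _ in range(144)]
w_ho = [[random.uniform(-1, 1) for _ in range(144)] for _ in range(4)]
image = [random.randint(0, 1) for _ in range(144)]
out = forward(image, w_ih, w_ho)  # four activations in (0, 1)
```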

Unfortunately, I could not get the neural net to train properly. It would always converge very early (after 2-3 training examples) on a set of weights that was obviously wrong. I suspect that the failure to train is due to the values I chose for some of the back-propagation parameters, including the learning rate, momentum, and temperature. All three interact in a non-linear way, and I had no precedent from which to establish good values.

I had hoped that a genetic algorithm would alleviate that problem, in that it does not rely on back-propagation (or its associated parameters), but rather treats the weights between the layers as the genotype, the parameters to be optimized. I wrote code that instantiates a population of 50 individuals (50 neural nets) whose weights are initially random. Each is evaluated a number of times (the net is run to produce output) to determine its fitness. The individuals are then ranked according to fitness (using quicksort) and chosen to be a 'parent' of the next generation with a probability corresponding to rank. Some low-ranked individuals are always selected to increase diversity, a known way to avoid premature convergence in a genetic algorithm. Then, for each member of the new generation (the population is fixed in size), parents are paired at random and two-point crossover is performed. Crossover is the means by which a descendant individual combines the genotypes of both parents. Two-point crossover chooses two points in the genotype (considered linearly); the descendant acquires the parameters of the first parent before the first crossover point, the parameters of the second parent between the two points, and the parameters of the first parent after the second point. This ensures that parameters residing near the ends of the genotype do not always travel together; that is, they are just as likely to be crossed over as parameters in any other location. Two-point crossover is performed twice for each descendant, once for the weights between the first and second layers, and once for the weights between the second and output layers. Unfortunately, I did not have enough time to finish the code, and although I believe it to be complete, it does not compile as it now stands.
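Two-point crossover itself can be sketched in a few lines of Python (illustrative only; in my code it would operate on each of the two weight arrays in turn):

```python
import random

def two_point_crossover(parent1, parent2):
    # Pick two distinct cut points; the middle segment is non-empty.
    n = len(parent1)
    a, b = sorted(random.sample(range(1, n), 2))
    # First parent's genes outside the cut points, second's between them.
    return parent1[:a] + parent2[a:b] + parent1[b:]

child = two_point_crossover(list(range(10)), [0] * 10)
```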

You can get all the code, sample data, and MSDev workspace info.

Or maybe you prefer to just get my data. The only comparable thing I've seen would have cost me over $900 over the internet, so you lucked out! At least mail me so I know my data is being put to use.