The following is from an interview with the distinguished scientist and engineer Michael Arbib, professor of computer science, neurobiology, biomedical engineering, electrical engineering, and psychology at the University of Southern California, as well as director of the USC Brain Project at that institution. It appears in the new book TALKING NETS: An Oral History of Neural Networks, edited by James A. Anderson and Edward Rosenfeld, and is presented here with the permission of the publisher, MIT Press.
In this lively, colorful, and fascinating excerpt, Australian-born Michael Arbib reminisces about his early encounters with some of the greatest figures in the history of the various disciplines in which he himself has achieved so much.
Australians live at home, usually, when they go to University, so this was the first time I had been away from home except for an occasional journey. I remember traveling halfway around the world, leaving the summer of Australia and arriving in one of the bitterest winters in living memory in Cambridge. I visited relatives in New York, took the bus up to Boston, and then took a cab to MIT. We went along Storrow Drive, and there was the totally frozen Charles River, with this wonderful view of MIT on the other side. I remember just looking out of the cab thinking, "My God, you'd better be worth it." But it was, it was.
A friend who'd preceded me by a year or two to study chemistry at Harvard helped me find an apartment on Massachusetts Avenue. The first thing I did when I arrived was to visit the mathematics department, where I met Gian Carlo Rota and Ken Hoffman, the first two mathematicians I encountered. Somewhere along the line I went into the main lobby just off Mass. Avenue and phoned McCulloch, with whom I'd had no previous contact. He was very welcoming and told me to come right over. I was sure I had made a mistake because his voice sounded so young, but when I got there, there he was with his famous white beard. Some time later he explained how he had once been in a southern town and a little boy had come up to him and looked up and said, "Are you de Lord?" I can't remember whether he said yes or no.
As an undergraduate, for my fourth year paper, my senior thesis as you'd call it here, I had written what later became a paper published in the Journal of the ACM in 1961, called "Finite Automata, Turing Machines, and Neural Networks." That was my first publication in neural networks.
In my first term at MIT I was a TA and was disappointed that a TA didn't teach; all I got to do was grade homework in linear algebra. I remember being amazed at how bad students at MIT were. I had had the fantasy -- well, the fear, really -- that I would go from being one of the top students in Australia to being at the bottom of the pack because here was MIT with the crème de la crème. Once I started grading undergraduate linear algebra, all my fears were destroyed!
McCulloch adopted me, and I became an RA in his group. Those were the good old days when there was lots of money around so that being an RA was not particularly onerous. Basically, the Navy and other agencies gave lots of money to MIT and MIT funneled it to various people, and Warren was one of the good guys, so he had quite a lot of money to support bright young students.
The big thing that Warren McCulloch was worried about at that time was reliability: how is it that neural networks can still function although we know there are lots of perturbations? His favorite story on this line was a midnight call from John von Neumann from Princeton saying, "Warren, I've drunk a whole bottle of absinthe, and I know the thresholds of all my neurons are shot to hell. How is it I can still think?" That was the motivating problem.
That's what Jack Cowan was working on at that time with Shmuel (Sam) Winograd, who later became a very high level person at IBM. What McCulloch had done was to handcraft small networks, making little Venn diagrams and then showing how, as the threshold shifted, if you designed the network right, you could have reasonable insensitivity to the change in threshold and still get the network to perform pretty much as advertised. Jack and Sam took a different approach, where they took Shannon's theory of reliable communication in the presence of noise and said, "What if we think of the neurons as doing a process of computing, rather than coding, and we try to make the redundancy in the network fit in with Shannon's ideas?" Their book, Reliable Computation in the Presence of Noise [1963], came out a year or so later.
I think that the most influential thing in McCulloch's group for me at that time was his partnership with a guy named Bill Kilmer, who I think had come in from Michigan, later went to Montana, and eventually joined me at the University of Massachusetts. Bill was working with Warren on the idea of the reticular formation as a mode selector. Warren had been influenced by the idea that the reticular formation is involved in switching the overall organism between sleep and wakefulness. These were ideas from Magoun on the waking brain, and Warren had extended that to the idea that there were various modes of behavior.
There's a joke in neuroscience, which is due to Karl Pribram, about the limbic system being responsible for the four Fs: Feeding, Fighting, Fleeing, and Reproduction. It was Warren's idea to extend the sleep-wakefulness idea so that perhaps the reticular formation was responsible for switching the overall state of the organism. There would be one part of the brain that would say, well, is this a feeding situation or a fleeing or whatever situation, and then the rest of the brain, when set into this mode, could do the more detailed computations. Mode switching.
The other part of the equation came from Arnie and Madge Scheibel, who were a husband and wife anatomy team at UCLA. Arnie is still alive and well, but his wife died many years ago. They had done some lovely studies of the reticular formation and had observed that the anatomy was such that the dendritic trees ran roughly parallel to each other and orthogonal to the fibers running up and down the axis of the reticular formation. They had suggested a poker chip analogy -- that you could replace all the detail by a stack of poker chips, where there'd be a lot of cells in each chip, but because of the way the dendrites were placed, they would have roughly homogeneous input and output. This suggested to Warren and Bill Kilmer the idea of modeling the reticular formation as a stack of modules corresponding to the anatomical poker chips. Each one would have a slightly different selection of input, but each would be trying to make up its mind as to which mode to go into. They would communicate back and forth, competing and cooperating until finally they reached a consensus on the basis of their divergent input and that would switch the mode of organization.
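The flavor of that scheme can be suggested with a small sketch in Python. It is not Kilmer and McCulloch's actual model, and the mode names, coupling constant, and agreement threshold below are purely illustrative: a few modules each weigh their own evidence for a handful of modes, repeatedly blend their preferences with the group's (cooperation), and let the strongest mode within each module cast its vote (competition) until enough modules agree.

```python
# A minimal, hypothetical sketch of mode selection by competition and cooperation
# among modules; it is not Kilmer and McCulloch's actual model, and all names and
# parameters here are illustrative.

MODES = ["feed", "fight", "flee", "sleep"]

def consensus(local_evidence, coupling=0.5, agreement=0.75, max_rounds=50):
    """local_evidence: one weight per mode for each module (its private input)."""
    # each module turns its evidence into a preference distribution over modes
    prefs = [[e / sum(ev) for e in ev] for ev in local_evidence]
    for _ in range(max_rounds):
        # cooperation: every module pulls its preferences toward the group average
        mean = [sum(p[i] for p in prefs) / len(prefs) for i in range(len(MODES))]
        prefs = [[(1 - coupling) * p[i] + coupling * mean[i]
                  for i in range(len(MODES))] for p in prefs]
        # competition: within each module, the strongest mode wins its vote
        votes = [p.index(max(p)) for p in prefs]
        winner = max(set(votes), key=votes.count)
        if votes.count(winner) / len(votes) >= agreement:   # consensus reached
            return MODES[winner]
    return MODES[winner]    # best guess if no consensus within max_rounds

# three modules see slightly different input; two lean "flee", one leans "feed"
evidence = [[1, 2, 4, 1], [2, 1, 5, 1], [4, 1, 2, 1]]
print(consensus(evidence))   # expected: "flee"
```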
In looking back, I think that the ideas in that paper were a tremendous influence on me because they said two things. One was that if you're going to study very complicated neural networks, you shouldn't do it all at one level, that you need an intermediate level, in this case their modules. This later became the basis for my theory of schemas, where I replace the anatomical modules by functional schemas, but the idea was that you need a high-level language in which to explain the functional interactions rather than mapping everything immediately down onto the neural net. And the second thing was the notion of competition and cooperation.
At that time there were two ways of thinking about neural networks that I was aware of. One was the stuff that I'd done my first paper on -- namely, the fact that you could build any finite automaton using a neural network. You could express the state of the system as the firing of the neurons in it, and then the input together with the state would determine the next state. You could set up the wiring in such a way as to represent any finite-state transition you cared to look at. Then of course if you added it to a control box to run a tape, you had a Turing machine and universal computation. That was the result that really went back to the '43 McCulloch-Pitts paper but written there in an unintelligible way.
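As an illustration of that construction, here is a minimal sketch in Python, not taken from the interview or from the 1943 paper: a two-state automaton that tracks the parity of a binary input stream, with the state carried by the firing of a single threshold neuron and the next firing determined by the input together with the current firing. The unit and function names are illustrative only.

```python
# A minimal sketch (not from the interview or the 1943 paper) of the idea that a
# network of McCulloch-Pitts threshold units can realize a finite automaton.
# Here the firing of one "state" neuron carries the state of a parity automaton.

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fires (1) iff the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def parity_automaton(bits):
    """Track the parity of 1s seen so far; the state neuron's firing is the state."""
    state = 0                                   # silent state neuron = even parity
    for x in bits:
        # layer 1: detectors for (state AND NOT x) and (x AND NOT state)
        a = mp_neuron([state, x], [1, -1], 1)
        b = mp_neuron([x, state], [1, -1], 1)
        # layer 2: OR of the detectors gives the next state, i.e. state XOR x
        state = mp_neuron([a, b], [1, 1], 1)
    return state

print(parity_automaton([1, 0, 1, 1]))   # 1: odd number of ones
print(parity_automaton([1, 1, 0, 0]))   # 0: even number of ones
```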
In fact, one of the first things I did when I got to MIT was go to see Walter Pitts, to go over with him the '43 paper because there were some obscurities in the logic. Pitts had adopted the logical notation of Rudolf Carnap and had written an almost impenetrable paper, so in many cases I had to rederive the results rather than follow their proofs. I wanted to check with Pitts that I had got it right. It was a terrible meeting. We got about two sentences into the conversation, and Pitts started shaking and wouldn't stop, so I had to leave. It turned out he was already far gone into the DTs.
McCulloch's story was that Walter Pitts, as a fourteen-year-old, had been about to be forced by his family -- which was very poor at the time, early in World War II -- to leave school and go and work to raise money for the family. By chance, he was sitting on a park bench when he got into conversation with an elderly man, and fortunately for him the elderly man was Bertrand Russell, who introduced him to Carnap. Carnap knew that Warren McCulloch, who was then in Chicago, was interested in making a logical theory of the brain and brought the two together, and that's what led to the classic McCulloch and Pitts partnership. That led to a long period in which Pitts, who was an ugly but very bright person, became sort of an adopted son of the McCullochs. Unfortunately, he was terribly insecure and wasn't prepared to be loved for his brilliance; he wanted to be loved for his looks, and he had no looks. It was a very strange relationship, I think, where Pitts was the child and yet, in some ways, intellectually the more powerful of the pair, though McCulloch knew an incredible amount about the brain and had been a very successful anatomist and still was at that time. Apparently, because of all these different psychological pressures, Pitts eventually went the way of drink. I think for many years Lettvin became essentially his guardian and managed to have him maintain a research position at MIT even though he was long past being a brilliant achiever.
As I was saying, there were two views about neural nets at the start of the '60s. One was that you could build any finite automaton and the other was the beginning of learning theory. That time was pretty early. Basically, we had the perceptron from Rosenblatt, there was some work from Taylor in England, and a few other beginnings. We were just making the transition into thinking about what has become the sine qua non for most people of neural nets today -- learning theory. What was missing in the two conceptions -- (a) you could do anything, and (b) you could learn how to do it -- was the notion that you should think of a more complicated system in which there were subsystems interacting. Those subsystems would not necessarily be doing the same thing, each with a well-assigned job; each could be competing. Each might have part of the truth, and then some process of interaction was required, so then there's competition and cooperation. The notion of the multilevel view was, I think, the biggest lesson I got from my time with McCulloch.
I also met Norbert Wiener very early in the piece and ended up as his Ph.D. student, but it turned out to my disappointment that Wiener was not very interested in cybernetics anymore and was devoting himself to statistical mechanics. But, since he was the great founder of cybernetics and I really wanted to understand how his mind worked, I signed up to do a thesis in his area. Then Norbert Wiener went on sabbatical, and during about six months I only got one letter from him. It was a charming letter, which I've kept, saying that he'd visited Cordoba in Spain where he had paid homage to Moses Maimonides, the great medieval Jewish philosopher, whom he claimed as an ancestor, but this really wasn't advancing me very much in the field of statistical mechanics. I turned to Henry P. McKean Jr., who was a superb probability theorist with some interest in statistical mechanics, and transferred to him. After a while we decided this was not the right subject, and I ended up putting in a thesis proposal for fractional integration, which ties in very much with the current mathematics of fractals. The idea was that in a lot of applied mathematics there was a great interest in white-noise-driven processes. McKean and Kiyoshi Ito had written a lot on an approach to stochastic integration, so I proposed to generalize it to a much broader class of stochastic processes. But, unfortunately, I succeeded too well and found a very simple way of doing it. The day before my Ph.D. defense McKean asked me to take a walk. I knew what he was going to say, but cruelly enough I let him go ahead and suffer through saying it -- namely, please write another Ph.D. thesis. So I said OK.
This may have had a very big impact on my orientation as a scientist because I had already been accepted for a full summer course at the RAND Corporation, run by Allen Newell and his colleagues, on his approach to artificial intelligence [AI]. Had I spent the summer at RAND I might well have ended up with a much more conventional AI career than I have had. Instead, I went off from Boston to rural New England because McKean lived in a small village north of Hanover, New Hampshire. Hanover was where Dartmouth College sat, and since that was the nearest mathematics library, that's where I spent the summer, driving up to McKean's house once a week past the little red school house and through the covered bridge, and turning left at Waldo Peterson's place -- and so grew to love New England. I managed over that summer to write another thesis and finish. The one sad story about that was that when I went to pick up my diploma in September of '63, I discovered that MIT had written the title of the thesis on the diploma itself, but they had the title of the rejected thesis, not the actual thesis. What I regret is that at that time I made them change the title. I wish now, of course, that I had the amusing diploma rather than the correct diploma, but never mind.
Actually, a lot happened in the two and a half years that I was at MIT, besides the Ph.D., which was just a small part of it. There was a lot of involvement with McCulloch's group. One of the interesting things about the involvement was that McCulloch had told me not to tell Wiener about it.
Wiener had been a child prodigy and had to the end of his days retained many of the marks of his childhood as a prodigy. He was in many ways insecure, would need a lot of praise, and was in no way a judge of human character. He had published two books called Ex-Prodigy and I Am a Mathematician. Cruel people said they should be called Ex-Mathematician and I Am a Prodigy, but, in fact, he was a very great mathematician until the end of his days. In this century the American Mathematical Society has only twice published memorial issues of its "Bulletin," one to honor John von Neumann and the other for Norbert Wiener. I think it's very interesting that they did that because these are both men who helped found the study of cybernetics and neural networks. They're also men whose work spanned from applications to very deep pure mathematics. Wiener was a great man, but perhaps a defective human being.
McCulloch had been a very strong neuroanatomist and the work he did with Pitts and the later work on the reliability problem showed his lifelong devotion to trying to see how to bring the methods of logic to bear in the appropriate way to probe the nervous system. On the other hand, he was a romantic, and he would rather tell a good story than be totally shackled by all the facts.
I remember once a plane ride with Jack Eccles, the Australian Nobel Laureate in neurophysiology. We both had been at a meeting in Boston. At that time I was living in Stanford, he was living in Chicago, and so we flew as far as Chicago together. He was really anti-McCulloch because of McCulloch's somewhat romantic way of handling the facts. What I pointed out to him was that most of us who worked with McCulloch had enough sense to accept the inspiration of his ideas but knew that we then had to do the hard work of finding out which of his ideas were supported by the literature and which weren't. In this way, a lot of young people had really gained a great deal of insight into the nervous system and a great deal of inspiration for their careers from Warren. The other thing about Warren was at that time he was drinking a lot. But where it caused the DTs in poor Pitts, for Warren a bottle or so of Scotch was just the key to loquaciousness. For people like myself who saw a lot of him, it was a bit of a pain because the same stories would come out again and again, but for people who were seeing him for the first time, it was always extremely stimulating and motivating.
I finished my Ph.D. and was going around MIT paying my farewell respects and found myself in Norbert's office for the last time. So we're chatting, and he says, "What else have you been doing while you're here?" and I think to myself, "Oh, it won't hurt to tell him," and I said, "Well, I've been working with McCulloch," and immediately Wiener went into an apoplectic fit and said, "Why, that man, that wretched man, why, if I had the money I'd buy him a case of whiskey so he could drink himself to death." Wouff!! So I spent the next fifteen minutes trying to be loyal to McCulloch while soothing Wiener.
I discussed this reaction with a number of colleagues. The best explanation I have -- I have no independent confirmation of it, but it rings so true with the characters of the protagonists that I refuse not to believe it -- came from Pat Wall, an expert on the neurophysiology of the pain system. Many years before, in the '50s, buoyed by the success of his book on cybernetics, Norbert had decided to develop "the" theory of the brain. So he had gone to Warren and said, "Warren, tell me all about the brain, and what the open problems are," and Warren had told him. But, of course, Warren had told a somewhat romantic story, and Norbert, being no judge of human character, had not understood this and took it all as a totally objective presentation of the state of play. He had then spent two years developing a theory to explain all these "facts," and when he presented the theory at a physiological congress, he was howled down. Instead of realizing what the situation was, he thought McCulloch had deliberately set him up, thus robbing him of two years of his life and his chance to establish a great theory. This is Pat Wall's explanation. I must say it jibes so well with the character of both men that I'm prepared to believe it.
© Copyright MIT Press, all rights reserved. For more information on the book, see http://mitpress.mit.edu/book-home.tcl?isbn=0262511118. Other distinguished researchers interviewed for the book include Jerome Y. Lettvin, Walter J. Freeman, Bernard Widrow, Leon N. Cooper, Jack D. Cowan, Carver Mead, Teuvo Kohonen, Stephen Grossberg, Gail Carpenter, James A. Anderson, David E. Rumelhart, Robert Hecht-Nielsen, Terrence J. Sejnowski, Paul J. Werbos, Geoffrey E. Hinton, and Bart Kosko.