The defining question of geek culture before the .com boom was, ‘What computer did you program on first?’ Before Microsoft became ubiquitous, there was a period when many different systems, all incompatible, became available within the price range of the average family. Brian Graber worked on his dad’s IBM PC, Cristin Herlihy had an Apple II, and the O’Neils had an Atari computer (they had the game console, too). My cousins from New York lent us a Commodore VIC-20 with a two-volume teach-yourself-BASIC set. My cousin Christopher came to visit for a week and ended up staying for the summer. I read aloud from the books and he typed. By the end of the summer, we were able to program our own text-based adventure game.
Even more impressive, we could perform such amazing feats as turning the background and foreground colors to black, making text entry difficult. This minor bit of wizardry was performed using the arcane command poke. The format was poke memory address, value. It allowed you to program at an incredibly simple level. Note I said simple, not easy. You could set any memory address on the machine to any value you wanted. Once you knew which memory location controlled the text color, or the background, you could produce magic.
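In modern terms, poke is just a one-byte write to an absolute address. Here is a minimal C sketch of the idea, with the machine’s address space simulated by an array; the register number and value are illustrative stand-ins, not a claim about the real VIC-20 memory map.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the machine's 64 KB address space. */
    static uint8_t memory[65536];

    /* POKE addr, value: write one byte to one absolute address. */
    static void poke(uint16_t addr, uint8_t value)
    {
        memory[addr] = value;
    }

    int main(void)
    {
        int screen_color_reg = 36879; /* illustrative register number */
        poke(screen_color_reg, 8);    /* a value that, on the real machine, might blank the display */
        printf("location %d now holds %d\n", screen_color_reg, memory[screen_color_reg]);
        return 0;
    }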
The VIC-20 returned to New York at the end of the summer, but the holidays brought along a Commodore 64 and a subscription to Compute!’s Gazette. A month or two later, I talked my mom into subscribing to the disk that accompanied the magazine. Now, you may accuse me of being lazy, but most of the programs they released were nothing more than a long string of poke instructions to be typed in. They even released a checksum program, to make sure that the numbers added up to the expected values, but I never got the Canyon Crawler program to run correctly. The Gazette, in addition to a word processing program and a slew of video games, published two tools that were very instructive. One was a font editor, and the other a sprite graphics editor. With these simple tools, you could make video games that were arcade quality (1985 arcade quality, that is). My first video game was a spy game, where you had to parachute down between two roving searchlights. If either touched you, you fell to your doom. Programming this required the other most arcane of instructions, peek. Peek told you the value of a memory location. Armed with the peek command and the address of the joystick port, I could move the parachute left and right while it drifted ever downward.
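Peek is the read half of the same trick. The sketch below, again against a simulated address space, reads a pretend joystick port and nudges the parachute accordingly; the port address and bit meanings are assumptions for illustration, not the real Commodore layout.

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t memory[65536];         /* simulated address space, as before */

    /* PEEK addr: read one byte from one absolute address. */
    static uint8_t peek(uint16_t addr)
    {
        return memory[addr];
    }

    int main(void)
    {
        const uint16_t joystick_port = 56320; /* illustrative port address */
        int x = 20;                           /* the parachute's column on screen */

        memory[joystick_port] = 0x04;         /* pretend the stick is pushed left */

        uint8_t bits = peek(joystick_port);
        if (bits & 0x04) x--;                 /* drift left  */
        if (bits & 0x08) x++;                 /* drift right */

        printf("parachute now at column %d\n", x);
        return 0;
    }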
In retrospect I should have stayed with the parachute idea. On the next screen you might have had to parachute onto a moving boat, or a bouncing trampoline, or perhaps avoid a flock of geese. However, I wanted to make a game that scrolled. I had a vague idea that maybe I could reset the CPU to look at any memory location for its character map; coupled with a really cool font set, you could wander through a maze of buildings looking to steal secret codes. What I didn’t know was that this type of machine was based on memory-mapped I/O. Certain fixed memory locations were actually just connections to other chips, or to input and output devices. There was no way to change where the CPU looked for the character map, as that was fixed by the underlying electronics.
I was frustrated by the limitations of BASIC. I wanted to know what all those peeks and pokes were doing. Once I started reading about assembly language programming, I realized that the coders at Compute were distributing not source code, as they would for a program written in BASIC, but a sort of executable. The C64 only knew how to load and run BASIC programs. These long listings of pokes were actually copying instructions into memory. Not just color codes for the background or bitmaps for sprites, but instructions like, ‘Load the value from this memory location into the X register.’ I had no idea what a register was, but still, this was pretty cool. The only problem was that I never found an assembler for the Commodore, so my hacking was limited to converting instructions into numeric codes and loading them in by hand; my learning stayed mostly theoretical.
I mentioned Cristin Herlihy had an Apple II. This became significant during my senior year of high school, when I took a structured programming course in Pascal. I spent long hours over at the Herlihys’ debugging programs that did simple text-based operations. The cool thing about Pascal over both BASIC and assembly was, get this, you didn’t need line numbers. GOTO, the standby command of BASIC programming, was forbidden by our teacher. I had learned subroutines and looping before, but now you got to call everything by a friendly name like ‘do_something’ instead of the cryptic GOSUB 65000. Also, we had floating-point numbers. But where were the graphics? I never learned that, as it wasn’t on the AP exam. Programming became more practical, but more removed from the reality of the underlying hardware. It must have been a good course, though: I managed to get a 5 out of 5 on the Advanced Placement test.
After toying with the idea of going into music (I was a fairly serious jazz saxophone player in high school), I ended up going to the opposite extreme: the United States Military Academy at West Point, or, as I tend to call it, Uncle Sam’s school for delinquent children. The 5 on the AP test got me out of the first two levels of computer science and into Data Structures and Algorithms. Now instead of working with floats and strings, we were working with linked lists, arrays, stacks, and heaps. We learned how to sort and search, but more importantly, we learned how to analyze algorithms. I took the standard set of courses: language theory, numerical analysis, discrete mathematics, operating systems, software engineering, and so forth. Having opted out of the first two classes opened up more electives in the latter part of the program. I got to take compilers, graphics, artificial intelligence, and databases. I was well armed to enter the workforce as a programmer.
Except that I entered the Army as an Infantry officer. For the next several years my interaction with a computer was primarily via Calendar Creator and Microsoft Office. One time, I needed to copy a file from one computer to another, and it was too big to fit on a single floppy, so I wrote a short Pascal program that cut the file in half, byte by byte, and another that put it back together. I eventually got an America Online account, as I hadn’t had email since graduation. Information systems at the lowest levels of the Army were still based on the time-honored tradition of filling out a form and putting it in the inbox. The primitive systems worked, to a point. I learned what it really meant to be an end user. Using the applications at our disposal, we built better systems, planning training and tracking soldiers’ administrative needs in home-built systems. We did unspeakable things with Excel spreadsheets and PowerPoint presentations. Division headquarters had a scanner, and I showed our operations officer how we could scan in the maps and draw operational graphics on them electronically.
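The splitter was nothing fancy. Here is a rough reconstruction of the idea in C (the original was Pascal, and the file names below are placeholders): copy the first half of the file into one floppy-sized piece and the rest into another, then concatenate the pieces on the far side.

    #include <stdio.h>
    #include <stdlib.h>

    /* Copy up to count bytes from in to out, one byte at a time. */
    static void copy_bytes(FILE *in, FILE *out, long count)
    {
        int c;
        while (count-- > 0 && (c = fgetc(in)) != EOF)
            fputc(c, out);
    }

    int main(void)
    {
        FILE *in = fopen("bigfile.dat", "rb");
        if (!in) { perror("bigfile.dat"); return EXIT_FAILURE; }

        fseek(in, 0, SEEK_END);
        long size = ftell(in);              /* total length of the file */
        rewind(in);

        FILE *a = fopen("half1.dat", "wb");
        FILE *b = fopen("half2.dat", "wb");
        if (!a || !b) { perror("output"); return EXIT_FAILURE; }

        copy_bytes(in, a, size / 2);        /* first half to one floppy... */
        copy_bytes(in, b, size - size / 2); /* ...the rest to the other */

        fclose(a); fclose(b); fclose(in);
        return 0;
        /* Rejoining is the reverse: copy half1.dat, then half2.dat, into one file. */
    }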
My first job out of the Army was at Walker Interactive Systems, a company that built accounting software that ran on IBM mainframes. The group I worked in built applications that ran on Windows machines and executed their transactions on the mainframe. My team supported the infrastructure that made communication between the two worlds possible. The mainframe stored its letters using a mapping called EBCDIC; the Windows machines used ASCII. Even more confusing was the way the two systems stored numbers. Back on the Commodore 64, I only had to worry about a single byte of data. But systems had grown so that a number was stored across four bytes. For historical reasons, the Intel-based Windows machines stored the least significant part of the number in the first byte and the most significant part in the last byte. IBM chose to store it the other way around. To avoid having to deal with these problems in the buffers we were sending, the architects had decided that all numbers would be sent in their string representations. While we might send a positive or negative sign, we never sent decimal points. A certain field was just defined as ten digits long, with the decimal point assumed to fall between the eighth and ninth digits. Dates had four different formats: Julian, year-month-day, day-month-year, and that barbaric American format, month-day-year. The system was designed so that we would package up a large amount of data, write it into a buffer, and send it across the network to the mainframe. The mainframe would plug and chug and send back the data in another buffer. This type of transaction mapped well to another technology that was just making inroads: the Hypertext Transfer Protocol, the underlying workhorse of the World Wide Web.
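Here is a small C sketch of the two byte-order conventions and of the workaround, using made-up values: the same 32-bit number laid out least-significant-byte-first and most-significant-byte-first, and a ten-digit text field decoded with an implied decimal point two places from the right.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        uint32_t value = 0x12345678;
        uint8_t bytes[4];

        memcpy(bytes, &value, sizeof value);   /* however this machine happens to store it */
        printf("native order:  %02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

        /* Least significant byte first (the convention on the Windows/Intel side)... */
        for (int i = 0; i < 4; i++) bytes[i] = (value >> (8 * i)) & 0xff;
        printf("little-endian: %02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

        /* ...versus most significant byte first (the mainframe convention). */
        for (int i = 0; i < 4; i++) bytes[i] = (value >> (8 * (3 - i))) & 0xff;
        printf("big-endian:    %02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

        /* The workaround in the buffers: ship digits as text, with the decimal
         * point assumed rather than sent.  "0001234567" is read as 12345.67. */
        const char *field = "0001234567";
        long digits = atol(field);
        printf("decoded field: %ld.%02ld\n", digits / 100, digits % 100);
        return 0;
    }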
One thing about developing code is that sometimes you are so busy you don’t know how you are going to get things done, while at other times you are just waiting for someone else to finish, or just waiting. During a long period of downtime, I got hooked on web comics. One of them, Userfriendly.org, touted the virtues of open source software and the operating system built around the Linux kernel. Intrigued, I found an old Pentium 100 and purchased a copy of Red Hat 6. While those in the know might scoff at my paying for free software, it proved to be a great investment. This was my entry into the world of free software. When I had booted that Commodore 64, instructions burnt into read-only memory would execute, and there was no way to tell the computer to do otherwise. With Linux, I had access to the same type of code, but now with the ability to look through it and change it. I learned how to compile my own Linux kernel. Because the Ethernet card that came with the machine was not supported by Red Hat, I had to get code from the source and compile it in myself.
In this case, the source was a guy named Don Becker, who worked for NASA. His project was making a supercomputer by linking together lots of little computers. In a nod to his Nordic ancestry, he named it after one of the heroes of Germanic legend: Beowulf. Because his Beowulf was built more like Frankenstein’s monster, sewn together from many different pieces of available hardware, he needed to be able to use all the various types of hardware he found. The Linux kernel allowed him that flexibility. The price for the use of Linux was that, if he distributed the executable, he had to distribute the source code as well. Don became the guru of Ethernet device drivers for Linux. This is what is known, in business speak, as win-win. Linux and its community won because they got good drivers. Don won because he was able to build his supercomputers and spin them off into a company that specialized in Beowulf clusters. More on that in a bit.
Just before leaving Walker, I looked into rewriting the client side of our code using a language that was really getting popular: Java. Java was yet another step away from the hardware. As a language, it was not designed to be compiled to the instruction set executed by the CPU of the machine it ran on. Instead, it was converted to a very simple set of instructions that were interpreted at runtime into the CPU’s instruction set. This final step is what made Java so portable. Now your code, once compiled, could run on any machine that had a Java Virtual Machine installed. There were limitations, of course. It ran slower than code compiled for a specific CPU. The graphical user interface layer, called Swing, was especially slow. So it never really caught on for client applications (although right now I am using OpenOffice Writer, a Swing-based word processor, to type this). It was, however, a perfect fit for business logic processing, especially website development.
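The mechanism in miniature: the compiler targets a simple, portable instruction set, and each real machine runs a small loop that interprets those instructions. The toy “bytecode” in this C sketch is invented for illustration and bears no relation to the real JVM instruction set.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void)
    {
        /* "Compiled" program: push 2, push 3, add them, print the result. */
        int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };

        int stack[16];
        int sp = 0;                     /* stack pointer   */
        int pc = 0;                     /* program counter */

        /* The "virtual machine": fetch, decode, execute, repeat. */
        for (;;) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];            break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];    break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);       break;
            case OP_HALT:  return 0;
            }
        }
    }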
So I, along with the rest of the San Francisco Bay Area, learned to develop websites. The first was Tavolo, the second incarnation of what was originally Digital Chef. Tavolo was a specialty food and cookware website developed by the Culinary Institute of America, or, as they like to be called, the CIA. We wrote their new website using a product called Dynamo, from the Art Technology Group. Dynamo was an application server: a program designed to run other programs, many of them at once. Dynamo had components for personalizing a website based on the person using it, and a significant amount of support for e-commerce. Many of the solutions ATG put into Dynamo were parallel to, but different from, those that eventually became the standards put out by Sun for Java enterprise computing. Since the marketing people at Sun had decided that Java needed a second version, this became Java 2 Enterprise Edition, or J2EE. Maybe they thought it sounded better than JEE.
As these standards matured, various people started implementing them. Some were companies trying to sell their implementations, but many people doing Java programming released their code under various open source licenses. The most popular, the Tomcat web server, was developed under the auspices of the Apache Software Foundation, the same folks who made the Apache web server. JBoss (renamed from EJBoss after a naming dispute with Sun) was the transaction server and database wrapper. These performed the same job as Dynamo, but were free. Additional packages existed for the various stages of website development, database access, document generation, and more. I now had open source code for an operating system and for all the software I needed to build enterprise software. As the dot-com bubble burst, I headed to a small company that needed a website built. Using this stack of open source software, we brought up the website in a few weeks and grew it over the course of the following year. All of my follow-on projects have used this mix of Java and open source software.
The secret to Java’s success is also one of its shortcomings. Java comes from a long line of programming languages that try to make it hard for the programmer to do the Wrong Thing. In particular, Java allows you to use memory without having to clean up after yourself. Once an object is no longer referenced anywhere in the system, it is eligible for garbage collection. While there are numerous other features that make Java a good language to work in, this is the one that contributes most to productivity. The drawback is that sometimes you need to know exactly where memory comes from, how long it can be used, and when it can be reclaimed. In Java, memory is difficult, if not impossible, to access directly. Probably most telling is the fact that Java is not programmed in Java; it is programmed in C and C++, because something as critical as the virtual machine Java runs on has to be fast, or all programs are slow. Where Java takes the position that programs should check for and report errors to speed development, C requires a much more dedicated quality assurance process to make sure programs don’t have an unacceptable number of bugs. Not that you can’t write fast code in Java, and not that you can’t quickly write bug-free code in C; it is just that each language makes its own thing easier.
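The contrast in miniature, sketched in C with a made-up buffer: every allocation must be matched by an explicit free, and touching the memory after that free is undefined behavior. In Java, dropping the last reference is enough; the collector reclaims the object eventually.

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    int main(void)
    {
        char *buffer = malloc(64);      /* in Java: new byte[64], and nothing more to do */
        if (!buffer)
            return EXIT_FAILURE;

        strcpy(buffer, "manual memory management");
        printf("%s\n", buffer);

        free(buffer);                   /* forget this and the memory leaks;        */
        buffer = NULL;                  /* use it after the free and behavior is    */
        return 0;                       /* undefined -- the cleanup Java spares you */
    }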
So I made the effort to break out of the very successful track I was on, take a cut in pay, and get into Linux kernel development. In a sense, this was a return to my roots, being able to go right to the hardware. I had spent quite a long while looking when opportunity found me. A recruiter called me from Penguin Computing. Penguin is a hardware company; they sell Linux servers. Cool. About a year ago, they bought Scyld, the company spun off from NASA’s Beowulf project, led by Don Becker. I told you there would be more later. The geek value was immense. I was hooked, and convinced them to hire me.
Why was I drawn to computer science? I like patterns. I like hearing the chords of “Always Look on the Bright Side of Life” and realizing they are the same as “I Got Rhythm,” just with the chorus and verses reversed. I like trying to tell which of my nephew’s personality traits came from his mother and which came from his father. When it comes to programming, I like taking a solution and extracting the generic part so I can extend it to solve a new problem. Design patterns work for me. I’ve been interested in many corners of computer science, and I enjoy learning the commonalities between tuples flowing through the stages of a query, packets flowing through a network, and events flowing through a graphical interface.
The one topic in my artificial intelligence course that really piqued my interest was neural networks. After several decades of trying to do it the hard way, scientists decided to try to build a processing model based on the brains of living organisms. Animal brains do two things really well. First, they process a huge amount of information in parallel. Second, they adapt. Traditional neural networks (funny to be calling such a young science traditional) are based on matrix algebra as a simplification of the model. One vector is the input set; multiplying it by a matrix gives you an interim result, and multiplying that by a second matrix gives you an output set. The matrices represent the connections between neurons in the brain. At the start of the 1970s, scientists were convinced that neural networks were the big thing that was going to get us artificial intelligence. But traditional neural networks learn poorly and do little that can be called parallel processing. After a brief time in the sun, they were relegated to short chapters in books on AI. They are still used, but people no longer expect them to perform miracles.
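The matrix-algebra skeleton looks like this in C. The layer sizes and weights below are arbitrary, and the usual nonlinearity between layers is left out so that the two matrix multiplications stay visible.

    #include <stdio.h>

    #define IN  3
    #define HID 4
    #define OUT 2

    /* Multiply a matrix by a vector: each output entry sums one row of weights. */
    static void mat_vec(int rows, int cols, double m[rows][cols],
                        double v[cols], double out[rows])
    {
        for (int r = 0; r < rows; r++) {
            out[r] = 0.0;
            for (int c = 0; c < cols; c++)
                out[r] += m[r][c] * v[c];   /* one connection weight per entry */
        }
    }

    int main(void)
    {
        double input[IN] = { 1.0, 0.5, -1.0 };
        double w1[HID][IN] = { {  0.2, -0.1, 0.4 },
                               {  0.7,  0.3, 0.1 },
                               { -0.5,  0.8, 0.2 },
                               {  0.1,  0.1, 0.9 } };
        double w2[OUT][HID] = { { 0.3, -0.2,  0.5, 0.1 },
                                { 0.6,  0.4, -0.3, 0.2 } };

        double hidden[HID], output[OUT];
        mat_vec(HID, IN, w1, input, hidden);    /* input vector times first matrix  */
        mat_vec(OUT, HID, w2, hidden, output);  /* interim result times second one  */

        for (int i = 0; i < OUT; i++)
            printf("output[%d] = %f\n", i, output[i]);
        return 0;
    }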
If you believe that upstart Darwin, real intelligence is the result of millions (or some greater -illion) of years of evolution. Expecting a cheap imitation to learn to perform a difficult pattern analysis with a short amount of training is either a case of hubris or extreme optimism. If I had to guess, I would say both. Around us is a vast (albeit dwindling) variety of animals, all wonderful examples of working neural networks. We are lucky to have such great models to work from, and we should learn from them. I would like to use a neural network model as a starting point for a processor that learns and moves like a living creature. Recent work with hardware-based neural networks has performed superbly at voice recognition. The focus on the timing between the neurons, an aspect not accounted for in the simplified model, was a key differentiator. The animal brain is superb at cycles such as the motion of the legs while running. Once the basic cycle is learned, the system can be taught to adjust for rough terrain, different speeds, and quick changes of direction. If the behavior of a single muscle is analyzed, we see it has a pattern of contracting and releasing timed with the activity it is performing. The brain controls all the muscles in parallel, while also absorbing input from the various senses. This cycle can be seen as a continuously adapting system built out of: 1) a desired process (running), 2) the state of the muscles and other organ systems, 3) a prerecorded expectation of the flow of the process, and 4) the inputs to the senses. In order for a cycle to progress, some aspect of the output must be fed back in as input. Additionally, a portion of the system must remain aloof, comparing the actual end result with the desired end result and using that to tune the behavior of the system. The best result will come from an interdisciplinary approach: the system should be engineered as a mix of software and hardware, traditional engineering techniques and genetic algorithms, using everything learned from biology, especially animal physiology. The latest advances in materials science will be needed to build motive systems that deliver maximal energy for minimal weight. Currently, we can program a robot that can walk. I want to develop a robot that can learn to run.
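As a toy illustration of that feedback cycle, here is a bare C loop with invented numbers: a prerecorded expectation (the desired stride period), a sensed actual value, and a comparator whose correction is fed back in on each pass. It is only a sketch of the loop structure, not a model of any real gait controller.

    #include <stdio.h>

    int main(void)
    {
        double desired_period = 1.0;   /* the prerecorded expectation (seconds per stride) */
        double actual_period  = 1.4;   /* what the "senses" currently report */
        double gain           = 0.5;   /* how aggressively the comparator tunes the system */

        for (int stride = 0; stride < 10; stride++) {
            double error = desired_period - actual_period;  /* compare desired with actual */
            actual_period += gain * error;                   /* feed the correction back in */
            printf("stride %2d: period %.3f (error %+.3f)\n", stride, actual_period, error);
        }
        return 0;
    }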
And to run, it will take great advances in operating systems. An animal receives and processes a vast amount of information from all its senses at the same time. Layers upon layers of transformations turn this information into action. Future events are predicted in space-time with a high degree of accuracy and an even higher degree of fault tolerance. Some of this is reflected in the way current robotic systems work, but we have much to learn. We need to develop systems where parallelism moves from being a difficult concept to handle to being the primary tool of development.