About Adam Young

Once upon a time I was an Army Officer, but that was long ago. Now I work as a Software Engineer. I climb rocks, play saxophone, and spend way too much time in front of a computer.

Graph Theory

The next step in my preparation for the CS GRE is graph theory. The cool thing about graph theory is it is the basis for representing anything to do with relationships, especially spacial relationships.

I have GPS system in my car that routinely gets me to locations via routes I didn’t know. It is certainly not always the fastest route, but it works. If only I had that as an Platoon Leader. Actually, I did have a GPS, but it didn’t do route finding. Just told you where you were. I wonder how many people have tried to sneak a GPS in to Ranger School.

Back to graph theory. Take any set of entities and relationships between them, and you can represent the whole thing as a graph. Well, to an extent. According to the strict definition of a graph, two nodes can only be connected by at max one edge. Thus if you were representing a relationship between people, showing both family and work relationships, then it would not be possible to show that I worked for my Dad.

Graphs show up all over CS. The places they are discussed the most seem to be Networks, where the nodes are machines and the edges are network connections between them. But graphs show up in language parsing , data structures, graphics, object oriented design, databases. Graphs get hit heavily in algorithm courses as many of the most interesting problems come from graph theories.

Two basic problems in graph theory come up time and again: Minimum spanning tree and traveling salesman. Both problems assume that each edge has a weight. Minimum spanning tree attempts to connect all nodes in a graph into a tree with minimum total weight. Traveling salesman attempts to visit each node with a minimum weight path.

Let’s start with a simpler problem: shortest path. This is the problem that my GPS has to deal with when I say “Plot me a course to the Gym.” Given a weighted graph with one node designated the start node and a different node designated the end node, find the path with the least total weight.

There are several things to keep in mind. First is cycles. If your path visits the same node more than once, there exists a shorter path that visits that node only once. Note that this may not be true of driving in Boston, where you can’t take a left from Commonwealth Ave onto the BU bridge, but instead have to go another block, take a right, and loop around to cross the bridge from the south. But in a simple graph, you have to identify if you are visiting a node that you have visited before. There are two ways to do this depending on whether you are willing to spend more time or more memory on the problem. The time intensive way is to use a list, and add each node to the list. When you visit a new node, you can perform a search of the list to see if you have been there before. If so, disregard that path segment. If not, add it to the list. The alternative is to record a state in each node. When you visit the node, you check to see if the state reflects that you have already visited it. If not, mark the node as visited and continue.

That is what I can remember off the top of my head. Obviously, this is a much discussed topic. There are many Wikipedia articles on this and related topics, so I won’t attempt to reproduce that here.

Deploying for Operation Uphold Democracy-1995

CPT DiChairo called me into his office. “Adam, 4-87 is going to Haiti. They need Lieutenants.”

“Sir, I volunteer.”

The words might have been slightly different. It has been 12 years. It was an important moment in my life, so it has taken on a bit of it’s own weight. Memories like this that are often chewed over end up showing the teethmarks of selective memory. Still, the exchange left me tagged to move from a Platoon in C Company, 4th Battalion, 27th Infantry (Wolfhounds) to fill a slot for the understrength unit tagged to deploy to Haiti for Operation Uphold Democracy. This was early December, 1994.

The media had been showing images of Haitians arriving in Florida on makeshift rafts. Three months earlier, the Forces Armed D’Haiti (FADH, the Army of Haiti) had fled pending a United States lead invasion. The US was staged at Guantanamo Bay and on aircraft carriers in the Caribbean. The FADH had taken control of the country in a coup that prevented the elected President, Jean-Bertrand Aristead, from taking office. Conditions in the impoverished country had deteriorated to the point that the haitians preferred to risk setting off in makeshift rafts to to try and reach the coast of the USA then stay in Haiti.

Haiti has had a rough history. After the Colonies shucked off English rule to establish ourselves as the United States of America, the small Island colony did the same, throwing off French rule. America still practiced slavery itself, and was unwilling to support a slave uprising so close to it’s own borders. Over the years America has intervened in Haiti with military force several times. I remember reading about the early career of Marine General Chesty Puller who first saw combat in Haiti early in the 20th Century.

J.P. Green, my friend and house-mate, as well as a fellow LT in the Wolfhounds, was also tagged, but for the task of Brigade S5, Public Affairs officer. We were both tab-less bastards, having failed to complete the Ranger course the previous winter. I suspect that the selection process took that into account. I still am not sure what he did to get that posting, if it was a good thing or a bad thing. As we had a compressed time-frame, we decided to take a compressed Christmas vacation. Along with Mike Pickett, the Battalion Assistant S-2 (Intelligence officer), JP and I headed over to Maui from Oahu for 5 day weekend. We partied in Lahaina, drove over to the Sacred Pools, flirted with mormon girls. One night, I fell asleep in the back of a night club after a long night. JP and Mike found me there.

With that out of my system, I settled in for the train up. Until this point, the majority of my military training had been for war. As a light infantry platoon leader, I was trained to lead a group of 35 soldiers to “Close with and destroy the enemy by means of fire an maneuver in order to destroy or capture him, or to repel his assault by fire, close combat and counter attack.” as I memorized as a plebe. Now I was training for “Operations other than War” or OOW. We trained in convoy movements, crowd control, cordon and search, and defending food distribution. At the firing range we trained in “Quick Kill” exercises. We learned the graduated responses to riot, from issuing vocal commands, through pepper spray, helicopter down wash, and deadly force. We did a live fire exercise in a range on Schofield where certain targets wore civilian clothes. I donated a shirt of mine to the training, and recieved it back with a bullet hole.

We did a video conference with the folks from the 10th Mountain Division. The conference washeld at the USARPAC Headquarters on Oahu, Fort Shafter. On the way down, the Bde Commander asked what I was going to ask. I replied that I was going to listen and keep my mouth shut. He explained that this was the wrong attitude. I realized I didn’t know what we were doing this for, so I asked for more info. Not the first time I was unprepared for what the Army threw at me.

I started hanging around with Erik Anderson, on of the two other line platoon leaders in D Co, 4-87. We raced around Diamond Heights in our suits looking to find the secluded site that our Company First Sergent was getting married at. We showed up late, but still managed to with Top best of luck. I have a photo of us at the beach, on Christmas day.

Lots of the details have been lost. I know we did deployment prep at the stadium across from D quad on Schofield, took busses down to Hickam AFB and flew Tower Air to Port-Au-Prince (PAP in Milspeak), and then took smaller planes, probably C130s up to Cap Haitian Haiti. Due to the war powers resolution, we knew that the president only had 89 days before having to redeploy us. We shook hands with the outgoing 10th Mountain Division and settled in for the Haitian Vacation.

Build systems

For the non programmers, the word ‘make’ in the following paragraphs refers to a tool used to control the compilation of a software product.

At work we are in transition to using scons as our build driver. I guess enough people were frustrated with Make that they were able to change things over. Here is the problem.

You can never get away from make.

Any build system that uses code from some where else is, eventually going to have to deal with make. Your standard GNU package is a tarball that you unpack and then run:

./configure

make

make install

Or some variation on this. At work, the build team wanted to standardize on the version of autotools that they use. Makes sense, you should not check generated files into a repository if it makes more sense to regenerate them…but does it? Well, you need to be able to modify the Makefile.in and Configure.in files, so yes, every developer needs to run Automake/Libtoolize/Autoconf/etc with the same options and the same version of the tools. I’m not certain that autotools really make sense for a Linux only project, which was why we didn’t use them at Penguin and why we have to use them for the packages we have here.

People don’t like make because they either don’t like rules based programming or they don’t like debugging make files. Fair enough…but deal with it. Instead of trying to rewrite the tool, extend it. I am not talking about your average developers, just the ones that decided they can do it better using X, where X is some other language or some other library.

Personally, I’ve never found a replacement of make that didn’t end up sporting some of make’s warts. If your goal is to do incremental compilation, then the Ant approach, depending on the compiler to identify which file to build, won’t work for most languages. This is a case where you need to recognize in a Makefile that you don’t want to have a .SUFFIXES rule (.java.class) and instead, if you depend on the Java code, you let javac build everything every time. See the difference? One depends on a tool support for something, and fails when it isn’t there. The other allows you to take advantage of the tool support if it is there.

One reason that people don’t like make is that it is sensitive to white space. SCons is built in Python, also sensitive to white space. I guess those people will keep looking for something else.

Renaming Army Units

One thing I have never quite liked is the fact that in the modern US Army, there are Divisions, and there are Brigades, but there are no regiments.  And a Brigade is commanded by a Colonel.

Origianally, a Regiment was commanded by a Colonel, a Brigade by a 1 start, or Brigadier General, and a Division by a two star General.  At some point, the sizing was wrong and they collapse the Regiment and Brigade into a single unit.  But they kept the term Brigade, not Regiment.

I suppose they are able to keep more regiment guidon’s in active usage this way, but it seems to me that they should just rename the modern Army Brigades to Regiments, and then assignon historical Regiment per.  The Colonel would be the regimental commander, just like in history.

Whe I was with the 25th ID, our brigades had battalions from multiple Regiments.  Thrid Brigade, 25tjh ID was st1 and 4 th Battalion, 27th Infantry, and then we had 1-487 th Infantry in between.  We would have been a hell of a lot more cohesive as the Wolfhound Regiment 1,2,3.

I suppose the places where they have separate infantry Brigades commanded by a Brigadier General (I’ve seen this in the National Guard) would be a little awkward, but really, is it any worse than we have now?  We could even keep those units as SIBs.  Now A brigade goes back to being commanded by a General, and a Regiment by a Colonel.  It is even more straightforward.

Duty, Honor, Country

Is there any part of the Military that has a better slogan? While I know West Point jealously guards this motto, it really should be the motto for the entire Army. I mean, it isn’t just the officers who need to live to this standard. An Army of One. Be All That You Can Be. Too much focus on the individual, not on why you are doing what you do. The Army recruiting posters should use this. The Marines have it right. The Few, the Proud, the Marines. We are an exclusive club, we are good, and, if you show us you have what it takes, we’ll accept you. But who in this modern world can talk of Honor with a straight face? With our country so divided over Iraq, who talks of Patriotism not as a sling in a political battle, but with the quiet reverence due to a set of ideals that we strive to attain. What is Duty, and how does it differ from Country. Why does it come first?

I recently read a book call Absolutely American, by David Lipski. David is a reporter for Rolling Stone. I remember reading his article when it first came out. Seems he took a house in Highland Falls and stayed with the Call of 2002 for the three years (Yearling, Cow, Firstie) Until they graduated. For the past several nights I have had trouble falling asleep as the memories it awoke kept running through my head. I graduated from The United States Military Academy at West Point, class of 1993.
Digression: If you ever see someone wearing a shirt that says USMA, you are looking at someone affiliated with my Alma Mater. The funny thing is most people will read it as USMC. The Marine Corps is gets a lot of great Advertising from the Army. when I was at the recruiting station in Boston when my brother was enlisting, a kid told me he was joining the Marines because he identified with what he saw in Washington DC, them guarding the Tomb of the Unknown Soldier and so on. I told him that was the Old Guard, 3rd Regiment, United States Army. I recently Heard that the Army was returning back to wearing Army Blue as the Everyday uniform as well as the Formal. About time. But people will just keep think that Blue means Marines. Oh well. Back to the Main Thought.
Lipsky makes his shares of mistakes, but they are minor. He gets the jargon right, captures how we talked, as it was only 5 years between my graduation and the start of his research. He didn’t discuss academics at all, which is too bad, as it is the major activity at West Point, but maybe he thought that it was too similar to any other college. Too bad that, as I suspect that is not the case. There is a certain degree of academic freedom elsewhere that you just don’t have at USMA. If a student at Harvard writes an inflammatory satire claiming that your teacher is having affairs with both the men and women in his class, the first Amendment and attitude of the instructors fully protects you. This is not libel, as it isn’t published. At West Point, you get written up, punished, and possibly expelled. When I was a cadet, A couple guys did a spoof on the R.E.M. song Losing My Religion called Losing My Femininity and dedicated it to the Women’s Power Lifting Team. A segment I remember:

That’s me in the corner

That’s Me on the Squat-Rack

Losing my Femininity.

The officer in charge of the team, LTC Christopher, wrote them all up. When my class protested to our English Teacher ( a Military Police Officer, CPT P—Damn my memory is shaky.) He invited LTC Christopher to the class. He came in mad and basically bullied us into silence. I understand his desire to protect the people working for him, but there was no chance to have a dialog. At USMA, probably even more than in the Regular Army, when a superior says some thing is so, the is no argument about it.

I once got written up for running the screen save program at a 2 second interval. I had been poking around in our Unix systems and ran a program (probably named something the like xsc). It responded with a usage message (must be run with -s=X where X is a number between 1 and 10000 or something) and I re-ran it like this xsc -s=2. It ran and returned and I though nothing of it until I got called into my Tac’s office. I had been written up by my Networking instructor. My tac didn’t even understand the quill. I went down and explained myself to my P and that was the end of it. He accepted my word (West Point Honor code) that I was not trying to play a joke or anything, and I stopped poking around.

There are two problems here. The first is that a program like a screensaver should not be runnable against the X windows session unless you own the session. If it is supposed to be run regardless of who is logged in, it should require root privileges. Poor security design. But the second is that a school should not shut down the inquisitiveness of it’s students. I’m sure stuff like this happens at many colleges, but it is the force of discipline at West Point that makes it so fearsome.

The memories keep coming, almost overwhelming now. West Point was an intense four year, more so than any that came before, and only rivaled by some of my later experiences in the Army. I was in no way an model Cadet, but I was not a complete dirt bag. I came to believe in what we were there to do, and did my job. I was and still am a fierce individualist, and that worked against me at USMA. I think it is normal for someone as a professional programmer. I was not drawn inside the camaraderie of our Company as closely as some. Where others played Football or joined the drill team, I joined the Band. Music had been a huge part of my life and I wanted to hold on to it. But it was considered a second tier activity, a way to “Get Over” and avoid drill and ceremony and do something dorky. I did a lot of things like that: medieval studies, Jewish Chapel Choir, and so on. Yes, I identified with Cadet George Rash to a degree. But I had four years on the high school wrestling team, and many years working for Jim Young (Thanks Dad!) to get me physically and mentally tough.

I think West Point and the officer Corp in general would be better served by sending Cadets out to regular Army units as junior enlisted before they they make the commitment Junior Year. I was a drill cadet, and got fired up by the opportunity, but I think I would have been really well served by a six month or one year stint as a line infantry soldier. I realize Camp Buckner is tradition and all, but it would be a hell of a lot better if we went to AIT (follows after Basic Training) at Benning and Knox, then shipped to various posts around the country and be a “Joe” for the remainder of the year. North Eastern University has a Co-Op program, and it doesn’t seem to hurt their admissions process. Why can’t West Point graduate people in five years? One added benefit is that people would drop at the end of the year that really didn’t want to serve in the Army. At the end of four years, you get promoted to E-4. If you want to stay in the Army, you can keep that rank. Knock a year off of what you get for college. When you graduate West Point, you have already earned one year toward retirement. There is no reason ROTC shouldn’t/couldn’t do this as well. If you did it at the end of Plebe year, the Cadets would have a year at WP vested as motivation to come back. And, of course, the officers in their unit jacking them up all the harder if they decided to quit. But everyone would know what IT was about. Maybe they could make an exception for the prior service cadets, or let them retain their old rank out there in the Army.

Duty is strange word and concept. We do our duty as an attempt to pay back the debt we owe. Like original sin, we have a debt just by being born American. This is an interesting concept. What debt do we have? To the men of the Armed Forces that defended our freedoms in the past? To the Aboriginal Americans who died from smallpox and wars with the Europeans? To the African tribesman that were kidnapped and shipped to the agricultural hell of the cotton plantation? This is what you get as an American: the opportunity for a public education, good roads to drive on, a police and fire department that work to keep you safe, a legal system that allows you to challenge it, an economic system that allows you to move up in the world, and a medical system that keeps you healthy. Yes, these are unequally distributed. The higher your social standing, the better the community you can live in, and the better health care you can afford. Many Americans have much fewer opportunities. Perhaps we have a debt to them as well. Not to pull them upwards, but to make sure that the opportunities, the pursuit of happiness, are available to them as well.

With slavery, we destroyed the social structures of the Africans when they were brought to America. While many other ethnic groups have come here, and have been treated badly (I heard something on NPR this morning about Chinese Workers in America) they have, for them most part, come willingly. At a minimum, we owe a debt to the African American community to help repair the social devastation we created.

I think we also have a duty to the people of Iraq to find the best possible solution to the crisis there. I won’t get into whether the invasion was right or wrong. The past is prologue. We now stand at the start of the next page of Iraqi history. How can we minimize damage and maximize opportunity for the average Iraqi? Does an American presence in Iraq deter more violence than it incites? I wish I knew the answer to this question. Is the debt we are incurring in trying to support Iraq detracting from our ability to do our Duty elsewhere?

A favorite interview question of mine

Given the following API:

int bproc_numnodes() ;

void bproc_move(int nodenum);

long gettime();

Write a program that will tell me the average clock skew in a cluster.

I gave this problem to a lot of people back when I was at Penguin. The API and simple math functions really are all you need. No malloc, or socket calls. I did allow people to #include <stdio.h> for output.

If anyone reads this blog, and has an answer, post it to the comments section. I’ll post an answer with a full explanation in a bit.

Working on the Beowulf Process

I am currently listed as one of the maintainers of the BProc project on Sourceforge. Unfortunately, my current life has left me little enough time to do my job and be a father, so other projects fall by the wayside.

BPRoc portrays a cluster of computers a single system from an operating system perspective. A process running anywhere one the cluster shows up in the process tree on the head node. Signals sent on any machine were forwarded to the machine where the process was actually running. A process could voluntarily migrate from one machine to another. All of these techniques take place in the Linux Kernel. Maintaining this code requires understand of operating system concepts such as signal delivery, page table organization, dynamic library linking, as well as network programming. I’ve never had more fun coding.

Linux kernel development is done by taking a copy of the kode published at kernel.org and applying a file that contains the differences between how it looks at the start and how you want it to look at the end. This file is called a patch. THe major Linux distributions all have a version of the Linux Kernel that they select as a starting point, and then a series of patches that they apply to deal with issues they care about. For instance, I am running Red Hat Enterprise Linux 4 machine with a kernel version of 2.6.9-55.0.9. The 2.6.9 is the version that they got from kernel.org. The 55.0.9 indicates the number of major and minor upgrades they have made to that kernel. The number patches applied when last I looked was in the neighborhood of 200. All of the changes we applied to the Linux kernel was maintained in a single patch. As we maintained succeeding version of the kernel, we continued to generate newer versions of that patch. In addition to this code, we had a separate, and much larger, portion of code that was compiled into a binary format that could loaded into the Linux Kernel on demand. The majority of the code in the patch was merely hooks into the code that called out to the loadable kernel modules.

Penguin had branched from the Sourceforge BPRoc before I joined. As such, Sourceforge had already moved on to the 2.6 series Linux Kernel while we were still on the 2.4 series. This was a major difference in the code base, and there was little grounds for sharing. When we did finally start moving to the 2.6 code base, we had one Marketing requirement that the Sourceforge project did not: We needed to interoperate with the Linux Kernel shipped by RedHat for there Enterprise Linux product (RHEL). I spent a long time in research mode trying to make this happen. Two major decisions came out of this. First, PID masquerading had to go. Second, we needed to use binary patching in place of many of the source level patches.

 

Every process in an operating system has an integer process identifier (PID) that other processes and the kernel can use to access that process. A major mechanism in BProc was the ability to migrate a process from one physical machine to another. PID masquerading is a technique that ensures that the process identified does not have to change during migration. Instead, each process has two identifiers. The first is the ID as allocated on the head node, and is used when reporting information to the head node, other nodes, or user land functions. The second ID is the PID allocated on the local machine, and only used inside to local machines Kernel. When a function like getpid is called, the process identifier returned is the masqueraded PID, not the local PID. PID masquerading has both positive and negative implications. With PID masquerading, a given compute node can actually have two completely separate pools of processes that cannot communicate with each other. each of the pools of processes can be driven from a different head node. This allows the sharing of compute nodes between head nodes. A given machine can actually act as both a head node and a compute node. This was a requirement in early Beowulf clusters, but was no longer necessary by the time I worked on them. The negative impact of PID masquerading was the amount of code required to support it. Every PID reference in the Linux kernel had to be scrutinized for whether it should be a local or remote PID. If it needed to be translated, a hook was inserted that said “If the module is loaded, and this process is a masqueraded process, return the masqueraded PID, otherwise return the real PID.” This type of logic composed approximately a quarter of the BProc Linux Kernel patch. There was no practical way we could inject all of this code without source level patching the kernel.

 

 

 

 

 

Binary patching means changing the machine code on a running system. There are two assembly instructions we looked for to see if we could change code. They are CALL and JUMP. Actually, there are two types of jumps, long and short, and we can use either of them. We did analysis of the compiled Linux kernel for places with these instructions near our current set of hooks. The CALL instruction is what maps to a function call in C. In assembly it looks like CALL 0x00000000, where the zeros will be replaced by the linker or the loaded with an address in memory where the function resides. Once we know where the call operation takes place, we can replace the value with our own function. This technique is often used with malicious intent in virus and root kits, but really is not much different than how a debugger or many security software packages work. During the replacement process, we record the original value of the function, so that we can unload our module and return it to it’s original flow. The compiler will often use a JMP instruction in the place of a CALL instruction as an optimization called a “tail call.” All this means is that when the called function returns, instead of returning to the location it was called from, it continues up the call stack. I discussed this in the CS GRE problem set post.

One place that we had to hook to make this work was the function and structure that allocated the process identifiers. The function alloc_pidmap gets a PID from a bitmap. The bitmap is just a page of memory treated as an array of bytes. Bit zero of page[0] represents the PID 0, Bit 1 represents PID 1, and so on. If a given bit is set to 1, there exists a structure in memory that is using that process ID. In standard configuration, a page in Linux is 4k bytes. 1024*4*8=32768, which is the largest number that can be held in a 16 bit signed integer. PIDs have traditionally been 16 bit signed integers in Unix, and Linux. We used a couple tricks to help out here. On the Head node, we set all PIDs less than some threshold (we chose 1000) to be 1, indicating to the system that it should not allocate those pids. On compute nodes, we set all PIDs greater than the threshold to be 1. PIDs to be visible across the entire cluster were allocated on the head node. PIDs allocated for local work on the compute node would be guaranteed not to class with PIDs from the head node.

Aside:  recent versions of the Linux kernel have expanded PIDs to be 32bit signed integers.  At first it was tempting to expand the allowable PIDs, statically partition the PID space amongst the compute nodes, and allow local allocation of PIDs.  We origianlly pursued this approach, but rejected it for several reasons.  First, the Linux Kernel set an arbitrary limit of 4*1024*1024 on the number of PIDS.  We wanted to be able to support clusters of 1024.  This means that any node on the cluster had only 4*1024 PIDs to allocate.  Since the vast majority of PIDs were handed out on the head node anyway, we had to do some unbalanced scheme where the head node go something in the neighborhood of 16000 PIDs, leaving a very small pool to be handed out on each the compute nodes.  Additionally, a compute node crash erased all record of the PIDs that had been handed out on that machine.  Replacing a node meant rebuilding the pidmap from the existing process tree, a very error prone and time consuming activity.  Also, many applications still assumed a 16 bit PID, and we did not want to break those applications.

We found that there were several race conditions that hit us if we relied solely on the pidmap structure to control PIDs. This we ended up hooking  alloc_pidmap, checking for a compute node or head node, and checking that the returned PID was withing the appropriate range.  This type of code pathis frowned up in the mainline Linux kernel, but we found no noticable performance hit in our benchmark applications.

One benefit to this approach was that we could then slowly remove the PID Masquerading code.  We continued to track both the masqueraded and real PIDs, but they were assigned the same value.  Thus we never broke the system as we restructured.

The Silk Road

“In Xanadu did  Kublai Khan a stately Pleasure Dome decree”  Thus did Coleridge start his epic poem about the land controlled by the Mongol Great Khan, as visited by Marco Polo.

I’d often heard while growing up that the renaissance in Europe was spurred by trade from the far east, which was, Ironically kicked off by the Crusades.

Turkey, Kurdistan, Iraq, Iran, Afghanistan, Turkmenistan, Tajikstan, China.  These are the countries of the Silk Road. I find it amazing that we are so focused on many of these countries at this point in time, when so many of them were bypassed by history for such long stretches.

We claim that there are seven continents.  There are six.  Europe and Asia are one land mass, with nothing marking the change from one to the other but a set of mountains that are not even the largest on the continent.  Don’t feel bad, Urals.  No mountain range on the planet can compete with the Himalayas. The Urals certainly were not enough of a barrier to keep the Mongols out of Europe, nor the Russians out of Siberia.   Maybe they really shouldn’tbe the dviding line between continents.

So we have Eurasia.  Let’s face it, Eurasia is the world scene.  They only really notice America when we come over there.  Yes, humanity came from Africa.  But things really got interesting when people spewed all over Asia.

the Silk road connects the cradle of wester civilization, Uruk, with the cradle of Easter Civilization, the Central Kingdon. In both cases, founded on not one but two rivers.  The tribes that migrated from Africa to China did so by this ancient silk road, or at least vaguly followed it.  I dout any tread too far north, as the equatorial nurturing of the homo-sapiens was not very likely drawn to the cold siberian heights before encountering the far more hospitable Yangtze basin.  Farm more likely that they actually came trhrough India and southeast Asia, but that still means they traversed the  majority of the silk road before meandering off down the Khyber pass.

I’ve been reading a lot about this region lately.  Right now, I am reading a book on Afghanistan, alough stricly speaking it is mostly a book on Afghanistan’s encounters with outside powers. Aghansitan is U shaped in its lowlands, looping around the southern end of the Ural Mountains.  To use a military term, the major avenues of approach are:  From the East, via the passes in the mountains from Pakistan, from the west in a corridor between the Caspian Sea and the mountains of the Iran center, and from the north basically the whole frontier is exposed to the Transoxian countries clear up to the Siberian steppes. Adfghasnistan has been invaded via all of these approaches.  Alexander the Great and later the Persians and Caliphates came from the West.  The British came up from India, twice, not a passage I would recommend.  The Mongols, the Soviets, and the America backed Wester Alliance came from the north.

The mountains in the center of the country are the Hindu Kush.  To get from Kabul, the capitol, to Mazar e Sharif , the largest city in the north, one traditionally had to cut through various passes.  The Soviets funded a highway through the mountains, along with a two mile tunnel.  That should give a sense at how hard communication must be in this place.

I have to wonder what it would have been like to be on a merchant caravan shortly after Marco Polo published his travels and the flood from the West began.  Set sail out of Venice, land somwhere in Turkey, travel across to Mesopotamia, down the Tigris and the cut over into Iran.  Skirt the Caspain sea, and by now you are in a land that vaguely remembers Greeece, and knows nothing of the west except reports of distant wars .  You keep going to a place where Islam only has a tenuous hold, and there only in the lowlands.  Do you stay north, risk the mountains and the dester to make it to distant Cathay, or go south do Hindustan where the people are even more exotic, the languages are more diverse.  Oh, and to get there, you have to travese steep moutain ravines overwatched by tribes that demand bribes at every juncture, and even then might not leave you alive.

Growing up in America, you realize how shallow our histories are.  We have no written record of the aboriginal tribes here, just scattered remains viewed through the dubious eyes of the christian missionaries.  Yes, there were greater civilizations to the south, but that is a long way from my native Massachusetts.  I am often jealous of the history clinging to the peaks and valleys of these places, places that were old when Techonctilan and Machu Pichu were young.  Idon’t envy the citizens of those realms the hardships they face either today, nor through the centuries.  I know that many of my own ancestors spent the bulk of their lives not so very far away in the  regions of Ukraine, Russia, and Polond.  I know that my Jewish heritage comes from the Jordan rivere civilizations.  I know that there must be the blood of at least a few wild horsemen from the Asiatic steppes intermingled with my semitic and Rus bloodlines.  I think it would be fascinating to be able to view the entirety of my family tree.

Questions for GRE Prep

This list of question was from a Slashdot post. Here are my answers.

Do you remember what LALR stands for?

This is a form of grammer parser. Look Ahead Left to Right. It accepts/parses Context sensitive grammers.
Can you calculate the cost of a pipeline bubble or a branch misprediction? What information would you need to do so?

You can calculate the cost of a specific misprediction under some architectures. You would need to know what the instructions to be executed in the incorrect case are, how long they take, and whether you can undo them or not. For instance in the sample code:

if     (i){   do_x();  } else{  do_y(); }

If both do_x and do_y are non, inlined functions, you know that a function call will happen.  Assume i has to fetched from memory, if  the addresses of do_x and do_y must also be fetched from memory, there is a likelihood of fetching the wrong address for the call operation.  If, on the other hand, do_y is inlined, and the compiler determines this to be the likely path, the cost will be whatever work do_y executes while waiting on the fetch of (i).

Deeper pipelines have even worse issues, as the “most likely path” may only be executed in it’s entirety a relativelyfew number of times, meaning much of the work done by deeper pipeline stages gets discarded.

How is depth-first/breadth-first related to stacks/queues related to LIFO/FIFO?

a stack is a last in first out (LIFO) data structure

a queue is a first in first out (FIFO) data structure.

In a depth first search, traversing down the the search tree is analogous to pushing values onto a stack. A depth first search is easily implemented using recursion, where the stack is the actual program stack.

In a bredth first search, you traverse across all of the children of the start node. As you search each child, you add it to a queue. Once the current node has no more children, you remove the item at the head of the queue and process it in the same manner. Continue until you meet the search criteria or there is nothing left in the queue.

What is the stack pointer?

The element of the register file that indicates where the top of the stack is for a process/thread.

What is the frame pointer?

On some architectures, a separate register points to a location on the stack that indicates where the currently executing function’s scope begins. On x86 and later architectures, there is no frame pointer register. However, the concept of frame pointer is still valid, as the return location of the current function will be pushed onto the stack with a call operation. To return from a function, this value is popped off the stack and put in the instruction register.

An interesting note is that this value may not always refer to the calling function. For example, assume you have three functions main, outer and inner. Outer and inner have identical parameter lists, for this example, a single integer value x.

void inner(int x){

}

void outer(x){

inner(x)

}

main(){

int x;

outer(x);

}

The generated assembly for main will be something like:

mov X,%ebp
call <outer>

But for main it can be

jmp <inner>

The reason is that there is no need to clean up the paramters after the call to inner. When inner completes, the return value stored on the stack will be from main, one after the call to outer. This is called a tail call and is a very common compiler optimization.

What is an interrupt?

An indication of an asynchronous event that the operating system must handle. There are two forms of interrupts, hardware and software. Examples of events which generated interrupts are: request I/O is avaialble from a device, a clock tick has occured, a user has typed a key on a keyboard. Typically, the OS handles the interrupt in two steps, a top half and a bottom half. The Top half is everything that must happen immediately, the bottom half is everything that must be done eventually. In the top half, the systems does things like recording the type of the interrupt, and scheduling the bottom half by placing it on a queue. The bottom half is what would be responsible for reading the data from the network device or linking up the key pressed to the appropriate TTY.

What are the three primary elements of OOP?

I don’t know if I would say there are three. Really, there are only two. Encapsulation and Polymorphism. Inheritance is in most forms of object oriented programming languages, but I agree with people who call this “Class Oriented Programming.” Protype based object oriented languages such as javascript may or may not be described as having inheritance. In this kind of language, a new object is created only by cloning the original object, and then modifying it. Since this is a shallow copy, it will have the same set of functions listed as methods of the object. These can be overloaded by replacing them. This is a instance of the Prototype design pattern.

Encapsulation means that you only have access to the interface of the object, not the underlying implementation. I would argue that Javascript provides no method to hide a method or property, and thus does not really encapsulate. Then again, java provides the entire introspection API, so there are often ways to get around encapsulation using dirty VM tricks as well. In C++, you can alwyas cast the object instance to get access to the underlying implementation. It is safe to say no languages encapsulation is effective against a truely determined effort to violate it.

There are two distinct forms of polymorphism. The first is that the same symbol can point to multiple functions. In C++ this is done using a table of function pointers. The second form is that two functions of the same name can take different sets of parameters and point to differnent implementations. This requires compiler support. In C++ this is implemented via Name mangling, where the types of the parameters and the return value are added to the symbol name. In Java, all inheritance is performed via a V table, and is scoped on the class name of the class that originally defined it.

What is modus ponens/tolens (sp)?

Forms of logical arguments. A->B A Therefore B is a modus ponens argument.

A modus tollens is a logical argument of the form A->B not(B) therefore not(A).

The truth table for A->B is:

A B==A->B

0 0 ==1

1 0 ==0 (logical contradiction)

1 1 ==1

0 1 ==1 (not(a) does not imply not(b))

What kinds of race conditions are there?

Wikipedia has a write up of these, but I never learned these terms.

What is a critical section?

A section of code that can only have a single thread of execution at a given point in time. Access to a critical section is controlled by a lock.

What is the difference between a monitor and a semaphore?

Both are methods of concurancy control. A semaphore performs concurrency control by keeping a counter of requests. THe counter is implemented using an atomic test and set. When the semaphore is 0, another process is granted the permission requested. All others will be required to wait.

A monitor is a compiler generated construct that controls access to a resource. A call to code that is a critical section will have code generated that acquires the lock, and then releases the lock after exiting the critical section.

The third form of concurrency control is message passing. These three forms are all equivalent. Each can be implemented in terms of the other two.

Define NP-complete.

A problem is NP if there is a solution that can be performed in polynomial time by a non-determinisitic turing machine. A solution to the problem can be verified in polynomial time by a deterministic Turing machine. A problem is NP complete if every other problem in NP can be reduced to that problem. NP-complete problems are the hardest problems in NP.

Java Native Interface

The reason I left Java programming was due to the type of work I got in that field. However, I love the language. It provides the programmer with the right set of tools to focus on getting the task done. Memory management, while important, is not the primary focus of each and every programming task.

Don Knuth’s quote about premature optimization being the root of all evil often comes to mind. C is a great language for low level tasks, for embedding code on a really small platform, and for ensuring that you can understand the end binaries format and structure. One nice thing about Java is that it, too has a rigidly defined and easy to understand binary format. So, one approach to problem programming might be to code in Java to solve the problem, then optimize in C.

The language binding between Java and is the Java Native Interface (JNI). JNI makes use of the Java ‘native’ keyword. a Native method is one implemented external to the current Java class.

Here is a trivial example:

public class Sample

{

   private native String reverse(String prompt);   static {

      System.loadLibrary("reverse");

   }

public static void main(String[] args)

      {

         Sample jn=new Sample();

System.out.println("Got "+ jn.reverse("Some Data"));

      }

}

The ‘native’ line defines the method reverse. To resolve this, the jvm will look for a symbol: Java_Sample_reverse. From this we can infer that
parameter polymorphism is either not supported, or the jvm only resorts to name mangling when required to deal ambiguous situations.

the nexts steps are 1) run javac to convert the .java file to a .class file. and 2)run javah on the .class file to generate a .h file for the c library.

The function definition looks like this:

JNIEXPORT jstring JNICALL Java_Sample_reverse
(JNIEnv *, jobject, jstring);

This is within an extern “C” block for C++code, so we don’t get C++ semantics for the function implementation. This means you won’t be able to make a single set of objects to be shared between C+++ and Java, but this would be unwieldy no matter what.

For gcc The JNIEXPORT is resolved away to nothing. This compiler directive is to support Microsoft’s compiler which forces you to declare if a function is going to be visible outside of the scope of the current file, pretty much the opposite of ‘static’. JNICALL also resolves to nothing, but I suspect on a windows platform it would introduce code that specifies the pascal function calling convention for handling parameters. JNIEnv is a pointer to a pointer, and thus must be used like this :
(*env)->doSomething();
C++ Provides operator overloading of the -> so you can refer using this semantic instead:
env->doSomething();

Note that the string gets passed as a jstring. jni.h shows this to be a jobject. jobject is a struct with nothing inside it. Thus, once we go from java to C++, we are not supposed to know what exists on the other side except from reading the specification.

For this example, I have a function that reverses the string passed in. Here it is:

#include 

#include 

#include 

#include /*

 * Class:     MyJni

 * Method:    getLine

 * Signature: (Ljava/lang/String;)Ljava/lang/String;

 */

JNIEXPORT jstring JNICALL Java_Sample_reverse

  (JNIEnv * env, jobject obj, jstring str)

{

   jstring retval;

   char * msg;

   int i;

   int len;

const char * orig = (*env)->GetStringUTFChars(env, str,0);

len = strlen(orig);

msg = (char *)malloc(len+1);

/*disregard the null termination fro now; it will not be copied.*/

   len--;

for (i = 0; i <= len; ++i){

      msg[i] = orig[len-i];

   }

   msg[i]=0;

(*env)->ReleaseStringUTFChars(env, str, orig);

retval = (*env)->NewStringUTF(env, msg);

free(msg);

return retval;

}

Here is the Makefile I used to build it.

CFLAGS+=-I/usr/lib/jvm/java-6-sun-1.6.0.00/include -I/usr/lib/jvm/java-6-sun-1.6.0.00/include/linux
FILES=MyJni.h

all: libreverse.so Sample.class

.SUFFIXES : .c  .class .h .so .java

.class.h :
        javah $*

clean :
        rm -rf *.o *.so $(FILES) *~ *.class Sample.h

.java.class :
        javac $<

libreverse.so : Sample.h reverse.c
        gcc reverse.c -o $*.so -shared $(CFLAGS) -fPIC

test :
        LD_LIBRARY_PATH=. java Sample

$: make && make test
javac Sample.java
javah Sample
gcc reverse.c -o libreverse.so -shared -I/usr/lib/jvm/java-6-sun-1.6.0.00/include -I/usr/lib/jvm/java-6-sun-1.6.0.00/include/linux -fPIC
LD_LIBRARY_PATH=. java Sample
Got ataD emoS