This is an updated version of my interview question, revised so that it is no longer bproc-specific.
Using only the following API:
- printf(…)
- int get_node_count() // number of compute nodes attached to the head node; 0 means no other nodes
- int get_current_node() // 0 for the head node, 1-n for the compute nodes
- int remote_fork(int node) // like fork, but returns an fd to the child/parent process
- void send_long_sync(int fd, long value) // send and wait; blocks until receipt
- long recv_long_sync(int fd) // blocks until a value is available
- long gettime()
Calculate the average clock skew on the cluster. Return 0 on success, and -1 on any failure.
Assume that all nodes are up and running. This is a C subset of C++. Each of these functions throws an exception upon failure. remote_fork has the same semantics as fork: when it returns, there are two copies of the program running, just on separate machines. The next line of code to execute will be the line immediately after the fork on both machines. However, the returned value is not a process ID, and the parent process does not need to wait for the remote process to finish: the child process is automatically reaped by init.
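To make the arithmetic concrete, here is a rough sketch of one possible approach, not necessarily the answer the question is after. It assumes the head node calls remote_fork for each compute node, reads gettime() just before send_long_sync, has the child reply with its own gettime(), and reads gettime() again just after recv_long_sync returns. It also assumes that the network delay is about the same in both directions, and that "average skew" means the mean of the signed per-node offsets. The names Sample, estimate_skew and average_skew are made up for illustration and are not part of the API above; the units are whatever gettime() returns.

```cpp
// One ping exchange with a compute node, as seen from the head node.
// (Hypothetical struct, not part of the interview API.)
struct Sample {
    long t_send;    // head clock just before send_long_sync()
    long t_remote;  // compute node's gettime(), sent back by the child
    long t_recv;    // head clock just after recv_long_sync() returns
};

// Estimated offset of the remote clock relative to the head clock.
// Assumption: the one-way delay is the same in both directions, so the
// remote timestamp was taken at the midpoint of the round trip.
long estimate_skew(const Sample& s) {
    long midpoint = s.t_send + (s.t_recv - s.t_send) / 2;
    return s.t_remote - midpoint;
}

// Mean of the signed per-node estimates; returns 0 when there are no
// compute nodes (get_node_count() == 0).
long average_skew(const Sample* samples, int count) {
    if (count <= 0) {
        return 0;
    }
    long total = 0;
    for (int i = 0; i < count; ++i) {
        total += estimate_skew(samples[i]);
    }
    return total / count;
}
```

In a full answer, the head node would fill one Sample per compute node inside a try block and return -1 from the catch, since every API call throws on failure. Averaging signed offsets lets fast and slow clocks cancel out; averaging absolute values instead is an equally defensible reading of the question.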
Asking questions like this in an interview will deter the very best engineers. I would never work for a company that asks this question in an interview.
Pete,
I’ve given this question many times in interviews. Here’s why I think it is a good question:
1. It tests your ability on actual programming concepts like distributed computing
2. It tests your ability to quickly absorb a new API
3. The solution is actually not that tricky, but there are subtleties that show either insight or experience.
4. It reflects the type of programming that I do, and that is required by the positions I’ve worked in, and that I have had to hire for.
The very best engineers have had no trouble with this problem, and have enjoyed it.
When I give this problem, I am there in the room to answer questions, and make sure that the interviewee does not get stuck. I’ve seen the “perfect” answer to this one given, and there is a path that most people go through to get there.
What is it about this question that bothers you?
Yeah… I have a PhD in distributed systems. If asked this question in an interview, I’d walk out…
Then you wouldn’t get the job, and I’ve worked at some pretty fun places. It is a fun interview question, and I enjoy the interplay in working through it.