About Adam Young

Once upon a time I was an Army Officer, but that was long ago. Now I work as a Software Engineer. I climb rocks, play saxophone, and spend way too much time in front of a computer.

OVER keyword in Oracle

I’m looking at the SQL Scripts that create and modify the database for VPXD. Specifically, I am looking at the oracle version stats[123] scripts. They included a query that uses the “OVER” keyword and I have not found a decent description of what this keyword does. This is the query:

SELECT
st.counter_id AS COUNTER_ID
,v_time_rowid AS TIME_ID
,ST.STAT_VAL AS STAT_VAL
,DENSE_RANK()

OVER (PARTITION BY ST.COUNTER_ID ORDER BY SM.SAMPLE_TIME DESC) AS DENSE_RANK
FROM vpx_sample_time1 sm,vpx_stat_counter sc, vpx_stat_def sd, vpx_hist_stat1 st
WHERE sm.rollup_counter=to_number(to_char(v_start_time_id)||to_char(v_cnt))
AND st.time_id = sm.time_id
AND st.counter_id=SC.COUNTER_ID
AND SC.STAT_ID=SD.ID
AND ((SC.DEVICE_ID =v_null_device_id and v_stat_rollup_level<3) or (v_stat_rollup_level>=3))
AND SD.STAT_LEVEL<=v_stat_rollup_level
AND SD.ROLLUP_TYPE=3)
WHERE DENSE_RANK=1

;

The one description of the Over command I have found has come from the Microsoft website:

“Determines the partitioning and ordering of the rowset before the associated window function is applied”

OK, so it is a union of two types of functionality: partitioning and ordering. The queries I am looking at use both. Let’s start with partitioning:

PARTITION BYDivides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

The window function in this place must somehow refer to DENSE_RANK. To me this looks like it is possibly trying to insert a subselect into the overall select. Here is how MSDN defines DENSE_RANK. I’d like to reiterate that these are Oracle functions I’m trying to understand, and Microsoft is doing all the work explaining them. When I tried executing this SQL Query in SQLPLUS via the Oracle 11 instant client, it balked:
SQL> 2 3 4 5 6 SQL> SP2-0734: unknown command beginning “OVER (PART…” – rest of line ignored.
SQL> SP2-0734: unknown command beginning “FROM vpx_…” – rest of line ignored. It parses fine when I do it as part of a plsql stored procedure.

Time to dig out the slightly larger guns: I picked up a copy of the SQL Cookbook by Anthony Moliaro from Our Friends in Sebastopol, CA. This covered both DENSE_RANK and the OVER keyword.

Over, it turns out, is one of a new set of functions added to SQL 2003 that the Standard calls Windowing functions. This means they operate on a subset of the results returned from the outside query. The best example I saw showed if you are selecting all employees from all departments, and next to each one you wanted to show how many employees were in that department, you could use count(*) with an over clause that specified PARTITION by department. This example is in the Cookbook.

DENSE_RANK () tells you how many rows come before the current row in some ordering scheme. The over clause is used to define that ordering scheme. If two rows have the same rank, they have equivalent values according to DENSE_RANK. For instance, if we have five people with five salaries, their DENSE_RANK of salaies will be 1 through five, highest to lowest. If the top two salaries were identical, top rank would return 1,1,2,3,4.

OK, so it looks like the original query wants to use DENSE_RANK on a subset of the returned rows of the query. The window will be defined by this portion of the clause: PARTITION BY ST.COUNTER_ID

And the order within this window will be determined by: ORDER BY SM.SAMPLE_TIME DESC

OK time to get serious about playing:

I’ve learned a little but since the above frustrations trying to run the stored procedure. Here’s what I do now:

run sqlplus. For me this is

/home/adyoung/apps/oracle/instantclient_11_1/sqlplus adyoung/adyoung@10.17.195.159/DRORACLE

Now make it so anything I print to the screen is visible:

set serveroutput on

Load my sql file in and execute it.

get myfile.sql

/

OK, I have a development method. To print debugging output:
DBMS_OUTPUT.PUT_LINE(‘Text Here’);
And we can debug. In the time honored tradition of always moving from success, I comment everything out but a simple output statement. In this case, I know that the real code uses a variable for the rollup level, so my File (minus commented out section, looks like this:
create or replace procedure stattest
is
v_stat_rollup_level int;
begin
v_stat_rollup_level := 2;
dbms_output.put_line(‘Value: ‘ || TO_CHAR(v_stat_rollup_level));
end stattest;
Which executes using the above method. Now I start adding in parts of the SELECT.
SELECT st.counter_id AS COUNTER_ID FROM vpx_hist_stat1 st;
Of course, it hates this. When I load it into sqlplus I get:
Warning: Procedure created with compilation errors.
Of course, in order to see these compilation errors, I have to jump through a couple more hoops:
SQL> set arraysize 1
SQL> set MAXDATA 6000
SQL> show errors procedure stattest
LINE/COL ERROR
——– —————————————————————–
6/1 PLS-00428: an INTO clause is expected in this SELECT statement
So I add a variable called maxcounter in the declare section and modify my select:
SELECT
max(st.counter_id) into maxcounter
FROM
vpx_hist_stat1 st;
dbms_output.put_line(‘Maxcounter: ‘ || TO_CHAR(maxcounter));
And I have a procedure that executes and does something. On to the rest of the select. After several iterations of trial and error, I have the SQL query wokring as a cursor. I’ve now figured out even a better way to work. I’ve modified my file like below, and can now cat it into the sqlplus interpreter:
set arraysize 1
set MAXDATA 6000
set serveroutput on

create or replace procedure stattest
is
v_stat_rollup_level int := 2;
maxcounter int := 0;

CURSOR MYCURSOR is SELECT
st.counter_id as COUNTER_ID,
st.STAT_VAL as STAT_VAL
FROM vpx_sample_time1 sm,
vpx_stat_counter sc,
vpx_stat_def sd,
vpx_hist_stat1 st

WHERE
st.time_id = sm.time_id
AND st.counter_id=SC.COUNTER_ID
AND SC.STAT_ID=SD.ID
AND SD.ROLLUP_TYPE=3
AND SD.STAT_LEVEL<=v_stat_rollup_level;

begin

open MYCURSOR;

FOR recname in MYCURSOR
LOOP
dbms_output.put_line(‘Value: ‘ || TO_CHAR(v_stat_rollup_level));
END LOOP;

dbms_output.put_line(‘Maxcounter: ‘ || TO_CHAR(maxcounter));

end stattest;

/

show errors procedure stattest

.

OK, I am going to bail here.  This post has been half written for a while, and is not going to get completed anytime soon.  I think I’d vote “Over” as something that would be useful in the next rev of postgres, if only for simplifying porting from Oracle.

Dependency Licenses

The following script, when run on a RPM based system, tells you what licenses your application might trip by linking with other code.  You have to  set $MYEXE to som path to an executable.

for LIB in `ldd $MYEXE | sed -e ‘s!.*=>!!’ -e ‘s! (.*!!’` ; do rpmquery –whatprovides $LIB ; done | grep -v “any package” | sort -u | xargs rpmquery –qf %-30{NAME}:%{LICENSE}”\n

Here is a sample run for perl:

glibc                         :LGPL
perl                          :Artistic or GPL

Note that this is only for the linked files that are registered with RPM.

Dependency Injection in C++

One goal of object oriented programming is to encapsulate functionality within a class.  If one class requires another in order to collaborate on a larger process, the programmer must deal with the wiring up of the two classes.  Much has been written about the correct way to do this in various object oriented languages.  The term “Inversion of Control” and the related concept of “Dependency Injection” have become part of the common language of software development, at least in the Java world, due to projects like Pico/Nano container, The Spring framework, and earlier efforts inside J2EE and ATG Dynamo.  These frameworks make use of the introspection mechanisms in Java to create instances of classes based on demand criteria.


Updated: links and using the code formatter. Changed Class from Single Letter to full names to be clearer and to not conflict with HTML tags.

Continue reading

Edjamacation

Massachusetts has one of the best public school systems in the country. It has been top in the country before, and was rated fourth last I checked. California, which has one of the largest economies in the world, is ranked in the bottom quintile.  I was lucky that my parents moved from California back to Massachusetts before I was born.  My wife and I made the same decision shortly after our son was born. Education, and public education are important to us.

Continue reading

Playing Football after Graduation

West Point has started letting Grads that are recruited by Pro sports teams go be recruiters, and play pro ball.  This has caused much heated discussion, especially amongst my fellow alumni.  I initially wrote this as a response to an email discussion, but decided to sit on it for a while and ruminate.  This is really more of a collection of my thoughts at the time than a coherent essay.

It seems particularly stark in contrast to the classmates that are headed to Iraq. Would people feel as strongly about the matter if we were not sending people into harms way?

Also, is admissions the only reason that success in Army Sports, Football in particular, is important?

What about the rest of us that “did our time” and are now sitting out this conflict? Yeah, we played Army for our total of 8 years Active and Reserve. For many, it was a great experience that has lead to success later in life. Are we any less guilty of avoiding our Duty? How about the guy that “only” goes Signal Corps as opposed to going in to a combat Arms, or that goes Artillery to avoid Infantry? There always is a way that someone who goes less than the full HUAH can be said to be shirking.

Is it really doing our Country any good to be sending our Grads over to Iraq? I think most people would say that it is not cut and dried: some yes, some know, many I don’t knows. So why is it so important that these kids go to Iraq instead of playing Football? Is it really just a question of paying your dues?

Maybe the best way this kid can server his country is by being a kick-ass footballer, getting the name of West Point up in front of the country, and help to raise the awareness of civilians that we even have service academies. Maybe He’ll have a two year career, get cut, and end up back on Active Duty. Maybe he’ll be such a kick ass recruiter that he’ll fill the Army’s quota single handedly. Or maybe the Army wasted money in training him, and it was a mistake to send him to the NFL.

Is keeping a bunch of barely post adolescents isolated from the rest of civilization for four years the best way to prepare them for the officer corps? Does the Army get as much bang for it’s buck vie the Service Academies as it does Via ROTC? Sure West Point has produced it’s share of generals, but would those same people be great generals if they had gone ROTC? Would the opportunities in the Army be different if the Largest block of officers in the Army didn’t come from the same school? I have no idea if what we are doing makes sense or not. I know I gained a lot and gave up a lot by going to West Point. I’ll never know what I would have gained if I had gone another route.

Letting Cadets go professional  will allow the coaches to recruit players who, as Seniors in High School think they have a chance to play pro ball. Most College Football players want to go pro, but few are chosen. I suspect that a good portion of these players would make decent soldiers. So Army Football gets a better team, and good but the less-than-great ball players now have the chance of a career as an Army officer.

Many kids enter West Point as an option, and only develop the drive to be Army officers while being at West Point. I suspect that this is one of the most important roles that West Point plays in support of our Officer corps.

Interview Question for Distributed Computing

This is an updated version of my interview question,  made non-bproc like.

Using only the following API:

  • printf(…)
  • int get_node_count() // number of compute nodes attached to the head node.  0 means no other nodes
  • int get_current_node()// 0 for the head node, 1-n for the compute nodes.
  • int remote_fork(int node) // like fork, but returns an fd to the child/parent process
  • void send_long_sync(int fd, long value)//send and wait, blocks until receipt
  • long recv_long_sync(int fd)//block until value is available
  • long gettime()

Calculate the average clock skew on the cluster.  Return 0 on success, and -1 on any failures.

Assume that all nodes are up and running.  This is a c subset of c++.  Each of these functions throw an exception upon failure.  rfork has the same semantics as fork:  when it returns, there are two copies of the program running, just on separate machines.  The  next line of code to execute will be the line immediately after the fork on both machines.  However, the returned value is not a process ID, and the parent process  does not need to wait for the remote process to finish: the child process is automatically reaped by init.

Oracle to Postgresql part 1

Here are the steps I am going through to port some code from Oracle PL/SQL to PostgreSQL PLPGSQL.

Here is the first line in the Oracle version

create or replace procedure stats_rollup1_proc is

This becomes

create or replace function stats_rollup1_proc() returns int as $$

DECLARE

Now Postgres is not my fulltime focus, just a part of the overall job. a PG expert could probably do it better.

The things to note here:

  • procedure is not a keyword in plpgsql. Thus function and returns. I suspect I could return a void type, but haven’t looked that hard.
  • Postgres requires the text of a stored procedure to be qutoed. The $$ is a nice way to deal with the requirement.
  • DECLARE is optional in Oracle, but required in postgreSQL

At the end of the function:

end stats_rollup1_proc;

Becomes

return 0;

end /*stats_rollup1_proc;*/

$$ LANGUAGE plpgsql

I like leaving the comment in there to match the original begin, since the functions get long enough that it is hard to track. There is no harm in returning 0, even if we don’t really use it as a return code. The final $$ closes out the one from the start of the function. We have to specify the language used, as this same mechanism can be used for any of the languages embedded inside PostgreSQL. Yes, even Python.

Ok, for some mechanics.

In the Declare section of the oracle code we have:

cursor time_cur(current_time_in date) IS
select distinct sample_time
from VPX_SAMPLE_TIME1
WHERE ROLLUP_COUNTER IS NULL
AND SAMPLE_TIME < current_time_in-1/24
order by 1 asc;
v_time VPX_SAMPLE_TIME1.sample_time%type;

which is later used like this:

open time_cur(v_rollup_start_time);
loop
fetch time_cur into v_time;
exit when time_cur%notfound;

In PostgreSQL these can be inlined like this:

for v_time in
select distinct sample_time
from VPX_SAMPLE_TIME1
WHERE ROLLUP_COUNTER IS NULL
AND SAMPLE_TIME < current_time_in-1/24
order by 1 asc
loop

Although I have not yet figured out how to handle the notfound.