OVER keyword in Oracle

I’m looking at the SQL Scripts that create and modify the database for VPXD. Specifically, I am looking at the oracle version stats[123] scripts. They included a query that uses the “OVER” keyword and I have not found a decent description of what this keyword does. This is the query:

SELECT
st.counter_id AS COUNTER_ID
,v_time_rowid AS TIME_ID
,ST.STAT_VAL AS STAT_VAL
,DENSE_RANK()

OVER (PARTITION BY ST.COUNTER_ID ORDER BY SM.SAMPLE_TIME DESC) AS DENSE_RANK
FROM vpx_sample_time1 sm,vpx_stat_counter sc, vpx_stat_def sd, vpx_hist_stat1 st
WHERE sm.rollup_counter=to_number(to_char(v_start_time_id)||to_char(v_cnt))
AND st.time_id = sm.time_id
AND st.counter_id=SC.COUNTER_ID
AND SC.STAT_ID=SD.ID
AND ((SC.DEVICE_ID =v_null_device_id and v_stat_rollup_level<3) or (v_stat_rollup_level>=3))
AND SD.STAT_LEVEL<=v_stat_rollup_level
AND SD.ROLLUP_TYPE=3)
WHERE DENSE_RANK=1

;

The one description of the Over command I have found has come from the Microsoft website:

“Determines the partitioning and ordering of the rowset before the associated window function is applied”

OK, so it is a union of two types of functionality: partitioning and ordering. The queries I am looking at use both. Let’s start with partitioning:

PARTITION BYDivides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

The window function in this place must somehow refer to DENSE_RANK. To me this looks like it is possibly trying to insert a subselect into the overall select. Here is how MSDN defines DENSE_RANK. I’d like to reiterate that these are Oracle functions I’m trying to understand, and Microsoft is doing all the work explaining them. When I tried executing this SQL Query in SQLPLUS via the Oracle 11 instant client, it balked:
SQL> 2 3 4 5 6 SQL> SP2-0734: unknown command beginning “OVER (PART…” – rest of line ignored.
SQL> SP2-0734: unknown command beginning “FROM vpx_…” – rest of line ignored. It parses fine when I do it as part of a plsql stored procedure.

Time to dig out the slightly larger guns: I picked up a copy of the SQL Cookbook by Anthony Moliaro from Our Friends in Sebastopol, CA. This covered both DENSE_RANK and the OVER keyword.

Over, it turns out, is one of a new set of functions added to SQL 2003 that the Standard calls Windowing functions. This means they operate on a subset of the results returned from the outside query. The best example I saw showed if you are selecting all employees from all departments, and next to each one you wanted to show how many employees were in that department, you could use count(*) with an over clause that specified PARTITION by department. This example is in the Cookbook.

DENSE_RANK () tells you how many rows come before the current row in some ordering scheme. The over clause is used to define that ordering scheme. If two rows have the same rank, they have equivalent values according to DENSE_RANK. For instance, if we have five people with five salaries, their DENSE_RANK of salaies will be 1 through five, highest to lowest. If the top two salaries were identical, top rank would return 1,1,2,3,4.

OK, so it looks like the original query wants to use DENSE_RANK on a subset of the returned rows of the query. The window will be defined by this portion of the clause: PARTITION BY ST.COUNTER_ID

And the order within this window will be determined by: ORDER BY SM.SAMPLE_TIME DESC

OK time to get serious about playing:

I’ve learned a little but since the above frustrations trying to run the stored procedure. Here’s what I do now:

run sqlplus. For me this is

/home/adyoung/apps/oracle/instantclient_11_1/sqlplus adyoung/adyoung@10.17.195.159/DRORACLE

Now make it so anything I print to the screen is visible:

set serveroutput on

Load my sql file in and execute it.

get myfile.sql

/

OK, I have a development method. To print debugging output:
DBMS_OUTPUT.PUT_LINE(‘Text Here’);
And we can debug. In the time honored tradition of always moving from success, I comment everything out but a simple output statement. In this case, I know that the real code uses a variable for the rollup level, so my File (minus commented out section, looks like this:
create or replace procedure stattest
is
v_stat_rollup_level int;
begin
v_stat_rollup_level := 2;
dbms_output.put_line(‘Value: ‘ || TO_CHAR(v_stat_rollup_level));
end stattest;
Which executes using the above method. Now I start adding in parts of the SELECT.
SELECT st.counter_id AS COUNTER_ID FROM vpx_hist_stat1 st;
Of course, it hates this. When I load it into sqlplus I get:
Warning: Procedure created with compilation errors.
Of course, in order to see these compilation errors, I have to jump through a couple more hoops:
SQL> set arraysize 1
SQL> set MAXDATA 6000
SQL> show errors procedure stattest
LINE/COL ERROR
——– —————————————————————–
6/1 PLS-00428: an INTO clause is expected in this SELECT statement
So I add a variable called maxcounter in the declare section and modify my select:
SELECT
max(st.counter_id) into maxcounter
FROM
vpx_hist_stat1 st;
dbms_output.put_line(‘Maxcounter: ‘ || TO_CHAR(maxcounter));
And I have a procedure that executes and does something. On to the rest of the select. After several iterations of trial and error, I have the SQL query wokring as a cursor. I’ve now figured out even a better way to work. I’ve modified my file like below, and can now cat it into the sqlplus interpreter:
set arraysize 1
set MAXDATA 6000
set serveroutput on

create or replace procedure stattest
is
v_stat_rollup_level int := 2;
maxcounter int := 0;

CURSOR MYCURSOR is SELECT
st.counter_id as COUNTER_ID,
st.STAT_VAL as STAT_VAL
FROM vpx_sample_time1 sm,
vpx_stat_counter sc,
vpx_stat_def sd,
vpx_hist_stat1 st

WHERE
st.time_id = sm.time_id
AND st.counter_id=SC.COUNTER_ID
AND SC.STAT_ID=SD.ID
AND SD.ROLLUP_TYPE=3
AND SD.STAT_LEVEL<=v_stat_rollup_level;

begin

open MYCURSOR;

FOR recname in MYCURSOR
LOOP
dbms_output.put_line(‘Value: ‘ || TO_CHAR(v_stat_rollup_level));
END LOOP;

dbms_output.put_line(‘Maxcounter: ‘ || TO_CHAR(maxcounter));

end stattest;

/

show errors procedure stattest

.

OK, I am going to bail here.  This post has been half written for a while, and is not going to get completed anytime soon.  I think I’d vote “Over” as something that would be useful in the next rev of postgres, if only for simplifying porting from Oracle.

Oracle to Postgresql part 1

Here are the steps I am going through to port some code from Oracle PL/SQL to PostgreSQL PLPGSQL.

Here is the first line in the Oracle version

create or replace procedure stats_rollup1_proc is

This becomes

create or replace function stats_rollup1_proc() returns int as $$

DECLARE

Now Postgres is not my fulltime focus, just a part of the overall job. a PG expert could probably do it better.

The things to note here:

  • procedure is not a keyword in plpgsql. Thus function and returns. I suspect I could return a void type, but haven’t looked that hard.
  • Postgres requires the text of a stored procedure to be qutoed. The $$ is a nice way to deal with the requirement.
  • DECLARE is optional in Oracle, but required in postgreSQL

At the end of the function:

end stats_rollup1_proc;

Becomes

return 0;

end /*stats_rollup1_proc;*/

$$ LANGUAGE plpgsql

I like leaving the comment in there to match the original begin, since the functions get long enough that it is hard to track. There is no harm in returning 0, even if we don’t really use it as a return code. The final $$ closes out the one from the start of the function. We have to specify the language used, as this same mechanism can be used for any of the languages embedded inside PostgreSQL. Yes, even Python.

Ok, for some mechanics.

In the Declare section of the oracle code we have:

cursor time_cur(current_time_in date) IS
select distinct sample_time
from VPX_SAMPLE_TIME1
WHERE ROLLUP_COUNTER IS NULL
AND SAMPLE_TIME < current_time_in-1/24
order by 1 asc;
v_time VPX_SAMPLE_TIME1.sample_time%type;

which is later used like this:

open time_cur(v_rollup_start_time);
loop
fetch time_cur into v_time;
exit when time_cur%notfound;

In PostgreSQL these can be inlined like this:

for v_time in
select distinct sample_time
from VPX_SAMPLE_TIME1
WHERE ROLLUP_COUNTER IS NULL
AND SAMPLE_TIME < current_time_in-1/24
order by 1 asc
loop

Although I have not yet figured out how to handle the notfound.

Oracle ODBC Setup on Ubuntu (32 on 64)

First of all, I am running a 64 bit Ubuntu 7.04 but I need a 32 Bit Oracle for the application I am using. I have a 32 bit chroot setup. The setup for this is beyond the scope of this article. This version of Ubuntu ships with unixodbc version 2.2.11-13 . There is a symbol, SQLGetPrivateProfileStringW, defined in later versions that the Oracle 11g driver requires. This symbol is not defined in unixodbc version 2.2.11-13. Thus, you have to use the 10.2 Oracle Drivers.

I Downloaded 3 files from The Oracle tech download page for 32bit Linux: the instantclient, SQL Plus, and ODBC packages. I unzipped these in my directory ~/apps/oracle32 Which now looks like this:

adyoung@adyoung-laptop$ pwd
/home/adyoung/apps/oracle32/instantclient_10_2
adyoung@adyoung-laptop$ ls
classes12.jar libocci.so.10.1 libsqora.so.10.1 ojdbc14.jar
genezi libociei.so ODBC_IC_Readme_Linux.html sqlplus
glogin.sql libocijdbc10.so ODBCRelnotesJA.htm
libclntsh.so.10.1 libsqlplusic.so ODBCRelnotesUS.htm
libnnz10.so libsqlplus.so odbc_update_ini.sh

I created an entry in /chroot/etc/odbcinst.ini:

[Oracle 10g ODBC driver]
Description = Oracle ODBC driver for Oracle 10g
Driver = /home/adyoung/apps/oracle32/instantclient_10_2/libsqora.so.10.1
Setup =
FileUsage =
CPTimeout =
CPReuse =

And another in /chroot/etc/odbc.ini

[EMO]
Application Attributes = T
Attributes = W
BatchAutocommitMode = IfAllSuccessful
BindAsFLOAT = F
CloseCursor = F
DisableDPM = F
DisableMTS = T
Driver = Oracle 10g ODBC driver
DSN = EMO
EXECSchemaOpt =
EXECSyntax = T
Failover = T
FailoverDelay = 10
FailoverRetryCount = 10
FetchBufferSize = 64000
ForceWCHAR = F
Lobs = T
Longs = T
MetadataIdDefault = F
QueryTimeout = T
ResultSets = T
ServerName = 10.10.15.15/DRORACLE
SQLGetData extensions = F
Translation DLL =
Translation Option = 0
DisableRULEHint = T
UserID = adyoung
StatementCache=F
CacheBufferSize=20

Once again, DSN and IP Address have been changed to protect the guilty. To test the datasource, run:

sudo dchroot -d LD_LIBRARY_PATH=/home/adyoung/apps/oracle32/instantclient_10_2 DataManagerII

To just test sqlplus connectivity, from inside the chroot, run:

./sqlplus adyoung/adyoung@10.10.15.15/DRORACLE

Note that using the instant client, no TNSNAMES.ORA file is required.

ODBC Setup on Ubuntu/Debian

For the base config tools:

sudo apt-get install unixodbc-bin unixodbc odbcinst1debian1

For Postgres

sudo apt-get install odbc-postgresql

Use the template file provided to setup the odbc driver entries:

sudo odbcinst -i -d -f /usr/share/psqlodbc/odbcinst.ini.template

And this sets up the sample DSNs. (all one line)

 sudo odbcinst -i -s -l  -n adyoung-pg -f /usr/share/doc/odbc-postgresql/examples/odbc.ini.template

Then modify /etc/odbc.ini to suit your DB.

For MSSQL Server and Sybase:

 sudo apt-get install tdsodbc

sudo odbcinst -i -d -f /usr/share/doc/freetds-dev/examples/odbcinst.ini

Unfortunately, this does not have a sample ODBC setup template.

The structure of a postgres database

Since my work at Penguin was primarily system work, I’ve not had to work with databases much for quite a while. Which is really a pity, since I love the elegance and simplicity of the interface, and the strength of the pre-and-post condition constraints. Anyone who has managed to push my Database hot button knows that my Database of choice is PostgreSQL. It is opensource to the core. The developers treated transactions processing as baseline, no a feature to be added later. It supports nested SQL statements (the S ins SQL is for “Structured”). I am happy to be plugging away at a Postgres Database again.

Part of my current project entails porting the backing store for an application from Oracle to Postgres. There is a nuke script that attempts to remove all objects from a database that have a certain prefix. I will change here to protect the guilty. Let’s pretend that my database is to hold overly emotional music references, and thus all of the tables, views, triggers, and so on are prefixed with ‘EMO’. The oracle script does this:

declare
sqlstmt varchar(200);
varowner varchar(50);
TYPE cur_type IS REF CURSOR;

objectcursor cur_type;

Begin
select USER into varowner from global_name;

open objectcursor for
select ‘drop ‘ || object_type || ‘ ‘ || owner || ‘.’ || object_name
|| decode(object_type,’TABLE’, ‘ cascade constraints’)
from all_objects
where owner = varowner
and (object_name like ‘EMO%’ or object_name like ‘%STAT%PROC’)
and object_type in (‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘SYNONYM’,
‘FUNCTION’,’PROCEDURE’, ‘PACKAGE’);

Loop
fetch objectcursor into sqlstmt;
exit when objectcursor%NOTFOUND;
execute immediate sqlstmt;
End Loop;
Close objectcursor;
End;

But tring to port this directly to postgres gets an error right at the beginning:

ERROR: syntax error at or near “varchar” LINE 13: sqlstmt varchar(200);

So I’ll bring this whole thing into a function instead. At then end of my script, I will just drop the function. This has the advantage that I can check the format of my code without having to execute it each time.

When working in an an unfamiliar language (an I’ve never really done PG\PLSQL before) I apply a simple approach. First, I get a “Hello, World” type program working. In this case, I used an article from ONLamp.com and got that to run. Next I strip away everything that is context specific until I get a piece of code that is minimal but syntactically correct. It looks like this:

CREATE OR REPLACE FUNCTION dbemo_nuke_postgres ()
RETURNS integer AS $$

declare

BEGIN
return 1;

End;

$$ LANGUAGE plpgsql;

Note that I modified the prefix so that it won’t Hit the match for EMO% in the like statement I am going to need later. Not sure what would happen if I dropped my function as I ran it. Most likely it would work fine, but why chance it?

Now I start adding in parts of the original function piece by piece. First step is to put in the declares clause. The postgres porting guide says I don’t need a cursor. I quickly learn that this goes between the function declaration and the BEGIN statement:

CREATE OR REPLACE FUNCTION dbemo_nuke_postgres ()
RETURNS integer AS $$

declare
sqlstmt varchar(200);
varowner varchar(50);
objectcursor RECORD;

BEGIN
return 1;

End;

$$ LANGUAGE plpgsql;

This is saved in the file EMO_nuke_postgres_part.sql. I run this from the command prompt:

adyoung@adyoung-laptop$ psql -f EMO_nuke_postgres_part.sql
CREATE FUNCTION

And we’re off. Now to the Oracle specifics.

select USER into varowner from global_name;

This command seems to be asking for info regarding the current user.

select ‘drop ‘ || object_type || ‘ ‘ || owner || ‘.’ || object_name
|| decode(object_type,’TABLE’, ‘ cascade constraints’)
from all_objects
where owner = varowner
and (object_name like ‘VPX%’ or object_name like ‘%STAT%PROC’)
and object_type in (‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘SYNONYM’,
‘FUNCTION’,’PROCEDURE’, ‘PACKAGE’)

This code is dynamically creating a drop statement for various objects in the schema. Where is this information stored in postgresql? Let’s take a look:

adyoung@adyoung-laptop$ psql
Welcome to psql 8.2.6, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
adyoung=# SELECT * from pg_
pg_aggregate pg_group pg_settings pg_stats
pg_am pg_index pg_shadow pg_stat_sys_indexes
pg_amop pg_indexes pg_shdepend pg_stat_sys_tables
pg_amproc pg_inherits pg_shdescription pg_stat_user_indexes
pg_attrdef pg_language pg_stat_activity pg_stat_user_tables
pg_attribute pg_largeobject pg_stat_all_indexes pg_tables
pg_authid pg_listener pg_stat_all_tables pg_tablespace
pg_auth_members pg_locks pg_stat_database pg_temp_1.
pg_autovacuum pg_namespace pg_statio_all_indexes pg_timezone_abbrevs
pg_cast pg_opclass pg_statio_all_sequences pg_timezone_names
pg_catalog. pg_operator pg_statio_all_tables pg_toast.
pg_class pg_pltemplate pg_statio_sys_indexes pg_trigger
pg_constraint pg_prepared_statements pg_statio_sys_sequences pg_type
pg_conversion pg_prepared_xacts pg_statio_sys_tables pg_user
pg_cursors pg_proc pg_statio_user_indexes pg_views
pg_database pg_rewrite pg_statio_user_sequences
pg_depend pg_roles pg_statio_user_tables
pg_description pg_rules pg_statistic
adyoung=# SELECT * from pg_

I know that metadata is stored in the pg_.* family of objects, so I use statement completion to show me what is there. Nothing is obvious to me yet, so I poke around at pg_tables.

adyoung=# \d pg_tables;
View “pg_catalog.pg_tables”
Column | Type | Modifiers
————-+———+———–
schemaname | name |
tablename | name |
tableowner | name |
tablespace | name |
hasindexes | boolean |
hasrules | boolean |
hastriggers | boolean |
View definition:
SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, t.spcname AS “tablespace”, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.reltriggers > 0 AS hastriggers
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE c.relkind = ‘r’::”char”;

Ah Cha! It is a view into a relation called pg_class that seems to hold all of the metadata. Let’s take a look at that:

adyoung=# \d pg_class
Table “pg_catalog.pg_class”
Column | Type | Modifiers
—————-+———–+———–
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
reltoastrelid | oid | not null
reltoastidxid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relkind | “char” | not null
relnatts | smallint | not null
relchecks | smallint | not null
reltriggers | smallint | not null
relukeys | smallint | not null
relfkeys | smallint | not null
relrefs | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasrules | boolean | not null
relhassubclass | boolean | not null
relfrozenxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
“pg_class_oid_index” UNIQUE, btree (oid)
“pg_class_relname_nsp_index” UNIQUE, btree (relname, relnamespace)

Lots of good stuff here. OK, we now have enough raw information to generate some real knowledge. In Postgres, everything is a class. To find out if something is a table you select only those fields with the right type:

adyoung=# SELECT distinct (relkind) from pg_class ;
relkind
———
S
i
r
t
v

Funny that it isn’t type ‘t’ for table, but that seems to be of type toast:

adyoung=# SELECT relname from pg_class where relkind =’t’ limit (3);
relname
—————-
pg_toast_10757
pg_toast_10762
pg_toast_10767
(3 rows)
OK, our original SQL seems to be creating a statment to then execute to drop for each class. WHat collumns it is selecting. Here’s our where clause.:

where owner = varowner
and (object_name like ‘EMO%’ or object_name like ‘%STAT%PROC’)
and object_type in (‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘SYNONYM’,
‘FUNCTION’,’PROCEDURE’, ‘PACKAGE’).

Let’s start with the like clause.

adyoung=# SELECT count(relname) from pg_class where relname like ‘vpx%’ or relname like ‘%STAT%PROC’;
count
——-
228

Looks good. But S, i, r, t v doesn’t seem to cover the array we want. I don’t see any functions. Let me chgeck the pg_proc:

adyoung=# \d pg_proc
Table “pg_catalog.pg_proc”
Column | Type | Modifiers
—————-+———–+———–
proname | name | not null
pronamespace | oid | not null
proowner | oid | not null
prolang | oid | not null
proisagg | boolean | not null
prosecdef | boolean | not null
proisstrict | boolean | not null
proretset | boolean | not null
provolatile | “char” | not null
pronargs | smallint | not null
prorettype | oid | not null
proargtypes | oidvector | not null
proallargtypes | oid[] |
proargmodes | “char”[] |
proargnames | text[] |
prosrc | text |
probin | bytea |
proacl | aclitem[] |
Indexes:
“pg_proc_oid_index” UNIQUE, btree (oid)
“pg_proc_proname_args_nsp_index” UNIQUE, btree (proname, proargtypes, pronamespace)

Not a view. OK:

adyoung=# SELECT proname from pg_proc where proname like ‘vcdb_nuke_postgres’ ;
proname
——————–
vcdb_nuke_postgres
vcdb_nuke_postgres
(2 rows)
We know that my stuff is in there.

More on this to come…