Database03 Feb 2010 02:17 pm

One undervalued aspect of data modeling is that you actually get time to consider the form of the data before you get the data. In a Map-Reduce job, you know that your map phase is going to get the data, and that it is not going to be normalized. I could have said not likely to be normalized, but the reality is that if you are using Map-Reduce, you are not going to get structured data.

The Map step is where you deal with this. You take the data in its CLOB form and you turn it into a series of key-value pairs. Strictly speaking, this isn’t a map, it is a relation. In a map, every element of the domain has a single element in the codomain, or range as I learned it. In Hadoop and Map-Reduce, there is no restriction that a given key always return a unique value, although I suspect that in practice it probably should. Actually, since all of the values for a given key are collected into a list, technically you do get a map, just not at the end of this stage, and nowhere in the system do you ever see all the elements of that map, just a sublist.
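As a rough illustration of that point, here is a minimal plain-Java sketch (not Hadoop code) of a map step that emits key-value pairs from raw text, followed by the grouping that collects all values for a key into a list. The class and method names, and the choice of "word to line number" as the pairing, are assumptions made up for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapStepSketch {

    // The "map" step: one unstructured line in, zero or more (key, value) pairs out.
    // The same key may be emitted many times, so the output is a relation, not a map.
    static List<Map.Entry<String, Integer>> map(int lineNumber, String rawLine) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : rawLine.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, lineNumber));
            }
        }
        return pairs;
    }

    // The grouping between map and reduce: every value emitted for a key
    // is collected into one list, which does give a true map (key -> list).
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        emitted.addAll(map(1, "error disk full"));
        emitted.addAll(map(2, "error network down"));
        System.out.println(group(emitted));
    }
}
```

In a real cluster the grouped lists are spread across reducers, which is why no single place ever holds the whole map.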

Regardless of the mathematical correctness of the term “map”, a Map-Reduce program has a step which is responsible for creating a structured representation from an unstructured one. This is very similar to what a developer does when they have to take some data format and decide how to store it in a DBMS. The assumption is that the DBMS is necessary for processing the events afterwards.

Thus a Map-Reduce operation defers the cost of normalizing the data, but then potentially pays it multiple times, once for every job that reads the raw data. When using an RDBMS, you pay the price for normalizing the data upon data entry, and that price is then amortized over all the queries of the data. Thus the choice between Map-Reduce and SQL can be viewed as an economic decision.

Database and Networking and Sysadmin28 Jan 2010 08:30 am

Say you want to set up postgres for use with a web application. If the database is running on the same server as the application, here’s what you need to do:

If the technology you are using is smart enough to use the domain socket for local connections, edit

/var/lib/pgsql/data/pg_hba.conf

and apply the following diff:

# "local" is for Unix domain socket connections only
+local myapp-db myapp-user password
local all all ident sameuser

If, on the other hand, you need TCP connections (which is the case for JDBC), you probably want this:

# IPv4 local connections:
-host    all         all         127.0.0.1/32          ident sameuser
+host    all         all         127.0.0.1/32          md5

Although, again, you should probably change “all all” to be specific to your application.

You need to restart the postgres database server to have these changes take effect (a reload is enough for pg_hba.conf changes).

To test this, first create the application user by running the following command:

sudo -u postgres /usr/bin/createuser --pwprompt myapp-user

This will create the user you want, and prompt you for a password. To create the database itself:

sudo -u postgres /usr/bin/createdb -O myapp-user myapp-db "My App backend data storage"

To test local connections (domain socket), run

psql myapp-db -U myapp-user

You should be prompted for the password.

To test TCP connectivity, run

psql -h localhost myapp-db -U myapp-user

And again, you should be prompted for your password. Some alternative tests to try, to make sure you “get it”:

  • Create an alternative database owned by the same user as your application user. Make sure that Postgres rejects that account from psql when connecting using a domain socket.
  • Attempt to connect to the alternative database over TCP (with -h localhost), the way a remote user would. You should be allowed in, because the host rule still says “all all”.
  • Try this from a remote machine. You should be rejected across the board.

An entry in the pg_hba.conf file allowing a specific remote machine to connect should look like this:

host    myapp-db         myapp-user         192.168.1.1/32          md5

Tested only on RHEL 5 and Fedora 11, but this should work for any Linux-based PostgreSQL setup. I suspect it works on Windows as well, but I have not tested it, and the path to the config file will be very different there.
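For the JDBC case mentioned above, a quick connectivity check from Java might look like the following sketch. It assumes the PostgreSQL JDBC driver is on the classpath, that the server listens on the default port 5432, and that the password is supplied in an environment variable named MYAPP_DB_PASSWORD; that variable name and the class name are made up for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PgConnectCheck {

    // Builds a PostgreSQL JDBC URL. This form of URL goes over TCP,
    // so it exercises the "host" lines in pg_hba.conf, not the "local" ones.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Returns true if a connection could be opened with the given credentials.
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5); // wait at most 5 seconds for the validity check
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 5432, "myapp-db");
        // Hypothetical environment variable holding the password set via createuser.
        String password = System.getenv("MYAPP_DB_PASSWORD");
        System.out.println(url + " reachable: " + canConnect(url, "myapp-user", password));
    }
}
```

If the md5 rule is in place, a wrong password should make this print false, and the correct one should make it print true.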

Software12 Jan 2010 09:29 am

What follows are the results of a brainstorming session on items that should be in a code review checklist.  As you can see, it needs refining and grouping.  Please feel free to add comments with any items you think should be on it, with any organizational approaches, or any criticism.  Right now, I want to focus on being inclusive instead of exclusive, so please don’t recommend removing things:  that will happen later.

  • Does this code swallow any exceptions? (Bad)
  • Does this code create any new unnecessary dependencies? (Bad)
  • Does this code introduce any performance bottlenecks? (Bad)
  • Is this code threadsafe? (Good)
  • If this code is used only in a single thread, does it use any synchronization? (Bad)
  • Are data transfer objects serializable? (Good)
  • Does this code have a unit test? (Good)
  • Does this code have a functional test? (Good)
  • Was the unit test run at 100%? (Good)
  • Was the functional test run, and were there any additional test failures? (Bad)
  • Does this code change a public API? (Bad) If so, is the change backwards compatible? (Good)
  • Does this code have any functions that are more than 30 lines? (Bad)
  • Does this code have any magic numbers or string literals? (Bad)
  • Does this code use appropriate internationalization mechanisms for all text visible to the end user? (Good)
  • Does this checkin remove Lava code?
  • Is this code platform specific? Should it be?
  • Is this code intended to run on the server, the client, or the managed platform?
  • Does this code reproduce functionality that is done elsewhere in the code base?
  • Does this code use an external library with an incompatible license?
  • If this code uses a scripting language, does it hide errors in type safety?
  • Is this code reusable?
  • Does this code go against the coding style of the rest of the project?
  • Has this code been reviewed?
  • Does this code implement the design specified?
  • Has any user interface been reviewed by UX?
  • Has any Database interaction been reviewed by a DBA?
  • Does this code introduce any network roundtrips?
  • Is the message size for any network communication larger than an ethernet frame?
  • Can this code gracefully handle a network failure?
  • Does this code use any deliberate casting?
  • Does this code use any APIs that will not be available to it at runtime?
  • Is this code understandable by someone that is not on the project?
  • Do all files have appropriate Copyright and license headers?
  • Do all public APIs have appropriate Javadoc/Doxygen/perldoc information?
  • Does this code introduce any unnecessary complexity?
  • Does this code work based on undocumented assumptions?
  • Have you made any changes in the code since the last time you ran through the unit and functional tests?
  • Have you stepped through the code in a debugger?
  • Are all reads and writes performed completely, or is there the possibility of missing information?
  • Are any created classes usable when just their constructors have been called, or do they require additional property sets afterwards?
  • Are all resources released when they are no longer needed?
  • Are fields that cannot be changed tagged final/const?
  • If this class is going to be called via the bean API, are the appropriate fields exposed via getters and setters?
  • Do property setters handle null? Is null a valid option for them?
  • Must any code in this checkin live inside a transaction boundary?
  • Does any of this code directly manipulate a resource that is supposed to be encapsulated inside some other class or abstraction?
  • Does this code handle all possible exceptions that can be triggered by the code it calls into?
  • Does the code follow the project’s coding standard?
  • Have all files been run through a code formatter set to project specifications?

Contributions from Victor Erminpour

  • Have we run static analysis tools on this code?
  • Does this code introduce any unintended side effects?
  • Does this code exist somewhere else?
  • Is this code generic and maintainable (i.e., if someone changes a class member, will my function still work)?
  • What’s the performance impact of the code?
  • Can it be optimized?
  • Is the code secure?
  • Does it introduce any buffer/heap exploits?
Family and History02 Dec 2009 06:12 pm

The members of the team had rolled out the Resilite mats in the back gym. The air was barely heated, so they had been hard to the touch as the boys rolled them in three straight sheets. The kinetic energy of a pair of teenage boys transferred to the friction of the shoes applied a shearing force that would separate untaped mats. That was acceptable during a normal practice, when the mats would be shared by a half dozen pairs at once. During a real match they would be taped together, to prevent them from separating during the bouts. The tape was an expense that the cash-strapped athletic department wouldn’t waste on a practice. But there was no risk of separation during the opening half of this practice. The mats were rimmed with spectators, the members of the team focused on the two participants in the center. During a normal practice, the mats might be rolled out with either side up. The lesser used side had five circles, laid out like the dots on a die showing 5.



Family and History28 Nov 2009 06:47 pm

Edith Ambrose Nursery School 1975.

JBoss and Java24 Nov 2009 07:28 am

These are my notes on how to reverse engineer what tags are doing in a JSF application. In this case, I am trying to figure out which classes are behind the Configuration tags in RHQ, and specifically what is being done by the tag

onc:config

This tag is activated with the following value at the top of the page:

xmlns:onc="http://jboss.org/on/component"

To figure out what this tag means, I look in WEB-INF/web.xml.  The web.xml value

facelets.LIBRARIES

lets me know where to find the file that defines the acceptable tags I can add to an xhtml page for this component:

/WEB-INF/tags/on.component.taglib.xml

This taglib defines the values

tag-name config
component-type org.jboss.on.Config
renderer-type org.jboss.on.Config

Note that these are JSF component names, and not Java class names.  To resolve these to Java classes, we need to find the mappings.  The mapping files are defined in web.xml under the entry:

javax.faces.CONFIG_FILES

In particular, I found what I wanted in

/WEB-INF/jsf-components/configuration-components.xml

The values I care about are:

component-type org.jboss.on.Config
component-class org.rhq.core.gui.configuration.ConfigUIComponent

and the renderer for Config and ConfigurationSet components

component-family rhq
renderer-type org.jboss.on.Config
renderer-class org.rhq.core.gui.configuration.ConfigRenderer

This renderer extends javax.faces.render.Renderer.  It is a Flyweight that parses and generates HTML.  It has two main responsibilities: decode and encode (the latter is split across encodeBegin, encodeChildren and encodeEnd on the base class). decode parses the request that comes in; encode injects HTML into the response that goes out.

Decode appears to be called only on a POST.  Encode seems to be called even on a GET.
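The lookup chain traced above can be summarized as two table lookups: tag name to JSF type name (from the taglib), then type name to Java class (from the JSF config file). Here is a small plain-Java sketch of that chain. It only models the mappings with in-memory maps holding the values quoted above; it does not parse the XML files, and the class and method names are made up for this example.

```java
import java.util.Map;

public class TagResolutionSketch {

    // tag-name -> component-type, as listed in on.component.taglib.xml
    static final Map<String, String> TAG_TO_COMPONENT_TYPE =
            Map.of("config", "org.jboss.on.Config");

    // tag-name -> renderer-type, also from the taglib
    static final Map<String, String> TAG_TO_RENDERER_TYPE =
            Map.of("config", "org.jboss.on.Config");

    // component-type -> component-class, as listed in configuration-components.xml
    static final Map<String, String> COMPONENT_TYPE_TO_CLASS =
            Map.of("org.jboss.on.Config", "org.rhq.core.gui.configuration.ConfigUIComponent");

    // renderer-type -> renderer-class (real JSF keys this on the component family as well)
    static final Map<String, String> RENDERER_TYPE_TO_CLASS =
            Map.of("org.jboss.on.Config", "org.rhq.core.gui.configuration.ConfigRenderer");

    // Resolves a tag name to the Java class of its component, or null if unknown.
    static String componentClassFor(String tagName) {
        String type = TAG_TO_COMPONENT_TYPE.get(tagName);
        return type == null ? null : COMPONENT_TYPE_TO_CLASS.get(type);
    }

    // Resolves a tag name to the Java class of its renderer, or null if unknown.
    static String rendererClassFor(String tagName) {
        String type = TAG_TO_RENDERER_TYPE.get(tagName);
        return type == null ? null : RENDERER_TYPE_TO_CLASS.get(type);
    }

    public static void main(String[] args) {
        System.out.println("onc:config component: " + componentClassFor("config"));
        System.out.println("onc:config renderer:  " + rendererClassFor("config"));
    }
}
```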

Mnemonics and Music21 Nov 2009 07:15 pm

To learn the alphabet backwards, you need to have a mnemonic.  The mnemonic for learning it forward is Twinkle Twinkle Little Star.  We can use that same song to sing it backwards.  Here is the grouping:

ZYX

WV

UTS

RQP

ONM

LKJ

IHGF

EDCBA

Now I’ve said my Z to A

Think I’ll go outside and play

Here’s how the phrases of the original map to the backwards version:

ABCD->ZYX

EFG->WV

HIJK->UTS

LMNOP->RQP

QRS->ONM

TUV->LKJ

WX->IHGF

YZ->EDCBA

Don’t be surprised that some of the phrases are longer or shorter than what you sing in the original. They don’t even have to match the number of syllables.  In many cases, you either drop the last note of the phrase, or double up on notes.  For instance, the note where you sing ‘D’ does not have an analogue in the backwards one, as it is really a grace note between the BC phrase and the EFG phrase.

Lyrics and The Princess Bride19 Nov 2009 06:38 pm

Tyrone you are an artist of the ultimate degree
Your work I say if the truth be told
Is awesome, terrible a marvel to behold
Unfortunately I haven’t even one half hour free

So much to do but yet so little time
trying to plan all of the festivities
the country of Florin’s anniversary
Ensuring the wedding will just be divine
I can’t do everything I want
I’m Swamped

In the parade all the nobles will march
and if Duke follows baron the unrest will spread
And all these decisions will fall on my head
Which villain to hang from the victory arch
Everybody has something they want
I’m Swamped

Then there’s the seating inside of the church
Ensuring we have the right food for the feast
Who will get Salmon or Chicken or Beef
None of the Nobles can be left in the lurch

Seating my Uncle Near my Aunt
I’m swamped

Rugen:
It certainly is hard waiting to be king
If I might be so bold as to suggest
set aside some time to get sufficient rest
If you’ve not your health you don’t have anything

You look just a little gaunt
You’re swamped

C++ and Java17 Nov 2009 07:42 am

As I move more and more towards Immutable objects, I find myself extracting the building logic out from the usage of an object on a regular basis.  The basic format of this pattern is

mutable object =>  immutable object + builder

There are two aspects to using an object: creating it, and reading its state. This pattern splits those two aspects up into different objects, providing a thread-safe and scalable approach to sharing objects without resorting to locking. Modifications of the object are reflected by creating a new object, and then swapping out the immutable one.
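One way to picture the "create a new object and swap it in" idea is the following minimal Java sketch. It uses java.util.concurrent.atomic.AtomicReference for the swap, which is my choice for the illustration and not something the pattern requires; the Settings class and its fields are made up for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SwapSketch {

    // Immutable value: all fields are final and set exactly once in the constructor.
    static final class Settings {
        public final String host;
        public final int port;

        Settings(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Readers always see one complete Settings instance, never a half-updated one.
    static final AtomicReference<Settings> current =
            new AtomicReference<>(new Settings("localhost", 8080));

    // A "modification" builds a fresh object and swaps the shared reference.
    static void changePort(int newPort) {
        current.updateAndGet(old -> new Settings(old.host, newPort));
    }

    public static void main(String[] args) {
        Settings before = current.get();
        changePort(9090);
        // The old instance is untouched; only the shared reference moved.
        System.out.println(before.port + " -> " + current.get().port);
    }
}
```

Any thread still holding the old instance keeps a consistent view of it, which is what removes the need for locks on the read path.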

There are two ways to go about making the builder. If the object is created as part of user interface or networking code, it makes sense to use a generic map object and, in the build method, confirm that all required properties are there. So code like:

if (inputStringKey.equals("x")) obj.setX(inputStringValue);

becomes

builder.set(inputStringKey, inputStringValue);
obj = builder.build();

If, on the other hand, the original object tends to be modified by other code, it makes sense to split the object so that the setters are on the builder object.  Thus the above code becomes:

builder.setX(x);
obj = builder.build();

The immutable object shouldn’t have getters. Instead, expose its state as public fields, and make those fields immutable themselves. In Java, this means tagging them as final; in C++, make them const.

If the immutable object is fetched from a data store, there are two approaches to updating that datastore.  The simplest and least flexible is to have the builder update the store as part of the build function.  This mixes functionality, and means that you can only update a single object per transaction, but it does increase consistency of viewing the object across the system.  The more common approach is to have a deliberate call to store the immutable object in the datastore.  Often, this is a transactional store, with multiple immutable objects sent in bulk as an atomic commit.

Here are the steps to perform this refactoring:

1. Append the word builder to the class name

2. Create a new internal class for holding the immutable state of the object and give it the name of the original object.  Give the builder an instance member field of the internal type.  The internal class should have a no-args constructor.

3. Push all fields down to the new class by moving the fields.

4. Add a method called build that returns the internal type.

5. Replace all calls of the form getProperty() with

x = builder.build();

x.property;

It is important to use the local variable, and to share it amongst all the places in the code that reference the local object.  If you don’t, you will end up with one instance per reference, which is probably not what you want.

6.  Remove the getProperty methods on the builder.

7.  Give the internal class a constructor that takes all of its fields as parameters.  It should throw a standard exception if one of the parameters is null.

8. Change the build method so that it returns a new instance that gets all of its fields set via the new constructor.  This instance should just use the fields in the internal instance of the (soon-to-be) immutable inner object.  You will have to provide dummy initializers for the newly immutable fields.  The build method should throw an exception if any of the values are missing or invalid.  It is usually required that you return all errors, not just the first one.  Thus this exception requires a collection of other errors.  These can be exceptions, or some other custom type, depending on the needs of your application.

9. For each setProperty method on the builder, provide a copy of the field in the builder object.  Make the corresponding field in the inner object immutable, and change the build method to use the field in the builder instead.

10. When you have finished step 9, you should have an immutable inner object. Provide an appropriate copy constructor.

11. Remove the no-args constructor from the immutable object.

12. Remove the default values for the immutable fields.
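To make the end state of these steps concrete, here is a minimal Java sketch of what a class originally named Point might look like after the refactoring. The Point and PointBuilder names and their two fields are invented for illustration, the chained setters are my own convenience, and the copy constructor from step 10 is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

public class PointBuilder {

    // The immutable object: public final fields, no getters, no no-args constructor.
    public static final class Point {
        public final String name;
        public final Integer x;

        // Step 7: the constructor takes every field and rejects nulls.
        Point(String name, Integer x) {
            if (name == null || x == null) {
                throw new IllegalArgumentException("name and x are required");
            }
            this.name = name;
            this.x = x;
        }
    }

    // Step 9: the mutable copies of the fields live on the builder.
    private String name;
    private Integer x;

    public PointBuilder setName(String name) {
        this.name = name;
        return this;
    }

    public PointBuilder setX(Integer x) {
        this.x = x;
        return this;
    }

    // Step 8: report every problem at once, then hand back a new immutable instance.
    public Point build() {
        List<String> errors = new ArrayList<>();
        if (name == null) errors.add("name is missing");
        if (x == null) errors.add("x is missing");
        if (!errors.isEmpty()) {
            throw new IllegalStateException(String.join("; ", errors));
        }
        return new Point(name, x);
    }

    public static void main(String[] args) {
        Point p = new PointBuilder().setName("origin").setX(0).build();
        System.out.println(p.name + " at x=" + p.x);
    }
}
```

Here the collected errors are plain strings joined into one exception message; an application with richer needs could carry a list of exception objects instead.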

You can now create a new builder object based on a map.  The keys of the map should be the names of the fields of the builder.  The build method pulls the elements out of the map and uses them as the parameters for the constructor. This approach showcases one of the great limitations of Java introspection:  parameter names are lost after compilation.  We only have access to the types and the order of the parameters.  Thus, maintaining this map would be error prone.

A more performant approach is to extend the builder object above with a setProperty(String, String) method that sets the values of the builder fields directly.

Any Java object can act as a map if you use introspection.  Thus, you could do what the Bean API does and munge the key into the form “setX” by changing the case of the first letter of the key name, and then calling

this.getClass().getMethod("setX", String.class)

You could also use field introspection like this:

this.getClass().getField(key)

Since you are using introspection, even though you are inside the class that should have access to private members, Java treats you as an outsider.  You can either drop permissions on the fields at compile time by making them public, or do so at runtime using one of various hacks.

This is one case where it makes sense to make the member fields public.  There is no information hiding going on here.  There may actually be some client code that is better off calling

builder.property = x

than

builder.setProperty(x)
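The introspection approach described above, with public fields on the builder, could be sketched roughly as follows. The class name and its two String fields are made up for the example, and it handles only String-typed fields; a real builder would need conversion logic for other types.

```java
import java.lang.reflect.Field;

public class ReflectiveBuilder {

    // Public on purpose: Class.getField only finds public fields.
    public String name;
    public String color;

    // Sets the builder field whose name matches key, rejecting unknown keys.
    public void setProperty(String key, String value) {
        try {
            Field field = getClass().getField(key);
            field.set(this, value);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("Unknown or inaccessible property: " + key, e);
        }
    }

    public static void main(String[] args) {
        ReflectiveBuilder builder = new ReflectiveBuilder();
        builder.setProperty("name", "widget");
        builder.color = "red"; // direct access works just as well
        System.out.println(builder.name + " / " + builder.color);
    }
}
```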

In C++, we have fewer choices, as there is no way, either at run time or at compile time, to iterate through the fields of a class.  The best you can do is to create a map of functors.  The keys of the map are again the fields of the builder, the values are functions which set the fields.  You end up with a setProperty function on the builder that looks like:

void setProperty(const std::string& key, const std::string& value) {
    propsetmap[key](value);
}

although with logic to handle erroneous keys.

A builder is a short-lived, single-threaded, mutable object.  It is up to the calling code to provide enough data to populate the builder.  The builder pattern works nicely with inversion of control frameworks.  Code that uses an object should not call the builder directly, but rather fetch the object from the framework, whereas code elsewhere builds the object and adds it to the container.

If your code has interspersed sets and gets of properties, it is going to be really difficult to introduce the builder.  Chances are you are going to want to do additional refactorings to separate the concerns in your code.

Software13 Nov 2009 10:33 am

Penguin Computing has graciously provided their version of BProc to the
project. Special thanks go to John Hawkes, the Director of Software
Engineering, and Andreas Junge, VP of Engineering at Penguin, for making this
happen. In the next couple of weeks, we’ll work at getting some packages
made from the code and posted on the files portion of the Sourceforge
site.

The code is targeted at a RHEL 5.4 Kernel, due to development priorities
at Penguin. Expect to see a set of RPMs for the Kernel and the userland
codes for the unified process space code.

For now, we are going to focus on the kernel side of development. Our
goal is to get the necessary changes into the upstream Kernel. We will
begin work shortly on a version of BProc that targets the current
development Linux Kernel.

As such, the beoboot and related code is going to be left untouched in the
CVS tree.

The Git Repository is live here:

http://bproc.git.sourceforge.net/git/gitweb-index.cgi

Expect it to go through some changes in the future as we get it cleaned up
and ready for distribution.

I still have an emergency hold on all email traffic to this list due to
the high percentage of SPAM. I will pass through all messages that come
from real human beings, and that are regarding the BProc code base.

The code is very different from the last released version of bproc 4.0. I
will post more details in the future about how to get a Kernel built, why
we made the decisions we did, and what the plan is for the future. I am
very excited about this project, and hope to get some community momentum.

Thanks Again to everyone who made this happen.
