Make JavaScript a first class citizen
Instead of using Google Web Toolkit, RichFaces, Django, or any other server-side technology to build the user interface, we should use JavaScript and a client-side library like jQuery.
Design guidance for dependency injection
Dependency injection can throw you into analysis paralysis. Here are some rules of thumb.
Either a dependency is assigned for the lifetime of an object or it is passed in as the parameter of a method.
The mechanism that performs dependency injection is itself a dependency. Limit its scope to the edges of a use case.
I’ve worked on numerous distributed systems. Two of the abstractions used in Java distributed systems are Remote Method Invocation (RMI) and Java Message Service (JMS). Both have their uses, and inversion of control should play nicely with either.
In both cases, the edges of the use case are where remote requests come across the wire. JMS very naturally provides a tie-in with an inversion of control container in the mapping of a newly received message to the code that is supposed to handle it:
Handler messageHandler = context.get(message.getType());
In an RMI system, each remote object may provide a tie-in to the context. If no resources are allocated solely for the purpose of processing the request, there is no need to create a new context. On the other hand, remote objects tend to be long lived, and often need access to resources only for a short time. Thus the remote method will often need to create a thread-scoped context for object resolution. While this context can be provided by a proxy, this leads to a really awkward setup where the remote object knows nothing about the context. If objects downstream from the remote object need access to the context, they have to get it by magic, and you end up with the same type of nasty code that you get in most JEE applications: hard-coded factories, JNDI lookups and the like. If the client calls:
remoteObject.munge(myMessage);
The remote object has code like:
void munge(MyMessage myMessage){
Resource r = ResourceFactory.getInstance().create();
}
One alternative is to have the dependencies passed in to the object that implements the remote interface. The awkwardness now is that the caller and implementer have two different contracts.
The client calls
remoteObject.munge(myMessage);
But the remote object implements
void munge(MyMessage myMessage, Resource resource);
Injection at the edges would look like this:
void munge(MyMessage myMessage){
munge(myMessage, new Context<Resource>().get());
}
Here the remote object has transparency into the creation of the object. In order to write a unit test, we can call on the two parameter version of munge with a mock Resource object. The main difference between the Context version and the Factory version is the unification of the object creation mechanism in the Context version.
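To make the testing claim concrete, here is a minimal sketch of the two-overload arrangement. The MyMessage, Resource and Context types below are hypothetical stand-ins, not the real classes from any container, and the Context is handed to the remote object's constructor rather than created inline, purely to keep the sketch self-contained.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for the MyMessage and Resource types in the text.
class MyMessage {
    final String body;
    MyMessage(String body) { this.body = body; }
}

interface Resource {
    String apply(String input);
}

// Assumed, minimal inversion of control context: it only knows how to
// produce one kind of object on demand.
class Context<T> {
    private final Supplier<T> supplier;
    Context(Supplier<T> supplier) { this.supplier = supplier; }
    T get() { return supplier.get(); }
}

class RemoteMunger {
    private final Context<Resource> context;

    RemoteMunger(Context<Resource> context) { this.context = context; }

    // The edge of the use case: the signature the remote client sees.
    // This is the only place that touches the context.
    String munge(MyMessage myMessage) {
        return munge(myMessage, context.get());
    }

    // The interior: every dependency is an explicit parameter, so a unit
    // test can pass a mock Resource and never build a context at all.
    String munge(MyMessage myMessage, Resource resource) {
        return resource.apply(myMessage.body);
    }
}
```

A unit test would call the two-parameter overload directly with a lambda or mock in place of Resource, while production callers only ever see the one-parameter version.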
As an aside: if you need to have a dependency for a really short point in time on the interior of a use case, you can use a lazy-load proxy. I don’t advise this, but it is an option. The first problem with this approach is that it doesn’t provide a clean way to clean up once the object is no longer required. The second is that object creation can fail, and the calling object may not cleanly handle that.
My “Two Main Problems With Java” Rant
This is not an anti-Java rant per se. It is a rant about the two main things missing from the language that force people into code-heavy workarounds.
Java has two flaws that hurt programmers using the language. The first is that the reflection API does not provide the parameter names for a function. The second is that Java allows null pointers. This article explains why these two flaws are the impetus for many of the workarounds that require a lot of coding to do simple things. This added complexity in turn leads to code that is harder to maintain and less performant.
An object in Java does not have many of the features found in true “Object Oriented” programming languages. You can’t add properties or methods to an object after it has been created. You need another abstraction for that kind of stuff: the map. But Java provides introspection of objects that makes them map-like. An Object in Java is the “realization” of a Class, which is a set of rules. The class exists to allow the programmer to define new rules about what a set of objects will do. The idea is that the Class is the primary abstraction available to the programmer. An Object has pre-conditions and post-conditions for any operation: this will be true before and after this method is called. These invariants are enforced by the Class of the Object.
This is the theory. In practice, most Java classes violate this. Java has one part of the problem built in to the language design: garbage collection. In C++ it is pretty common for an object to represent a resource. Create the object, allocate the resource. Free the object, release the resource. But garbage collection comes with a price: you don’t know when your objects are freed, which means you can’t tie your resources to their object lifespans. This is unfortunate, but it is a limitation of the language with which I can live.
However, just because we can’t tie cleanup to resource release doesn’t mean we should allocate invalid objects. Yet this is done all over the place. Let’s look at the dependency injection model called setter injection. Create an object using the no-argument constructor, then call set, set, set, and when you are done, you have an initialized object. Note that type 1, or interface injection, is really just a more type-safe way to do the same thing. There is no way of telling what is the minimum amount of work we have to do to get a valid object. Do we have to call set on all properties? The language already has a mechanism for answering this question. That is what the constructor is supposed to do. Type 3 injection, constructor injection, then, looks like it should be the default way to go. Why is it then so underused?
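The contrast between the two styles can be sketched as follows. The Mailer classes and their host and sender properties are hypothetical, invented only for this illustration.

```java
import java.util.Objects;

// Setter injection: the no-argument constructor hands back an object that is
// not yet valid, and nothing in the type says which setters are mandatory.
class SetterInjectedMailer {
    private String host;
    private String sender;

    void setHost(String host) { this.host = host; }
    void setSender(String sender) { this.sender = sender; }

    String describe() {
        // Throws NullPointerException if either setter was skipped.
        return sender.toLowerCase() + " via " + host.toLowerCase();
    }
}

// Constructor injection: the constructor states the minimum needed for a
// valid object, and final fields keep the object valid afterwards.
class ConstructorInjectedMailer {
    private final String host;
    private final String sender;

    ConstructorInjectedMailer(String host, String sender) {
        this.host = Objects.requireNonNull(host, "host");
        this.sender = Objects.requireNonNull(sender, "sender");
    }

    String describe() {
        return sender.toLowerCase() + " via " + host.toLowerCase();
    }
}
```

With the setter version the failure shows up at some later call to describe(); with the constructor version an invalid object can never be observed at all.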
Imagine a language that gave you a map, but no way to use the key. You could enumerate through all of the values, check their types, do all sorts of cool things, but you couldn’t look up values from the key. Programmers would probably complain. Yet the introspection of parameters in a java.lang.reflect.Method is limited to types, not the names themselves. The same is true of a java.lang.reflect.Constructor object. We can get a collection of types, even a collection of annotations, but not a simple collection of strings for the names. Even if we did, there would be no way to match each name with the object passed in as the parameter.
Assume that you want to create an object of type DatabaseConnection. To create this, you need a user ID, a password, and a JDBC URL. Three strings. Those of you who use objects like this regularly will notice that I changed the order. The JDBC API usually has it as URL, UID, password. If all you know is that your API takes three strings, what order do you put them in? You have to read the API docs, which is really not that useful if we want to make this an automated process. Ideally, the names of the parameters in the constructor would tell us which is which.
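A small sketch of what reflection actually hands back for such a constructor. The DatabaseConnection class here is a hypothetical stand-in with the three String parameters described above, and ConstructorInspector is a helper invented for the example.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical class matching the example in the text: three String parameters.
class DatabaseConnection {
    final String userId;
    final String password;
    final String jdbcUrl;

    DatabaseConnection(String userId, String password, String jdbcUrl) {
        this.userId = userId;
        this.password = password;
        this.jdbcUrl = jdbcUrl;
    }
}

class ConstructorInspector {
    // Returns the simple type names of the first declared constructor's
    // parameters, which is all that getParameterTypes() can tell us.
    static String describeParameters(Class<?> type) {
        Constructor<?> constructor = type.getDeclaredConstructors()[0];
        return Arrays.toString(
                Arrays.stream(constructor.getParameterTypes())
                        .map(Class::getSimpleName)
                        .toArray());
    }
}
```

Calling ConstructorInspector.describeParameters(DatabaseConnection.class) yields three indistinguishable String entries, so an automated tool has nothing to match the user ID, password and URL against. Java 8 later added java.lang.reflect.Parameter with a getName() method, but it only returns the real names when the class was compiled with the -parameters flag.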
Note that if we used a specific type for UID, Password, and URL, we would have a guaranteed solution: match the types of the parameters with the types of the objects that fill the dependencies. But as soon as you have two objects of the same type, or any amount of casting, the policy becomes non-deterministic.
C++ aside: C++ suffers from this just as much as Java, but C++ doesn’t even pretend to provide as much run-time introspection, so the failure there is less glaring. It is interesting to note that modern compilers allow C, and by extension C++, to use named (designated) initializers for structures, which can be used for this type of introspection, albeit a very chatty and non-runtime type. Anyone who suggests trying to demangle C++ functions will quickly see that A) any solution is non-portable and B) you lose the parameter names anyway.
Java has one other critical failing: null pointers. If Java required that every reference had a valid object connected to it, most of the justification for the Bean API would fall away. If we defaulted most properties to final, the majority of objects would be immutable, and a whole slew of concurrency problems would fall away. We would then just have to deal with the cases where a property was supposed to always exist, but be mutable. This is why we have classes in the first place, and so these types of classes would be more common: wrap a primitive, but provide additional rules about what values it can assume. Without null pointers, there would be no need for the Bean API.
Note that it would be easy to simulate a null pointer using a collection, or even an iterator. A Collection would be empty. An iterator would throw an exception to indicate that there was no “next” object. This kind of Null pointer exception would be the exception, not the rule.
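Both ideas can be sketched in today’s Java, with the caveat that the language still permits null, so this is only an approximation. Percentage and Scores are hypothetical names invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Wraps a primitive and enforces a rule about the values it may assume.
// The field is final, so a constructed Percentage is always valid.
final class Percentage {
    private final int value;

    Percentage(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percentage out of range: " + value);
        }
        this.value = value;
    }

    int value() { return value; }
}

// A lookup that signals "nothing here" with an empty list instead of null.
class Scores {
    private final Map<String, Percentage> byName = new HashMap<>();

    void record(String name, Percentage score) { byName.put(name, score); }

    List<Percentage> find(String name) {
        // Map.get still returns null internally; the null never escapes.
        Percentage score = byName.get(name);
        return score == null
                ? Collections.<Percentage>emptyList()
                : Collections.singletonList(score);
    }
}
```

Callers iterate over the result of find() and simply do nothing when it is empty, so the missing case needs no special handling.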
These two rules, “no null pointers” and “parameter name info”, would significantly reduce the quantity of code written in Java while increasing reliability and correctness.
A programming language is a tool. When choosing the right tool for the job, you want to have good information about it. I’ve worked with both C and Java, and have dealt with a lot of misconceptions about both. I’m going to try to generate some data to help guide discussions about the different languages. Consider this, then, as the next instalment of my comparison of programming languages that I started in my IPv6 days.
This article will start with the simplest of comparisons: what is the overhead of starting a process in C and Java? Here’s my setup:
I am currently running Fedora 11.
My gcc version is 4.4.1 20090725.
My version of Java is 1.6.0_0 from OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode).
Here is my C code:
int main(){
while(1){};
return 0;
}
It was compiled with:
gcc noop.c -o noop
Here is my Java code:
public class NoOp{
public static void main(String[] args){
while(true){}
}
}
To compile that Java code I ran
javac NoOp.java
I then ran both:
./noop &
java NoOp &
Looking at the basics:
diff /proc/12067/status /proc/12069/status
1,4c1,4
< Name: noop
< State: R (running)
< Tgid: 12067
< Pid: 12067
---
> Name: java
> State: S (sleeping)
> Tgid: 12069
> Pid: 12069
12,13c12,13
< VmPeak: 3884 kB
< VmSize: 3756 kB
---
> VmPeak: 2559076 kB
> VmSize: 2493592 kB
15,17c15,17
< VmHWM: 308 kB
< VmRSS: 308 kB
< VmData: 40 kB
---
> VmHWM: 12304 kB
> VmRSS: 12304 kB
> VmData: 2451848 kB
19,22c19,22
< VmExe: 4 kB
< VmLib: 1548 kB
< VmPTE: 32 kB
< Threads: 1
---
> VmExe: 32 kB
> VmLib: 10632 kB
> VmPTE: 228 kB
> Threads: 12
28c28
< SigCgt: 0000000000000000
---
> SigCgt: 0000000181005ccf
37,38c37,38
< voluntary_ctxt_switches: 1
< nonvoluntary_ctxt_switches: 506847
---
> voluntary_ctxt_switches: 2
> nonvoluntary_ctxt_switches: 2
There are two things that jump out. First, memory usage for both processes seems incredibly high. For a no-op C program to require, at any point in its lifespan, 3884 kB seems quite high. The Java one, at a massive 2559076 kB, borders on the absurd. Note that VmPeak counts virtual address space; the resident sets (VmRSS) are far smaller, at 308 kB and 12304 kB. Java does have the excuse that the java application has certain parameters that set minimum and maximum memory usage, so it is possible that a good chunk of that memory was allocated by a system policy.
Another thing that jumps out is the context switches. Something is forcing the C program to switch roughly 500k times, most likely the scheduler preempting its busy loop. The Java program shows no such switching, probably because these counters cover only the main thread, which is sleeping (state S) while the loop runs in one of the other 11 threads.
For the number of files pulled we have, for Java
cat /proc/12069/maps | awk '{print $6}' | sort -u
/lib64/ld-2.10.1.so
/lib64/libc-2.10.1.so
/lib64/libdl-2.10.1.so
/lib64/libm-2.10.1.so
/lib64/libnsl-2.10.1.so
/lib64/libnss_files-2.10.1.so
/lib64/libpthread-2.10.1.so
/lib64/librt-2.10.1.so
/lib64/libz.so.1.2.3
/tmp/hsperfdata_ayoung/12069
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/jli/libjli.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/libjava.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/libverify.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/libzip.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/native_threads/libhpi.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext/gnome-java-bridge.jar
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext/pulse-java.jar
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
Whereas the C program has
cat /proc/12067/maps | awk '{print $6}' | sort -u
/home/ayoung/devel/noop/noop
/lib64/ld-2.10.1.so
/lib64/libc-2.10.1.so
In both cases I removed some garbage from the input, but left the list of files.
OK, one thing worth noticing is that Java holds on to the C libraries for dynamic library loading and for threading. Perhaps a better comparison would include those, too.
Map Reduce is kinda like “Normalize on the Fly”
One undervalued aspect of data modeling is that you actually get time to consider the form of the data before you get the data. In a Map Reduce job, you know that your map phase is going to get the data, and that it is not going to be normalized. I could have said “not likely to be normalized”, but the reality is that if you are using Map Reduce, you are not going to get structured data.
The Map step is where you deal with this. You take the data in its CLOB form and you turn it into a series of key-value pairs. Strictly speaking, this isn’t a map, it is a relation. In a map, every element of the domain has a single element in the codomain, or range as I learned it. In Hadoop and Map Reduce, there is no restriction that a given key always return a unique value, although I suspect that in practice it probably should. Actually, since all of the values for a given key are collected into a list, technically you do get a map, just not at the end of this stage… and really nowhere in the system do you ever see all the elements of that map. Just a sublist.
Regardless of the mathematical correctness of the term “map”, a Map Reduce program has a step which is responsible for creating a structured representation from an unstructured one. This is very similar to what a developer does when they have to take some data format and decide how to store it in a DBMS. The assumption is that the DBMS is necessary for processing the events afterwards.
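As a rough illustration in plain Java, without the Hadoop API, the map step below imposes structure on raw text lines and the grouping step collects every value for a key into a list. The LogMapper class and the “LEVEL message” line format are assumptions made up for this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LogMapper {
    // Map step: pull structure out of one unstructured line.
    // Emits a (level, message) pair; lines that do not fit are skipped.
    static List<Map.Entry<String, String>> map(String line) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        int space = line.indexOf(' ');
        if (space > 0 && space < line.length() - 1) {
            pairs.add(new SimpleEntry<>(line.substring(0, space), line.substring(space + 1)));
        }
        return pairs;
    }

    // Grouping: collect every value emitted for a key into one list,
    // which is where the relation becomes a map from key to list.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, String> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce step: here, just count the values per key.
    static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((key, values) -> counts.put(key, values.size()));
        return counts;
    }
}
```

Every job that reads the same raw lines has to repeat the parsing in map(), which is the repeated normalization cost in miniature.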
Thus a Map Reduce operation defers the cost of normalizing the data, but then potentially pays it multiple times. When using an RDBMS, you pay the price for normalizing the data upon data entry, and that price is then amortized over all the queries of the data. Thus the comparison between Map Reduce and SQL can be viewed as an economic decision.
Basic Postgres config for remote access
Say you want to set up postgres for use with a web application. If you are running on the same server here’s what you need to do:
If the technology you are using is smart enough to use the domain socket for local connections, in
/var/lib/pgsql/data/pg_hba.conf
Apply the following diff:
# "local" is for Unix domain socket connections only
+local myapp-db myapp-user password
local all all ident sameuser
If, on the other hand, you need TCP connections (which is the case for JDBC) you probably want this:
# IPv4 local connections:
-host all all 127.0.0.1/32 ident sameuser
+host all all 127.0.0.1/32 md5
Although, again, you should probably change “all all” to be specific to your application.
You need to restart the postgres database server to have these changes take effect (reloading its configuration is also enough for pg_hba.conf).
To test this, first create the user by running the following command:
sudo -u postgres /usr/bin/createuser --pwprompt myapp-user
This will create the user you want, and prompt you for a password. To create the database itself:
sudo -u postgres /usr/bin/createdb myapp-db "My App backend data storage" -O myapp-user
To test local connections (domain socket) run
psql myapp-db -U myapp-user
You should be prompted for the password.
To test tcp connectivity, run
psql -h localhost myapp-db -U myapp-user
And again, you should be prompted for your password. Some alternative tests to try, to make sure you “get it.”
An entry in the pg_hba.conf file allowing a specific remote machine to connect should look like this:
host myapp-db myapp-user 192.168.1.1/32 md5
Tested only on RHEL 5 and Fedora 11, but this should work for any Linux-based PostgreSQL setup. I suspect it works on Windows as well, but I have not tested it. The path to the config file will be very different.
What follows is the result of a brainstorming session on items that should be in a code review checklist. As you can see, it needs refining and grouping. Please feel free to add comments with any items you think should be on it, with any organizational approaches, or any criticism. Right now, I want to focus on being inclusive rather than exclusive, so please don’t recommend removing things: that will happen later.
Contributions from Victor Erminpour
The members of the team had rolled out the resilite mats in the back gym. The air was barely heated, so they had been hard to the touch as the boys rolled them in three straight sheets. The kinetic energy of a pair of teenage boys transferred to the friction of the shoes applied a shearing force that would separate untaped mats. That was acceptable during a normal practice, when the mats would be shared by a half dozen pairs at once. During a real match they would be taped together, to prevent them from separating during the bouts. The tape was an expense that the cash-strapped athletic department wouldn’t waste on a practice. But there was no risk of separation during the opening half of this practice. The mats were rimmed with spectators, the members of the team focused on the two participants in the center. During a normal practice, the mats might be rolled out with either side up. The lesser used side had five circles, laid out like the dots on a die showing 5.
These are my notes on how to reverse engineer what tags are doing in a JSF application. In this case, I am trying to figure out which classes are behind the Configuration tags in RHQ, and in particular what is being done by the tag
onc:config
This tag is activated with the following value at the top of the page:
xmlns:onc="http://jboss.org/on/component"
To figure out what this tag means, I look in WEB-INF/web.xml. The web.xml value
facelets.LIBRARIES
lets me know where to find the file that defines the acceptable tags I can add to an xhtml page for this component:
/WEB-INF/tags/on.component.taglib.xml
This taglib defines the values
tag-name config
component-type org.jboss.on.Config
renderer-type org.jboss.on.Config
Note that these are JSF component names, and not Java class names. To resolve these to Java classes, we need to find the mappings. The mapping files are defined in web.xml under the entry:
javax.faces.CONFIG_FILES
In particular, I found what I wanted in
/WEB-INF/jsf-components/configuration-components.xml
The values I care about are:
component-type org.jboss.on.Config
component-class org.rhq.core.gui.configuration.ConfigUIComponent
and the renderer for Config and ConfigurationSet components
component-family rhq
renderer-type org.jboss.on.Config
renderer-class org.rhq.core.gui.configuration.ConfigRenderer
This renderer extends javax.faces.render.Renderer. It is a Flyweight that parses and generates HTML. It has two main methods: decode and encode. decode parses the request that comes in; encode injects HTML into the response that goes out.
Decode appears to be called only on a post. Encode seems to be called even on a get.