What Jar File

When working with a new project, I often find myself searching for the Jar files that fulfill a dependency. Sometimes they come from Maven, sometimes from the Fedora RPMs. My approach has been to build a cache for the directories I care about that maps each Jar file name to the classes it contains:

#!/bin/bash

CACHE_FILE=/tmp/jarcache

# start with an empty cache file
: > "$CACHE_FILE"

for DIR in /usr/share/java /usr/lib/java
do
    # -type f skips symlinks, so each jar is only indexed once
    find "$DIR" -type f -name '*.jar' | while read -r JAR
    do
        for CLASS_FILE in $(jar -tf "$JAR" | grep '\.class$')
        do
            # org/foo/Bar.class becomes org.foo.Bar.class
            CLASS=$(echo "$CLASS_FILE" | sed 's!/!.!g')
            echo "$JAR $CLASS" >> "$CACHE_FILE"
        done
    done
done

Then call it this way:

grep "org.mozilla.jss.ssl" /tmp/jarcache
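
The same index can be built from Java itself. What follows is only a minimal sketch, not the script above: the class name JarIndex and its two helper methods are made up for illustration, and it relies only on the standard java.util.jar API. Unlike the shell version, it strips the .class suffix from each entry.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the classes stored in each jar named on the command line.
class JarIndex {
    /** Converts an entry name such as org/foo/Bar.class to org.foo.Bar. */
    static String toClassName(String entryName) {
        String withoutSuffix =
            entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Returns the class names found in one jar file, in entry order. */
    static List<String> classesIn(Path jar) throws IOException {
        List<String> classes = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    classes.add(toClassName(name));
                }
            }
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        // Prints one "jar class" pair per line, like the cache file.
        for (String arg : args) {
            for (String cls : classesIn(Paths.get(arg))) {
                System.out.println(arg + " " + cls);
            }
        }
    }
}
```

Run it with one or more jar paths as arguments and redirect the output to get a file that can be searched with grep in the same way.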

Removing empty comment blocks

Eclipse can automate a lot of stuff for you. One thing it did for me was generate the serialVersionUID for all the serializable classes in my tree.
They look like this:

    /**
     *
     */
     private static final long serialVersionUID = -9031744976450947933L;

However, it put an empty block comment on top of each of them, something I didn’t notice until I had mixed this commit in with another. So, I want to remove those empty comment blocks.

#!/bin/bash

for JAVAFILE in `find . -name \*.java`
do
     sed -n '1h;1!H;${;g;s! */\*\*\n *\* *\n *\*/ *\n!!g;p;}' \
         < $JAVAFILE > $JAVAFILE.new
     mv $JAVAFILE.new $JAVAFILE
done

Thanks to this article for how to do the multiline search and replace.
http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/
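
For comparison, here is a rough sketch of the same cleanup in plain Java. It is not what I ran: the class name EmptyCommentStripper is made up, and the pattern is my own translation of the sed expression. It only works on a string, so reading and rewriting each file is left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes Javadoc blocks that contain nothing but a bare "*".
class EmptyCommentStripper {
    // Three consecutive lines: "/**", "*", "*/", each with optional
    // surrounding blanks. \R matches any line terminator (Java 8+).
    private static final Pattern EMPTY_JAVADOC = Pattern.compile(
        "[ \\t]*/\\*\\*[ \\t]*\\R[ \\t]*\\*[ \\t]*\\R[ \\t]*\\*/[ \\t]*\\R");

    static String strip(String source) {
        return EMPTY_JAVADOC.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String before =
            "class A {\n"
            + "    /**\n"
            + "     *\n"
            + "     */\n"
            + "    private static final long serialVersionUID = 1L;\n"
            + "}\n";
        // Prints the class with the empty comment block removed.
        System.out.print(strip(before));
    }
}
```

Comments that carry any text after the star are left alone, because the pattern requires a line break right after it.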

Binary Java Runner

I’ve recently been thinking about the scripts that are used to run Java processes. Firing up a script interpreter and running a slew of processes before running your program seems really inefficient. Additionally, programs meant to run on a Fedora system can make a few assumptions that general Java programs can’t.

Jar files should be in /usr/share/java or /usr/lib/java. We don’t want people putting files all over the place. Most of the scripts are doing classpath-related operations.
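
To make "classpath-related operations" concrete, here is a small sketch of what those scripts boil down to: look for each jar by name in the known directories and join the hits. The class name ClasspathBuilder and its build method are invented for this example and are not taken from any existing launcher script.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: resolves jar names against a fixed list of directories.
class ClasspathBuilder {
    /** Returns a classpath string; jars that are not found are skipped. */
    static String build(List<Path> jarDirs, List<String> jarNames) {
        List<String> entries = new ArrayList<>();
        for (String name : jarNames) {
            for (Path dir : jarDirs) {
                Path candidate = dir.resolve(name);
                if (Files.isRegularFile(candidate)) {
                    entries.add(candidate.toString());
                    break; // the first directory that has the jar wins
                }
            }
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // The two directories a Fedora system is assumed to use.
        List<Path> dirs = Arrays.asList(
            Paths.get("/usr/share/java"), Paths.get("/usr/lib/java"));
        System.out.println(build(dirs, Arrays.asList(args)));
    }
}
```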

Here is my sample code:

class Hello{
    public static void main(String[] args){
        for (int i =0 ; i < args.length; i++){
            System.out.println ("Hello "+args[i]);
        }
    }
}

Which I can compile into a Jar file pretty easily. I'll call the jar hello.jar. The launcher is a small C program that execs the JVM:

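#include <stdio.h>
#include <unistd.h>

int main()
{
    /* the argument list must end with a NULL pointer */
    char *const argv[] = {"/usr/bin/java", "-cp", "hello.jar", "Hello", NULL};
    char *newenviron[] = { NULL };

    execve(argv[0], argv, newenviron);
    /* execve only returns if it failed */
    perror("execve");
    return 1;
}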

This serves as a starting point.

But, really, do we want to have to ship a Jar file and a separate executable? For command line utilities, the basic classes that it requires should be statically bound into the application. We can convert the Jar file into data and link it into the application:

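# Run ld in the directory that holds hello.jar: the symbol names
# (_binary_hello_jar_start and friends) are derived from the input path as given.
ld -r -b binary -o hello.o hello.jar
cc     runjava.c hello.o   -o runjava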

Now is where things get interesting.

First, we need a way to load this Embedded Jar file in as a class. As a proof of concept, the C program can read the data section and spit it out to a jar file.


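#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* symbols generated by ld -r -b binary from hello.jar */
extern const char _binary_hello_jar_start[];
extern const char _binary_hello_jar_end[];

const char * temp_file_template = "hello.jar.XXXXXX";
char * temp_file = NULL;

void build_jar(){
  /* mkstemp rewrites its argument, so it needs a writable copy */
  temp_file = strdup(temp_file_template);
  if (temp_file == NULL){
    perror("strdup");
    return;
  }

  /* mkstemp fills in the XXXXXX and returns an open descriptor */
  int fd = mkstemp(temp_file);
  if (fd < 0){
    perror("mkstemp");
    return;
  }

  size_t len = (size_t)(_binary_hello_jar_end - _binary_hello_jar_start);
  ssize_t sz = write(fd, _binary_hello_jar_start, len);

  if (sz < 0){
    perror("write");
  }

  close(fd);
}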



This produces the same output as if we had run it from the initial example.

But, this is wasteful. This example will litter a copy of the jar file into your directory. You can't delete it when you are done; exec does not return to your code.

So the next step is, instead of running using exec, run using JNI.

/*run the executable using an embedded JVM*/
void run_embedded(){
  JNIEnv          *env;
  JavaVM          *jvm;
  JavaVMInitArgs  vm_args;
  JavaVMOption    options[2];
  int             option_count;
  jint            res;
  jclass          main_class;
  jmethodID       main_method;
  int i;

  const char * classpath_prefix = "-Djava.class.path=";
  int classpath_option_length =
    strlen(classpath_prefix) + strlen(temp_file) + 1;

  char * classpath_option =
    malloc(classpath_option_length);

  memset(classpath_option, 0, classpath_option_length);

  strcat(classpath_option, classpath_prefix);
  strcat(classpath_option, temp_file);

  option_count=1; //set to 2 for jni debugging
  options[0].optionString = classpath_option;

  options[1].optionString = "-verbose:jni";

  /* Specifies the JNIversion used */
  vm_args.version  = JNI_VERSION_1_6;
  vm_args.options  = options;
  vm_args.nOptions = option_count;
  /* the JVM ignores unrecognized options instead of failing */
  vm_args.ignoreUnrecognized = JNI_TRUE;
  res = JNI_CreateJavaVM(&jvm,(void **)&env,&vm_args);
  free(classpath_option);

  if (res < 0 ){
    exit(res);
  }
  const char * classname =   "Hello";
  main_class   = (*env)->FindClass(env, classname);
  if (check_error(main_class,env)) {
      return;
    }

  main_method = (*env)->GetStaticMethodID(env,main_class,"main",
                               "([Ljava/lang/String;)V");

  if (check_error(main_method,env)) {
    return;
  }

  char *message[5]= { "first", "second", "third", "fourth", "fifth"};

  jstring blank = (*env)->NewStringUTF(env, "");

  jobjectArray arg_array = (jobjectArray)(*env)->NewObjectArray
    (env,5,
     (*env)->FindClass(env,"java/lang/String"),
     blank);
  if (check_error(arg_array,env)) {
    return;
  }

  for(i=0;i<5;i++) {
    jstring str = (*env)->NewStringUTF(env, message[i]);
    (*env)->SetObjectArrayElement
      ( env,arg_array,i,str);
  }

  /* main returns void, so there is no result object to check */
  (*env)->CallStaticVoidMethod(env, main_class, main_method, arg_array);

  if ((*env)->ExceptionCheck(env)) {
    (*env)->ExceptionDescribe(env);
    return;
  }
}

Note that we have now linked against a version of the JVM, and that commits us to a given JDK. This is just a proof of concept; the whole copy-the-Jar approach is just a stepping stone.

What we really want is a classloader which plays by the following rules:

  • lets the rt.jar classloader safely load the basic java, javax, and other JRE classes as a normal executable
  • makes sure that all the classes for our application get loaded from our specified jar file
  • reads our specified jar file directly out of the elf section of our binary executable

It appears to me that the standard classloading approach is fine for our needs, with one exception: we can't treat an ELF file as if it were a Jar file. If we could do that, we could throw argv[0] on the front of the -Djava.class.path command line switch and be off and running. This would have the added benefit of creating a classloader extension that could be used from other Java programs as well. I'm currently thinking it should be of the form:

file+elf://path-to-file/linked_jarfile_name

as that format would allow us to link in multiple jar files.

However, to be really efficient, we don't want to pay the price for the OS file operations (open, read) again, since the jar file is already loaded into memory. We should be able to create a protocol which tells the classloader: use this InputStream to load the class, and the JNI code can then pass the input stream.
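
The Java half of that idea could look something like the sketch below. It is an assumption about how such a loader might be written, not something I have wired up to the JNI code: the name StreamJarClassLoader and the classNames() helper are invented, and it simply reads a whole jar from an InputStream into memory and defines classes from there.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

// Hypothetical loader: serves classes from a jar that arrives as a stream.
class StreamJarClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes = new HashMap<>();

    StreamJarClassLoader(InputStream jarStream, ClassLoader parent)
            throws IOException {
        super(parent);
        try (JarInputStream jar = new JarInputStream(jarStream)) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.endsWith(".class")) {
                    continue; // only class files are kept
                }
                String className = name
                    .substring(0, name.length() - ".class".length())
                    .replace('/', '.');
                classBytes.put(className, readEntry(jar));
            }
        }
    }

    // Reads the current jar entry to its end.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Names of the classes found in the stream. */
    Set<String> classNames() {
        return classBytes.keySet();
    }

    // Called by loadClass only after the parent loader has failed,
    // so the JRE classes still come from the normal place.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Because ClassLoader delegates to its parent first, this keeps the first rule from the list above. How the native side would hand over a stream backed by the ELF section is still the open question.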

Guice gets it

I’m currently working on Candlepin, a Java web service application that uses Google Guice for dependency injection. I’ve long suspected that I would like Guice. I’m not a big fan of annotations, but I’ll admit that there are some things that you can’t do in Java using other approaches. Guice makes appropriate use of annotations to make dependency injection work in a type-safe and intelligent manner.

I had to break a dependency in order to make a class testable.  Here’s the old interface:

@Inject
public JavascriptEnforcer(
     DateSource dateSource,
     RulesCurator rulesCurator,
     ProductServiceAdapter prodAdapter );

The problem with this is that RulesCurator is a database access class, which means I need a DB for my unit test.   What the JavascriptEnforcer specifically needs is the product of the database query.

Here’s a piece of code from the body of the constructor that I also want to remove.

    
    ScriptEngineManager mgr = new ScriptEngineManager();
    jsEngine = mgr.getEngineByName("JavaScript");

This links us to one creation strategy.  I want to use a different one in  my Unit test.

My first step was to create a new constructor, and have all of the code I want to extract in the old constructor.  This would probably have been sufficient for the unit tests, although it might risk a class not found for the DB stuff.  Here’s the new constructor interface:

public JavascriptEnforcer(
    DateSource dateSource,
    Reader rulesReader,
    ProductServiceAdapter prodAdapter,
    ScriptEngine jsEngine) ;

Here’s where Guice shined.  I need two custom components here, one which fulfills the rulesReader dependency, the other which fulfills the jsEngine.  The jsEngine one is easier, and I’ll show that.  First, create a custom Provider.  A Provider is a factory.  For the jsEngine, that factory just looks like this:

package org.fedoraproject.candlepin.guice;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import com.google.inject.Provider;
public class ScriptEngineProvider implements Provider<ScriptEngine> {
    public ScriptEngine get() {
        return new ScriptEngineManager()
            .getEngineByName("JavaScript");
    }
}

Which we then register with our custom module:

class DefaultConfig extends AbstractModule {
     @Override
     public void configure() {
     ...
        bind(ScriptEngine.class)
            .toProvider(ScriptEngineProvider.class);
    }
}

Now comes the interesting one.  We want a Reader.  But a Reader is a very common class, and we don’t want to force everything that needs a Reader to use the same creation strategy.  Here is where Guice uses annotations.

public class RulesReaderProvider implements Provider<Reader> {
    private RulesCurator rulesCurator;
    @Inject
    public RulesReaderProvider(RulesCurator rulesCurator) {
        super();
        this.rulesCurator = rulesCurator;
    }
    public Reader get() {
        return new StringReader(rulesCurator.getRules().getRules());
    }
}

Note how the Provider itself is a component.  This allows us to use another component, the RulesCurator, without creating a direct dependency between the two classes.  Still, this does not distinguish one reader from another. That happens with another bind call.

public void configure() {
    ...
    bind(ScriptEngine.class)
        .toProvider(ScriptEngineProvider.class);
    bind(Reader.class)
        .annotatedWith(Names.named("RulesReader"))
        .toProvider(RulesReaderProvider.class);
}

Then, the inject Annotation for our Enforcer looks like this:

    
    @Inject
    public JavascriptEnforcer(
        DateSource dateSource, 
        @Named("RulesReader") Reader rulesReader, 
        [...] 
       ScriptEngine jsEngine)

The key here is that the @Named matches between the two components.
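
The payoff shows up in the unit test, where no injector and no database are needed. The sketch below is plain Java with no Guice in it, and RulesConsumer is an invented stand-in for a class that, like the enforcer, only needs the text of the rules. It is meant to show the shape of the test, not Candlepin's actual code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Supplier;

// Hypothetical stand-in for a class that takes its rules as a Reader.
class RulesConsumer {
    private final String rules;

    RulesConsumer(Reader rulesReader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = rulesReader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        this.rules = sb.toString();
    }

    String rules() {
        return rules;
    }
}

class RulesConsumerDemo {
    public static void main(String[] args) throws IOException {
        // A test-only factory playing the role of the Provider:
        // a canned string instead of a database query.
        Supplier<Reader> testRules =
            () -> new StringReader("function check() { return true; }");
        RulesConsumer consumer = new RulesConsumer(testRules.get());
        System.out.println(consumer.rules());
    }
}
```

In production the Reader comes from the RulesReaderProvider binding; in the test it is whatever the test hands to the constructor.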

Could Maven use a single directory for archives?

Maven is too important a part of too many projects for most Java developers to ignore. However, some of the decisions made in building with Maven are suspect, mostly the blind download of binary files from a remote repository. While Maven gets more and more Open Source clean, there are still issues, and the biggest is building Maven itself. Both Debian and Fedora have fairly old versions of Maven, in the range of 2.0.7 as of this writing. Considering that the GA is 2.2.0 and there is work on 3.0, we risk a pretty serious divide in the Open Source Java world if we don’t keep up with Maven and get a clean way to build it.
