Could Maven use a single directory for archives?

Maven is too important a part of too many projects for most Java developers to ignore. However, some of the decisions made in building with Maven are suspect, chiefly the blind download of binary files from a remote repository. While Maven is getting cleaner from an Open Source standpoint, there are still issues, and the biggest is building Maven itself. Both Debian and Fedora ship fairly old versions of Maven, around 2.0.7 as of this writing. Considering that the GA release is 2.2.0 and there is work on 3.0, we risk a pretty serious divide in the Open Source Java world if we don’t keep up with Maven and find a clean way to build it.

Deepak Bhole has started a project under Fedora to upgrade Maven.  The RPM he provides uses the Maven repositories to download the dependencies.  This is an interim solution, not a viable long-term approach.

Maven and RPM have different views of the world.  RPM and YUM come from the mindset that there is one good version of a package installed on the system.  (The Linux kernel gets an explicit exception, due to the trickiness of recovering from a bad kernel upgrade.)  Maven allows many versions of a package to coexist, as different projects might be built against different versions of the same library.  However, for our purposes, we want to use the RPM versions of the libraries, to maintain the audit trail of sources used to build Maven, as well as any RPMs that will then require Maven.

Maven stores individual packages in a subdirectory with a name that is comparable to the Java packaging scheme.  For example, a version of beanshell lives in ~/.m2/repository/org/beanshell/bsh/2.0b4/bsh-2.0b4.jar.  Thus, looking for a jar file in the repository is quite time consuming, as you have to search through all of the subdirectories.  I wondered whether the nested layout was really required.
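The path in that example follows from the artifact's coordinates: the group ID with its dots turned into slashes, then the artifact ID, then the version. Here is a minimal sketch of that mapping. The helper name `m2_path` is made up for illustration, and it assumes the default local repository location under the home directory.

```shell
#!/bin/sh
# Hypothetical helper: map Maven coordinates to the jar's path in the
# local repository. Assumes the default ~/.m2/repository location.
m2_path() {
    group=$1
    artifact=$2
    version=$3
    # The dots in the group ID become directory separators.
    group_dir=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/.m2/repository/%s/%s/%s/%s-%s.jar\n' \
        "$HOME" "$group_dir" "$artifact" "$version" "$artifact" "$version"
}

# Prints the beanshell path used in the example above.
m2_path org.beanshell bsh 2.0b4
```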

[ayoung@ayoung repository]$ find . -name \*.jar | wc -l

3211

From this we see that there are 3211 jar files inside the repository in my home directory.

[ayoung@ayoung repository]$ find . -name \*.jar | sed 's!.*/!!g' | sort -u | wc -l
3192

The sed line strips everything but the filename from the path.  So there are only 19 Java archive files whose names are not completely unique in the whole tree.  They are:

find ~/.m2 -name \*.jar | sed 's!.*/!!g' | sort | \
awk 'BEGIN {OLD="L"} $1 == OLD {print OLD} {OLD = $1}'
asm-3.1.jar
bcel-5.1.jar
bsf-2.4.0.jar
bsh-1.3.0.jar
commons-beanutils-1.7.0.jar
commons-codec-1.4.jar
commons-collections-3.1.jar
commons-httpclient-2.0.2.jar
commons-httpclient-3.0.1.jar
commons-logging-1.1.1.jar
concurrent-1.3.4.jar
jaxws-rt-2.1.3.jar
jdom-1.0.jar
log4j-1.2.14.jar
log4j-1.2.14-sources.jar
rhq-pluginAnnotations-1.4.0-SNAPSHOT.jar
rhq-pluginGen-1.4.0-SNAPSHOT.jar
rhq-pluginGen-1.4.0-SNAPSHOT-jar-with-dependencies.jar
xercesImpl-2.9.1.jar
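The awk filter prints a name every time it matches the previous line of the sorted input, so a name that occurred three times would be listed twice. `uniq -d` does much the same job but prints each repeated name only once. Here is a sketch of that alternative. The function name `dup_names` and its two arguments are my own, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: list file names (without directories) that occur
# more than once under a tree. $1 is the root directory, $2 the extension.
dup_names() {
    root=$1
    ext=$2
    # Strip the directory part, sort, and keep only the repeated names.
    find "$root" -name "*.$ext" | sed 's!.*/!!' | sort | uniq -d
}

# Example use against the local repository:
#   dup_names ~/.m2/repository jar
```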

It is interesting to note that several of these are needed by Maven itself.  Still, if the copies are identical, we have no problem.  We can extend the check by comparing checksums:

for JAR in $(find . -name \*.jar | sed 's!.*/!!g' | sort | awk 'BEGIN {OLD="L"} $1 == OLD {print OLD} {OLD = $1}') ; do COUNT=$(find . -name "$JAR" | xargs md5sum | cut -d' ' -f1 | sort -u | wc -l) ; if [ "$COUNT" -gt 1 ] ; then echo "$JAR" "$COUNT" ; fi ; done
bcel-5.1.jar 2
bsf-2.4.0.jar 2
commons-beanutils-1.7.0.jar 2
commons-httpclient-3.0.1.jar 2
commons-logging-1.1.1.jar 2
concurrent-1.3.4.jar 2
jdom-1.0.jar 2
log4j-1.2.14-sources.jar 2

I’ve hand deleted the RHQ ones, which I know are due to my own coding efforts.  Of the eight remaining, it is fairly safe to assume that the differences are an artifact of how the jars were assembled: most likely mere metadata changes, but possibly binary differences in the files as well.

This shows two things.  First, Maven can give you different results for the same jar dependency.  Second, the vast majority of jar files have unique names and could easily be stuffed into a single directory, as JPackage proposes.
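To make the single-directory idea concrete, here is a rough sketch that links every jar in a tree into one flat directory and reports any name whose contents differ rather than overwriting it. This is only an illustration, not JPackage's actual tooling. The function name `flatten_jars` is hypothetical, and it assumes the source path is absolute and the destination lies outside the source tree.

```shell
#!/bin/sh
# Hypothetical sketch: link every jar under a Maven-style tree into one
# flat directory. $1 is the source tree (absolute path, so the symlinks
# resolve), $2 is the flat destination directory (outside the source tree).
flatten_jars() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -name '*.jar' | while IFS= read -r jar; do
        name=${jar##*/}
        if [ ! -e "$dest/$name" ]; then
            # First jar seen with this name: link it into the flat directory.
            ln -s "$jar" "$dest/$name"
        elif ! cmp -s "$jar" "$dest/$name"; then
            # Same name, different bytes: report it and keep the first one.
            echo "conflict: $name"
        fi
    done
}

# Example use:
#   flatten_jars "$HOME/.m2/repository" /tmp/flat-jars
```

Identical copies collapse silently into one link, so only names like the eight above would be reported.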

How about the pom files?

find . -name \*pom | wc -l
3101

[ayoung@ayoung repository]$ find . -name \*.pom | sed 's!.*/!!g' | sort | awk 'BEGIN {OLD="L"} $1 == OLD {print OLD} {OLD = $1}'
bcel-5.1.pom
bsf-2.4.0.pom
bsh-1.3.0.pom
commons-httpclient-2.0.2.pom
commons-httpclient-3.0.1.pom
concurrent-1.3.4.pom
dtdparser-1.21.pom
jaxws-rt-2.1.3.pom
jstl-1.1.2.pom
log4j-1.2.14.pom
servlet-api-2.4.pom
xercesImpl-2.9.1.pom

This is mostly, but not completely, the same set as the Java archives.  It is interesting to note that these are mostly very popular JARs, ones that are used in many projects.  If there are YUM versions of these, we would want to use the canonical YUM versions anyway.  Doing a little more shell magic to pipe the names to yum search shows that there are packages available for these files, albeit not necessarily in these exact versions.
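The exact pipeline is not recorded here, but one way it might look is sketched below. Both function names are hypothetical, and the name-guessing rule (drop the extension, then everything from the first hyphen followed by a digit) is an assumption that will not suit every artifact.

```shell
#!/bin/sh
# Hypothetical helper: guess a package name from a Maven file name by
# dropping the extension and everything from the first "-<digit>" onward.
artifact_name() {
    printf '%s\n' "$1" | sed -e 's/\.[a-z][a-z]*$//' -e 's/-[0-9].*$//'
}

# Hypothetical helper: read file names on stdin and run yum search on each
# guessed package name. Requires yum to be installed.
search_yum() {
    while IFS= read -r file; do
        # Redirect stdin so yum cannot consume the rest of the list.
        yum search "$(artifact_name "$file")" </dev/null
    done
}

# Example use:
#   find . -name '*.pom' | sed 's!.*/!!' | sort | uniq -d | search_yum
```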
