Thursday, August 20, 2009

Self-publishing experiments (Lulu vs. BookSurge)

The jury is still out, but I've decided to run some self-publishing experiments so I can get a sense of what is out there. I've uploaded a book to Lulu (I'm not telling you which one, and it is still a private book, so you can't buy it). Some initial reactions...

  • It is very affordable to self-publish - I'm surprised at how little it costs to print a book wholesale. In fact, if you are not concerned about turnaround time, uploading a PDF to Lulu and ordering a one-off copy of your own book costs about half as much as printing the same document at Kinko's.
  • Both Lulu and BookSurge allow you to use your own ISBNs. If I do start using either service for distribution, I'm assuming that it is preferable to have my own ISBNs... we'll see.
  • Signing up for BookSurge generated an almost immediate call from a sales agent, whereas Lulu is all about self-service. BookSurge also looks like it has more of an upfront sign-up fee. I'm going to try both services over the next year, but I think I'll start with Lulu.
  • I was looking for a 7" x 9" format, which is about one inch wider than US Trade. Lulu doesn't offer this size; the best fit I could get was Royal Quarto at 7.444" x 9.681". I guess that will do, but I'll have to figure out how that plays out. The problem with US Trade size is that the 6" width means I would have to trim code examples to fewer than 80 characters per line. We'll see...
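
To get a feel for how much trimming a narrower page would require, something like the following rough sketch could flag the code lines that run too wide. It is written in Python for brevity. The function name and the 80-character default are assumptions for illustration; the real limit depends on the font and margins, not on anything Lulu specifies.

```python
def overlong_lines(listing: str, max_width: int = 80):
    """Return (line_number, width) pairs for lines wider than max_width.

    max_width is an assumed limit; tune it to the actual page layout.
    """
    return [
        (number, len(line))
        for number, line in enumerate(listing.splitlines(), start=1)
        if len(line) > max_width
    ]


# A made-up listing: the second line is 85 characters wide.
sample = "short line\n" + "x" * 85 + "\nanother short line"
print(overlong_lines(sample))  # [(2, 85)]
```

Running this over every code listing in the manuscript would give a quick list of the examples that need to be rewrapped before choosing a trim size.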

Monday, August 10, 2009

New Version of the Maven Definitive Guide

Edition 0.7 is out. You can read it on Scribd or online at the Sonatype site. This edition saw a marked improvement in the rendering of the PDF version of the book with the move to the newer docbkx plugin and the integration of the fop-images-pdf library to allow direct embedding of PDF vector art as figures.

Maven: The Definitive Guide

Friday, August 7, 2009

couchdb4j is a Case in Point for Git and Maven

Yesterday, I decided it was time to test out some ideas about storing content in CouchDB. I just wanted to get some preliminary numbers on performance, but I also wanted to see how the thing would scale after loading 10 GB worth of data. So, I went about this by...

  1. Downloading couchdbx, which is a one-click distribution of CouchDB for Mac OS X - Once you download this distribution, CouchDB loads as an application and runs on the default port. You download couchdbx, unpack the archive, copy the app to Applications, and run it. Within 2 minutes, you've got a running instance of CouchDB.
  2. Searching for a good Java library... - Right, so even though CouchDB is really all about HTTP and JSON, I still need to find an easy way to call CouchDB from a Java program. I settled on couchdb4j: it has a simple API, and it does everything I need it to do.

At this point, I noticed that couchdb4j has a Git repository at: http://github.com/mbreese/couchdb4j. I cloned the repository to my local system and ran "mvn clean install"... the tests failed. After some investigation, I realized that the tests were failing because I happened to be running the latest release of CouchDB, 0.9.1, and couchdb4j only works with 0.8.
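
A version mismatch like this one can be spotted before running any tests. The sketch below is written in Python for brevity and assumes that the server's root URL returns a small JSON welcome document with a "version" field, which is how I understand CouchDB to behave. The sample document, the function names, and the "supported" tuple are illustrative assumptions, not part of couchdb4j.

```python
import json


def parse_version(welcome_json: str) -> tuple:
    """Extract the version from a welcome document as a tuple of ints.

    Assumes purely numeric dotted versions such as "0.9.1".
    """
    version = json.loads(welcome_json)["version"]
    return tuple(int(part) for part in version.split("."))


def is_supported(version: tuple, supported: tuple = (0, 8)) -> bool:
    """True if the server version starts with the supported major.minor."""
    return version[:len(supported)] == supported


# Assumed shape of the document returned by the server's root URL.
welcome = '{"couchdb": "Welcome", "version": "0.9.1"}'
version = parse_version(welcome)
print(version, is_supported(version))  # (0, 9, 1) False
```

A check like this at the start of a test suite would turn a pile of confusing failures into a single clear message about the unsupported server version.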

Point One: I didn't have to go fishing around the source code to figure out how to build this sucker. It just worked. Even though the tests failed, I was up and running in a few seconds. The presence of a pom.xml file is a signal that I don't have to spend time rifling through someone's custom build. And I now know that I'll be able to make changes to the code easily using m2eclipse.

Once I figured out that I was probably going to have to get into the source code and make some changes to bring this thing up to speed with CouchDB, I could easily fork the couchdb4j project on GitHub and create my own fork at http://github.com/tobrien/couchdb4j/tree/master.

Because the CouchDB project maintains simple API documentation on a Wiki, it was easy to make the appropriate changes to couchdb4j to bring it up to speed with the changes in the CouchDB View API and the CouchDB Document API. I now have a fork of couchdb4j that is 0.9.1 compatible. If other people want to use my changes, they can freely clone my repo, or they can pull in the specific changes I made. I haven't made a pull request, because I don't know if the changes I made are aligned with the interests of the couchdb4j project.

Point Two: Git made it easier for me to make an instant decision to fork and customize to satisfy my own requirements. I didn't have to stop and figure out the community dynamics. Because GitHub makes it so easy to fork an existing repository, I didn't have to ask permission or chime in with "Hey, is anyone interested in XYZ." I scratched my own itch, and if the person who maintains the original project finds my changes interesting, he or she can pull them into the original.

Within about an hour, I had an experiment running with a customized version of couchdb4j that had been upgraded to 0.9.1 compatibility. Because couchdb4j followed the Maven conventions and because they decided to use Git, it was easy.

A "real" letter and an old book

As a co-author of Harnessing Hibernate, I get to hear about some of the feedback the book has received from time to time. James recently shared a letter he received about the book, a real "letter": signed, sealed, and delivered by the United States Postal Service. How archaic? Someone decided to read a computer book, type up a letter, and mail a physical artifact. My initial reaction was, "what kind of crazy jackass would bother with the USPS when our email addresses are listed in the..." But, after some reflection, it certainly worked; it gained someone's attention more so than a simple email. A real letter is much more difficult to ignore than something that shows up in my busy Gmail INBOX, and you have to consider that the person who sent this letter is more invested in his feedback than someone who fires off a random email.

Emails are easy to ignore, and the promises made in an electronic format fade quickly. Committing ink to paper is something else entirely, and you tend to take printed material more seriously than you would an electronic message. Printed letters are much more interesting, and they retain powerful lasting value that emails do not... see this blog post about letters from the recently deceased John Hughes. As effective as it was at gaining my attention, I can't help but return to my initial reaction. Is this someone with an interesting archaic quirk? Or, is this some email-avoiding conspiracy theorist?

One of the pieces of feedback in the letter was that the book was out of date. Of course the book is out of date: bulk-printing 10,000 copies of a 400-page book about a rapidly evolving open source project is certainly the definition of crazy. Unlike other books I've helped to write, Harnessing Hibernate is not free. It doesn't have a life outside of the printed product, and it is getting more and more irrelevant every single day. Harnessing Hibernate is a lost opportunity; if it were an open title, it would quickly gain an audience and a community willing to help develop the content.

Thursday, August 6, 2009

The Next Generation of Health is Genomics-driven Prevention

Steven Pinker has cardiac hypertrophy (or ventricular hypertrophy, also known as athlete's heart). On its own this is a wholly uninteresting fact to most people. It isn't something you'd bring up in normal conversation: "Oh, hey, did you know that the esteemed cognitive scientist Steven Pinker has athlete's heart?"

What makes this revelation remarkable is that he learned this via the Personal Genome Project, and that you can watch him finding out about the diagnosis (at 0:40). The initial PGP-10 took a risk when they decided to publish "anonymized" health data and gene sequences as part of this research. They must have all known that everyone in that group of 10 was guaranteed to lose any notion of health privacy, but they took this bold step to help usher in an era of scientific and medical research that will revolutionize the way we all approach health.

In the same way that individuals like Pinker and Brin learn of genetic predisposition to serious illness, many millions of Americans will learn of similar markers over the next decade. The cost of sequencing a human genome is dropping by a factor of 10 per year, and it currently runs anywhere between $25k and $100k depending on the technology you use. In about two years, the cost of sequencing your entire genome will be on par with the cost of a simple medical test, and that's when the discussion about health care is going to get interesting.
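
The arithmetic behind that two-year estimate is easy to check. The sketch below, in Python, simply assumes the 10x-per-year drop continues unchanged, which is an extrapolation rather than a measured result. The function name is made up for illustration.

```python
def projected_cost(cost_today: float, years: int, drop_factor: float = 10.0) -> float:
    """Cost after `years`, assuming it falls by `drop_factor` every year."""
    return cost_today / (drop_factor ** years)


# Low and high ends of the current price range quoted above.
for cost in (25_000, 100_000):
    print(cost, [projected_cost(cost, year) for year in range(3)])
```

Under that assumption, today's $25k to $100k range lands at roughly $250 to $1,000 after two years, which is the territory of a routine medical test.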

When the cost of sequencing a human's DNA is on par with the cost of a simple blood test, it is going to be difficult to justify not performing this routine "test". We're going to have to have real discussions about the meaning of privacy and ownership (Who owns your sequence?) and we're going to have to answer some fairly unexpected ethical questions. The most important questions we're going to have to answer in the next two years:

  • What would you do with millions of individual sequences?
  • Once you've realized what the benefits of aggregate sequencing are, does the individual's "ownership" of individual DNA sequence data outweigh the benefit to society as a whole?
  • How will knowledge of genetic predisposition affect the individual in a society that is currently focused on the cost of health care?

It'll be interesting, but the one thing to keep in mind is something that Wally Gilbert told me at last year's Scifoo. After having watched some of the discussions between Dr. Fire and Dr. Church, I wanted to get Gilbert's take on the emerging ethical questions of sequencing. His first reaction was to mention the controversy that accompanied his initial success at sequencing, but he cautioned that the general perception of DNA as "predestiny" is not at all accurate. Just because someone has a marker for a serious disease does not at all mean that they will develop the disease.

This is a concept that society will have to grok quickly, because Science is moving faster than most realize.

Unintentionally stirring a bee's nest (by suggesting Maven)

I just had an odd exchange. Someone maintains a great piece of open source software that I totally depend upon. It's a complex beast of a thing, and I wanted to (a) express gratitude and (b) offer some help with the build. You see, the build for this particular system is an Ant build script with a preamble of instructions, and the project itself is this megalith of code in one big src/ directory. Every time I want to use some new component that is in development, I have to download someone's tarball, uncompress the thing, and then fish around for JAR artifacts to upload to central. Some of the JARs that are included have specific Subversion revision numbers in the JAR file name. This makes using the binary artifacts from this particular project a royal pain in the neck.

I follow the project and I'm invested in the code, so I thought it might make sense to *ask* if there was *any* interest in migrating the build to Maven, on the grounds that it might make it easier for people to contribute and participate. Now note, I didn't volunteer to switch the build to Maven; I simply "asked if there was any interest", and not even on a development list. I asked one of the main contributors directly because I didn't want to ruffle feathers.

What I got in return was this total tirade against Maven. How it isn't flexible enough, how this particular person wanted to "send an invoice" to whoever was responsible for Maven because he had wasted so much time on it. Ending with the quote: "If our not using Maven as a build system is a problem for those who do then it's not our problem but the problem of Maven for not being flexible enough." In other words, my very diplomatic inquiry was met with "#$@! off".

Not "it's been a while since I've looked at Maven, here are the problems I had, if you can get it working alongside this Ant build, be my guest...". Or even, "No, I'm not interested in that. I had some problems in the past, and I don't think it makes sense to distract from the current development." I didn't even get a chance to make an argument on the "merits". So here it is...

The Argument

  1. If I can't go to your website and figure out how to check out the source code and build it in 5 minutes, your project is a pain in the neck to contribute to. The casual contributor has no incentive to learn how your build system works.
  2. If, in order to use your library, I have to go download some archive, unpack it and then futz around with JAR artifacts, your project is a PITA to use.
  3. I don't even care if you use Maven; all I really care about is that you publish regular SNAPSHOT builds to a repository. When I see some development release of a JAR that is wrapped in a tarball, contains a README file, and is bundled with other JARs, I end up having to upload all this noise to a repository manager, crafting my own groupIds out of thin air.
  4. Yes, I understand, you hate Maven because it called you a bad name two years ago, and you didn't know anyone qualified at the time to help you. If you had problems in the past, it was probably because you didn't understand the tool. No offense. I can probably help out there.
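
To make the "noise" in point 3 concrete, here is a rough sketch, in Python for brevity, of the kind of glue this forces on users: turning a loose JAR with a revision number in its file name into a deploy:deploy-file invocation for a repository manager. The file-name pattern, the groupId, the repository URL, and the repository id are all made up for illustration. Only the deploy:deploy-file goal and its parameters come from the Maven Deploy Plugin.

```python
import re

# Assumed naming scheme: <artifactId>-<version>.jar, where the version is a
# number or a Subversion-style revision such as r4521.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>r?\d[\w.-]*)\.jar$")


def deploy_command(jar_name: str, group_id: str, repo_url: str, repo_id: str) -> list:
    """Build the argument list for uploading one loose JAR to a repository."""
    match = JAR_NAME.match(jar_name)
    if match is None:
        raise ValueError(f"cannot derive coordinates from {jar_name!r}")
    return [
        "mvn", "deploy:deploy-file",
        f"-Dfile={jar_name}",
        f"-DgroupId={group_id}",  # invented by the user, not the project
        f"-DartifactId={match['artifact']}",
        f"-Dversion={match['version']}",
        "-Dpackaging=jar",
        f"-Durl={repo_url}",
        f"-DrepositoryId={repo_id}",
    ]


# Hypothetical JAR name, groupId, and repository manager location.
print(" ".join(deploy_command(
    "widget-core-r4521.jar",
    "org.example.thirdparty",
    "http://localhost:8081/repository/thirdparty",
    "thirdparty",
)))
```

Every user of the library ends up guessing at the same coordinates independently, which is exactly the work a project avoids for everyone by publishing SNAPSHOT builds itself.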

If you disagree with some of the assumptions, that's another matter.

Tuesday, August 4, 2009

More Musing on the Maven User List

Very often someone shows up on the Maven User list with a question along the lines of: "How can I get Maven to compile my project using dependencies? Your help is very much appreciated." I usually translate this into: "I'm a student who has just been asked to use Maven and I want you to do my homework." Or worse, "My boss wants me to learn Maven, will you learn it for me and after you are done send me detailed instructions. I am too lazy to read the free book." These emails are obvious; you can see them a mile away... very often they go ignored for days and days. Everyone who sticks around this list knows that you just don't engage these people. It just encourages them to come back with more questions that could have been answered with a simple use of Google.

Monday, August 3, 2009

Maven needs more opinion...

When I hear that someone has blogged about some general Maven hatred, I cringe and expect to read a post that consists of 30% incorrect assumptions about how Maven should be used, 50% ignorance of the most basic concepts, and 20% truth. What can be done:
  • The Maven Users list needs to become a bit more opinionated for first-time users. If someone enters the discussion with the following:

    "I'm attempting to publish a directory full of JARs to my local repository using the Install plugin."


    The first reaction should be, "No, use a repository manager." Not, "Let me find seventeen ways to help you do a series of backflips to get Maven to do something it was never intended to do." Maven isn't the general Swiss-army knife tool that many approach it as. While it *can* be made to do anything you want it to do, there are core assumptions that shouldn't be challenged. Half of the criticism we deal with is from people who were never told: "Don't try this, don't try to use Maven for this."

  • The Maven community needs a better FAQ. I'm of the opinion that the lack of a good FAQ is directly related to the difficulty of the APT format. If Maven had a better FAQ, we'd have fewer people approaching it with the wrong assumptions.