Tuesday, February 24, 2009

O'Reilly Talks Open Publishing


O'Reilly Tools of Change 2009: Tim O'Reilly makes the argument for Open Publishing from Open Publishing Lab @ RIT on Vimeo.

Interview: Carl Malamud's Grassroots Campaign for Public Printer of the United States

I just published an interview with Carl Malamud. Malamud has been working in gov't transparency forever (since I was in middle school in the 1980s). He is campaigning to become Public Printer of the United States, following in the footsteps of Augustus E. Giegengack, who lobbied for his own appointment to the same office under the FDR administration. If you don't know about Carl Malamud's work, take a look at public.resource.org. He has an impressive record, and I believe he's the right guy for the job. Malamud as Public Printer of the United States would help bring real change to the Federal Government.

Listen to my Interview with Carl Malamud on O'Reilly Broadcast

Sunday, February 22, 2009

CJCOOK: 0.14: Bulk of Jakarta References Removed

This is a quick follow-up to release 0.13. Most of the references to Jakarta are now removed from the book. Some projects, like the now-defunct Slide or the very inactive ORO, are still in Jakarta. Projects like Lucene and HttpClient, which were both part of Jakarta when the original book was published, have been updated, and I'm pointing to the project sites for Apache Lucene and Apache HttpClient (or is it HttpComponents? See a later update.)

Read Commons Java Cookbook Release 0.14 Online

Next steps:

  1. Changing all internal link elements to xrefs and getting rid of hard-coded section references. For some reason, the XML I got back from O'Reilly has hard-coded text in link elements that reference section numbers. So if you see a reference to Section 1.4, click on it, and end up looking at Section 1.2, this is because I must have deleted a section. DocBook has the facility to generate XRef text at render time so I have to modify all the internal link elements to xrefs and set the label style.
  2. Write a Unit Test that compares the text of a ulink element with the target URL. If the text of a ulink begins with "http://", I need to make sure that the anchor tag is going to link to the exact same URL. There are already a few examples in the book where there are inconsistencies between the URL that is printed and the URL that is being linked to.
  3. Write a Unit Test that tests all ulinks for 404s. This is important for consistency. I need to figure out a way to do this automatically for other books as well, so I'm thinking about a way to do this in a Maven plugin.
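
The check described in step 2 could be sketched roughly as follows. This is only an illustration in Python, not the test that ships with the book: the sample fragment, its URLs, and the function name find_mismatched_ulinks are all made up for the example, and it assumes DocBook 4 style ulink elements with a url attribute and no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook fragment; the URLs are invented for illustration.
SAMPLE = """<para>
  See <ulink url="http://commons.apache.org">http://commons.apache.org</ulink>,
  <ulink url="http://commons.apache.org/lang/">http://jakarta.apache.org/commons/lang/</ulink>,
  and the <ulink url="http://lucene.apache.org">Lucene site</ulink>.
</para>"""


def find_mismatched_ulinks(docbook_xml):
    """Return (printed_text, url) pairs where a printed URL differs from its target."""
    root = ET.fromstring(docbook_xml)
    mismatches = []
    for ulink in root.iter("ulink"):
        text = "".join(ulink.itertext()).strip()
        url = ulink.get("url", "")
        # Only ulinks whose visible text is itself a URL are compared.
        if text.startswith("http://") and text != url:
            mismatches.append((text, url))
    return mismatches


for text, url in find_mismatched_ulinks(SAMPLE):
    print(f"printed {text} but links to {url}")
```

A ulink with descriptive text (like "Lucene site" above) is skipped on purpose, since there is no printed URL to compare against.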

Long Term:

I can't tell if the hc.apache.org project has actually released 4.0 or if we're stuck in a perpetual beta. I don't want to update the book to talk about httpcore until I'm sure it isn't going to change. I do know that I have to circle back at some point and:

  1. Update the Http/DAV chapter: Change it to reference the new Http Components Project
  2. Update the Http/DAV chapter: Change the sections that discuss Jakarta Slide to reference Apache Jackrabbit
  3. Update the Lucene Samples to use the Latest Release: Right now this part of the book references version 1.9.1 because there was an API change wrt configuring the Document.

Issue Tracker

I need an issue tracker for this project, and I can't decide between Trac and JIRA. On the one hand, I like Trac's simplicity, but, on the other hand, I'm pretty sure I could benefit from some of the subtask stuff that is available in JIRA. Decisions, decisions, decisions....

Saturday, February 21, 2009

CJCOOK: Removing References to Jakarta (up to ch5)

If you haven't noticed, Jakarta was dismantled. It now contains a shell of projects which are largely inactive. The community which was Jakarta once contained hundreds of committers and was the center of open source Java. It was Jakarta that produced Ant, Maven, Struts, Log4J, Lucene... among others. While Jakarta was an interesting crucible for innovation, it didn't scale very well and there was an endless series of flamewars and management problems due to the fact that it was just too large for its own good. Sometime in 2003 or 2004, a decision was made to split Jakarta into separate TLPs, and Commons eventually moved to Apache Commons (http://commons.apache.org).

Release 0.13 of Common Java Cookbook updates everything up to Chapter 5, removing references to "Jakarta Commons" components in favor of "Apache Commons". Where a project was called "Jakarta Commons Collections" it is now called "Commons Collections".

Next step: I'll follow this release up with a 0.14 that removes all of the references to Jakarta XYZ.

CJCOOK: Updated All Component Versions

I did a quick pass over Common Java Cookbook to update some of the version numbers. Current release version is now 0.12, and you can expect a 0.13 release on Monday that is going to remove most of the references to "Jakarta". This book uses the following versions of components:

Component               Version   Notes
Commons Beanutils       1.8.0
Commons Collections     3.2.1
Commons Digester        1.8
Commons HttpClient      3.1       Will update to 4 as soon as the HttpCore stuff is released
Commons JEXL            1.1
Commons JXPath          1.3
Commons Lang            2.4
Commons Logging         1.0.4
Log4J                   1.2.15
Commons CLI             1.1
Commons Configuration   1.6
Commons IO              1.4
Commons Math            1.2
Commons Net             2.0
Velocity                1.6.1
Slide                   2.1       We're replacing this with Jackrabbit
Freemarker              2.3.15
Commons Betwixt         0.8
Lucene                  1.9.1     We need to upgrade this item.

Friday, February 20, 2009

Common Java Cookbook: 0.11: Updated Lang and App Infra Component Versions

Chapter 1 and Chapter 7: I updated the sections that tell you how to download the components and changed them to include the dependency XML for a pom.xml. (This is in line with my agenda to push traffic to Maven: The Definitive Guide... :-) )

Open Source Writing: Part I: A Few Problems with Publishing...

If you are just tuning in, Common Java Cookbook is an experiment in transparent, open writing. I'm trying to develop this book and make frequent releases every one to three days. The idea behind this book is that open source writing should be no different than open source software. This is the first post in a series that explores some of the reasons why I've decided to commit myself to open, transparent writing. This post focuses on the problem: What is wrong with the current approach to computer "books"? What is wrong with the current relationship between the author and the publisher?

Problem: Driven by the Physical Artifact

While most writing projects are governed by the limitations of the book as a physical artifact, books like Maven: The Definitive Guide and Common Java Cookbook choose to fully embrace the idea that a book is electronic documentation, unaffected by the constraints introduced by the printing process. Most programming books you encounter today have a practical deadline after which no changes are introduced. In other words, if you are writing a book that needs to be printed in lots of five thousand and shipped to book stores, your process is always affected by the idea of the book as a static, physical object. You have to "finish" the book by a set deadline. Updating and radical changes to a book which has already been printed tend to decrease book, and (quite often) the original authors retain no rights for redistribution online.

This attachment to the physical object is driven by the economic realities of the publishing industry, but it creates an odd situation when you are writing about a rapidly moving open source project. There is a large disconnect between how we develop open source software and how we write books about open source software. Successful open source projects usually don't have a set release date; software like Maven is released when it is ready. Imagine how awful open source would be if everyone had to run around like headless chickens to cut a CD for something like Apache HTTPD. Imagine if a Maven release vote were prefaced with "People, if we don't send the Maven ZIP file to the CD factory by next week, they might cancel our contract. Can I get three +1 votes, now." It just seems odd that we have to dance around publisher deadlines when we are writing books about collaborative, unpredictable, schedule-less open source projects.

Problem: Deteriorating Economic Model

Take, as an example, the Jakarta Commons Cookbook. I wrote this book between 2002 and 2003, and I probably invested about an entire year in the effort. It was my first book, so progress was very, very slow. The book was published, and I felt great about the process. I think every first-time author has this initial excitement about having published a book. I didn't write the book for acclaim; I wrote it because it was my way of giving back to the community. A year passes, and you get the sales figures back and you, the naive author, are impressed that five thousand people bought the book. You get a flood of email from people who have read the book: maybe 10% are fuming mad at typos and the other 90% are just happy to have read the book.

The publisher has a totally different view. 5,000 copies is actually viewed as a quarter success; the publisher would have liked to sell 10,000. While you feel great about the idea of a community of 5,000, the publisher is lukewarm about the idea of printing a second edition. Right right right, 5,000 is a loser? Visualize 5,000 people in a line all holding $20.... If that's a failure, if that doesn't justify a second printing, then something is wrong with the model. These days, publishers don't like to commit to books that are not going to move a significant number of copies. It is becoming more and more difficult to sell a good book to a publisher because, as the open source world continues to evolve, every topic becomes a niche topic with a limited audience.

Problem: Where's my community....

When you sell 5,000 copies of a book, you certainly get feedback both good and bad... But, you don't get the customer relationships. You don't get a chance to interact, and you certainly don't establish any sort of persistent HTTP 1.1 connection with your readership. Publishers provide some tools to enable this support: forums, blogs, etc. If you've grown used to the "intimacy" and unstructured creative anarchy of open source communities, you'll feel a bit stifled. Efforts like Jono Bacon's The Art of Community are an attempt to address this, and publishers like Pragmatic have done a good job of creating that sense of community... But, as an author, you will want to either create that community yourself or (better yet) integrate that community with the community that has already developed around the project you are supporting. Publishers serve an important curation function: they provide the necessary work to ensure that the book meets the production standards that have come to be expected in a book, but they often don't do a great job organizing a community. Just like an open source project manages software production, I think authors and open source projects should manage a community of readers. Publishers used to be a necessary intermediary, but as the importance of the book as a physical artifact continues to decrease, I think we're going to see authors take more initiative and publish works online.

Tuesday, February 17, 2009

Common Java Cookbook: Release 0.10: Book Examples ZIP

Release 0.10 - the book's example project is now published as a ZIP file. There are some missing resource files that I will try to address in the next few releases, but all of the Java source code that accompanies the book is now available as a Maven 2 project. (For more information about Maven 2, see Maven: The Definitive Guide.)

  1. Go to the Online Common Java Cookbook.
  2. Click on "Download Book Examples" in the upper right-hand corner of the page header.

Unless someone gives me a good reason to, I'm not going to publish a "tar.gz" or "tar.bz2" in addition to the "zip" archive. Every machine I use, regardless of OS, has some utility that can unzip a ZIP archive: Linux, OSX, Windows. Does anyone have a compelling reason for me to publish a bunch of different archive formats?

What happened to release 0.9, you ask? Nothing. I mistakenly deployed the 0.9 release only to realize that I hadn't added the examples file. Since Maven makes it so easy to cut another release, I simply reran the release process to fix the issue.

Monday, February 16, 2009

Common Java Cookbook: Release 0.8: Working External Links

Alright, two releases in a day. Earlier this evening I published version 0.7, which made sure that we have a reasonable set of public URLs and consistent internal links. Now, I'm unleashing version 0.8, which enables external links. For some reason, the DocBook O'Reilly sent me had used systemitem instead of ulink. This meant that URLs like http://jakarta.apache.org were simply being printed on the page without being rendered as links. Never mind the fact that some of these URLs no longer exist; we're slowly getting the book to a baseline from which the content will start to rapidly evolve into something new.

Read Common Java Cookbook Online

Next steps: Not sure. Either I'm going to figure out how to olink to Maven: The Definitive Guide or I'm going to start removing references to Jakarta. We'll see what tomorrow brings.
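
For anyone curious what the systemitem-to-ulink fix might look like mechanically, here is a rough Python sketch. It is not the process actually used for this release: the sample fragment, the role attribute on it, and the function name systemitems_to_ulinks are assumptions for illustration, and it only handles systemitem elements whose text is a bare "http://" URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the role attribute is an assumption, not taken from the book's XML.
SAMPLE = (
    '<para>Visit <systemitem role="url">http://jakarta.apache.org</systemitem> '
    "or connect to <systemitem>localhost</systemitem>.</para>"
)


def systemitems_to_ulinks(docbook_xml):
    """Turn URL-valued systemitem elements into ulink elements.

    Returns the rewritten XML string and the number of elements converted.
    """
    root = ET.fromstring(docbook_xml)
    converted = 0
    for item in root.iter("systemitem"):
        text = (item.text or "").strip()
        if text.startswith("http://"):
            item.tag = "ulink"
            item.attrib.clear()    # drop systemitem-only attributes such as role
            item.set("url", text)  # the printed URL becomes the link target
            converted += 1
    return ET.tostring(root, encoding="unicode"), converted


new_xml, count = systemitems_to_ulinks(SAMPLE)
print(count, "systemitem element(s) converted")
print(new_xml)
```

Non-URL systemitem elements (like "localhost" above) are left alone, since systemitem is still the right markup for them.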

Common Java Cookbook: Release 0.7: Fixed ID References

Alright, this is a very simple, technical release that fixes ID references in the book. I was working with the pre-print DocBook XML from O'Reilly which contained some unreadable IDs for sections. A section about reversing a String with StringUtils had the ID "jakarta-ckbk-CH-1-SECT-6", and a chapter on Commons Collections would have the ID "jakarta-ckbk-CH-4". I'm a big believer in making the IDs of sections human readable, and I'm also against encoding section numbers into the ID for a section. In this release I assigned more reasonable identifiers to chapters and sections.

Read the Common Java Cookbook Online

This also has the side-effect of making the URLs for the book more comprehensible for users (not to mention analytics). IMO, URLs should give some idea of content: "/books/cjcook/reference/jakarta-ckbk-CH-5-SECT-15.html" seems useless to me, whereas "/books/cjcook/reference/functors-sect-iterating.html" seems more reasonable. One of the lessons I learned from the Maven book was that you should try to solidify your URLs early on in the process and introduce as few changes as possible. A common naming strategy for sections and chapters is going to make it easy to craft rewrite rules as structural changes are introduced to the text (and, trust me, structural changes are going to happen to this book). Really, I'm trying to avoid ever having to figure out what an ID like "jakarta-ckbk-CH-5-SECT-15" means and then, worse, trying to craft a meaningful rewrite rule when Chapters are rearranged.
Common Java Cookbook: 0.7: Consistent Internal ID References from Tim O'Brien on Vimeo.
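
To make the rewrite-rule point concrete, here is a small Python sketch of the kind of old-URL-to-new-URL lookup that readable IDs make easy to maintain. It is purely illustrative: the ID_RENAMES table, the redirect_target function, and the pairing of the two example IDs are assumptions, not the actual mapping or server configuration behind the book.

```python
# Hypothetical rename table; pairing these two IDs is an assumption for illustration.
ID_RENAMES = {
    "jakarta-ckbk-CH-5-SECT-15": "functors-sect-iterating",
}

BASE = "/books/cjcook/reference/"
SUFFIX = ".html"


def redirect_target(path):
    """Return the new path for an old-style URL, or None if no redirect applies."""
    if not (path.startswith(BASE) and path.endswith(SUFFIX)):
        return None
    old_id = path[len(BASE):-len(SUFFIX)]
    new_id = ID_RENAMES.get(old_id)
    if new_id is None:
        return None
    return BASE + new_id + SUFFIX


old = "/books/cjcook/reference/jakarta-ckbk-CH-5-SECT-15.html"
print(old, "->", redirect_target(old))
```

With content-based IDs, an entry only needs to change when a section is actually renamed, not every time chapters are reordered.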

Common Java Cookbook Arrives

Hello, I'm updating the Jakarta Commons Cookbook, but it isn't as simple as a refresh of the content. I'm rereleasing the content as an open book and I'm going to be expanding the boundaries of this book over the next few months. Go ahead, read Common Java Cookbook.
Common Java Cookbook: Hello World from Tim O'Brien on Vimeo.

Stay tuned.

Tuesday, February 10, 2009

Open Source Application Ideas

I floated an idea to Mike Loukides a few weeks back of people posting ideas for applications and systems in a public forum - An Open Idea Exchange. Ideas that they would like to see implemented or that they themselves are considering as possibly interesting projects. Maybe there would be a site somewhere called "IdeaExchange" (what an awful name), but the idea is that there is some sort of social site where people communicate and collaborate on a set of open-source "ideas". Whether the same set of people would be actively involved in creating an open source implementation of an idea, who knows? Maybe the site would be limited to the development of ideas, that's it. If some entrepreneur wanted to come along, take one of these ideas off the shelf, and implement it, making a "bazillion" dollars in the process.... great. The goal of the site isn't to make money; the goal of the site is to develop ideas for applications in a transparent and open forum. "Open Source Application Ideas".

A Reaction to "Startup Culture"

It still doesn't make any sense to you, does it? I'll take a step back and describe the problem I've noticed with "startup culture". Part of the problem with the "startup culture" in America is that it revolves around the idea of "protected" intellectual property. Companies and individuals never have an incentive to share an idea with a community and develop ideas in a way that would benefit from the collective input of a group of users. All too often, a great idea is hatched in someone's garage in Mountain View, this idea is kept secret, maybe the idea is developed a bit and turns into the seed of a company. More often than not, these ideas decay into nothing more than isolated bursts of genius. Maybe an engineer has a great idea for a game-changing way to index and search images, but they don't have enough time to take a few weeks off from work to implement the idea and nothing comes of the idea. Very rarely, a good idea turns into some poorly named startup with more money than is reasonable and no rational path toward profitability. This is the problem I'm noticing: too many times I've seen great ideas in the form of over-hyped, all-but-doomed, TechCrunch-nominated startups and I've wondered if there is a better way for a community of users and participants to come up with good ideas for applications.

Open Source Beyond Software

I'd like to find a way to start a community of developers and engineers who have little interest other than seeing good ideas realized without getting distracted by commerce or commercialization. Does anyone else want to see something like this? A place where people can discuss and develop ideas for the applications they want to see implemented? I understand that this is a Utopian and somewhat idealistic idea, but I'll be damned if I'm going to sit around and wait for a collection of poorly named Silicon Valley startups to take good ideas and turn them into profitless, proprietary solutions that make someone else a ton of money between the time they get an initial investment of capital and the time they invariably fail.