when I'm not skateboarding: December 2009

Wednesday, December 16, 2009

JavaOneRadio interview 2009

Was just going through some photos from earlier this year. Here's a shot of the interview I did on JavaOneRadio about location-based services and my work at deCarta.

Sunday, December 13, 2009

Nerds!

Saturday, December 12, 2009

a==1 or 1==a

I was in one of Dave Nagle's lectures at Carnegie Mellon in which he explained that he'd once spent an inordinate amount of time on a bug hunt. The cause: 'a=1' ... a typo. He meant to type 'a==1'. To ensure he'd never suffer that pain again, he coded all his equality comparisons with the literal on the left-hand side, like this: '1==a'. This prevents the typo '1=a' because the compiler won't allow an assignment to a literal.

After going on the exact same bug hunt myself, many years ago, I kicked myself for not following his advice. I've coded all my Boolean expressions in the "literal on the left" style ever since, and the compiler has saved me many a bug hunt.
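
Here is a minimal sketch of the habit in Java. One caveat: Java already rejects 'if (a = 1)' when a is an int, because the condition must be a boolean, so the sketch uses a boolean flag, where the typo still compiles. The class and method names (LiteralLeftDemo, withTypo, literalLeft) are made up for illustration.

```java
// Sketch only: names are hypothetical, not from any library.
class LiteralLeftDemo {
    // The typo: '=' instead of '=='. This compiles, assigns true to flag,
    // and the branch is always taken regardless of the argument.
    static String withTypo(boolean flag) {
        if (flag = true) {
            return "taken";
        }
        return "skipped";
    }

    // Literal on the left. The same typo here, 'true = flag', would be
    // rejected by the compiler because a literal cannot be assigned to.
    static String literalLeft(boolean flag) {
        if (true == flag) {
            return "taken";
        }
        return "skipped";
    }

    public static void main(String[] args) {
        System.out.println(withTypo(false));    // prints "taken"
        System.out.println(literalLeft(false)); // prints "skipped"
    }
}
```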

To that paradigm I now have a small addition to make. I've started coding my string equals comparisons in literal-left style as follows: "something".equals(a)

This has the nice effect of warding off null pointer exceptions when a is null. I think this literal-left style is better than (a+"").equals("something"), which is another trick for warding off null values of a.
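
A small sketch comparing the two tricks, with made-up names (NullSafeEqualsDemo, literalLeft, concatTrick). Both survive a null, but the concatenation trick turns null into the string "null", so a null reference and the text "null" become indistinguishable, and it builds a throwaway string on every call.

```java
// Sketch only: names are hypothetical.
class NullSafeEqualsDemo {
    // Literal on the left: equals(null) simply returns false.
    static boolean literalLeft(String a) {
        return "something".equals(a);
    }

    // Concatenation trick: a null a becomes the string "null" first.
    static boolean concatTrick(String a) {
        return (a + "").equals("something");
    }

    public static void main(String[] args) {
        String a = null;
        System.out.println(literalLeft(a));            // prints "false"
        System.out.println(concatTrick(a));            // prints "false"
        System.out.println((a + "").equals("null"));   // prints "true"
        try {
            a.equals("something");                     // variable on the left
        } catch (NullPointerException e) {
            System.out.println("NullPointerException"); // this line is reached
        }
    }
}
```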

Friday, December 11, 2009

Keep the hot air flowing

This is a funny satire of the climate change talks..."now powered exclusively by wind".

Saturday, December 05, 2009

Keep it on the DL

I came across what has got to be the most rarely used HTML element. Ever heard of <dl>? Me neither. It stands for "Definition List". Inside a DL you put Definition Terms (DT), and after each DT you put its Definition Description (DD). The DT and DD are siblings inside the DL, not nested in each other. It actually is a very handy way to mark up key/value pairs, such as "firstname: bob" or "age: 34". So if you've got some kind of Object and you want to mark up its fields, give the DL a try.
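
As a rough sketch of marking up an object's fields this way, the Java below renders a map of field names and values as a definition list, with each dt followed by its dd as siblings inside the dl. The names (DefinitionListDemo, toDl, escape) are invented for this example, and the escaping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: names are hypothetical.
class DefinitionListDemo {
    // Minimal escaping so field values cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Each key becomes a <dt>, each value the <dd> that follows it.
    static String toDl(Map<String, String> fields) {
        StringBuilder html = new StringBuilder("<dl>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            html.append("<dt>").append(escape(field.getKey())).append("</dt>");
            html.append("<dd>").append(escape(field.getValue())).append("</dd>");
        }
        return html.append("</dl>").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> person = new LinkedHashMap<>();
        person.put("firstname", "bob");
        person.put("age", "34");
        System.out.println(toDl(person));
        // prints <dl><dt>firstname</dt><dd>bob</dd><dt>age</dt><dd>34</dd></dl>
    }
}
```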

Thursday, December 03, 2009

JTidy and handling HTML embedded in database strings

I've been dealing with an interesting issue. A customer's app needs to store documents created by end users. The documents are really just paragraphs created by the end user, using the YUI Rich Text/HTML editor. The user might also use the "plain text" editing mode, in which case they would actually enter tags like <h1> manually. With this freedom comes the inevitable possibility that they will enter invalid HTML. Since these documents will be stored in NextDB, the user might also apply an XSLT transformation over the default HTML presentation of the result set.

The problem is that if the user enters invalid XHTML, like they just throw <blort> into their document, then the XSLT will crash. The problem is made gnarlier by the fact that the documents themselves are likely to be fragments (and in fact they have to be fragments, so that when they appear in the context of the default HTML presentation they get treated as a valid portion of the overall document). In other words, if the user puts <body> tags around their document, you have to strip those body tags off, so that the default HTML presentation doesn't have a nested body.

Enter JTidy. The JTidy library is all about handling crapped-up HTML, and fixing it on the fly. The journey to getting JTidy working was longer than I expected due to an unpleasant interaction with Maven. Despite the fact that I had the most recent version of JTidy on my NetBeans project's classpath, when I used Maven to start up Jetty using the Maven Jetty plugin, I would get runtime errors complaining that the method I was trying to call (tidy.setPrintBodyOnly(true);) didn't exist. So, like any good bug hunt, the fun began. I knew that an old Tidy.class was somehow sneaking into my runtime classpath. The first place I looked was my local .m2 repository -- no joy. It finally occurred to me that this must be an internal inclusion of an old Tidy jar by Maven. When I ran 'grep -r Tidy.class' on my Maven directory, I found that Maven's 'uber jar' (maven-core-2.0.7-uber.jar) did in fact contain the older version of JTidy. Turns out if you look at the internal dependencies for Maven, you find that it depends on an old version of JTidy. So I unzipped the uber-jar, replaced all the JTidy classes with the latest and greatest, rezipped the jar, and ...badabing badaboom...problem solved. Bing bang boom, very good have a drink.
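
For this kind of hunt, one way to see which jar a class is really being loaded from at runtime is to ask the class for its code source. This is a minimal sketch, not what was done above: the names ClassOriginDemo and whereIs are invented, and it assumes no security manager blocks the call.

```java
import java.security.CodeSource;

// Sketch only: names are hypothetical.
class ClassOriginDemo {
    // Returns the location (jar or directory) a class was loaded from,
    // or a placeholder when the JVM reports no code source for it.
    static String whereIs(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported, typical for JDK classes)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(whereIs(ClassOriginDemo.class));
        System.out.println(whereIs(String.class));
    }
}
```

Calling whereIs(org.w3c.tidy.Tidy.class) from inside the running web app, with JTidy on the classpath, would show whether the class came from the project's jar or from somewhere unexpected.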

The actual coding took less than 2 minutes. Argghh, talk about an 80/20 rule. Anyhow, JTidy does exactly what I want: it fixes any crufted HTML that the user might enter, and then extracts only the content of the body element (and even better, if the user doesn't include a body, JTidy fixes that first). So it all boils down to this (just to print the tidied content):

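import java.io.ByteArrayInputStream;
import org.w3c.tidy.Tidy;
Tidy tidy = new Tidy();
tidy.setXHTML(true);          // emit XHTML output
tidy.setPrintBodyOnly(true);  // print only the contents of the body element
// val + "" turns a null val into the string "null" rather than throwing
tidy.parse(new ByteArrayInputStream((val + "").getBytes()), System.out);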

Epilogue: maybe blogger.com should start using jQuery. Ironically, this rich text editor threw up on some tags I typed into this post, and I had to spend a few minutes cleaning up the mess!!!
