Sunday, August 27, 2006

Questions from the Wicket world

Here are some answers to questions that came up on the Wicket user list about using Shades with the Wicket "library example".

1) how to do concise queries:

Usually the query is defined in the ORMDictionary. In
the example I showed, I defined the query on the fly.
Most of the time you would just grab a query from the
dictionary. So it's this concise:

dbSess.setParameter("author", "william gibson");
dbSess.executeQuery(conn, dict.getQuery("q-by-auth"));

2) how to do arbitrarily complex joins:

There is no limit on how deep or complex the joins can
be with Shades. Here is an example where we set up a
query to find all books published by whatever
publisher publishes the works of William Gibson.
(Again, bear in mind that usually we would just grab
this query from the dictionary; I am only defining it
inline to be illustrative.)

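// Join a book to its author and restrict the author to William Gibson.
aBook.relatedTo(anAuthor, "book->author");
aBook.where("AUTHOR like 'william gibson'");
// Join that book to its publisher, then a second book candidate to the same publisher.
aBook.relatedTo(aPublisher, "book<->publisher");
anotherBook.relatedTo(aPublisher, "book<->publisher");
// Fetch the second book candidate.
query.setFetchGroups(anotherBook);
dbSess.executeQuery(conn, query);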

3) performance

Ahhh, I am especially stoked on this. Shades uses
batch updates and the other tricks common to ORMs. I
learned most of these tricks doing JDOMax. I think the
biggest reason Shades has good performance is that
the codebase is so tight. When it comes to defining
queries on the fly, Shades should be faster than any
framework that compiles a query language (for example
JDOQL), because Shades bypasses the parse phase and the
translation of the AST from JDOQL to SQL. Shades does
query caching like any other ORM.

Saturday, August 26, 2006

I'm a lover not a fighter

I've been asked, rightly so, if Shades is 'better' than Hibernate or JDO implementations or EJB 3. The answer is no. It is not better. But it is different. It's also new, so it doubtless has too many bugs to even dream of playing in the same spaces as the excellent products like Kodo, Cocobase, and others, that implement a variety of standards.

What I want to say is that I didn't write Shades for anyone else's benefit. I wrote Shades because software, for me, is fun to write. It relaxes my mind after a long day at work, and lets me play with puzzles and art all rolled into one. It's not a competition, especially since I don't get paid!

Queries

Shades has an interesting way of doing queries. It's based on the idea of a RecordCandidate. I find it much easier to explain this using code than with a paragraph.

The following code returns all the Books in a List:

ORMDictionary dict = MyORMDict.getInstance();
Query query = QueryFactory.newQuery(dict);
RecordCandidate aBook = query.candidate(dict.getORM("BOOK"), "aBook");
RecordSet rs = dbSession.executeQuery(jdbcConn, query);
List books = new ArrayList();
rs.populateList(books, Book.class);

There are several cool things:
1) The books got put into MY list (not a proxy List created by the data access framework).
2) There is no query language. Shades uses a new form of query-by-example. In the example
above the query is told what to retrieve by requesting a candidate.
3) I passed the jdbcConnection into the query (This means I control transactions using the JDBC transaction model).

Why did we request a RecordCandidate if we didn't use it for anything? Well OK then, let's use it for something. In the query below we retrieve only the books authored by William Gibson.

ORMDictionary dict = MyORMDict.getInstance();
Query query = QueryFactory.newQuery(dict);
RecordCandidate aBook = query.candidate(dict.getORM("BOOK"), "aBook");
RecordCandidate anAuthor = query.candidate(dict.getORM("AUTHOR"), "anAuthor");
anAuthor.where("NAME like 'william Gibson'");
aBook.relatedTo(anAuthor, "book->author");
RecordSet rs = dbSession.executeQuery(jdbcConn, query);
List books = new ArrayList();
rs.populateList(books, Book.class);

Shades lets you bust into a query and insert your own SQL. You can see that on the line above that says "anAuthor.where...".
Dangerous? Maybe, but fear not. Shades encourages you to encapsulate your queries inside the ORMDictionary, and to parameterize them. Once you've done that, there is no hint of SQL in the code, which subsequently looks like this:

ORMDictionary dict = MyORMDict.getInstance();
dbSession.setParameter("authorName", "william gibson");
RecordSet rs = dbSession.executeQuery(jdbcConn, dict.getQuery("query:book-by-auth"));
List books = new ArrayList();
rs.populateList(books, Book.class);


Here is another very cool thing about Shades queries. They, by default, return ALL the candidates that participate in the relationship graph. So in fact, you can retrieve the Book AND its author from the RecordSet, like this:

Book book = new Book();
Author author = new Author();
while (rs.hasNext()) {
    rs.populate(book, aBook);
    rs.populate(author, anAuthor);
    System.out.println(book.title + " was authored by " + author.name);
}

Monday, August 21, 2006

cool things about shades

One of the most confusing aspects of persistence frameworks is persistent identity. Equals and hashCode typically must be implemented so that Object equality amounts to a comparison between identity columns of the persistent instance. Since datastore identity is often implemented using autoincrementing primary keys, this leads to a framework dilemma: to expose datastore identity in the pojo, or not. Some persistence frameworks force the pojo to hold onto the PK. Other frameworks use bytecode instrumentation to make it transparent.

The situation is arguably simpler for "Object Identity". It is assumed that "Object Identity", as opposed to datastore identity, means that a unique set of column values is mirrored in the pojo's fields. So the pojo naturally possesses a set of fields that, taken together, uniquely identify the record in the datastore. Here we again find ourselves bitten by the false equivalence between a pojo in memory and a record in the database. Remember, equals and hashCode must be overridden to use exactly the identity fields of the Object. This makes it extremely tricky to support "change of identity" during a transaction.

In implementing JDOMax I learned a lot about this issue. Change of identity is actually very common: changing a "User" Object's 'username' field, for example. Because most frameworks for transparent object persistence keep a cache, change of identity can have a devastating and unexpected effect on this cache. Because equals and hashCode suddenly return different values at different times during the transaction, the cached pojos can appear to "disappear" from the cache.
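
The "disappearing" effect can be reproduced with nothing but the JDK collections. The sketch below is not Shades or JDOMax code: the User class and the HashSet standing in for a framework cache are assumptions made purely for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical pojo whose identity is its username (not a Shades class).
class User {
    String username;

    User(String username) {
        this.username = username;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && Objects.equals(username, ((User) o).username);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(username);
    }
}

class IdentityChangeDemo {
    // Reports whether a hash-based "cache" can still find the pojo
    // after the field that drives equals/hashCode has been changed.
    static boolean foundAfterRename() {
        Set<User> cache = new HashSet<>(); // stand-in for a framework cache
        User user = new User("wgibson");
        cache.add(user);
        user.username = "william.gibson"; // change of identity mid-transaction
        return cache.contains(user);      // lookup now uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println("still found in cache: " + foundAfterRename());
    }
}
```

The pojo is still physically in the set, but the lookup hashes the new username while the entry was filed under the old one, so the set no longer finds it.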

So one of my biggest issues with ORMapping that claims to be transparent is that it ain't transparent. If I have to override equals and hashCode in a particular way, the mapping is not transparent.

Unlike any other O/R Mapping system that I know of, Shades does not impose any restrictions on how you implement equals and hashCode. In fact, it doesn't care at all. Shades has a dynamic ORMapping system, as opposed to a static mapping. You can query a record from the database, and load it into a pojo whose equals and hashCode depend only on the 'lastname' field of the pojo. In the same transaction you can load a second record from the table into a pojo whose equals and hashCode depend only on 'firstname'. You can change the firstname or the lastname field of either pojo, during the same transaction. You can load a third record into yet another pojo whose equals and hashCode depend on NONE of the fields of the object. Modifications to the objects are transparently tracked and flushed out to the database on a call to 'update'.

Shades provides a dynamic ORMapping system, in which an ORMapping can be created or chosen, at runtime, to perform I/O from table to pojo. This has the advantage of allowing the "identity" of the pojo to depend on different columns of the database in different situations. Anyone who has ever built an app using straight SQL knows that these "perspectives" on a table surface in the variety of different columns that are retrieved in different situations. Shades is designed to be fluid and adaptable to these common situations. In fact, I thought of the name "Shades" because a good data access framework should recognize the shades of grey that permeate data access programming. Perhaps most of all, Shades tries to minimize the number of rules, states, and do's and don'ts.

Friday, August 11, 2006

Lots of progress

Well, I used to work for a guy who said writing software is a lot like baking waffles. The first waffle sticks to the iron and doesn't come out very well, but it primes the iron with just enough grease to make the second waffle perfect.

So it seems I have just baked my second waffle. I'm putting the finishing touches on Shades, a framework for ORMapping. My first waffle was JDOMax. Why the hell does the world need yet another ORMapping framework? Let me tell you:

  1. Too much XML configuration
  2. Transparent object persistence misidentified as necessary
  3. Relationships between records ARE NOT analogs to references between Objects.
  4. Inheritance relationships are rarely natural in data models.
  5. Transitive closure persistence unnecessary
  6. Too many transactional states increases complexity beyond original problem

I learned all of the above the hard way! I'm not just pontificating on this crap. I spent roughly 3 years developing JDOMax and passing the Sun Technology Compatibility Kit (TCK), so my first waffle was one mother of a waffle.
Looking at 1, "Too much XML configuration", your first reaction might be to think this is an implementation detail, just a consequence of bad XML design. I argue that this configuration complexity is a natural reflection of inherent complexity. Unfortunately, the inherent complexity lies not in the problem you are trying to solve, but in the act of object-relational mapping itself. It seems that a static specification of mapping fails to easily convey *contextually* relevant information. In other words, a "mapping" is too static. What is really needed is an Object/Relational Interface, the implementation of which is *code*, and which can therefore, in a few lines, make contextually relevant decisions, so that ORMapping is dynamic and flexible rather than static and rigid.

Shades has absolutely no XML or annotation configuration. Rather, there's an interface called ORMapping that operates a lot like a TableModel. A DefaultORMapping is usually extended by the program, and only a few methods need to be implemented. At first I was shocked by how easy it was, but it's now apparent to me that O/R Mapping is better suited to programmatic configuration than document configuration. The reason XML configuration of O/R Mapping is so complex is...well, that in THIS CASE it's way more complex than just writing a few lines of code.
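
To give a feel for what "mapping as code" means, here is a minimal sketch of a TableModel-style mapping. It is NOT the actual Shades ORMapping API: the SketchMapping interface, its method names, and the Book pojo are all hypothetical, invented only to show the shape of the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface, loosely modeled on javax.swing.table.TableModel.
// It is an assumption for illustration, not the real Shades ORMapping.
interface SketchMapping<T> {
    String getTableName();
    String[] getColumnNames();
    Object getColumnValue(T pojo, int columnIndex);
    void setColumnValue(T pojo, int columnIndex, Object value);
}

// Hypothetical pojo.
class Book {
    String title;
    String isbn;
}

// The whole "configuration" for BOOK is a handful of short methods.
class BookMapping implements SketchMapping<Book> {
    private static final String[] COLUMNS = {"TITLE", "ISBN"};

    public String getTableName() { return "BOOK"; }

    public String[] getColumnNames() { return COLUMNS.clone(); }

    public Object getColumnValue(Book book, int columnIndex) {
        return columnIndex == 0 ? book.title : book.isbn;
    }

    public void setColumnValue(Book book, int columnIndex, Object value) {
        if (columnIndex == 0) {
            book.title = (String) value;
        } else {
            book.isbn = (String) value;
        }
    }
}

class MappingDemo {
    // Generic code can move data between pojo and "row" using only the mapping.
    static <T> Map<String, Object> toRow(SketchMapping<T> mapping, T pojo) {
        Map<String, Object> row = new LinkedHashMap<>();
        String[] columns = mapping.getColumnNames();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], mapping.getColumnValue(pojo, i));
        }
        return row;
    }
}
```

Because the mapping is ordinary code, a program could pick a different implementation at runtime for the same table, which is the kind of contextual decision a static XML document struggles to express.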

Somewhere along the line we all just accepted that XML was the way to configure O/R mapping. We bought into a real big advantage of XML - that it's externalized, and can in theory be edited by this mythical "deployer" person. The problem is that this mythical deployer does not exist. In reality he's the programmer and it turns out, now that the wow cool factor has worn off, that the XML is harder to deal with than a few lines of code.

2. Transparent Object Persistence Misidentified as Necessary
This is a big deal. Transparent Object Persistence (TOP) is the idea that you operate on Objects with no concern for the fact that they may be persisted in a relational database. And even beyond that, TOP espouses that it is the Object AND its references that are in the database. This just turns out to be wrong. THERE IS NO *INHERENT* MAPPING BETWEEN RELATIONSHIPS IN THE DATABASE AND REFERENCES IN THE JVM. But it is *possible* to relate object references to relationships in the datastore. The problem is that the mapping is not one-to-one, and again we are back to a solution that creates at least as many problems as it solves.

Quick example. A Teacher Object has a collection of Student Objects. Each Student Object has a field named teacher. The database has an FK in STUDENT_TABLE that points at the PK of the teacher table. So in Java land you remove a student from a teacher's collection. This must immediately null the teacher field in the student Object. This is unexpected. And making it work leads to a complex implementation of proxy collections. When the transaction is committed, the FK in the student is set to null. It's simply not at all clear that removing an element from a collection in memory is the right way to remove a relationship from the database. It's just not an intentional way to program. In other words, in order to "forget" about the relational database, we have to learn more than we ever bargained for, and understand subtleties that, even for experts, can be mind-bending. And the transparency turns out to be nowhere near transparent. Consider a STUDENT_TABLE that places a unique constraint on STUDENT_LNAME. If you created a new Student in memory, and added it to a Teacher Object's collection, when would you find out this was invalid? You'd find out when you committed, and a nested datastore exception was thrown. What's so transparent about that? The transparency is a myth. And how would you handle the exception when you created 19 other Students in the transaction? Yes, yes, it is *possible* to handle using an array of nested exceptions, blah blah blah. BUT FOLKS, IT'S NOT EASY. YOU HAVE TO LEARN MORE TO MAKE IT WORK THAN TO SOLVE YOUR ORIGINAL PROBLEM!!!!!!!!!!!!
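
The bookkeeping behind the Teacher/Student example can be sketched in plain Java. The classes below are hypothetical and involve no persistence framework at all; they only show that the collection and the back-reference are two separate pieces of state that somebody has to keep in step with the single FK column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pojos for illustration; not tied to any framework.
class Student {
    final String name;
    Teacher teacher; // mirrors the FK column in STUDENT_TABLE

    Student(String name) {
        this.name = name;
    }
}

class Teacher {
    final List<Student> students = new ArrayList<>();

    // Both sides of the relationship must be updated together.
    void addStudent(Student s) {
        students.add(s);
        s.teacher = this;
    }

    void removeStudent(Student s) {
        if (students.remove(s)) {
            s.teacher = null; // the "unexpected" side effect
        }
    }
}

class RelationshipDemo {
    // A raw collection operation leaves memory disagreeing with itself:
    // the list says "no relationship", the field says "still related".
    static boolean inconsistentAfterRawRemove() {
        Teacher teacher = new Teacher();
        Student student = new Student("Case");
        teacher.addStudent(student);
        teacher.students.remove(student); // bypasses removeStudent
        return student.teacher == teacher && !teacher.students.contains(student);
    }

    public static void main(String[] args) {
        System.out.println("inconsistent: " + inconsistentAfterRawRemove());
    }
}
```

A transparent framework has to intercept that raw remove, which is why proxy collections show up in the implementation.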

Wednesday, August 02, 2006

getting out of 'Nam for some R&R

A few weeks ago I read an article on TSS in which Ted Neward explains ORM as "The Viet Nam of Computer Science".

I have to say, from personal experience, that I agree with him. I got interested in O/R several years ago, and wound up writing an implementation of JSR 12, Java Data Objects, called JDOMax (www.jdomax.com). It took about 2.5 years to write the software and pass the Sun Technology Compatibility Kit (TCK). While I was writing the software, I often got the feeling I was going down into the rabbit hole. And after some time, just passing the TCK became a goal in and of itself. When all was said and done, roughly 50,000 lines of code had been written and debugged, and were being used by a few hundred downloaders. Then I got married and haven't looked back at my hobby since.

My goal now is to produce some software that can be used to easily query, insert, and delete data from the relational database, with a small learning curve, and most of all a small internal codebase. Basically, I have to distill something that is vastly simpler than the full-blown JDO that I had implemented, if I am to have any hope of getting out of 'Nam for some R&R.

Over the last few years, there has been a lot of debate on "query by example" vs. "query language". Having written a JDOQL compiler using JavaCC, I knew that I was going to have to dispense with the query language. Too much complexity, and more code than I can maintain. The second thing I decided to do away with was transparent object persistence. Transitive closure persistence simply isn't an essential feature, and it often confuses developers.

In short, I had to be willing to do many things differently, even if it meant joining the Viet Cong. I hoped that by freeing my mind of dogma I could produce a system that leveraged the best and most essential features of several approaches.

I began by writing lots of "fake programs". I wanted to be sure that, no matter how I implemented the internals, the software was easy to use and produced tight code, with an absolute minimum of configuration.