Saturday, March 22, 2008

From OR to CR modeling - Hierarchies

So in the previous post, I started to think about data models in a Content Repository and the differences between relational models and, well, what?

But first a disclaimer: I am no big content repository expert. So expect me to make some classic mistakes and ask some rather naive questions.

Here's one:
Is a hierarchy imperative to content modeling with JCR?


My thought is that hierarchical modeling feels natural in JCR: a hierarchy is intrinsic to any structure built from nodes, parents and children. So the next question would be:

How do you map the hierarchies of an existing relational model to a hierarchical one?


Once again, we have the classic example:

class Author {
    static hasMany = [books: Book]        // an author owns many books
    String name
}
class Book {
    static belongsTo = [author: Author]   // each book belongs to one author
    String title
}
which maps naturally to

/authors
  /stephen king
    /books
      /Misery
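Here is a rough illustration of that mapping as a minimal Python sketch. It is not JCR code: a plain nested dict stands in for the node tree, and the `Author`/`Book` dataclasses and the `to_tree`/`paths` helpers are hypothetical stand-ins for the Grails classes. It turns the owning `hasMany`/`belongsTo` relation into parent/child nodes:

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str


@dataclass
class Author:
    name: str
    books: list[Book] = field(default_factory=list)


def to_tree(authors: list[Author]) -> dict:
    """Map the owning one-to-many relation to nested nodes:
    /authors/<author name>/books/<book title>."""
    tree: dict = {"authors": {}}
    for author in authors:
        # Each author becomes a child node of /authors ...
        author_node = tree["authors"].setdefault(author.name, {"books": {}})
        for book in author.books:
            # ... and each owned book becomes a child of that author's /books node.
            author_node["books"][book.title] = {"title": book.title}
    return tree


def paths(node: dict, prefix: str = "") -> list[str]:
    """List the path of every child node, depth first (string values are properties)."""
    result = []
    for name, child in node.items():
        if isinstance(child, dict):
            path = f"{prefix}/{name}"
            result.append(path)
            result.extend(paths(child, path))
    return result


king = Author("stephen king", [Book("Misery")])
print(paths(to_tree([king])))
# ['/authors', '/authors/stephen king', '/authors/stephen king/books', '/authors/stephen king/books/Misery']
```

The ownership in the domain model becomes the containment in the tree. That is why this case feels so natural.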


But what if there is a second hierarchy, for example:


class Author {
    static hasMany = [books: Book]
    String name
}
class Book {
    // the book now has two owners: an author and a category
    static belongsTo = [author: Author, category: Category]
    String title
}

class Category {
    static hasMany = [books: Book]
    String name   // e.g. "Thriller"
}

Now the books are suddenly part of two hierarchies. How should this be modeled? I currently see two possibilities:

  1. Have one leading hierarchy
  2. Flatten it out

/authors
  /stephen king
    /books
      /Misery
/categories
  /Thriller
    /Misery*

vs

/authors
  /stephen king
    /books
      /Misery*
/categories
  /Thriller
    /books
      /Misery*
/books
  /Misery

(the * marks a relation to a node located elsewhere)
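Here is a minimal Python sketch of the flattened layout that makes the asterisk concrete. It assumes a relation is simply a node holding the path of the real node. In JCR one would more likely use REFERENCE or PATH properties, and the `ref` key and the `get`/`resolve` helpers are hypothetical:

```python
# The flattened layout: the real book lives under /books, and the author
# and category trees only hold relation nodes ("Misery*" in the diagram).
TREE = {
    "authors": {"stephen king": {"books": {"Misery": {"ref": "/books/Misery"}}}},
    "categories": {"Thriller": {"books": {"Misery": {"ref": "/books/Misery"}}}},
    "books": {"Misery": {"title": "Misery"}},
}


def get(path: str) -> dict:
    """Walk the tree one path segment at a time."""
    node = TREE
    for segment in path.strip("/").split("/"):
        node = node[segment]
    return node


def resolve(path: str) -> dict:
    """Return the real node, following a single relation node if necessary."""
    node = get(path)
    return get(node["ref"]) if "ref" in node else node


via_author = resolve("/authors/stephen king/books/Misery")
via_category = resolve("/categories/Thriller/books/Misery")
assert via_author is via_category is get("/books/Misery")
```

The price of this layout is the extra hop: every path outside /books yields a relation that has to be resolved before you see the book itself.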

Another solution would be to allow multiple dimensions in which the objects can be stored. This is more or less the same as the "flat out" version, except that looking at an object always gives you the "real" one rather than the relation.
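Here is one way to picture the "multiple dimensions" idea, again as a minimal Python sketch. It assumes each dimension is just an index from a key to the very same node object. The `Node` and `Store` classes and the `author`/`category` dimensions are hypothetical. Because every dimension holds the node itself instead of a relation, a lookup in any of them returns the real node:

```python
from dataclasses import dataclass


@dataclass
class Node:
    title: str
    author: str
    category: str


class Store:
    """Stores each node once and files it under several dimensions."""

    def __init__(self) -> None:
        self.dimensions: dict[str, dict[str, list[Node]]] = {
            "author": {},
            "category": {},
        }

    def add(self, node: Node) -> None:
        # The same object is filed in every dimension; nothing is copied.
        self.dimensions["author"].setdefault(node.author, []).append(node)
        self.dimensions["category"].setdefault(node.category, []).append(node)

    def lookup(self, dimension: str, key: str) -> list[Node]:
        return self.dimensions[dimension].get(key, [])


store = Store()
misery = Node("Misery", author="Stephen King", category="Thriller")
store.add(misery)
assert store.lookup("author", "Stephen King")[0] is misery
assert store.lookup("category", "Thriller")[0] is misery
```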

As I said, I am not an expert in data modeling.

Feel free to comment on it...

OK, the next post will be about Groovy/Grails again :) (yes, with a dash of JCR in it)

Comments:

Jukka said…

The optimal model depends on what your application is primarily designed for.

If you're a publisher that manages books by author (i.e. the primary way to look at the content is by author), then you'll want a <author>/<book> hierarchy.

If you're a book store that manages books by category, then a <category>/<book> hierarchy is probably better.
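The choice of the leading hierarchy can be sketched as a single hypothetical `book_path` function in Python. The book data stays the same and only the axis used for its parent path changes. Plain dicts stand in for nodes here:

```python
def book_path(book: dict, primary: str) -> str:
    """Build a book's path under the leading hierarchy.

    primary="author"   -> publisher view:  /<author>/<book>
    primary="category" -> book store view: /<category>/<book>
    """
    if primary not in ("author", "category"):
        raise ValueError(f"unsupported leading hierarchy: {primary}")
    return f"/{book[primary]}/{book['title']}"


misery = {"title": "Misery", "author": "Stephen King", "category": "Thriller"}
print(book_path(misery, "author"))    # /Stephen King/Misery
print(book_path(misery, "category"))  # /Thriller/Misery
```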

Or you could go with something like this that gives you easy views by author, category, and year:

/my:content
  /authors
    /K
      /Stephen King
  /categories
    /Thriller
  /books
    /1987
      /Misery
        @author -> Stephen King
        @category -> Thriller
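Here is one possible way to compute the paths in this layout, sketched in Python. Two points are assumptions and not part of the comment. First, the `K` bucket is taken to be the initial of the author's last name. Second, the `@author`/`@category` properties are stored as plain path strings, although JCR would also offer REFERENCE properties. All function names are hypothetical:

```python
ROOT = "/my:content"


def author_path(name: str) -> str:
    # Assumption: authors are bucketed by the initial of their last name.
    initial = name.split()[-1][0].upper()
    return f"{ROOT}/authors/{initial}/{name}"


def category_path(category: str) -> str:
    return f"{ROOT}/categories/{category}"


def book_node(title: str, year: int, author: str, category: str) -> tuple[str, dict]:
    """Return the book's path (bucketed by year) and its properties,
    which point into the author and category trees."""
    path = f"{ROOT}/books/{year}/{title}"
    props = {"author": author_path(author), "category": category_path(category)}
    return path, props


path, props = book_node("Misery", 1987, "Stephen King", "Thriller")
print(path)             # /my:content/books/1987/Misery
print(props["author"])  # /my:content/authors/K/Stephen King
```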

Chrigel said…

Hi Jukka

Congrats & thx for the first comment :).

Unfortunately the comments don't allow nice formatting (e.g. via pre tags). Would you mind giving me a hint about the level of the respective nodes in your proposed model?

Thx

Jukka said…

Here's an attempt using _ for indentation:

/my:content
__/authors
____/K
______/Stephen King
__/categories
____/Thriller
__/books
____/1987
______/Misery
______ @author -> Stephen King
______ @category -> Thriller

Chrigel said…

OK, I see. I think it is a matter of taste: the sample is more or less a "flat out" version (I don't like that name, it is inaccurate), where the "books" hierarchy depends on both the "authors" and the "categories" hierarchies.

Well, another question arises: should you design your content model with only one use case (or application) in mind? Or should you consider your model as being self-contained? Imagine writing both a publisher and a bookstore application ...

Jukka said…

The nice thing about JCR is that the content hierarchy is quite flexible (unless you explicitly constrain it with strict node types), so you can evolve the structure as your application grows.

Assuming no other use cases than "manage information about books", I'd start with something simple, perhaps even just the /books tree with @author and @category as simple string properties with no need for the separate /authors and /categories trees.
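Here is a minimal Python sketch of that starting point. It assumes each book node is just a dict of string properties under a flat /books tree, with illustrative sample data, and the `books_by` helper is hypothetical. The author and category views are computed by grouping on a property rather than being stored as separate trees:

```python
from collections import defaultdict

# Flat /books tree: node name -> string properties (sample data).
BOOKS = {
    "Misery": {"author": "Stephen King", "category": "Thriller"},
    "It": {"author": "Stephen King", "category": "Horror"},
}


def books_by(prop: str) -> dict[str, list[str]]:
    """Group /books/<name> paths by the value of one string property."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, props in BOOKS.items():
        groups[props[prop]].append(f"/books/{name}")
    return dict(groups)


print(books_by("author"))
# {'Stephen King': ['/books/Misery', '/books/It']}
```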

Then as your application evolves and you add more use cases you can introduce new concepts like structured author information, etc. Also, if it turns out that some other hierarchy seems more natural, it should be no problem to reorganize the content tree.

To best achieve this flexibility, my rule of thumb is to always use the type of a node rather than its location (path) to decide how it should be handled. This way you can decouple much of the application logic from the overall structure of the content model. For most of the code that manages books it should make little difference whether the book nodes are located below /authors, /categories, or /books.
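The rule of thumb can be illustrated with a small Python sketch in which handlers are looked up by node type and never by path. The `Node` class, the `my:book` type name and the `render` function are hypothetical. In a real repository the type would come from the node's primary type (`jcr:primaryType`):

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    path: str
    node_type: str
    props: dict[str, str] = field(default_factory=dict)


# Handlers are registered per node type; paths play no role in dispatch.
HANDLERS: dict[str, Callable[[Node], str]] = {
    "my:book": lambda node: f"Book: {node.props['title']}",
}


def render(node: Node) -> str:
    return HANDLERS[node.node_type](node)


# The same book is handled identically, wherever it sits in the tree.
for path in ("/authors/stephen king/books/Misery",
             "/categories/Thriller/Misery",
             "/books/1987/Misery"):
    assert render(Node(path, "my:book", {"title": "Misery"})) == "Book: Misery"
```

Reorganizing the tree then only changes the `path` values. The dispatch table stays untouched.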

Chrigel said…

Very good info, it makes a lot of things clearer!

While reading your comment, I spontaneously thought of "duck typing". Then you wouldn't even care about the type of the node, as long as the required information is available...
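The duck-typing variant can be sketched the same way in Python. Instead of checking a type name, the hypothetical `describe_book` function accepts any node that carries the properties it needs. Here a node is assumed to be a plain dict of properties:

```python
REQUIRED = ("title", "author")


def looks_like_book(props: dict) -> bool:
    # Duck typing: a node "is" a book if it has the properties a book needs.
    return all(key in props for key in REQUIRED)


def describe_book(props: dict) -> str:
    if not looks_like_book(props):
        missing = [key for key in REQUIRED if key not in props]
        raise ValueError(f"not usable as a book, missing: {missing}")
    return f"{props['title']} by {props['author']}"


print(describe_book({"title": "Misery", "author": "Stephen King", "year": "1987"}))
# Misery by Stephen King
```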

And yet another connection between JCR and dynamic languages...