Thursday, November 29, 2007

10 Tips on JPA Domain Modelling

This post collects tips that I think are good advice when domain modelling in Java with JPA as the ORM technology. Do you agree? Do you have extra advice? Please let me know!

Here they come, in no particular order.

1. Put Annotations on Methods, not Attributes
If you put annotations on attributes, the JPA engine will set the attributes directly using reflection, thereby bypassing any code in setters and getters. This makes it hard to do extra work in setters and getters if the need arises.

In addition, if you add a getter for some calculated value that has no corresponding attribute, you can mark the method @Transient. Had you been annotating attributes, there would be no attribute to put the annotation on.

Whatever you do: try not to mix the two, with annotations on both fields and methods. Some JPA providers cannot handle this!
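To make the difference concrete, here is a minimal sketch that mimics the two access strategies with plain reflection. It is not how any particular provider is implemented; Customer, AccessDemo and the helper methods are hypothetical names chosen for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical domain class: the setter does a little extra work (trimming).
class Customer {
  private String email;

  public String getEmail() { return email; }

  public void setEmail(String email) {
    this.email = (email == null) ? null : email.trim();
  }
}

// Stand-in for a persistence provider; real providers are more involved.
class AccessDemo {
  // Field access: writes the attribute directly, so setEmail never runs.
  static void setViaField(Object target, String fieldName, Object value) {
    try {
      Field field = target.getClass().getDeclaredField(fieldName);
      field.setAccessible(true);
      field.set(target, value);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Property access: calls the public setter, so its logic is applied.
  static void setViaSetter(Object target, String setterName, String value) {
    try {
      Method setter = target.getClass().getMethod(setterName, String.class);
      setter.invoke(target, value);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Customer viaField = new Customer();
    setViaField(viaField, "email", "  bob@example.org ");
    System.out.println("[" + viaField.getEmail() + "]");   // [  bob@example.org ]

    Customer viaSetter = new Customer();
    setViaSetter(viaSetter, "setEmail", "  bob@example.org ");
    System.out.println("[" + viaSetter.getEmail() + "]");  // [bob@example.org]
  }
}
```

The value written through the field keeps its surrounding whitespace, while the value written through the setter is trimmed.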

2. Implement Serializable
The specification says you have to if entity instances are to be passed by value as detached objects, but some JPA providers do not enforce this. Hibernate as JPA provider does not enforce it, but it can fail somewhere deep in its stomach with a ClassCastException if Serializable has not been implemented.

3. Use The Fine Grained Domain Modelling and Mapping Possibilities in JPA
If you are coming from EJBs (before EJB3), you are not used to being able to do fine-grained modelling. EJB 2.x was very entity centric. In JPA, you have @Embeddable and @Embedded. More fine-grained domain modelling can help make your domain model more expressive.

An @Embeddable is a value object, and as such, it should be immutable. You achieve this by giving its class only getters and no (public) setters. The identity of a value object is based on its state rather than its object id. This means that the @Embeddable will have no @Id annotation.

As an example, given the domain class Person:
@Entity
public class Person implements Serializable {
  ...
  private String address;

  @Basic
  public String getAddress() { return address; }
  public void setAddress(String address) { this.address = address; }
}
We could express Address better, by giving it a class of its own. Not because it should be mapped to some other table, but because it makes sense in this particular model. Like this:
@Embeddable
public class Address implements Serializable {
  ...
  private String houseNumber;
  private String street;

  @Transient
  public String getHouseNumber() { return houseNumber; }

  @Transient
  public String getStreet() { return street; }

  @Basic
  public String getAddress() { return street + " " + houseNumber; }

  // setter needed by JPA, but protected as value object is immutable to domain
  protected void setAddress(String address) {
    // do all the parsing and rule enforcement here
  }
}

@Entity
public class Person implements Serializable {
  ...
  private Address address;

  @Embedded
  public Address getAddress() { return address; }
  public void setAddress(Address address) { this.address = address; }
}
The better expressiveness comes from: a) giving a concept in the model a named class and, b) having a place (the value object class) in which to put domain logic and enforce domain rules.
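As a rough sketch of what the parsing and rule enforcement in such a value object might look like, the class below splits the stored string at its last space and bases equality on state. The "<street> <houseNumber>" format, the public constructor and the error messages are assumptions made for illustration, and the JPA annotations are left out so that the sketch compiles without a JPA jar.

```java
import java.io.Serializable;
import java.util.Objects;

// Value object sketch: no id, no public setters, equality based on state.
class Address implements Serializable {
  private String street;
  private String houseNumber;

  // Default constructor for the provider only.
  protected Address() { }

  public Address(String address) { parse(address); }

  public String getStreet() { return street; }
  public String getHouseNumber() { return houseNumber; }
  public String getAddress() { return street + " " + houseNumber; }

  // Needed by the provider, but not part of the domain API.
  protected void setAddress(String address) { parse(address); }

  // Assumed rule: the text after the last space is the house number.
  private void parse(String address) {
    if (address == null) {
      throw new IllegalArgumentException("address is required");
    }
    String trimmed = address.trim();
    int split = trimmed.lastIndexOf(' ');
    if (split <= 0) {
      throw new IllegalArgumentException("expected '<street> <houseNumber>' but got: " + address);
    }
    this.street = trimmed.substring(0, split).trim();
    this.houseNumber = trimmed.substring(split + 1);
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof Address)) return false;
    Address that = (Address) other;
    return Objects.equals(street, that.street)
        && Objects.equals(houseNumber, that.houseNumber);
  }

  @Override
  public int hashCode() { return Objects.hash(street, houseNumber); }
}
```

With this in place, new Address("Main Street 12") and new Address("  Main Street   12 ") compare as equal, and an input without a house number is rejected.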

4. Implement Equality using Real Domain Attribute Values
Classes marked @Entity will always have an id attribute. Often, this is a long sequence number. It can be tempting to use this value when implementing equals and hashCode (which you also need to do), but I recommend against it. I can find two good reasons: one based on modelling rules and one technical.
  • Modelling rule: A class modelled as an entity should be uniquely distinguishable from other instances solely by a combination of some of its domain attributes. A sequence number used solely for relational identification does not constitute a domain attribute. If you are unable to find a unique combination, it might very well be a sign of a problem with the model.
  • Technical reason: If equality is based on a value generated and assigned by the database, you will not be able to use equals and hashCode before an instance has been persisted. That includes putting the instance into collection classes, as they rely on equals and hashCode.
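The technical point can be illustrated with a small sketch. Member uses a domain attribute as its business key, while IdKeyedMember bases equality on the id; both classes, and the assumption that "email" is unique, are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Equality on a business key: works before the instance is persisted.
class Member {
  private Long id;       // assigned by the database, null until persisted
  private String email;  // assumed to be unique in the domain

  Member(String email) { this.email = Objects.requireNonNull(email); }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof Member)) return false;
    return email.equals(((Member) other).email);
  }

  @Override
  public int hashCode() { return email.hashCode(); }
}

// Equality on the id: all unsaved instances share id == null.
class IdKeyedMember {
  private Long id;
  private String email;

  IdKeyedMember(String email) { this.email = email; }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof IdKeyedMember)) return false;
    return Objects.equals(id, ((IdKeyedMember) other).id);
  }

  @Override
  public int hashCode() { return Objects.hashCode(id); }
}

class EqualityDemo {
  public static void main(String[] args) {
    Set<Member> members = new HashSet<>();
    members.add(new Member("ann@example.org"));
    members.add(new Member("bob@example.org"));
    System.out.println(members.size());                                  // 2
    System.out.println(members.contains(new Member("ann@example.org"))); // true

    Set<IdKeyedMember> collapsed = new HashSet<>();
    collapsed.add(new IdKeyedMember("ann@example.org"));
    collapsed.add(new IdKeyedMember("bob@example.org"));
    System.out.println(collapsed.size());                                // 1
  }
}
```

The second set silently drops one of the two unsaved instances, because both have a null id and therefore compare as equal.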

5. Protect the Default Constructor
The JPA specification mandates a default constructor on mapped classes, but a default constructor seldom makes sense in modelling terms. With it, you would be able to construct an entity instance with no state. A constructor should always leave the instance created in a sane state. The requirement for the default constructor is only to make dynamic instantiation of instances of the class possible by the JPA provider.

Luckily, you can, and are allowed to, mark the default constructor as protected. Hibernate will even accept it as private, but that is not according to the spec.
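A minimal sketch of the idea, using a hypothetical Account class: the public constructor enforces a sane state, while the protected one exists only so that a framework can instantiate the class reflectively. The instantiate helper is an assumption about what such dynamic instantiation roughly looks like, not actual provider code.

```java
import java.lang.reflect.Constructor;

class Account {
  private String owner;

  // Only for the persistence provider; not part of the domain API.
  protected Account() { }

  // The constructor the application uses: leaves the instance in a sane state.
  public Account(String owner) {
    if (owner == null || owner.trim().isEmpty()) {
      throw new IllegalArgumentException("owner is required");
    }
    this.owner = owner;
  }

  public String getOwner() { return owner; }
}

class ConstructorDemo {
  // Dynamic instantiation through the no-argument constructor.
  static <T> T instantiate(Class<T> type) {
    try {
      Constructor<T> constructor = type.getDeclaredConstructor();
      constructor.setAccessible(true);
      return constructor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Account blank = instantiate(Account.class);  // state is filled in afterwards
    System.out.println(blank.getOwner());        // null
    System.out.println(new Account("Ann").getOwner());
  }
}
```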

6. Protect Setter Method on Id Attribute
Basically the same story as above. In this case, it is simply because it makes no sense for the application to assign an id.

NOTE: This applies only when the id attribute is marked as being assigned by the provider.

7. Avoid Primitives when Mapping Id Attribute
Simply use Long and not long. This makes it possible to detect a value that has not yet been set by testing for null. If using Java 5 or above, auto-boxing should take away the pain.
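Tips 6 and 7 together might look like the sketch below. Invoice and hasId are hypothetical names; the JPA annotations are omitted so that the sketch compiles on its own. With a primitive long, an unset id would read as 0, which cannot be told apart from a real id of 0.

```java
class Invoice {
  private Long id;  // wrapper type: null means "not persisted yet"

  public Long getId() { return id; }

  // Protected: the provider assigns the id, the application does not.
  protected void setId(Long id) { this.id = id; }

  public boolean hasId() { return id != null; }
}

class IdDemo {
  public static void main(String[] args) {
    Invoice invoice = new Invoice();
    System.out.println(invoice.hasId());  // false

    // Stand-in for the provider assigning a generated id.
    // This compiles only because IdDemo is in the same package as Invoice.
    invoice.setId(42L);
    System.out.println(invoice.hasId());  // true
  }
}
```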

8. Use the Basic Annotation to Override Defaults
By all means, use @Basic to override the default value of optional (true) with false for those fields that are not actually optional (I often find that to be most of my attributes).

9. Go Ahead and Use the Column Annotation
Even if you are not interested in generating a schema or DDL from it, you should not hold back on using the @Column annotation. It gives the reader of the code important information about the attribute, such as nullability, length, scale and precision.

10. Do Not Use java.sql.Date/java.sql.Timestamp in Domain Model
Instead, use java.util.Date or java.util.Calendar. Using the types from the java.sql package is a leakage of concerns into the domain model, that we do not want, nor need.

Just remember to put @Temporal on date and calendar attributes.
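As a mapping fragment, tips 8 to 10 might combine as follows. Payment and its attributes are hypothetical, the lengths and precisions are arbitrary, and compiling it requires the JPA API (javax.persistence) on the classpath. equals and hashCode are left out for brevity.

```java
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.Date;
import javax.persistence.Basic;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

@Entity
public class Payment implements Serializable {
  private Long id;
  private String reference;
  private BigDecimal amount;
  private Date receivedAt;  // java.util.Date, not java.sql.Timestamp

  protected Payment() { }

  @Id
  @GeneratedValue
  public Long getId() { return id; }
  protected void setId(Long id) { this.id = id; }

  // Not optional, and the column metadata documents the limits.
  @Basic(optional = false)
  @Column(nullable = false, length = 40)
  public String getReference() { return reference; }
  public void setReference(String reference) { this.reference = reference; }

  @Basic(optional = false)
  @Column(nullable = false, precision = 12, scale = 2)
  public BigDecimal getAmount() { return amount; }
  public void setAmount(BigDecimal amount) { this.amount = amount; }

  // @Temporal tells the provider which SQL temporal type to map to.
  @Basic(optional = false)
  @Temporal(TemporalType.TIMESTAMP)
  @Column(nullable = false)
  public Date getReceivedAt() { return receivedAt; }
  public void setReceivedAt(Date receivedAt) { this.receivedAt = receivedAt; }
}
```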


21 comments:

Ignacio Coloma said...

Can you expand a little on details about the implementation of hashCode() and equals()?

In my experience that is an old practice already abandoned. The only class that needs to implement both is the primary key, which is usually provided for Long or Integer keys. I have never implemented any of them on entity classes.

Per Olesen said...

@ignacio:

When we put objects into a Map implementation, it will call .hashCode() on the instance, to determine the hash bucket to put this entry at.

An example of the need for equals() is when you have instances in a java.util.List implementation and then call the contains method. It will use equals(). In addition, you generally need to implement both hashCode and equals, if you choose to implement one of them. This is due to the contract of these methods.

So, it is my opinion, that these methods really should be implemented for nearly all, if not all, of the jpa entity mapped classes. They *will* end up in collection classes.

What I advocate, then, is to use the domain attributes in these implementations, and not the database assigned sequence id. Like this simplified example:

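public class Person {
  private Long id;
  private String name;

  @Override
  public boolean equals(Object other) {
    // use "name", not "id" here
    if (this == other) return true;
    if (!(other instanceof Person)) return false;
    return name != null && name.equals(((Person) other).name);
  }

  @Override
  public int hashCode() {
    // use "name", not "id" here
    return name == null ? 0 : name.hashCode();
  }
}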

This example assumes that "name" uniquely identifies the instance.

And my other point then, is: If you find yourself unable to produce an implementation that guarantees uniqueness on some combination of domain attributes, you might be in trouble with your model. Maybe it is not an @Entity, but a value object (@Embeddable). Maybe the entity needs to be modelled in other ways.

Jay said...

To ensure you always have a unique object id, use a GUID generator. The links below discuss this and give an example of one I use.

With a true GUID, you can now rely on a simple equals() and hashCode(). Simply compare the GUIDs, which will always be there.

http://www.jroller.com/jcarreira/entry/overcoming_the_hashcode_object_identity

http://www.theserverside.com/patterns/thread.tss?thread_id=4976

Ignacio Coloma said...

Hi again,

Agree that hashCode and equals are used to check if an object is already in a collection, but the method is called on the primary key, not the persistent entity:

http://www.hibernate.org/hib_docs/reference/en/html/persistent-classes.html#persistent-classes-equalshashcode

There are very few cases where they should be implemented for a Business Object. The Eric Evans book tends to agree on this point.

Ignacio Coloma said...

Oooops, I should have used links instead:

the hibernate chapter on equals and hashcode

Alex said...

Another interesting approach
http://www.devx.com/Java/Article/30396/0/page/1

Jay said...

Ignacio, I understand your view, but since a Set uses the hashcode of elements for equality, I rely on actually comparing objects (compareTo...), and not their equals value.

It's a Java issue, but one we can easily overcome.

Jay said...

Alex, I have to say using the GUID for id's is easier and has a much smaller footprint on your domain model.

Alex said...

Jay, in general, I completely agree with you, but sometimes it is impossible to use a GUID, for example when you have to use a legacy DB identity generator.

Per Olesen said...

@ignacio:

Hmm, ... I don't get it. You say that equals/hashCode is only called on the PK class. Do you mean that Hibernate only does this?

a) Hibernate calls equals/hashCode on the entity objects.

b) More generally, Java will need these too, when working with collection classes.

From the link you provided (the hibernate chapter on equals and hashcode), I only get support for what I wrote. It doesn't mention that it handles PK specially!?

It *does* say, though, that you "only" need equals/hashCode when working with entities across Hibernate Session instances. This is because Hibernate guarantees Java object equality for instances loaded from the same database row within a session. But most of my systems work with detached objects.

Per Olesen said...

@alex: Nice article you linked to there. Interesting.

I would be a bit reluctant to actually go with it though. In theory, it maybe could work okay. But I do fear the runtime, byte-code enhancing, quite complex, solution for what should be a simple problem (maybe it isn't).

I think of the GUID solution mentioned elsewhere in the comments as a simpler solution.

It *is* interesting though. Have you worked with it before?

Ignacio Coloma said...

AFAIK, Hibernate Sets use only the PK value to check if it contains an entity.

PK has the advantage of being immutable, while entities are not. Calculating a hashcode using mutable fields would return a different value depending on the state of the object, which would make it impossible to locate it in the cache.

Imagine that you insert an object in a Set, then change some attributes: your hashcode will change, making it impossible to locate the object again in the cache. Hibernate Sets use the PK to calculate hashcode and equals.

Per Olesen said...

@ignacio: Ooh, okay. But then I can only use my entities with Hibernate Set implementations.

Actually, come to think of it, I guess calling only the PK will only work within a single Session instance too. Because it will then have to use object references for equality on the instances that have no PK assigned yet (not persisted yet).

Took a quick dive into the hibernate sources. Especially into org.hibernate.collection.PersistentSet. I couldn't quickly determine how it worked, but I can see references to a real Set impl contains() call. But I guess that says nothing, as that is just an interface, and can be another hibernate impl.

I think it is safe to say, still, that one should implement these methods.

Ignacio Coloma said...

It was some time ago, but I believe this has been the behavior of EJB containers for some years.

The approach that you suggest will only be valid if you can suggest an immutable hashcode() and equals() implementation. Hashcode cannot return a different value once the object has been included in a Set.

In fact, both methods are required in your key class if you are using custom primary keys (again, because it should be immutable).

Jay said...

@Ignacio: I may have misunderstood you, but regardless of the implementation, Java Sets check the hashcode of the element itself.

Hibernate, on the other hand, checks the identity (PK, GUID id, etc) of an entity to check if an entity is already loaded in the Session.

This is also why using a GUID is a strong solution. Your entities now always have an ID, persisted or not. Now persisted and transient objects can exist in the same collection, and you can allow Hibernate to figure it all out.

Per Olesen said...

@Ignacio: This is getting interesting :-)

As such, I do agree with you, that the implementation of hashCode needs to be on immutable values. But only as long as the objects are in the same Set.

When you write:

"...The approach that you suggest will only be valid if you can suggest an immutable hashcode() and equals() implementation..."

This is actually spot on. Because this is exactly such a business key, that I suggest that you must be able to find on each and every entity class mapped. If not, I would think the model needed rethinking.

Now, I would like to paste in a bit of docs from hibernate chapter 11.1.3 on "Considering object identity":

"...The developer has to override the equals() and hashCode() methods in persistent classes and implement his own notion of object equality. There is one caveat: Never use the database identifier to implement equality, use a business key, a combination of unique, usually immutable, attributes...."

"...Attributes for business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set...."

The above extract is where the docs talk about using entities outside of a hibernate session.

Interestingly, though, the JPA spec mentions nothing about this. It only mentions the need for equals/hashCode on a composite key class.

I guess this is because, strictly speaking, this is a non-JPA/non-Hibernate issue. It is, though, something that most programmers will in practice have to do (equals/hashCode) if (or should I say when) they work with entity instances outside of a session. This is further supported by the statement "...Also note that this is not a Hibernate issue, but simply how Java object identity and equality has to be implemented...", again from the excellent hibernate docs.

In my opinion, the hibernate docs give the programmer better information here than the JPA specification does.

Ignacio Coloma said...

I think the main point is that I don't use Sets of persistent entities outside of the ones handled by the JPA container. For web applications I prefer not to rely on *-to-many JPA relations to update database data but only to read it.

From my point of view, adding an extra primary key (GUID or business key) duplicates the role of the PK value and is very rarely needed in a web application.

Volodymyr Zhabiuk said...

I cannot agree with the first tip. The need to perform any extra work in getters and setters arises very infrequently. As far as I'm concerned, I can hardly imagine such a situation; moreover, all the examples in the book 'Pro EJB 3, Java Persistence API' by Mike Keith use attribute annotations.

Rockhopper said...

>Using the types from the java.sql package is a leakage of concerns into the domain model, that we do not want, nor need.

But littering your code with JPA Annotations is not leakage?

Per Olesen said...

@rockhopper

.. well, yes and no :-)

I know people are opinionated about this.

I see annotations as metadata, and as such, they are not part of the model as such. They make the model work with certain technology, at runtime.

That said, I know that deep understanding of ORM technology is needed, to exactly know how a model behaves, and of course, that is a leakage.

In addition, I would say that some JPA annotations belong in the code more than others. I mean, stuff like @NamedQuery and @Column could nicely go into orm.xml, as they are not at all needed to understand the model.

Dave Insurgent said...

I am more than a little late to the show, but I have to disagree with using property access. Mind you I think that is generally the opinion now..

Persistence is about saving and loading state - not transforming.

Transformation should be done as the object is accessed in the context of a domain object. The persistence layer does not (should not?) participate in this. It is just storing and loading state, and this is best done by direct access to fields.