DataNucleus v3 and HBase

DataNucleus AccessPlatform v3 provides an opportunity to bring some of our other datastore plugins closer to the standard of the more mature, long-supported datastores (e.g. RDBMS). In the case of HBase, the plugin provided with v2.0 offered basic persistence and querying at best. In v3 this is already much improved.

  1. You can now run SchemaTool against HBase. This operates in either “create” or “delete” modes, and allows you to manage the schema required by your persistable classes.
  2. When a relationship was persisted with v2.0 it simply serialised the related object. This broke JDO/JPA cascade rules. In v3 the column for the relation in the owner stores the identity (or identities when persisting multi-value relations). This also provides correct cascading of persist and update.
  3. When a String field was persisted in v2.0 it was Java serialised, meaning that it was not readable with something like “hbase shell”. In v3.0 String/char fields are persisted as the bytes of the field value, hence readable.
  4. With v2.0 we provided value generation using simple generators such as “uuid” and “uuid-hex”, but these were not accessible by default. In v3.0 the default (JDO “native”, JPA “auto”) is “uuid-hex”, and we also provide an “increment” generator (contributed by Peter Rainer).
  5. In v2.0 we only supported “application identity”. In v3.0 we also support “datastore identity” (surrogate identity column).
  6. In v2.0 we didn’t support storing a version against the object. In v3.0 we do allow this.
  7. You can now embed persistable fields into the table of the owning object, and nested embedded fields are supported too.
  8. Now supports HBase 0.90
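
To make point 3 concrete, here is a minimal sketch using only the JDK, not the plugin's actual code. It compares the bytes you get from Java-serialising a String with the plain bytes of its value. The class and method names are made up for illustration, and UTF-8 is an assumption about the encoding; the point is only that the second form is what a tool like “hbase shell” can show as readable text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use DataNucleus or HBase at all.
class StringEncodingDemo {

    // Java-serialise the value: the result carries a stream header and a
    // type marker in front of the characters, so it is not plain text.
    static byte[] javaSerialise(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Plain bytes of the field value (UTF-8 is assumed here).
    static byte[] rawBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Lower-case hex dump, two digits per byte.
    static String toHex(byte[] data) {
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String name = "Andy";
        // Serialised form: header aced0005, marker 74, length 0004, then the text.
        System.out.println("serialised: " + toHex(javaSerialise(name)));
        // Raw form: just the four bytes of the text.
        System.out.println("raw bytes:  " + toHex(rawBytes(name)));
    }
}
```

The serialised form is 11 bytes for a 4-character value and starts with binary header bytes, whereas the raw form is exactly the characters of the value.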

Obviously there is still much more that can be done – see the datastore features comparison table. One improvement that would help performance is to evaluate more of a query's filter in the datastore, rather than processing everything in-memory. Contributions for this and other things are very welcome. But then we aren’t even at 3.0 Milestone 1 yet …


3 Responses to DataNucleus v3 and HBase

  1. Trying to make DataNucleus + JDO + MySQL/HBase + Eclipse/NetBeans + Maven work, using all the 'highly dispersed' and 'incomplete' documentation on the subject, for more than 8 days now, I have a strange feeling about the people at DataNucleus.

    It seems that whoever tries to say something about Hibernate or other JPA implementations, DataNucleus looks to be very prompt in explaining their weaknesses and spreading the word about JDO and DataNucleus. Which is good in the sense, that people come to know about great technologies like JDO (superset of JPA) and DataNucleus' support for so many database backends, with a good level of abstraction and transparent changing of whatever DBMS you like. And know the intricacies of JPA and its implementations (said that I will not start flame wars in these communities though).

    But I have a big question for DataNucleus. Don't you think that open source projects are well adopted in the community when they provide good documentation? The forum on DataNucleus is filled with so many newcomers, who are desperate in trying to use this single component in their projects, on stackoverflow, and all over the Internet. But they are either ignored very coldly or are discouraged in a manner that creates serious doubt about their promotion of the software. Yes, people need to read all the hundreds of pages of JDO specs, JBoss, Glassfish etc. deployment methods and other bundles of heavy books on the subject to use DataNucleus and JDO. Since it is all 'standards based'. Despite all that, will you not find simple well written and supported tutorials on JPA, Hibernate etc. and other open source software? That's how open source software can be well adopted and blended in the community.

    Just for the sake of clarity, you don't need to have 8 – 10 years of experience in JEE or certifications from Oracle to make them work. Or should you? Newcomers always outnumber experienced developers in enabling technologies.

    If DataNucleus guys had written simple up-to-date and 'working' tutorials on different aspects of DataNucleus and JDO with different JEE containers and DBMSs, then the number of people using it would be far greater than what we estimate now. But they don't have time, since the coding and commits need 18 hours of daily work, even on weekends, and they don't have resources to have people who can write good documentation, one thinks.

    Commercial support is good, but haven't we seen RedHat, Ubuntu and others provide well written, easy to understand and follow docs for their open source products? Still they are earning!!!

    Forget that Google AppEngine's use of DataNucleus will skyrocket your earnings. The gaps to be filled need quick and serious consideration, and ultimate solutions, if you want wide use of your software.

    So thumbs up, and take my words not negatively but constructively. I wanted to tell you the nature of the matter.

    Lead by 'serving'.


  2. andy says:

    If you are lacking a tutorial on some particular area you can easily raise JIRAs for such; doesn't mean anyone will have time to provide anything, but that is the way … rather than disparaging on 3rd party sites (e.g. Stackoverflow, or so I'm advised by their admins earlier today).

    Maven docs are complete AFAIK, as is the NetBeans tutorial (written by a NetBeans user, which I'm not) as is the Eclipse tutorial. You make no reference to any particular problem area … just several random target areas. The vast majority of JEE servers have been used by DN users but since I don't use that platform they are the reference point for this … and the forum the place to ask their experiences.

    You make comparisons between an open source project and RedHat/Ubuntu. Do you really think that's a fair comparison? Nope. How many committers does Hibernate have? And several of them are paid as well.

    What has GAE got to do with anything? I wrote (and continue to write) v2 of their plugin. Do I “need” huge numbers of people using DataNucleus? Do I do it all for money? Would I really want the same numbers of users as Hibernate? Nope.

    The simple fact is that I'm the only person spending significant amounts of my time here, and have to earn a living like everyone else. Have you got a job? and in addition to that you do lots of things for other people in your spare time also? Good to hear it.

    Anyone is free to join the project, to develop a particular plugin (like some are doing with MongoDB for example). Just saying … “you have to do more or you won't get huge numbers of users” is achieving nothing.


  3. Well Said Andy 🙂

    Though, I ended up here trying to figure out how to get HBase running with JPA. Nevertheless, this does not in any way mean there are things amiss on DataNucleus' end. Mostly it is a matter of a few people writing the How To blogs. Then again, reading documentation properly.

    One thing I am grateful for is finding a good exit strategy from RDBMS to HBase. I can now do this with JPA thanks to DataNucleus!

