Want to assist in the development of JDO 4.0?

We have, since 2006, been reliant on the Apache JDO project to push forward the JDO standard. Politics have finally become too much for this arrangement, with Oracle involving lawyers to prohibit progress, and the Apache organisation not being as rapid as it should be at getting releases out of the door. I have now forked their API (now present in DN GitHub). The aim is for this repository to be used to develop JDO 4.0 (and maybe later dependent on people’s involvement), initially to get a TypeSafe query mechanism into the JDO API (originally contributed back in April 2010!!), and bring the API up to date wrt generics in queries themselves. There are also many other features that were requested in Apache JDO JIRA over the last few years that have been simply left, but are needed.

If you or your company uses JDO, this is your chance to get involved and push it forward to cater for these required features. Do you really want to have to cast the return from your Query to be a List, etc.? You could offer your real-world experience with the JDO API and what you think could be done to make it easier to use.
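
To make the casting complaint concrete, here is a minimal sketch of the difference generics make at the call site. UntypedQuery, GenericQuery and QueryGenericsDemo are hypothetical stand-ins invented for this illustration; they are not the JDO API, and whatever JDO 4.0 ends up with may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the JDO API. They only contrast an untyped
// result (caller must cast) with a generified one (no cast at the call site).
final class UntypedQuery {
    private final List<Object> rows;

    UntypedQuery(List<Object> rows) { this.rows = rows; }

    // The caller only knows it gets an Object back.
    Object execute() { return new ArrayList<Object>(rows); }
}

final class GenericQuery<T> {
    private final Class<T> candidate;
    private final List<Object> rows;

    GenericQuery(Class<T> candidate, List<Object> rows) {
        this.candidate = candidate;
        this.rows = rows;
    }

    // The candidate class fixes the element type of the result.
    List<T> executeList() {
        List<T> out = new ArrayList<T>();
        for (Object row : rows) {
            out.add(candidate.cast(row)); // checked once, inside the API
        }
        return out;
    }
}

final class QueryGenericsDemo {
    private static List<Object> sampleRows() {
        return new ArrayList<Object>(Arrays.asList("Fred", "Freddie"));
    }

    // The style complained about above: an unchecked cast in user code.
    @SuppressWarnings("unchecked")
    static List<String> castNames() {
        return (List<String>) new UntypedQuery(sampleRows()).execute();
    }

    // Generified style: the compiler already knows this is a List<String>.
    static List<String> typedNames() {
        return new GenericQuery<String>(String.class, sampleRows()).executeList();
    }
}
```

Both methods return the same data; the difference is that in the second one a wrong element type is caught inside the query rather than somewhere later in user code.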

Should the Apache JDO project “wake up” at some point in the future we could easily enough merge our changes back into their API and discontinue this fork.


Bytecode Enhancement contract in DataNucleus AccessPlatform v4.0

Now in GitHub master, for DataNucleus AccessPlatform v4.0, we have changed the bytecode enhancement contract.

Since the days of JPOX we’ve always used the JDO bytecode enhancement contract as defined in the JDO spec. This has always been adequate to provide the necessary hooks into the object to allow for “transparent persistence”. Saying that though, it does mean that anyone using DataNucleus would always have to have jdo-api.jar in their CLASSPATH, even when using JPA. This was clearly undesirable, but not a large price to pay for easy provision of JPA.

In v4.0 onwards we will enhance classes to implement org.datanucleus.enhancer.Persistable. This is very similar in terms of structure, just that methods are now prefixed “dn” instead of “jdo”, and there is now a method to get the ExecutionContext that is managing the object (whereas before it was a PersistenceManager, which made very little sense for JPA usage).

Why the change?

Oracle is putting significant obstacles in the way of having further releases of the JDO standard, involving lawyers etc. Additionally, following the Apache way, the Apache JDO project has not exactly operated very efficiently in terms of getting releases out of the door. Moreover we want to remove the requirement of having to have jdo-api.jar in the CLASSPATH for JPA usage. This change will also mean that we can, in principle, improve the bytecode enhancement contract to make things more efficient or add on more information to enhance the persistence process without being restricted by what JDO has bothered to standardise.

What does this mean for a typical user?

It means very little in reality, and the majority of applications will work unchanged (apart from having to re-enhance the classes). Some minor things that will change:

  • Wherever you use enhanced classes, you will need datanucleus-core.jar in the CLASSPATH
  • JPA users won’t need to have jdo-api.jar in the CLASSPATH
  • Internally DataNucleus now uses its own builtin single-field identity classes; if you refer to javax.jdo.identity.* classes (in JDO contexts), DataNucleus will auto-convert them to its own classes for internal use. See this package for the DN internal identity classes. There is really no need to use the JDO builtin classes directly, since DataNucleus will always select the most appropriate id type when you have a single PK field.
  • You can no longer check whether a returned object is of type javax.jdo.spi.PersistenceCapable, since it won’t be; it will instead be an org.datanucleus.enhancer.Persistable.
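
If you have code that inspects objects for the enhancement interface, a quick way to see which contract a class was enhanced with is sketched below. It looks the interface up by name through reflection purely so that the sketch compiles without either jar in the CLASSPATH; in real code a plain instanceof against the interface is the obvious choice. The class EnhancementCheck and its method names are made up for this illustration, and the interface names are the ones quoted above.

```java
final class EnhancementCheck {
    // Interface names as quoted in the text above.
    static final String DN_PERSISTABLE = "org.datanucleus.enhancer.Persistable";
    static final String JDO_PERSISTENCE_CAPABLE = "javax.jdo.spi.PersistenceCapable";

    /** True if the object's class, or any of its supertypes, implements the named interface. */
    static boolean implementsInterface(Object obj, String interfaceName) {
        if (obj == null) {
            return false;
        }
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            if (hasInterface(c, interfaceName)) {
                return true;
            }
        }
        return false;
    }

    // Checks the directly declared interfaces and, recursively, their super-interfaces.
    private static boolean hasInterface(Class<?> type, String interfaceName) {
        for (Class<?> iface : type.getInterfaces()) {
            if (iface.getName().equals(interfaceName) || hasInterface(iface, interfaceName)) {
                return true;
            }
        }
        return false;
    }

    /** Human-readable summary of which enhancement contract (if any) the object carries. */
    static String describe(Object obj) {
        if (implementsInterface(obj, DN_PERSISTABLE)) {
            return "enhanced with the v4.0 contract";
        }
        if (implementsInterface(obj, JDO_PERSISTENCE_CAPABLE)) {
            return "enhanced with the JDO contract, needs re-enhancing";
        }
        return "not enhanced";
    }
}
```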

Please register any concerns/queries in the comments section


Configuring persistence of fields/properties using TypeConverters with JDO

JPA 2.1 allows a user to specify a converter on the value of a field/property for how it is persisted in the datastore. The way we implement that in DataNucleus is to have the JPA converter as a wrapper to our own internal TypeConverter class. This means that we can make the TypeConverter mechanism available to JDO users too. Here’s how it works. The first thing to do is to define a TypeConverter. As you can see, it requires implementation of just 2 methods, one for use in converting the (raw) field value into the datastore value, and one for use in converting the datastore value back into the (raw) field value. This means that the user has significant flexibility on how their values are stored, and can define their own converters and not rely on them being part of DataNucleus. If we take an example, here we want to store a field as serialised, but not standard Java serialised, instead using the Kryo library. So we define our TypeConverter as

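import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Serializable;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.datanucleus.exceptions.NucleusException;
import org.datanucleus.store.types.converters.TypeConverter;
import org.datanucleus.util.Base64; // assumed to be the DataNucleus utility class

public class KryoSerialiseStringConverter implements TypeConverter<Serializable, String>
{
    // Kryo instances are not thread-safe, so keep one per thread
    ThreadLocal<Kryo> kryo = new ThreadLocal<Kryo>();

    public Kryo getKryo()
    {
        Kryo value = this.kryo.get();
        if (value == null)
        {
            value = new Kryo();
            this.kryo.set(value);
        }
        return value;
    }

    public String toDatastoreType(Serializable memberValue)
    {
        if (memberValue == null)
            return null;

        Kryo kryo = getKryo();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Output output = new Output(baos);
        try
        {
            kryo.writeClassAndObject(output, memberValue);
        }
        finally
        {
            output.close(); // flushes the buffered bytes into baos
        }
        return new String(Base64.encode(baos.toByteArray()));
    }

    public Serializable toMemberType(String datastoreValue)
    {
        if (datastoreValue == null)
            return null;

        Kryo kryo = getKryo();
        byte[] bytes = Base64.decode(datastoreValue);
        Object obj = null;
        Input input = new Input(new ByteArrayInputStream(bytes));
        try
        {
            obj = kryo.readClassAndObject(input);
        }
        catch (Exception e)
        {
            throw new NucleusException("Error Kryo deserialising " + datastoreValue, e);
        }
        finally
        {
            input.close();
        }
        return (Serializable)obj;
    }
}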
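
If you want to try the two-method idea without Kryo or DataNucleus in the CLASSPATH, the following is a minimal sketch of the same round trip using only the JDK. TwoWayConverter is a hypothetical stand-in with the same two-method shape as the TypeConverter described above (it is not the DataNucleus interface), and it swaps Kryo for standard Java serialisation and java.util.Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical stand-in: same two-method shape as the converter described above.
interface TwoWayConverter<M, D> {
    D toDatastoreType(M memberValue);
    M toMemberType(D datastoreValue);
}

// Uses standard Java serialisation (not Kryo) so that it runs with only the JDK.
final class JavaSerialiseStringConverter implements TwoWayConverter<Serializable, String> {
    @Override
    public String toDatastoreType(Serializable memberValue) {
        if (memberValue == null) {
            return null;
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the stream flushes everything into baos before we read it
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(memberValue);
        } catch (IOException e) {
            throw new IllegalStateException("Error serialising " + memberValue, e);
        }
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    @Override
    public Serializable toMemberType(String datastoreValue) {
        if (datastoreValue == null) {
            return null;
        }
        byte[] bytes = Base64.getDecoder().decode(datastoreValue);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Serializable) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Error deserialising " + datastoreValue, e);
        }
    }
}
```

The property worth checking for any converter of this kind is that toMemberType(toDatastoreType(x)) gives back a value equal to x, and that null passes through unchanged in both directions.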

Now we need to register this converter under a name with the DataNucleus runtime. Here we define a plugin.xml at the root of the plugin jar as

<type-converter name="kryo-serialise" member-type="java.io.Serializable" datastore-type="java.lang.String"

So this converter is internally known as kryo-serialise. We add a MANIFEST.MF to our plugin jar as (something like)

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: DataNucleus Converter using Kryo
Bundle-SymbolicName: org.datanucleus.store.types.converters.kryo;singleton:=true
Bundle-Version: 3.2
Bundle-Localization: plugin
Bundle-Vendor: DataNucleus
Require-Bundle: org.datanucleus
Import-Package: com.esotericsoftware.kryo;version="2.21",

just defining the dependencies of this plugin for OSGi and the plugin mechanism. Now we just need to use it on a sample class and apply the conversion to a field/property.

public class Sample
{
    long id;

    @Extension(vendorName="datanucleus", key="type-converter-name", value="kryo-serialise")
    String longString;
}


Now whenever instances of Sample are persisted the field longString will be converted using our kryo-serialise converter. Easy! You can get the code for this example converter at DataNucleus GitHub repo


AccessPlatform 3.3 and JPA 2.1

We will soon be releasing AccessPlatform 3.3. This is coming very soon after 3.2, and the reason for this is that it is simply AccessPlatform 3.2 plus full support for JPA 2.1 (i.e. an upgraded datanucleus-api-jpa plugin). From that point both of these “versions” of AccessPlatform will be maintained for a period of time. So what is provided in JPA 2.1?

Stored Procedures

You can execute RDBMS Stored Procedures using the JPA API. This API should give you quite complete control over execution of any stored procedure, setting of IN/OUT/INOUT parameters and obtaining result set(s). To give an example

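// "MY_PROC" is a placeholder; use the name of your own stored procedure
StoredProcedureQuery spq = em.createStoredProcedureQuery("MY_PROC");
spq.registerStoredProcedureParameter("PARAM1", Integer.class, ParameterMode.OUT);
boolean val = spq.execute();
Object paramVal = spq.getOutputParameterValue("PARAM1");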

So here we have a stored proc for which we register an Integer output parameter, and we retrieve its value after executing it. This is just one mode of operation, and you can see more usages in the documentation.

Entity Graphs

JPA has only had an equivalent of the JDO “default fetch group” since its inception. In this release it finally gets some degree of control over what fields are fetched when fetching an object from the datastore. To give an example, we have a class Person and want to pull in a field “bestFriend” under some circumstances but not by default. We define a named EntityGraph in metadata

@NamedEntityGraph(name="includeFriend", attributeNodes= {@NamedAttributeNode("bestFriend")})
public class Person
{
    Person bestFriend;
}


and now we want to use this entity graph when loading an object of this type. We do this as follows.

EntityGraph<?> friendGraph = em.getEntityGraph("includeFriend");
Map<String, Object> props = new HashMap<String, Object>();
props.put("javax.persistence.loadgraph", friendGraph);
Person myObj = em.find(Person.class, id, props);

So we retrieved the EntityGraph, and then used it in the find method. Equally we could have used it in a Query. You can read more about this topic in the documentation.

Schema Generation

JPA 2.1 allows generation of the schema as an up front task, or via Persistence.generateSchema(). You can specify this by making use of persistence properties, for example javax.persistence.schema-generation.database.action set to create. See the available persistence properties for details.

Foreign-Keys and Indexes

JPA 2.1 adds on the ability to specify RDBMS schema foreign-keys and indexes, for use during schema generation. By default a JPA implementation is free to generate whatever foreign-keys it decides are appropriate, but this ability allows a user to override this and control what is generated.

@JoinColumn(name="BESTFRIEND_ID", foreignKey=
@ForeignKey(name="BESTFRIEND_FK", foreignKeyDefinition=
Person bestFriend;

which will create a foreign key called “BESTFRIEND_FK” for this purpose. Similarly we can define indexes on a table of a class.

@Table(indexes={@Index(name="FIRSTNAME_IDX", columnList="FIRST_NAME")})
public class Person
{
    String firstName;
}


so the firstName field is now indexed. You can read more about this topic in the documentation.

Criteria UPDATE/DELETE queries

Whilst the JPA Criteria API is overengineered and verbose it now has the ability to generate UPDATE and DELETE queries. For example

CriteriaBuilder qb = em.getCriteriaBuilder();
CriteriaUpdate<Person> crit = qb.createCriteriaUpdate(Person.class);
Root<Person> candidate = crit.from(Person.class);
crit.set(candidate.get(Person_.firstName), "Freddie");
Predicate firstNameIsFred = qb.equal(candidate.get(Person_.firstName), "Fred");
crit.where(firstNameIsFred);
Query q = em.createQuery(crit);
int num = q.executeUpdate();

which will create the JPQL “UPDATE Person p SET p.firstName = 'Freddie' WHERE p.firstName = 'Fred'”. You can do similar things for DELETE queries. You can read more about this topic in the documentation.

Attribute Converters

By default a JPA implementation will persist a field in a datastore column of its chosen type. You can now override this to use a converter, performing the conversion from the field type to the datastore type in your converter class. The example we use in the documentation is where we have a field in our class of type URL and want to persist this as a String-type in the datastore (VARCHAR, CHAR etc).

public class URLStringConverter implements AttributeConverter<URL, String>
{
    public URL convertToEntityAttribute(String str)
    {
        if (str == null)
            return null;

        URL url = null;
        try
        {
            url = new java.net.URL(str.trim());
        }
        catch (MalformedURLException mue)
        {
            throw new IllegalStateException("Error converting the URL", mue);
        }
        return url;
    }

    public String convertToDatabaseColumn(URL url)
    {
        return url != null ? url.toString() : null;
    }
}

and then in our class that has a URL field we mark the field to use this converter

@Convert(converter=URLStringConverter.class)
URL url;

You can read more about this topic in the documentation.

JPQL FROM “ON” clauses

When joining in JPQL previously we could not add additional constraints on the join. You can now do this using the “ON” clause.

List result = em.createQuery(
"SELECT Object(A) FROM Account A LEFT OUTER JOIN A.login L ON L.userName = 'fred'").getResultList();


JPQL “FUNCTION”

No matter what features you put in a query language, some people will always want to make use of SQL functions specific to a particular datastore. With JPQL you can now do this using the FUNCTION keyword, like this

Query q = em.createQuery(
"SELECT p FROM Person p WHERE FUNCTION('UPPER', p.firstName) = 'FRED'");

As you can see, JPA 2.1 is a minor iteration on JPA, and you can now benefit from all of these features in DataNucleus AccessPlatform 3.3


What does open-source owe you?

If you ever embark on a period of time writing open source (free) software you’ll almost certainly come across many different attitudes in the people who choose to use this software.

There will be some people who’ll actually say thanks for providing this software, that it’s helped them with their project, saved them many hours of time that otherwise they would have had to spend writing something similar. The software was suitable for their project needs, so great, your effort in making the software open source has benefitted people. Feel good.

There’ll be people who maybe say thanks but typically just accept the software as it is, some sort of given, and when they have a problem they take the source code and try to work out where the problem is. They may ask you questions about how the code works, or where to look to get started in resolving the problem. And some time later they may come back with a patch for their problem, so this makes the software better. Again, a good thing.

There will also be a group who take the software and use it for their projects. If a problem occurs they will report it. Their report may provide a way of reproducing the problem, or it may not. If a problem is reported with a way of being reproduced then it can be fixed when the people writing the software have some spare time to do it. Another good thing. If there is no way provided to demonstrate it then the problem report is of little use to anyone … unless the person who has the problem is willing to get their hands dirty, get the code and fix it (with help where necessary) since only they can see it.

A final group will take the software and, if a problem occurs, keep it to themselves; it’s as if they expect you to be aware of everything that could possibly happen with the software, as if the developer were a crystal ball wielder. The people who develop that software only have a certain amount of time, and they typically will use it and test it against what their own project requires. That doesn’t mean that their use-cases are the same as yours. So don’t expect open source developers’ testing to cover what you need for your project; you could contribute tests to their suite, or donate for their time to run against other datastores if this is important to you.

You may find people who ask the question “should I ditch use of your software?” when faced with a problem, something that your software doesn’t cater for, or fails on. Maybe this is in some kind of “threat” sense: fix this problem or I leave? Well, the answer to that is simple really. People should do what is right for their project. They’ve demonstrated one way or another whether they wish to contribute anything to the open source software (problem reports, testcases, patches, documentation, blogs, testimonials, donations, etc.; there are many ways). If they haven’t demonstrated the willingness to do anything for the project then their input won’t be missed if they go off somewhere else. Do they pay the people who develop the software? Well, no. Does the license of that software imply any guarantee that all problems will be fixed immediately when the toys are thrown from the pram? Nope. Maybe this software is not the correct tool for their project? In which case use the correct tool for the job, and don’t vent your frustration at your choices on the people who have provided something for nothing. Further to this, stick to the old adage “don’t ask someone to do what you wouldn’t be prepared to do yourself”; didn’t your mum teach you that?

People seem to have got accustomed to having an open source solution these days, and that somehow it’s their “right” to have it and their “right” to have any problems found fixed. While open source (free) software gives projects a leg up in reaching their end goal more rapidly and is a great thing for software developers, open source software owes the end user nothing. Best understand this. The end user has the opportunity to do many things to contribute to that software, make it better, repay those people who put their time into developing it. The time of these people who wrote it is important to them, even if it isn’t to you; at least respect that.

Some things are for sure: when you embark on writing open source software it can be very rewarding. It is very beneficial if you want a way to demonstrate your coding skills to potential employers, and it offers excellent possibilities for exploring other technologies, gaining experience, and working with other people with different viewpoints. But don’t go into it for the gratitude :-)

[Disclaimer : while there is such a thing as commercial open source software, providing the source code yet charging for the software, what is being discussed here is the much more common open source free software]


Performance – effect of various features

Two years ago I made a post about performance/benchmarking, and the fact that some groups like some magic black and white “X is better than Y” (and that there is only one measure of performance so it doesn’t matter what object graphs are used it will always be the same). The evidence is that they are wrong. Needless to say there will always be groups that don’t share our philosophy, or don’t have time to do a complete analysis (though they publish their results knowing that they are incomplete and likely invalid; after all, it’s not their software that they’re maybe not presenting in a fair light). Recently we had another performance exercise. This came to the conclusion “Hibernate is better than DataNucleus, and you should really just get ObjectDB”. So we’re back in the territory of black and white. Yes, an OODBMS ought to be way faster than an RDBMS, particularly when the RDBMS has a persistence layer in front of it (and you have to pay for the OODBMS besides), but that is not the subject of this post. We’ll concentrate on the former component of that conclusion.

There is nothing to add to the previous blog post in terms of correctness; we stand by all of it and nothing has been demonstrated to the contrary. This blog post simply takes the recent exercise sample and demonstrates how enabling/disabling certain features has a major impact on (DataNucleus) performance. The author of that exercise demonstrated results showing that JDO and JPA with DataNucleus were on a par in terms of performance, but below Hibernate in terms of INSERTs (anything between 1.5 and 2 times) and on a par for SELECTs (some faster, some slower but more or less the same). Since JDO and JPA are shown to be equivalent, we’ll just run the exercise with JDO here, but the same is easily demonstrable using JPA (because in DataNucleus you have full control over all persistence properties and features regardless of API).

The sample data used by this case is that of 3 classes. Student has a (1-N unidirectional) List of Credit and has a (1-1 unidirectional) Thesis. We persist 100000 Students each with 1 Credit and 1 Thesis. So that’s 300000 objects to be inserted, and then 100000 Students queried.

The INSERT is as follows

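for (int x = 0; x < 100000; x++)
{
    Student student = new Student();
    Thesis thesis = new Thesis();
    List<Credit> credits = new ArrayList<Credit>();
    Credit credit = new Credit();
    credits.add(credit);
    // ... link thesis and credits to the student (setter calls omitted), then persist
    pm.makePersistent(student);
}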

and the SELECT is as follows

Query q = pm.newQuery(
    "select from " + Student.class.getName() +
    " where thesis.complete == true && credits.size()==1");
Collection result = (Collection) q.execute();
// ... loop through results, so we know they're loaded

So we’ll run (on H2 database, on a Core i5 64-bit PC running Linux, 4Gb RAM) and vary our persistence properties to see the effect.

Original persistence properties (from original author)

optimistic=true, L2 cache=true, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 120s, SELECT = 6.5s

Disabled L2 cache

Since we’re persisting huge numbers of objects and it takes time to cache those, and in the original author’s case Hibernate had no L2 cache enabled, let’s turn the L2 cache off. So we now have
optimistic=true, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 106s, SELECT = 4.0s
Why the improvement? : because objects didn’t need caching, so DataNucleus didn’t need to generate the cacheable form of those 300000 objects on INSERT, and 100000 objects on SELECT.

Disabled Optimistic Locking

Now instead of using optimistic locking (queue all operations until commit/flush), we allow all persists to be auto-flushed. As our exercise is bulk-insert we don’t care about optimistic locking since we’re creating the objects. So we now have
optimistic=false, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 42s, SELECT = 4.0s
Why the improvement ? : because objects are flushed as they are encountered so we don’t have to hang on to a large number of changes, so the memory impact is less. Note that we could have observed a noticeable speed up also if we had instead called “pm.flush()” in the loop after every 1000 or 10000 objects. See the performance tuning guide for that.
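
The flush-every-N alternative mentioned above can be sketched as follows. BatchTarget is a hypothetical stand-in for just the two PersistenceManager calls involved (makePersistent and flush), so the sketch runs without JDO in the CLASSPATH; CountingTarget only counts calls so we can see how often a flush happens. With a real PersistenceManager the loop body would look the same.

```java
// Hypothetical stand-in for the two PersistenceManager calls used here;
// it is not the JDO interface itself.
interface BatchTarget {
    void makePersistent(Object obj);
    void flush();
}

// Counts calls instead of talking to a datastore.
final class CountingTarget implements BatchTarget {
    int persisted;
    int flushes;

    @Override public void makePersistent(Object obj) { persisted++; }
    @Override public void flush() { flushes++; }
}

final class BatchInsert {
    /** Persists 'total' objects, flushing after every 'batchSize' of them. */
    static void insert(BatchTarget pm, int total, int batchSize) {
        for (int x = 1; x <= total; x++) {
            pm.makePersistent(new Object());
            if (x % batchSize == 0) {
                pm.flush(); // push the queued changes out so they are not held until commit
            }
        }
        // anything left over after the last full batch goes out at commit
    }
}
```

Persisting 2500 objects with a batch size of 1000 triggers two explicit flushes, leaving the last 500 objects for the commit.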

Use BoneCP connection-pooling

Use BoneCP instead of built-in DBCP, so we have
optimistic=false, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=bonecp
INSERT = 42s, SELECT = 3.8s
Why the (slight) improvement ? : because BoneCP has benchmarks showing that it has less overhead than DBCP
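
The settings above are written in shorthand. As a rough guide, this is how the final set might be spelled as DataNucleus persistence properties. The property names and values are assumptions based on the DataNucleus documentation as I recall it, so check them against the version you are using; TunedProperties is just a holder class for this sketch.

```java
import java.util.Properties;

final class TunedProperties {
    /** Builds the final (tuned) settings from this post as a Properties object. */
    static Properties build() {
        Properties props = new Properties();
        // Names and values are assumptions; verify them against your DataNucleus version.
        props.setProperty("datanucleus.Optimistic", "false");
        props.setProperty("datanucleus.cache.level2.type", "none");
        props.setProperty("datanucleus.persistenceByReachabilityAtCommit", "false");
        props.setProperty("datanucleus.DetachAllOnCommit", "false");
        props.setProperty("datanucleus.DetachOnClose", "false");
        props.setProperty("datanucleus.manageRelationships", "false");
        props.setProperty("datanucleus.connectionPoolingType", "BoneCP");
        return props;
    }
}
```

Together with the usual connection URL, driver and credentials, a map like this is what you would hand to the factory method that creates your PersistenceManagerFactory (or EntityManagerFactory).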


As you can see, with very minimal tweaking we’ve reduced the INSERT time by a factor of almost 3 (120s to 42s), and the SELECT time by a factor of 1.7 (6.5s to 3.8s)! That would equate to being noticeably faster than Hibernate in the author’s original timings (for both INSERT and SELECT). Note that we already had the detach flags set to not detach anything, so they didn’t need tuning (but they should be included if you hadn’t already looked at those in your performance tests, as should all of the other features listed in the Performance Tuning Guide referenced above).
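
For anyone wanting to check the arithmetic behind those factors, here is a small sketch using only the timings quoted in this post. The class name Speedup is made up for the illustration.

```java
final class Speedup {
    /** Ratio of the original time to the tuned time. */
    static double factor(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    /** Objects handled per second. */
    static double throughput(int objects, double seconds) {
        return objects / seconds;
    }

    public static void main(String[] args) {
        System.out.println("INSERT speed-up: " + factor(120.0, 42.0));
        System.out.println("SELECT speed-up: " + factor(6.5, 3.8));
        System.out.println("INSERT objects/s: " + throughput(300000, 42.0));
    }
}
```

The INSERT ratio works out at roughly 2.86 and the SELECT ratio at roughly 1.71, and the tuned INSERT run corresponds to a little over 7000 objects persisted per second.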

Does the above mean that “DataNucleus is faster than Hibernate” ? Not as such, it is in some situations and not in others. We can turn on/off many things and get different results, just as Hibernate likely can (though I’d say DataNucleus is more configurable than the majority if not all of the other persistence solutions so at least you have significant flexibility to do this with DataNucleus). In the same way we could persist other object graphs and get different results due to some parts of the persistence process being more optimised than others. One thing you can definitely say is that DataNucleus has very good performance (300000 objects persisted in 42secs on a PC, and 100000 objects queried in less than 4secs) and that performance can be significantly tuned.

The other thing that we said in the original blog post and repeat here, if you are serious about performance analysis you have to dig into the details to understand why and, as a consequence, you have an idea what to tune. You also need to assess what your application really needs to perform and what is considered acceptable performance; if you’re not going to make a proper attempt at tuning a persistence solution (whether that is DataNucleus, Hibernate, or any other), best not bother at all and just use what you were going to use anyway since you don’t have the time to give a fair representation (which is why we don’t present any Hibernate results here, so nothing hypocritical in that).

One important thing to note is that it is extremely useful to have the ability to set many of these properties on a PersistenceManager (or EntityManager) basis (so you could have a PM just for bulk inserts and disable L2 caching, or set the transaction to not be “optimistic”). JDO 3.1 adds the ability to set persistence properties on the PersistenceManager, though DataNucleus only currently supports a minimal set there; SVN trunk now has the ability to turn off the L2 cache in a PM while having it enabled for the PMF as a whole.


Enhancing in v3.2

Whilst a “final release” of version 3.2 of DataNucleus is still some way off, some important changes have been made to the enhancement process that people need to be aware of, and can benefit from.

JDO : Ability to enhance all classes as “detachable” without updating metadata

When you enhance classes for the JPA API they are all made detachable without a need to specify anything in the metadata (since JPA doesn’t have a concept of not being detachable). With JDO the default is not detachable (for backwards compatibility with JDO1 which didn’t have the detachment concept). In v3.2 of DataNucleus you can set the alwaysDetachable option (see the enhancer docs) and all classes will be enhanced detachable without the need to touch the metadata; much easier than opening up every class or metadata file and adding detachable="true"!

JPA : Throwing of exceptions due to the bytecode enhancement contract

The bytecode enhancement contract requires that classes throw exceptions under some specific situations where information is either not present or not valid. These always used JDO-based exceptions before to match the JDO bytecode enhancement contract exactly. These are now changed to better suit the JPA API, and remove a need to understand JDO when using JPA.

  • if a non-detached field was accessed then a JDODetachedFieldAccessException was thrown; this is now changed to an (java.lang.)IllegalAccessException.
  • in some cases where an internal error occurred a JDOFatalInternalException would be thrown; this is now changed to an (java.lang.)IllegalStateException.

No “datanucleus-enhancer.jar”, and no need of external “asm.jar”

The DataNucleus enhancer was always maintained as a separate project, but is now merged into datanucleus-core.jar and so will be available directly whenever you have DataNucleus in your CLASSPATH. Taking this further, the enhancer makes use of the excellent ASM library and in v3.2 datanucleus-core.jar includes a repackaged version of the ASM v4.1 classes internally. This means that you have one less dependency also and can do enhancement with less thinking.
PS Remember, bytecode enhancement is “evil”, developers of some other persistence solution told you that back in 2003, and you should never forget it! ;-)