Query DSLs - power of expressiveness

From WikiEducator
Jump to: navigation, search


Query DSLs: the power of expressiveness
Convenor: Rickard Öberg
Participants:
  • ...
  • ...
Summary:

The discussion was about how to use Query DSL's to replace using strings for querying datastores and various indexes. The reasons for doing queries this way includes (but is not limited to):

  • Enabling exploratory building of queries. Help with code completion
  • By encapsulating the query in an object we can then either execute it or render it
  • Minimize potential for SQL injection-type security issues
  • Separation of concerns. The domain model can supply the where-part of the query, and the UI can do the sorting and paging by modifying the same query builder
  • Before execution we can perform custom optimization operations on the query, either by arranging the query itself or by triggering indexes in the database to be created on the fly as a heuristic decision

In general you can create query DSL's using the builder pattern that are specific to certain query engines. In our discussion we were using the Qi4j Query API as an example, which is supposed to work with several query engines. Therefore it is useful to distinguish between what parts are common to all query engines, and which are specific, so that you can leverage the underlying capabilities of the query language if you want to.

The different parts of a query that needs to be handled are:

  • Dataset selection
  • Filtering of the dataset using "where"-style expressions
  • Extraction of the data that you want to return
  • Sorting of data
  • Paging of data
  • Projection of the data to the final format that you need

The dataset selection is going to depend on the underlying store, so this is largely non-generic. Since the Query object that the builder creates is an in-memory representation of the query, it is possible to execute it on a given Iterable, making it possible to essentially chain queries by passing the result of one Query as the dataset source for another Query. This way you can perform filtering operations that maybe are not directly supported by the database, without having to do them manually. There's obviously the tradeoff of having to load data into memory for filtering however.

When it comes to filtering many query languages share some basic operations like and,or, equals and numeric operators. These can be placed into a generic set of expressions. The pattern used here is the Specification pattern, which takes an object as input and returns true if it matches the specification. For example, in Qi4j if you want to return the Person entities whose name property matches a specific string, you can specify it with QueryExpressions.eq(person.name(), "Rickard"). This Specification will evaluate to true if the entity being tested has the name property set to "Rickard". By basing expressions on the Specification pattern it is possible to add custom operators by simply implementing such an interface and providing as part of the where-clause.

Extraction is done by specifying what part of the filtered dataset you want to return. I have not yet implemented this in Qi4j, so have no clue right now as to how to do this effectively and generically. It will come :-)

Sorting and paging are fairly straightforward operations, and can be done fairly generically independent of the underlying query engine. The main thing here is that, as outlined above, these are usually concerns of the UI rather than the domain model, so the domain or application layer might return query objects to the UI which provides this information before executing the query, rather than having the domain model execute the query.

The final step is to provide a projection, which is directly dependent on what specific data the client wants to get. This should be done as closely to the UI as possible, so that unnecessary data is not loaded. When using query objects from a REST API, make sure to use a REST mediatype that can perform such selection. So instead of returning plain XML documents, use something like the Google Data Table API, where the client can specify exactly what it wants.

Apart from the query builder DSL itself it is also important to consider the execution model. If the query is in the end represented as an Iterable, then there will be no hooks for doing resource cleanup, and so either a timeout or a full read of the query results is needed. The alternative is to offer a scoped access pattern, where it is possible to know when the reading of the results is done.

And that concludes some of the thoughts we had on the topic of Query DSL's here at the JSS in Crete, 2011.

Recommendations:

(as above)