PriDE Design Patterns: Caching
A strong measure to speed up persistent data access is the introduction
of a cache for repeatedly queried entities. The only basic requirement is
a unique identification of all entities of interest, as defined by primary
keys or at least unique constraints in the database. PriDE does not provide
any caching functionality by itself, so the following explanations must
be understood as general remarks on the issue.
As mentioned already in the patterns
for separation of database and business concerns, it is generally recommended
for larger applications to follow a layered architectural model. By defining
a persistence layer, the database access can be encapsulated in the implementation
of storage facades or data access objects. For the well-known
class Customer this may look like this:
class CustomerStore {
    void createCustomer(Customer customer) throws SQLException {
        customer.create();
    }

    Customer getCustomer(int id) throws SQLException {
        return new Customer(id);
    }

    // and so forth
}
Under this precondition, the implementation can easily be extended by caching
functionality, as demonstrated in its simplest form in the following
example:
class CustomerStore {
    private Map<Integer, Customer> customers = new HashMap<>();

    void createCustomer(Customer customer) throws SQLException {
        customer.create();
        customers.put(customer.getId(), customer);
    }

    Customer getCustomer(int id) throws SQLException {
        Customer c = customers.get(id);
        if (c == null) {
            c = new Customer(id);
            customers.put(id, c);
        }
        return c;
    }

    // and so forth
}
When introducing caching functionality, there are some important issues
to keep in mind which make this feature not as universally beneficial
as many commercial O/R mapping toolkits suggest:
- Synchronicity of caches
Although storage facades bundle the database access of one application,
there may still be other applications operating on the same database by different
means (e.g. batch update procedures). Caches therefore tend to become out of
date, and the application must refresh them within a time frame
suitable for the modification rate of the data. Moreover, it may
depend on the application context whether particular data must exactly reflect
the current database state or whether a slight deviation is acceptable. As a
rule of thumb, caching may be reasonable for selected master data which is itself
not the focus of the current processing. I.e. caching should be introduced
function-based rather than entity-based and must be understood as an individual
algorithmic and temporary optimization. In the best case, the caches don't keep
their state across method boundaries, which in turn makes them as easy to
manage as demonstrated above. As soon as the end of a processing frame is
reached, the caches are immediately flushed to get rid of potentially
outdated data.
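A function-scoped cache of this kind can be sketched as follows. The class and method names are illustrative, not part of the PriDE API, and the Customer stub merely simulates a database read so the flushing behavior is observable:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a cache scoped to one processing frame and flushed at its end,
// so no potentially outdated entries survive across frames.
public class FrameCache {
    // Stub entity; in the real application the constructor would query the database.
    static class Customer {
        static int loads = 0;          // counts simulated database reads
        final int id;
        Customer(int id) { this.id = id; loads++; }
    }

    private final Map<Integer, Customer> cache = new HashMap<>();

    Customer getCustomer(int id) {
        // load each id at most once per frame
        return cache.computeIfAbsent(id, Customer::new);
    }

    void endOfFrame() {
        cache.clear();                 // flush potentially outdated data
    }

    public static void main(String[] args) {
        FrameCache store = new FrameCache();
        store.getCustomer(1);
        store.getCustomer(1);          // served from cache, no new load
        System.out.println("loads=" + Customer.loads); // loads=1
        store.endOfFrame();
        store.getCustomer(1);          // reloaded after flush
        System.out.println("loads=" + Customer.loads); // loads=2
    }
}
```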
- Complex queries
A database query of course does not always follow the simple scheme of
fetching a single record based on a unique combination of attributes. Thus,
complex queries often have to bypass the caches, assuming that they don't
support the full range of expressions provided by SQL. On the other
hand, it turns out in practice that only very few queries have a complex,
dynamic structure. The interfaces of storage facades give a good overview
of what is actually required by the business logic.
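The coexistence of cached unique-key lookups and uncached complex queries in one facade might look like the following sketch. All names are illustrative, and an in-memory list stands in for the database table:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of a storage facade: the unique-key lookup is cached, while a
// complex query always bypasses the cache and hits the (simulated) database.
public class FacadeSketch {
    record Customer(int id, String region) {}

    // simulated database table
    private final List<Customer> table = List.of(
            new Customer(1, "north"), new Customer(2, "south"),
            new Customer(3, "north"));

    private final Map<Integer, Customer> cache = new HashMap<>();

    Customer getCustomer(int id) {               // unique key -> cacheable
        return cache.computeIfAbsent(id,
                k -> table.stream().filter(c -> c.id() == k)
                          .findFirst().orElse(null));
    }

    List<Customer> findByRegion(String region) { // complex query -> bypasses cache
        return table.stream().filter(c -> c.region().equals(region))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FacadeSketch store = new FacadeSketch();
        System.out.println(store.getCustomer(2).region());      // south
        System.out.println(store.findByRegion("north").size()); // 2
    }
}
```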
- Scope
If there are multiple threads working concurrently in an application, the
scope of validity of the caches must be kept in mind. The main problem
in this context is not a difference of validity between threads but
the risk of uncontrolled flushing, which minimizes the intended optimization
effect. If possible, it is recommended to use read caches only, which can
be managed locally for every processing function without the risk of interference
between concurrent threads.
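Such a locally managed read cache can be as simple as a map living on the stack of one processing function, so concurrent threads can never flush each other's entries. Again, the names and the Customer stub are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a purely local read cache: each invocation of the processing
// function builds its own map, so threads never share cache state.
public class LocalReadCache {
    static class Customer {
        final int id;
        Customer(int id) { this.id = id; }  // stands in for a database read
    }

    // The cache exists only inside this method; concurrent threads running
    // it get independent instances and cannot interfere with each other.
    static int process(int[] ids) {
        Map<Integer, Customer> cache = new HashMap<>();
        int dbReads = 0;
        for (int id : ids) {
            if (!cache.containsKey(id)) {
                cache.put(id, new Customer(id)); // simulated database read
                dbReads++;
            }
        }
        return dbReads;                          // reads, not lookups
    }

    public static void main(String[] args) {
        // five lookups, but only three distinct ids -> three reads
        System.out.println(process(new int[]{1, 2, 1, 3, 2})); // 3
    }
}
```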
The source code of the example above is available under examples/caching.