Wednesday, 11 April 2012

On the Programatic Handling of Approximate Dates

Despite more and more activities taking place in the digital world getting precise date stamps, events with dates of varying degrees of accuracy remain common place. For instance Historical events, book releases or when you had tea with Jack last Tuesday which can only be pinned down to a an approximate time such as a year, a month or a particular afternoon.

Obviously, any new digital data we create gets a marvelously precise timestamp, nevertheless however digital our lives become, life itself remains very imprecise. We will continue to create memories and experience things that will not get digital timestamps, but that we may yet wish to record digitally.

It is tempting to call the dates we may assign to such events fuzzy dates, as in fuzzy logic, however the term has been used to mean relative dating, e.g. 2 days ago. So approximate dates will have to do.

The question is how do we store such approximate dates digitally and in such a way the one can perform meaningful queries on the data. An approximate date type is not among the many data types provided by most database systems for storing and querying data.

The trick to designing an approximate date datatype is the realization that however approximate a date or time, you can easily choose upper and lower bounds for it; thereby defining an interval. It should be noted that a precise date or time can be modeled as the interval containing just that date or time.

Records hold two entires for the start and end dates of the range (for precise dates the start and end dates are equal and the range contains just the one point in time).

We can safely assume that that start is always less than or equal to end. Assume a table t with start and end columns, and a range from a to b.

Records that are contained within a date range can be queried using the intersection of records ending before the end of the search range and records starting after the beginning of the search range. Or more simply:

SELECT * FROM t WHERE t.start >= a AND t.end <= b

One might be tempted, given the above, to say that records that intersect with a date range can be found by using the union of records having ended after the start of the search range and those having started before the end of the search range:

SELECT * FROM t WHERE t.start >= a OR t.end <= b

This is wrong. Given the reasonable inequalities between a and b; and between the start date and end date of a record, the above is actually a tautology. This becomes obvious if we assume, without loss of generality, that a = b.

The correct solution is the intersection of events ending after a and starting before b.

SELECT * FROM t WHERE t.end >= a AND t.start <= b