Friday, August 17, 2012

What makes things interesting - Dates

This will be the first in a series of posts about some of the issues we run into while developing OpenELIS.

Before you read any further, think of three different ways you can write the date of July 25th, 2012 for a French-speaking audience.

Just to show off I'll give you four ways:

25-07-2012
25/7/2012
25 juil 2012
25 juillet 2012

These are all understandable by French speakers, or English speakers who know a little French (C'est moi). The problem is to make them understandable to OpenELIS.

We recently had a bug caused by a change in the date format sent by one of the analyzers we support. The fix was to have OpenELIS understand both the original format (25-07-2012) and the new format (25 juil 2012).

The programming language we use for OpenELIS, Java, has very good support for figuring out what a date is, but you have to tell it what the format is first. For instance, for the original format you would tell it that the format is the day, followed by a dash, followed by the month, followed by a dash, followed by a four-digit year. If you gave it 25/7/2012 it would not know what to do.

The format for 25-07-2012 is day-month-year
The format for 25 juil 2012 is day abbreviated month year
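
To make "telling Java the format" concrete, here is a minimal sketch using the standard SimpleDateFormat class. The class and method names (DatePatternSketch, parse) are made up for illustration and are not taken from the OpenELIS code; only the pattern letters dd, MM, MMM and yyyy are Java's own.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not the actual OpenELIS code.
class DatePatternSketch {
    // Parses text with the given pattern using French month names.
    // Returns null when the text does not match the pattern.
    static Date parse(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.FRENCH);
        format.setLenient(false); // reject out-of-range values such as month 13
        try {
            return format.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // day-month-year, the original analyzer format
        System.out.println(parse("dd-MM-yyyy", "25-07-2012"));
        // same pattern, but slashes instead of dashes: Java gives up
        System.out.println(parse("dd-MM-yyyy", "25/7/2012")); // null
        // day, abbreviated month, year: the pattern for the new format
        System.out.println(parse("dd MMM yyyy", "25 juil. 2012"));
    }
}
```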

The first job is to figure out which date format the analyzer has sent. There are a couple of different ways to do this, but the easiest is just to see whether the date has a '-' in it: if it does, it is the original format; if it does not, it is the new format.
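
The dash check could look something like the sketch below. The names (AnalyzerDateFormatChooser, choosePattern) are hypothetical, and it assumes the analyzer only ever sends one of the two formats described above.

```java
// Hypothetical helper, not the actual OpenELIS code.
class AnalyzerDateFormatChooser {
    static final String ORIGINAL_PATTERN = "dd-MM-yyyy";  // 25-07-2012
    static final String NEW_PATTERN = "dd MMM yyyy";      // 25 juil 2012

    // A '-' anywhere in the text means the original format;
    // anything else is assumed to be the new format.
    static String choosePattern(String rawDate) {
        return rawDate.indexOf('-') >= 0 ? ORIGINAL_PATTERN : NEW_PATTERN;
    }

    public static void main(String[] args) {
        System.out.println(choosePattern("25-07-2012"));   // dd-MM-yyyy
        System.out.println(choosePattern("25 juil 2012")); // dd MMM yyyy
    }
}
```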

That was easy.  But it still doesn't work for the new format.

We know which format to use, but Java still can't understand '25 juil 2012'. The reason is that Java thinks that 'juil.' is the French abbreviation for July, not 'juil'. The extra '.' makes all the difference in the world. That's not hard to fix: we can just add a '.' to the month and now Java is happy. It will be able to handle the date which was causing the problem and everything should be good. Until next March, that is. The abbreviation for that month is 'mars', with no '.' at the end.

Fortunately that is not a hard fix, and we didn't have to wait until next March to discover that only some months have a '.' at the end of their abbreviation.
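
One way to sketch that fix is to add the '.' only for the months that need it. This is an illustration, not the code we actually shipped. It assumes the analyzer sends "day month year" separated by spaces, and that the French abbreviations Java writes without a '.' are mars, mai, juin and août; that list is worth checking against the Java version you run. The names FrenchMonthDotFixer and addMissingDot are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the actual OpenELIS code.
class FrenchMonthDotFixer {
    // Assumption: these French abbreviations carry no trailing '.' in Java.
    // "ao\u00fbt" is "août" written with a Unicode escape.
    private static final Set<String> NO_DOT = new HashSet<String>(
            Arrays.asList("mars", "mai", "juin", "ao\u00fbt"));

    // Turns "25 juil 2012" into "25 juil. 2012" and leaves "5 mars 2013" alone.
    // Text that is not three space-separated parts is returned unchanged.
    static String addMissingDot(String rawDate) {
        String[] parts = rawDate.trim().split("\\s+");
        if (parts.length != 3) {
            return rawDate;
        }
        String month = parts[1];
        boolean needsDot = !month.endsWith(".")
                && !NO_DOT.contains(month.toLowerCase(Locale.ROOT));
        if (needsDot) {
            month = month + ".";
        }
        return parts[0] + " " + month + " " + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(addMissingDot("25 juil 2012")); // 25 juil. 2012
        System.out.println(addMissingDot("5 mars 2013"));  // 5 mars 2013
    }
}
```

The result can then be handed to a "dd MMM yyyy" parser set up for French.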

Dates are very hard to get right. They are one of those things that are relatively easy for humans and very hard for computers.

After all, we all know what 05/10/2012 means, except that I'm not going to tell you whether it was written for a French speaker or an English speaker.
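
If you want to see the ambiguity for yourself, the small sketch below reads the same text with a day-first pattern and a month-first pattern. The names (AmbiguousDateSketch, monthOf) are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper, only here to illustrate the ambiguity.
class AmbiguousDateSketch {
    // Returns the Calendar month constant (0 = January) that the text
    // parses to under the given pattern, or -1 if it does not parse.
    static int monthOf(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false);
        try {
            Calendar calendar = new GregorianCalendar();
            calendar.setTime(format.parse(text));
            return calendar.get(Calendar.MONTH);
        } catch (ParseException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String text = "05/10/2012";
        // Day first: the 5th of October.
        System.out.println(monthOf("dd/MM/yyyy", text) == Calendar.OCTOBER); // true
        // Month first: May the 10th.
        System.out.println(monthOf("MM/dd/yyyy", text) == Calendar.MAY);     // true
    }
}
```

Both readings are perfectly valid dates, so nothing in the text itself can tell the computer which one was meant.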

