OpenELIS Global: August 2012

Friday, August 17, 2012

What makes things interesting - Dates

This will be the first in a series of blog posts about some of the issues we run into while developing OpenELIS.

Before you read any further, think of three different ways you can write the date July 25th, 2012 for a French-speaking audience.

Just to show off I'll give you four ways:

25-07-2012
25/7/2012
25 juil 2012
25 juillet 2012

These are all understandable by French speakers, or English speakers who know a little French (C'est moi). The problem is to make them understandable to OpenELIS.

We recently had a bug caused by a change in the date format of one of the analyzers we support. The fix was to have OpenELIS understand both the original format (25-07-2012) and the new format (25 juil 2012).

The programming language we use for OpenELIS, Java, has very good support for figuring out what the date is, but you have to tell it what the format is first. For instance, for the original format you would tell it that the format is the day, followed by a dash, followed by the month, followed by a dash, followed by a four-digit year. If you gave it 25/7/2012, it would not know what to do.

The format for 25-07-2012 is day-month-year
The format for 25 juil 2012 is day abbreviated month year
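
As a minimal sketch of what "telling Java the format" can look like, the example below uses `java.time.DateTimeFormatter` with the pattern `dd-MM-yyyy`. The choice of `java.time` and the class and method names are assumptions made for illustration; the actual OpenELIS code is not shown in this post.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class OriginalFormatDemo {
    // Day, dash, month, dash, four-digit year.
    static final DateTimeFormatter DAY_MONTH_YEAR = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DAY_MONTH_YEAR);
    }

    // Returns false instead of throwing when the text does not match the pattern.
    static boolean canParse(String text) {
        try {
            parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("25-07-2012"));   // matches the pattern
        System.out.println(canParse("25/7/2012")); // slashes do not match the pattern
    }
}
```

With this pattern, 25-07-2012 is read as a date, while 25/7/2012 and 25 juil 2012 are both rejected.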

The first job is to figure out which date format the analyzer has sent. There are a couple of different ways to do this, but the easiest is to check whether the date has a '-' in it: if it does, it is the original format, and if it does not, it is the new format.
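
The check described above could be sketched roughly as follows. The class, enum, and method names are hypothetical and chosen only for this example.

```java
final class AnalyzerDateFormats {
    enum Format { ORIGINAL, NEW }

    // A '-' anywhere in the text is taken to mean the original day-month-year format.
    static Format detect(String text) {
        return text.indexOf('-') >= 0 ? Format.ORIGINAL : Format.NEW;
    }

    public static void main(String[] args) {
        System.out.println(detect("25-07-2012"));
        System.out.println(detect("25 juil 2012"));
    }
}
```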

That was easy. But it still doesn't work for the new format.

We know which format to use, but Java still can't understand '25 juil 2012'. The reason is that Java thinks that 'juil.' is the French abbreviation for July, not 'juil'. The extra '.' makes all the difference in the world. That's not hard to fix: we can just add a '.' to the month and now Java is happy. It will be able to handle the date which was causing the problem and everything should be good. Until next March, that is, because the abbreviation for that month is 'mars', with no '.' at the end.

Fortunately that is not a hard fix and we didn't have to wait until next March to discover that only some months have a '.' at the end of the abbreviation.
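
One way to sidestep the question of which months take a '.' is sketched below. Note that this is a different technique from the fix described above: instead of adding a '.' before handing the text to Java, it strips any trailing '.' and looks the month up in its own table. The list of abbreviations is an assumption (the post only confirms 'juil' and 'mars'), and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class FrenchAbbreviatedDates {
    // Assumed abbreviations, written without the trailing '.'.
    // \u00e9 is e-acute and \u00fb is u-circumflex.
    private static final String[] MONTHS = {
        "janv", "f\u00e9vr", "mars", "avr", "mai", "juin",
        "juil", "ao\u00fbt", "sept", "oct", "nov", "d\u00e9c"
    };
    private static final Map<String, Integer> MONTH_NUMBERS = new HashMap<>();
    static {
        for (int i = 0; i < MONTHS.length; i++) {
            MONTH_NUMBERS.put(MONTHS[i], i + 1);
        }
    }

    // Parses "day abbreviated-month year", with or without a '.' after the month.
    static LocalDate parse(String text) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'day month year': " + text);
        }
        String month = parts[1].toLowerCase(Locale.ROOT);
        if (month.endsWith(".")) {
            month = month.substring(0, month.length() - 1);
        }
        Integer found = MONTH_NUMBERS.get(month);
        if (found == null) {
            throw new IllegalArgumentException("Unknown month abbreviation: " + parts[1]);
        }
        int monthNumber = found;
        return LocalDate.of(Integer.parseInt(parts[2]), monthNumber, Integer.parseInt(parts[0]));
    }

    public static void main(String[] args) {
        System.out.println(parse("25 juil 2012"));
        System.out.println(parse("25 juil. 2012"));
        System.out.println(parse("3 mars 2013"));
    }
}
```

The trade-off is that the table has to be maintained by hand, but the result no longer depends on which abbreviations a particular Java version considers correct.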

Dates are very hard to get right. They are one of those things that are relatively easy for humans and very hard for computers.

After all, we all know what 05/10/2012 means, except that I'm not going to tell you if it was for a French speaker or an English speaker.
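
To make the ambiguity concrete, here is a small sketch that reads the same text with a day-first pattern (as in the French examples above) and a month-first pattern (the usual US English convention). The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class AmbiguousDateDemo {
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd/MM/yyyy");
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    static LocalDate readDayFirst(String text) {
        return LocalDate.parse(text, DAY_FIRST);
    }

    static LocalDate readMonthFirst(String text) {
        return LocalDate.parse(text, MONTH_FIRST);
    }

    public static void main(String[] args) {
        String text = "05/10/2012";
        System.out.println(readDayFirst(text));   // 2012-10-05, the 5th of October
        System.out.println(readMonthFirst(text)); // 2012-05-10, the 10th of May
    }
}
```

Both calls succeed, which is exactly the problem: nothing in the text itself says which reading was intended.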

Thursday, August 16, 2012

OpenELIS Global release 2.7.1

The latest OpenELIS release is now available for installation. This release is version 2.7.1 (build 3200) and will most directly affect the Retro-CI implementation. The release addresses the following priorities and issues:

Features and Updates

Updating the name in the menus of the Cobas Taqman DNA PCR and Viral Load analyzers

Creating the ability for lab techs to choose not to import test results, thus not creating a test request if it was run unnecessarily

Notifying the user when the analyzer import is not saved correctly

Improving the user interface for error messages to give users more information about what error has occurred and how to get help to solve the error

Functional prototype of the Audit Trail feature to allow administrators to view step by step status changes to each lab order, sample, and test request (See the blog post here)

New role to set explicit permissions to the Audit Trail feature

Bug Fixes

Two issues causing the system to crash (resulting in many “It’s not your fault” error screens being displayed). These issues and their solutions are documented here.

Correcting the lab no. LART32667 to not display continuously in the analyzer import screen

Correcting the transfer of lab results from the Cobas Taqman DNA PCR

Fixed the FACSCalibur import bug that required CD4 results to be entered manually.

Fixed bug that prevented the non-conformity report by lab section and reason from generating.

Please send us any feedback you may have about the release. We will follow up soon to outline our priorities for the next release.

Wednesday, August 15, 2012

Improving Error Messages to the User

Jan talked about the work we've been doing to better isolate the causes of the "Grey Screen of Death" — the error page that appears when there's a problem with the OpenELIS system. She mentioned the frustration with the vague "it's not your fault" language that appeared on the GSOD page.

Old "GSOD" error page in French:

Old "GSOD" error page in English:

The original intent was to assure users, particularly those without a lot of experience with such systems, that they didn't "cause" this problem, hence the "it's not your fault" language.

While that sounds good in theory, we learned that what's most important is to figure out what went wrong and get it fixed. By providing more context when someone sees a "GSOD" error, we can get better feedback on what happened. This can make the process of auditing what went wrong much easier for the on-site system administrator and, ultimately, the development team.

To that end, we've redesigned the "GSOD" page to grab details about the user's browser, path through the system, etc. We also tried to include clear action steps that are general enough to make sense at any lab that uses OpenELIS. We also made it a little more appealing to the eyes:

First off, you'll see that the error message now keeps the OpenELIS "header"—that section at the top of the page with the logo and the blue background. Compared to the old version that just had the gray error box on a blank page, this version may be a little less jarring to the user.

The top section lists steps that the user should take to help report this problem. At the bottom is system information that can easily be copied and pasted into an email or printed. Not shown in this screenshot is the listing for the previous page the user was on before they saw this error, which can be particularly helpful in understanding what happened.
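
As a rough illustration of the kind of copy-and-paste block described above, the sketch below turns a set of collected details into plain text. The field names and values are hypothetical placeholders, not what the OpenELIS page actually records.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ErrorReportSketch {
    // Builds one "name: value" line per detail, in the order the details were added.
    static String format(Map<String, String> details) {
        StringBuilder report = new StringBuilder();
        for (Map.Entry<String, String> entry : details.entrySet()) {
            report.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        Map<String, String> details = new LinkedHashMap<>(); // keeps insertion order
        details.put("Browser", "example user agent");       // hypothetical values
        details.put("Previous page", "/example/previous");
        details.put("Current page", "/example/current");
        System.out.print(format(details));
    }
}
```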

This is a start, but we're doing more work to make the (hopefully infrequent) encounter with system errors more efficient. We'll be adding a separate 404 "Page not found" error message that will have specific reporting instructions. We're also thinking of what, if any, part of the server error log could be included. While that might be helpful, security and privacy remain paramount - we don't want to expose any details that could be used to attack the system if an unauthorized user gained access.

If you have any questions or comments, let us know. And if you think of a better name for this new friendlier, more helpful page than "Grey Screen of Death," we'd be happy to use a term that's a little less morbid.

Friday, August 10, 2012

The Grey Screen of Death

In the previous post I mentioned that our team is focused on fixing a long-standing, ever-elusive bug that has consistently been very disruptive to end users (users only see the message "It's not your fault"). One of our implementations is seeing this message more and more often, and the only way to clear it is to stop the web application and restart it. Because this bug can disrupt the entire lab, and because until now we had been unable to figure out what was causing it, we named it "the grey screen of death", alluding to the "blue screen of death" that computer users see when their system crashes and must be restarted.

Current Grey Screen of Death message (in French):

The grey screen of death has been so elusive that up until now our troubleshooting process hasn't been able to determine what actually causes it - asking users what steps they took, looking at the logs, attempting to replicate the bug - none of it made sense or helped us solve it. All we knew was that this error was supposed to appear when there was an uncaught exception in the system (for the technical folks - we put this in place so stack traces never printed to the browser). Theoretically, the user should have never seen it. Unfortunately, in our development priorities, we never found time to improve our error pages to give the user more information to help us help them.

This week we finally found some clues and have put in fixes for what we think is causing the problem. First, we figured out that this error appears for both uncaught exceptions (system errors) and 404 page errors. So we've separated those out to be two different error screens.
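
The separation described above might be sketched like this. It is only an outline in plain Java: the enum, the method names, and the message wording are hypothetical, and a real web application would typically wire this into its error-page configuration instead.

```java
final class ErrorPages {
    enum Page { NOT_FOUND, SYSTEM_ERROR }

    // A 404 gets its own page; anything else reaching the handler is treated as a system error.
    static Page choose(int httpStatus) {
        return httpStatus == 404 ? Page.NOT_FOUND : Page.SYSTEM_ERROR;
    }

    // The user sees a short message; the stack trace stays in the server log.
    static String userMessage(Page page) {
        switch (page) {
            case NOT_FOUND:
                return "The page you asked for could not be found.";
            default:
                return "The system ran into a problem. Please report it using the steps on this page.";
        }
    }

    public static void main(String[] args) {
        System.out.println(userMessage(choose(404)));
        System.out.println(userMessage(choose(500)));
    }
}
```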

For the uncaught exceptions, the problem appears to come from two different workflows:

Lab test results are automatically imported from an analyzer after a sample was marked non-conforming. In other words, no sample entry was performed in the system but a non-conformity event was entered for the sample. When the results are saved, the user is told that the save was successful, but it was not, and the analyzer result reappears on the screen.
Prior to sample entry, the analyzer imports of two different sample types are accepted. When the second import is saved, it causes the grey screen of death. This particular workflow puts the application into an unstable state that causes all of the users in the system to experience the grey screen of death until the system is restarted. For the technical folks: this is due to Hibernate not knowing what to do with a transient object that has been created. When Hibernate tries to flush the cache, an exception is thrown and the object remains in the cache. The system is unable to remove it from the cache, so every Hibernate action continues to cause exceptions.
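
For readers who want to see why one bad object can break everything that follows, here is a toy model in plain Java. It is not Hibernate code and does not use Hibernate's API; the class is hypothetical and only imitates the behavior described above, where a failed flush leaves the offending entry in the cache. The `discard` method stands in for starting over with a clean cache, which is what the restart achieves. (Hibernate's own documentation similarly advises discarding a session once it has thrown an exception.)

```java
import java.util.ArrayList;
import java.util.List;

final class PoisonedCacheSketch {
    private final List<String> pending = new ArrayList<>();

    void add(String entry) {
        pending.add(entry);
    }

    // Tries to write out every pending entry. If one entry cannot be written,
    // the flush fails and nothing is removed, so the bad entry stays cached.
    void flush() {
        for (String entry : pending) {
            if (entry.startsWith("bad")) {
                throw new IllegalStateException("Cannot flush " + entry);
            }
        }
        pending.clear();
    }

    // Throws away everything pending, like starting again with a fresh cache.
    void discard() {
        pending.clear();
    }

    static boolean flushSucceeds(PoisonedCacheSketch cache) {
        try {
            cache.flush();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        PoisonedCacheSketch cache = new PoisonedCacheSketch();
        cache.add("bad-transient-object");
        System.out.println(flushSucceeds(cache)); // the first failure
        cache.add("good-result");
        System.out.println(flushSucceeds(cache)); // unrelated work now fails too
        cache.discard();
        cache.add("good-result");
        System.out.println(flushSucceeds(cache)); // works again after discarding
    }
}
```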

The reason that it has been so difficult to troubleshoot is that the logs showed us events well after the cause because the whole application became unstable.

Along with determining the cause of the bug, our users adamantly told us that they needed better feedback from the screen when they receive any type of error. Simply stating "It's not your fault" only frustrated them because they wanted to know what was happening to the system and how to get help to solve it. So we spent time improving this. I will talk about those improvements in the next post.

Thursday, August 9, 2012

Viewable Audit Trail

In our current development iteration, which leads to an unscheduled release of OpenELIS 2.7.1 and the scheduled release of 2.8, we have focused development over the last couple of weeks on two very important areas of work:

Creating a viewable audit trail of the order and sample/specimen events
Fixing a long-standing ever-elusive bug that has consistently been very disruptive to end users (users only see the message "It's not your fault")

Viewable Audit Trail
We've known for a while that it would be helpful for administrators and users to be able to view all of the events that have occurred for patient samples. In January we started talking with our colleagues in Cote d'Ivoire about how this feature would assist them in troubleshooting issues that crop up during use of the software. After our most recent trip last month, it was clear that this was the most important next feature to add to OpenELIS. Many times users will report a problem with the software, but the system support team has no way of determining whether a system error actually caused the problem, or whether the state of the order/sample/test is simply different from what the user expected because of the way the user interacted with the system. Users in the lab told us that these kinds of problems account for the majority of their work, so we hope that this feature will substantially reduce that workload.

First phase of development for 2.7.1
The first phase of development includes the following:

Researching all of the events that should occur and making certain that they are being added to the log in the database. Events for orders, samples, and tests should include the following:

Order creation (Sample Entry)
Patient creation (Patient Entry) *
Any changes to the patient information (demographics, medical history, etc.) *
Status changes for an order (created, testing started, testing finished)
Adding or removing samples from an order *
Adding or canceling test requests from a sample in an order *
Status changes for a test request (not started, test canceled, technical acceptance/rejection, biologist acceptance/rejection, result finalized)
Nonconformity events for orders or for samples
Addition of notes *
Referrals to other laboratories and the receipt of those results *
Patient reports generation

Developing the code to access all of the events currently being captured
Developing the code to group these events into a "trail" that would make sense to a user
Developing the basic user interface to view the audit trail

[* indicates those events that are not currently being tracked and/or displayed in the audit trail for this release.]
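
Grouping captured events into a "trail" could look roughly like the sketch below, which groups events by order and sorts each group oldest-first. The `Event` class and its fields are assumptions made for this example and do not reflect the actual OpenELIS database schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AuditTrailSketch {
    // Hypothetical event shape: which order it belongs to, when it happened, what happened.
    static final class Event {
        final String orderId;
        final long timestamp;
        final String description;

        Event(String orderId, long timestamp, String description) {
            this.orderId = orderId;
            this.timestamp = timestamp;
            this.description = description;
        }
    }

    // Groups events by order, then sorts each order's events from oldest to newest.
    static Map<String, List<Event>> trails(List<Event> events) {
        Map<String, List<Event>> byOrder = new TreeMap<>();
        for (Event event : events) {
            byOrder.computeIfAbsent(event.orderId, key -> new ArrayList<>()).add(event);
        }
        for (List<Event> trail : byOrder.values()) {
            trail.sort(Comparator.comparingLong((Event e) -> e.timestamp));
        }
        return byOrder;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("order-1", 20L, "testing started"),
            new Event("order-2", 5L, "order created"),
            new Event("order-1", 10L, "order created"));
        for (Map.Entry<String, List<Event>> trail : trails(events).entrySet()) {
            System.out.println(trail.getKey());
            for (Event event : trail.getValue()) {
                System.out.println("  " + event.timestamp + " " + event.description);
            }
        }
    }
}
```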

What we found -

We unfortunately found that not all of the events were being captured! So we are fixing this.
For some of the update events, only the old values were being captured, not the new updated values. Having only the old values is not very helpful, so we are fixing this for the events already being recorded. We won't be able to reconstruct the data for previous events, but going forward this will work correctly.
We discovered we need to have a role available for system administrators to be able to configure user permissions to view the audit trail. Each installation may not want all users to be able to view this audit trail. We implemented and simply called this new permission 'Audit Trail'.
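
To show why both values matter, here is a minimal sketch of an update event that records the old and the new value together. The class and field names are hypothetical.

```java
final class FieldChange {
    final String field;
    final String oldValue;
    final String newValue;

    FieldChange(String field, String oldValue, String newValue) {
        this.field = field;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }

    // With both values stored, the trail can show what changed and what it changed to.
    String describe() {
        return field + ": '" + oldValue + "' -> '" + newValue + "'";
    }

    public static void main(String[] args) {
        System.out.println(new FieldChange("status", "not started", "result finalized").describe());
    }
}
```

An event that stored only `oldValue` could say that the status used to be "not started", but not what it became.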

Current view today on our development server:

Next phase of development for 2.8

Coding for the events that are not currently being captured or displayed, that is, the events marked with an asterisk (*) in the list above.
Improving the user interface to make the trail more readable, including making the table sortable and adding filtering by type of event
Possibly some of the feedback we receive from the first users of 2.7.1 (Retro-CI!)

Future development for X.X?
Eventually we would want to implement this as part of the QA dashboard and include reports that might indicate potential issues in the events. Secondly, we want to display tracking of data exchanges between systems, such as aggregate reports between clinical and reference laboratories, patient demographic queries to a master patient index or other system, electronic results reporting, etc.

But that's in the future!