Current Grey Screen of Death message (in French):
The grey screen of death has been so elusive that, until now, our troubleshooting process hasn't been able to determine what actually causes it. We asked users what steps they took, looked at the logs, and attempted to replicate the bug, but none of it made sense or helped us solve it. All we knew was that this error was supposed to appear when there was an uncaught exception in the system (for the technical folks: we put this in place so stack traces were never printed to the browser). Theoretically, the user should never have seen it. Unfortunately, among our development priorities, we never found time to improve our error pages to give users more information to help us help them.
This week we finally found some clues and have put in fixes for what we think is causing the problem. First, we figured out that this error appears for both uncaught exceptions (system errors) and 404 page errors, so we've separated those into two different error screens.
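For the technical folks, the split between the two screens can be sketched roughly as below. This is only an illustration, not our actual code: the class `ErrorScreens`, the enum `Screen`, and the method `screenFor` are hypothetical names made up for this example, and the rule it applies (a 404 with no exception gets its own screen, everything else is treated as a system error) is an assumption about how such a split might look.

```java
/** Hypothetical sketch: pick an error screen based on what went wrong. */
final class ErrorScreens {
    /** The two screens described above (names are illustrative). */
    enum Screen { NOT_FOUND, SYSTEM_ERROR }

    /**
     * @param status HTTP status code of the failed request
     * @param cause  the uncaught exception, or null if none was thrown
     */
    static Screen screenFor(int status, Throwable cause) {
        // A missing page with no exception behind it is not a system failure.
        if (cause == null && status == 404) {
            return Screen.NOT_FOUND;
        }
        // Anything else is assumed to be an uncaught exception.
        return Screen.SYSTEM_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(screenFor(404, null));
        System.out.println(screenFor(500, new IllegalStateException("boom")));
    }
}
```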
For the uncaught exceptions, the problem appears to come from two different workflows:
- Lab test results are automatically imported from an analyzer after a sample has been marked non-conforming. In other words, no sample entry was performed in the system, but a non-conformity event was entered for the sample. When the results are saved, the user is told that the save was successful, but it wasn't: the analyzer result reappears on the screen.
- Prior to sample entry, the analyzer imports of two different sample types are accepted. When the second import is saved, it causes the grey screen of death. This particular workflow puts the application into an unstable state in which every user in the system gets the grey screen of death until the system is restarted. For the technical folks: this is due to Hibernate not knowing what to do with a transient object that has been created. When it tries to flush the cache, an exception is thrown and the object remains in the cache. The system is unable to remove it from the cache, so every subsequent Hibernate action continues to cause exceptions.
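
For the technical folks, here is a toy model of the second workflow's failure mode. It does not use Hibernate at all: `ToySession`, `Entity`, `attach`, `flush`, and `clear` are hypothetical stand-ins invented for this sketch, and it only mimics the behaviour described above, where an object that cannot be flushed stays queued and makes every later flush fail. The general Hibernate guidance is to roll back and discard a session once it has thrown an exception, which is what the `clear()` call loosely represents here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Hibernate's API) of a shared session poisoned by a failed flush. */
final class ToySession {
    /** Hypothetical stand-in for an entity; "saved" means it already has a database row. */
    static final class Entity {
        final String name;
        final boolean saved;

        Entity(String name, boolean saved) {
            this.name = name;
            this.saved = saved;
        }
    }

    private final List<Entity> pending = new ArrayList<>();

    void attach(Entity e) {
        pending.add(e);
    }

    /** Fails on any unsaved entity and, as in the bug described, leaves it queued. */
    void flush() {
        for (Entity e : pending) {
            if (!e.saved) {
                throw new IllegalStateException("transient object: " + e.name);
            }
        }
        pending.clear();
    }

    /** Drops all pending state, the toy equivalent of discarding the session. */
    void clear() {
        pending.clear();
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.attach(new Entity("analyzer result without sample", false));

        // Every flush fails while the stuck object is still queued.
        for (int i = 1; i <= 2; i++) {
            try {
                session.flush();
            } catch (IllegalStateException ex) {
                System.out.println("flush " + i + " failed");
            }
        }

        // Once the stuck object is gone, flushing works again.
        session.clear();
        session.flush();
        System.out.println("flush after clear succeeded");
    }
}
```

In the toy model the state is only reset by an explicit `clear()`; in our system the equivalent reset was, until now, a restart.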
While we were tracking down the cause of the bug, our users told us adamantly that they needed better feedback on the screen whenever they receive any type of error. Simply stating "It's not your fault" only frustrated them, because they wanted to know what was happening to the system and how to get help to solve it. So we spent time improving this. I will talk about those improvements in the next post.