Calibration Out of Tolerance – Part 2

Written by Heikki Laurila | Feb 02, 2017

In this post I continue on the topic of calibration being “Out of Tolerance” (OoT) or “Failed”.

“Out of Tolerance” (OoT) means that during a calibration one or more calibration points failed to meet the required tolerance level. The result of the calibration is then out of tolerance, which is often also referred to as a failed calibration.

In the earlier blog post I covered the following items related to this subject:

What does “Out of Tolerance” mean?
What is the tolerance level used?
How was the calibration found to be out of tolerance?
How critical is it?

If you missed the first part, please check the blog post here.
I have also made a white paper on this subject, which you can download from the link below.

But now, let’s continue with the remaining topics.

When did it happen?

An out of tolerance situation is naturally noticed during a calibration, but that is not the moment when the instrument went out of tolerance or started to measure incorrectly. So when did it happen? It is important to determine this, because any measurements made after that moment are suspect. If a critical process measurement failed, any products produced after that moment are affected and, in the worst case, may need to be recalled.

It is not an easy task to determine the moment when the instrument went out of tolerance. A good place to start is to check the previous calibration data and confirm that the instrument was left in acceptable condition at that time. However, if there are no records between the previous good calibration and the new failed calibration, you need to question everything done in between. You can study the measurement results and any other relevant data from that period to see if anything indicates when the instrument drifted out of specification. This may be, for example, a sudden rise in reported changes or issues in the process, or, in the case of a calibrator, a time period when more failed calibrations started to appear. You may also analyze the history of the instrument to see if it shows a typical drift, and possibly interpolate the data to find the most likely moment when it went out of tolerance. Even so, it can be very difficult to determine the actual moment when an instrument failed to meet its tolerance. There may be no option but to assume that all calibrations done after the previous successful calibration are affected and suspect.
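The interpolation idea mentioned above can be illustrated with a minimal Python sketch. It assumes that the error drifted linearly between the last good calibration and the failed one, which is only one possible assumption; a sudden failure (for example after a mechanical shock) would not follow it. The function name, the data layout and the numbers are hypothetical and only serve as an illustration.

```python
from datetime import date, timedelta

def estimate_oot_date(history, tolerance):
    """Estimate when the error first reached the tolerance limit.

    history   -- list of (calibration_date, error) tuples sorted by date;
                 the last entry is the failed calibration (hypothetical layout)
    tolerance -- symmetric tolerance limit, in the same unit as the errors

    Assumes linear drift between the last two calibrations.
    Returns None if the last two points do not bracket the limit.
    """
    (d0, e0), (d1, e1) = history[-2], history[-1]
    if abs(e0) > tolerance or abs(e1) <= tolerance:
        return None  # previous result was not good, or latest is not OoT
    # Pick the limit (+tolerance or -tolerance) that the latest error exceeded.
    limit = tolerance if e1 > 0 else -tolerance
    # Fraction of the interval elapsed when the straight line crosses the limit.
    fraction = (limit - e0) / (e1 - e0)
    return d0 + timedelta(days=round(fraction * (d1 - d0).days))

# Illustrative data: error in % of span, tolerance of 0.5 % of span.
history = [(date(2016, 1, 1), 0.0), (date(2016, 1, 11), 1.0)]
estimated = estimate_oot_date(history, tolerance=0.5)
```

With more than two calibrations in the history, a fit over the whole history could be used instead of the last two points. Either way, the result is an estimate to support the investigation, not proof of when the instrument failed.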

Impact analysis - what are the consequences?

Once we know that the out of tolerance situation really happened, have analyzed how large it was, and have an idea of when it occurred, the next step is to evaluate the impact. You need to find out where the failed instrument has been used and which measurements are suspect.

In the case of a process transmitter installed in a fixed location, it is obvious where it has been used, but with portable measuring equipment or a portable calibrator the situation is different. One powerful option available in some calibration management programs is a “reverse traceability” report. This kind of report lists all the calibrations where a specific instrument has been used over a certain time period. It is most helpful when you need to analyze, for example, where a portable calibrator has been used. If you do not have an automated reverse traceability report and need to go through calibration reports manually to see where a certain calibrator was used, the work may take many man-hours to complete. However you do it, it needs to be done.
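As a rough illustration of what a reverse traceability report does, the following Python sketch filters a list of calibration records by the reference instrument used and by date. The record structure, field names and identifiers are hypothetical; an actual calibration management program stores and reports this data in its own way.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CalibrationRecord:
    # Hypothetical record layout, for illustration only.
    instrument_id: str   # instrument that was calibrated
    reference_id: str    # calibrator/reference used for the calibration
    performed_on: date

def reverse_traceability(records, reference_id, start, end):
    """Return all calibrations performed with the given reference
    between start and end (inclusive), oldest first."""
    hits = [
        r for r in records
        if r.reference_id == reference_id and start <= r.performed_on <= end
    ]
    return sorted(hits, key=lambda r: r.performed_on)

records = [
    CalibrationRecord("PT-101", "CAL-A", date(2016, 9, 14)),
    CalibrationRecord("TT-205", "CAL-B", date(2016, 6, 2)),
    CalibrationRecord("TT-204", "CAL-A", date(2016, 3, 1)),
    CalibrationRecord("FT-310", "CAL-A", date(2015, 11, 20)),
]

# Everything calibrated with CAL-A since its last known good calibration.
suspect = reverse_traceability(records, "CAL-A", date(2016, 1, 1), date(2016, 12, 31))
suspect_ids = [r.instrument_id for r in suspect]
```

The start date would normally be the last known good calibration of the reference, and the end date the failed one.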

In the case of a process instrument being out of tolerance, you need to have your process specialist analyze what the impact of this failure is on the actual process and on your end product. In the best case, the effect on the process measurement is so small that it does not cause any significant damage. In the worst case, however, the analysis shows that the effects on the process, and on the products being produced, are so large that the products do not meet their specifications, and then the costs can be huge. In many processes, the quality of the end product cannot simply be verified by testing the final product; the process conditions must be correct during manufacturing. If this involves, for example, food or medicine, or the heat treatment of critical aerospace or automotive parts, then you are obligated to inform your customers, or even withdraw products from the market. Product withdrawal is a dramatic consequence; it will get you into the news, it will be very expensive, and it will have a negative effect on your company brand, reputation and stock value.

In the case of a process calibrator that fails to meet its tolerance, you need to evaluate how much the failure affected all the measurements made with it since its last known good calibration. Often the calibrator is significantly more accurate than the process instruments calibrated with it, so there is some safety margin. In the best case, even if the calibrator failed recalibration, the failure is so small that it has no significant effect on the calibrations that have been done with it. But in the worst case, if all of the calibration work done with that calibrator is suspect, you need to analyze the effect on each process measurement that has been calibrated. As previously mentioned, this can be a really big task, as you need to do the analysis for all the process measurements affected.
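The safety margin reasoning can be sketched in Python as a simple worst-case check: the error recorded for each process instrument is combined with the error found in the calibrator and compared against the instrument's tolerance. This is only one possible, conservative way to screen the results, and the function name and the numbers are hypothetical. All values are assumed to be in the same unit, for example % of span.

```python
def classify_calibration(found_error, tolerance, calibrator_error):
    """Screen one past calibration against the calibrator's own error.

    Worst-case assumption: the true error of the process instrument may
    differ from the recorded error by up to the calibrator's error.
    Returns "ok" if the instrument stays within tolerance even then,
    otherwise "suspect" (needs a closer look or a recalibration).
    """
    worst_case = abs(found_error) + abs(calibrator_error)
    return "ok" if worst_case <= tolerance else "suspect"

# Calibrator was found to be off by 0.05 % of span; instruments have
# a tolerance of 0.5 % of span (illustrative numbers).
results = {
    "TT-204": classify_calibration(0.20, 0.5, 0.05),  # plenty of margin
    "PT-101": classify_calibration(0.48, 0.5, 0.05),  # margin used up
}
```

A check like this only narrows down the list of calibrations that need further analysis; the final judgment on each process measurement still belongs to the people who know the process.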

Quality assurance considerations

You may have heard your quality professionals talking about CAPA, an abbreviation of Corrective Actions and Preventive Actions. This is something that is stipulated by most quality standards, such as the very common ISO 9001 quality standard as well as ISO/IEC 17025, which is used in accredited calibration laboratories. Corrective actions are the actions you take to correct the situation, while preventive actions are all the actions you take to prevent the same situation from happening again in the future. It is important to review the effectiveness of corrective and preventive actions. Other similar instances should also be considered to see if similar occurrences are possible elsewhere. Quality standards also require that these processes are documented and that responsibilities are specified.

A root cause analysis is typically required by quality standards to find out what caused an OoT to occur. A risk analysis, or more generally risk-based thinking, is also required by modern quality system standards. Continuous improvement is another common quality requirement, to ensure that you keep improving your quality system and learn from any mistakes, so that problems do not happen again.

Many companies, especially in regulated industries, use some form of “deviation management software system” where all OoT calibration cases are recorded in order to control and document the process of handling these cases.

Summary

Summarizing the key points of these two blog posts and the related white paper, if you get an out of tolerance calibration, you need to do the following:

Verify what tolerance level was used and that it is the correct level.
Verify the uncertainty used in deciding that a measurement is out of tolerance, and that this uncertainty is appropriate.
Determine how critical this out of tolerance observation is.
Determine where in the traceability chain it occurred.
Determine when it occurred.
Make an impact analysis to find out what the consequences are.
Perform relevant quality assurance considerations.
