Basic QC Practices
Is QC Quality Compromised?
Non-commutable controls, matrix effects, non-harmonized methods, consensus means, artificially wide manufacturer ranges, repeated controls, lot-to-lot variation. How many ways can the implementation of quality control be compromised? The answer might depress you.
How our QC techniques are deviating from best practices
Sten Westgard, MS
November 2011
Recently, we were sent a set of real-world data showing a summary of QC results for an instrument. The comment was made (and I’m paraphrasing here) that, “We’re seeing bias at the low end of the method. We’re thinking of switching controls.”
It struck us that while this statement reveals something about the specific laboratory, it really tells us more about the entire laboratory marketplace. The idea of QC is that when you have an out-of-control flag from a control, that means something is wrong with the method. In today’s laboratory, too often, an out-of-control flag seems to be taken as a sign that something is wrong with the control material instead.
While we here at Westgard QC often focus on the details of improper statistical quality control (for instance, false rejection rates and control limits), these are not the only factors that contribute to problems. Control materials with matrix effects, EQA/PT consensus-based means, and reagent lot differences all add to the difficulty of maintaining an effective quality management system.
The constant production and cost pressures of the laboratory have stressed and strained the practice of QC in many ways. It might be useful to review how we have unwittingly compromised the practice of quality control.
- Our out-of-control result is compromised (part 1).
Because of matrix effects, the control material doesn’t behave like a true patient specimen. Thus, when the control is “out”, we can tell ourselves that it’s not reflective of the real status of the method.
If we were using a commutable control, which behaved like a patient specimen, we couldn’t use that excuse. The control behavior would be indicative of a problem that would also impact the patient specimen. So if a control was out, that would mean the patient values were also being affected.
- Our out-of-control result is compromised (part 2).
Because our limits are set improperly (possibly at 2 SD), we’re getting too many false rejections. So if we simply repeat the control, or repeat, repeat, and repeat the control, or possibly even run a new control, we’ll finally get one result to fall “in” and then we can ignore any and all previous “out-of-control” flags.
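To put rough numbers on this, here is a minimal Python sketch. It assumes control results are Gaussian and independent, which real data only approximate; the function names are illustrative, not part of any QC package. The first function estimates how often a stable method will throw at least one flag by chance alone. The second estimates how often "repeat until it's in" will eventually produce an acceptable result even when the method mean really has shifted.

```python
import math


def normal_cdf(x):
    """Cumulative probability of a standard normal variable at x."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def false_rejection_rate(limit_sd, n_controls):
    """Chance that at least one of n_controls results from a stable method
    falls outside +/- limit_sd purely by chance (assumes independence)."""
    p_single_inside = normal_cdf(limit_sd) - normal_cdf(-limit_sd)
    return 1 - p_single_inside ** n_controls


def chance_repeat_passes(limit_sd, shift_sd, attempts):
    """Chance that at least one of `attempts` results lands inside the limits
    even though the method mean has truly shifted by shift_sd."""
    p_inside = normal_cdf(limit_sd - shift_sd) - normal_cdf(-limit_sd - shift_sd)
    return 1 - (1 - p_inside) ** attempts


for n in (1, 2, 3):
    print(f"2 SD limits, {n} control(s): {false_rejection_rate(2, n):.1%}")
print(f"3 SD limits, 2 controls: {false_rejection_rate(3, 2):.1%}")
print(f"True 2 SD shift, 3 attempts, at least one 'in': "
      f"{chance_repeat_passes(2, 2, 3):.1%}")
```

Under these assumptions, two controls with 2 SD limits give a false rejection rate of about 9%, while 3 SD limits bring it down to roughly 0.5%. The last line shows the danger of repeating: with a genuine 2 SD shift, about seven times in eight at least one of three attempts will still fall "in".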
If our limits are set correctly, they don’t generate excessive false rejection. It’s the high false rejection rate that leads to the practice of repeating the control. If we were only alerted when there was a real problem, we wouldn’t have the impulse to repeat the control. Instead, we would actually trouble-shoot the method.
On sheer statistical grounds, having control limits set at 2 SD generates a high false rejection rate (expect about 9% false rejections with 2 controls and 2 SD control limits). When false rejections reach that level of frequency, we stop listening to the system and we begin to treat every out-of-control flag as if it were a false alarm. This leads to the Cry Wolf problem: just when we need an effective QC system the most (during a true out-of-control situation), we’re likely to ignore it.
- Our PT/EQA result is compromised (part 1).
When we participate in a proficiency testing or external quality assessment program, we may receive results that indicate we have a significant bias from the mean of the group. However, if the sample is not commutable – if there is a matrix effect – we can always tell ourselves that the problem with the result doesn’t come from our laboratory, but from the matrix of the sample.
If our PT/EQA samples were commutable, we couldn’t use that excuse. A commutable PT specimen would mean that a problem with the specimen would also be reflected in a problem with the patient. So a PT failure would mean something was also happening to the patient specimens. Note that this is similar to the first corruption – when we use a non-commutable control for IQC.
- Our PT/EQA result is compromised (part 2).
If the PT/EQA mean is only “consensus-based”, then it’s possible that the entire group could be wrong. So we can tell ourselves that any bias we see might be because we’re right, but the group is wrong. Since many methods are neither standardized nor harmonized, there is no real agreement on what the “right” value should be.
If we were participating in an “accuracy-based” PT/EQA program, and our samples were traceable, we couldn’t use that excuse. A consensus-based PT/EQA program doesn’t provide us with a “true value” for the specimen. An accuracy-based PT/EQA program would provide a “true value” – possibly because the specimen would be assayed by a reference method and given a value assignment.
- Our mean is compromised.
If we have multiple instruments and we “adjust” them all onto the same mean, for simplicity’s sake, we sweep the bias between instruments under the rug. While at least one instrument is being monitored at its actual mean, the others are all biased to some extent from that mean. The problem is that the extent of that bias could be significant and it could be impacting test results in a significant way. But if we’re not monitoring it, we’re ignoring it.
If we calculate the actual mean of each instrument, we can assess the actual bias between instruments and make an assessment about the significance of that bias. There is a general consensus on how much bias is acceptable: between one-fourth and one-third of the allowable error (quality requirement).
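As a rough illustration of that check, the sketch below computes each instrument's own mean from control results and expresses its bias against a chosen reference instrument as a fraction of the allowable error. Every number here is hypothetical, including the 10% allowable total error, and the three-way verdict is only one way to read the one-fourth to one-third guideline.

```python
from statistics import mean

# Hypothetical control results (same control lot and level) on three analyzers.
results = {
    "analyzer_A": [100.2, 99.8, 100.5, 99.9, 100.1],
    "analyzer_B": [104.0, 104.3, 103.9, 104.2, 104.1],
    "analyzer_C": [102.9, 103.4, 103.1, 102.7, 103.3],
}
ALLOWABLE_TOTAL_ERROR_PCT = 10.0  # hypothetical quality requirement (TEa)


def bias_report(results, reference, tea_pct):
    """Return {instrument: (bias in %, verdict)} relative to `reference`."""
    ref_mean = mean(results[reference])
    report = {}
    for name, values in results.items():
        bias_pct = 100.0 * (mean(values) - ref_mean) / ref_mean
        fraction_of_tea = abs(bias_pct) / tea_pct
        if fraction_of_tea <= 1 / 4:
            verdict = "within one-fourth of TEa"
        elif fraction_of_tea <= 1 / 3:
            verdict = "between one-fourth and one-third of TEa"
        else:
            verdict = "exceeds one-third of TEa"
        report[name] = (round(bias_pct, 2), verdict)
    return report


report = bias_report(results, "analyzer_A", ALLOWABLE_TOTAL_ERROR_PCT)
for name, (bias_pct, verdict) in report.items():
    print(f"{name}: bias {bias_pct:+.2f}% -> {verdict}")
```

If all three analyzers were simply assigned analyzer_A's mean, none of these differences would appear on a control chart. Computing them explicitly is what makes the bias visible.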
This same compromise can occur across multiple laboratories, if there is significant bias between the labs. The drive to set the same mean for all the laboratories in a system is strong, so small biases can become invisible when we reassign the mean to instruments in order to keep them “the same.”
- Our “range” is compromised (part 1).
If we use the manufacturer’s suggested range, sometimes called bottle limits, our control limits are probably set too wide. The manufacturer range is not intended to be used to set the routine control limits for an individual laboratory – it’s meant to give an idea of a range within which you expect your mean to fall. The upside is that if we compromise our QC system this way, we may not have as many “repeat” problems with our control results (see the earlier compromise on repeated controls, #2). The downside is that we are probably not detecting medically important errors. Instead, those errors are getting into the test results and being passed out to the clinicians and patients.
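For a sense of scale, the following sketch compares a manufacturer's range with limits calculated from a laboratory's own results. The 20 control values and the bottle limits are invented for illustration, and the 3 SD multiplier is only a placeholder; the right control rule depends on a proper QC design.

```python
from statistics import mean, stdev

# Hypothetical data: 20 results for one control level, in the analyte's units.
control_results = [
    5.02, 4.95, 5.10, 4.98, 5.05, 4.91, 5.07, 5.00, 4.96, 5.12,
    4.89, 5.03, 5.01, 4.94, 5.08, 4.99, 5.06, 4.93, 5.04, 4.97,
]
# Hypothetical manufacturer ("bottle") range for the same control.
MANUFACTURER_RANGE = (4.20, 5.80)


def control_limits(values, k):
    """Return (low, high) limits at mean +/- k SD of the lab's own data."""
    m, sd = mean(values), stdev(values)
    return (m - k * sd, m + k * sd)


lab_mean = mean(control_results)
lab_sd = stdev(control_results)
lab_low, lab_high = control_limits(control_results, 3)

# How far the mean would have to move upward, in laboratory SDs, before a
# typical result even reaches the upper bottle limit.
shift_to_upper_sd = (MANUFACTURER_RANGE[1] - lab_mean) / lab_sd

print(f"Lab mean {lab_mean:.3f}, SD {lab_sd:.3f}")
print(f"Lab 3 SD limits: {lab_low:.2f} to {lab_high:.2f}")
print(f"Bottle limits:   {MANUFACTURER_RANGE[0]:.2f} to {MANUFACTURER_RANGE[1]:.2f}")
print(f"Shift needed to reach upper bottle limit: {shift_to_upper_sd:.1f} SD")
```

With data like these, the bottle limits sit far outside the laboratory's own 3 SD limits, so a shift of many SDs could occur before the control is ever flagged.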
If we established our own mean and standard deviation, then followed up with appropriate QC design, we would minimize our false rejection while maximizing our error detection. That is, we could optimize the system so that when an out-of-control flag occurs, we will act as if it was a real problem and trouble-shoot appropriately. Ideally, this type of QC design also minimizes the number of error signals that occur.
- Our “range” is compromised (part 2).
When we have reagent lots with significant differences, the range of the test will change. A new lot of reagent may shift all values up or down. So an out-of-control flag can be blamed on the reagent, rather than a true problem with the method. Note that we may not have this problem if we suffer from the earlier compromise (#6, using manufacturer ranges). Indeed, the manufacturer’s range may help us “avoid” problems caused by shifts between reagent lots. So rather than adjust the mean and range constantly, labs might succumb to the temptation to adopt either the manufacturer’s range or find another rationalization to widen the range so that the differences between lots do not cause additional flags.
If we had reagents that had minimal between-lot differences, this problem would not afflict us. Unfortunately, a recent article in Clinical Chemistry noted that there are significant biases between lots even for well-established tests. Here is a case where the laboratory is really at the mercy of the manufacturer. If the manufacturer cuts corners on reagent production – and allows significant biases to ship out the door – the laboratory has little in the way of recourse. The best solution is to demand better lot-to-lot performance from the manufacturer (and do it frequently and consistently, so that the diagnostic market as a whole gets the message).
This list is certainly not comprehensive. I’m sure you can think of other areas of laboratory testing where quality might become compromised. But the net effect of all these “standard deviations” from good laboratory practice is that our confidence in our own QC system is corroded and corrupted. Our ability to deliver good patient care is compromised. If we accept all of these compromises, what we’re doing isn’t really QC anymore. It’s just a compliance exercise. Call it “QC Theater” – an act that looks like QC but is really only a work of fiction.
If our laboratory suffers from some or most of the above compromises, there is a good chance that our QC results are not providing useful indications of the real state of the method. We’ve got multiple reasons why a control could be “out” while the method is actually performing normally and acceptably. Worse still, there’s a very real possibility that the controls could behave “normally” while significant problems could be occurring on real patient specimens – but we’ve compromised our QC procedure and system so that they can no longer detect medically important errors.
There’s a common thread to the compromises in our QC. In a time-pressed, cost-sensitive environment, the temptation to do QC “on the cheap” is strong. The cost of QC is something you experience every day – while the failure cost of poor QC practices may only occur infrequently, and the laboratory may not experience the effects of failure (the clinician and patient bear the brunt). It’s human nature to get complacent when you don’t experience errors frequently, particularly when budgets are tight, staff resources are strained, and the overwhelming message from the top is, “Get us our numbers NOW!”
How did all of these problems accumulate? It’s part of an overall “drift” of QC practices. Sidney Dekker, an authority on patient safety and safety culture in general, notes that:
“Drift is generated by normal processes of reconciling differential pressures on an organization (efficiency, capacity utilization, safety) against a background of uncertain technology and imperfect knowledge. Drift is about decrementalism contributing to extraordinary events, about the transformation of pressures of scarcity and competition into organizational mandates, and about the normalization of signals of danger so that organizational goals and supposedly normal assessments and decisions become aligned. In safe systems, the very processes that normally guarantee safety and generate organizational success, can also be responsible for organizational demise. The same complex, intertwined socio-technical life that surrounds the operation of successful technology, is to a large extent responsible for its potential failure.”
[Sidney Dekker, Drift Into Failure, Ashgate Press, Surrey, UK, 2011, page 121.]
In other words, after decades of cost and staffing pressure on the laboratory, we have gradually compromised our QC practices so that they align more closely with the production goals - to generate test results faster and cheaper. Sometimes we make the decision consciously, as when we choose cheaper PT programs or controls, even when we know the samples are not commutable. Other times, we make the decision unconsciously, as when we make instrument decisions without making assay performance a major factor in the purchasing criteria. When we just assume that all instruments have high quality, what we actually are doing is sending the message that quality isn't important to us. We’re creating a QC system that’s going to fool us.
Compromise between quality and production goals is unavoidable. There will always be a give and take between what’s cheaper and what’s safer. But a laboratory needs to try to balance the two priorities. By erring on the side of production efficiency in every case (compromising the SD, the control limits and ranges, the PT/EQA program, and the control material), a laboratory can build a completely compliant and wholly ineffective quality system. Undoubtedly, some compromises are inevitable - but the laboratory needs to make sure not every part of the QC system errs on the side of efficiency.
Reviewing all these possible problems can be discouraging. Indeed, you can understand some of the motivation behind laboratories and professionals who are in favor of reducing QC frequency. If QC is so corrupted, why not reduce its frequency and impact on the laboratory? If it’s so bad, why not get rid of it completely and come up with some alternative instead? This would be even more tempting if a viable alternative actually existed (so far, however, AQC and EQC and even the proposed qualitative Risk Assessment methodology do not provide equivalent quality, despite all the promises and hype).
We can’t afford to give up on QC, any more than we can give up on seat belts and air bags and other critical elements of automobile safety in the face of chronic accidents on our streets and highways. Instead, we’ve got to fix our QC, refusing to compromise on quality when practical, and optimizing and improving our practices when possible. We will still have to compromise sometimes, but we’ve got to make sure that we’re not compromising every element of safety in the pursuit of cheap numbers.