Questions
FAQs of QC Validator 2.0
Some questions on the use of QC Validator - our old QC Design software. The concepts and features carry over to our current QC Design software, EZ Rules 3.
This question comes from the State Laboratory of Hygiene, Madison, WI:
A new user of the QC Validator program poses a series of questions about the input parameters and what they mean. We commonly hear these same questions from many new users, so we will try to provide answers to several of them this month. If these answers are too brief, we can provide more detailed answers later on. You may also find other materials on this website that help answer some of these same questions.
Decision levels
Imprecision and Inaccuracy
- Should the stable imprecision observed be matched to the decision level defined or should you use the average of the imprecision observed for the different control levels?
- Should the stable inaccuracy observed be matched to the decision level defined or should you use the average over the analytical range of interest?
Instability & the Frequency of Errors
Quality Requirements
Power Function Graphs, Critical-Error Graphs, and OPSpecs Charts
- Which graph is actually used to choose the control rule(s) and number of control measurements (N)?
- Power function graphs?
- Critical-Error graphs?
- OPSpecs charts?
Choosing a QC control rule
How do you choose a decision level?
"Decision level", as used in the input parameters screen of QC Validator, refers to a critical concentration at which medical decisions will be made.
Some examples are:
- glucose concentration of 110 mg/dL which is the upper limit of the reference range and, if exceeded, often triggers additional testing;
- cholesterol concentration of 200 mg/dL which is recommended by the National Cholesterol Education Program (NCEP) as a concentration below which a patient test value represents a state of health;
- cholesterol concentration of 240 mg/dL, which is recommended by NCEP as a concentration above which follow up testing should be performed;
- hemoglobin concentration of 9 g/dL which represents a critical lower concentration for test interpretation;
- digoxin concentration of 2.0 ng/mL, which represents the upper limit of the therapeutic range.
Choosing a decision level is important to focus your thinking on method performance characteristics (observed precision and bias) and the quality requirement (analytical total error or clinical decision interval) that are appropriate for the application and interpretation of the test. These other parameters are critical to the selection of appropriate QC procedures.
What do you do if there is more than one decision level?
There often are two or three decision levels for a test because of different medical applications, interpretations, or classifications. Multiple levels may also be needed because decision levels can change with the sex and age of the patient, depend on gestational age, etc.
We can initially look at the decision level which is most critical for the medical use of the test or the available performance of the test. For example, for glucose where decision levels might be at 50 mg/dL, 110 mg/dL, and 300 mg/dL, the lowest level may be most demanding in terms of the performance of the method. If we can design an appropriate QC procedure based on the quality required and performance observed at this level, the QC procedure selected will most likely be satisfactory at other decision levels. If there is any doubt, we can make further assessments for each of the other decision levels.
For methods where performance often changes significantly from low to high concentrations, such as immunoassay methods, you may want to routinely make assessments at the low and high decision levels, and then select the QC procedure based on the most demanding decision level.
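One way to make "most demanding decision level" concrete is to compare the critical systematic error at each level, using the commonly cited formula ΔSEcrit = [(TEa - |bias|)/s] - 1.65: the smaller the value, the harder the level is to control. The sketch below assumes that formula, a percentage TEa of 10%, and entirely hypothetical glucose bias and CV values; it illustrates the reasoning and is not how QC Validator itself is implemented.

```python
# Hypothetical illustration: rank decision levels by how demanding they are.
# Assumes the commonly cited critical-error formula
#   dSE_crit = (TEa - |bias|) / CV - 1.65
# with TEa, bias and CV all expressed in percent at each decision level.


def critical_systematic_error(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Medically important shift, in multiples of the method SD."""
    return (tea_pct - abs(bias_pct)) / cv_pct - 1.65


def most_demanding_level(levels: dict[float, tuple[float, float]], tea_pct: float) -> float:
    """Return the decision level with the smallest critical error.

    `levels` maps decision level -> (bias %, CV %); the values are hypothetical.
    """
    return min(levels, key=lambda xc: critical_systematic_error(tea_pct, *levels[xc]))


# Hypothetical glucose performance at three decision levels (mg/dL).
glucose = {50.0: (1.0, 4.0), 110.0: (1.0, 2.5), 300.0: (1.5, 2.0)}
for xc, (bias, cv) in glucose.items():
    print(xc, round(critical_systematic_error(10.0, bias, cv), 2))
print("most demanding:", most_demanding_level(glucose, tea_pct=10.0))
```

With these made-up numbers the higher CV at 50 mg/dL dominates, so the low level is the one to design for first.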
Should the stable imprecision observed be matched to the decision level defined or should you use the average of the imprecision observed for the different control levels?
For methods where the CV is observed to be approximately constant for the different control levels, you could use the average CV. For methods where the CVs are considerably different for the different control levels, it would be better to use the CV for the control that is closest to the critical decision level.
This requires some judgment on your part for the tests of interest and the measurement systems in your laboratory. For many of the highly automated systems in chemistry and hematology, you can expect the average CV to be adequate. For manual methods and earlier generations of automated systems, you may find that the CVs vary considerably from level to level, so you will have to carefully determine the best estimates of method performance at the concentrations that are most important for the medical use of the test.
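The judgment described above can be sketched as a small helper. The 20% tolerance used to decide whether CVs are "approximately constant" is a hypothetical choice made for illustration, not a rule from QC Validator, and the control data are invented.

```python
# Hypothetical helper: pick the CV estimate to enter for a decision level.
# The 20% "approximately constant" tolerance is an illustrative assumption.


def cv_for_decision_level(
    control_cvs: dict[float, float],  # control mean -> observed CV (%)
    decision_level: float,
    tolerance: float = 0.20,
) -> float:
    cvs = list(control_cvs.values())
    average = sum(cvs) / len(cvs)
    # Roughly constant CVs: every level within +/- tolerance of the average.
    if all(abs(cv - average) <= tolerance * average for cv in cvs):
        return average
    # Otherwise use the CV of the control closest to the decision level.
    closest_mean = min(control_cvs, key=lambda mean: abs(mean - decision_level))
    return control_cvs[closest_mean]


automated = {100.0: 2.0, 250.0: 2.2}              # hypothetical chemistry controls
immunoassay = {0.5: 12.0, 5.0: 5.0, 20.0: 4.0}    # hypothetical immunoassay controls
print(cv_for_decision_level(automated, 110.0))
print(cv_for_decision_level(immunoassay, 2.0))
```

The automated example returns the average CV, while the immunoassay example falls back to the low control, whose CV is much worse.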
Should the stable inaccuracy observed be matched to the decision level defined or should you use the average over the analytical range of interest?
If bias is constant, i.e., the systematic error is constant over the analytical range, then one estimate can be used for any decision level. If bias is not constant, i.e., a proportional systematic error exists, then it is best to estimate the systematic error at the decision level of interest.
Estimates of bias can be made from the linear regression statistics calculated for comparison-of-methods data. Constant error is indicated by a non-zero y-intercept. Proportional error is indicated by a deviation of the slope from the ideal value of one. The overall bias or systematic error at a decision level is the difference between the value expected from the method (Yc) and the defined decision level (Xc), which can be calculated as follows:
SE = Yc - Xc = a + bXc - Xc
where a is the y-intercept and b is the slope of the regression line. Estimates of systematic error may also be obtained from proficiency testing surveys. You may use the average bias across the different PT samples, or the bias of the samples near the decision level of interest. Remember that there may be matrix effects from PT materials, so the differences observed may be due to the material or the method.
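The regression formula SE = a + bXc - Xc can be evaluated at each decision level as in the sketch below. The intercept and slope are hypothetical values for a cholesterol comparison study, chosen only to show how a slope different from one makes the bias change with concentration.

```python
# Systematic error at a decision level from comparison-of-methods regression:
#   SE = Yc - Xc = a + b*Xc - Xc
# The regression statistics below are hypothetical example values.


def systematic_error(intercept: float, slope: float, xc: float) -> float:
    yc = intercept + slope * xc  # value the test method is expected to give at Xc
    return yc - xc


a, b = 2.0, 0.97  # hypothetical y-intercept (mg/dL) and slope
for xc in (200.0, 240.0):
    se = systematic_error(a, b, xc)
    print(f"Xc={xc:.0f} mg/dL  SE={se:+.1f} mg/dL ({100 * se / xc:+.1f}%)")
```

Because the slope is not one, the estimated bias differs between 200 and 240 mg/dL, which is why it is best estimated at the decision level of interest.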
How is the frequency of errors determined?
Few laboratories actually have quantitative estimates of the frequency of errors; initially, therefore, you will have to make some judgment about method stability. In version 2.0 of the QC Validator program, the frequency of errors is entered with a drop-down list that has four possible settings: off, low, moderate, and high (more than 10%). You might determine the appropriate setting from experience with the method or from knowledge of the method's expected susceptibility to problems. From experience, bench-level analysts generally know which methods have a lot of problems and which ones seldom do. With this information, you can generally classify methods as good (set as low), bad (set as high), and in-between (set as moderate). Considering susceptibility to problems, manual methods would be expected to be more susceptible than automated methods; therefore, set manual methods as high. Highly automated 3rd and 4th generation chemistry analyzers would be expected to be more stable than 1st and 2nd generation immunoassay analyzers; therefore, set later-generation automated systems as low and earlier generations as moderate. Once QC procedures with high error detection and low false rejection are in place, the frequency of problems can be estimated from the proportion of runs that are out-of-control.
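The last point, estimating the frequency of problems from the proportion of out-of-control runs, amounts to a simple ratio. The sketch below assumes a run log recorded as in-control/out-of-control flags; the 50-run history is hypothetical. The estimate is only reasonable when false rejections are rare, since every rejected run is counted as a real problem.

```python
# Rough estimate of the frequency of errors from run history:
# the proportion of runs judged out-of-control. Only meaningful when the
# QC procedure has high error detection and low false rejection.


def error_frequency(run_in_control: list[bool]) -> float:
    """Fraction of runs rejected (out-of-control)."""
    if not run_in_control:
        raise ValueError("need at least one run")
    rejected = sum(1 for ok in run_in_control if not ok)
    return rejected / len(run_in_control)


history = [True] * 47 + [False] * 3  # 50 hypothetical runs, 3 rejected
print(f"estimated frequency of errors: {error_frequency(history):.0%}")  # 6%
```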
Is it necessary to run QC Validator for both the analytical and clinical quality requirements?
Two forms of quality requirements may be entered into the program and either can be used for selecting an appropriate QC procedure. If both are available, it is wise to compare the results and satisfy the more demanding requirement.
The analytical quality requirement must be defined in the format of an allowable total error, which is often provided via the criteria for acceptability in proficiency testing, such as the CLIA criteria for acceptability in the USA. For example, the CLIA PT criterion for acceptability for cholesterol is 10%, which means that the laboratory must get a test result that agrees within 10% of the target value (TV). Many other countries have similar proficiency testing or external quality assessment surveys that also have defined quality requirements for a wide variety of tests. Whenever a test is subject to proficiency testing, you would be well advised to assess the implications of this quality requirement on the performance of your method and the QC that is needed.
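A percentage PT criterion such as the CLIA cholesterol criterion can be checked as shown here. The survey results are hypothetical, and the check covers only a plain percentage-of-target criterion; criteria stated in concentration units or as "whichever is greater" would need an extra term.

```python
# Check a proficiency-testing result against a percentage criterion such as
# the CLIA cholesterol criterion of 10% of the target value (TV).
# The survey results below are hypothetical.


def pt_acceptable(result: float, target: float, allowable_pct: float) -> bool:
    return abs(result - target) <= allowable_pct / 100 * abs(target)


print(pt_acceptable(213.0, 200.0, 10.0))  # True: 6.5% from target
print(pt_acceptable(223.0, 200.0, 10.0))  # False: 11.5% from target
```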
The clinical quality requirement is defined in the format of a decision interval, or medically important change. This information can be obtained more directly from the clinical use and application of a test. For example, for cholesterol, USA national guidelines from NCEP recommend that a cholesterol test value below 200 mg/dL be considered healthy and that a test value above 240 mg/dL be followed up with additional testing to determine appropriate clinical action. This "gray zone" between 200 and 240 mg/dL represents a medically important change because it will change the classification of a patient. The laboratory objective should be to avoid misclassification due to known variables, such as the analytical imprecision and inaccuracy of the method, the sensitivity of the QC procedure, and the preanalytical biological variation. Because all these variables need to be considered, the clinical quality-planning model is more complicated and requires additional inputs, such as the within-subject biological variation, which is particularly important in most applications (6.5% in the case of cholesterol).
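As simple arithmetic on the NCEP example, the gray zone can be expressed as an absolute and a relative decision interval. Expressing it as a percentage of the lower level (200 mg/dL) is an assumption made here for illustration. The full clinical quality-planning model also accounts for within-subject biological variation and other inputs that are not shown.

```python
# Width of the cholesterol "gray zone" between two NCEP decision levels.
# Expressing it as a percentage of the lower level is an illustrative choice.


def decision_interval(lower: float, upper: float) -> tuple[float, float]:
    width = upper - lower
    return width, 100 * width / lower


width, pct = decision_interval(200.0, 240.0)
print(f"{width:.0f} mg/dL, {pct:.0f}% of 200 mg/dL")  # 40 mg/dL, 20% of 200 mg/dL
```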
In situations where PT criteria are readily available, they provide a good starting point for designing QC procedures. There are not many published recommendations on clinical criteria in the format of medically important changes or decision intervals, but you should be able to make good judgments on the basis of your knowledge of the application and interpretation of the test. You will be aided by consultation with physicians who are knowledgeable about the critical applications of the tests and by the critical pathways or standard medical processes that are being developed in many healthcare institutions.
Which graph is actually used to choose the control rule(s) and number of control measurements (N)?
Three types of graphs are provided by the QC Validator program - power function graphs, critical-error graphs, and OPSpecs charts.
Power function graphs can be used to compare the general performance of different QC procedures, but they don't take into account the quality required or the method performance observed. They will show you the relative performance of different QC procedures and help you understand which have the highest or lowest false rejection and error detection capabilities. However, they don't tell you what specific control rules and numbers of control measurements (N) are sufficient and appropriate for an individual test.
Critical-error graphs can be used to select specific control rules and Ns. They are power function graphs that also show the size of errors that are medically important, as calculated from the quality requirement and the observed imprecision and inaccuracy of the method. They allow you to assess the probabilities of rejecting both a good run that has no errors (except for the inherent imprecision of the method) and a bad run that has medically important errors. The y-intercept of the power curve tells you the probability for false rejection (Pfr); the intersection of the power curve and the vertical line representing the medically important error shows you the probability for error detection (Ped). The objective in selecting a QC procedure is to find the simplest control rules and lowest N that give a Ped of 0.90 or more (90% error detection) with a Pfr of 0.05 or less (5% or fewer false rejections). [See the potassium QC planning application that illustrates the use of critical-error graphs]
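The two readings from a critical-error graph can be computed directly for simple single-rule procedures (1_ks with N controls). This is a sketch under stated assumptions, not QC Validator's own calculation: control results are Gaussian and independent, the systematic shift affects every control equally, the critical error uses the commonly cited formula ΔSEcrit = [(TEa - |bias|)/s] - 1.65, and the method figures (TEa 10%, bias 1%, CV 2%) are hypothetical.

```python
# Pfr (y-intercept) and Ped (value at the critical error) for 1_ks rules.
# Assumptions: Gaussian, independent control results; equal shift on all controls.
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_reject(k: float, n: int, shift: float) -> float:
    """Probability that at least one of n controls falls outside +/- k SD."""
    p_inside = phi(k - shift) - phi(-k - shift)
    return 1.0 - p_inside ** n


def critical_systematic_error(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    return (tea_pct - abs(bias_pct)) / cv_pct - 1.65


# Hypothetical method: TEa 10%, bias 1%, CV 2%.
se_crit = critical_systematic_error(10.0, 1.0, 2.0)
for k, n in [(3.0, 2), (2.5, 2), (2.5, 4)]:
    pfr = p_reject(k, n, 0.0)       # y-intercept of the power curve
    ped = p_reject(k, n, se_crit)   # power at the critical error
    ok = ped >= 0.90 and pfr <= 0.05
    print(f"1_{k:g}s N={n}: Pfr={pfr:.3f} Ped={ped:.2f} meets goal: {ok}")
```

Multirule procedures do not have a closed form this simple; their power curves are usually obtained by simulation.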
OPSpecs charts can also be used to select specific rules and Ns. For a defined quality requirement and a specified level of error detection (or % AQA, Analytical Quality Assurance), they show the inaccuracy that is allowable (on the y-axis) and the imprecision that is allowable (on the x-axis) for each QC procedure included. The analytical performance of a method can be considered by plotting the method's operating point, whose y-coordinate corresponds to the observed bias and x-coordinate corresponds to the observed imprecision. Any QC procedure whose operating line (allowable inaccuracy and imprecision) is above a method's operating point provides the error detection defined by the chart and the false rejection shown in the legend of the chart. [See our cholesterol QC planning application that illustrates the use of OPSpecs charts]
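An operating line can be sketched from the simplified relation TEa = bias + (ΔSEcont + 1.65)s, where ΔSEcont is the shift a QC procedure detects with the chosen %AQA. This relation, the single-rule 1_ks power model, and the numbers below are assumptions for illustration only. The code finds ΔSEcont by bisection on the power curve, then checks whether the operating line lies above a hypothetical operating point (CV 2%, bias 1%, TEa 10%).

```python
# Sketch of an OPSpecs operating line, assuming
#   TEa = bias + (dSE_cont + 1.65) * CV
# with dSE_cont the shift detected with probability `aqa` by a 1_ks rule.
import math


def phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_reject(k: float, n: int, shift: float) -> float:
    return 1.0 - (phi(k - shift) - phi(-k - shift)) ** n


def detectable_shift(k: float, n: int, aqa: float) -> float:
    """Smallest shift (SD units) detected with probability `aqa`, by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if p_reject(k, n, mid) < aqa:
            lo = mid
        else:
            hi = mid
    return hi


def allowable_bias(tea_pct: float, cv_pct: float, k: float, n: int, aqa: float) -> float:
    """Point on the operating line: allowable inaccuracy at a given imprecision."""
    return tea_pct - (detectable_shift(k, n, aqa) + 1.65) * cv_pct


def operating_point_ok(tea_pct: float, bias_pct: float, cv_pct: float,
                       k: float, n: int, aqa: float = 0.90) -> bool:
    """True if the operating line lies above the method's operating point."""
    return abs(bias_pct) <= allowable_bias(tea_pct, cv_pct, k, n, aqa)


for k, n in [(3.0, 2), (2.5, 4)]:
    print(f"1_{k:g}s N={n}:", operating_point_ok(10.0, 1.0, 2.0, k, n))
```

Lowering `aqa` to 0.50 or 0.25 raises every operating line, which is the effect of moving to the 50% and 25% AQA charts.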
The QC Validator program actually provides three OPSpecs charts with different %AQA levels of 90%, 50%, and 25%. In general, you should start with the 90% AQA chart because the first objective is to see if 90% error detection can be achieved, in which case you don't have to worry about the frequency of problems with the method because essentially all problems that occur will be detected. There may be situations with stable methods where less than 90% AQA will be considered satisfactory, in which case the OPSpecs charts for 50% and/or 25% AQA can be used to select control rules and Ns.
Which graph to use?
Most analysts will initially find that critical-error graphs are easier to understand than OPSpecs charts; therefore, they will depend on critical-error graphs in their early work and applications. They often move back and forth between critical-error and OPSpecs displays to compare their selections and gain more understanding and confidence in the OPSpecs chart. With this experience, they gradually learn to trust the OPSpecs chart and, at some point, they start depending on it as the primary graphical tool because of its simplicity of use for both manual and automated selection of QC procedures.
What if you can't achieve 90% error detection with <5% false rejections?
First, be sure you have explored the range of performance available from different QC procedures. The error detection and false rejection available will depend on the specific control rules and number of control measurements. Try single rules with narrower control limits to increase the error detection; unfortunately, false rejections will also go up, but not too much for rules whose limits are 2.5s and wider. Add rules together to form a multirule procedure to increase error detection, but also expect some increase in false rejections. Increase N to increase error detection, which will also cause some increase in false rejections. Consider whether you could tolerate a somewhat higher false rejection rate, say up to 7-8%, which would allow you to use multirule procedures with Ns up to 6. These higher-N multirule procedures potentially offer much better error detection at some increase in false rejection costs, which may not be prohibitive given the costs of other strategies for managing the quality of these methods.
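The trade-offs described above can be tabulated for single-rule 1_ks procedures. The sketch assumes Gaussian, independent control results and a hypothetical critical shift of 2.0 SD; it only shows the direction of change for single rules, since multirule procedures need their own power curves.

```python
# How control limits (k) and N move Pfr and Ped for single-rule 1_ks procedures.
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_reject(k: float, n: int, shift: float) -> float:
    """P(at least one of n controls outside +/- k SD) for a shift in SD units."""
    return 1.0 - (phi(k - shift) - phi(-k - shift)) ** n


CRITICAL_SHIFT = 2.0  # hypothetical critical systematic error, in SD units

for k in (3.0, 2.5):
    for n in (1, 2, 4, 6):
        pfr = p_reject(k, n, 0.0)
        ped = p_reject(k, n, CRITICAL_SHIFT)
        print(f"1_{k:g}s N={n}: Pfr={pfr:.3f}  Ped={ped:.2f}")
```

Narrowing the limits from 3s to 2.5s and raising N both increase Ped, while Pfr for the 2.5s rule stays in the range of a few percent up to N=6.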
Second, consider whether you can improve the performance of the method by reducing the observed CV and reducing the observed bias. The effects of such improvements can be readily assessed from OPSpecs charts by plotting a new operating point that corresponds to the improved CV and/or bias. Imprecision can always be improved by performing replicate measurements, though at some increase in cost. The effect of the number of replicates on QC design can be assessed directly by entry of this parameter in the QC Validator 2.0 program.
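The effect of replicates on imprecision, and through it on the critical error, can be estimated quickly. This sketch assumes replicate errors are independent, so the CV of the mean falls by the square root of the number of replicates; between-run components do not average out this way, so treat the result as optimistic. The critical-error formula is the commonly cited ΔSEcrit = [(TEa - |bias|)/s] - 1.65, and the method figures are hypothetical.

```python
# Effect of replicate measurements on CV and on the critical systematic error.
import math


def cv_of_mean(cv_pct: float, replicates: int) -> float:
    """CV of the mean of independent replicates."""
    return cv_pct / math.sqrt(replicates)


def critical_systematic_error(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    return (tea_pct - abs(bias_pct)) / cv_pct - 1.65


tea, bias, cv = 10.0, 2.0, 3.0  # hypothetical TEa, bias, CV (%)
for n in (1, 2, 3):
    cv_n = cv_of_mean(cv, n)
    se_crit = critical_systematic_error(tea, bias, cv_n)
    print(f"{n} replicate(s): CV={cv_n:.2f}%  dSEcrit={se_crit:.2f}")
```

A larger critical error means the medically important shift is easier to detect, so simpler rules or smaller Ns may become adequate.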
Third, consider using multirule designs that operate across runs. If you can't get the desired error detection in the first run, you can look back at previous control measurements to increase the effective N and catch a problem as soon as possible.
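A rule that looks back across runs can be sketched with the 10x rule (ten consecutive control results on the same side of the mean), evaluated on results pooled from the current and previous runs. Expressing controls as z-scores and using N=2 per run are assumptions for this example, and the values are hypothetical.

```python
# 10x rule across runs: reject if the last n control results (pooled from the
# current and previous runs, as z-scores) all fall on the same side of the mean.


def violates_nx(z_scores: list[float], n: int = 10) -> bool:
    recent = z_scores[-n:]
    if len(recent) < n:
        return False  # not enough history yet
    return all(z > 0 for z in recent) or all(z < 0 for z in recent)


history = [0.4, 1.1, 0.7, 0.9, 1.3, 0.2, 0.8, 0.5]  # last four runs, N=2 each
current = [0.6, 1.2]                                # current run
print(violates_nx(history + current))  # True: a small shift persisted across runs
print(violates_nx(current))            # False: one run alone is too short
```

None of these values would trigger a 1_3s rule, yet the pooled history reveals a persistent small shift.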
Fourth, consider multistage QC designs. These are QC designs where you have one set of rules and N for high error detection and another set of rules and N for low false rejection. Use the high error detection design whenever the system is susceptible to problems, such as the initial startup at the beginning of a day or shift, after a change in reagent lot, after calibration, etc. Use the low false rejection design for monitoring during periods when performance is expected to be stable.
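The switching logic of a multistage design can be sketched as a simple lookup. The event labels and the two designs (a 1_3s/2_2s/R_4s/4_1s multirule with N=4 for startup, a 1_3.5s rule with N=2 for monitoring) are hypothetical placeholders; the actual rules and Ns should come from your own critical-error or OPSpecs assessment.

```python
# Hypothetical multistage QC selection: a high-error-detection design after
# events that make the system susceptible to problems, and a low-false-rejection
# design for routine monitoring. Rules, Ns and event labels are examples only.
from dataclasses import dataclass


@dataclass(frozen=True)
class QCDesign:
    rules: str
    n: int


STARTUP_DESIGN = QCDesign(rules="1_3s/2_2s/R_4s/4_1s", n=4)  # high Ped (example)
MONITOR_DESIGN = QCDesign(rules="1_3.5s", n=2)               # low Pfr (example)

SUSCEPTIBLE_EVENTS = {"startup", "shift_start", "reagent_lot_change", "calibration"}


def select_design(event: str) -> QCDesign:
    return STARTUP_DESIGN if event in SUSCEPTIBLE_EVENTS else MONITOR_DESIGN


print(select_design("calibration"))
print(select_design("routine"))
```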
Fifth, consider whether the stability of the method is so good that lower error detection is satisfactory. With very stable methods that seldom have problems, the biggest concern is the waste of time, effort, and money due to repeat analyses, thus the false rejection rate must be kept low but the error detection rate could be moderate, say 50%. With extremely stable methods, even 25% error detection may be satisfactory.
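A rough way to see why stable methods can accept lower error detection is to approximate the fraction of runs carrying an undetected medically important error as f × (1 - Ped), where f is the frequency of errors. This approximation and the numbers are assumptions for illustration; fuller quality-planning models weigh more factors.

```python
# Approximate fraction of runs with an undetected medically important error,
# assuming it is simply frequency * (1 - Ped). Illustrative numbers only.


def undetected_fraction(frequency: float, ped: float) -> float:
    return frequency * (1.0 - ped)


for f, ped in [(0.10, 0.90), (0.02, 0.50), (0.005, 0.25)]:
    print(f"f={f:.1%} Ped={ped:.0%}: {undetected_fraction(f, ped):.2%} of runs")
```

Under this approximation, a method with 2% error frequency and 50% detection lets through about as many bad runs as one with 10% frequency and 90% detection, while avoiding the extra false rejections.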
Finally, remember that statistical QC is only one part of your overall or Total QC strategy. If you can't readily detect problems when they occur, then you need to prevent them from occurring by thorough operator training, preventive maintenance, instrument function checks, and method performance tests. You can also review and inspect patient test results to be sure they agree with the patient's diagnosis and condition. All of these efforts take additional time and resources, which is the price you pay for methods that cannot be adequately monitored by statistical QC.