KiwiQC: a mean and variance quality control scheme for multiple correlated levels of replicated control samples.

John H. Livesey

E-mail john.livesey@cdhb.govt.nz

Critique of current quality control in the clinical chemistry laboratory

Statistical quality control was developed for manufacturing industry by Walter Shewhart and was based on the mean and variance (or range) of a measurement performed on several different items of the product (1). Levey and Jennings suggested that these procedures be applied unchanged in the clinical laboratory by running two replicates of a single control sample in each analytical run and using the mean and range for control purposes (2).

For many analytes, though, it is desirable to run two or more levels (different concentrations) of QC samples. Faced with the problem of interpreting such multi-level QC data, Westgard and co-workers (3) proposed their now widely used multi-rules based on charting individual values. Presumably this individual-value approach was chosen because, after the preliminary set-up, no calculations were required to apply the rules. Now that digital computers are almost universally available, however, it is more efficient to base QC procedures on the most powerful and selective statistical algorithms available.

While there are situations where the use of control charts for individual measurements is appropriate (4, p249), it has long been known that in most cases the selection of "rational subgroups" of the stream of product will facilitate the discovery of "assignable causes" for deviations from control, that the most effective size of the subgroup is typically about four rather than one and that the mean and variance respectively are the best estimators of a change in the location or dispersion of the measurements (1). Linnet has more recently emphasised this for clinical chemistry by showing that by comparison with individual-value rules, rules based on the mean and variance of several control values are always more powerful for detecting a shift of location, are usually more powerful for detecting an increase in variance, are more selective between a shift in location and an increase in variance, and are more robust towards deviations from normality (5).

Relatively little attention though appears to have been given to several factors which complicate the practical application of quality control rules in clinical chemistry. Firstly, more than one level of QC is usually analysed per batch (note 1). Consequently multivariate procedures should be used to avoid the increase in type I error (the false detection of an out-of-control condition) resulting from the independent testing of several variables (4, p507).
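
The inflation of type I error from testing several levels separately can be illustrated with a minimal sketch. It assumes that each level is tested independently at the same per-test probability (0.0027 is used here, the nominal figure for 3 SD limits on a Gaussian variable) and that the tests themselves are statistically independent; with correlated levels the combined probability will differ. The function name is hypothetical.

```python
def familywise_false_alarm(alpha: float, n_tests: int) -> float:
    """Probability of at least one false alarm among n_tests independent
    tests, each with type I error probability alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Assumed per-level type I error: nominal 3 SD limits on a Gaussian variable.
for levels in (1, 2, 3):
    print(levels, round(familywise_false_alarm(0.0027, levels), 5))
```

Under these assumptions the false alarm probability per batch grows almost in proportion to the number of levels tested.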

Secondly, where more than one level of QC is run, the measurements made on the different levels are usually correlated across levels because at least part of the between-batch variance is common to all levels (6). Again, multivariate rather than univariate statistical procedures ought to be used so as to adjust statistical significance levels appropriately (4, p507).

Thirdly, QC specimens may be replicated within a batch (to enable detection of within-batch drift and estimation of within-batch variability).

Fourthly, when the number of replicates of a QC sample can vary from batch to batch, both the within-batch and the between-batch components of variance must be separately estimated. Also the number of replicates can differ between QC levels.

Fifthly, analytical processes are frequently non-ergodic (7); that is to say the QC values from successive batches are autocorrelated.

Sixthly, especially for low volume tests, it is often necessary to start using a quality control pool with a rather small number of preliminary measurements from which to calculate the target mean and SD. In this case the actual probability of a type I error is greater than the nominal value. For example, for an in-control process running a single QC specimen, if the mean and standard deviation are calculated from 5, 10, 20 or 50 preliminary values with a Gaussian ('normal') distribution, the probability of a future QC value differing from the calculated mean by more than three times the calculated SD is approximately 0.052, 0.019, 0.0086 or 0.0046 respectively per batch instead of the nominal 0.0027.
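
One way to arrive at figures of this kind is sketched below. It assumes that the n preliminary values and the future value are independent draws from a single Gaussian distribution, in which case the future deviation divided by s·sqrt(1 + 1/n) follows Student's t with n - 1 degrees of freedom. This is a standard result and is offered as an illustration only; it is not necessarily the calculation used to obtain the values quoted above. The function names are hypothetical and the tail area is found by simple numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom.
    (math.gamma overflows for very large df; fine for small samples.)"""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(|T| > x), using Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < x)
    return 1.0 - 2.0 * central


def false_alarm_prob(n_prelim: int, k: float = 3.0) -> float:
    """P(|future value - mean| > k * SD) when the mean and SD are
    estimated from n_prelim Gaussian preliminary values."""
    return t_two_sided_tail(k / math.sqrt(1 + 1 / n_prelim), n_prelim - 1)


for n in (5, 10, 20, 50):
    print(n, round(false_alarm_prob(n), 4))
```

Under the stated assumptions the printed probabilities should come out close to the figures quoted above, falling towards the nominal 0.0027 as the number of preliminary values increases.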

In order to maximise laboratory efficiency and effectiveness it is desirable that the probability of type I error is fixed at a low level independent of the above six factors. Consequently it has been proposed that multivariate methods based on a χ2 chart (8) or Hotelling's T2 (9) be used in clinical chemistry. Such multivariate approaches to the interpretation of QC data, however, have not proved popular, possibly because they offer little assistance with troubleshooting, since all shifts in means, positive or negative, and changes in variances or correlations are agglomerated into a single statistic, χ2 or T2.

To accommodate non-ergodic QC data, Jansen and co-workers use an exponentially weighted moving average (EWMA) chart (7), but it is a univariate procedure that makes no allowance for inter-level correlation, and the average run length (ARL) is not stated. EWMA control charts have also been advocated for the detection of small systematic errors (9, 10), but again the statistics used are only univariate.

A new approach to quality control in the clinical laboratory

Because of these shortcomings of the quality control procedures in common use in clinical chemistry laboratories, a novel scheme has been developed for mean and variance-based quality control. It wholly or partially addresses the above six complications of practical quality control while computing up to three semi-independent control statistics to assist with troubleshooting (11).

The new scheme has been implemented in our laboratory for over seventy manual and automated tests (mainly immunoassays) and should be of general applicability for clinical chemistry.

KiwiQC, simple software implementing the new QC scheme, is available as freeware (john.livesey@cdhb.govt.nz). The main features of KiwiQC are illustrated in the figures below:

The conceptual basis of KiwiQC

The analytical 'batch' is a key concept in KiwiQC, though this may seem incongruous in an age of automated continuous random access analysers. In reality though the output of such analysers is usually divided, at least conceptually, into successive 'analytical runs', or batches, in the application of statistical quality control procedures (12). The batch then is that group of analytical results whose acceptability is being judged by the application of the quality control procedure at a particular point in time.

This concept of the batch makes problematic the use of quality control algorithms that use results from a previous batch, such as a cross-batch rule or an exponentially weighted moving average (EWMA), because an out-of-control alarm casts doubt not only on the acceptability of the current batch but also, awkwardly, on the acceptability of at least one previously accepted and released batch. Further, Jones (13) has observed that cross-batch rules contribute little to the power of common multi-rule QC schemes. Cross-batch rules (and related schemes such as multivariate EWMA (14) and CUSUM (15)) also tend to have an inertia which reduces the immediacy of the detection of sudden large shifts in the process characteristics. For these reasons KiwiQC is an entirely within-batch quality control procedure.
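
The inertia referred to above can be illustrated with a minimal univariate EWMA sketch. The recursion z_t = λ·x_t + (1 - λ)·z_(t-1) is the standard one; the weighting λ = 0.2, the starting value of zero and the example sequence are assumptions chosen purely for illustration and are not taken from any of the cited schemes.

```python
def ewma(values, lam=0.2, start=0.0):
    """Exponentially weighted moving average of a sequence of
    standardised QC values; lam is the weight on the newest value."""
    smoothed = []
    z = start
    for x in values:
        z = lam * x + (1 - lam) * z
        smoothed.append(z)
    return smoothed


# Two in-control batches followed by a sudden, sustained shift of 3 SD.
standardised_values = [0.0, 0.0, 3.0, 3.0, 3.0]
print([round(z, 3) for z in ewma(standardised_values)])
```

In this example the first batch after the shift moves the EWMA by only λ times the shift, so several batches pass before the statistic approaches the new level, whereas a within-batch statistic reflects the full shift at once.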

The most intuitively appealing approach to within-batch quality control, where QC samples are run at several different levels in each batch and there is replication at one or more levels, is to standardise the individual QC values. The mean of all the standardised QC values in a batch would then be plotted on the control chart (5).

However, while this mean may be the natural statistic to chart to control for a systematic shift in location, determining control limits for it when any of the QC samples are replicated and correlated is an intractable problem involving consideration of correlations within as well as between levels.

To avoid this difficulty the proposed quality control scheme is modelled on a one-way analysis of variance. By taking this approach, explicit consideration of within-level correlation and multivariate statistics is avoided if the degree of replication at each QC level is the same for both the preliminary batches and the routine batches. Where it is not, consideration of within-level covariance could be avoided by calculating the required statistics not only for the full number of replicates available at each level (nk), but also calculating and storing them for each of 1, 2, ..., nk-1 replicates. It has been shown, however (11), that for the most common case, where nk is usually two but in particular batches nk is one, the scheme gives practically useful approximations to the desired type I error probability.

Between-level correlations observed in our laboratory are often as high as 0.6, with occasional instances of correlations of 0.8 or greater for short sequences of batches. It is also usual for some pairs of QC levels to be more highly correlated than others for the same method. One of the more extreme instances we have seen was a sequence of 75 batches of an immunoradiometric assay for growth hormone where the correlations between the pairs of levels H:M, H:L and M:L were 0.78, 0.26 and 0.21 respectively.
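
The effect of correlations of this size on a simple mean of standardised values can be sketched as follows. The sketch assumes the simplest case only: one unreplicated value per level, each standardised to unit variance, with known between-level correlations. The variance of the mean of k such values is then the sum of all entries of the correlation matrix divided by k squared, which is a general statistical identity and not the KiwiQC calculation itself (which must also handle replication). The function name is hypothetical; the correlations are those of the growth hormone example above.

```python
import math


def sd_of_standardised_mean(corr):
    """SD of the mean of k unit-variance values whose correlation
    matrix is corr (a list of k rows, each with k entries)."""
    k = len(corr)
    return math.sqrt(sum(sum(row) for row in corr)) / k


# Levels ordered H, M, L; correlations H:M, H:L and M:L from the text.
corr_gh = [
    [1.00, 0.78, 0.26],
    [0.78, 1.00, 0.21],
    [0.26, 0.21, 1.00],
]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(round(sd_of_standardised_mean(corr_gh), 4))   # correlated levels
print(round(sd_of_standardised_mean(identity), 4))  # independent levels
```

Under these assumptions, control limits calculated as if the three levels were independent would be too narrow for the correlated case, which is one way of seeing why ignoring between-level correlation raises the false rejection rate.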

One of the key advantages of KiwiQC over the commonly used multi-rule procedures is the approximate constancy of the probability of type I error. The total probability of an out-of-control alarm when the analysis is actually in control is close to 0.0054 (taking the three control statistics zm, zb and zw together) when the error has a Gaussian distribution, regardless of the number of QC levels, the degree of replication, the number of preliminary batches available or the degree of correlation between the levels. This equates to an ARL of about 185, so that only about 0.54% of batches will be falsely rejected. Multi-rule schemes applied to multiple levels of QC samples have considerably higher false rejection rates, especially when, as is usually the case, there is significant correlation between the QC levels. This shortcoming of multi-rule schemes has previously been noted by Parvin (6).
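
The relationship between the false alarm probability and the in-control ARL can be checked with a short sketch. It assumes that alarms in successive in-control batches are independent, so that the number of batches to the first false alarm is geometrically distributed; the function names are hypothetical.

```python
def average_run_length(p_alarm: float) -> float:
    """Mean number of batches up to and including the first alarm,
    assuming independent batches with alarm probability p_alarm."""
    return 1.0 / p_alarm


def prob_alarm_within(p_alarm: float, n_batches: int) -> float:
    """Probability of at least one alarm in n_batches batches."""
    return 1.0 - (1.0 - p_alarm) ** n_batches


print(round(average_run_length(0.0054)))        # in-control ARL
print(round(prob_alarm_within(0.0054, 100), 3)) # any false alarm in 100 batches
```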

High rejection rates when the analyst's intuition suggests that a method is actually running well tend to bring statistical quality control into disrepute. They lead on the one hand to analysts arbitrarily ignoring out-of-control alarms and on the other to proposals either to widen the acceptability limits to those based on medical requirements or biological variation, or to dispense with reference samples altogether and use patient data for quality control (16). Such alternatives may be appropriate for analytes where analytical variability is low enough to easily meet clinical requirements, but there remain analytes for which variability must be minimised using the most efficient statistical quality control methods available.

Designing in a low type I error also minimises the losses in efficiency and effectiveness of having batches rejected, and reduces downward "SD creep" (the tendency for the estimate of the standard deviation to decrease on successive updates if batches are incorrectly rejected).
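
The size of the effect behind "SD creep" can be gauged with a minimal sketch. It assumes a single in-control Gaussian QC value, that every value falling outside ±k SD is rejected, and that only the accepted values are used when the SD is next updated. The SD of the accepted values is then that of a Gaussian distribution truncated at ±k, for which a standard closed form exists. This simplified univariate model and the function name are assumptions made for illustration only.

```python
import math


def sd_after_truncation(k: float) -> float:
    """SD (in units of the true SD) of in-control Gaussian values that
    remain after discarding everything beyond +/- k true SDs."""
    phi_k = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)  # density at k
    kept = math.erf(k / math.sqrt(2))                      # P(|Z| < k)
    return math.sqrt(1 - 2 * k * phi_k / kept)


for limit in (2.0, 3.0):
    print(limit, round(sd_after_truncation(limit), 4))
```

In this model limits at 2 SD shrink the re-estimated SD by roughly 12% in a single update, while limits at 3 SD shrink it by little more than 1%, which is consistent with the argument for a low type I error.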

A consequence of the deliberately low type I error rate in the present scheme is that schemes which tolerate a higher type I error rate, such as common multi-rules, can have a greater power to detect shifts in QC values and may thus appear at first sight to be a more attractive alternative. However, because of the negative consequences of high false rejection rates outlined above, a better alternative where high sensitivity to small systematic shifts is required would probably be to use a multivariate EWMA (14) or CUSUM (15) scheme.

Two critical assumptions underlying the KiwiQC scheme (and the multi-rule schemes) are firstly that the distribution of QC values is approximately Gaussian, at least over a large number of batches, and secondly that the number of preliminary batches is large enough to accurately represent the total variability of the method. Consequently several different changes in the lots of both reagents and standards must be included in the preliminary calculations or variability is likely to be underestimated. A practical strategy is to start, if necessary, with as few as 10 preliminary batches, then recalculate the estimates of the process parameters (means, standard deviations and correlations) as 20, 50 and then 100 batches become available. Thus so long as any non-ergodicity or autocorrelation of the QC results arises from purely random influences, such as random variations in the potency of successive batches of standards, the distribution of the preliminary QC values will eventually approach a Gaussian distribution and the estimates of the targets, variances and correlations obtained by following this strategy will approximate the true values.

As an alternative to updating the estimates of the process parameters at approximately doubling intervals, as suggested above, it might be proposed that they be automatically updated as each new batch is accepted. This though is likely to be undesirable because, if a process were to depart from control to only a moderate degree and several batches were to be accepted (type II error) before the departure from control was detected, these batches would be included in the updated estimates and allow the possibility of SD and target creep. This could adversely alter the power and hence the probability of eventually detecting the departure from control.

A possible limitation of KiwiQC (and of the common multi-rule schemes) is that it does not accommodate significant autocorrelation arising from non-random effects, since this violates the assumption of a large-sample Gaussian distribution of analytical variability. A hypothetical example would be an autocorrelation of QC results caused by a regular day-night change in ambient temperature. Here the probability of type I errors is likely to be markedly reduced, but so is the power to detect escape from control. From a strict Shewhart viewpoint, this type of autocorrelation is due to an assignable cause and should be eliminated. In practice, however, elimination may not be feasible, and in severe cases a multivariate EWMA QC scheme (14) might be preferable.

Compared to Hotelling's T2, KiwiQC offers generally simpler interpretation of out-of-control alarms, particularly if QC samples are duplicated within a batch. If zm is outside control limits but zb and zw are not, then it is most likely that the cause is a concordant shift in the means of the QC results, whereas a significant increase in zb alone suggests a discordant shift in the means of the QC results, and a significant increase in zw, with or without significant changes in zm or zb, strongly indicates an increase in random error. A further advantage over Hotelling's T2 is that deviations in zm indicate the direction of a shift in means, whereas T2 is always positive.

However, interpretation of KiwiQC is not entirely straightforward since, in a small proportion of cases, a significant change in zm or zb alone will be the first indication of an increase in variance rather than, as is more frequent, an indication of a shift in means. Also, as with Hotelling's T2, out-of-control alarms for zm or zb could arise with greater than expected frequency from a change in the correlations between the QC levels.
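
The interpretation guidance above can be summarised as a small decision function. This is a sketch of the stated rules of thumb only, with a hypothetical function name and hypothetical wording for the results; it is not code from KiwiQC, and combinations that the text does not address are reported as such rather than guessed.

```python
def likely_cause(zm_out: bool, zb_out: bool, zw_out: bool) -> str:
    """Map which control statistics are outside their limits to the most
    likely cause, following the rules of thumb given in the text."""
    if zw_out:
        # Applies with or without alarms on zm or zb.
        return "increase in random error"
    if zm_out and not zb_out:
        return "concordant shift in means"
    if zb_out and not zm_out:
        return "discordant shift in means"
    if zm_out and zb_out:
        return "combination not covered by the guidance in the text"
    return "no alarm"


print(likely_cause(zm_out=True, zb_out=False, zw_out=False))
print(likely_cause(zm_out=False, zb_out=True, zw_out=False))
print(likely_cause(zm_out=True, zb_out=True, zw_out=True))
```

As the caveats above make clear, these are the most likely causes rather than certain ones: an alarm on zm or zb alone can occasionally be the first sign of increased variance or of a change in the correlations between levels.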

One additional feature of KiwiQC that may be advantageous compared to currently popular schemes is the presentation of all control charts in standardised format. As zm, zb and zw are each in units of standard deviation, all control charts in the laboratory are plotted with the same range and units, say -4 to +4 for zm and 0 to +4 for zb and zw. Hence comparisons are easily made between the relative state of control of different analytes in the laboratory, and the making of rules for the handling of out-of-control situations is simplified. For example, a laboratory might make it a policy that if it is proposed to release a batch or a result where zm is greater than zmcrit but less than 3.0, the approval of one other supervisor is required, but if zm lies between 3.0 and 5.0, the approval of two other supervisors is required.
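
Because the charts share a common scale, an example policy of this kind could be written once and applied to every analyte. The sketch below encodes the illustrative policy above under several assumptions that are not in the text: the function name is hypothetical, the magnitude of zm is used so that negative deviations are treated in the same way as positive ones, the critical value must be supplied by the caller (the 2.5 used here is a placeholder, not a KiwiQC value), and values above 5.0 return None because the example policy does not say how they are handled.

```python
from typing import Optional


def approvals_required(zm: float, zm_crit: float) -> Optional[int]:
    """Number of other supervisors who must approve release of a batch,
    following the example policy; None where the policy is silent."""
    size = abs(zm)  # assumption: the policy is applied two-sided
    if size <= zm_crit:
        return 0      # within control limits
    if size < 3.0:
        return 1
    if size <= 5.0:
        return 2
    return None       # beyond 5.0 is not covered by the example policy


ZM_CRIT_PLACEHOLDER = 2.5  # placeholder only; use the scheme's own limit
for zm in (1.0, 2.7, -3.5, 6.0):
    print(zm, approvals_required(zm, ZM_CRIT_PLACEHOLDER))
```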

Detecting outliers in the preliminary batches

To start the QC scheme, the means and standard deviations for each level, and the correlation between levels, must be calculated from a number of preliminary batches (the training or phase I sample). These batches may contain outliers from a normal (Gaussian) distribution, the inclusion of which would result in a spuriously large estimate of the standard deviation. KiwiQC provides the kurtosis test (17) to assist in the detection and removal of outliers.
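
The statistic underlying a kurtosis test can be sketched as below. The sketch computes only the ordinary moment-based sample kurtosis (the fourth central moment divided by the squared second central moment, about 3 for a Gaussian distribution); the critical values needed to turn it into a formal test are given in reference 17 and are not reproduced here. The function name and the example data are hypothetical.

```python
def sample_kurtosis(values):
    """Moment-based sample kurtosis m4 / m2**2 (not excess kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


# Hypothetical preliminary QC values, without and with one aberrant result.
clean = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]
contaminated = clean + [12.5]

print(round(sample_kurtosis(clean), 2))
print(round(sample_kurtosis(contaminated), 2))
```

A single aberrant value raises the kurtosis sharply, which is why the statistic is useful for flagging preliminary batches that merit inspection before the target means and SDs are fixed.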

Summary

KiwiQC is a mean and variance-based statistical quality control scheme which typically has a probability of false rejection of between 0.0045 and 0.0071 per batch while allowing for:

In addition, KiwiQC has power comparable to Hotelling's T2 while facilitating trouble-shooting by indicating whether an out-of-control alarm most likely results from a concordant shift in mean values, a discordant shift in means or an increase in analytical variability.

Notes

  1. The term 'batch' is used instead of the more common 'analytical run' to avoid using 'run' in a sense different from its use in 'ARL'.
  2. The derivation of the SD of the standardised mean is given here.

References

  1. Shewhart WA. Economic control of quality of manufactured product. New York: Van Nostrand, 1931:501pp.
  2. Levey S, Jennings ER. The use of control charts in the clinical laboratory. Amer J Clin Pathol 1950;20:1059-1066.
  3. Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493-501.
  4. Montgomery DC. Introduction to statistical quality control. 4th ed. New York: Wiley, 2001:796pp.
  5. Linnet K. Mean and variance rules are more powerful or selective than quality control rules based on individual values. Eur J Clin Chem Clin Biochem 1991;29:417-424.
  6. Parvin CA. New insight into the comparative power of quality-control rules that use control observations within a single analytical run. Clin Chem 1993;39:440-447.
  7. Jansen RTP, Laeven M, Kardol W. Internal quality control system for non-stationary, non-ergodic analytical processes based on exponentially weighted estimation of process means and process standard deviation. Clin Chem Lab Med 2002;40:616-624.
  8. Dechert J, Case KE. Multivariate approach to quality control in clinical chemistry. Clin Chem 1998;44:1959-1963.
  9. Marquis P, Masseyeff R. Évaluer une méthode contrôle de qualité interne: application au contrôle multidimensionel. Ann Biol Clin (Paris) 2002;60:607-616.
  10. Linnet K. The exponentially weighted moving average (EWMA) rule compared with traditionally used quality control rules. Clin Chem Lab Med 2006;44:396-399.
  11. Livesey JH. Mean and variance quality control for multiple correlated levels of replicated control samples. Clin Chem Lab Med 2005;43:1240-1252.
  12. Westgard JO. Internal quality control: planning and implementation strategies. Ann Clin Biochem 2003;40:593-611.
  13. Jones RD. Reevaluation of the power of error detection of Westgard multirules. Clin Chem 2004;50:762-764.
  14. Prabhu SS, Runger GC. Designing a multivariate EWMA control chart. J Qual Technol 1997;29:8-15.
  15. Pignatiello J, Runger G. Comparisons of multivariate CUSUM charts. J Qual Technol 1990;22:173-186.
  16. Cembrowski GS. Thoughts on quality-control systems: a laboratorian's perspective. Clin Chem 1997;43:886-892.
  17. Livesey JH. Kurtosis provides a good omnibus test for outliers in small samples. Clin Biochem 2007;40:1032-1036.

