Open Access
Regarding “Term prediction with ultrasound: evaluation of a new dating curve for biparietal diameter”
Author(s) - EIKNES STURLA H., GRØTTUM PER, GJESSING HÅKON
Publication year - 2006
Publication title - Acta Obstetricia et Gynecologica Scandinavica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.401
H-Index - 102
eISSN - 1600-0412
pISSN - 0001-6349
DOI - 10.1080/00016340600839668
Subject(s) - medicine , biparietal diameter , norwegian , gestational age , ultrasound , term (time) , pregnancy , obstetrics , statistics , radiology , mathematics , head circumference , biology , linguistics , philosophy , genetics , physics , quantum mechanics
In their article “Term prediction with ultrasound: evaluation of a new dating curve for biparietal diameter” (1), Backe and Nakling present a validation of a recently published dating method, “Terminhjulet” (Method B) (2), and an older method, “Snurra” (Method A) (3); the latter has been in use since 1984 in all Norwegian departments to assess gestational age and predict the expected day of delivery. The authors compare the two methods by computing the mean difference between the observed and expected day of delivery, which is presented in their Table II, and in Figure 2 for individual biparietal diameter (BPD) values. This shows “Terminhjulet” as having a mean residual of −0.7 days and “Snurra” −3.5 days. The authors conclude that “the underestimation of fetal age by the BPD dating curves used in Norway for the last 20 years may lead to wrong clinical decisions, and the new reference values should be used”. Their conclusion is incorrect; it results from inappropriate use of the mean as a statistical tool to evaluate the data. Additionally, several other factors deserve comment.
The distribution of the duration of human pregnancy is highly skewed, with a long left tail of preterm births, mainly caused by pathology. When evaluating a system that is designed to predict term in normal pregnancies, as these two methods are, one would prefer to exclude the pathological cases. However, because there are no good, independent measures of such pathology, the cases cannot easily be identified and excluded from the evaluation. One must consequently choose measures of performance that are reasonably insensitive to abnormal cases or outliers (gross errors in data). The mean is highly sensitive to observations in the tails of the distribution, and particularly so for skewed distributions.
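The sensitivity of the mean to a left tail can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only for illustration (an unbiased method with residual SD of 8 days, plus a 6% preterm tail centred 35 days early); they are not taken from either study, and the helper names `normal_scores` and `summarize` are likewise invented for this example. Residuals are taken as observed minus predicted day of delivery, so early births are negative.

```python
from statistics import NormalDist, mean, median


def normal_scores(mu, sigma, n):
    """Return n evenly spaced quantiles of N(mu, sigma): a deterministic 'sample'."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def summarize(residuals):
    """Return (mean, median) of residuals in days (observed minus predicted)."""
    return mean(residuals), median(residuals)


# Assumed, illustrative populations (not data from the letter or the paper):
# normal pregnancies dated by an unbiased method, symmetric around 0 days,
# and a 6% left tail of pathological preterm births.
normal_births = normal_scores(mu=0.0, sigma=8.0, n=9400)
preterm_births = normal_scores(mu=-35.0, sigma=10.0, n=600)

mean_clean, median_clean = summarize(normal_births)
mean_all, median_all = summarize(normal_births + preterm_births)

print(f"normal pregnancies only: mean {mean_clean:+.2f}, median {median_clean:+.2f}")
print(f"with preterm tail added: mean {mean_all:+.2f}, median {median_all:+.2f}")
```

Under these assumed numbers the preterm tail pulls the mean about 2 days in the negative direction, while the median moves by well under 1 day, even though the simulated method itself is unbiased for normal pregnancies.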
The pathological, early births will draw the “true” mean residual in a negative direction, yet their presence has no relevance for the predictive capacity of the method evaluated. Analyses we have done on a dataset of approximately 50,000 ultrasound scans from Trondheim, Norway, show that the leftmost 6% of the residual distribution accounts for a change in the mean of 2 days. In addition, residuals are shifted in a negative direction by inductions for post-term pregnancy, which artificially shorten the duration of the pregnancy. The authors’ unjustified exclusion of all post-term inductions adds to the problem rather than diminishing it. Thus, a mean residual as close as possible to zero does not constitute proof of soundness. On the contrary, it indicates that the method predicts term too early.
The median is a robust parameter for the evaluation of skewed distributions such as the birth distribution (4,5). It is far less sensitive to the pathological processes at the extreme range of the curve, such as the pathological preterm births. Indeed, the post-term inductions can also be managed statistically using the median, without introducing a bias. In their article, the authors should have focused on the median as the measure of goodness. However, the authors present only the overall median residual, and when computing it they have not used an appropriate method for rounded data, making their median comparison imprecise and possibly biased. The median should have been presented with decimals. Finally, the use of mean values in the important Figure 2 in Backe and Nakling’s paper gives a completely misleading impression.
As an example of the difficulties in interpreting the mean, the authors’ statement about inductions in the first paragraph of the discussion is incorrect: “This selective exclusion of cases with long duration will bias the comparison in favor of method A. Despite the inherent bias, method B has a significantly smaller mean prediction error than method A . . .”. In fact, the opposite is more likely: from the medians given in Table II it can be seen that inclusion of the post-term inductions with their large positive values would improve the median of “Snurra” (method A) compared to the median of “Terminhjulet” (method B). It is not possible to know the precise effect on the means, but it is likely to be in the same direction. The exact values of the post-term inductions will not influence the median, in contrast to the mean, and
Acta Obstetricia et Gynecologica. 2006; 85: 1276–1279