
Principal components in the analysis of longitudinal growth data

Meigen C, Hermanussen M*
Leipzig, Germany; *Aschauhof, Germany

Correspondence to: PD Dr. Michael Hermanussen, Aschauhof 3, 24340 Altenhof, Germany, Tel. 0049-4351-41738
Email: hermanussen.aschauhof@t-online.de

Abstract

We present a new approach to analysing longitudinal height and body mass index (BMI) data using principal component analysis. The analysis is based on series of longitudinal growth measurements from birth to maturity obtained in two growth studies, performed at Lublin, Poland, and Zurich, Switzerland, with 248 healthy boys and 235 healthy girls. Measurements were compared with the respective national reference standard and converted into height SDS and BMI SDS. The individual SDS curves were slightly smoothed using kernel smoothing. Thereafter, principal component analysis (PCA) was performed. Eight components are sufficient to model all measurements with a residual variance equivalent to the technical error. We propose a way to apply these results to any series of measurements, yielding a test of whether these measurements are "normal" with respect to a given reference group of children. The analysis not only describes longitudinal growth regardless of the timing of the measurements, but also predicts height and BMI outside the individual measurement period.

Introduction

Everyday clinical practice has shown that conventional growth assessment fails to diagnose children with growth disorders at an early stage, even though body height and weight have often been measured frequently (Keller et al. 2000). There is an urgent need for a diagnostic tool that is easily accessible in everyday practice and provides an automated analysis of child growth and development. The analysis of longitudinal height and weight data in child growth has a long tradition (Falkner & Tanner 1986), and longitudinal growth models were recently summarized by Ledford and Cole (1998). Analyzing growth velocity is not a trivial task. Traditional practice suggests measuring height and weight at annual or semi-annual intervals, for which standards of growth velocity exist (Tanner 1986). Yet measuring at exact intervals is difficult under routine conditions. In order to describe growth independently of the timing of the measurements, we developed the concept of growth tracks (Hermanussen et al. 2001a, 2002). The term tracking has been used to connote an individual's maintenance of relative rank in some longitudinally measured characteristic over a given time span (Segal and Tager 1993), and reflects the idea of growth canalization (Hermanussen et al. 2001b). Growth tracks are insensitive to the timing of the measurements and define regions of probability within which subsequent height measurements of healthy individuals are likely to be found. The concept of growth tracks significantly improved the separation of aberrant patterns from normal growth in clinical practice. It appeared robust against the unpredictable dynamics of short-term growth and against measurement error, and even made it possible to describe future growth for a certain number of years. Yet, in spite of its usefulness, the concept was purely empirical and lacked a sound mathematical foundation.

The present report offers an alternative approach. Following a suggestion by Cole (TJ Cole, personal communication, 2002), we used principal component analysis (PCA) to describe growth, and derived standards for longitudinal growth from two well-established growth studies with measurements from birth to maturity.

Material and methods

Longitudinal data on child height and weight were obtained from two well-established growth studies, performed at Lublin, Poland (Chrzastek-Spruch et al. 1990), and Zurich, Switzerland (Prader et al. 1989), with altogether 248 healthy boys and 235 healthy girls. Details of these studies have been published elsewhere. All children were measured from birth to maturity, usually at annual intervals and occasionally at intervals of six months or shorter. For the analysis, we selected measurements at the ages of 1.0, 1.5, and 2.0 years, and at annual intervals thereafter up to the age of 18 years (19 age points in total). We assumed a technical error (standard error) of 0.5 cm for individual height measurements. This corresponds to some 0.12 SD in younger children and decreases to some 0.08 SD in adolescents and adults. All measurements were referred to their respective national reference standard and converted into height and BMI standard deviation scores (SDS). We then described growth by principal component analysis (PCA). The purpose of PCA is to partition the variability of the data into orthogonal components, each of which explains a certain amount of the total variability. In a first approach we used unsmoothed data. However, unsmoothed data may lead to small random effects even in the first components (see Figures 1 and 2). As the present PCA was intended to establish standard components for the later analysis of individual data, we tried to avoid any random effects and decided to smooth the data prior to the analysis. We thereby accepted that smoothing the data prior to the PCA may slightly distort the later components.
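A minimal sketch of the SDS conversion and of the technical-error figures quoted above follows. It assumes a plain z-score, SDS = (value - reference median) / reference SD; the national references used here may instead apply a skewness-adjusted method (for example LMS), and the reference values in the example call are hypothetical. Inverting the quoted figures shows which reference SDs they imply: a 0.5 cm error equals 0.12 SD when the reference SD is about 4.2 cm, and 0.08 SD when it is 6.25 cm.

```python
import numpy as np


def to_sds(value, ref_median, ref_sd):
    """Convert measurements to SDS with a plain z-score.

    Assumption: the actual national references may use a skewness-adjusted
    (e.g. LMS) transformation instead of this simple formula.
    """
    return (np.asarray(value, dtype=float) - ref_median) / ref_sd


def error_in_sds(error_cm, ref_sd_cm):
    """Express a technical error given in cm as a fraction of the reference SD."""
    return error_cm / ref_sd_cm


# Hypothetical reference values (not from the study), for illustration only.
example_sds = to_sds(120.1, ref_median=122.0, ref_sd=5.0)

# Reference SDs implied by the figures in the text for a 0.5 cm error.
implied_sd_young_cm = 0.5 / 0.12  # about 4.2 cm
implied_sd_adult_cm = 0.5 / 0.08  # about 6.25 cm
```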


Figure 1: Principal components C1, C2, and C3 of female height SDS derived from unsmoothed data

Smoothing was performed using kernel smoothing (Venables & Ripley 1997), with the density function of the normal distribution as kernel and a bandwidth of 1.5 years. Principal components based on the smoothed data of our samples appeared very similar to those based on unsmoothed data, but without the random effects mentioned above (see Figures 1 and 2).
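The smoothing step can be sketched as a Nadaraya-Watson kernel estimate evaluated at the 19 analysis ages. This is a minimal sketch, assuming that the 1.5-year bandwidth is the standard deviation of the Gaussian kernel; the text does not specify the scaling. If the smoothing was done with R's `ksmooth`, note that its `bandwidth` argument is scaled so that the kernel quartiles lie at ±0.25 × bandwidth, which corresponds to a considerably narrower kernel. The function name `kernel_smooth` is hypothetical.

```python
import numpy as np

# The 19 analysis ages: 1.0, 1.5, 2.0, then annually up to 18 years.
GRID = np.concatenate([[1.0, 1.5], np.arange(2.0, 19.0)])


def kernel_smooth(ages, sds, grid=GRID, bandwidth=1.5):
    """Gaussian-kernel (Nadaraya-Watson) smoothing of one child's SDS series.

    Assumption: `bandwidth` is the standard deviation of the normal kernel.
    The normalising constant of the normal density cancels in the ratio.
    """
    ages = np.asarray(ages, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # weights[g, i]: kernel weight of measurement i at grid age g
    z = (grid[:, None] - ages[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    return weights @ sds / weights.sum(axis=1)
```

Because the estimate is a weighted mean of the measured SDS values, irregularly spaced measurements can be mapped onto the common 19-point grid before the PCA.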


Figure 2: Principal components C1, C2, and C3 of female height SDS derived from smoothed data

For each of height and BMI, this process resulted in two matrices (one per sex) with 19 columns for the age points and 248 rows for the boys or 235 rows for the girls, respectively. Principal component analysis was performed on these data using the MVA package of the statistical language R (Venables & Ripley 1997).
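For readers without R, the decomposition can be sketched with a singular value decomposition in NumPy. This is an illustrative equivalent, not the MVA code used in the study; it follows the convention of R's `prcomp` (columns centred, score standard deviations computed with an n-1 divisor). The function name `pca` and its return layout are hypothetical.

```python
import numpy as np


def pca(X, n_components=8, center=True):
    """PCA of an (n_children x n_ages) SDS matrix via the SVD.

    Returns
      loadings  : (n_ages x n_components) array, unit-length columns l1, l2, ...
      sds       : standard deviations s1, s2, ... of the component scores
      mean      : column means that were subtracted (zeros if center=False)
      explained : cumulative proportion of total variance explained
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    # Rows of vt are the principal axes; singular values scale the scores.
    _, sv, vt = np.linalg.svd(X - mean, full_matrices=False)
    all_sds = sv / np.sqrt(max(1, X.shape[0] - 1))
    explained = np.cumsum(all_sds ** 2) / np.sum(all_sds ** 2)
    k = n_components
    return vt[:k].T, all_sds[:k], mean, explained[:k]
```

Component signs are arbitrary in any PCA implementation (in Tables 1-4 the C1 loadings happen to be negative), so coefficients are only comparable when they are computed against the same set of loadings.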

Eight components were needed to model the height measurements of every child with a residual variance equivalent to the technical error; this held for both smoothed and unsmoothed data. PCA yields not only the loadings l1...l8 of the components (vectors giving the weight of each component at each of the 19 age points), but also the standard deviations s1...s8 of the component scores in the study population (Tables 1-4).
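The criterion for the number of components can be made explicit in several ways. The sketch below uses one possible reading, which is an assumption: choose the smallest k for which every child's root-mean-square reconstruction residual over the 19 age points does not exceed the technical error expressed in SDS units (about 0.08-0.12 SD according to the Methods). The function name `n_components_needed` is hypothetical.

```python
import numpy as np


def n_components_needed(X, tech_error_sds):
    """Smallest k whose PCA reconstruction fits every child within the error.

    X: (n_children x n_ages) SDS matrix; tech_error_sds: technical error in SD units.
    Assumed criterion: per-child RMS residual over all ages <= tech_error_sds.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    for k in range(1, vt.shape[0] + 1):
        basis = vt[:k].T                          # first k loading vectors
        residual = Xc - Xc @ basis @ basis.T      # part not captured by k components
        rms_per_child = np.sqrt(np.mean(residual ** 2, axis=1))
        if rms_per_child.max() <= tech_error_sds:
            return k
    return vt.shape[0]
```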

The crucial question, however, remains: how can these results be applied to arbitrarily spaced measurements of a single individual, in order to test whether the measurements of that individual are "normal" with respect to the study population?

We have developed such a technique and use it successfully on our website willi-will-wachsen.com. It is based on the maximum likelihood principle, a technique for finding the most likely estimates of parameters (Sachs 1978). Our approach is described step by step below. In combination with Tables 1-4, the algorithm provides a test for normality of arbitrarily spaced longitudinal measurements of height and BMI SDS.

  1. We have to test n measurements mi taken at ages ai, i=1..n, each with a standard error ei.
  2. We assume that the measurement errors are normally distributed with standard deviation ei (Boyd 1929).
  3. Every set of coefficients c1...c8 defines an SDS vector c1*l1+ ... +c8*l8, i.e. one SDS value for each of the 19 age points.
  4. The SDS vector can be transformed into a curve f: Age → Measurement by using the known reference curve to reconvert SDS into measurement values. Linear interpolation between the age points yields a complete curve. If we assume this to be the true growth curve of the child, all differences f(ai)-mi must be measurement errors. Based on the known standard errors ei, we assign a standardized error sei=(f(ai)-mi)/ei to each measurement.
  5. The coefficients of the components are assumed to be independent, as are the measurement errors. We therefore compute the product density by multiplying the individual densities: dnorm(c1/s1)* ... *dnorm(c8/s8)*dnorm(se1)* ... *dnorm(sen), where dnorm is the density of the standard normal distribution (the Gaussian "bell curve").
  6. The maximum likelihood principle suggests using the set of coefficients with the highest overall product of densities, so a numerical optimization of the coefficients has to be performed. We use the Hooke-Jeeves algorithm (Bronstein 1991) with a start value of 0 for all coefficients.
  7. The result of this optimisation is the most likely growth curve for that child - under the assumption that it belongs to the study population and that the measurements had the given standard error.
  8. We can now check whether these coefficients are in the normal range, and whether the measurements lie within an acceptable distance from the curve. One could apply a +/-2 SD criterion to each coefficient and to each measurement error, but this raises the problem of multiple testing. For a child of the study population, the probability that the first coefficient lies within +/-2 SD is roughly 0.95 = 95%. The probability that both the first and the second coefficient lie within +/-2 SD is 0.95*0.95 = 90.25%, and so on. The probability that all 8 coefficients lie within +/-2 SD is 0.95^8 = 66% for a child of the study population, i.e., on the basis of the coefficients alone, about every third healthy child would fail the test. We use the False Discovery Rate procedure (Benjamini & Hochberg 1995) to correct for this effect. The procedure calculates limits (usually much higher than 2 SD) which ensure that only 5% of healthy children are rejected, no matter how many measurements and coefficients are used.
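Steps 3-7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation running on willi-will-wachsen.com: the back-conversion in step 4 is assumed to be a plain z-score (cm = median + SD × SDS) with hypothetical reference arrays `ref_median` and `ref_sd` at the 19 grid ages, and `hooke_jeeves` is a textbook pattern search whose step-size settings are illustrative. Taking the negative logarithm turns the product of densities in step 5 into a sum of squares (additive constants dropped), which leaves the location of the maximum unchanged. All function names are hypothetical.

```python
import numpy as np


def hooke_jeeves(func, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=100_000):
    """Minimise func with Hooke-Jeeves pattern search (no derivatives needed)."""

    def explore(base, f_base, h):
        # Exploratory move: try +h and -h along each coordinate, keep improvements.
        x, fx = base.copy(), f_base
        for i in range(x.size):
            for delta in (h, -h):
                trial = x.copy()
                trial[i] += delta
                f_trial = func(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    x = np.asarray(x0, dtype=float)
    fx = func(x)
    h = step
    for _ in range(max_iter):
        if h <= tol:
            break
        x_new, f_new = explore(x, fx, h)
        if f_new < fx:
            # Pattern moves: extrapolate along the successful direction
            # for as long as this keeps improving the objective.
            while f_new < fx:
                x_pattern = x_new + (x_new - x)
                x, fx = x_new, f_new
                x_new, f_new = explore(x_pattern, func(x_pattern), h)
        else:
            h *= shrink  # no improvement at this step size: refine it
    return x, fx


def neg_log_likelihood(c, ages, m, e, grid, loadings, s, ref_median, ref_sd):
    """-log of the product density of step 5, with additive constants dropped."""
    sds_curve = loadings @ c                      # step 3: SDS at each grid age
    curve_cm = ref_median + ref_sd * sds_curve    # step 4, assumed z-score back-conversion
    f_at_ages = np.interp(ages, grid, curve_cm)   # linear interpolation between grid ages
    se = (f_at_ages - m) / e                      # standardized measurement errors
    return 0.5 * np.sum((c / s) ** 2) + 0.5 * np.sum(se ** 2)


def fit_coefficients(ages, m, e, grid, loadings, s, ref_median, ref_sd):
    """Steps 6-7: most likely coefficients, starting from c = 0."""

    def objective(c):
        return neg_log_likelihood(c, ages, m, e, grid, loadings, s, ref_median, ref_sd)

    c_hat, _ = hooke_jeeves(objective, np.zeros(loadings.shape[1]))
    return c_hat


def closed_form_coefficients(ages, m, e, grid, loadings, s, ref_median, ref_sd):
    """Exact optimum, valid because the objective above is quadratic in c."""
    # A @ v reproduces np.interp(ages, grid, v) for any vector v of grid values.
    A = np.column_stack([np.interp(ages, grid, unit) for unit in np.eye(len(grid))])
    B = A @ (ref_sd[:, None] * loadings)          # sensitivity of f(ages) to c
    r = np.asarray(m, dtype=float) - A @ ref_median
    w = np.ones(len(r)) / np.asarray(e, dtype=float) ** 2
    H = np.diag(1.0 / s ** 2) + B.T @ (w[:, None] * B)
    return np.linalg.solve(H, B.T @ (w * r))
```

Under the linear back-conversion assumed here, f(ai) depends linearly on the coefficients, so the objective is quadratic and its minimum can also be obtained exactly from a small linear system (`closed_form_coefficients`), which is a useful check on the numerical search. With a non-linear back-conversion such as LMS this shortcut no longer applies, and a derivative-free search like Hooke-Jeeves remains convenient.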

Results

Tables 1-4 show the components of the height and BMI SDS curves for both sexes. Each column contains the loadings of one component; the standard deviations of the components are given in the first row. In females, principal component C1 explains 80.2% of the height variation, C1+C2 explain 88.9%, C1+C2+C3 explain 95.5%, and all eight components together explain 99.4%. Similar results were found in males and for BMI (data not shown).

By applying the procedure described above, the tables can be used to check the normality of arbitrarily spaced longitudinal measurements of height and BMI. The procedure has been made accessible to parents and scientists on our website willi-will-wachsen.com. We present the example of a 13-year-old boy with constitutional delay of growth and puberty.

Example 1:

Although the child was originally measured at 22 age points, we reduced the series to n=3 measurements to explain the algorithm more clearly. The three selected measurements took place at the ages of a1=2.07, a2=6.99 and a3=12.7 years, giving height values of m1=89.3 cm, m2=120.1 cm, and m3=149.9 cm, respectively. We assume a measurement error of 0.75 cm (a conservative estimate for measurements performed by parents), so e1=e2=e3=0.75.

We start with the value 0 for all coefficients, so the estimated SDS curve c1*l1+...+c8*l8 is identically 0. That is, the curve f (step 4 of the step-by-step description) exactly equals the 50th centile of the German reference population (Hermanussen et al. 1999), and differs from the third measurement by 4.4 cm, so se3>5 and dnorm(se3)<1e-6. The overall product density is therefore very small, indicating that the first guess is inappropriate. Applying the Hooke-Jeeves algorithm to optimise the coefficients quickly yields c1=2.48, c2=-1.00, c3=-0.15, c4=0.47, c5=0.07, c6=0.07, c7=0.02, and c8=-0.02.
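The quantities quoted for the starting point can be checked directly. The sketch uses the 4.4 cm deviation and the 0.75 cm measurement error from the text; `dnorm` here is a small stand-in written for illustration, mirroring R's function of the same name.

```python
import math


def dnorm(x):
    """Standard normal density, the counterpart of R's dnorm(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


deviation_cm = 4.4  # 50th centile minus 3rd measurement, from the text
error_cm = 0.75     # assumed measurement error e3
se3 = deviation_cm / error_cm
density = dnorm(se3)  # about 1.3e-8
```

With se3 ≈ 5.87, a single poorly fitting measurement already drives the whole product density towards zero, which is why the optimiser moves away from the starting curve.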

Calculating c1*l1+c2*l2+ ... +c8*l8, i.e. multiplying the coefficients by the respective columns of Table 1 and summing the resulting vectors, yields an estimated SDS curve. Reconverting the SDS curve into centimetres gives the following values for the 19 age points: 77.29, 83.62, 88.21, 95.90, 102.94, 108.86, 115.12, 120.45, 125.58, 130.73, 135.97, 140.91, 146.29, 151.46, 157.60, 164.13, 169.29, 172.39, 173.76.
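The linear combination can be reproduced from the appendix. The sketch copies the C1-C8 loadings of Table 1 (Table 2 lists identical loadings) and applies the coefficients found in Example 1. Converting the SDS values back to centimetres requires the German reference (median and SD per age), which is not reproduced in this paper, so the sketch stops at the SDS curve.

```python
import numpy as np

AGES = np.concatenate([[1.0, 1.5], np.arange(2.0, 19.0)])

# Loadings of C1..C8 (columns) at the 19 ages (rows), copied from Table 1.
LOADINGS = np.array([
    [-0.203, -0.381, -0.178,  0.398, -0.341, -0.173, -0.291, -0.128],
    [-0.210, -0.372, -0.176,  0.312, -0.150, -0.026,  0.034,  0.041],
    [-0.217, -0.353, -0.156,  0.196,  0.077,  0.129,  0.361,  0.139],
    [-0.230, -0.281, -0.040, -0.139,  0.418,  0.422,  0.317, -0.051],
    [-0.236, -0.227,  0.033, -0.262,  0.379,  0.122, -0.275, -0.214],
    [-0.240, -0.174,  0.088, -0.277,  0.159, -0.270, -0.450, -0.054],
    [-0.244, -0.120,  0.129, -0.248,  0.006, -0.278, -0.164,  0.248],
    [-0.246, -0.050,  0.148, -0.222, -0.136, -0.226,  0.208,  0.292],
    [-0.246,  0.015,  0.171, -0.188, -0.220, -0.196,  0.357,  0.104],
    [-0.244,  0.067,  0.197, -0.150, -0.284, -0.038,  0.273, -0.199],
    [-0.240,  0.095,  0.249, -0.034, -0.300,  0.222, -0.010, -0.443],
    [-0.231,  0.124,  0.316,  0.129, -0.160,  0.409, -0.205, -0.080],
    [-0.224,  0.181,  0.316,  0.317,  0.080,  0.260, -0.198,  0.386],
    [-0.225,  0.247,  0.186,  0.373,  0.334, -0.216,  0.040,  0.225],
    [-0.232,  0.268, -0.059,  0.251,  0.292, -0.338,  0.132, -0.335],
    [-0.228,  0.250, -0.253,  0.054,  0.102, -0.131,  0.072, -0.326],
    [-0.221,  0.230, -0.349, -0.079, -0.036,  0.080, -0.042,  0.006],
    [-0.216,  0.224, -0.385, -0.132, -0.110,  0.144, -0.097,  0.199],
    [-0.213,  0.226, -0.399, -0.148, -0.133,  0.160, -0.112,  0.225],
])

# Coefficients obtained for Example 1.
coefficients = np.array([2.48, -1.00, -0.15, 0.47, 0.07, 0.07, 0.02, -0.02])

# c1*l1 + ... + c8*l8: one SDS value per age point.
sds_curve = LOADINGS @ coefficients
```

The resulting curve runs from about +0.05 SDS at 1 year to about -0.77 SDS at 18 years. Because all C1 loadings are negative, the positive c1 of this child shifts the whole curve downwards.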


Figure 3: SDS values of 3 height measurements and estimated curve

By linear interpolation, we obtain the following height estimates for the three ages a1=2.07, a2=6.99 and a3=12.7 years: f(a1)=88.77, f(a2)=120.43, and f(a3)=149.93. The corresponding standardized measurement errors are se1=(88.77-89.3)/0.75=-0.71, se2=(120.43-120.1)/0.75=0.44, and se3=(149.93-149.9)/0.75=0.04, i.e., the height measurements lie within an acceptable distance from the estimated curve f (step 8 of the step-by-step description). Dividing the coefficients by the respective standard deviations of the coefficients in the population (Table 1) gives c1/s1=0.52, c2/s2=-1.05, c3/s3=-0.25, c4/s4=0.73, c5/s5=0.24, c6/s6=0.28, c7/s7=0.10 and c8/s8=0.08. These coefficients are within the normal range even without applying the False Discovery Rate procedure to correct for multiple testing.
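The interpolation and the standardized errors can be recomputed from the 19 centimetre values listed above with NumPy's linear interpolation. The recomputed values differ from the reported ones only in the second decimal (for example, se1 is about -0.74 rather than -0.71), and the conclusion that all three errors lie well within +/-2 is unchanged.

```python
import numpy as np

AGES = np.concatenate([[1.0, 1.5], np.arange(2.0, 19.0)])
# Estimated curve in cm at the 19 age points, as listed in the text.
CURVE_CM = np.array([77.29, 83.62, 88.21, 95.90, 102.94, 108.86, 115.12,
                     120.45, 125.58, 130.73, 135.97, 140.91, 146.29, 151.46,
                     157.60, 164.13, 169.29, 172.39, 173.76])

ages = np.array([2.07, 6.99, 12.7])
measured = np.array([89.3, 120.1, 149.9])
error_cm = 0.75

f_at_ages = np.interp(ages, AGES, CURVE_CM)  # linear interpolation (step 4)
se = (f_at_ages - measured) / error_cm       # standardized measurement errors
within_2sd = bool(np.all(np.abs(se) <= 2.0))
```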

Example 2:

We now analyse the same boy using all n=22 height measurements. Clinically, his growth pattern is abnormal, as he has constitutional delay of growth and puberty. The additional measurements increase the accuracy of the analysis and reveal the abnormal pattern: after growth retardation in early childhood, the boy continues to grow along a lower centile, with no evidence of a pubertal growth spurt up to the age of 13 years. The algorithm yields c1/s1=0.23, c2/s2=0.40, c3/s3=-3.20, c4/s4=0.80, c5/s5=-1.49, c6/s6=-0.05, c7/s7=-0.15 and c8/s8=-0.15. With c3/s3=-3.20, this pattern deviates significantly from normality and is easily detected by the algorithm.
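Step 8 can be illustrated with the Benjamini-Hochberg procedure at q = 0.05, applied to the standardized coefficients reported in Examples 1 and 2 and using two-sided p-values from the standard normal distribution. This is a simplified sketch: the procedure in step 8 also includes the standardized measurement errors, which are not listed for Example 2, so only the eight coefficients enter here. The function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k with p_(k) <= k * q / m; all hypotheses up to k are rejected.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


example_1 = [0.52, -1.05, -0.25, 0.73, 0.24, 0.28, 0.10, 0.08]
example_2 = [0.23, 0.40, -3.20, 0.80, -1.49, -0.05, -0.15, -0.15]

flags_1 = benjamini_hochberg([two_sided_p(z) for z in example_1])
flags_2 = benjamini_hochberg([two_sided_p(z) for z in example_2])
```

With these inputs only c3/s3 of Example 2 is flagged (its p-value of about 0.0014 is below the smallest Benjamini-Hochberg threshold, 0.05/8 = 0.00625), and nothing is flagged in Example 1.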


Figure 4: SDS values of 22 height measurements and estimated curve

This example illustrates the importance of repeated long-term measurements for properly analysing individual growth curves. Calculating height centiles or conventional growth velocity may often be insufficient to detect aberrant growth patterns in children.

Discussion

Conventional growth assessment often fails to diagnose children with growth disorders at an early stage (Keller et al. 2000) because no effective tool for nationwide early screening of abnormal growth is available. We propose an automated analysis of child growth that is accessible not only to medical personnel but can also be used by parents and non-medical staff.

The analysis of longitudinal data is effectively an analysis of growth functions (Ramsay & Silverman 1997). Describing these functions directly by their SDS values at discrete age points is a very simple approach compared with more sophisticated models of growth functions (Ledford & Cole, 1998), but it avoids several problems associated with these models. In fact, the result of the PCA on the SDS values can itself be viewed as an eight-parameter model of growth functions, constructed from the study population.

Only the first component, which describes tallness versus shortness in height (or heaviness versus lightness in BMI), truly reflects a biological property. Although the second and third components still appear related to biological phenomena (centile crossing and the timing of the onset of puberty), the full set of components is a mathematical description with no obvious relation to known biological properties.

The procedure not only describes patterns of growth during the period in which measurements were performed: it assigns the most probable set of component coefficients regardless of the timing of the measurements, and also predicts height and BMI outside the measurement period. The present analysis is based on standard deviation scores and therefore depends strongly on the reference standard used. It can be used wherever reliable references are available. If reliable references are not available, references may be synthesized as previously described (Hermanussen and Burmeister 1999). We are still interested in longitudinal height and weight data from children in developing countries, in order to analyse further whether the present procedure can also be applied to these populations.

Acknowledgements

We are very grateful for much advice and many fruitful discussions with Prof. TJ Cole, London, UK, Prof. James Ramsay, Montreal, Canada, Dr. Luciano Molinari, Zürich, Switzerland, and Dr. Marek Brabec, Prague, Czech Republic. This research was supported by GrandisBioTech, Deutsche Gesellschaft für Auxologie, and by Deutsche Forschungsgemeinschaft, Grant HE 1440/5-1.

References

  • Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B 57:289-300.
  • Boyd, E., 1929. The experimental error inherent in measuring the growing human body. Am. J. Phys. Anthropol. 13:389-432.
  • Bronstein, I.N., Semendjajew, K.A., 1991. Taschenbuch der Mathematik, Leipzig.
  • Chrzastek-Spruch, H., Susanne, C., Hauspie, R., 1990. Standards for height and height velocity for Polish children. Studies in Human Ecology 9:179-197.
  • Hermanussen, M., Burmeister, J., 1999. Synthetic growth charts. Acta Paediatr. 88:809-14.
  • Hermanussen, M., Lange, S., Grasedyck, L., 2001a. Growth tracks in early childhood. Acta Paediatr. 90:381-6.
  • Hermanussen, M., Largo, R. H., Molinari, L., 2001b. Canalisation in human growth: a widely accepted concept reconsidered. Eur. J. Pediatr. 160:163-7.
  • Hermanussen, M., Grasedyck, L., Kromeyer-Hauschild, K., Prokopec, M., Chrzastek-Spruch, H., 2002. Growth tracks in pre-pubertal children. Ann. Hum. Biol. 29:667-676.
  • Ledford, A. W., Cole, T. J., 1998. Mathematical models of growth in stature throughout childhood. Ann. Hum. Biol. 25:101-115.
  • Meigen, C., Hermanussen, M., 2003. Automatic analysis of longitudinal growth data on the website willi-will-wachsen.de. Homo 54(2):157-161.
  • Prader, A., Largo, R. H., Molinari, L., Issler, C., 1989. Physical growth of Swiss children from birth to 20 years of age. Helv. Paediat. Acta 43, Suppl. 52:1-125.
  • Ramsay, J. O., Silverman, B. W., 1997. Functional Data Analysis. Springer Series in Statistics.
  • Sachs, L., 1978. Angewandte Statistik, Berlin, Heidelberg, New York.
  • Segal, M. R., Tager, I.B., 1993. Trees and tracking. Statistics in Medicine 12:2153-68.
  • Tanner, J. M., 1986. Use and abuse of growth standards. In Human growth, edited by Falkner, F., Tanner, J. M., volume 3, 2nd edition (New York, London: Plenum Press) 95-109.
  • Venables, W. N., Ripley, B. D., 1997. Modern Applied Statistics with S-PLUS, Springer-Verlag.

Appendix: Principal Components derived from the analysis

Table 1: Components of female height SDS

Age C1 C2 C3 C4 C5 C6 C7 C8
SD 3.940 1.306 1.101 0.589 0.292 0.206 0.164 0.133
1.00 -0.203 -0.381 -0.178 0.398 -0.341 -0.173 -0.291 -0.128
1.50 -0.210 -0.372 -0.176 0.312 -0.150 -0.026 0.034 0.041
2.00 -0.217 -0.353 -0.156 0.196 0.077 0.129 0.361 0.139
3.00 -0.230 -0.281 -0.040 -0.139 0.418 0.422 0.317 -0.051
4.00 -0.236 -0.227 0.033 -0.262 0.379 0.122 -0.275 -0.214
5.00 -0.240 -0.174 0.088 -0.277 0.159 -0.270 -0.450 -0.054
6.00 -0.244 -0.120 0.129 -0.248 0.006 -0.278 -0.164 0.248
7.00 -0.246 -0.050 0.148 -0.222 -0.136 -0.226 0.208 0.292
8.00 -0.246 0.015 0.171 -0.188 -0.220 -0.196 0.357 0.104
9.00 -0.244 0.067 0.197 -0.150 -0.284 -0.038 0.273 -0.199
10.00 -0.24 0.095 0.249 -0.034 -0.3 0.222 -0.010 -0.443
11.00 -0.231 0.124 0.316 0.129 -0.160 0.409 -0.205 -0.080
12.00 -0.224 0.181 0.316 0.317 0.080 0.260 -0.198 0.386
13.00 -0.225 0.247 0.186 0.373 0.334 -0.216 0.040 0.225
14.00 -0.232 0.268 -0.059 0.251 0.292 -0.338 0.132 -0.335
15.00 -0.228 0.250 -0.253 0.054 0.102 -0.131 0.072 -0.326
16.00 -0.221 0.23 -0.349 -0.079 -0.036 0.080 -0.042 0.006
17.00 -0.216 0.224 -0.385 -0.132 -0.110 0.144 -0.097 0.199
18.00 -0.213 0.226 -0.399 -0.148 -0.133 0.160 -0.112 0.225

Table 2: Components of male height SDS

Age C1 C2 C3 C4 C5 C6 C7 C8
SD 4.139 0.934 0.621 0.605 0.327 0.214 0.172 0.134
1.00 -0.203 -0.381 -0.178 0.398 -0.341 -0.173 -0.291 -0.128
1.50 -0.210 -0.372 -0.176 0.312 -0.150 -0.026 0.034 0.041
2.00 -0.217 -0.353 -0.156 0.196 0.077 0.129 0.361 0.139
3.00 -0.230 -0.281 -0.040 -0.139 0.418 0.422 0.317 -0.051
4.00 -0.236 -0.227 0.033 -0.262 0.379 0.122 -0.275 -0.214
5.00 -0.240 -0.174 0.088 -0.277 0.159 -0.270 -0.450 -0.054
6.00 -0.244 -0.120 0.129 -0.248 0.006 -0.278 -0.164 0.248
7.00 -0.246 -0.050 0.148 -0.222 -0.136 -0.226 0.208 0.292
8.00 -0.246 0.015 0.171 -0.188 -0.220 -0.196 0.357 0.104
9.00 -0.244 0.067 0.197 -0.150 -0.284 -0.038 0.273 -0.199
10.00 -0.24 0.095 0.249 -0.034 -0.3 0.222 -0.010 -0.443
11.00 -0.231 0.124 0.316 0.129 -0.160 0.409 -0.205 -0.080
12.00 -0.224 0.181 0.316 0.317 0.080 0.260 -0.198 0.386
13.00 -0.225 0.247 0.186 0.373 0.334 -0.216 0.040 0.225
14.00 -0.232 0.268 -0.059 0.251 0.292 -0.338 0.132 -0.335
15.00 -0.228 0.250 -0.253 0.054 0.102 -0.131 0.072 -0.326
16.00 -0.221 0.23 -0.349 -0.079 -0.036 0.080 -0.042 0.006
17.00 -0.216 0.224 -0.385 -0.132 -0.110 0.144 -0.097 0.199
18.00 -0.213 0.226 -0.399 -0.148 -0.133 0.160 -0.112 0.225

Table 3: Components of female BMI SDS

Age C1 C2 C3 C4 C5 C6 C7 C8
SD 3.856 1.461 0.952 0.641 0.469 0.339 0.299 0.255
1.00 -0.198 -0.368 0.265 0.208 0.346 -0.148 0.238 -0.201
1.50 -0.205 -0.373 0.232 0.212 0.195 -0.059 0.132 -0.005
2.00 -0.21 -0.367 0.181 0.177 0.022 0.032 -0.028 0.229
3.00 -0.216 -0.332 0.06 -0.083 -0.345 0.026 -0.471 0.393
4.00 -0.234 -0.231 -0.064 -0.21 -0.313 -0.028 -0.268 -0.141
5.00 -0.242 -0.176 -0.133 -0.165 -0.104 0.183 0.036 -0.487
6.00 -0.241 -0.116 -0.245 -0.216 -0.079 0.206 0.12 -0.331
7.00 -0.242 -0.053 -0.292 -0.169 -0.051 0.232 0.376 0.144
8.00 -0.243 0.028 -0.278 -0.102 -0.019 -0.129 0.401 0.498
9.00 -0.244 0.064 -0.289 0.023 0.142 -0.336 0.075 0.137
10.00 -0.242 0.126 -0.245 0.037 0.244 -0.324 -0.264 -0.205
11.00 -0.242 0.149 -0.219 0.121 0.211 -0.198 -0.269 -0.056
12.00 -0.24 0.184 -0.11 0.307 0.135 0.211 -0.264 0.082
13.00 -0.233 0.223 0.005 0.385 -0.016 0.386 -0.088 0.001
14.00 -0.229 0.245 0.142 0.313 -0.245 0.241 0.141 0.008
15.00 -0.226 0.247 0.252 0.113 -0.363 -0.178 0.244 -0.162
16.00 -0.226 0.222 0.315 -0.113 -0.249 -0.391 0.027 -0.048
17.00 -0.222 0.219 0.329 -0.347 0.098 -0.083 -0.058 0.055
18.00 -0.214 0.185 0.317 -0.462 0.446 0.355 -0.075 0.122

Table 4: Components of male BMI SDS

Age C1 C2 C3 C4 C5 C6 C7 C8
SD 3.848 1.641 0.811 0.532 0.374 0.333 0.279 0.225
1.00 -0.188 -0.368 0.314 0.31 -0.139 0.182 0.197 -0.201
1.50 -0.193 -0.375 0.263 0.277 -0.086 0.098 0.057 -0.009
2.00 -0.199 -0.367 0.195 0.189 0.01 -0.002 -0.092 0.204
3.00 -0.219 -0.293 0.048 -0.195 0.189 -0.356 -0.37 0.416
4.00 -0.228 -0.246 -0.04 -0.346 0.194 -0.339 -0.172 -0.27
5.00 -0.237 -0.188 -0.096 -0.361 0.151 -0.054 0.329 -0.459
6.00 -0.241 -0.125 -0.223 -0.317 -0.065 0.207 0.415 0.196
7.00 -0.247 -0.045 -0.276 -0.145 -0.181 0.317 -0.098 0.421
8.00 -0.248 -0.013 -0.262 -0.032 -0.225 0.357 -0.211 -0.059
9.00 -0.249 0.047 -0.282 0.11 -0.169 0.102 -0.172 -0.239
10.00 -0.247 0.102 -0.222 0.196 -0.144 -0.15 -0.18 -0.223
11.00 -0.244 0.135 -0.194 0.307 -0.053 -0.285 -0.048 -0.138
12.00 -0.242 0.16 -0.145 0.316 -0.003 -0.312 0.125 0.144
13.00 -0.242 0.18 -0.079 0.184 0.291 -0.116 0.328 0.244
14.00 -0.237 0.202 0.072 0.091 0.497 0.194 0.231 0.053
15.00 -0.229 0.239 0.205 -0.022 0.339 0.281 -0.26 -0.115
16.00 -0.222 0.255 0.332 -0.104 0.073 0.173 -0.301 -0.031
17.00 -0.219 0.258 0.363 -0.184 -0.246 -0.042 0.011 -0.051
18.00 -0.214 0.266 0.323 -0.22 -0.474 -0.263 0.224 0.127