

Principal components in the analysis of longitudinal growth data
Meigen C, Hermanussen M*
Leipzig, Germany; *Aschauhof, Germany
Abstract
We present a new approach to analysing longitudinal height and
body mass index (BMI) data using principal component analysis. The
analysis is based on series of longitudinal growth measurements
ranging from birth to maturity obtained from two growth studies,
performed at Lublin, Poland, and Zurich, Switzerland, with 248
healthy boys and 235 healthy girls. Measurements were compared with
their respective national reference standard, and converted into
height SDS and BMI SDS. The individual SDS curves were slightly
smoothed using kernel smoothing. Thereafter, principal component
analysis (PCA) was performed. Eight components are sufficient to
model all measurements with a remaining variance equivalent to the
technical error. We propose a way to apply these results to any
series of measurements, resulting in a test of whether these
measurements are "normal" with respect to a given group of individual
children. The analysis not only describes longitudinal growth
regardless of the timing of the measurements, it also predicts height
and BMI outside the individual measurement period.
Introduction
Everyday clinical practice has shown that conventional growth
assessment fails to diagnose children with growth disorders at an early
stage. This is true even though frequent measurements of body height and
weight have often been performed (Keller et al. 2000). There is an urgent
need for a diagnostic tool that is both easily accessible in everyday
practice and provides an automated analysis of child growth and
development. The analysis of longitudinal data of height and weight in
child growth has a long tradition (Falkner, Tanner 1986); longitudinal
growth models were recently summarized by Ledford and Cole
(1998). Analyzing growth velocity is not a trivial task. Traditional
practice suggests measuring height and weight at annual or semiannual
intervals for which standards of growth velocity exist (Tanner
1986). Yet, measuring at exact intervals is difficult under routine
conditions. In order to describe growth independently from the timing
of the measurements, we developed the concept of growth tracks
(Hermanussen et al. 2001a, 2002). The term tracking has been used to
connote an individual's maintenance of relative rank of some
longitudinally measured characteristic over a given time span (Segal
and Tager 1993), and reflects the idea of growth canalization
(Hermanussen et al. 2001b). Growth tracks are insensitive to the
timing of the measurements, and delineate areas of probability within
which subsequent measurements of body height of healthy individuals
will likely be found. The concept of growth tracks significantly
improved the separation of aberrant patterns from normal growth in
clinical practice. It appeared robust against the
unpredictable dynamics of short-term growth and measurement error, and
even made it possible to describe future growth for a certain number of
years. Yet, in spite of its usefulness, the concept was purely
empirical, and lacked a sound mathematical foundation.
The present report offers an alternative approach. We used
principal component analysis (PCA) to describe growth, as suggested
earlier by Cole (TJ Cole, personal communication, 2002), and produced
standards for longitudinal growth from two well-reputed growth studies
with growth measurements from birth to maturity.
Material and methods
Longitudinal data on child height and weight were obtained from two
well-reputed growth studies, performed at Lublin, Poland
(Chrzastek-Spruch et al. 1990), and Zurich, Switzerland (Prader et
al. 1989), with altogether 248 healthy boys and 235 healthy
girls. Details of these studies have been published elsewhere. All
children were measured from birth to maturity, usually at annual
intervals, and occasionally at intervals of six months or less. For
the analysis, we selected measurements at the age of 1.0 years,
1.5 years, 2.0 years, and at annual intervals up to the age of 18
years. We assumed a technical error (standard error) of 0.5 cm for
individual height measurements. This corresponds to some 0.12 SD in
younger children, and decreases to some 0.08 SD in adolescents and
adults. All measurements were referred to their respective national
reference standard, and converted into height and BMI standard
deviation scores (SDS). We then described growth by principal
component analysis (PCA). The purpose of PCA is to partition the
variability of the data into orthogonal components. Each component
explains a certain amount of the total variability. In a first
approach we used unsmoothed data. But unsmoothed data may lead to
small random effects even in the first components (see Figures 1 and
2). As the present PCA was intended to establish standard components
for the later analysis of individual data, we tried to avoid any random
effects and decided to smooth the data prior to the
analysis. We thereby accepted that smoothing the data prior to the PCA
may slightly distort the resulting components.
Figure 1: Principal components C1, C2, and C3 of female height
SD derived from unsmoothed data
Smoothing was performed using kernel smoothing (Venables &
Ripley 1997), with the density function of the normal distribution as
kernel and a bandwidth of 1.5 years. Principal components based on the
smoothed data of our samples appeared very similar to principal
components based upon unsmoothed data, but without the random effects
mentioned above (see Figures 1 and 2).
Figure 2: Principal components C1, C2, and C3 of female height
SD derived from smoothed data
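The smoothing step can be sketched as a Nadaraya-Watson kernel smoother with a normal-density kernel. This is only an illustration in Python, not the authors' implementation: the function name and the SDS values are hypothetical, and the bandwidth is assumed to act directly as the kernel's standard deviation in years, a convention the text does not specify.

```python
import math

def gaussian_kernel_smooth(ages, values, bandwidth=1.5):
    """Kernel smoother using the normal density as kernel.

    Assumption: `bandwidth` is the kernel's standard deviation in years.
    """
    smoothed = []
    for a0 in ages:
        # Weight every observation by the normal density of its age distance.
        # The normalising constant cancels in the weighted mean.
        weights = [math.exp(-0.5 * ((a - a0) / bandwidth) ** 2) for a in ages]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed

# Hypothetical height SDS series of one child at six of the age points.
ages = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
sds = [0.40, 0.10, 0.55, 0.20, 0.45, 0.30]
smooth = gaussian_kernel_smooth(ages, sds)
print([round(s, 2) for s in smooth])
```

Each smoothed value is a weighted mean of the whole series, so short-term fluctuations are damped while a constant SDS curve is left unchanged.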
For both height and BMI, this process resulted in two matrices with
19 columns for the age points, one with 248 rows for the boys and one
with 235 rows for the girls. On these data, principal component analysis
was performed using the MVA package of the statistical language R
(Venables & Ripley 1997).
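To illustrate what the PCA step returns (a standard deviation and a loading vector per component), the following minimal sketch extracts components from a small data matrix by power iteration with deflation. It is not the R routine used in the study; the function names and the 5 x 3 matrix of SDS values are hypothetical.

```python
import math

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=500):
    """Power iteration: returns (variance, unit loading vector)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return variance, v

def principal_components(rows, k):
    """Return k pairs (standard deviation, loadings), largest first."""
    cov = covariance(rows)
    p = len(cov)
    result = []
    for _ in range(k):
        variance, v = leading_component(cov)
        result.append((math.sqrt(variance), v))
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return result

# Hypothetical SDS values: 5 children (rows) at 3 age points (columns).
rows = [[1.2, 1.0, 0.9], [-0.8, -0.5, -0.9], [0.1, 0.4, 0.0],
        [-1.1, -1.3, -0.6], [0.6, 0.2, 0.7]]
components = principal_components(rows, 3)
for sd, loadings in components:
    print(round(sd, 3), [round(x, 3) for x in loadings])
```

The loading vectors are mutually orthogonal and the squared standard deviations add up to the total variance, which is the partition of variability described in the methods.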
We needed eight components to model the height measurements of
every child with a remaining variance equivalent to the technical
error. This was the same in smoothed and unsmoothed data. PCA gives
not only the loadings l_{1}...l_{8} of the components
(these vectors describe the impact of the components at each of the 19
age points), but also the standard deviations
s_{1}...s_{8} of each component in the study
population (Tables 1-4).
The crucial question however remains: How to apply these results to
arbitrarily spaced measurements of a single individual, in order to
test whether the measurements of the individual are "normal" according
to the study population?
We developed a technique to do this and use it successfully on our
website williwillwachsen.com. We propose to employ the Maximum
Likelihood Principle, a technique for finding the most likely estimates
of parameters (Sachs 1978). We give a step-by-step description of our
approach. In combination with Tables 1-4, the algorithm provides a
test for normality of arbitrarily spaced longitudinal measurements of
height and BMI SDS.
1. We have to test n measurement values m_{i} at ages
a_{i}, i=1..n, each with a standard error
e_{i}.
2. We assume that the error of each measurement may be
considered normally distributed (Boyd 1929).
3. We can assign an SDS vector c_{1}*l_{1}+
... +c_{8}*l_{8} to every set of coefficients
c_{1}...c_{8}.
4. The SDS vector can be transformed to a curve f: Age →
Measurement by using the known norm curve to reconvert SDS to
measurement values. Linear interpolation is used between the age
points to obtain a complete curve. If we assume this to be the true
curve of measurements for that child, all differences
f(a_{i})-m_{i} must be measurement errors. Based on
the known standard errors e_{i} we can assign an SDS
se_{i}=(f(a_{i})-m_{i})/e_{i} to each
measurement error.
5. The coefficients of the components are independent, as are the
measurement errors. We therefore compute the product density by
multiplying the densities: dnorm(c_{1}/s_{1})*
... *dnorm(c_{8}/s_{8})*dnorm(se_{1})*
... *dnorm(se_{n}), where dnorm is the density of the standard normal
distribution (the Gaussian "bell curve").
6. The Maximum Likelihood Principle suggests using the set of
coefficients with the highest overall product of densities, so a
numerical optimization of the coefficients has to be performed. We use
the Hooke-Jeeves algorithm (Bronstein 1991) with a start value of 0
for all coefficients.
7. The result of this optimisation is the most likely growth curve
for that child, under the assumption that it belongs to the study
population and that the measurements had the given standard
error.
8. We can now check whether these coefficients are in the normal
range, and whether the measurements are within an acceptable distance
from the curve. One could use a +/-2 SD criterion on each coefficient
and on each measurement error, but this raises the problem of multiple
testing. The probability that the first coefficient lies within
+/-2 SD is roughly 0.95=95% for a child of the study population. The
probability that both the first and the second coefficient lie
within +/-2 SD is 0.95*0.95=90.25%, etc. The probability that all 8
coefficients are within +/-2 SD is 0.95^8=66% for a child of the
study population, i.e., because of the coefficients alone every third
healthy child would not pass the test. We use the False Discovery Rate
procedure (Benjamini & Hochberg, 1995) to correct for this
effect. The procedure calculates limits (usually much higher than 2 SD)
which ensure that only 5% of the healthy children are rejected, no
matter how many measurements and coefficients we use.
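Steps 3 to 5 can be sketched as a single objective function. The example below is a simplified illustration in Python with only two components and three age points; the loadings, standard deviations, reference curve and measurements are hypothetical and are not taken from Tables 1-4. It works with the logarithm of the product density, which has the same maximum but avoids numerical underflow.

```python
import math

def log_dnorm(x):
    """Logarithm of the standard normal density (log of dnorm)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def interpolate(ages, values, a):
    """Linear interpolation between neighbouring age points."""
    for i in range(len(ages) - 1):
        if ages[i] <= a <= ages[i + 1]:
            t = (a - ages[i]) / (ages[i + 1] - ages[i])
            return values[i] + t * (values[i + 1] - values[i])
    raise ValueError("age outside the modelled range")

def log_product_density(c, s, loadings, ages, ref_mean, ref_sd, meas):
    """Log of dnorm(c_1/s_1)*...*dnorm(c_k/s_k)*dnorm(se_1)*...*dnorm(se_n)."""
    # Step 3: SDS curve c_1*l_1 + ... + c_k*l_k at every age point.
    sds_curve = [sum(ck * lk[j] for ck, lk in zip(c, loadings))
                 for j in range(len(ages))]
    # Step 4: reconvert SDS to measurement units using the reference curve.
    curve = [m + z * sd for m, z, sd in zip(ref_mean, sds_curve, ref_sd)]
    # Step 5: densities of the coefficients and of the measurement errors.
    total = sum(log_dnorm(ck / sk) for ck, sk in zip(c, s))
    for age, value, err in meas:
        se = (interpolate(ages, curve, age) - value) / err
        total += log_dnorm(se)
    return total

# All numbers below are hypothetical illustration values.
ages = [1.0, 2.0, 3.0]
ref_mean = [75.0, 87.0, 96.0]           # reference mean height (cm)
ref_sd = [2.5, 3.2, 3.8]                # reference SD of height (cm)
loadings = [[0.58, 0.58, 0.57], [0.70, 0.00, -0.70]]
s = [2.0, 0.5]                          # SD of each component
meas = [(1.5, 83.0, 0.5), (2.5, 94.0, 0.5)]  # (age, height in cm, error)

baseline = log_product_density([0.0, 0.0], s, loadings, ages,
                               ref_mean, ref_sd, meas)
fitted = log_product_density([1.5, 0.0], s, loadings, ages,
                             ref_mean, ref_sd, meas)
print(baseline, fitted)
```

For this tall hypothetical child a positive first coefficient gives a much higher value than the all-zero start, which is the quantity the optimisation in step 6 seeks to maximise.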
Result
Tables 1-4 list the components of the height and BMI SDS
curves for both sexes. The columns contain the loadings; the standard
deviations are given in the first row. In females, the principal
component C1 explains 80.2 %, C1+C2 explain 88.9 %, C1+C2+C3 explain
95.5 %, and all eight components explain 99.4 % of the height variation.
Similar results were found in males, and in BMI (data not shown).
By applying the procedure described above, the data can be used to
check normality of arbitrarily spaced longitudinal measurements of
height and BMI. This procedure has been made accessible to parents
and scientists on our website williwillwachsen.com. We present an
example of a 13-year-old child suffering from constitutional delay of
growth and puberty.
Example 1:
Although the child was originally measured at 22 age points, we
reduced the series of measurements to n=3, in order to better explain
the algorithm. The three selected measurements took place at the ages
of a_{1}=2.07, a_{2}=6.99 and a_{3}=12.7 years,
resulting in height values of m_{1}=89.3 cm,
m_{2}=120.1 cm, and m_{3}=149.9 cm, respectively. We
assume a measurement error of 0.75 cm (a conservative estimate for
measurements performed by parents), so
e_{1}=e_{2}=e_{3}=0.75.
We start with the value 0 for all coefficients, so the estimated
SDS curve c_{1}*l_{1}+...+c_{8}*l_{8}
is constantly 0, i.e., the curve f (step 4 of the step-by-step
description) exactly equals the 50^{th} centile of the German reference
population (Hermanussen et al. 1999). It differs from the 3rd
measurement by 4.4 cm, so se_{3}>5 and
dnorm(se_{3})<1e-6. Therefore, the overall product density
is very small, indicating that the first guess is
inappropriate. Applying the Hooke-Jeeves algorithm to optimise the
coefficients quickly results in c_{1}=2.48,
c_{2}=1.00, c_{3}=0.15, c_{4}=0.47,
c_{5}=0.07, c_{6}=0.07, c_{7}=0.02, and
c_{8}=0.02.
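The optimisation itself can be illustrated with a compact Hooke-Jeeves pattern search. This is a generic sketch of the algorithm rather than the authors' implementation; the parameter names and the quadratic toy objective, which stands in for the logarithm of the product density, are assumptions made for the example.

```python
def hooke_jeeves(objective, start, step=1.0, min_step=1e-4, shrink=0.5):
    """Maximise `objective` by Hooke-Jeeves pattern search (derivative-free)."""

    def explore(base, base_value, h):
        # Exploratory move: try +h and -h along each coordinate in turn
        # and keep every change that improves the objective.
        point, value = list(base), base_value
        for i in range(len(point)):
            for delta in (h, -h):
                trial = list(point)
                trial[i] += delta
                trial_value = objective(trial)
                if trial_value > value:
                    point, value = trial, trial_value
                    break
        return point, value

    base = list(start)
    base_value = objective(base)
    while step > min_step:
        point, value = explore(base, base_value, step)
        if value > base_value:
            # Pattern move: keep jumping in the direction that just improved.
            while True:
                pattern = [2.0 * p - b for p, b in zip(point, base)]
                base, base_value = point, value
                point, value = explore(pattern, objective(pattern), step)
                if value <= base_value:
                    break
        else:
            # No improvement at this resolution: refine the step size.
            step *= shrink
    return base, base_value

# Hypothetical stand-in objective with its maximum at (2.5, -1.0).
def toy_objective(c):
    return -(c[0] - 2.5) ** 2 - 2.0 * (c[1] + 1.0) ** 2

best, best_value = hooke_jeeves(toy_objective, [0.0, 0.0])
print(best, best_value)
```

The search needs only function values, which suits an objective built from linear interpolation, since such a function is not smooth at the age points. Starting from 0 for all coefficients mirrors the start value used in the text.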
Calculating c_{1}*l_{1}+c_{2}*l_{2}+
... +c_{8}*l_{8}, i.e. multiplying the coefficients
by the respective columns in Table 2 and summing up the vectors,
results in an estimated SDS curve. When reconverting the SDS curve
into centimetres, we obtain the following values for the 19 age
points: 77.29, 83.62, 88.21, 95.90, 102.94, 108.86, 115.12, 120.45,
125.58, 130.73, 135.97, 140.91, 146.29, 151.46, 157.60, 164.13,
169.29, 172.39, 173.76.
Figure 3: SDS values of 3 height measurements and estimated curve
By using linear interpolation, we get the following height
estimates for the 3 ages of a_{1}=2.07, a_{2}=6.99 and
a_{3}=12.7 years: f(a_{1})=88.77,
f(a_{2})=120.43, and f(a_{3})=149.93. The respective
standardized errors are se_{1}=(88.77-89.3)/0.75=-0.71,
se_{2}=(120.43-120.1)/0.75=0.44, and
se_{3}=(149.93-149.9)/0.75=0.04, i.e., the height measurements
are within an acceptable distance from the estimated curve f
(step 8 of the step-by-step description). When dividing the
coefficients by the
respective standard deviation of that coefficient in the population
(Table 2), we get c_{1}/s_{1}=0.52,
c_{2}/s_{2}=1.05, c_{3}/s_{3}=0.25,
c_{4}/s_{4}=0.73, c_{5}/s_{5}=0.24,
c_{6}/s_{6}=0.28, c_{7}/s_{7}=0.10 and
c_{8}/s_{8}=0.08. These coefficients are within the
normal range, even without applying the False Discovery Rate procedure
to correct for multiple testing.
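The interpolation and the standardized errors of Example 1 can be retraced with the 19 centimetre values listed above. The sketch below interpolates these rounded values directly. It agrees with the reported estimates to within a few hundredths of a centimetre. The remaining difference presumably stems from rounding or from interpolating on the SDS scale, which is an assumption. The function and variable names are illustrative.

```python
from bisect import bisect_right

# Age points of the model: 1.0, 1.5, 2.0 and then yearly up to 18 years.
AGE_POINTS = [1.0, 1.5] + [float(a) for a in range(2, 19)]
# Estimated curve of Example 1 in centimetres, as listed in the text.
CURVE_CM = [77.29, 83.62, 88.21, 95.90, 102.94, 108.86, 115.12, 120.45,
            125.58, 130.73, 135.97, 140.91, 146.29, 151.46, 157.60, 164.13,
            169.29, 172.39, 173.76]

def interpolate(ages, values, a):
    """Linear interpolation of the curve at age `a`."""
    if not ages[0] <= a <= ages[-1]:
        raise ValueError("age outside the modelled range")
    # Index of the age point to the right of `a` (clamped at the last one).
    i = min(bisect_right(ages, a), len(ages) - 1)
    t = (a - ages[i - 1]) / (ages[i] - ages[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])

measurements = [(2.07, 89.3), (6.99, 120.1), (12.7, 149.9)]  # (age, cm)
error_cm = 0.75  # assumed measurement error from the example

fitted = [interpolate(AGE_POINTS, CURVE_CM, a) for a, _ in measurements]
residuals = [(f - m) / error_cm for f, (_, m) in zip(fitted, measurements)]
for f, r in zip(fitted, residuals):
    print(round(f, 2), round(r, 2))
```

All three standardized errors are well below 2 in absolute value, in line with the conclusion that the measurements lie within an acceptable distance from the estimated curve.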
Example 2:
We check the same male child, but take all n=22 height measurements
into account. The child shows a clinically abnormal growth pattern, as
he suffers from constitutional delay of growth and puberty. More
measurements increase the accuracy of our analysis, and reveal the
abnormal growth pattern. After early childhood growth retardation, the
child continues growing at a lower centile, but shows no evidence of a
pubertal growth spurt up to the age of 13 years. The algorithm results in
c_{1}/s_{1}=0.23, c_{2}/s_{2}=0.40,
c_{3}/s_{3}=3.20, c_{4}/s_{4}=0.80,
c_{5}/s_{5}=1.49, c_{6}/s_{6}=0.05,
c_{7}/s_{7}=0.15 and
c_{8}/s_{8}=0.15. With
c_{3}/s_{3}=3.20, this pattern significantly deviates
from normality, and is easily detected by the algorithm.
Figure 4: SDS values of 22 height measurements and estimated curve
The example illustrates the importance of long-term multiple
measurements for properly analysing individual growth
curves. Calculating centiles for height or traditional growth
velocity may often be insufficient to detect aberrant growth patterns
in children.
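The multiple-testing argument of step 8 can be retraced numerically with the standardized coefficients of Example 2. The sketch below applies the Benjamini-Hochberg step-up procedure at a rate of 5% to the eight coefficients only; the actual procedure also includes the measurement errors, which are not listed in the text, so this is a simplified illustration with illustrative function names.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a standard normal score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value stays below its threshold q*rank/m.
        if p_values[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Chance that all eight coefficients of a healthy child pass +/-2 SD.
all_pass = 0.95 ** 8
print(round(all_pass, 3))

# Standardized coefficients c_k/s_k of Example 2 (magnitudes as reported).
z = [0.23, 0.40, 3.20, 0.80, 1.49, 0.05, 0.15, 0.15]
flagged = benjamini_hochberg([two_sided_p(v) for v in z])
print([f"C{i + 1}" for i in flagged])
```

Only the third component is flagged, which matches the observation that c_{3}/s_{3}=3.20 marks the deviation from normality.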
Discussion
Conventional growth assessment often fails to diagnose children
with growth disorders at an early stage (Keller et al. 2000) because an
effective tool for nationwide early screening of abnormal growth is
not available. We propose an automated analysis of child growth that
is accessible not only to medical personnel, but may also be used by
parents and non-medical staff.
The analysis of longitudinal data is effectively an analysis of
growth functions (Ramsay & Silverman 1997). Using the SDS at the
distinct age points directly to describe these functions is a very
simple approach compared to more sophisticated ways of modelling growth
functions (Ledford & Cole, 1998), but it avoids several problems
that are associated with these models. In fact, the result of the PCA
on the SDS values can itself be viewed as an eight-parameter model
for growth functions, constructed from the study population.
Only the first component, describing tallness and shortness of
height, or heaviness/lightness, truly reflects biological
properties. Although the second and third components still appear
related to biological phenomena (centile crossings, and timing of the
onset of puberty), the full set of components is a mathematical
description with no obvious relation to known biological
properties.
The procedure not only describes patterns of growth during the
period in which measurements were performed; the analysis assigns the
most probable set of components, regardless of the timing of the
measurements, and also predicts height and BMI outside of the
measurement period. The present analysis is based on standard deviation
scores and therefore strongly depends on the reference standard
used. The analysis can be used wherever reliable references are
available. If reliable references are not available, references may be
synthesized as previously described (Hermanussen and Burmeister
1999). We are still interested in longitudinal data of height and
weight from children of developing countries in order to further
analyze whether the present procedure may also be used among these
populations.
Acknowledgements
We are very grateful for much advice and many fruitful discussions
with Prof. TJ Cole, London, UK, Prof. James Ramsay, Montreal, Canada,
Dr. Luciano Molinari, Zürich, Switzerland, and Dr. Marek Brabec,
Prague, Czech Republic. This research was supported by GrandisBioTech,
Deutsche Gesellschaft für Auxologie, and by Deutsche
Forschungsgemeinschaft, Grant HE 1440/51.
References
 Benjamini, Y., Hochberg, Y., 1995. Controlling the False Discovery
Rate: A Practical and Powerful Approach to Multiple Testing. Journal
of the Royal Statistical Society, Series B, Volume 57, Issue 1,
289-300.
 Boyd, E., 1929. The experimental error inherent in measuring the
growing human body. Am. J. Phys. Anthropol. 13:389-432.
 Bronstein, I.N., Semendjajew, K.A., 1991. Taschenbuch der
Mathematik, Leipzig.
 Chrzastek-Spruch, H., Susanne, C., Hauspie, R., 1990. Standards
for height and height velocity for Polish children. Studies in Human
Ecology 9:179-197.
 Hermanussen, M., Burmeister, J., 1999. Synthetic growth
charts. Acta Paediatr. 88:809-14.
 Hermanussen, M., Lange, S., Grasedyck, L., 2001a. Growth tracks in
early childhood. Acta Paediatr. 90:381-6.
 Hermanussen, M., Largo, R. H., Molinari, L., 2001b. Canalisation
in human growth: a widely accepted concept
reconsidered. Eur. J. Pediatr. 160:163-7.
 Hermanussen, M., Grasedyck, L., Kromeyer-Hauschild, K., Prokopec,
M., Chrzastek-Spruch, H., 2002. Growth tracks in pre-pubertal
children. Ann. Hum. Biol. 29:667-676.
 Ledford, A. W., Cole, T. J., 1998. Mathematical models of growth
in stature throughout childhood. Ann. Hum. Biol. 25:101-115.
 Meigen, C., Hermanussen, M., 2003. Automatic Analysis of
Longitudinal Growth Data on the website williwillwachsen.de. Homo
Vol. 54/2, 157-161.
 Prader, A., Largo, R. H., Molinari, L., Issler C., 1989. Physical
growth of Swiss children from birth to 20 years of
age. Helv. Paediat. Acta 43. Suppl. 52:1-125.
 Ramsay, J. O., Silverman, B. W., 1997. Functional Data Analysis,
Springer Series in Statistics.
 Sachs, L., 1978. Angewandte Statistik, Berlin, Heidelberg, New
York.
 Segal, M. R., Tager, I.B., 1993. Trees and tracking. Statistics in
Medicine 12:2153-68.
 Tanner, J. M., 1986. Use and abuse of growth standards. In Human
growth, edited by Falkner, F., Tanner, J. M., volume 3, 2nd edition
(New York, London: Plenum Press) 95-109.
 Venables, W. N., Ripley, B. D., 1997. Modern Applied Statistics
with S-PLUS, Springer-Verlag.
Appendix: Principal Components derived from the analysis
Table 1: Components of female Height SD
Age  C1  C2  C3  C4  C5  C6  C7  C8 
SD  3.940  1.306  1.101  0.589  0.292  0.206  0.164  0.133 
1.00  0.203  0.381  0.178  0.398  0.341  0.173  0.291  0.128 
1.50  0.210  0.372  0.176  0.312  0.150  0.026  0.034  0.041 
2.00  0.217  0.353  0.156  0.196  0.077  0.129  0.361  0.139 
3.00  0.230  0.281  0.040  0.139  0.418  0.422  0.317  0.051 
4.00  0.236  0.227  0.033  0.262  0.379  0.122  0.275  0.214 
5.00  0.240  0.174  0.088  0.277  0.159  0.270  0.450  0.054 
6.00  0.244  0.120  0.129  0.248  0.006  0.278  0.164  0.248 
7.00  0.246  0.050  0.148  0.222  0.136  0.226  0.208  0.292 
8.00  0.246  0.015  0.171  0.188  0.220  0.196  0.357  0.104 
9.00  0.244  0.067  0.197  0.150  0.284  0.038  0.273  0.199 
10.00  0.24  0.095  0.249  0.034  0.3  0.222  0.010  0.443 
11.00  0.231  0.124  0.316  0.129  0.160  0.409  0.205  0.080 
12.00  0.224  0.181  0.316  0.317  0.080  0.260  0.198  0.386 
13.00  0.225  0.247  0.186  0.373  0.334  0.216  0.040  0.225 
14.00  0.232  0.268  0.059  0.251  0.292  0.338  0.132  0.335 
15.00  0.228  0.250  0.253  0.054  0.102  0.131  0.072  0.326 
16.00  0.221  0.23  0.349  0.079  0.036  0.080  0.042  0.006 
17.00  0.216  0.224  0.385  0.132  0.110  0.144  0.097  0.199 
18.00  0.213  0.226  0.399  0.148  0.133  0.160  0.112  0.225 
Table 2: Components of male Height SD
Age  C1  C2  C3  C4  C5  C6  C7  C8 
SD  4.139  0.934  0.621  0.605  0.327  0.214  0.172  0.134 
1.00  0.203  0.381  0.178  0.398  0.341  0.173  0.291  0.128 
1.50  0.210  0.372  0.176  0.312  0.150  0.026  0.034  0.041 
2.00  0.217  0.353  0.156  0.196  0.077  0.129  0.361  0.139 
3.00  0.230  0.281  0.040  0.139  0.418  0.422  0.317  0.051 
4.00  0.236  0.227  0.033  0.262  0.379  0.122  0.275  0.214 
5.00  0.240  0.174  0.088  0.277  0.159  0.270  0.450  0.054 
6.00  0.244  0.120  0.129  0.248  0.006  0.278  0.164  0.248 
7.00  0.246  0.050  0.148  0.222  0.136  0.226  0.208  0.292 
8.00  0.246  0.015  0.171  0.188  0.220  0.196  0.357  0.104 
9.00  0.244  0.067  0.197  0.150  0.284  0.038  0.273  0.199 
10.00  0.24  0.095  0.249  0.034  0.3  0.222  0.010  0.443 
11.00  0.231  0.124  0.316  0.129  0.160  0.409  0.205  0.080 
12.00  0.224  0.181  0.316  0.317  0.080  0.260  0.198  0.386 
13.00  0.225  0.247  0.186  0.373  0.334  0.216  0.040  0.225 
14.00  0.232  0.268  0.059  0.251  0.292  0.338  0.132  0.335 
15.00  0.228  0.250  0.253  0.054  0.102  0.131  0.072  0.326 
16.00  0.221  0.23  0.349  0.079  0.036  0.080  0.042  0.006 
17.00  0.216  0.224  0.385  0.132  0.110  0.144  0.097  0.199 
18.00  0.213  0.226  0.399  0.148  0.133  0.160  0.112  0.225 
Table 3: Components of female BMI SD
Age  C1  C2  C3  C4  C5  C6  C7  C8 
SD  3.856  1.461  0.952  0.641  0.469  0.339  0.299  0.255 
1.00  0.198  0.368  0.265  0.208  0.346  0.148  0.238  0.201 
1.50  0.205  0.373  0.232  0.212  0.195  0.059  0.132  0.005 
2.00  0.21  0.367  0.181  0.177  0.022  0.032  0.028  0.229 
3.00  0.216  0.332  0.06  0.083  0.345  0.026  0.471  0.393 
4.00  0.234  0.231  0.064  0.21  0.313  0.028  0.268  0.141 
5.00  0.242  0.176  0.133  0.165  0.104  0.183  0.036  0.487 
6.00  0.241  0.116  0.245  0.216  0.079  0.206  0.12  0.331 
7.00  0.242  0.053  0.292  0.169  0.051  0.232  0.376  0.144 
8.00  0.243  0.028  0.278  0.102  0.019  0.129  0.401  0.498 
9.00  0.244  0.064  0.289  0.023  0.142  0.336  0.075  0.137 
10.00  0.242  0.126  0.245  0.037  0.244  0.324  0.264  0.205 
11.00  0.242  0.149  0.219  0.121  0.211  0.198  0.269  0.056 
12.00  0.24  0.184  0.11  0.307  0.135  0.211  0.264  0.082 
13.00  0.233  0.223  0.005  0.385  0.016  0.386  0.088  0.001 
14.00  0.229  0.245  0.142  0.313  0.245  0.241  0.141  0.008 
15.00  0.226  0.247  0.252  0.113  0.363  0.178  0.244  0.162 
16.00  0.226  0.222  0.315  0.113  0.249  0.391  0.027  0.048 
17.00  0.222  0.219  0.329  0.347  0.098  0.083  0.058  0.055 
18.00  0.214  0.185  0.317  0.462  0.446  0.355  0.075  0.122 
Table 4: Components of male BMI SD
Age  C1  C2  C3  C4  C5  C6  C7  C8 
SD  3.848  1.641  0.811  0.532  0.374  0.333  0.279  0.225 
1.00  0.188  0.368  0.314  0.31  0.139  0.182  0.197  0.201 
1.50  0.193  0.375  0.263  0.277  0.086  0.098  0.057  0.009 
2.00  0.199  0.367  0.195  0.189  0.01  0.002  0.092  0.204 
3.00  0.219  0.293  0.048  0.195  0.189  0.356  0.37  0.416 
4.00  0.228  0.246  0.04  0.346  0.194  0.339  0.172  0.27 
5.00  0.237  0.188  0.096  0.361  0.151  0.054  0.329  0.459 
6.00  0.241  0.125  0.223  0.317  0.065  0.207  0.415  0.196 
7.00  0.247  0.045  0.276  0.145  0.181  0.317  0.098  0.421 
8.00  0.248  0.013  0.262  0.032  0.225  0.357  0.211  0.059 
9.00  0.249  0.047  0.282  0.11  0.169  0.102  0.172  0.239 
10.00  0.247  0.102  0.222  0.196  0.144  0.15  0.18  0.223 
11.00  0.244  0.135  0.194  0.307  0.053  0.285  0.048  0.138 
12.00  0.242  0.16  0.145  0.316  0.003  0.312  0.125  0.144 
13.00  0.242  0.18  0.079  0.184  0.291  0.116  0.328  0.244 
14.00  0.237  0.202  0.072  0.091  0.497  0.194  0.231  0.053 
15.00  0.229  0.239  0.205  0.022  0.339  0.281  0.26  0.115 
16.00  0.222  0.255  0.332  0.104  0.073  0.173  0.301  0.031 
17.00  0.219  0.258  0.363  0.184  0.246  0.042  0.011  0.051 
18.00  0.214  0.266  0.323  0.22  0.474  0.263  0.224  0.127 


