## Friday, May 22, 2009

### Using Logarithmic Transformations When Fitting Allometric Data

In the 4th Edition of Intermediate Physics for Medicine and Biology, Russ Hobbie and I discuss least squares fitting. A homework problem at the end of Chapter 11 (see page 321) asks the student to fit some data to a power law.
"Problem 11 Consider the data given in Problem 2.36 relating molecular weight M and molecular radius R. Assume the radius is determined from the molecular weight by a power law: R = B M^n [this blog does not do math well, "^n" means superscript n]. Fit the data to this expression to determine B and n. Hint: Take logarithms of both sides of the equation."
The solution manual (available at the book's website, contact one of the authors for the password) outlines how taking logarithms makes the problem linear, so a simple linear least squares fit gives the solution R = 0.0534 M^0.371.
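
As a rough illustration of the hint, here is a minimal Python sketch of the logarithmic approach: take logarithms of both M and R, fit a straight line by ordinary least squares, and convert the intercept back to B. The function name `fit_power_law_loglog` is my own, and the numbers are made-up, noise-free values generated from a known power law. They are not the data from Problem 2.36.

```python
import math

def fit_power_law_loglog(M, R):
    """Fit R = B * M**n by linear least squares on (log M, log R)."""
    x = [math.log(m) for m in M]
    y = [math.log(r) for r in R]
    N = len(x)
    xbar = sum(x) / N
    ybar = sum(y) / N
    # Slope of the straight line log R = log B + n log M
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    n = sxy / sxx
    # The intercept is log B, so exponentiate to recover B
    B = math.exp(ybar - n * xbar)
    return B, n

# Hypothetical data generated from R = 0.05 * M**0.4 (NOT the Problem 2.36 data)
M = [100.0, 1000.0, 10000.0, 100000.0]
R = [0.05 * m ** 0.4 for m in M]

B, n = fit_power_law_loglog(M, R)
print(f"B = {B:.4f}, n = {n:.4f}")
```

Because the synthetic data contain no scatter, the fit should simply return the B and n used to generate them. With real, noisy data the answer depends on how the scatter is distributed.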

However, inquisitive students may ask, "What if I don't follow the hint and instead do a least squares fit to the original power law without taking logarithms? Do I get the same result?" This is a more difficult problem, because it requires a nonlinear least squares fit. Nevertheless, I solved the problem this way (using a terribly inefficient iterative guess-and-check method) and found R = 0.0619 M^0.358.
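
For readers who want to try the direct fit themselves, here is one possible Python sketch. It is not the guess-and-check method I used. Instead it relies on the fact that, for a fixed exponent n, the best B has a closed form, so only a one-dimensional search over n remains; I do that search with a golden-section method. The function names, the search interval of 0 to 1 for n, and the data (made-up, noise-free values, not those of Problem 2.36) are all assumptions for illustration, and the search assumes the sum of squared residuals has a single minimum on that interval.

```python
import math

def best_B(M, R, n):
    """For a fixed exponent n, the least-squares B has a closed form."""
    num = sum(r * m ** n for m, r in zip(M, R))
    den = sum(m ** (2 * n) for m in M)
    return num / den

def sse(M, R, n):
    """Sum of squared residuals in the original (arithmetic) scale."""
    B = best_B(M, R, n)
    return sum((r - B * m ** n) ** 2 for m, r in zip(M, R))

def fit_power_law_direct(M, R, n_lo=0.0, n_hi=1.0, tol=1e-10):
    """Golden-section search over n, assuming one minimum in [n_lo, n_hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = n_lo, n_hi
    c = b - g * (b - a)
    d = a + g * (b - a)
    while b - a > tol:
        if sse(M, R, c) < sse(M, R, d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - g * (b - a)
        d = a + g * (b - a)
    n = (a + b) / 2
    return best_B(M, R, n), n

# Hypothetical data generated from R = 0.05 * M**0.4 (NOT the Problem 2.36 data)
M = [100.0, 1000.0, 10000.0, 100000.0]
R = [0.05 * m ** 0.4 for m in M]

B, n = fit_power_law_direct(M, R)
print(f"B = {B:.4f}, n = {n:.4f}")
```

On scatter-free data like these, the direct fit and the logarithmic fit should agree. The two methods part ways only when the data are noisy, which is the situation in the homework problem.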

Which solution is correct? Gary Packard and Thomas Boardman, both from Colorado State University, address this question in their paper "A Comparison of Methods for Fitting Allometric Equations to Field Metabolic Rates of Animals" (J. Comp. Physiol. B, Volume 179, Pages 175–182, 2009), and find that
"the discrepancies could be caused by four sources of bias acting singly or in combination to cause exponents (and coefficients) estimated by back-transformation from logarithms to be inaccurate and misleading. First, influential outliers may go undetected in some analyses ... owing to the altered relationship between X and Y variables that accompanies logarithmic transformation ... Second, the use of logarithmic transformations may result in the fitting of mathematical functions (i.e., two-parameter power functions) that are poor descriptors of data in the original scale ... Third, a two-parameter power function ... fitted to the original data by least squares invokes a statistical model with additive error Y = aX^b + e and predicts arithmetic means for Y whereas a straight line fitted to logarithmic transformations of the data by least squares invokes an underlying model with multiplicative error Y = aX^b 10^e and predicts geometric means for the response variable ... And fourth, linear regression on nonlinear transformations like logarithms may introduce further bias into analyses by the unequal weighting of large and small values for both X and Y...

Conversion to logs results in an overall compression of the distributions for both the Y- and X-variables, but the compression is greater at the high ends of the scales than at the low ends ... Consequently, linear regression on transformations gives unduly large influence to small values for Y and X and unduly small influence to large ones ... This disparate influence is apparent in plots of back-transformations against the backdrop of data in their original scales, where the location of data for the largest animals had little apparent influence on fits of the lines."
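
The weighting argument in the quote can be made a little more concrete by writing out what each fit minimizes. This is my own sketch using the symbols from the quote (Y, X, a, b), with base-10 logarithms assumed to match the 10^e in the multiplicative error model; the last approximation holds only when the residuals are small compared with the fitted values.

```latex
% Direct (nonlinear) fit: minimizes absolute residuals
S_{\mathrm{direct}}(a,b) = \sum_i \left( Y_i - a X_i^{\,b} \right)^2

% Logarithmic fit: minimizes residuals of the logarithms
S_{\log}(a,b) = \sum_i \left( \log_{10} Y_i - \log_{10} a - b \log_{10} X_i \right)^2
              = \sum_i \left[ \log_{10}\!\left( \frac{Y_i}{a X_i^{\,b}} \right) \right]^2

% For small residuals, \log_{10}(1+x) \approx x / \ln 10, so
S_{\log}(a,b) \approx \frac{1}{(\ln 10)^2} \sum_i \left( \frac{Y_i - a X_i^{\,b}}{a X_i^{\,b}} \right)^2
```

In other words, the logarithmic fit approximately minimizes relative errors rather than absolute ones, so a given absolute deviation counts for more at small Y than at large Y.
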
Their paper concludes
"Why transform? Log transformations have a long history of use in allometric research ... and have been justified on grounds ranging from linearizing data to achieving better distributions for purposes of graphical display ... However, most of the reasons for making such transformations disappeared with the advent of powerful PCs and sophisticated software for graphics and statistics. Indeed, the only ongoing application for log transformations in allometric research is in adjusting (‘stabilizing’) distributions when residuals from analyses performed in the original scale are not distributed normally and/or when variances are not constant at all values for X. Assuming that log transformations actually linearize the data and produce the desired distributions, the regression of log Y on log X will yield evidence for a dependency between Y and X values in their original state, and statistical comparisons can be made with other samples that also are expressed in logarithmic form. However, interpretations about patterns of variation of the variables in the arithmetic domain seldom are warranted..., because transformation fundamentally alters the relationship between the predictor and response variables. Interest typically is in patterns of variation of data expressed in an arithmetic scale, so this is the scale in which allometric analyses need to be performed if it is at all possible to do so.

Implications for allometric research. Accumulating evidence from the field of biology ... and beyond ... gives cause for concern about the accuracy and reliability of allometric equations that have been estimated in the traditional way ... This concern has special bearing on the current debate about the ‘true’ exponent for scaling of metabolism to body mass because exponents of 2/3 and 3/4 need both to be viewed with some skepticism. The aforementioned evidence also indicates that the traditional approach to allometric analysis may need to be abandoned in favor of a new research paradigm that will prevent future studies from being compromised by the insidious effects of logarithmic transformations."
In the above quote, many of the "..."s indicate important references that I skipped to save space.

Packard and Boardman make a persuasive case that you might want to ignore our hint at the end of Problem 11. However, if you do ignore it, you had better be prepared to do nonlinear least squares fitting. See Sec. 11.2, "Nonlinear Least Squares", in our book to get started.

For more about this subject, see Packard's letter to the editor in the Journal of Theoretical Biology (Volume 257, Pages 515–518, 2009). Also, Russ Hobbie has a paper submitted to the journal Ecology that discusses a similar issue with exponential, rather than power law, least squares fits ("Single pool exponential decomposition models: Potential pitfalls in their use in ecological studies"). Russ's coauthors are E. Carol Adair and Sarah E. Hobbie (Russ's daughter), both of the University of Minnesota in Saint Paul.