Linear regression books usually include a footnote that you might have to transform your data before you can apply regression. However, they seldom give any guidance on how to pick a transformation. Just try something until your scatterplots look linear.

John Tukey gave a nice heuristic for linearizing data in his 1977 book *Exploratory Data Analysis*, in the form of what he calls a **ladder of transformations**.

$$
\begin{array}{c}
y^{3} \\
y^{2} \\
y \\
\sqrt{y} \\
\log y \\
-y^{-1} \\
-y^{-2} \\
-y^{-3}
\end{array}
$$
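Why the minus signs on the bottom rungs? For positive data a negative power such as $y^{-1}$ reverses the order of the values, and negating it restores the order, so every rung is an increasing function of $y$. A minimal Python sketch, assuming positive data (log and negative powers need $y > 0$); the function name `ladder` and the convention that exponent 0 stands for log are illustrative choices, not taken from the post:

```python
import math

def ladder(p):
    """Return the ladder rung with exponent p as a function of y.

    Positive p gives y**p, p == 0 stands for log y, and negative p
    gives -(y**p), so that every rung is an increasing function.
    Assumes y > 0, since log and negative powers need positive data.
    """
    if p == 0:
        return math.log
    if p > 0:
        return lambda y: y ** p
    return lambda y: -(y ** p)

# The rungs from top to bottom, matching the ladder above.
RUNGS = [3, 2, 1, 0.5, 0, -1, -2, -3]

values = [1.0, 2.0, 4.0]
for p in RUNGS:
    f = ladder(p)
    transformed = [f(y) for y in values]
    # Every rung preserves the ordering of the data.
    assert transformed == sorted(transformed)
    print(p, [round(t, 3) for t in transformed])
```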

Try transformations in the direction of the bulge in the plot. If the plot bulges up (say your plot looks something like $y = \sqrt{x}$), then move up the ladder from the identity: try squaring or cubing $y$. Or if you're going to transform $x$, think of the ladder as horizontal, running from $-x^{-3}$ on the left to $x^{3}$ on the right. If the bulge is down and to the right, either move down the $y$-ladder or to the right on the $x$-ladder.
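To make the $y = \sqrt{x}$ case concrete: that curve is concave down, so its bulge points up and to the left. The rule then says to either move $y$ up the ladder or move $x$ toward lower powers. A minimal Python sketch using made-up points lying exactly on $y = \sqrt{x}$, with Pearson correlation as a rough numeric stand-in for "looks linear" (an assumption; the post itself only talks about looking at the plot):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on y = sqrt(x): concave down,
# so the bulge points up and to the left.
xs = [float(i) for i in range(1, 21)]
ys = [math.sqrt(x) for x in xs]

r_raw = pearson(xs, ys)
# Move y up the ladder: square it.
r_y_up = pearson(xs, [y ** 2 for y in ys])
# Or move x toward lower powers on the x-ladder: take sqrt(x).
r_x_left = pearson([math.sqrt(x) for x in xs], ys)

print(f"raw:        r = {r_raw:.4f}")
print(f"y squared:  r = {r_y_up:.4f}")
print(f"sqrt of x:  r = {r_x_left:.4f}")
```

Note that the raw correlation is already high even though the points are visibly curved, which is why the plot, not the correlation alone, should drive the choice.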

(If you know of a good presentation of this topic online, something with good illustrations, please let me know and I’ll link to it. I did a quick search and found several hits, but the ones I looked at lacked clear pictures.)

**Related**: Applied linear regression

How do I know which transformation is best? Can I use correlation to decide which transformation is better?

@ciro: You just have to plot the residuals and see which transformation makes the residuals look most normal. You could do a test for normality, but that’s overkill. This sort of thing isn’t rigorously justified to begin with, so there’s no point getting too fussy about it.
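The reply suggests eyeballing the residual plots. If you want a single number to sit alongside those plots, one option (swapped in here, not something the reply prescribes) is the correlation of a normal Q-Q plot of the residuals: the closer to 1, the straighter the plot. The sketch below is plain Python with made-up data; the function names are hypothetical, and the score is a rough screening aid, not a formal normality test:

```python
import math
import random
from statistics import NormalDist

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def normal_ppcc(values):
    """Correlation between the sorted values and standard normal quantiles.

    This measures the straightness of a normal Q-Q plot. Uses Blom's
    plotting positions (i - 3/8) / (n + 1/4) for i = 1..n.
    """
    n = len(values)
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return pearson(sorted(values), quantiles)

def residual_ppcc(xs, ys, transform):
    """Fit a line to transform(y) versus x and score the residuals."""
    ty = [transform(y) for y in ys]
    a, b = fit_line(xs, ty)
    residuals = [t - (a + b * x) for x, t in zip(xs, ty)]
    return normal_ppcc(residuals)

# Hypothetical example data: y is the square of a noisy straight line.
rng = random.Random(1)
xs = [float(i) for i in range(1, 41)]
ys = [(2 + 0.5 * x + rng.gauss(0, 0.3)) ** 2 for x in xs]

candidates = {"y": lambda y: y, "sqrt y": math.sqrt, "log y": math.log}
scores = {name: residual_ppcc(xs, ys, f) for name, f in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:7s} {score:.4f}")
```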

I also wrote to your email; sorry about that. So you mean I should plot the residuals for all the transformations and use the transformation whose residuals look most normal? So, as normality increases, does linearity increase as well? What about using correlations? I mean, if the relationships are nonlinear, the correlations should improve after transformation. I am using path analysis, so I need to work with linear relationships. Nonnormality is not a problem for me because I could fit the model with generalized least squares instead of maximum likelihood. Regards, and thanks a lot for your help.

So let me get this straight. If the plot is concave up, the y data is transformed using the bottom of the ladder. Then if it is concave down, y is transformed using the top of the ladder. If I'm thinking correctly, there is no way to know exactly which transformation to use for a set of data that needs to be transformed. We are working on this in AP Stats now, and a lot of the plots we transform are indeed curved, but the correlation coefficient is above 0.9. Even if the correlation is high, I think you still must linearize the data if the plot does not look straight. I've been having trouble in statistics recently, so I would really appreciate any help.

A concave situation is nice and safe. When it becomes linear, nothing is lost. A convex situation should not be linearized, because that increases fragility.

If you want to find an ‘optimal’ transform for something that should be normal, a Box-Cox normality plot is one place to start. A similar plot of correlation versus transformation exponent can locate the exponent that gives the best straight-line fit.
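The idea of plotting correlation against the transformation exponent can be sketched as a grid search. This assumes the Box-Cox family $(y^\lambda - 1)/\lambda$, with $\log y$ at $\lambda = 0$, as the set of candidate exponents, and positive $y$; the data and function names are hypothetical. Because correlation is unchanged by a positive rescaling and shift, each Box-Cox exponent gives the same correlation as the matching rung of Tukey's ladder, minus sign on negative powers included:

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform of positive y; lam == 0 is the log limit."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_profile(xs, ys, lambdas):
    """Correlation of x with Box-Cox transformed y, for each exponent."""
    return [(lam, pearson(xs, [box_cox(y, lam) for y in ys]))
            for lam in lambdas]

# Hypothetical data with y = x^2, so lam = 0.5 (square root) is exactly linear.
xs = [float(i) for i in range(1, 31)]
ys = [x ** 2 for x in xs]

lambdas = [k / 4 for k in range(-12, 13)]  # -3 to 3 in steps of 0.25
profile = correlation_profile(xs, ys, lambdas)
best_lam, best_r = max(profile, key=lambda pair: pair[1])
print(f"best exponent {best_lam}, r = {best_r:.6f}")
```

With real data it is worth plotting the whole `profile` rather than just taking the maximum: a flat peak means several nearby exponents fit about equally well, and a simple rung such as $\sqrt{y}$ or $\log y$ may be the more interpretable choice.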

But why linearise at all? First, an arbitrary transform is just that: arbitrary. Second, we are no longer limited to linearity. Why not use tools like generalised linear modelling that respect the distribution you have, rather than bend the data to fit a limited toolkit?