# Method of Least Squares

As cautioned, the high-low method can be quite misleading. The reason is that cost data are rarely as linear as presented in the preceding illustration, and inferences are based on only two observations (either of which could be a statistical anomaly or "outlier"). For most cases, a more precise analysis tool should be used. If you have studied statistical methods, recall "regression analysis" or the "method of least squares." This tool is ideally suited to cost behavior analysis. This method appears to be imposingly complex, but it is not nearly so complex as it seems. Let's start by considering the objective of this calculation.

The goal of least squares is to define a line so that it fits through a set of points on a graph, where the cumulative sum of the squared distances between the points and the line is minimized (hence, the name "least squares"). Simply, if you were laying out a straight train track between a lot of cities, least squares would define a straight-line route between all of the cities, so that the cumulative distances (squared) from each city to the track is minimized.

Let's dissect this method, beginning with the definition of a line. A line on a graph can be defined by its intercept with the vertical (Y) axis and the slope along the horizontal (X) axis. In the following diagram, observe a red line starting on the Y axis (at the value of "2"), and rising gently upward as it moves out along the X axis. The rate of rise is called the slope of the line; in this case, the slope is 0.8, because the line "rises" 8 units on the Y axis for every 10 units of "run" along the X axis.

In general, a straight line can be defined by this formula:

where:

For the line drawn on the previous page, the formula would be:

And, if you wished to know the value of Y, when X is 5 (see the red circle on the line), you perform the following calculation:

Now, lets move on to fitting a line through a set of points. On the next page is a table of data showing monthly unit production and the associated cost (sorted from low to high). These data are plotted on the graph to the right. Through the middle of the data points is drawn a line, and the line has a formula of:

This formula suggests that fixed costs are $138,533, and variable costs are $10.34 per unit. For example, how much would it cost to produce about 110,000 units? The answer is about $1,275,000 ($138,533 + ($10.34 * 110,000)).

How was the formula derived? One approach would be to "eyeball the points" and draw a line through them. You would then estimate the slope of the line and the Y intercept. This approach is known as the scatter graph method, but it would not be precise. A more accurate approach, and the one used to derive the above formula, would be the least squares technique. With least squares, the vertical distance between each point and resulting line (e.g., as illustrated by an arrow at the $1,500,000 point) is squared, and all of the squared values are summed. Importantly, the defined line is the one that minimizes the summed squared values! This line is deemed to be the best fit line, hopefully giving the clearest indication of the fixed portion (the intercept) and the variable portion (the slope) of the observed data.

One can always fit a line to data, but how reliable or accurate is that resulting line? The R-Square value is a statistical calculation that characterizes how well a particular line fits a set of data. For the illustration, note (in cell B21) an R2 of .798; meaning that almost 80% of the variation in cost can be explained by volume fluctuations. As a general rule, the closer R2 is to 1.00 the better; as this would represent a perfect fit where every point fell exactly on the resulting line.

The R-Square method is good in theory. But, how does one go about finding the line that results in a minimization of the cumulative squared distances from the points to the line? One way is to utilize built-in tools in spreadsheet programs, as illustrated above. Notice that the formula for cell B21 (as noted at the top of spreadsheet) contains the function RSQ(C5:C16,B5:B16). This tells the spreadsheet to calculate the R2 value for the data in the indicated ranges. Likewise, cell B20 is based on the function SLOPE (C5:C16,B5:B16). Cell B19 is INTERCEPT(C5:C16,B5:B16). Most spreadsheets provide intuitive pop-up windows with prompts for setting up these statistical functions.

Spreadsheets have not always been available. You may be curious to know the underlying mechanics for the least squares method. If so, you can check out the link on the website.

# Recap

Before moving on, let's review a few key points. A good manager must understand an organization's cost structure. This requires careful consideration of variable and fixed cost components. However, it is sometimes difficult to discern the exact cost structure. As a result, various methods can be employed to analyze cost behavior. Once an organization's cost structure is understood, it then becomes possible to perform important diagnostic calculations which are the subject of the next sections of this chapter.