This is a guest post by Michael Ross from RER Energy Inc. Michael is teaching a 6-week, 30 hour class on Mastering RETScreen for Clean Energy Project Analysis. The class is capped at 50 students, and there are only 30 discounted seats. Get your discount here.

Article Goal 

This article shows engineers and energy data analysts how to “normalize” energy consumption or production to account for the variation in weather over time. By the end of the article, you should understand why normalizing for weather is important, and how it can be done, either in a spreadsheet or using a free tool called RETScreen® Plus.

Why Normalize for Weather?

The need to “normalize” for weather arises very often. For example, you have a year or two of utility bills for a facility, you plan on improving the energy efficiency of the facility, and you need to estimate what the energy savings will be in the future. One challenge is that the past energy consumption is determined not just by the equipment at the facility, but by the variations in the weather experienced by the facility. What if the winter covered by the utility bills was especially cold, and as a consequence gas consumption was higher than typical? Basing your estimates of savings on a single year, without “normalizing” for weather, or explicitly adjusting the consumption to reflect typical weather conditions, will cause you to overestimate the typical savings in the future.

Normalizing for weather is a good idea whenever an accurate understanding of the current energy consumption of a facility (a “baseline”) is needed; otherwise, as suggested in the previous example, estimates of future savings arising from improvements to the existing facility may be too high or too low, and consequently inferences that a proposed improvement is cost-effective may not turn out to be correct (or, conversely, a truly cost-effective opportunity may be missed).

The need to normalize may also appear in energy production projects. For example, a photovoltaic system might produce more electricity in one year than in the previous year. Is this merely because there was more sunshine in the second year? If so, did this additional sunshine hide deterioration in the system operation?

Sometimes normalizing for weather is not merely a good idea, but rather a requirement of a client or a utility or government funding program. For example, I recently conducted a study for a client who was seeking funding from the Federation of Canadian Municipalities (FCM). The client needed to show how much connecting his building to a district heating system would reduce overall natural gas consumption (and thereby greenhouse gas emissions). The FCM program stipulated that any study had to first normalize past energy consumption for variation in the weather, and then project savings into the future based on typical weather.

Normalizing for Weather: the Theory

Normalizing for weather is, in principal, straight forward:  you “fit” a statistical model (i.e., an equation) that relates you consumption data (e.g., utility bill consumption) to one or more variables that you think exercise an influence on consumption (e.g., heating or cooling degree days).  When “fitting” the model to the data, you adjust the coefficients of the equation until sum of squared differences between the actual consumption data and the modeled consumption data is minimized. Often a linear equation is used for the statistical model, and the process is called “linear regression”.

So, for example, you might produce a scatterplot of daily average gas consumption for each billing period against the average number of heating degree days per day for the billing period, as shown in the figure below.



I’ve superimposed a straight line on the scatterplot to make it evident that there is a linear relationship between the fuel consumption and the heating degree days. That is, I should be able to estimate with some accuracy the fuel consumption using an equation of the form[1]:

This equation has the right form, but what should I use for the coefficients a and b? A common approach is to select a and b in such a way as to minimize the “sum of squared errors”, or SSE.  To do this manually, I start out with a guess for these coefficients, and then I use this equation to estimate the fuel consumption for each billing period. I then compare these estimates with the actual fuel consumption for each billing period. If I square the difference of the two and sum over all billing periods, I’ll have the SSE. This is a measure of how well my choice of coefficients fits this equation to the data; I adjust the coefficients until the SSE is as small as I can make it (unless the line passes through every data point exactly, the SSE will not go to zero).



Then I’ve got my equation. For the data from the example above, it would be:


I can then use this equation to estimate the gas consumption based on the heating degree days. So, for example, imagine that for the location of this building, a typical month of March will have 620 heating degree days (°C·day). That works out to 20 heating degree days per day. If I wanted to know what the facility’s gas consumption in a typical March would be, I’d plug this into the equation:


This would tell me that on an average March day, I’d require 6.6 GJ of gas, so over the whole month I’d consume around 206 GJ of gas. To determine the gas consumption in a typical year, I do this same exercise for each month’s typical number of heating degree days.

Normalizing for Weather Using RETScreen® Plus

While this normalization can be done using a spreadsheet, my tool of preference is RETScreen® Plus, a sister program to the better known but completely different RETScreen® 4. (Both tools are available for download, for free, from the Government of Canada:

RETScreen® Plus is designed precisely for this type of exercise (as well as much more in-depth analyses to be discussed in later articles), and consequently much quicker and (less error-prone) than doing the manual exercise outlined above. The main program features that make it quicker and easier than the manual exercise are:

1)     Rapid access to up-to-date daily weather data for locations across the globe

2)     Tools for combining and regrouping data sets on different time bases.

3)     Automatic fitting of equations

4)     Optimization of the heating degree day reference temperature

Let’s examine each of these advantages by going through the key steps for normalizing for weather data using RETScreen® Plus.

I’ll start by asking my client for utility bills. He sends me a spreadsheet for the period of 2012 through 2013, indicating for each bill the billing date and the billed gas consumption (in GJ) for the period:


Note that the “monthly” bills are not all dated on the same day of the month, and the number of days in the billing period changes from bill to bill. Also note that I’m missing the bill for May 23. Such are the complications of the real world.

Next, I open RETScreen® Plus. The first key step is to tell it where my building is located; it will be apparent why we need to do this when we need to get weather data. There are a variety of ways to specify the project location, but the fanciest is through a map interface that lets me indicate the project location with a thumbtack:



Then I import my Excel spreadsheet of utility data into RETScreen® Plus. I tell it that the data I want to investigate is for “Fuel Consumption”, specifically natural gas measured in GJ. It opens a blank table:

Screen shot 2014-06-19 at 10.24.00 AM

I fill this table by “Importing from file…” and selecting my Excel file. A dialog box pops up and I see that it has correctly interpreted the headers in the file, with the exception of the gas consumption, which I have to pick from a drop down list:



When I click on the green checkmark, I get another dialog box identifying the missing data for May and giving me some choices for dealing with this, such as using the average for the whole data set, interpolating between adjacent data points, deleting the whole row, or repeating the previous value. I chose to simply ignore the missing data for now. RETScreen inserts this data into my table, automatically calculating the number of days in each billing period:


With that half my data is in the tool. But now I need to tell RETScreen what the “factors of influence” in this data are: that is, what variables are likely to exert an influence on the gas consumption. When normalizing for weather, the answer is pretty clear (it’s the weather, obviously), but in different applications of the tool it might be factory production, hotel occupancy, or something else.

Thus, I need to get weather data for 2012 and 2013. Ideally, this weather would be on the same time basis as my utility bills. That is, I’d have the average weather conditions for my site for the first, second, etc. billing periods.

In the past, this would be a challenge. I might easily find a table of monthly climate norms, but this is no good because it is not specific to the 2012 through 2013 period for which I have data. With some luck I might find a nearby weather station from which I could obtain monthly data for the 2012-2013 period, but I’d still have to manipulate this data to get it to correspond to the billing periods. If I were really lucky, I’d find a nearby weather station with daily weather data and I’d calculate the averages for each billing period.

With RETScreen® Plus, it turns out, we are really lucky—we get precisely what we would most like in terms of weather data. By clicking on the icon labelled “Download NASA satellite data”, we instantly get access to a wide range of daily weather variables for pretty much any location, for pretty much any period since 1983 up to, in most cases, last week. In the list of four RETScreen® Plus features that make it much faster than a manual approach, this is number one.

The NASA weather data has been derived from satellites, which means that there is some noise in the day-to-day estimates, and certain variables are more accurate than others (for example, winds at your site are likely to be somewhat different from the NASA estimates). But on the whole, the data set is pretty good, and certainly sufficient for the typical requirements of weather normalization. If you are not convinced, you can choose to import weather data from your own sources into RETScreen° Plus, and use it instead of the NASA data.

Now I have all the data I need: my utility bills and my weather data. But I still have some processing to do.

First, the heating degree days are not contained in the NASA weather data, but they can be derived from the average, min, and max daily temperatures. Clicking on the RETScreen® Plus “heating degree days…” icon, I add columns to my daily weather data that give the heating degree days for each day. I have to specify the reference temperature—that is, the ambient temperature above which no heating is required. For this facility, I don’t have a clue what this should be, so I stick with the default value of 16°C. We’ll revisit this later.

Second, my two data tables are still on different time bases: I’ve got gas consumption for approximately monthly billing periods, but my weather data is on a daily time basis. If I click on the RETScreen® Plus icon for “Merge…”, however, I can merge the two data sets. I tell RETScreen that I want to take the heating degree days from the daily weather data table and add it to the natural gas consumption table. Click—you’re done:


It is worth pausing here and thinking about how much time that button just saved us. Even though merging two data tables is a deceptively simple operation, doing it manually with most available tools requires some thought. If you are really good with Excel, you’ll see that you can do that merge with some lookup functions; if you’re not such a guru, you’re in for some drudgery. Or maybe you’re a masochist and you do all your data processing with Matlab, C++, or (horrors) Fortran. Once you’ve got that simple code running, or set up the Excel spreadsheet, and once you’ve checked for off-by-one errors and other gaffs, I’m sure that at least 20 minutes will have elapsed, likely a lot more. Thus, this button is the second item in the list of ways that RETScreen® Plus saves us time and makes our life easier.

At this point, I’m itching to see whether there seems to be any correlation between my utility gas consumption and the number of heating degree days. RETScreen® Plus provides a lot of different options for graphs. Here is one that makes me hopeful that there is a strong correlation:


To investigate in detail, I go to the Analytics section of RETScreen and select “Establish baseline”… “Regression analysis”. This is how I actually get RETScreen to fit an equation that explains the natural gas consumption in terms of the heating degree days.


A dialog box opens, and I tell RETScreen what my dependent and independent variables are. The dependent variable is the one that I am trying to predict or explain, in this case, natural gas consumption. The independent variable is the factor that influences this, in this case the heating degree days. (Note that RETScreen permits multiple independent variables—i.e., “multiple regression”. While Excel can automatically fit a few simple equations to a scatterplot, it does not perform multiple regression automatically). By selecting “daily” for the method, I tell RETScreen that, for each billing period, I want to deal with the average fuel consumption per day and the average heating degree days per day:


Before I can get RETScreen to fit an equation to my data, there is one last step: I need to tell it what subset of data to use when it calculates the sum of squared errors. In particular, I tell it to start with the second period (because I don’t actually know when the first billing period started) and to ignore the billing period ending on May 23, 2012—that was the bill for which the client lacked data. Fortunately, there is an “exclude data” button for this latter operation:


With that done, RETScreen fits an equation to the data. The need for us to calculate the sum of squared errors and search for the best coefficients is eliminated by an automatic operation: this is the third time-saving feature in our list.

RETScreen chooses a linear equation by default, and this is warranted for our gas consumption/heating degree day data. If the relationship between gas consumption and heating degree days were not linear, however, I could click on “select equation” and choose from a wide range of equations:



The table of equations indicates the coefficient of determination, or R2, value. This value indicates how much of the variation around the mean value for the gas consumption is explained by the variation around the mean value heating degree days. A value of one indicates that the relationship perfectly predicts the gas consumption for every billing period. A value of zero indicates that the fit equation does no better predicting the gas consumption for each billing period than simply using the average gas consumption for all the billing periods.

It is tempting to choose the equation with the highest coefficient of determination. Resist the temptation. Rather, choose the equation based on what you would expect the underlying physical reality to be. For example, since gas consumption for heating is related to building heat loss, and steady state heat loss should be roughly proportional to the difference between the interior and exterior temperatures, I’d expect there to be a linear relationship between gas consumption and heating degree days.

If you don’t have a notion for what the physical reality is, have a strong preference for the linear equation or, if that obviously doesn’t work, other simple equations. I could chose a higher order equation and slightly increase my R2 value, but probably all I would be doing is getting my equation to better fit the noise in the data. By its very nature, going into the future, the noise will be different, and therefore my equation may work less well. The more nonlinear the equation, the greater the danger that it will result in really wacky predictions, especially when the equation is fit to a fairly small data set.

By clicking on the “details” tab, I can view the coefficients that best fit the data. I can even choose to set these to values that I deem in some way “better” or more physically realistic (e.g., I could set the intercept to zero) by selecting “adjust coefficients”; note that RETScreen does not recalculate the coefficient of determination for the adjusted coefficients, however.

With that, I have my equation, and all I need to do is apply it to typical weather data, i.e., climate norms. Since different weather data sets may have different biases, to be rigorous you should get your climate norms from the same data set that you used when fitting the equation. Since I used NASA weather data when fitting the equation, I’d ideally use NASA climate norms.

One way to get typical monthly values for climate from the NASA data is to calculate the average values for the site of interest from the hourly NASA data set dating back to 1983 (you may want to export it from RETScreen to Excel). Fortunately, a quicker option often exists: in the same map interface where I selected my project location, I can pick a location for monthly climate data. Many locations are available with NASA data (identified by blue dots), although the user should differentiate between these and the locations that have climate data from other sources (identified by red dots). A third option is to accept the probably slightly different biases in different data sources—after all, they may be insignificant compared with other sources of error, or change in the climate itself.

I still haven’t explained the fourth point in the list of ways that RETScreen® Plus is an improvement on the manual approach, the optimization of the heating degree day reference temperature. This deserves some attention.

I arbitrarily chose 16°C as my reference temperature for the heating degree day calculation. But that value may be wrong. In that case, using a value for the reference temperature that more accurately reflected the exterior temperature below which my building needs gas heating would improve the fit of my equation.

Perhaps more important, it might make the relationship between gas consumption and heating degree days more physically realistic. For example, when I look closely at my linear equation, I see that it has a negative intercept. That is, when the number of heating degree days is zero, my gas consumption is negative. Physically, that doesn’t make sense: not only does my building not generate natural gas when temperatures are warm, but it likely consumes a small measure of hot water regardless of the outdoor temperature. Let’s see if we find a more physically realistic relationship with a different heating degree day reference temperature.


Faced with this task in the manual approach, this implies regenerating the heating degree days over a range of temperatures, refitting the equation, and see what works best. That is a lot of work. With RETScreen Plus, this exact procedure is automatically run when I click on the “Optimize reference temperature” button.

Doing this, I find that my coefficient of determination is somewhat better when my reference temperature is not the 16°C I chose arbitrarily, but rather when I use a reference temperature of 11°C. When I click on the green checkmark to accept this finding, RETScreen automatically adds a new column to my data tables (heating degree days to a reference temperature of 11°C) and uses this data in the regression analysis instead of the 16°C heating degree day data.


Encouragingly, having adjusted the reference temperature to 11°C, my intercept is no longer negative, but instead slightly positive: it is consistent with the notion that even when I don’t need space heating or make-up air heating, I still need some domestic water heating.


When investigating the energy consumption of a building based on a limited set of historical data (e.g., bills), it is important to take into account the influence of weather; otherwise, your estimates of the building’s typical energy consumption may be in error due to atypical weather during the historical data period. You can fit an equation that explains variation in the energy consumption according to variation in the weather, and then apply this to typical weather data (i.e., climate norms) to estimate typical energy consumption in the future. In the past, this was an onerous exercise, because input weather data was hard to get, weather data didn’t match the time basis of the energy consumption data, and fitting equations to the data was time consuming. As shown in this blog, with appropriate tools (e.g., RETScreen® Plus, available for free), this procedure is quick, with many of the steps being done automatically.