Smooth Forecast Help

Introduction

SmoothForecast.com provides free time series forecasting on the web. You enter a sequence of numbers, which can represent anything from monthly rainfall amounts for the last 10 years to monthly food expenditures for the last 36 months, and the system forecasts the upcoming values based on that history. You can specify how many forecasts are produced and whether the history has a trend or seasonal cycles (i.e. the history goes up and down at regular intervals). If the history has values that seem too large or too small (called “spikes”), you can have them filtered out with the spike smoothing feature.

Finally, to get a sense of how good the forecast is, you can specify what is called a “hold back”. The system sets aside a specified number of values from the end of the history and forecasts them using only the history prior to the held back values. It then compares that forecast with the held back history, produces metrics that can be used to assess how accurate the forecast is, and lists the forecast side by side with the held back history.

All the values are graphed so you can compare the history with the hold back forecast and with the forecast itself. Also, by visually comparing the hold back forecast with the forecast, you can get a sense of how good the forecast is based on how well the method performed on past history. If spike smoothing is enabled, the smoothed series is layered on top of the history so you can see which values were clipped and by how much.

Example 1

As an example, consider the following 24 months of a food budget. The first value is the oldest and the last is the most recent. Here are the values.

1019.93

1073.70

840.78

1001.39

932.44

1092.17

1420.03

1082.91

1077.38

1204.53

1105.77

1389.77

1657.32

1309.19

1218.00

1053.80

1163.03

1205.71

1563.80

1006.82

1058.99

1403.32

1206.78

1331.15

 

Here is a snapshot of SmoothForecast.com using this historical data. Note that both spike smoothing and a linear trend have been selected. The initial attempt used the defaults (i.e. no spike smoothing and no trend), and the resulting forecast in the graph did not look right. It is often a good idea to select spike smoothing because real data frequently contains extreme values that throw off the overall shape of the history. So spike smoothing was selected, along with the linear trend, to see what the result would look like. The goal was to forecast the food expenditure a year ahead (i.e. 12 months) and to see how the forecast would have looked had it been made 6 months ago (i.e. Number to Hold Back is 6). In the graph, the hold back forecast line matches the forecast line fairly well, which gives us confidence that this is a reasonable forecast and that the general trend is for the food budget to increase over the next 12 months. Note the MedAPE (Median Absolute Percentage Error) value for the hold back is about 9.7%. This means that across the 6 held back values the median error was a little less than 10%.
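The hold back procedure used in this example can be sketched in Python. This is only an illustration: the function names are hypothetical, and the forecaster is a plain least-squares line standing in for whatever method SmoothForecast.com actually uses, so its numbers will not reproduce the 9.7% figure above.

```python
def split_holdback(series, hold_back):
    """Return (history used for forecasting, held back tail)."""
    if not 0 < hold_back < len(series):
        raise ValueError("hold_back must be between 1 and len(series) - 1")
    return series[:-hold_back], series[-hold_back:]


def linear_trend_forecast(history, steps):
    """Stand-in forecaster (an assumption): least-squares line through the history."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Extend the fitted line past the last observed position (n - 1).
    return [intercept + slope * (n + h) for h in range(steps)]


# The 24 monthly food budget values from Example 1, oldest first.
food = [
    1019.93, 1073.70, 840.78, 1001.39, 932.44, 1092.17,
    1420.03, 1082.91, 1077.38, 1204.53, 1105.77, 1389.77,
    1657.32, 1309.19, 1218.00, 1053.80, 1163.03, 1205.71,
    1563.80, 1006.82, 1058.99, 1403.32, 1206.78, 1331.15,
]

history, held_back = split_holdback(food, 6)
holdback_forecast = linear_trend_forecast(history, len(held_back))

# List each held back actual value next to its forecast.
for actual, forecast in zip(held_back, holdback_forecast):
    print(f"{actual:10.2f} {forecast:10.2f}")
```

The key point is that the forecaster only ever sees the first 18 values; the last 6 are used purely for comparison.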

 

 

 

Example 2

As another example, consider the following nearly 15 years of a monthly electric bill. The first value is the oldest and the last is the most recent. Here are the values.

104.56

99.39

122.83

122.60

100.43

114.68

190.24

228.24

208.85

162.24

128.52

141.47

133.81

156.14

0.00

69.34

74.93

92.46

182.82

171.00

179.00

134.00

113.00

110.00

99.00

121.00

100.00

100.00

37.00

134.00

150.00

185.00

147.00

141.00

125.00

70.00

114.00

130.00

124.00

128.00

24.00

109.00

138.00

213.00

140.00

174.00

105.00

100.00

105.00

25.00

103.00

111.00

18.00

119.00

139.00

214.00

235.00

101.00

128.00

107.00

32.00

139.00

109.00

98.00

95.00

102.00

166.00

268.00

186.00

164.00

135.00

129.00

103.00

128.00

61.00

126.00

120.00

124.00

139.00

215.00

178.00

185.00

142.00

87.67

80.33

93.00

104.00

39.00

70.00

82.00

143.00

183.00

212.00

196.00

119.00

117.00

94.00

97.00

101.00

80.00

100.00

110.00

149.00

265.00

303.00

118.00

165.00

124.00

106.00

83.00

76.00

70.00

110.00

125.00

112.00

227.00

286.00

239.00

159.00

143.00

56.00

77.00

143.00

72.00

111.00

125.00

135.00

277.00

213.00

253.00

181.00

109.00

46.00

42.00

82.00

72.00

65.00

150.00

140.00

186.00

289.00

310.00

246.00

173.00

150.00

73.00

74.00

89.00

128.00

191.00

203.00

290.00

272.00

273.00

229.00

186.00

175.00

159.00

149.00

79.00

186.00

197.00

294.00

348.00

361.00

300.00

245.00

216.00

175.00

163.00

156.00

83.00

178.00

221.00

261.00

488.00

406.00

328.00

 

Below is a snapshot of SmoothForecast.com using this historical data. This series has a cycle, but we are not certain what it is. So the Automatic Seasonal Cycle radio button is selected, which lets the system fill in the Seasonal Cycle text box when the Process Series button is selected. Note the system determines the cycle to be 12 months (i.e. a yearly cycle). 24 forecasts were requested and 12 values were held back for comparison. The Smooth Spikes checkbox is not selected in this example because it actually makes the forecast worse, even with the Window value set to 12. To see this, note the Holdback Results, then select Smooth Spikes, set the Window to 12, process the series again, and compare the new Holdback Results with the earlier ones. The Series Trend is set to Damped since there is a gentle upward slope in the data. To see which Series Trend works best, set the Number to Hold Back to a value large enough to be a good sample of the data and try the different Series Trend selections, comparing the Holdback Results for each. For seasonal data a good value is some multiple of the Cycle. In this example a multiple of 1 was used since 12 values is a good enough recent sample. Note the MedAPE (Median Absolute Percentage Error) is 4.4%, which indicates a very good match with the held back data.

 

Form Field Descriptions

 

There are several form fields that need to be specified to use SmoothForecast.com to generate forecasts. Below is a description of each and whether the field is required or optional.

 

·        Series (required)

 

This field is where the historical data is entered. One numerical value is entered per line, starting with the oldest value. Values can be pasted into the field and edited before or after a processing run (i.e. when the Process Series button is pressed). The series is preserved in the field from one processing run to the next. The data can also be loaded from a file by selecting the Browse button on the Load Series File field; the loaded series is placed into the Series field when the Process Series button is pressed.

 

·        Load Series File (optional)

 

This field is used to load a numerical series from a file selected using the Browse button. Any file loaded must be a text file with one numerical value per line, starting with the oldest value. The values are loaded from the file and placed into the Series field when the Process Series button is selected.
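To check a series file before uploading it, the expected format (one numerical value per line, oldest first) can be validated with a short Python sketch. The function name is hypothetical, and skipping blank lines is an assumption; the site's own handling of blank or malformed lines is not documented here.

```python
import io


def parse_series(text_stream):
    """Read one numerical value per line, oldest value first."""
    values = []
    for line_number, line in enumerate(text_stream, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are ignored
        try:
            values.append(float(stripped))
        except ValueError:
            raise ValueError(f"line {line_number}: not a number: {stripped!r}")
    return values


# An in-memory stand-in for a text file; open("series.txt") works the same way.
sample_file = io.StringIO("1019.93\n1073.70\n\n840.78\n")
print(parse_series(sample_file))
```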

 

·        Smooth Spikes (optional)

 

This checkbox, if selected, causes values considered to be outside the series moving average to be clipped back to the moving average value. The size of the moving average used is based on the Window value.

 

·        Window (optional)

 

This value is only used if the Smooth Spikes checkbox is selected. It is used to determine the size of the moving average used to find series outliers. If no value is specified, the Window value of 3 is used.
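The interaction between Smooth Spikes and Window can be illustrated with a Python sketch. The exact rule the site uses to decide that a value is "outside" the moving average is not stated, so the threshold below (a multiple of the standard deviation of the differences from the moving average) is purely an assumption, as are the function names.

```python
from statistics import mean, pstdev


def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the ends of the series.

    Each average spans 2 * (window // 2) + 1 values, so an even window
    is widened by one value.
    """
    half = window // 2
    return [
        mean(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


def smooth_spikes(series, window=3, threshold=2.0):
    """Clip values that stray too far from the moving average back to it."""
    averages = moving_average(series, window)
    residuals = [v - a for v, a in zip(series, averages)]
    # Assumed outlier rule: more than `threshold` standard deviations away.
    limit = threshold * pstdev(residuals)
    return [
        a if abs(r) > limit else v
        for v, a, r in zip(series, averages, residuals)
    ]


bills = [100.0] * 5 + [400.0] + [100.0] * 4
print(smooth_spikes(bills, window=3))
```

A larger Window averages over more neighbors, so a single spike pulls the moving average less and is clipped further back toward the surrounding values.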

 

·        Seasonal Series (optional)

 

Selecting any radio button other than No Seasonal Cycle causes the forecast to be based on a regularly occurring cycle in the series (e.g. 12 months in a year, 7 days in a week, 365 days in a year), depending on what the series values represent (e.g. a monthly value, a day of the week value, a daily value over a year). If the Automatic Seasonal Cycle radio button is selected, the Cycle value will be filled in automatically when the Process Series button is selected. If there is no detectable seasonal cycle, the Cycle value will be blank. If the Specify Seasonal Cycle radio button is selected, the Cycle value must also be specified to indicate the size of the cycle (e.g. 12 for months in a year, 7 for days in a week, 365 for days in a year). It is also possible to use this value to generate more complex forecasts for irregular but semi-cyclical data. This can be done by varying the Cycle value in conjunction with a Number to Hold Back value and checking which Cycle value gives the lowest metric values in the Holdback Results.

 

·        Cycle (required if Specify Seasonal Cycle is selected)

 

This value specifies the size of the regularly occurring cycle in the series. See Seasonal Series field for more information.
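One common way to find a cycle length automatically is to look for the lag at which the series correlates most strongly with itself. The Python sketch below illustrates that idea only; it is an assumption, not a description of how the Automatic Seasonal Cycle option works, and the function names and the min_strength cutoff are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Correlation of the series with a copy of itself shifted by `lag`."""
    n = len(series)
    avg = sum(series) / n
    denom = sum((v - avg) ** 2 for v in series)
    if denom == 0:
        return 0.0  # a constant series has no cycle to detect
    numer = sum(
        (series[i] - avg) * (series[i + lag] - avg) for i in range(n - lag)
    )
    return numer / denom


def detect_cycle(series, min_lag=2, max_lag=None, min_strength=0.3):
    """Return the lag with the strongest autocorrelation, or None if weak."""
    if max_lag is None:
        max_lag = len(series) // 2
    scores = {
        lag: autocorrelation(series, lag)
        for lag in range(min_lag, max_lag + 1)
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_strength else None


# Ten years of synthetic monthly data with a yearly cycle.
monthly = [math.sin(2 * math.pi * i / 12) for i in range(120)]
print(detect_cycle(monthly))
```

Returning None mirrors the blank Cycle value described above for a series with no detectable seasonal cycle. Real data with a strong trend would usually need the trend removed first, which this sketch does not attempt.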

 

·        Series Trend (optional)

 

This drop-down menu indicates whether there is a trend in the data, either Linear or Damped. A Linear trend places the forecast, which may include seasonal cycles, on a straight line sloping up or down. A Damped trend places the forecast, which may include seasonal cycles, on a curve that gently slopes up or down and then levels off. The default is to have no trend in the forecast.
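The difference between the two trend shapes can be sketched in Python using the standard damped trend formulation, in which each further step adds a geometrically shrinking share of the slope. The function and parameter names are hypothetical, the damping value of 0.8 is arbitrary, seasonal cycles are left out, and whether the site uses exactly this formulation is an assumption.

```python
def trend_forecast(level, slope, steps, damping=1.0):
    """Extrapolate a trend from the last level of the series.

    damping = 1.0 gives a straight line (Linear); 0 < damping < 1 gives a
    curve that levels off (Damped).
    """
    forecasts = []
    accumulated = 0.0
    for h in range(1, steps + 1):
        accumulated += damping ** h  # share of the slope added at step h
        forecasts.append(level + accumulated * slope)
    return forecasts


linear = trend_forecast(level=1200.0, slope=15.0, steps=12)
damped = trend_forecast(level=1200.0, slope=15.0, steps=12, damping=0.8)

for step, (lin, damp) in enumerate(zip(linear, damped), start=1):
    print(f"{step:2d} {lin:9.2f} {damp:9.2f}")
```

With damping below 1.0 the forecast approaches level + slope * damping / (1 - damping) instead of growing without bound.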

 

·        Number of Forecasts (required)

 

The number of forecasts to generate must be specified and must be at least 1. Generally it is a good idea to specify enough forecasts so they can be visually compared with the series using the generated graph. The resulting forecasts are listed in the Forecasts text box, starting with the most recent value. The resulting graph also shows the forecast at the end of the series. If Smooth Spikes is selected, the history graph line is overlaid with a line showing the series after its outliers have been clipped. That clipped series, rather than the actual history, is the data used to generate the forecast.

 

·        Number to Hold Back (optional)

 

The number to hold back tells the system to generate forecasts using less recent data so the forecast can be compared with the more recent actual values that occurred. For example, if the Number to Hold Back is 12, the last 12 values in the series are set aside and 12 forecasts are generated using the series up to but not including those values set aside. The forecasts are then compared with the values set aside. The actual values are listed alongside the corresponding forecast in the Holdback Results text box. Also included are several metrics that can be used to gauge how well the forecasts matched the actual values. Each metric is described below:

 

§  RMSE (Root Mean Squared Error)

 

This metric is the square root of the average squared difference between the forecast and the actual value. It is a good indicator of how far the forecast is from the actual value in absolute terms; that is, the value is in the units of the series (e.g. dollars, miles). Note the RMSE can be skewed by outliers in the series. Outlier effects can be reduced by selecting the Smooth Spikes checkbox.

 

§  MASE (Mean Absolute Scaled Error)

 

This is the average of the absolute value of the difference between the forecast and the actual value, divided by a scale: the average absolute error of a random walk forecast (each value forecast as the previous value) applied to the history prior to the holdback period. This metric is a relative indicator of how well the forecast of the holdback period compares with the simplest possible forecast model applied to the history. Ideally, the resulting ratio is less than 1.0, implying the forecast model is superior to a random walk. On the other hand, a forecast into a large holdback period is more likely to be wrong for the later holdback values, so values larger than 1.0 can be expected in those cases even when the forecasts are reasonable. But if the MASE is large (e.g. greater than 2.0) for holdback periods less than 10% of the size of the history, the model forecast is probably not that good.

 

§  MAPE (Mean Absolute Percentage Error)

 

This is the average of the absolute value of the difference between the forecast and the actual value, divided by the actual value and expressed as a percentage. This metric is a relative indicator of how far the forecast is from the actual value. Like the RMSE, it can be skewed by outliers, and outlier effects can be reduced by selecting the Smooth Spikes checkbox.

 

§  MdAPE (Median Absolute Percentage Error)

 

This is the median (i.e. center value) of the absolute value of the difference between the forecast and the actual value, divided by the actual value and expressed as a percentage. This metric is a relative indicator of how far the forecast is from the actual value. Unlike the MAPE value, this value is largely unaffected by outliers. Consequently, it is the better metric to use and gives a good sense of how well spike smoothing has (or has not) improved the series for forecasting purposes.

 

§  SMAPE (Symmetric Mean Absolute Percentage Error)

 

This is the average of the absolute value of the difference between the forecast and the actual value, divided by the average of the actual value and the forecast (i.e. their sum divided by 2), and is expressed as a percentage. This metric is a relative indicator of how far the forecast is from the center of the actual and forecast values. The idea behind this metric is to keep the resulting percentage between 0% and 200%. However, under-forecasts and over-forecasts are not given equal weight.

 

Generally speaking, it is a good idea to gauge a hold back forecast using 2 or more of the metrics listed. Usually the RMSE and the MedAPE are the best indicators of hold back forecast performance. Finally, note the graph line corresponding to the hold back forecast is overlaid on the historical values it is forecasting, allowing a visual comparison between the actual values and the forecast.
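The five metrics described above can be written out as a short Python sketch following those descriptions. The function name and dictionary keys are hypothetical, the sample numbers are made up for illustration, and details such as how the site treats actual values of zero (which would cause a division by zero here) are not covered.

```python
from math import sqrt
from statistics import mean, median


def holdback_metrics(history, actuals, forecasts):
    """Compare hold back forecasts with the held back actual values.

    `history` is the series prior to the holdback period; it is only used
    for the MASE scale. Actual values must be nonzero.
    """
    errors = [f - a for a, f in zip(actuals, forecasts)]
    abs_errors = [abs(e) for e in errors]
    pct_errors = [100 * abs(e) / abs(a) for e, a in zip(errors, actuals)]
    # Scale: average absolute error of forecasting each value as the previous one.
    naive_scale = mean(
        [abs(history[i] - history[i - 1]) for i in range(1, len(history))]
    )
    # Absolute values in the denominator keep each term between 0% and 200%.
    sym_errors = [
        100 * abs(e) / ((abs(a) + abs(f)) / 2)
        for e, a, f in zip(errors, actuals, forecasts)
    ]
    return {
        "RMSE": sqrt(mean([e * e for e in errors])),
        "MASE": mean(abs_errors) / naive_scale,
        "MAPE": mean(pct_errors),
        "MdAPE": median(pct_errors),
        "SMAPE": mean(sym_errors),
    }


metrics = holdback_metrics(
    history=[10.0, 12.0, 11.0, 13.0],
    actuals=[100.0, 200.0],
    forecasts=[110.0, 180.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only the MASE looks at the history; the other four metrics depend solely on the held back actual values and their forecasts.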

 

               © 2010-2018 John Eldreth All rights reserved.   Contact Us At admin@smoothforecast.com