Welcome, All Over Albany!
This post contains more information about my quantitative analysis of the accuracy of several Albany weather bureaux, posted at All Over Albany: Who has the best weather forecast?. All of the major conclusions are in that article, so if you haven't read it yet, you might want to read it before digesting this one.
Anything worth doing is worth overdoing, so I figured that in addition to the analysis provided at AOA, I’d include some more data here.
Let’s start with a little more about why I did what I did. As previously noted, most forecasts were picked up between 6:00pm and 8:00pm. Gaps in the forecasts are due to occasional blips in the station’s reporting (occasionally, an overnight low was left off). Data for Fox on April 9 was missing because I hadn’t yet learned I needed to watch their video forecast to get to the seven-day graphic.
It was virtually impossible to consistently determine the forecast conditions from the text of the written forecast alone, so I used the symbols in the weather graphic instead. Each station had its own symbology that took a bit to translate, but eventually I settled on a standard vocabulary that matched each station's text forecasts. The Weather Channel added "Mostly Sunny" and "Partly Sunny" to its forecasts; I treated these as "Partly Cloudy" and "Mostly Cloudy" respectively.
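The translation step above can be sketched as a simple lookup table. Only the two Weather Channel substitutions come from the post; the rest of the mapping, and the function name, are illustrative assumptions about what a normalized vocabulary might look like:

```python
# Illustrative mapping of station-specific condition labels to a standard
# vocabulary. Only the "Mostly Sunny" and "Partly Sunny" entries are from
# the post; the others are assumed identity mappings.
CONDITION_MAP = {
    "Sunny": "Sunny",
    "Mostly Sunny": "Partly Cloudy",   # per the post
    "Partly Sunny": "Mostly Cloudy",   # per the post
    "Partly Cloudy": "Partly Cloudy",
    "Mostly Cloudy": "Mostly Cloudy",
    "Cloudy": "Cloudy",
}

def normalize(label: str) -> str:
    """Translate a station's label into the standard vocabulary,
    passing through any label without a known translation."""
    return CONDITION_MAP.get(label, label)
```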
I realized about halfway through the experiment that I should have been saving the long-range forecast graphics each day. That would not have been much more work and would have allowed for double-checking later on.
In the article on AOA, I chose to focus on the differences between the sources, not the magnitude of the difference between each source and the benchmark. The reason is that temperatures are difficult to measure accurately. I chose the Albany airport as the "home" location because it (by definition, being an airport) is located in a field and doesn't experience temperature differences due to terrain, such as valleys and foothills. However, it's not in Albany city proper. Cities radiate some heat by virtue of the homes therein and the activities that take place. One could measure the temperature at five places in a city, using the same thermometer, and get results that vary by as much as 5-7°F.
One possible way to create a more stable benchmark would have been to take the temperatures from five or ten sites across the city and average them. That would provide something closer to the "correct" value of the temperature. However, I elected not to do this, and instead to compare the forecast sources against one another rather than measure their absolute accuracy.
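The averaging idea is simple arithmetic. A minimal sketch, with hypothetical readings (the post used no such data):

```python
def benchmark_temperature(readings):
    """Average same-day readings (in °F) from several sites across the
    city into one benchmark value."""
    return sum(readings) / len(readings)

# Five hypothetical same-day readings across the city:
benchmark_temperature([61.0, 63.5, 58.0, 62.0, 60.5])  # -> 61.0
```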
Also, the Weather Underground's historical data tends to over-represent rain, because the site considers a day to be "rainy" if more than approximately 0.05″ of precipitation falls. I was initially worried about this, but then considered the purpose of this exercise. Consumers of weather forecasts, when looking at conditions, are often interested in the question, "Will it rain or not?" and want a "yes" or "no". Forcing the forecasters to say "yes" or "no" to rain therefore seems appropriate here. If a station says it won't rain and more than a few drops fall the next afternoon, that should count as an incorrect forecast.
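That thresholding rule reduces to a one-line test. A sketch, assuming the roughly 0.05″ cutoff described above:

```python
# Approximate cutoff the post attributes to Weather Underground; treat
# any day with more than this much precipitation as "rainy".
RAIN_THRESHOLD_IN = 0.05

def is_rainy(precip_inches: float) -> bool:
    """Classify a day as rainy or not from its precipitation total."""
    return precip_inches > RAIN_THRESHOLD_IN

is_rainy(0.03)  # -> False (a few drops don't count)
is_rainy(0.10)  # -> True
```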
Some stations provided probabilities of rain instead of making a definitive call. If the station put raindrops on its seven-day graphic, I took that as a forecast of "rain". Stepping out of the objective scientific mindset for a minute (and into my cynical self), I believe they're only trying to hedge their bets so as never to be wrong. Some stations (in particular NewsChannel 13) rarely called rain definitively without including a probability.
One thing that bothered me about this experiment was that the scoring system I designed for temperatures (one point per degree of difference) was independent of the direction of the variance. A source that ran a constant 5 degrees low would score the same as a source that alternated 5 degrees above and below the proper temperature. A signed average, however, might give some indication of who is off in one direction all the time and who's circling around the target. I did perform a signed average in the accompanying spreadsheet. However, we can't glean a whole lot from this.
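The distinction between the two averages is easiest to see side by side. A minimal sketch (the function names and sample series are mine, not from the spreadsheet):

```python
def mean_absolute_error(forecast, actual):
    """One point per degree off, regardless of direction --
    the unsigned scoring used in the experiment."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def mean_signed_error(forecast, actual):
    """Positive means the source runs warm; a value near zero
    can hide large errors that oscillate around the target."""
    return sum(f - a for f, a in zip(forecast, actual)) / len(actual)

actual = [60, 60, 60, 60]
always_low = [55, 55, 55, 55]   # consistently 5 degrees under
oscillating = [55, 65, 55, 65]  # alternates 5 under and 5 over

mean_absolute_error(always_low, actual)   # -> 5.0, same score...
mean_absolute_error(oscillating, actual)  # -> 5.0
mean_signed_error(always_low, actual)     # -> -5.0, ...but the bias differs
mean_signed_error(oscillating, actual)    # -> 0.0
```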
We can’t tell from the data whether the results are from legitimate misses or from the difference between the temperature at the station and the benchmark. A particular station’s weather equipment might be located in a region that is a few degrees warmer or cooler than Albany Airport. In that case, we would expect a non-zero temperature difference.
CBS 6’s forecasts were interesting in that, especially in the long-term, Steve LaPointe’s team would often not change its forecast from day to day. In the spreadsheet, go to the "CBS6" worksheet and find the large diagonal swath of forecasts (cells D4:AN34). Reading down each column (especially columns O and P, the forecasts for April 20 and 21), the same high and low temperatures are predicted 2-4 days straight before being updated. It seems that regardless of how models are updated, CBS 6 is the only source of the six studied that does not change the forecast unless there is a compelling reason to do so. Apparently, daily variations in models’ predictions don’t cause them to change their forecast to match. CBS 6 did not place in any of the categories enumerated in the AOA article.
One notable absence from the experiment is error analysis. Since I was unaware of each site’s techniques for determining the forecast, I could not determine the appropriate error bars. I would estimate (entirely nonscientifically) that appropriate error bars for temperature would be approximately 0.5°F and for conditions, 0.25 points on the six-point scale.
I’d love to re-run this study sometime with a bit more scientific rigor. I would also love to include a few other forecasts, including the NOAA “official” government weather forecast and the daily forecast of Professor Mike Landin of the Department of Earth and Atmospheric Sciences at the University at Albany, broadcast locally on WAMC. Ironing out a more stable benchmark through averaging and including a few more sources would make for a classic experiment.
I would also, if feasible, interview each of the forecasting teams to learn if there is any special consideration they make while forecasting. Do they tweak the forecasts that their computer models generate? Does it help to have experience in the Albany area, or could a weather forecaster from Las Vegas come to Albany and do a bang-up job?
Do you have any questions or comments on my analysis? Please post on the AOA thread or below.
Thanks for reading, and many thanks to Mary and Greg at All Over Albany for the idea for this project, and for providing a forum for publication.