Does the Pirates spring training record mean anything?


Andrew McCutchen cut his hair, Starling Marte is tearing the cover off the ball, and Jung Ho Kang has looked like a AA player since his home run heard ’round the world in the Pirates spring training universe. All of that has led the Pirates to a 10-9 record in exhibition play thus far.

But does the record matter in the slightest? From the time spring training games start to the time they end, all you hear is statistics and analysis, followed by the justification that “spring training doesn’t matter.” I’ve always bought into that. In fact, since the NCAA Tournament started, I haven’t been paying any attention to Pirates games, and I must say I don’t miss them at all.


I’m the kind of guy who doesn’t take theories as truth unless I see proof, so I wanted to do at least something small to test the theory that spring training team records don’t mean anything.

So I took every team’s spring training winning percentage for every year since 2009 and compared it to their regular season winning percentage for the same year.

Let’s take a look at the Pirates numbers in my study:


So I have one of these charts for every team in the league. The first thing I looked for in the data was the average difference in winning percentage. I found the difference between every team’s preseason and regular season winning percentage for each individual year, listed them all, and made them all positive numbers (the absolute value), since for this purpose it doesn’t matter whether a team improved or got worse. Then I averaged all 191 numbers.

That result came out to 0.08705, which is equivalent to about a 14 win difference over 162 games (0.08705 × 162 ≈ 14.1). That number is huge, and suggests that spring training records tell you very little about what a team’s regular season record will end up being.
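For anyone who wants to try the same calculation without a spreadsheet, here is a rough sketch in Python. The three pairs of winning percentages are made-up placeholders, not numbers from my study, and the function names are just ones I picked for illustration.

```python
# Hypothetical (spring training, regular season) winning percentages.
# These are placeholder values, NOT the real data from the study.
records = [
    (0.500, 0.400),
    (0.600, 0.550),
    (0.450, 0.575),
]


def mean_abs_diff(pairs):
    """Average of |spring - regular| across all team-seasons."""
    return sum(abs(spring - regular) for spring, regular in pairs) / len(pairs)


def to_wins(pct_diff, games=162):
    """Convert a winning-percentage gap into wins over a full season."""
    return pct_diff * games


if __name__ == "__main__":
    gap = mean_abs_diff(records)
    print(f"average gap: {gap:.5f}, or about {to_wins(gap):.1f} wins")
```

Plugging the study's 0.08705 into `to_wins` gives the win difference over a 162 game season.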

After that I got a bit more statistical with it and found the correlation coefficient between all preseason and regular season records. For you non math majors out there, here’s an explanation. Basically, the correlation coefficient is a figure that measures the correlation between a series of ‘x’ and ‘y’ variables. So what I ended up with was a long, long list of numbers separated into two columns, column A being the ‘x’ variable, which was the list of preseason winning percentages, and column B being the ‘y’ variable, which was the list of regular season winning percentages. It looks similar to the list above, except 30 times longer.

When calculating the correlation coefficient, the result will always be somewhere between -1 and +1. A correlation coefficient of +1 indicates a perfect positive correlation. As variable X (in this case, spring training winning percentage) increases, variable Y (regular season winning percentage) increases. As variable X decreases, variable Y decreases. On the flip side, a correlation coefficient of -1 indicates a perfect negative correlation. As X increases, Y decreases, and vice versa. A coefficient of 0 represents no linear correlation at all, meaning X tells you nothing about where Y will land.

So what was the correlation coefficient of spring training win percentage and regular season win percentage? 0.269. While they are not COMPLETELY random, they are much closer to being random than to being correlated.
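If you'd rather see the math than trust a spreadsheet, here is a minimal sketch of the standard (Pearson) correlation coefficient in Python, the same quantity that Excel's CORREL function returns. The function name and the sample lists are my own placeholders for illustration, not the data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two columns move together, and how much each moves on its own.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Placeholder columns: x = spring win pct, y = regular season win pct.
    spring = [0.500, 0.600, 0.450, 0.550]
    regular = [0.400, 0.550, 0.575, 0.500]
    print(f"r = {pearson_r(spring, regular):.3f}")
```

One handy follow-up: squaring the coefficient gives the share of the variation in Y that lines up with X. For 0.269 that works out to roughly 0.07, or about 7 percent.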

True statisticians will say that six years of data isn’t enough, which is true, but typing out six years of data in an Excel spreadsheet was more than enough for me, and I think my point still stands.

So next time somebody says that a team is looking like they’ll have a great regular season record because they played well in spring, you can feel more confident about telling them kindly to shut up.
