Moneyball: Why no prediction can be made for the baseball champion

Friday, August 4, 2017

How can you not be romantic about baseball?
Billy Beane

Last week, we discussed how Billy Beane and Paul DePodesta used linear regression to predict the conditions the Oakland A’s needed to meet to reach the 2002 playoffs. The A’s did make the playoffs, but they did not go on to win the World Series. Billy Beane explained this by claiming that sabermetrics can’t be used to predict baseball champions. Today, we’ll try to analyze why no prediction can be made for the winner of the World Series.

We’ll try to make a prediction using logistic regression in R, with the same dataset, baseball.csv, that we used last week.
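Logistic regression needs a binary outcome, so the first step is to keep only the playoff teams and flag the champion. The sketch below is one possible way to do that. It assumes the data has a Playoffs column (1 = made the playoffs) and a RankPlayoffs column (1 = won the World Series); neither column is named in this post, and the helper prepare_playoff_data and the toy rows are made up for illustration.

```r
# Keep only playoff teams and flag the champion as the binary outcome.
# Assumed columns (not named in the post): Playoffs (1 = made the playoffs)
# and RankPlayoffs (1 = won the World Series).
prepare_playoff_data <- function(baseball) {
  playoff_teams <- baseball[which(baseball$Playoffs == 1), ]
  playoff_teams$WorldSeries <- as.numeric(playoff_teams$RankPlayoffs == 1)
  playoff_teams
}

# Made-up rows for illustration only; with the real file one would start from
# baseball <- read.csv("baseball.csv")
toy <- data.frame(Team = c("A", "B", "C", "D"),
                  Year = c(2000, 2000, 2000, 2000),
                  Playoffs = c(1, 1, 1, 0),
                  RankPlayoffs = c(1, 2, 3, NA))
prepared <- prepare_playoff_data(toy)
print(prepared)
```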

Note that it’s much harder to win the World Series when 10 teams are competing for the championship than when there are just two. Therefore, we will add the predictor variable NumCompetitors to the baseball data frame. For each team/year pair, it holds the total number of teams that made the playoffs in that year.
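A minimal sketch of how NumCompetitors could be computed, assuming the data frame already holds only playoff teams and has a Year column. The helper name add_num_competitors and the toy rows are hypothetical.

```r
# Count playoff teams per year and attach that count to every team/year row.
# Assumes `playoff_teams` contains only teams that made the playoffs.
add_num_competitors <- function(playoff_teams) {
  teams_per_year <- table(playoff_teams$Year)
  playoff_teams$NumCompetitors <-
    as.integer(teams_per_year[as.character(playoff_teams$Year)])
  playoff_teams
}

# Made-up rows for illustration only (not real results).
toy <- data.frame(Team = c("A", "B", "C", "D", "E", "F"),
                  Year = c(1990, 1990, 2000, 2000, 2000, 2000))
toy <- add_num_competitors(toy)
print(toy)
```

Every row from 1990 gets the value 2 and every row from 2000 gets the value 4, because those are the team counts in the toy data.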

When we’re not sure which of our variables are useful in predicting a particular outcome, it’s often helpful to build bivariate models, which are models that predict the outcome using a single independent variable.
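One way to fit the bivariate models in a single pass is sketched below. The helper fit_bivariate is hypothetical, and the data is simulated only so that the example runs on its own; with the real data one would pass the playoff data frame, the name of the binary outcome, and the candidate predictor names.

```r
# Fit one single-predictor logistic regression per candidate variable and
# collect each slope's p-value together with the model's AIC.
fit_bivariate <- function(df, outcome, predictors) {
  rows <- lapply(predictors, function(p) {
    model <- glm(reformulate(p, response = outcome),
                 data = df, family = binomial)
    coefs <- summary(model)$coefficients
    data.frame(predictor = p,
               p_value = coefs[p, "Pr(>|z|)"],
               AIC = AIC(model))
  })
  do.call(rbind, rows)
}

# Simulated data, for illustration only: y depends on x1 but not on x2.
set.seed(1)
n <- 200
synthetic <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
synthetic$y <- rbinom(n, 1, plogis(-1 + 1.5 * synthetic$x1))

result <- fit_bivariate(synthetic, "y", c("x1", "x2"))
print(result)
```

A predictor is usually called significant when its p-value falls below 0.05, which is the threshold assumed in this sketch.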

After analyzing the bivariate models, we found that Year, RA, RankSeason and NumCompetitors are significant.
Let’s build a regression model using these four variables.

Oops, none of the variables is significant in the multivariate model!
This may be due to correlation between the variables.

The correlation matrix for these four variables indicates that Year and NumCompetitors are highly correlated (0.91).
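A check like this can be sketched with cor(). The helper correlated_pairs, the 0.8 threshold, and the toy columns a, b and c are assumptions made for illustration; with the real data one would pass c("Year", "RA", "RankSeason", "NumCompetitors") as the predictors.

```r
# List predictor pairs whose absolute Pearson correlation exceeds a threshold.
correlated_pairs <- function(df, predictors, threshold = 0.8) {
  corr <- cor(df[, predictors])
  # Look only above the diagonal so each pair is reported once.
  idx <- which(upper.tri(corr) & abs(corr) > threshold, arr.ind = TRUE)
  data.frame(var1 = rownames(corr)[idx[, "row"]],
             var2 = colnames(corr)[idx[, "col"]],
             correlation = corr[idx])
}

# Made-up columns for illustration: b rises almost in step with a,
# while c is only weakly related to either.
toy <- data.frame(a = 1:8,
                  b = c(2, 4, 5, 8, 10, 13, 14, 16),
                  c = c(5, 1, 4, 2, 5, 1, 4, 2))
found <- correlated_pairs(toy, c("a", "b", "c"))
print(found)
```

When two predictors carry nearly the same information, the model cannot separate their effects, which inflates the standard errors and can make both look insignificant.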

Now, let’s try building two-variable models.

None of the models with two independent variables has both variables significant, so none seems more promising than a simple bivariate model. Indeed, the model with the lowest AIC value is the one with just NumCompetitors as the independent variable.
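The comparison described above can be sketched as follows. The helper rank_models_by_aic is hypothetical, the 0.05 cut-off is an assumption, and the data is simulated only so that the example runs on its own.

```r
# Fit every one- and two-variable logistic model and rank them by AIC
# (lower is better). Also record whether every slope has p < 0.05.
rank_models_by_aic <- function(df, outcome, predictors) {
  singles <- as.list(predictors)
  doubles <- combn(predictors, 2, simplify = FALSE)
  rows <- lapply(c(singles, doubles), function(vars) {
    model <- glm(reformulate(vars, response = outcome),
                 data = df, family = binomial)
    p_values <- summary(model)$coefficients[vars, "Pr(>|z|)"]
    data.frame(model = paste(vars, collapse = " + "),
               AIC = AIC(model),
               all_significant = all(p_values < 0.05))
  })
  ranked <- do.call(rbind, rows)
  ranked[order(ranked$AIC), ]
}

# Simulated data, for illustration only: x2 is a noisy copy of x1.
set.seed(42)
n <- 300
synthetic <- data.frame(x1 = rnorm(n), x3 = rnorm(n))
synthetic$x2 <- synthetic$x1 + rnorm(n, sd = 0.3)
synthetic$y <- rbinom(n, 1, plogis(0.8 * synthetic$x1))

ranking <- rank_models_by_aic(synthetic, "y", c("x1", "x2", "x3"))
print(ranking)
```

AIC penalizes each extra parameter, so a two-variable model ranks above a single-variable one only when the second variable improves the fit enough to offset that penalty.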

This seems to support the claim made by Billy Beane in Moneyball that all that matters in the playoffs is luck, since NumCompetitors has nothing to do with the quality of the teams!
Hence, no prediction can be made for the baseball champion.