Rethinking BABIP

Published on March 20, 2012

Batting average on balls in play (BABIP) is a statistic that measures the percentage of plate appearances ending with a batted ball in play (excluding home runs) for which the batter is credited with a hit. BABIP is commonly used as a red flag in statistical analysis, because a consistently high or low BABIP is viewed as hard to maintain. It is therefore used to spot lucky and unlucky seasons: pitchers whose BABIPs against are extremely high can often be expected to improve in the following season, and pitchers whose BABIPs against are extremely low can often be expected to regress, with the reverse holding for hitters. The equation for BABIP is (hits minus home runs) divided by (at bats minus strikeouts minus home runs plus sacrifice flies).
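For readers who want to compute the statistic themselves, here is a minimal Python sketch of the calculation. It uses the standard definition, in which sacrifice flies are added to the denominator because they are balls in play that are not counted as at bats. The function name and the sample season line are made up for illustration and do not come from any real player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, invented purely for illustration.
example = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5)
print(round(example, 3))  # 130 / 385 -> 0.338
```

Strikeouts and home runs come out of the denominator because neither puts a ball in play for the defense to field.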

CURRENT USE OF THE BABIP STATISTIC
It is no secret that BABIP has become one of the biggest baseball statistics crazes of the past decade. Statisticians argue that there is a large element of randomness in whether a batted ball that is not a home run ends up a hit or an out. They cite the large season-to-season fluctuation of BABIPs against pitchers as the strongest evidence for their BABIP theory. Almost all discussions surrounding BABIP use phrases such as "player X was an unlucky victim of a poor BABIP year" or "player Y currently has a BABIP off the charts, so expect him to cool off because his current averages are lucky." The argument proposes that every player (with slight variations) should expect a BABIP around the major league average of .300, regardless of talent. If a player is well above this mark, he is lucky; if he is well below, he is unlucky. Players with a high or low BABIP are expected to migrate back to the mean over time.
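To put a rough size on the randomness this theory relies on, the sketch below treats every ball in play as an independent coin flip with a .300 chance of becoming a hit. That is the luck-only view described above, not a claim about how baseball actually works. The figure of 500 balls in play per season and the function names are assumptions chosen for illustration.

```python
import math
import random


def babip_std_dev(true_rate, balls_in_play):
    """Standard deviation of observed BABIP if each ball in play is an
    independent trial with the same hit probability (binomial model)."""
    return math.sqrt(true_rate * (1 - true_rate) / balls_in_play)


def simulate_seasons(true_rate, balls_in_play, seasons, seed=0):
    """Simulate observed BABIP for many luck-only seasons."""
    rng = random.Random(seed)
    results = []
    for _ in range(seasons):
        hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_rate)
        results.append(hits / balls_in_play)
    return results


# Assumed inputs: a .300 true rate and 500 balls in play in a season.
spread = babip_std_dev(0.300, 500)
print(round(spread, 4))  # 0.0205, i.e. about 20 points of BABIP

seasons = simulate_seasons(0.300, 500, seasons=1000)
print(min(seasons), max(seasons))
```

Under these assumptions, a single season can land 20 or more points away from .300 on chance alone. That is why fluctuation by itself cannot separate luck from a real change in performance.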

Many in the baseball industry have been skeptical of the stock that sportswriters and baseball management seem to put in the randomness of BABIP. Any casual baseball fan understands that some hard-hit balls end up being outs and some bloopers end up as hits, but, to the naked eye, there also seems to be a direct relationship between how hard a ball is hit and the odds of it becoming a base hit. Baseball players are human beings, not machines, so it stands to reason that players would have some good performance years or outings as well as some bad ones. When a player's BABIP fluctuates, it does not necessarily mean his luck is migrating to or from the mean; he may instead be in a legitimate slump or on a legitimate hot streak in which his performance is actually changing. If this is the case, a bad BABIP year should not necessarily be considered randomly unlucky, and conversely a good BABIP year should not be considered randomly lucky.

Side Note: MLB has no way of really proving the random BABIP theory quantitatively, because in order to do so they would need to be able to measure the force at which a ball travels off the bat. They are just guessing that BABIP is random on the basis of two things: 1. When you watch a game, some bloopers drop for hits and some liners go for outs, and 2. There is plenty of data showing that players' BABIPs fluctuate from year to year. There is no way to quantify all the factors that go into a player's changing performance, such as a lingering injury, strength gains, a decline in confidence, or a new batting stance, so experts conveniently label whatever they cannot account for as randomness, or in other words, luck.

NEW NCAA BAT RULE
This past season of college baseball has provided an excellent natural experiment on the effect of how hard a ball is hit on the BABIP statistic. Just prior to the 2011 season, the National Collegiate Athletic Association (NCAA) changed the regulations for bats specifically to limit the speed and force at which a batted ball can travel. "The organization (NCAA) wants to make sure the power produced by the bat-ball contact is no greater than that produced by wood," - NCAA.

With the bat change in college baseball this year, the potential exists to compare a full year's worth of data on the effect of balls being hit less hard on overall BABIP. This data should go a long way toward determining whether how hard a ball (that is not a home run) is hit has an effect on whether or not it goes for a base hit. If BABIP were a truly random stat, then overall BABIP for the NCAA should have remained similar to past years, because BABIP would not depend on how hard balls are hit. If, however, BABIP does depend on how hard balls are hit, then BABIP across the NCAA should have dipped significantly due to balls being hit less hard.

NCAA BASEBALL BABIP STATISTICS
I have compiled BABIP data for NCAA Divisions I, II and III for the past four years. 2008 was the first year that records were kept for each of the statistics necessary to calculate BABIP (i.e., at bats, hits, home runs, strikeouts, and sacrifice flies), and 2009 was the first such year for DIII. BABIP increased slightly each season from 2008 to 2010, which, among other things, can be attributed to constant minor improvements in bat technology. However, in 2011, the year of the less powerful bats, BABIP dropped significantly in each division and reached its lowest level on record in each division. From 2010 to 2011 the drop for DI was .018, for DII .014 and for DIII .019.
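One way to check whether a drop of this size is larger than random variation would produce is a two-proportion z-test on the hit and ball-in-play totals for the two seasons. The sketch below is only an illustration of that check and is not the method used to compile the numbers above. The counts passed in are placeholders invented so that the rates differ by .018; they are not the actual NCAA totals, which would need to be substituted.

```python
import math


def two_proportion_z(hits_a, bip_a, hits_b, bip_b):
    """z statistic for the difference between two BABIPs, pooling both
    seasons to estimate the rate under the 'no real change' hypothesis."""
    rate_a = hits_a / bip_a
    rate_b = hits_b / bip_b
    pooled = (hits_a + hits_b) / (bip_a + bip_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / bip_a + 1 / bip_b))
    return (rate_a - rate_b) / std_err


def one_sided_p_value(z):
    """Probability of a z statistic at least this large under a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# PLACEHOLDER counts (non-home-run hits, balls in play), invented for illustration.
# Season A: 34,000 hits on 100,000 balls in play (.340)
# Season B: 32,200 hits on 100,000 balls in play (.322)
z = two_proportion_z(34_000, 100_000, 32_200, 100_000)
print(z, one_sided_p_value(z))
```

The more balls in play behind each season's figure, the smaller the drop that can be distinguished from chance. A division-wide total is therefore a far stricter test than any single player's season.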

APPLICATION OF THE FINDINGS
There are more advanced ways of critiquing BABIP that show it can be highly dependent on factors such as the speed of the batter and the skill or positioning of the defense, but this data gets straight at the core of the BABIP theory. The data suggests simply that harder-hit balls are more likely to end up as base hits. This certainly does not prove that randomness and luck are not involved in whether a batted ball goes for a base hit. It does suggest, however, that the current theory on the randomness of BABIP may be exaggerated and misused.

Most good fantasy baseball managers take a comprehensive look at all the information available to them when making player decisions, and there are probably very few currently using BABIP as their one and only perfect statistic. These findings merely suggest that managers should not be reading into BABIP the things they currently may be reading into it. For instance, it may not be wise to draft player X, or to give him an extra-long leash, simply because he has a very low BABIP and you expect his "unlucky" BABIP to eventually rise back to normal. It is far more likely that player X just isn't getting good wood on the ball very often, which has little to do with luck or randomness.