Wednesday, February 6, 2008

Run Residual

In the "Bill James Handbook 2008," Bill James projected John Maine to have a 12-11 record with a 4.05 ERA, and Oliver Perez to have a 9-12 record with a 4.69 ERA. After watching these two pitchers emerge into top-of-the-rotation starters over the past two years, I found it hard to believe that they would each take such a significant step backwards next year. The only explanation I could come up with for this bold prediction was that Bill James had found something indicating that Maine and Perez over-achieved last year. I personally felt that they were both due for another step forward next year, but the god of sabermetrics disagreed with me, so I needed to back up my hypothesis statistically.

The ultimate goal of baseball is to score the most runs, and in order to score a run you need to gain four bases. Under this principle, each base is worth 1/4 of a run. To confirm this number, I took a simple random sample (SRS) of 30 major league pitchers and divided their Runs Allowed by Total Bases allowed (1B + 2(2B) + 3(3B) + 4(HR) + BB + HBP). I found the mean ratio to be .248, which is basically 1/4. I then devised a regression formula, R = TB/4. This formula gives you a predicted number of runs allowed, which allows you to find the residual value: observed (actual) runs minus predicted runs. A positive residual means that the pitcher was actually a victim of bad luck and pitched better than his statistics indicated, and a negative residual means that the pitcher over-achieved. The r-squared value of this regression was .74, meaning that 74% of the variation in runs allowed can be accounted for by this regression. I believe this value, however, would increase with a larger sample size.
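To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. The function names and the sample stat line are my own assumptions for illustration and do not belong to any real pitcher; only the Total Bases formula and R = TB/4 come from the post.

```python
def total_bases_allowed(singles, doubles, triples, home_runs, walks, hit_by_pitch):
    """Total Bases allowed: 1B + 2(2B) + 3(3B) + 4(HR) + BB + HBP."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs + walks + hit_by_pitch


def predicted_runs(total_bases):
    """Predicted runs allowed under the rule that each base is worth 1/4 of a run."""
    return total_bases / 4


def run_residual(runs_allowed, total_bases):
    """Observed (actual) runs minus predicted runs."""
    return runs_allowed - predicted_runs(total_bases)


# Hypothetical stat line, made up purely for illustration.
tb = total_bases_allowed(singles=100, doubles=30, triples=2,
                         home_runs=20, walks=50, hit_by_pitch=5)
print(tb)                    # 301 total bases
print(predicted_runs(tb))    # 75.25 predicted runs
print(run_residual(80, tb))  # 4.75, i.e. more runs allowed than predicted
```

With this made-up line, a pitcher who actually allowed 80 runs would carry a residual of +4.75, which under the reading above counts as mildly unlucky.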

While researching this stat further, I found two pretty significant outliers. In 2007 Brad Penny gave up only 75 runs, but according to the regression formula he should have given up 88.25, a residual of -13.25. In 2006 and 2005, however, he had run residuals of +1.25 and +4.5, meaning that his stats reflected the way he actually pitched in each of those years. The other outlier was Roy Oswalt, who had a +17.75 run residual in 2007. Unlike Penny, though, this was a trend for Oswalt, who had a +19.25 residual value in each of the prior two years. I found that over the three years (2007, 2006, and 2005) opposing batters had a BABIP of .285, .262, and .272, and an OPS of .637, .619, and .734, respectively. This indicated to me that Roy Oswalt was an unbelievably gifted pitcher when it came to stranding runners on base.

So now we finally get to the focus of this post, John Maine and Oliver Perez. Last year Maine gave up 90 runs, and the regression formula predicted that he would give up 90.25, a residual of -0.25, meaning that Maine's statistics last year reflected his overall performance. His BABIP for last year was .281, which again means he was neither lucky nor unlucky. Perez also gave up 90 runs last year, but his predicted run total was only 86, giving him a residual of +4. This actually means that Ollie pitched better last year than his stats indicate, and his BABIP of .273 also reflects this belief.
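The residuals quoted in this post can be checked with a few lines of Python. The actual and predicted run totals below are the ones given in the text; the implied Total Bases figures are my own back-calculation from R = TB/4, not numbers looked up from a stat sheet.

```python
# (actual runs allowed, predicted runs) for 2007, as quoted in the post.
pitchers = {
    "Brad Penny": (75, 88.25),
    "John Maine": (90, 90.25),
    "Oliver Perez": (90, 86.0),
}


def residual(actual, predicted):
    """Observed runs minus predicted runs."""
    return actual - predicted


def implied_total_bases(predicted):
    """Invert R = TB/4 to recover the Total Bases behind a predicted run total."""
    return predicted * 4


# Compute the residual for each pitcher and print it with the implied TB.
residuals = {name: residual(actual, predicted)
             for name, (actual, predicted) in pitchers.items()}

for name, (actual, predicted) in pitchers.items():
    print(f"{name}: residual {residuals[name]:+.2f}, "
          f"implied TB {implied_total_bases(predicted):.0f}")
```

This reproduces -13.25 for Penny, -0.25 for Maine, and +4.00 for Perez, and shows how close Maine and Perez sit to zero compared with Penny.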

Nothing I have found or calculated indicates that either of these pitchers is due for a decline in performance next year. It seems more logical to believe that experience will only help these two young pitchers continue to improve as they have over the past two years. I have the utmost respect for Bill James and his sabermetric statistics, but in this case I think he is well off in his prediction, and I find it hard to imagine either of these two pitchers regressing next year.
