## Quickie: MLB Playoffs by Pitching Statistics

*February 23, 2010*

*Posted by tomflesher in Baseball.*

Tags: Baseball, OLS, playoffs, probit, regression

It’s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I’m trying to stay warm by turning up my hot stove the way only an economist can – crunching the numbers on playoffs.

I’m re-using the dataset from my Cy Young Predictor a few entries ago in the interest of parsimony. It contains dummy variables *teamdivwin* and *teamwildcard*, which take the value 1 if the pitcher’s team won its division or the wildcard, respectively. I then created a variable *playoffs* equal to the sum of *teamdivwin* and *teamwildcard*. Because a team can’t both win its division and take the wildcard, that sum is always 0 or 1, so *playoffs* is just a playoff dummy variable.
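As a minimal sketch of that construction, assuming each pitcher-season is stored as a Python dict (the post doesn’t show the actual data layout, so the rows and the `add_playoffs` helper below are hypothetical), the playoff dummy is just the sum of the two team dummies, with a check that no row has both set:

```python
def add_playoffs(rows):
    """Set playoffs = teamdivwin + teamwildcard on each row (a dict).

    A team cannot win both its division and the wildcard, so a valid sum
    is always 0 or 1; anything else signals a miscoded row.
    """
    for row in rows:
        total = row["teamdivwin"] + row["teamwildcard"]
        if total not in (0, 1):
            raise ValueError(f"both team dummies set: {row}")
        row["playoffs"] = total
    return rows


# Hypothetical rows; the real dataset's layout is not shown in the post.
pitchers = [
    {"name": "A", "teamdivwin": 1, "teamwildcard": 0},
    {"name": "B", "teamdivwin": 0, "teamwildcard": 1},
    {"name": "C", "teamdivwin": 0, "teamwildcard": 0},
]
add_playoffs(pitchers)
print([p["playoffs"] for p in pitchers])  # [1, 1, 0]
```

Raising on a sum of 2 is a cheap sanity check: if it ever fires, one of the source dummies is coded wrong.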

Using a probit model and a standard OLS regression model, I estimated the effects of individual pitching stats on *playoffs*. Neither model has much predictive power (the linear model’s R-squared is only about 0.05), which is unsurprising since neither takes the team’s batting into account at all. None of the coefficient values are shocking. In the American League (coded as *Lg* = 1), teams have a higher probability of making the playoffs because there are fewer teams. And although complete games appear to have a negative effect, the positive shutout effect more than makes up for that in both models: a shutout almost always counts as a complete game too, and the shutout coefficient is larger than the complete-game penalty. I’m interested in whether complete game wins and complete game losses have differential effects; that will probably be my next snowy-day project.
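To make the shutout arithmetic concrete, here is a small sketch using the coefficients reported in the output below. It assumes one extra complete-game shutout raises both *SHO* and *CG* by 1, so the net effect is the sum of the two coefficients. Note the units differ: the OLS number is a change in probability, while the probit number is a change in the latent index, which still has to go through the normal CDF to become a probability.

```python
# SHO and CG coefficients copied from the regression output in this post.
PROBIT = {"SHO": 0.187091, "CG": -0.140882}
OLS = {"SHO": 0.058328, "CG": -0.040513}


def cg_shutout_effect(coefs):
    """Net change from one more complete-game shutout (SHO and CG both +1)."""
    return coefs["SHO"] + coefs["CG"]


print(round(cg_shutout_effect(PROBIT), 6))  # 0.046209 (probit index units)
print(round(cg_shutout_effect(OLS), 6))     # 0.017815 (probability units)
```

Both sums are positive, which is the sense in which the shutout effect “more than makes up for” the complete-game penalty. Keep in mind that *SHO* itself is only marginally significant in both models (p ≈ 0.08 and 0.06).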


Results:

```
Call:
glm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV +
    Lg + R, family = binomial(link = "probit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8444  -0.7356  -0.6261  -0.3803   2.4768

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.756627   0.046176 -16.386  < 2e-16 ***
W              0.123523   0.011183  11.046  < 2e-16 ***
SHO            0.187091   0.107494   1.740 0.081774 .
CG            -0.140882   0.060472  -2.330 0.019822 *
weightedsaves -0.076265   0.020332  -3.751 0.000176 ***
SV             0.097770   0.025446   3.842 0.000122 ***
Lg             0.190521   0.050481   3.774 0.000161 ***
R             -0.015532   0.001556  -9.985  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3423.4  on 3221  degrees of freedom
Residual deviance: 3251.9  on 3214  degrees of freedom
AIC: 3267.9

Number of Fisher Scoring iterations: 4
```
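One way to read the probit estimates is to turn them into a predicted playoff probability, P(playoffs = 1) = Φ(intercept + Σ βₖxₖ). The sketch below does that in plain Python with the normal CDF written through `math.erf`. The pitcher line is hypothetical, and treating *R* as runs allowed is an assumption, since the post doesn’t define it. The last two lines check the reported AIC: for a 0/1 outcome the deviance equals −2 × log-likelihood, so AIC is the residual deviance plus twice the number of estimated coefficients.

```python
import math

# Probit estimates copied from the glm() output in this post.
PROBIT_COEFS = {
    "(Intercept)": -0.756627,
    "W": 0.123523,
    "SHO": 0.187091,
    "CG": -0.140882,
    "weightedsaves": -0.076265,
    "SV": 0.097770,
    "Lg": 0.190521,
    "R": -0.015532,
}


def normal_cdf(x):
    """Standard normal CDF Phi(x), expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_playoff_prob(stats, coefs=PROBIT_COEFS):
    """P(playoffs = 1) = Phi(intercept + sum of beta_k * x_k).

    Any regressor missing from `stats` is treated as 0.
    """
    index = coefs["(Intercept)"]
    for name, value in stats.items():
        index += coefs[name] * value
    return normal_cdf(index)


# Hypothetical AL starter (not a real pitcher); R assumed to be runs allowed.
starter = {"W": 15, "SHO": 1, "CG": 3, "weightedsaves": 0, "SV": 0, "Lg": 1, "R": 80}
print(round(probit_playoff_prob(starter), 3))  # about 0.424

# For 0/1 outcomes, deviance = -2 * log-likelihood, so AIC = deviance + 2k.
k = len(PROBIT_COEFS)  # 8 estimated coefficients
print(round(3251.9 + 2 * k, 1))  # 3267.9, matching the reported AIC
```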

```
Call:
lm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV + Lg +
    R)

Residuals:
     Min       1Q   Median       3Q      Max
-0.72890 -0.24345 -0.18725 -0.04165  1.01024

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.225013   0.013119  17.151  < 2e-16 ***
W              0.035344   0.003105  11.382  < 2e-16 ***
SHO            0.058328   0.030826   1.892 0.058560 .
CG            -0.040513   0.017029  -2.379 0.017417 *
weightedsaves -0.022451   0.005671  -3.959 7.70e-05 ***
SV             0.029226   0.007193   4.063 4.96e-05 ***
Lg             0.055360   0.014435   3.835 0.000128 ***
R             -0.004171   0.000401 -10.401  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.406 on 3214 degrees of freedom
Multiple R-squared: 0.05262,  Adjusted R-squared: 0.05056
F-statistic:  25.5 on 7 and 3214 DF,  p-value: < 2.2e-16
```
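The summary statistics at the bottom of the OLS output hang together, and a short sketch shows how. Assuming n = 3222 observations (3214 residual degrees of freedom plus 8 estimated coefficients) and k = 7 slope coefficients, the standard formulas recover the reported adjusted R-squared and F-statistic from the raw R-squared alone:

```python
# Values taken from the lm() summary in this post.
R2 = 0.05262
N = 3222  # observations: 3214 residual df + 8 estimated coefficients
K = 7     # slope coefficients, excluding the intercept


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(adjusted_r2(R2, N, K), 5))  # 0.05056
print(round(overall_f(R2, N, K), 1))    # 25.5
```

With more than 3,000 observations, even an R-squared of about 0.05 gives a highly significant F-test: the regressors jointly matter, but they explain very little of the variation in who makes the playoffs.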
