<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Tom Flesher &#187; regression</title>
	<atom:link href="http://tomflesher.com/tag/regression/feed/" rel="self" type="application/rss+xml" />
	<link>http://tomflesher.com</link>
	<description>Mercenary Educator and Bad Economist</description>
	<lastBuildDate>Mon, 14 Mar 2011 03:02:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='tomflesher.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Tom Flesher &#187; regression</title>
		<link>http://tomflesher.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://tomflesher.com/osd.xml" title="Tom Flesher" />
	<atom:link rel='hub' href='http://tomflesher.com/?pushpress=hub'/>
		<item>
		<title>Hit Batsman Roundup, 2010</title>
		<link>http://tomflesher.com/2010/12/26/hit-batsman-roundup-2010/</link>
		<comments>http://tomflesher.com/2010/12/26/hit-batsman-roundup-2010/#comments</comments>
		<pubDate>Sun, 26 Dec 2010 20:56:14 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Brett Carroll]]></category>
		<category><![CDATA[hit batsman]]></category>
		<category><![CDATA[hit by pitch]]></category>
		<category><![CDATA[Hunter Pence]]></category>
		<category><![CDATA[Kevin Youkilis]]></category>
		<category><![CDATA[Omar Infante]]></category>
		<category><![CDATA[Raul Ibanez]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[Rickie Weeks]]></category>
		<category><![CDATA[Scott Podsednik]]></category>
		<category><![CDATA[spurious correlation]]></category>
		<category><![CDATA[Victor Martinez]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=498</guid>
		<description><![CDATA[There&#8217;s very little more subtle and involved than the quiet elegance of a batter getting beaned. In fact, that particular strategy was invoked 1549 times in 2010, with 419 batters getting plunked at least once. The absolute leader this season was not Kevin Youkilis or Brett Carroll but Rickie Weeks, who led with 25 HBP [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=498&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s very little more subtle and involved than the quiet elegance of a batter getting beaned. In fact, that particular strategy was invoked 1549 times in 2010, with 419 batters getting plunked at least once.</p>
<p>The <a href="http://bbref.com/pi/shareit/DKZvv">absolute leader</a> this season was not <strong><a href="http://www.baseball-reference.com/players/y/youklke01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Kevin Youkilis</a></strong> or <strong><a href="http://www.baseball-reference.com/players/c/carrobr01.shtml">Brett Carroll</a></strong> but <strong><a href="http://www.baseball-reference.com/players/w/weeksri01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Rickie Weeks</a></strong>, who led with 25 HBP in 754 plate appearances. Put another way, Weeks got hit in 3.32% of his plate appearances. That&#8217;s almost once every 30 plate appearances, or nearly four times the MLB-wide rate of 0.83%. (Incidentally, that&#8217;s total HBP divided by total plate appearances. The unweighted mean of individual players&#8217; HBP rates, which is more skewed, is 0.58%.) What leads to such a high number of plunkings?</p>
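<p>To make the distinction between those two league-wide figures concrete, here is a minimal sketch in R. The first two rows of the toy data frame use numbers quoted in this post (Weeks&#8217; 25 HBP in 754 plate appearances and Hunter Pence&#8217;s 0 in 658); the two small-sample rows are made up purely for illustration. One plausible reason the unweighted mean sits below the pooled rate is that players with a handful of plate appearances and no HBP each count as a full observation.</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;"># Toy data: rows 1-2 are Weeks and Pence; rows 3-4 are hypothetical players.
toy &lt;- data.frame(HBP = c(25, 0, 0, 0), PA = c(754, 658, 10, 5))
 
# Pooled rate: total HBP divided by total plate appearances.
pooled.rate &lt;- sum(toy$HBP) / sum(toy$PA)
 
# Unweighted mean of each player's own HBP/PA rate.
mean.rate &lt;- mean(toy$HBP / toy$PA)
 
# Weeks' individual rate, roughly 3.3% of plate appearances.
weeks.rate &lt;- 25 / 754
 
round(c(pooled = pooled.rate, mean = mean.rate, weeks = weeks.rate), 4)</pre>
</div>
</div>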
<p>I would assume that a few things would go into the decision to hit a batter intentionally:</p>
<ul>
<li>Pitchers are less likely to be hit by other pitchers.</li>
<li>If a hitter is likely to get on base anyway, he&#8217;s more likely to be hit &#8211; you don&#8217;t lose anything by putting him on base, and you control the damage by limiting him to one base.</li>
<li>If a batter is likely to hit for extra bases, he&#8217;s more likely to be hit.</li>
<li>If a batter is likely to steal a base, he&#8217;s less likely to be hit, but there is an offsetting effect for caught stealing.</li>
<li>American League batters are more likely to be hit because of the moral hazard effect of pitchers not having to bat.</li>
</ul>
<p>With that in mind, I set up a regression in R using every player who had at least one plate appearance in 2010. I added binary variables for Pitcher (1 if the player&#8217;s primary position is pitcher, 0 otherwise) and Lg (1 if the player played the entire season in the American League, 0 otherwise), then regressed <em>HBP/PA</em> on <em>Pitcher, Lg, BB, HR, OBP, SLG, SB,</em> and <em>CS</em>. The results were somewhat surprising:</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = hbppa ~ Pitcher + Lg + <a href="/packages/BB">BB</a> + HR + OBP + SLG + SB +
    CS<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.0154027</span> -<span style="color:#cc66cc;">0.0059081</span> -<span style="color:#cc66cc;">0.0018096</span>  <span style="color:#cc66cc;">0.0001845</span>  <span style="color:#cc66cc;">0.1397065</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  6.847e-03  9.815e-04   <span style="color:#cc66cc;">6.975</span> 5.77e-12 ***
Pitcher     -5.399e-03  9.136e-04  -<span style="color:#cc66cc;">5.909</span> 4.81e-09 ***
Lg          -1.614e-03  7.054e-04  -<span style="color:#cc66cc;">2.289</span>   <span style="color:#cc66cc;">0.0223</span> *
<a href="/packages/BB">BB</a>          -1.412e-05  3.257e-05  -<span style="color:#cc66cc;">0.434</span>   <span style="color:#cc66cc;">0.6647</span>
HR           1.122e-04  7.956e-05   <span style="color:#cc66cc;">1.411</span>   <span style="color:#cc66cc;">0.1587</span>
OBP          8.570e-03  3.477e-03   <span style="color:#cc66cc;">2.465</span>   <span style="color:#cc66cc;">0.0139</span> *
SLG         -3.451e-03  2.468e-03  -<span style="color:#cc66cc;">1.398</span>   <span style="color:#cc66cc;">0.1624</span>
SB          -6.749e-05  8.693e-05  -<span style="color:#cc66cc;">0.776</span>   <span style="color:#cc66cc;">0.4377</span>
CS           1.770e-04  2.646e-04   <span style="color:#cc66cc;">0.669</span>   <span style="color:#cc66cc;">0.5036</span>
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.01042</span> on <span style="color:#cc66cc;">935</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.08839</span><span style="color:#339933;">,</span>    Adjusted R-squared: <span style="color:#cc66cc;">0.08059</span>
F-statistic: <span style="color:#cc66cc;">11.33</span> on <span style="color:#cc66cc;">8</span> and <span style="color:#cc66cc;">935</span> DF<span style="color:#339933;">,</span>  p-value: 2.07e-15</pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
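<p>As a reading aid for the coefficient table, the reported p-values can be sorted into tiers. This sketch just retypes the <em>Pr(&gt;|t|)</em> column from the output above; the 5% and 20% cutoffs are conventional choices on my part rather than anything the model dictates.</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;"># p-values copied from the summary output above.
p &lt;- c(Pitcher = 4.81e-09, Lg = 0.0223, BB = 0.6647, HR = 0.1587,
       OBP = 0.0139, SLG = 0.1624, SB = 0.4377, CS = 0.5036)
 
sig.05   &lt;- names(p)[p &lt; 0.05]               # significant at the 5% level
marginal &lt;- names(p)[p &gt;= 0.05 &amp; p &lt; 0.20]   # significant only at the 80% level
neither  &lt;- names(p)[p &gt;= 0.20]              # not close
 
list(sig.05 = sig.05, marginal = marginal, neither = neither)</pre>
</div>
</div>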
<p>That&#8217;s right &#8211; only <em>Pitcher, Lg,</em> and <em>OBP</em> are significant at the 5% level, and <em>HR</em> and <em>SLG</em> are only marginally significant (80% level). <em>BB, SB,</em> and <em>CS</em> aren&#8217;t even close. Why not?</p>
<p>Well, for one, the number of stolen bases and times caught stealing are relatively small no matter what. There probably isn&#8217;t enough data. For another, there probably just isn&#8217;t as much intent to hit batters as we&#8217;d like to pretend.</p>
<p>The other surprise is that American Leaguers are <strong>less</strong> likely to be hit, which is the opposite of what the moral hazard argument predicts. This baffles me a little bit.</p>
<p>Also, keep in mind that this model shouldn&#8217;t be expected to, and cannot, explain all or even most of the variation in hit batsmen. The R-squared is about .09, meaning that it explains about 9% of the variation. It entirely ignores what is probably the most important factor, physics. (That is, the model doesn&#8217;t have any way to account for accidental plunkings.) As a side note, other regressions show there might be an effect for plate appearances, meaning you&#8217;re more likely to get hit by chance alone if you take enough pitches.</p>
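<p>For what it&#8217;s worth, the two R-squared figures in the output are tied together by the usual adjustment for the number of regressors. Here is a quick check, assuming the sample size is the 935 residual degrees of freedom plus the 9 estimated coefficients (8 regressors and the intercept):</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">r2       &lt;- 0.08839          # Multiple R-squared from the output
df.resid &lt;- 935              # residual degrees of freedom
k        &lt;- 8                # regressors, excluding the intercept
n        &lt;- df.resid + k + 1 # implied number of players in the sample
 
# Adjusted R-squared penalizes R-squared for the number of regressors.
adj.r2 &lt;- 1 - (1 - r2) * (n - 1) / df.resid
round(adj.r2, 5)             # compare with the Adjusted R-squared line above</pre>
</div>
</div>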
<p>Finally, there are some guys who manage to do the opposite of Weeks&#8217; feat. Houston outfielder <strong><a href="http://www.baseball-reference.com/players/p/pencehu01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Hunter  Pence</a></strong> went 156 games and 658 plate appearances without getting plunked at all. Honorable mentions go to <strong><a href="http://www.baseball-reference.com/players/i/ibanera01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Raul  Ibanez</a></strong>, <strong><a href="http://www.baseball-reference.com/players/p/podsesc01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Scott  Podsednik</a></strong>, <strong><a href="http://www.baseball-reference.com/players/m/martivi01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Victor  Martinez</a></strong>, and <strong><a href="http://www.baseball-reference.com/players/i/infanom01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Omar  Infante</a></strong>, all of whom went over 500 plate appearances without a beaning. Now THAT&#8217;S plate discipline.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/26/hit-batsman-roundup-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Diagnosing the AL</title>
		<link>http://tomflesher.com/2010/12/22/diagnosing-the-al/</link>
		<comments>http://tomflesher.com/2010/12/22/diagnosing-the-al/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 21:20:26 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[2010]]></category>
		<category><![CDATA[American League]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=463</guid>
		<description><![CDATA[In the previous post, I crunched some numbers on a previous forecast I&#8217;d made and figured out that it was a pretty crappy forecast. (That&#8217;s the fun of forecasting, of course &#8211; sometimes you&#8217;re right and sometimes you&#8217;re wrong.) The funny part of it, though, is that the predicted home runs per game for the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=463&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the previous post, I crunched some numbers on a previous forecast I&#8217;d made and figured out that it was a pretty crappy forecast. (That&#8217;s the fun of forecasting, of course &#8211; sometimes you&#8217;re right and sometimes you&#8217;re wrong.) The funny part of it, though, is that the observed home runs per game for the American League was so far off &#8211; 3.4 standard errors below the predicted value &#8211; that it&#8217;s highly unlikely that the regression model I used controls for all relevant variables. That&#8217;s not surprising, since it was only a time trend with a dummy variable for the designated hitter.</p>
<p>There are a couple of things to check for immediately. The first is the most common explanation thrown around when home runs drop &#8211; steroids. It seems to me that if the drop in home runs were due to better control of performance-enhancing drugs, then it should mostly be home runs that are affected. For example, intentional walks should probably be below expectation, since intentional walks are used to protect against a home run hitter. Unintentional walks should probably be about as expected, since walks are a function of plate discipline and pitcher control, not of strength. On-base percentage should probably drop by a smaller magnitude than home runs, since some hits that would have been home runs will stay in the park as singles, doubles, or triples. Finally, slugging average should drop because a loss in power without a corresponding increase in speed will lower total bases.</p>
<p>I&#8217;ll analyze these with pretty new R code behind the cut.</p>
<p><span id="more-463"></span>Using R, I fitted time-series models of the same functional form as the home runs per game model. I pulled the data from the Baseball-Reference.com AL Batting Encyclopedia and regressed the variable of interest on a time trend, its square, and a dummy for the designated hitter.</p>
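<p>For readers who want to see what the right-hand side of these models looks like, here is a rough sketch of the regressors. Two things in it are assumptions on my part rather than anything shown in the output below: that the sample runs from 1901 through 2010, and that the time trend is centered so that 2010 corresponds to t = 56 (the value plugged into the hand calculations below). The DH dummy switches on in 1973, the year the American League adopted the designated hitter.</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;"># Assumed sample: AL seasons 1901 through 2010.
year &lt;- 1901:2010
 
# Assumed centering: chosen so that 2010 maps to t = 56.
t   &lt;- year - 1954
tsq &lt;- t^2
 
# DH dummy: 1 from 1973 onward, 0 before.
DH &lt;- as.numeric(year &gt;= 1973)
 
# Design matrix matching the formula y ~ t + tsq + DH.
X &lt;- cbind(Intercept = 1, t = t, tsq = tsq, DH = DH)
nrow(X)              # number of seasons under these assumptions
X[year == 2010, ]    # regressor values for the 2010 season</pre>
</div>
</div>
<p>Under these assumptions there are 110 seasons, which lines up with the 106 residual degrees of freedom plus 4 coefficients reported in each model below.</p>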
<p><span style="text-decoration:underline;"><strong>First Assumption:</strong></span> Intentional walks should decrease.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; ibb.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>IBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>ibb.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = IBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.1350376</span> -<span style="color:#cc66cc;">0.0261969</span>  <span style="color:#cc66cc;">0.0005516</span>  <span style="color:#cc66cc;">0.0294412</span>  <span style="color:#cc66cc;">0.1534536</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  2.656e-01  1.408e-02  <span style="color:#cc66cc;">18.870</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>            8.037e-03  1.199e-03   <span style="color:#cc66cc;">6.706</span> 1.01e-09 ***
tsq         -1.393e-04  2.024e-05  -<span style="color:#cc66cc;">6.882</span> 4.30e-10 ***
DH          -1.140e-01  1.055e-02 -<span style="color:#cc66cc;">10.805</span>  &lt; 2e-16 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.04689</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.5961</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.5847</span>
F-statistic: <span style="color:#cc66cc;">52.14</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: &lt; 2.2e-16
 
&gt; ibb.2010.fitted &lt;- <span style="color:#009900;">(</span>2.656e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>8.037e-03<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>-1.393e-04<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-1.140e-01<span style="color:#009900;">)</span>
&gt; ibb.2010.obs &lt;- <span style="color:#cc66cc;">.2</span>
&gt; residual.ibb &lt;- ibb.2010.obs - ibb.2010.fitted
&gt; se.ibb &lt;- <span style="color:#cc66cc;">.04689</span>
&gt; residual.ibb/se.ibb
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> <span style="color:#cc66cc;">0.750113</span></pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
<p>Intentional walks per game came in above the fitted value, but by less than one standard error. Statistically, intentional walks are indistinguishable from what the model expected.</p>
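<p>To put a rough number on &#8220;less than one standard error,&#8221; the ratio can be compared against a t distribution with the model&#8217;s 106 residual degrees of freedom. This is only a back-of-the-envelope sketch: it treats the residual standard error as the full prediction error and ignores the uncertainty in the estimated coefficients, so the p-value it gives is, if anything, too small.</p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">ratio &lt;- 0.750113   # residual.ibb/se.ibb from the output above
df    &lt;- 106        # residual degrees of freedom for ibb.lm
 
# Two-sided tail probability under a t distribution.
p.two.sided &lt;- 2 * pt(-abs(ratio), df = df)
p.two.sided         # well above 0.05</pre>
</div>
</div>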
<p><strong><span style="text-decoration:underline;">Second Assumption:</span></strong> Unintentional walks should not change.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; uBB &lt;- <span style="color:#009900;">(</span>BB-IBB<span style="color:#009900;">)</span>
&gt; ubb.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>uBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>ubb.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = uBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
     Min       1Q   Median       3Q      Max
-<span style="color:#cc66cc;">0.69256</span> -<span style="color:#cc66cc;">0.12758</span> -<span style="color:#cc66cc;">0.01390</span>  <span style="color:#cc66cc;">0.13178</span>  <span style="color:#cc66cc;">0.77866</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  <span style="color:#cc66cc;">3.0879505</span>  <span style="color:#cc66cc;">0.0732669</span>  <span style="color:#cc66cc;">42.147</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -<span style="color:#cc66cc;">0.0190285</span>  <span style="color:#cc66cc;">0.0062392</span>  -<span style="color:#cc66cc;">3.050</span> <span style="color:#cc66cc;">0.002892</span> **
tsq          <span style="color:#cc66cc;">0.0003623</span>  <span style="color:#cc66cc;">0.0001054</span>   <span style="color:#cc66cc;">3.439</span> <span style="color:#cc66cc;">0.000837</span> ***
DH           <span style="color:#cc66cc;">0.1812598</span>  <span style="color:#cc66cc;">0.0549094</span>   <span style="color:#cc66cc;">3.301</span> <span style="color:#cc66cc;">0.001313</span> **
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.2441</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.1876</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.1647</span>
F-statistic: <span style="color:#cc66cc;">8.162</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: 6.127e-05
 
&gt; ubb.2010.fitted &lt;- <span style="color:#cc66cc;">3.0879505</span> + <span style="color:#009900;">(</span>-<span style="color:#cc66cc;">.0190285</span><span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span><span style="color:#cc66cc;">.0003623</span><span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#cc66cc;">.1812598</span>
&gt; ubb.2010.obs &lt;- <span style="color:#cc66cc;">3.25</span> - <span style="color:#cc66cc;">.2</span>
&gt; residual.ubb &lt;- ubb.2010.obs - ubb.2010.fitted
&gt; se.ubb &lt;- <span style="color:#cc66cc;">.2441</span>
&gt; residual.ubb/se.ubb
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">1.187166</span></pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
<p>Unintentional walks came in a bit over one standard error below the fitted value. Again, that isn&#8217;t a big enough deviation to say that it&#8217;s statistically different from our expectation.</p>
<p><strong><span style="text-decoration:underline;">Third Assumption:</span></strong> OBP drops, but by somewhat less than 3.4 standard errors.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; obp.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>OBP ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>obp.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = OBP ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.0217348</span> -<span style="color:#cc66cc;">0.0044903</span>  <span style="color:#cc66cc;">0.0002799</span>  <span style="color:#cc66cc;">0.0046695</span>  <span style="color:#cc66cc;">0.0182481</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  3.238e-01  2.230e-03 <span style="color:#cc66cc;">145.199</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -5.703e-04  1.899e-04  -<span style="color:#cc66cc;">3.003</span>  <span style="color:#cc66cc;">0.00334</span> **
tsq          1.472e-05  3.207e-06   <span style="color:#cc66cc;">4.591</span> 1.22e-05 ***
DH           8.245e-03  1.671e-03   <span style="color:#cc66cc;">4.933</span> 3.02e-06 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.00743</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.487</span><span style="color:#339933;">,</span>      Adjusted R-squared: <span style="color:#cc66cc;">0.4724</span>
F-statistic: <span style="color:#cc66cc;">33.54</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: 2.532e-15
 
&gt; obp.2010.fitted &lt;- <span style="color:#009900;">(</span>3.238e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-5.703e-04<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>1.472e-05<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + 8.245e-03
&gt; obp.2010.obs &lt;- <span style="color:#cc66cc;">.327</span>
&gt; residual.obp &lt;- obp.2010.obs - obp.2010.fitted
&gt; se.obp &lt;- <span style="color:#cc66cc;">.00743</span>
&gt; residual.obp/se.obp
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">2.593556</span></pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
<p>OBP dropped by about 2.6 standard errors, which is quite a bit, though less than the 3.4 standard errors for home runs. Without more information it&#8217;s hard to judge whether a change of this magnitude is due to better pitching or to power being taken away from hitters.</p>
<p><strong><span style="text-decoration:underline;">Fourth Assumption:</span></strong> Slugging average will drop.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; slg.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>SLG ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>slg.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = SLG ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.0357646</span> -<span style="color:#cc66cc;">0.0087050</span> -<span style="color:#cc66cc;">0.0007988</span>  <span style="color:#cc66cc;">0.0115133</span>  <span style="color:#cc66cc;">0.0317497</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  3.937e-01  4.471e-03  <span style="color:#cc66cc;">88.050</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -2.058e-03  3.807e-04  -<span style="color:#cc66cc;">5.404</span> 4.04e-07 ***
tsq          5.049e-05  6.429e-06   <span style="color:#cc66cc;">7.853</span> 3.51e-12 ***
DH           1.693e-02  3.351e-03   <span style="color:#cc66cc;">5.054</span> 1.82e-06 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.01489</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.6452</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.6352</span>
F-statistic: <span style="color:#cc66cc;">64.27</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: &lt; 2.2e-16
 
&gt; slg.2010.fitted &lt;- <span style="color:#009900;">(</span>3.937e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-2.058e-03<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>5.049e-05<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>1.693e-02<span style="color:#009900;">)</span>
&gt; slg.2010.obs &lt;- <span style="color:#cc66cc;">.407</span>
&gt; residual.slg &lt;- slg.2010.obs - slg.2010.fitted
&gt; se.slg &lt;- <span style="color:#cc66cc;">.01489</span>
&gt; residual.slg/se.slg
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">3.137585</span></pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
<p>A drop in slugging average of over three standard errors suggests that something has either sapped hitters&#8217; power specifically or hurt their ability to hit in general. The results are consistent with either explanation.</p>
<p>This isn&#8217;t evidence of steroid use. In fact, the same results would be consistent with a shift toward pitching talent. More work needs to be done on this year&#8217;s data before conclusions can be drawn. However, it does seem to indicate that, at least in the American League, the Year of the Pitcher narrative has some statistical foundation.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/22/diagnosing-the-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>What Happened to Home Runs This Year?</title>
		<link>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/</link>
		<comments>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 17:18:46 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[standard error]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=458</guid>
		<description><![CDATA[I was talking to Jim, the writer behind Apparently, I&#8217;m An Angels Fan, who&#8217;s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted &#8220;Year of the Pitcher&#8221; has affected home run production. Sure enough, on checking the AL Batting Encyclopedia at Baseball-Reference.com, production dropped [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=458&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I was talking to Jim, the writer behind <a href="http://apparentlyanangelsfan.wordpress.com">Apparently, I&#8217;m An Angels Fan</a>, who&#8217;s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted &#8220;Year of the Pitcher&#8221; has affected home run production. Sure enough, on checking the <a href="http://www.baseball-reference.com/leagues/AL/bat.shtml">AL Batting  Encyclopedia</a> at <a href="http://www.baseball-reference.com">Baseball-Reference.com</a>, production dropped by about .15 home runs per game (from 1.13 to .97). Is that normal statistical variation or does it show that this year was really different?</p>
<p>In two previous posts, I <a title="Back when it was hard to hit 55…" href="http://worldsworstsportsblog.com/2010/07/08/back-when-it-was-hard-to-hit-55/">looked at the trend of home runs per game to examine Stuff Keith Hernandez Says</a> and then <a title="More on Home Runs Per Game" href="http://worldsworstsportsblog.com/2010/07/09/more-on-home-runs-per-game/">examined Japanese baseball&#8217;s data for evidence of a structural break</a>. I used the Batting Encyclopedia to run a time-series regression for a quadratic trend and added a dummy variable for the Designated Hitter. I found that the time trend and DH control account for approximately 56% of the variation in home runs per game, and that the functional form is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911++%5Ctimes+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911  &#92;times DH ' title='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911  &#92;times DH ' class='latex' /></p>
<p>with t=1 in 1955, t=2 in 1956, and so on. That means t=56 in 2010. Consequently, we&#8217;d expect home run production per game in 2010 in the American League to be approximately</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+56+%2B+.0004+%5Ctimes+3136+%2B+.0911+%5Capprox+1.25+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times 56 + .0004 &#92;times 3136 + .0911 &#92;approx 1.25 ' title='&#92;hat{HR} = .957 - .0188 &#92;times 56 + .0004 &#92;times 3136 + .0911 &#92;approx 1.25 ' class='latex' /></p>
<p>That means we expected production to increase this year and it dropped precipitously, for a residual of -.28. The residual standard error on the original regression was .1092 on 106 degrees of freedom, so the t-value using <a href="http://www.stat.tamu.edu/stat30x/zttables.php">Texas A&amp;M&#8217;s table</a> is 1.984 (approximating using 100 df). That means we can be 95% confident that the actual number of home runs per game should fall within .1092*1.984, or about .2167, of the expected value. The lower bound would be about 1.03, meaning we&#8217;re still significantly below what we&#8217;d expect. In fact, the observed number is about 2.6 standard errors below the expected number. In other words, we&#8217;d expect a deviation that large by chance only about 1% of the time.</p>
<p>Clearly, something else is in play.</p>
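<p>As a rough check, the fitted value and residual can be recomputed in R from the rounded coefficients quoted in this post. This is only a sketch: the variable names are made up for illustration, and because the coefficients are rounded (the <em>t</em>-squared term to a single significant figure), the result only approximates what the full regression object would give.</p>

```r
# Rounded coefficients as quoted in the post (assumed, not the full-precision fit)
b0    <- 0.957
b.t   <- -0.0188
b.tsq <- 0.0004
b.DH  <- 0.0911

t.2010 <- 2010 - 1955 + 1          # t = 1 in 1955, so t = 56 in 2010

# Fitted AL home runs per game for 2010 (the AL uses the DH, so DH = 1)
hr.fitted <- b0 + b.t * t.2010 + b.tsq * t.2010^2 + b.DH * 1

hr.obs   <- 0.97                   # observed 2010 value quoted in the post
hr.resid <- hr.obs - hr.fitted     # observed minus fitted

# Exact two-sided 95% critical value on 106 df, instead of the table lookup at 100 df
t.crit <- qt(0.975, df = 106)

print(round(c(fitted = hr.fitted, residual = hr.resid, t.crit = t.crit), 4))
```

<p>Using <code>qt()</code> avoids having to approximate 106 degrees of freedom with the 100 df row of a printed table.</p>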
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>More on Home Runs Per Game</title>
		<link>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/</link>
		<comments>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 14:35:26 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Chow test]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Japanese baseball]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rays]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=335</guid>
		<description><![CDATA[In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=335&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used on data in an otherwise identical league, we should see similar trends but no dropoff around 1995. As usual, for replication purposes I&#8217;m going to use Japan&#8217;s Pro Baseball leagues, the Pacific and Central Leagues. They&#8217;re ideal because, just like the American Major Leagues, one league uses the designated hitter and one does not. There are some differences &#8211; the talent pool is a bit smaller because of the lower population base that the leagues draw from, and there are only 6 teams in each league as opposed to MLB&#8217;s 14 and 16.</p>
<p>As a reminder, the MLB regression gave us a regression equation of</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' title='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} ' title='&#92;hat{HR} ' class='latex' /> is the predicted number of home runs per game,<em> t</em> is a time variable starting at <em>t</em>=1 in 1955, and <em>DH</em> is a binary variable that takes value 1 if the league uses the designated hitter in the season in question.</p>
<p><a href="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg"><img class="alignright size-thumbnail  wp-image-336" title="japanhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&h=82" alt="" width="150" height="82" /></a>Just examining the data on home runs per game from the Japanese leagues, the trend looks significantly different. Instead of the rough U-shape that the MLB data showed, the Japanese data looks almost M-shaped with a maximum around 1984. (Why, I&#8217;m not sure &#8211; I&#8217;m not knowledgeable enough about Japanese baseball to know what might have caused that spike.) It reaches a minimum again and then keeps rising.</p>
<p>After running the same regression with <em>t</em>=1 in 1950, I got these results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">1 &#8722; p</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.2462</td>
<td align="right">0.0992</td>
<td align="right">2.481</td>
<td align="right">0.0148</td>
<td align="right">0.9852</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">0.0478</td>
<td align="right">0.0062</td>
<td align="right">7.64</td>
<td align="right">1.63E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">-0.0006</td>
<td align="right">0.00009</td>
<td align="right">-7.463</td>
<td align="right">3.82E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0052</td>
<td align="right">0.0359</td>
<td align="right">0.144</td>
<td align="right">0.8855</td>
<td align="right">0.1145</td>
</tr>
</tbody>
</table>
<p>This equation shows two things, one that surprises me and one that doesn&#8217;t. The unsurprising factor is the switching of signs for the <em>t</em> variables &#8211; we expected that based on the shape of the data. The surprising factor is that the designated hitter rule is insignificant: with a p-value of about .89, we can&#8217;t reject the hypothesis that the DH has no effect on home runs per game. In addition, this model explains less of the variation than the MLB version &#8211; while that explained about 56% of the variation, the Japanese model has an <img src='http://s0.wp.com/latex.php?latex=R%5E2+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 ' title='R^2 ' class='latex' /> value of .4045, meaning it explains about 40% of the variation in home runs per game.</p>
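<p>One way to make the sign switch concrete is to find where each fitted quadratic turns over. The sketch below uses the rounded coefficients from the two regressions, so the turning points are only rough (the <em>t</em>-squared coefficients are rounded to one significant figure); the variable names are mine.</p>

```r
# A quadratic b0 + b1*t + b2*t^2 turns over at t = -b1 / (2 * b2).
# Coefficients are the rounded values quoted in the post.
mlb.b1   <- -0.0188; mlb.b2   <-  0.0004   # t = 1 in 1955
japan.b1 <-  0.0478; japan.b2 <- -0.0006   # t = 1 in 1950

mlb.t   <- -mlb.b1   / (2 * mlb.b2)     # minimum: tsq coefficient is positive (U-shape)
japan.t <- -japan.b1 / (2 * japan.b2)   # maximum: tsq coefficient is negative (inverted U)

# Convert the time index back to a calendar year
mlb.year   <- 1955 + mlb.t - 1
japan.year <- 1950 + japan.t - 1

print(c(mlb.min.year = mlb.year, japan.max.year = japan.year))
```

<p>A positive <em>t</em>-squared coefficient gives a trough and a negative one gives a peak, which is the U-shape versus the hump seen in the two data sets. Because of the rounding, the fitted peak need not line up exactly with the peak in the raw data.</p>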
<p>There&#8217;s a slightly interesting pattern to the residual home runs per game (<img src='http://s0.wp.com/latex.php?latex=Residual+%3D+%5Chat%7BHR%7D+-+HR&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = &#92;hat{HR} - HR' title='Residual = &#92;hat{HR} - HR' class='latex' />). <a href="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg"><img class="alignright size-thumbnail wp-image-338" title="japanresidualhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&h=82" alt="" width="150" height="82" /></a>Although it isn&#8217;t as pronounced, this data also shows a spike &#8211; but the spike is at <em>t</em>=55, so instead of showing up in 1995, the Japan leagues spiked around 2004. Either a different effect is in play, or the Japanese leagues saw the same effect later than the MLB teams &#8211; but why would they? It can&#8217;t be an expansion effect, since the Japanese leagues have stayed constant at 6 teams since their inception.</p>
<p>Incidentally, the Japanese league data shows some evidence of heteroskedasticity (Breusch-Pagan test p-value .0796, significant at the 10% level but not at 5%), so it might be better modeled using generalized least squares, but doing so would have skewed the results of the replication.</p>
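<p>For anyone who wants to see what the Breusch-Pagan test is doing, here is a minimal base-R sketch of the studentized (Koenker) form. The data below is simulated purely for illustration, with error variance that grows over time by construction; it is not the Japanese league data, and all of the variable names and simulation parameters are assumptions.</p>

```r
# Synthetic data only: a hypothetical two-league, 50-season panel
set.seed(1)
n   <- 100
t   <- rep(1:50, times = 2)        # time index for each league
tsq <- t^2
DH  <- rep(c(0, 1), each = 50)     # one league with the DH, one without

# Error spread grows with t, so this series is heteroskedastic by design
hr <- 0.25 + 0.048 * t - 0.0006 * tsq + rnorm(n, sd = 0.02 + 0.004 * t)

fit <- lm(hr ~ t + tsq + DH)

# Auxiliary regression: squared residuals on the same regressors.
# Under homoskedasticity, n * R^2 is approximately chi-squared with
# df equal to the number of regressors (3 here).
u2      <- resid(fit)^2
aux     <- lm(u2 ~ t + tsq + DH)
lm.stat <- n * summary(aux)$r.squared
p.value <- pchisq(lm.stat, df = 3, lower.tail = FALSE)

print(c(LM = lm.stat, p.value = p.value))
```

<p>A small p-value is evidence against constant error variance. The <code>bptest()</code> function in the lmtest package runs the same kind of test in a single call.</p>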
<p>To test whether the parameters really are different, the appropriate test is <a href="http://en.wikipedia.org/wiki/Chow_test">Chow&#8217;s test for structural change</a>. To clean it up, I&#8217;m using only the data from 1960 on. (It&#8217;s quick and dirty, but it&#8217;ll do the job.) Chow&#8217;s test takes</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%5Csim%5C+F_%7Bk%2CN_1%2BN_2-2k%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' title='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=S_C+%3D+6.3666&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_C = 6.3666' title='S_C = 6.3666' class='latex' /> is the combined sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=S_1+%3D+1.2074&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_1 = 1.2074' title='S_1 = 1.2074' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_2+%3D+2.2983&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_2 = 2.2983' title='S_2 = 2.2983' class='latex' /> are the individual (i.e. MLB and Japan) sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=k%3D4&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='k=4' title='k=4' class='latex' /> is the number of parameters, and <img src='http://s0.wp.com/latex.php?latex=N_1+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_1 = 100' title='N_1 = 100' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=N_2+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_2 = 100' title='N_2 = 100' class='latex' /> are the number of observations in each group.</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%281.2074+%2B+2.2983%29%29%2F%284%29%7D%7B%281.2074+%2B+2.2983%29%2F%28100%2B100-2%5Ctimes+4%29%7D+%5Csim%5C++F_%7B4%2C100%2B100-2+%5Ctimes+4%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074 + 2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' title='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074 + 2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%283.5057%29%29%2F%284%29%7D%7B%283.5057%29%2F%28192%29%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' title='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B2.8609%2F4%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B.7152%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=39.17+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='39.17 &#92;sim&#92;  F_{4,192}' title='39.17 &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p>The critical value for 90% significance at 4 and 192 degrees of freedom would be 1.974 according to <a href="http://www.stat.tamu.edu/~west/applets/fdemo.html">Texas A&amp;M&#8217;s F calculator</a>. The test statistic is far above that, so we can reject the hypothesis that the two sets of parameters are the same. The MLB and Japanese trends really are different.</p>
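<p>The arithmetic for Chow&#8217;s test can be sketched in R using the sums of squared residuals, parameter count, and sample sizes given above, with <code>qf()</code> supplying the critical value in place of an online calculator. The variable names are mine and only the summary figures quoted in this post are used, so treat it as a check on the arithmetic rather than a full replication.</p>

```r
# Figures quoted in the post
S.C <- 6.3666   # sum of squared residuals, pooled regression
S.1 <- 1.2074   # sum of squared residuals, MLB
S.2 <- 2.2983   # sum of squared residuals, Japan
k   <- 4        # parameters per regression
N.1 <- 100      # observations, MLB
N.2 <- 100      # observations, Japan

df.den <- N.1 + N.2 - 2 * k   # denominator degrees of freedom

# Chow statistic: the extra residual variation from forcing one set of
# parameters on both leagues, relative to the unrestricted residual variance
chow.F <- ((S.C - (S.1 + S.2)) / k) / ((S.1 + S.2) / df.den)

# Critical value at the 90% level on (k, df.den) degrees of freedom
crit.90 <- qf(0.90, df1 = k, df2 = df.den)

reject <- chow.F > crit.90
print(c(F = chow.F, critical = crit.90))
print(reject)
```

<p>Comparing <code>chow.F</code> with <code>crit.90</code> is the whole test: a statistic above the critical value rejects the hypothesis that both leagues share one set of parameters.</p>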
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} = .957 - .0188 \times t + .0004 \times t^2 + .0911  \times DH </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} </media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=R%5E2+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">R^2 </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=Residual+%3D+%5Chat%7BHR%7D+-+HR&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">Residual = \hat{HR} - HR</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%7E+F&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} ~ F</media:title>
		</media:content>
	</item>
		<item>
		<title>Back when it was hard to hit 55&#8230;</title>
		<link>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/</link>
		<comments>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:06:05 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[sabermetrics]]></category>
		<category><![CDATA[Stuff Keith Hernandez Says]]></category>
		<category><![CDATA[talent pool dilution]]></category>
		<category><![CDATA[Willie Mays]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=319</guid>
		<description><![CDATA[Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about Willie Mays for some reason, and said that Mays hit 55 home [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=319&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about <a href="http://www.baseball-reference.com/players/m/mayswi01.shtml">Willie Mays </a>for some reason, and said that Mays hit 55 home runs &#8220;back when it was hard to hit 55.&#8221; Keith coyly said that, while it was easy for a while, it was &#8220;getting hard again,&#8221; at which point he abruptly stopped talking.</p>
<p>Keith&#8217;s unusual candor about drug use and Mays&#8217; career best of 52 home runs aside, this pinged my &#8220;Stuff Keith Hernandez Says&#8221; meter. After accounting for any time trend and other factors that might explain home run hitting, is there an upward trend? If so, is there a pattern to the remaining home runs?</p>
<p>The first step is to examine the data to see if there appears to be any trend. Just looking at it, there appears to be a messy U shape with a minimum around t=20, which indicates a quadratic trend. That means I want to include a term for time and a term for time squared.<a href="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg"><img class="alignright size-thumbnail  wp-image-325" title="homerunspergame" src="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150&h=102" alt="" width="150" height="102" /></a></p>
<p>Using the per-game averages for home runs from 1955 to 2009, I detrended the data using t=1 in 1955. I also had to correct for the effect of the designated hitter. That gives us an equation of the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+%5Chat%7B%5Cbeta_%7B0%7D%7D+%2B+%5Chat%7B%5Cbeta_%7B1%7D%7Dt+%2B+%5Chat%7B%5Cbeta_%7B2%7D%7D+t%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta_%7B3%7D%7D+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' title='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' class='latex' /></p>
<p>The results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.957</td>
<td align="right">0.0328</td>
<td align="right">29.189</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">-0.0188</td>
<td align="right">0.0028</td>
<td align="right">-6.738</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">0.0004</td>
<td align="right">0.00005</td>
<td align="right">8.599</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0911</td>
<td align="right">0.0246</td>
<td align="right">3.706</td>
<td align="right">0.0003</td>
<td align="right">0.9997</td>
</tr>
</tbody>
</table>
<p>We can see that there&#8217;s an upward quadratic trend in predicted home runs that, together with the DH rule, accounts for about 56% of the variation in the number of home runs per game in a season (<img src='http://s0.wp.com/latex.php?latex=R%5E2+%3D+.5618&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 = .5618' title='R^2 = .5618' class='latex' />). The Breusch-Pagan test has a p-value of .1610, so we can&#8217;t reject homoskedasticity at conventional levels; there may be mild heteroskedasticity, but nothing we should get concerned about.</p>
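<p>As a rough check on the fitted curve, here is a minimal Python sketch (the post does not show its own code, so the function names are illustrative) that plugs the rounded coefficients from the table into the equation and finds where the quadratic trend bottoms out.</p>
<pre>
```python
# Coefficients copied from the regression table above (rounded as published).
B0, B_T, B_TSQ, B_DH = 0.957, -0.0188, 0.0004, 0.0911


def predicted_hr_per_game(t, dh):
    """Fitted home runs per game; t = 1 in 1955, dh = 1 when the DH rule applies."""
    return B0 + B_T * t + B_TSQ * t ** 2 + B_DH * dh


def trend_minimum():
    """Value of t where the quadratic trend bottoms out: -b1 / (2 * b2)."""
    return -B_T / (2 * B_TSQ)


if __name__ == "__main__":
    print(round(trend_minimum(), 1))
    print(round(predicted_hr_per_game(55, 1), 3))
```
</pre>
<p>With these rounded values the turning point lands at t = 23.5, in the same neighborhood as the eyeballed minimum around t=20. Because the published coefficients are rounded, the exact turning point is only approximate.</p>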
<p>Then, I needed to look at the difference between the predicted number of home runs per game and the actual number of home runs per game, which is accessible by subtracting</p>
<p><img src='http://s0.wp.com/latex.php?latex=Residual+%3D+HR+-+%5Chat%7BHR%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = HR - &#92;hat{HR}' title='Residual = HR - &#92;hat{HR}' class='latex' /></p>
<p><a href="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg"><img class="alignright size-thumbnail  wp-image-331" title="homerunresiduals" src="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150&h=102" alt="" width="150" height="102" /></a>This represents the &#8220;abnormal&#8221; number of home runs per year. The question then becomes, &#8220;Is there a pattern to the number of abnormal home runs?&#8221; There are two ways to answer this. The first way is to look at the abnormal home runs. Up until about t=40 (the mid-1990s), the abnormal home runs are pretty much scattershot above and below 0. However, at t=40, the residual jumps up for both leagues and then begins a downward trend. It&#8217;s not clear what the cause of this is, but the knee-jerk reaction is that there might be a drug use effect. On the other hand, there are a couple of other explanations.</p>
<p>The most obvious is a boring old expansion effect. In 1993, the National League added two teams (the Marlins and the Rockies), and in 1998 each league added a team (the AL&#8217;s Rays and the NL&#8217;s Diamondbacks). Talent pool dilution has shown up in our discussion of hit batsmen, and I believe that it can be a real effect. It would be mitigated over time, however, by the establishment and development of farm systems, in particular strong systems like the one that&#8217;s producing good, cheap talent for the Rays.</p>
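<p>The residual comparison described above can be sketched in a few lines of Python. The numbers below are hypothetical placeholders, not the post&#8217;s data, and the helper names are illustrative; the idea is simply to compute actual minus fitted and compare the average residual before and after a chosen break point such as t=40.</p>
<pre>
```python
def residuals(actual, fitted):
    """Residual = actual minus fitted, matching the definition in the post."""
    return [a - f for a, f in zip(actual, fitted)]


def mean(values):
    return sum(values) / len(values)


def mean_before_and_after(values, break_index):
    """Average residual before the break and from the break onward."""
    return mean(values[:break_index]), mean(values[break_index:])


# Hypothetical home-runs-per-game values, for illustration only.
actual = [0.90, 0.85, 1.02, 1.10]
fitted = [0.88, 0.90, 0.95, 1.00]

if __name__ == "__main__":
    res = residuals(actual, fitted)
    print(mean_before_and_after(res, 2))
```
</pre>
<p>If the residuals really are scattershot before the break, the first average should sit near zero, while a jump afterward would show up as a clearly positive second average.</p>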
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150" medium="image">
			<media:title type="html">homerunspergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150" medium="image">
			<media:title type="html">homerunresiduals</media:title>
		</media:content>
	</item>
		<item>
		<title>Modeling Run Production</title>
		<link>http://tomflesher.com/2010/06/19/modeling-run-production/</link>
		<comments>http://tomflesher.com/2010/06/19/modeling-run-production/#comments</comments>
		<pubDate>Sat, 19 Jun 2010 18:28:39 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[run production]]></category>
		<category><![CDATA[sports economics]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=203</guid>
		<description><![CDATA[A baseball team can be thought of as a factory which uses a single crew to operate two machines. The first machine produces runs while the team bats, and the second machine produces outs while the team is in the field. This is a somewhat abstract way to look at the process of winning games, because [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=203&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A baseball team can be thought of as a factory which uses a single crew to operate two machines. The first machine produces runs while the team bats, and the second machine produces outs while the team is in the field. This is a somewhat abstract way to look at the process of winning games, because ordinarily machines have a fixed input and a fixed output. In a box factory, the input comprises man-hours and corrugated board, and the output is a finished box. Here, the input isn&#8217;t as well-defined.</p>
<p>Runs are a function of total bases, certainly, but total bases are functions of things like hits, home runs, and walks. Basically, runs are a function of getting on base and of advancing people who are already on base. Obviously, the best measure of getting on base is On-Base Percentage, and Slugging Average (expected number of bases per at-bat) is a good measure of advancement.</p>
<p>OBP wraps up a lot of things &#8211; walks, hits, and hit-by-pitch appearances &#8211; and SLG corrects for the greater effects of doubles, triples, and home runs. That doesn&#8217;t account for a few other things, though, like stolen bases, sacrifice flies, and sacrifice hits. It also doesn&#8217;t reflect batter ability directly, but that&#8217;s okay &#8211; the stats we have should represent batter ability since the defensive side is trying to prevent run production. The model might look something like this, then:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BRuns%7D+%3D+%5Chat%7B%5Cbeta_0%7D+%2B+%5Chat%7B%5Cbeta_1%7D+OBP+%2B+%5Chat%7B%5Cbeta_2%7D+SLG+%2B+%5Chat%7B%5Cbeta_3%7D+SB+%2B+%5Chat%7B%5Cbeta_4%7D+SF+%2B+%5Chat%7B%5Cbeta_5%7D+SH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{Runs} = &#92;hat{&#92;beta_0} + &#92;hat{&#92;beta_1} OBP + &#92;hat{&#92;beta_2} SLG + &#92;hat{&#92;beta_3} SB + &#92;hat{&#92;beta_4} SF + &#92;hat{&#92;beta_5} SH ' title='&#92;hat{Runs} = &#92;hat{&#92;beta_0} + &#92;hat{&#92;beta_1} OBP + &#92;hat{&#92;beta_2} SLG + &#92;hat{&#92;beta_3} SB + &#92;hat{&#92;beta_4} SF + &#92;hat{&#92;beta_5} SH ' class='latex' /></p>
<p>This is the simplest model we can start with &#8211; each factor contributes a discrete number of runs. If we need to (and we probably will), we can add terms to capture concavity of the marginal effect of different stats, or (more likely) an interaction term for SLG and, say, SB, so that a stolen base is worth more on a team where you&#8217;re more likely to be brought home by a batter because he&#8217;s more likely to give you extra bases. As it is, however, we can test this model with linear regression. The details of it are behind the cut.<span id="more-203"></span></p>
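<p>The post does not show its fitting code, so the following is only a sketch of what &#8220;linear regression&#8221; means here: ordinary least squares solved through the normal equations, written in plain Python with no libraries. The function name <code>ols</code> and the tiny data set are illustrative assumptions, not the author&#8217;s code or data.</p>
<pre>
```python
def ols(rows, y):
    """Least-squares coefficients for y on the columns of rows (intercept first)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k + 1) matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


if __name__ == "__main__":
    # Toy data generated exactly from y = 2 + 3*x1 - 1*x2.
    rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [2, 5, 1, 4, 7]
    print([round(b, 6) for b in ols(rows, y)])
```
</pre>
<p>For the run model, each row would hold a team-season&#8217;s OBP, SLG, SB, SF, and SH, and <code>y</code> would hold runs scored. A statistics package is the better tool in practice, since it also reports standard errors and handles badly scaled regressors more carefully than the normal equations do.</p>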
<p>I&#8217;m using a dataset (available on request) of American League data pulled from Baseball-Reference.com&#8217;s <a href="http://www.baseball-reference.com/leagues/">Leagues page</a>.  I&#8217;m using the AL only because I don&#8217;t want to correct for the designated hitter&#8217;s differential runs.</p>
<p>The first thing I need to do is decide whether to add a trend correction.</p>
<p style="text-align:center;"><a href="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg"><img class="alignnone size-medium wp-image-218" title="Alruntrend" src="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg?w=300&h=181" alt="Trend of league run total, 2000-2009" width="300" height="181" /></a></p>
<p>I don&#8217;t have to account for a time trend, so I&#8217;m just going to use the team-level data. Using linear regression, I fitted the model above and got the following output:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr style="text-align:center;">
<td width="64" height="20"></td>
<td width="64">Value</td>
<td width="64">Std Err</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">Intercept</td>
<td>-904.638</td>
<td>51.68286</td>
<td>-17.504</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">OBP</td>
<td>2893.123</td>
<td>233.7059</td>
<td>12.379</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">SLG</td>
<td>1601.076</td>
<td>122.3527</td>
<td>13.086</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">SB</td>
<td>-0.01907</td>
<td>0.06415</td>
<td>-0.297</td>
<td>0.76680</td>
<td>0.23320</td>
</tr>
<tr>
<td height="20">SF</td>
<td>0.65975</td>
<td>0.25356</td>
<td>2.602</td>
<td>0.01030</td>
<td>0.98970</td>
</tr>
<tr>
<td height="20">SH</td>
<td>0.28282</td>
<td>0.17445</td>
<td>1.621</td>
<td>0.10730</td>
<td>0.89270</td>
</tr>
</tbody>
</table>
<p>Multiple R-squared: 0.9164,     Adjusted R-squared: 0.9132</p>
<p>It looks like OBP and SLG are in fact highly significant. Each sac fly corresponds to about two-thirds of a run scored and each sac bunt to about .28 runs, although the sac bunt estimate (p = .107) falls just short of significance at the 10% level. A stolen base actually has a negative estimated effect, but with a p-value of about .77 we can&#8217;t say it&#8217;s different from zero. This model explains about 91% of the variation in run scoring, which is reasonable since it ignores pitching and defense entirely.</p>
<p>This could be tightened up a bit, but as it stands it gives us a reasonable idea of how runs are produced.</p>
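<p>To see what the fitted coefficients imply, here is a small Python sketch that plugs a team&#8217;s season line into the equation. The coefficients come from the table above; the example team (.340 OBP, .430 SLG, 100 SB, 45 SF, 40 SH) is hypothetical, and the function name is illustrative.</p>
<pre>
```python
# Coefficients copied from the regression output above.
COEF = {
    "Intercept": -904.638,
    "OBP": 2893.123,
    "SLG": 1601.076,
    "SB": -0.01907,
    "SF": 0.65975,
    "SH": 0.28282,
}


def predicted_runs(obp, slg, sb, sf, sh):
    """Fitted season run total for a team with the given batting line."""
    return (COEF["Intercept"] + COEF["OBP"] * obp + COEF["SLG"] * slg
            + COEF["SB"] * sb + COEF["SF"] * sf + COEF["SH"] * sh)


if __name__ == "__main__":
    # Hypothetical team, for illustration only.
    base = predicted_runs(0.340, 0.430, 100, 45, 40)
    bump = predicted_runs(0.350, 0.430, 100, 45, 40)
    print(round(base, 1), round(bump - base, 1))
```
</pre>
<p>The second number printed is the fitted value of ten extra points of OBP with everything else held fixed, which works out to roughly 29 runs over a season under this model.</p>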
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/19/modeling-run-production/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg?w=300" medium="image">
			<media:title type="html">Alruntrend</media:title>
		</media:content>
	</item>
		<item>
		<title>Trends in DH use</title>
		<link>http://tomflesher.com/2010/06/11/trends-in-dh-use/</link>
		<comments>http://tomflesher.com/2010/06/11/trends-in-dh-use/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 19:56:16 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[designated hitter]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Interleague play]]></category>
		<category><![CDATA[Mets]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[sports economics]]></category>
		<category><![CDATA[Stuff Keith Hernandez Says]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=181</guid>
		<description><![CDATA[Last night, Keith Hernandez was talking about how the Mets are scheduled to play in American League parks starting, well, today. He pointed out that the Mets will be in a bit of a pickle because they aren&#8217;t built, as AL teams are, to carry one big hitter to be the full-time DH. Instead, an [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=181&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last night, Keith Hernandez was talking about how the Mets are scheduled to play in American League parks starting, well, today. He pointed out that the Mets will be in a bit of a pickle because they aren&#8217;t built, as AL teams are, to carry one big hitter to be the full-time DH. Instead, an NL team will be forced to spread the wealth among lighter hitters who are carried for their defensive acumen as well as their offensive prowess. Keith then corrected himself and said that AL managers are using the DH differently &#8211; to rest individual players instead of having an everyday DH.</p>
<p>That pinged my &#8220;Stuff Keith Hernandez says&#8221; meter, and so I decided to crunch some numbers and see if that&#8217;s true. I interpreted Keith&#8217;s statement as implying that the number of designated hitters should be increasing, since managers are moving away from an everyday DH and toward spreading the DH assignments around a bit more. The crunching also needs to account for interleague play, which should obviously increase the number of DHes. So, after controlling for interleague play, does DH use show an increasing trend with time?</p>
<p><span id="more-181"></span>To set up the regression, I modified an existing data set I had to include a variable for the number of people with at least one at-bat as a designated hitter (culled from <a href="http://www.baseball-reference.com/play-index">baseball-reference.com/play-index</a>). B-R.com didn&#8217;t have a listing for 1973, so I noted that 1974 had 106 DHs and 1975 had 107 and made an educated guess (that would be consistent with Keith&#8217;s statement) that 1973 had 105. Then, I added a binary variable <em>Inter </em>which took value 1 if there was interleague play that year and value 0 otherwise. Finally, I created time variables <em>DHt</em> (starts at 1 in 1973 and increases with each year), <em>Intert</em> (starts at 1 in 1997 and increases with each year), and squares of both of the time variables. My dependent variable is the number of players with at least one at-bat as a designated hitter (<em>DHes</em>) divided by the number of teams playing with the DH rule (<em>DHTms</em>). Finally, armed with <a href="http://tomflesher.com/docs/MLB19552009.txt">this dataset</a>, I pushed the numbers through R and came out with this result:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr style="text-align:right;">
<td width="64" height="20"></td>
<td width="64"><strong>Estimate</strong></td>
<td width="64"><strong>Std Error</strong></td>
<td width="64"><strong>t value</strong></td>
<td width="64"><strong>p value</strong></td>
<td width="64"><strong>Signif</strong></td>
</tr>
<tr>
<td height="20"><em>B0</em></td>
<td align="right">0.00483</td>
<td align="right">0.06735</td>
<td align="right">0.07200</td>
<td align="right">0.94295</td>
<td align="right">0.05706</td>
</tr>
<tr>
<td height="20"><em>DHt</em></td>
<td align="right">-0.19479</td>
<td align="right">0.07961</td>
<td align="right">-2.44700</td>
<td align="right">0.01610</td>
<td align="right">0.98390</td>
</tr>
<tr>
<td height="20"><em>DHtsq</em></td>
<td align="right">0.00600</td>
<td align="right">0.00299</td>
<td align="right">2.00600</td>
<td align="right">0.04753</td>
<td align="right">0.95247</td>
</tr>
<tr>
<td height="20"><em>DHTms</em></td>
<td align="right">0.74367</td>
<td align="right">0.03300</td>
<td align="right">22.53400</td>
<td align="right">0.00000</td>
<td align="right">1.00000</td>
</tr>
<tr>
<td height="20"><em>Inter</em></td>
<td align="right">3.08814</td>
<td align="right">0.65227</td>
<td align="right">4.73400</td>
<td align="right">0.00001</td>
<td align="right">0.99999</td>
</tr>
<tr>
<td height="20"><em>Intert</em></td>
<td align="right">0.44171</td>
<td align="right">0.19733</td>
<td align="right">2.23800</td>
<td align="right">0.02734</td>
<td align="right">0.97266</td>
</tr>
<tr>
<td height="20"><em>Intertsq</em></td>
<td align="right">-0.04639</td>
<td align="right">0.01321</td>
<td align="right">-3.51200</td>
<td align="right">0.00066</td>
<td align="right">0.99934</td>
</tr>
</tbody>
</table>
<p>Some caveats are in order. First of all, according to a Breusch-Pagan test, the error terms are absolutely heteroskedastic (that is, their variance isn&#8217;t constant, which can be a sign of something I haven&#8217;t accounted for in my data). Second, I have an R<sup>2</sup> of .9884, meaning that this model explains almost 99% of the variance in the number of designated hitters used. That&#8217;s a lot of explanatory value, and usually means you&#8217;re doing a regression that looks like &#8220;Right shoes = B0 + B1 Price + B2 Left shoes + error term&#8221; &#8211; that is, one where you&#8217;re missing some obvious highly correlated term. I&#8217;m not sure what that term might be, though. Also, there isn&#8217;t really enough data from interleague play to run robust time series analysis on it.</p>
<p>However, we can make some statements. First of all, interleague play adds about 43 designated hitters, or about 2.68 per National League team, although that probably varies by the number of series played. Second, once interleague play is controlled for, DHes per team decreased until they hit a minimum in 1989 and then began increasing again. What do you know? Keith might have been right after all.</p>
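<p>For anyone who wants to rebuild the explanatory variables, the construction described earlier can be sketched in Python as follows. Two things here are assumptions rather than details from the post: that <em>Intert</em> is 0 before 1997, and that the sample runs through 2009. The function name is illustrative.</p>
<pre>
```python
def build_row(year):
    """Time and interleague variables for one season, as described in the post."""
    inter = 1 if year >= 1997 else 0            # interleague play began in 1997
    dht = year - 1972                           # DHt = 1 in 1973
    intert = year - 1996 if inter else 0        # assumption: 0 before 1997
    return {
        "year": year,
        "Inter": inter,
        "DHt": dht,
        "DHtsq": dht ** 2,
        "Intert": intert,
        "Intertsq": intert ** 2,
    }


# Assumed sample: 1973 through 2009.
rows = [build_row(y) for y in range(1973, 2010)]

if __name__ == "__main__":
    print(rows[0])
    print(rows[-1])
```
</pre>
<p>The dependent variable and the <em>DHTms</em> column would then be joined on by year from the data set itself.</p>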
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/11/trends-in-dh-use/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>The DH Redux: Japan</title>
		<link>http://tomflesher.com/2010/06/07/the-dh-redux-japan/</link>
		<comments>http://tomflesher.com/2010/06/07/the-dh-redux-japan/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 19:42:24 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[baseballguru.com]]></category>
		<category><![CDATA[designated hitter]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[NPB]]></category>
		<category><![CDATA[OBP]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=152</guid>
		<description><![CDATA[In an earlier post, I analyzed team-level data from Major League Baseball to determine the size of the effect that the Designated Hitter rule has on on-base percentage. The conclusion I came to was that, if the model is properly specified, the effect of the designated hitter rule is about .008 in on-base percentage. If [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&#038;blog=20518139&#038;post=152&#038;subd=tomflesher&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In an earlier post, I analyzed team-level data from Major League Baseball to determine the size of the effect that the Designated Hitter rule has on on-base percentage. The conclusion I came to was that, if the model is properly specified, the effect of the designated hitter rule is about .008 in on-base percentage. If the reasoning was correct, then when there are no other confounding variables, the effect should be similar in size for any other professional league.</p>
<p>Of course, the other major professional league is <a href="http://en.wikipedia.org/wiki/Nippon_Professional_Baseball">Nippon Professional Baseball</a>, the major leagues of Japan. Since it produces players at a level similar to MLB, and the other factors are similar &#8211; the DH rule was adopted in 1975 by one, but not both, of the two major leagues &#8211; NPB is an ideal place to try to test the model I specified in <a href="http://tomflesher.com/2010/05/what-is-the-effect-of-the-">this post</a>.</p>
<p><span id="more-152"></span>I&#8217;m working with a <a href="http://tomflesher.com/docs/japanseasons.txt">dataset</a> pulled from Jim Albright&#8217;s BaseballGuru.com <a href="http://baseballguru.com/jalbright/stats.html">Japanese Baseball Data archive</a>.  First, note that OBP, HBP, and SF data weren&#8217;t readily available. As a result, I&#8217;m approximating OBP by adding BB + H and dividing by AB + BB. This neglects hit batsmen and sacrifice flies, so OBP is off by a shade. Since I have no idea how prevalent HBP and SF are in Japan, I can&#8217;t say whether OBP is overstated or understated. Second, it&#8217;s worth stating that there may be non-economic concerns such as strategy preferences (i.e., tastes) that may explain a similar result. Third, I don&#8217;t have enough data for the Japanese leagues to determine if the leagues are in fact statistically similar. However, with all that in mind, the DH rule is the same in NPB as it is in MLB, so the effect should be of a similar sign and magnitude.</p>
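<p>To make the size of that approximation concrete, here is a short Python sketch that compares the standard OBP definition with the version that drops the HBP and SF terms. The team totals are hypothetical and the function names are illustrative; nothing here comes from the NPB data set.</p>
<pre>
```python
def full_obp(h, bb, hbp, ab, sf):
    """Standard on-base percentage: times on base over qualifying plate appearances."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def pseudo_obp(h, bb, ab):
    """Same formula with the hit-by-pitch and sacrifice-fly terms dropped."""
    return (h + bb) / (ab + bb)


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    h, bb, hbp, ab, sf = 1400, 500, 50, 5400, 40
    print(round(full_obp(h, bb, hbp, ab, sf), 4))
    print(round(pseudo_obp(h, bb, ab), 4))
```
</pre>
<p>Which way the gap runs depends on the mix: hit batsmen add to both the numerator and the denominator, while sacrifice flies add only to the denominator, so the shortcut can land on either side of the true figure.</p>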
<p>Once again, we&#8217;re testing the null hypothesis of</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cbeta_%7B3%7D+%3D+0&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;beta_{3} = 0' title='&#92;beta_{3} = 0' class='latex' /></p>
<p>using a regression of the form:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BOBP%7D+%3D+%5Chat%7B%5Cbeta%7D_%7B0%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B1%7Dt+%2B++%5Chat%7B%5Cbeta%7D_%7B2%7Dt%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B3%7DDH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{OBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' title='&#92;hat{OBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' class='latex' /></p>
<p>Since I have data back to 1937, t begins with t=0 that year instead of in 1955 as with the MLB data.</p>
<p>Using <a href="http://cran.r-project.org/">R</a>, I ran the regression with the following results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64"><strong>Estimate</strong></td>
<td width="64"><strong>Std. Error</strong></td>
<td width="64"><strong>t value</strong></td>
<td width="64"><strong>Pr(&gt;|t|)</strong></td>
<td width="64"><strong>Significance</strong></td>
</tr>
<tr>
<td height="20"><em>(Intercept)</em></td>
<td align="right">0.03064</td>
<td align="right">0.00471</td>
<td align="right">65.07800</td>
<td align="right">0.00000</td>
<td align="right">1.00000</td>
</tr>
<tr>
<td height="20"><em>t</em></td>
<td align="right">-0.00050</td>
<td align="right">0.00030</td>
<td align="right">-1.69800</td>
<td align="right">0.09224</td>
<td align="right">0.90776</td>
</tr>
<tr>
<td height="20"><em>tsq</em></td>
<td align="right">0.00001</td>
<td align="right">0.00000</td>
<td align="right">3.11300</td>
<td align="right">0.00233</td>
<td align="right">0.99767</td>
</tr>
<tr>
<td height="20"><em>DH</em></td>
<td align="right">0.01097</td>
<td align="right">0.00354</td>
<td align="right">3.17700</td>
<td align="right">0.00191</td>
<td align="right">0.99809</td>
</tr>
</tbody>
</table>
<p>Multiple R-squared: 0.3929,     Adjusted R-squared: 0.3772<br />
F-statistic: 25.02 on 3 and 116 DF,  p-value: 1.475e-12</p>
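<p>The Significance column in the table above is simply one minus the p-value. As a rough cross-check, the Python sketch below maps the same p-values to the significance codes that R prints under a coefficient table (0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1). The function name is my own, and how a p-value exactly on a cutoff is treated is an assumption.</p>

```python
# Cutoffs and symbols from the legend R prints under a coefficient table.
CUTOFFS = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]


def signif_stars(p):
    """Return the significance code for a p-value (empty string if none applies)."""
    for cutoff, symbol in CUTOFFS:
        if p <= cutoff:  # boundary handling is an assumption
            return symbol
    return ""


# p-values copied from the Pr(>|t|) column of the table above.
p_values = {"(Intercept)": 0.00000, "t": 0.09224, "tsq": 0.00233, "DH": 0.00191}

for term, p in p_values.items():
    print(f"{term:12s} 1 - p = {1 - p:.5f}  {signif_stars(p)}")
```

<p>By this convention the DH term clears the 1% threshold, while the linear trend term is only marginally significant.</p>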
<p>The regression had a Breusch-Pagan p-value of .5372, so we cannot reject the null hypothesis of homoskedasticity (which is good news for us). The adjusted R-squared shows that this regression explains about 38% of the variation in pseudo-OBP using our variables. Let&#8217;s look at how the effects stack up against each other:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"><strong><br />
</strong></td>
<td width="64"><strong>MLB</strong></td>
<td width="64"><strong>NPB</strong></td>
<td width="64"><strong>∆</strong></td>
<td width="64"><strong>|∆| / S.E. MLB</strong></td>
<td width="64"><strong>|∆| / S.E. NPB</strong></td>
</tr>
<tr>
<td height="20"><em>(Intercept)</em></td>
<td align="right">0.32310</td>
<td align="right">0.03064</td>
<td align="right">0.29246</td>
<td align="right">130.3879</td>
<td align="right">62.1198</td>
</tr>
<tr>
<td height="20"><em>t</em></td>
<td align="right">-0.00047</td>
<td align="right">-0.00050</td>
<td align="right">0.00003</td>
<td align="right">0.175</td>
<td align="right">0.111074</td>
</tr>
<tr>
<td height="20"><em>tsq</em></td>
<td align="right">0.000013</td>
<td align="right">0.00001</td>
<td align="right">0.00000</td>
<td align="right">0.083333</td>
<td align="right">0.06105</td>
</tr>
<tr>
<td height="20"><em>DH</em></td>
<td align="right">0.008036</td>
<td align="right">0.01097</td>
<td align="right">-0.00293</td>
<td align="right">1.74955</td>
<td align="right">0.82835</td>
</tr>
</tbody>
</table>
<p>The big surprises are, first, that the difference in the DH terms is so large when measured in MLB standard errors and, second, the difference in the intercepts. The easy one first: the starting years are different, so the intercept is of very little interest to us. As for the DH difference, there are two things to note. First, there is more baseline data in the NPB dataset, since it extends back to 1937 instead of only to 1955. Second, the MLB standard error is much smaller than the NPB one (about 0.0017 versus 0.0035), so the same gap looks larger when expressed in MLB standard errors. Even so, each estimate lies within 1.96 standard errors of the other, that is, inside the other&#8217;s 95% confidence interval, which means that we cannot reject the hypothesis that they are equal. The signs are the same and the magnitudes are similar, and, again, we&#8217;re looking at pseudo-OBP for NPB instead of professionally calculated OBP.</p>
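<p>The DH row of the comparison can be reproduced with a few lines of Python. The sketch below uses the estimates and standard errors reported in the two regression tables. The final z statistic is an addition of mine, not something computed in the post: it assumes the MLB and NPB estimates are independent, so that their variances add.</p>

```python
import math

# DH coefficients and standard errors as reported in the two regression tables.
mlb_est, mlb_se = 0.008036, 0.001677
npb_est, npb_se = 0.01097, 0.00354

delta = mlb_est - npb_est

# The gap expressed in each league's own standard errors.
gap_in_mlb_se = abs(delta) / mlb_se
gap_in_npb_se = abs(delta) / npb_se

# Assumption: the two estimates are independent, so their variances add.
z = delta / math.sqrt(mlb_se ** 2 + npb_se ** 2)

print(f"delta          = {delta:.6f}")
print(f"|delta|/SE MLB = {gap_in_mlb_se:.3f}")
print(f"|delta|/SE NPB = {gap_in_npb_se:.3f}")
print(f"z              = {z:.3f}")
```

<p>Under that independence assumption the combined statistic is well inside the usual 1.96 cutoff, which points the same way as the interval comparison.</p>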
<p>If I can find data on HBP and SF, it will be interesting to examine the data more closely.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/07/the-dh-redux-japan/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Does the DH Rule Cause Batters to be Hit?</title>
		<link>http://tomflesher.com/2010/06/02/does-the-dh-rule-cause-batters-to-be-hit/</link>
		<comments>http://tomflesher.com/2010/06/02/does-the-dh-rule-cause-batters-to-be-hit/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 16:28:55 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[designated hitter]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[hit by pitch]]></category>
		<category><![CDATA[Kevin Youkilis]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[sports economics]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=136</guid>
		<description><![CDATA[In an earlier post, I crunched some numbers on the Designated Hitter rule and came to the conclusion that the DH adds about .3 extra trips to first base per game after accounting for trend. I&#8217;m going to play around with another stat that a lot of people seem to think should be affected indirectly [...]]]></description>
			<content:encoded><![CDATA[<p>In an earlier post, I crunched some numbers on the Designated Hitter rule and came to the conclusion that the DH adds about .3 extra trips to first base per game after accounting for trend. I&#8217;m going to play around with another stat that a lot of people seem to think should be affected indirectly by the DH rule.</p>
<p>The Conventional Wisdom™ is that the DH should increase hit batsmen. The argument is that pitchers don&#8217;t bear the costs of hitting a batter with a pitch because they don&#8217;t bat, so they&#8217;ll be less careful to avoid hitting a batter or more likely to plunk a batter out of malice. Do the numbers bear that out?</p>
<p><span id="more-136"></span>To attack this question, I&#8217;m using the same dataset I used in the earlier post &#8211; the <a href="http://tomflesher.com/docs/MLB19552010.txt">per-game average data for each league since 1955</a>, with an added dummy variable for whether the DH rule was in effect that year, with time normalized to begin with 1955, and with an added quadratic term. (I pulled it from <a href="http://www.baseball-reference.com/">Baseball-Reference.com</a>.) I started with the same variables as in the previous post:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHBP%7D+%3D+%5Chat%7B%5Cbeta%7D_%7B0%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B1%7Dt+%2B+%5Chat%7B%5Cbeta%7D_%7B2%7Dt%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B3%7DDH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t + &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' title='&#92;hat{HBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t + &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' class='latex' /></p>
<p>That is, check for a trend and then, after controlling for it, check whether the DH rule has a significant effect. However, it occurred to me that there might be an experience effect &#8211; if more players are showing up in the league, you might get matching effects from pitchers with no control hitting batters and from batters with no experience crowding the plate because they haven&#8217;t been trained not to. I added a term for the number of batters in the league to control for that:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHBP%7D+%3D+%5Chat%7B%5Cbeta%7D_%7B0%7D+%2B%5Chat%7B%5Cbeta%7D_%7B1%7Dt+%2B++%5Chat%7B%5Cbeta%7D_%7B2%7Dt%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B3%7DBatters+%2B+%5Chat%7B%5Cbeta%7D_%7B4%7DDH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HBP} = &#92;hat{&#92;beta}_{0} +&#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}Batters + &#92;hat{&#92;beta}_{4}DH ' title='&#92;hat{HBP} = &#92;hat{&#92;beta}_{0} +&#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}Batters + &#92;hat{&#92;beta}_{4}DH ' class='latex' /></p>
<p>The regression output was:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t value</td>
<td width="64">Pr(&gt;|t|)</td>
<td width="64"></td>
</tr>
<tr>
<td height="20">(Intercept)</td>
<td align="right">0.11060</td>
<td align="right">0.02172</td>
<td align="right">5.092</td>
<td align="right">1.53E-06</td>
<td>***</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">-0.00838</td>
<td align="right">0.00091</td>
<td align="right">-9.159</td>
<td align="right">4.08E-15</td>
<td>***</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">0.00015</td>
<td align="right">0.00001</td>
<td align="right">10.792</td>
<td>&lt; 2E-16</td>
<td>***</td>
</tr>
<tr>
<td height="20">Batters</td>
<td align="right">0.00044</td>
<td align="right">0.00007</td>
<td align="right">6.498</td>
<td align="right">2.65E-09</td>
<td>***</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.08086</td>
<td align="right">0.01300</td>
<td align="right">6.22</td>
<td align="right">9.83E-09</td>
<td>***</td>
</tr>
</tbody>
</table>
<p>Residual standard error: 0.03256 on 107 degrees of freedom<br />
Multiple R-squared: 0.8038,     Adjusted R-squared: 0.7965<br />
F-statistic: 109.6 on 4 and 107 DF,  p-value: &lt; 2.2e-16</p>
<p>The Batters term and the other terms are all statistically significant at the 99% level. These variables explain around 80% of the variation in HBP per game, based on the R-squared statistic. The Breusch-Pagan test, with a null hypothesis of no heteroskedasticity, has a p-value of .2 &#8211; not enough to reject that null hypothesis, so the ordinary least squares standard errors are appropriate here.</p>
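<p>For readers who want to see the mechanics, here is a self-contained Python sketch of the two steps discussed above: an ordinary least squares fit of HBP on t, t squared, Batters and DH, followed by a Breusch-Pagan statistic computed as n times the R-squared of an auxiliary regression of the squared residuals on the same regressors (the studentized form). This is not the R code behind the table. The data are synthetic, every function name is my own, and the coefficients used to generate the data are only loosely modelled on the estimates above.</p>

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fitted(x_rows, beta):
    """Fitted values X b."""
    return [sum(b * v for b, v in zip(beta, r)) for r in x_rows]


def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x_rows, y):
    """n * R^2 from regressing squared residuals on the same regressors."""
    resid_sq = [(yi - fi) ** 2 for yi, fi in zip(y, fitted(x_rows, ols(x_rows, y)))]
    aux_fit = fitted(x_rows, ols(x_rows, resid_sq))
    return len(y) * r_squared(resid_sq, aux_fit)


# Synthetic league-seasons (NOT the post's data): columns are 1, t, t^2, Batters, DH.
TRUE_BETA = [0.11, -0.008, 0.00015, 0.0004, 0.08]
rows, y, noise_terms = [], [], []
for t in range(1, 21):
    for league in (0, 1):  # league 1 adopts the DH at t = 10
        dh = 1 if league == 1 and t >= 10 else 0
        batters = 300 + 4 * t + 25 * ((3 * t + league) % 5)
        noise = 0.01 * ((7 * t + 3 * league) % 9 - 4)
        row = [1.0, t, t * t, batters, dh]
        rows.append(row)
        noise_terms.append(noise)
        y.append(sum(b * v for b, v in zip(TRUE_BETA, row)) + noise)

beta = ols(rows, y)
bp_stat = breusch_pagan_lm(rows, y)
print("coefficients:", [round(b, 5) for b in beta])
print("BP statistic:", round(bp_stat, 3))
```

<p>The statistic is compared with a chi-square distribution whose degrees of freedom equal the number of regressors excluding the intercept. A small value, like the one behind the p-value of .2 reported here, gives no reason to doubt constant error variance.</p>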
<p>After controlling for time and the effect of talent pool dilution, the designated hitter rule represents about .08 hit batsmen per game, or roughly one hit batsman every 12.5 games, which translates to about 13 additional hit batsmen over the course of a team&#8217;s season. (Of course, that effect could be almost entirely explained by <strong><a href="http://www.baseball-reference.com/players/y/youklke01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Kevin  Youkilis</a></strong> stubbornly refusing to back off home plate.)</p>
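<p>The arithmetic behind those figures, as a short Python sketch. The only number not taken from the regression table is the 162-game regular season.</p>

```python
dh_effect_per_game = 0.08086  # DH coefficient from the regression table
games_per_season = 162        # regular-season games per MLB team

games_per_extra_hbp = 1 / dh_effect_per_game
extra_hbp_per_season = dh_effect_per_game * games_per_season

print(f"one extra hit batsman every {games_per_extra_hbp:.1f} games")
print(f"{extra_hbp_per_season:.1f} extra hit batsmen per 162-game season")
```

<p>Using the unrounded coefficient gives an interval slightly shorter than the 12.5 games quoted above, which comes from rounding the effect to .08.</p>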
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/02/does-the-dh-rule-cause-batters-to-be-hit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>What is the effect of the Designated Hitter?</title>
		<link>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/</link>
		<comments>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/#comments</comments>
		<pubDate>Sun, 30 May 2010 22:36:37 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[designated hitter]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=95</guid>
		<description><![CDATA[Intuitively, the designated hitter rule seems like it should increase scoring. By getting on base more often than the pitcher would have, the designated hitter helps produce runs by hitting, by being on base so that other players can drive him in, and by not accumulating outs by bunting or striking out as often as [...]]]></description>
			<content:encoded><![CDATA[<p>Intuitively, the designated hitter rule seems like it should increase scoring. By getting on base more often than the pitcher would have, the designated hitter helps produce runs by hitting, by being on base so that other players can drive him in, and by not accumulating outs by bunting or striking out as often as the pitcher does. However, there should be a countervailing effect from having pitchers left in the game longer: a better pitcher who remains in the game might get more outs than a reliever who came in only because the manager needed offense and pinch-hit for the starting pitcher.</p>
<p>Behind the cut, I&#8217;ll explain the testing I did to determine whether the effect of a DH is positive (hint: it is) and look at how big an effect is actually there.</p>
<p><span id="more-95"></span>MLB is the perfect setting for natural experiments about the DH rule for obvious reasons &#8211; the American League uses it, the National League doesn&#8217;t, and the talent pool is exactly the same. There are very few restrictions on player transfers between the leagues, so players are probably as good as randomly assigned to the leagues. With that in mind, if there is a difference between the leagues, then it can probably be attributed to the DH rule.</p>
<p>Using <a href="http://www.baseball-reference.com/">Baseball-Reference.com</a>, I pulled <a href="http://tomflesher.files.wordpress.com/2010/05/mlb19552009.doc">this dataset</a> of league-level batting for both leagues from 1955 on (with 1955 chosen because it&#8217;s the first year for which all of B-R.com&#8217;s data was available). I renamed Year to t and subtracted 1954 so that I could do a trend analysis, and added a binary variable called &#8220;DH&#8221; that takes value 1 if the Designated Hitter rule was used and 0 otherwise. Assuming the leagues are otherwise identical, my null hypothesis is that <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%28DH%29+%3D+0&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;beta(DH) = 0' title='&#92;beta(DH) = 0' class='latex' />; that is, the effect of the DH rule is nonexistent.</p>
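<p>The variable construction described above can be sketched in Python as follows. The record layout and the field names <code>year</code> and <code>league</code> are hypothetical, and the actual preparation may well have been done by hand or in R. The sketch relies on the American League having adopted the DH in 1973 and the National League not using it in this period.</p>

```python
def add_trend_and_dh(records):
    """Add t, tsq and DH columns to league-season records."""
    out = []
    for rec in records:
        t = rec["year"] - 1954  # 1955 becomes t = 1
        dh = 1 if rec["league"] == "AL" and rec["year"] >= 1973 else 0
        out.append({**rec, "t": t, "tsq": t * t, "DH": dh})
    return out


# A few hypothetical league-seasons to show the coding.
sample = [
    {"year": 1955, "league": "AL"},
    {"year": 1972, "league": "AL"},
    {"year": 1973, "league": "AL"},
    {"year": 1973, "league": "NL"},
    {"year": 2009, "league": "AL"},
]

prepared = add_trend_and_dh(sample)
for rec in prepared:
    print(rec)
```

<p>With this coding the DH coefficient compares AL seasons from 1973 on against every other league-season, after the quadratic trend has been accounted for.</p>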
<p>I used <a href="http://cran.r-project.org">R</a> to run the following regression on the data:</p>
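<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BOBP%7D+%3D+%5Chat%7B%5Cbeta%7D_%7B0%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B1%7Dt+%2B++%5Chat%7B%5Cbeta%7D_%7B2%7Dt%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta%7D_%7B3%7DDH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{OBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' title='&#92;hat{OBP} = &#92;hat{&#92;beta}_{0} + &#92;hat{&#92;beta}_{1}t +  &#92;hat{&#92;beta}_{2}t^{2} + &#92;hat{&#92;beta}_{3}DH ' class='latex' /></p>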
<p>and got the following results:</p>
<p>Call:<br />
lm(formula = OBP ~ t + tsq + DH)</p>
<p>Residuals:<br />
Min         1Q     Median         3Q        Max<br />
-0.0219984 -0.0041721  0.0003126  0.0048915  0.0187776</p>
<table border="0" cellspacing="0" cellpadding="0" width="421">
<tbody>
<tr>
<td colspan="2" width="149" height="20">Coefficients:</td>
<td width="77"></td>
<td width="74"></td>
<td width="121"></td>
</tr>
<tr>
<td height="20"></td>
<td>Estimate</td>
<td>Std. Error</td>
<td>t value</td>
<td>Pr(&gt;|t|)</td>
</tr>
<tr>
<td height="20">(Intercept)</td>
<td>0.323100</td>
<td>0.002243</td>
<td>144.055</td>
<td>&lt; 2e-16 ***</td>
</tr>
<tr>
<td height="20">t</td>
<td>-0.000470</td>
<td>0.000188</td>
<td>-2.503</td>
<td>0.013827 *</td>
</tr>
<tr>
<td height="20">tsq</td>
<td>0.000013</td>
<td>0.000003</td>
<td>4.039</td>
<td>0.000101 ***</td>
</tr>
<tr>
<td height="20">DH</td>
<td>0.008036</td>
<td>0.001677</td>
<td>4.793</td>
<td>5.27e-06 ***</td>
</tr>
</tbody>
</table>
<p>The *** suffix indicates significance at the 99.9% level (p &lt; 0.001), and a single * indicates significance at the 95% level. A Breusch-Pagan test for heteroskedasticity returned a BP stat of 3.0789 and a p-value of .3796, which means we cannot reject the null hypothesis of homoskedasticity (that is, the usual standard errors and t-tests are appropriate for this data).</p>
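<p>The Breusch-Pagan p-value can be checked by hand. The Python sketch below assumes the statistic is compared with a chi-square distribution with 3 degrees of freedom, one for each of t, tsq and DH. For 3 degrees of freedom the upper-tail probability has a closed form, so only the standard library is needed.</p>

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


bp_stat = 3.0789  # Breusch-Pagan statistic reported in the post
p_value = chi2_sf_df3(bp_stat)

print(f"p-value = {p_value:.4f}")
```

<p>The result should land very close to the reported .3796, which also supports the assumption about the degrees of freedom.</p>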
<p>Across MLB, the positive quadratic term means the fitted OBP trend eventually turns upward, and the DH rule adds roughly .008 to the league&#8217;s average OBP after accounting for that time trend. .008 is roughly .8 percentage points, meaning you&#8217;d get slightly less than one additional trip to first in 100 plate appearances. Assuming a leaguewide mean of 38.5 plate appearances per team per game, that translates to about .3 extra trips to first per game.</p>
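<p>The arithmetic in this paragraph, plus one extra step that is my addition: locating the year where the fitted quadratic trend bottoms out. The coefficients are the rounded values from the table, so the turning point in particular is only approximate.</p>

```python
# Rounded coefficients from the regression table.
b_t, b_tsq, b_dh = -0.000470, 0.000013, 0.008036
pa_per_team_game = 38.5  # leaguewide mean assumed in the post

# Extra times on base per team per game attributable to the DH.
extra_trips_per_game = b_dh * pa_per_team_game

# The trend b_t * t + b_tsq * t^2 is lowest where its derivative is zero.
t_min = -b_t / (2 * b_tsq)
year_min = 1954 + t_min  # t was defined as Year - 1954

print(f"extra trips to first per game: {extra_trips_per_game:.3f}")
print(f"fitted trend bottoms out near t = {t_min:.1f} (about {year_min:.0f})")
```

<p>On these rounded numbers the fitted trend declines until the early 1970s and rises afterwards.</p>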
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
	</channel>
</rss>
