<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Tom Flesher &#187; R</title>
	<atom:link href="http://tomflesher.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://tomflesher.com</link>
	<description>Mercenary Educator and Bad Economist</description>
	<lastBuildDate>Mon, 14 Mar 2011 03:02:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='tomflesher.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Tom Flesher &#187; R</title>
		<link>http://tomflesher.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://tomflesher.com/osd.xml" title="Tom Flesher" />
	<atom:link rel='hub' href='http://tomflesher.com/?pushpress=hub'/>
		<item>
		<title>Are This Year&#039;s Home Runs Really That Different?</title>
		<link>http://tomflesher.com/2010/12/22/are-this-years-home-runs-really-that-different/</link>
		<comments>http://tomflesher.com/2010/12/22/are-this-years-home-runs-really-that-different/#comments</comments>
		<pubDate>Thu, 23 Dec 2010 01:23:06 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[Carlos Pena]]></category>
		<category><![CDATA[Carlos Quentin]]></category>
		<category><![CDATA[home run distributions]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[Jose Bautista]]></category>
		<category><![CDATA[kurtosis]]></category>
		<category><![CDATA[Mark Teixeira]]></category>
		<category><![CDATA[Miguel Cabrera]]></category>
		<category><![CDATA[Paul Konerko]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[skewness]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=470</guid>
		<description><![CDATA[This year&#8217;s home runs are quite confounding. On the one hand, home runs per game in the AL have dropped precipitously (as noted and examined in the two previous posts). On the other hand, Jose Bautista had an absolutely outstanding year. How much different is this year&#8217;s distribution than those of previous years? To answer [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://tomflesher.files.wordpress.com/2010/12/001u4334_josc3a9_bautista1.jpg"><img class="size-thumbnail wp-image-469 alignleft" title="001U4334" src="http://tomflesher.files.wordpress.com/2010/12/001u4334_josc3a9_bautista1.jpg?w=135&#038;h=150" alt="" width="135" height="150" /></a>This year&#8217;s home runs are quite confounding. On the one hand, home runs per game in the AL have dropped precipitously (as noted and examined in the two previous posts). On the other hand, <strong><a href="http://www.baseball-reference.com/players/b/bautijo02.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Jose Bautista</a></strong> had an absolutely outstanding year. How different is this year&#8217;s distribution from those of previous years? To answer that question, I went to Baseball Reference and pulled the list of all players with at least one plate appearance, sorted by home runs.</p>
<p><a href="http://tomflesher.files.wordpress.com/2010/12/alhr2010dist1.jpg"><img class="alignright size-thumbnail wp-image-468" title="alhr2010dist" src="http://tomflesher.files.wordpress.com/2010/12/alhr2010dist1.jpg?w=150&#038;h=150" alt="" width="150" height="150" /></a>There are several parameters that are of interest when discussing the distribution of events. The first is the mean. This year&#8217;s mean was 5.43, meaning that players with at least one plate appearance hit 5.43 homers each on average. That&#8217;s down from 6.53 last year and 5.66 in 2008.</p>
<p>Next, consider the <a href="http://en.wikipedia.org/wiki/Variance">variance</a> and <a href="http://en.wikipedia.org/wiki/Standard_deviation">standard deviation</a>. (The variance is the standard deviation squared, so the two carry the same information.) A low variance means that the numbers are clumped tightly around the mean. This year&#8217;s variance was 68.4, down from last year&#8217;s 84.64 but up from 2008&#8217;s 66.44.</p>
<p>The <a href="http://en.wikipedia.org/wiki/Skewness">skewness</a> and <a href="http://en.wikipedia.org/wiki/">kurtosis</a> describe the shape of the tails: skewness measures how lopsided the distribution is, and kurtosis measures how heavy the tails are. Since a lot of people have very <a href="http://tomflesher.files.wordpress.com/2010/12/alhr2009dist1.jpg"><img class="size-thumbnail wp-image-467 alignleft" title="alhr2009dist" src="http://tomflesher.files.wordpress.com/2010/12/alhr2009dist1.jpg?w=150&#038;h=150" alt="" width="150" height="150" /></a>few home runs, the skewness of every year&#8217;s distribution is going to be positive. Roughly, that means that there are observations far larger than the mean, but very few that are far smaller. That makes sense, since there&#8217;s no such thing as a negative home run total. The kurtosis number represents how pointy the distribution is, or alternatively how much of the distribution is found in the tails.</p>
<p>For example, in 2009, <strong><a href="http://www.baseball-reference.com/players/t/teixema01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Mark Teixeira</a></strong> and <strong><a href="http://www.baseball-reference.com/players/p/penaca01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Carlos Pena</a></strong> tied for the American League lead with 39 home runs. The mean was high and so was the variance, but the tail was relatively thin. <a href="http://tomflesher.files.wordpress.com/2010/12/alhr2008dist1.jpg"><img class="alignright size-thumbnail wp-image-466" title="alhr2008dist" src="http://tomflesher.files.wordpress.com/2010/12/alhr2008dist1.jpg?w=150&#038;h=150" alt="" width="150" height="150" /></a>This year, Bautista led his nearest competitor (<strong><a href="http://www.baseball-reference.com/players/k/konerpa01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Paul Konerko</a></strong>) by 15 home runs and only 8 players were over 30 home runs; by contrast, 2009 saw 15 players above 30 home runs with a pretty tight race for the lead. Kurtosis in 2010 was 7.72 compared with 2009&#8217;s 4.56 and 2008&#8217;s 5.55. (In 2008, 11 players were above the 30 mark, and <strong><a href="http://www.baseball-reference.com/players/c/cabremi01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Miguel Cabrera</a></strong>&#8217;s 37 home runs edged <strong><a href="http://www.baseball-reference.com/players/q/quentca01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Carlos Quentin</a></strong> by just one.)</p>
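<p>For readers who want to reproduce summary statistics like these, here is a minimal sketch in R. The <code>hr</code> vector is made-up illustration data, not the Baseball Reference list, and the <code>central.moment</code>, <code>skewness</code>, and <code>kurtosis</code> helpers are hypothetical names defined here because base R does not ship skewness or kurtosis functions. The kurtosis below is the plain (non-excess) moment ratio, under which a normal distribution scores 3; this sketch assumes that convention, and other definitions will give different numbers.</p>
<pre class="r" style="font-family:monospace;"># Hypothetical home run totals, NOT the Baseball Reference data
hr &lt;- c(0, 0, 0, 1, 1, 2, 3, 5, 8, 12, 20, 31, 54)

# k-th central moment, computed by hand so no extra package is needed
central.moment &lt;- function(x, k) mean((x - mean(x))^k)

# Moment-ratio skewness: positive when the long tail is on the right
skewness &lt;- function(x) central.moment(x, 3) / central.moment(x, 2)^(3/2)

# Moment-ratio (non-excess) kurtosis: larger when more mass sits in the tails
kurtosis &lt;- function(x) central.moment(x, 4) / central.moment(x, 2)^2

hr.mean &lt;- mean(hr)
hr.var  &lt;- var(hr)   # sample variance (n - 1 denominator)
hr.sd   &lt;- sd(hr)    # square root of the sample variance
hr.skew &lt;- skewness(hr)
hr.kurt &lt;- kurtosis(hr)

print(c(mean = hr.mean, var = hr.var, sd = hr.sd,
        skewness = hr.skew, kurtosis = hr.kurt))</pre>
<p>With one slugger far out in front, as in the toy vector, both the skewness and the kurtosis are pulled upward, which is the pattern described for 2010.</p>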
<p>The numbers say that 2008 and 2009 were much more similar than either of them is to 2010. A quick look at the distributions bears that out &#8211; this was a weird year.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/22/are-this-years-home-runs-really-that-different/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/12/001u4334_josc3a9_bautista1.jpg?w=135" medium="image">
			<media:title type="html">001U4334</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/12/alhr2010dist1.jpg?w=150" medium="image">
			<media:title type="html">alhr2010dist</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/12/alhr2009dist1.jpg?w=150" medium="image">
			<media:title type="html">alhr2009dist</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/12/alhr2008dist1.jpg?w=150" medium="image">
			<media:title type="html">alhr2008dist</media:title>
		</media:content>
	</item>
		<item>
		<title>Diagnosing the AL</title>
		<link>http://tomflesher.com/2010/12/22/diagnosing-the-al/</link>
		<comments>http://tomflesher.com/2010/12/22/diagnosing-the-al/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 21:20:26 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[2010]]></category>
		<category><![CDATA[American League]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=463</guid>
		<description><![CDATA[In the previous post, I crunched some numbers on a previous forecast I&#8217;d made and figured out that it was a pretty crappy forecast. (That&#8217;s the fun of forecasting, of course &#8211; sometimes you&#8217;re right and sometimes you&#8217;re wrong.) The funny part of it, though, is that the predicted home runs per game for the [...]]]></description>
			<content:encoded><![CDATA[<p>In the previous post, I crunched some numbers on a previous forecast I&#8217;d made and figured out that it was a pretty crappy forecast. (That&#8217;s the fun of forecasting, of course &#8211; sometimes you&#8217;re right and sometimes you&#8217;re wrong.) The funny part of it, though, is that the predicted home runs per game for the American League was so far off &#8211; 3.4 standard errors below the predicted value &#8211; that it&#8217;s highly unlikely that the regression model I used controls for all relevant variables. That&#8217;s not surprising, since it was only a time trend with a dummy variable for the designated hitter.</p>
<p>There are a couple of things to check for immediately. The first is the most common explanation thrown around when home runs drop &#8211; steroids. It seems to me that if the drop in home runs were due to better control of performance-enhancing drugs, then home runs should be the statistic most affected, with predictable knock-on effects elsewhere. Intentional walks should probably be below expectation, since intentional walks are used to protect against a home run hitter. Unintentional walks should probably be about as expected, since walks are a function of plate discipline and pitcher control, not of strength. On-base percentage should probably drop by less than home runs do, since some hits that would have been home runs will stay in the park as singles, doubles, or triples. Finally, slugging average should drop because a loss in power without a corresponding increase in speed will lower total bases.</p>
<p>I&#8217;ll analyze these with pretty new R code behind the cut.</p>
<p><span id="more-463"></span>Using R, I fitted time-series models of the same functional form as the home runs per game model. I pulled the data from the Baseball-Reference.com AL Batting Encyclopedia and regressed the variable of interest on a time trend, its square, and a dummy for the designated hitter.</p>
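<p>As a rough sketch of that setup (not the exact code or data behind the results below), a model of this form can be fitted and the 2010 value compared with its prediction as follows. The data frame <code>al</code> and its response <code>y</code> are simulated stand-ins, and the origin of the time trend is an assumption chosen so that 2010 corresponds to <code>t = 56</code>, the value plugged into the hand calculations below. In this sketch 2010 is part of the estimation sample.</p>
<pre class="r" style="font-family:monospace;">set.seed(42)  # reproducible fake data

# Hypothetical league-season table: one row per AL season
al &lt;- data.frame(year = 1901:2010)
al$t   &lt;- al$year - 1954                 # assumed origin, so that t = 56 in 2010
al$tsq &lt;- al$t^2                         # squared time trend
al$DH  &lt;- as.numeric(al$year &gt;= 1973)    # the AL adopted the DH in 1973

# Simulated response, purely for illustration
al$y &lt;- 0.3 + 0.002 * al$t + 0.00005 * al$tsq + 0.01 * al$DH +
  rnorm(nrow(al), sd = 0.01)

# Same functional form as the models in this post
fit &lt;- lm(y ~ t + tsq + DH, data = al)

# Fitted value for 2010 taken from the model object,
# rather than from hand-copied, rounded coefficients
new.2010    &lt;- data.frame(t = 56, tsq = 56^2, DH = 1)
fitted.2010 &lt;- predict(fit, newdata = new.2010)

# Residual standard error, the scale used for the ratios in this post
se &lt;- summary(fit)$sigma

# How many residual standard errors the 2010 value sits from its prediction
obs.2010 &lt;- al$y[al$year == 2010]
z &lt;- (obs.2010 - fitted.2010) / se
print(z)</pre>
<p>Using <code>predict</code> avoids the small rounding error that comes from retyping coefficients out of the summary table, though for a quick check either approach should land on nearly the same ratio.</p>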
<p><span style="text-decoration:underline;"><strong>First Assumption:</strong></span> Intentional walks should decrease.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; ibb.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>IBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>ibb.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = IBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.1350376</span> -<span style="color:#cc66cc;">0.0261969</span>  <span style="color:#cc66cc;">0.0005516</span>  <span style="color:#cc66cc;">0.0294412</span>  <span style="color:#cc66cc;">0.1534536</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  2.656e-01  1.408e-02  <span style="color:#cc66cc;">18.870</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>            8.037e-03  1.199e-03   <span style="color:#cc66cc;">6.706</span> 1.01e-09 ***
tsq         -1.393e-04  2.024e-05  -<span style="color:#cc66cc;">6.882</span> 4.30e-10 ***
DH          -1.140e-01  1.055e-02 -<span style="color:#cc66cc;">10.805</span>  &lt; 2e-16 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.04689</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.5961</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.5847</span>
F-statistic: <span style="color:#cc66cc;">52.14</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: &lt; 2.2e-16
 
&gt; ibb.2010.fitted &lt;- <span style="color:#009900;">(</span>2.656e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>8.037e-03<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>-1.393e-04<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-1.140e-01<span style="color:#009900;">)</span>
&gt; ibb.2010.obs &lt;- <span style="color:#cc66cc;">.2</span>
&gt; residual.ibb &lt;- ibb.2010.obs - ibb.2010.fitted
&gt; se.ibb &lt;- <span style="color:#cc66cc;">.04689</span>
&gt; residual.ibb/se.ibb
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> <span style="color:#cc66cc;">0.750113</span></pre>
</div>
</div>
<p><a title="Created by Pretty R at inside-R.org" href="http://www.inside-r.org/pretty-r">Created by Pretty R at inside-R.org</a></p>
<p>Intentional walks per game came in above the fitted value, but only by about 0.75 residual standard errors. Statistically, intentional walks were indistinguishable from expectation.</p>
<p><strong><span style="text-decoration:underline;">Second Assumption:</span></strong> Unintentional walks should not change.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; uBB &lt;- <span style="color:#009900;">(</span>BB-IBB<span style="color:#009900;">)</span>
&gt; ubb.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>uBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>ubb.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = uBB ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
     Min       1Q   Median       3Q      Max
-<span style="color:#cc66cc;">0.69256</span> -<span style="color:#cc66cc;">0.12758</span> -<span style="color:#cc66cc;">0.01390</span>  <span style="color:#cc66cc;">0.13178</span>  <span style="color:#cc66cc;">0.77866</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  <span style="color:#cc66cc;">3.0879505</span>  <span style="color:#cc66cc;">0.0732669</span>  <span style="color:#cc66cc;">42.147</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -<span style="color:#cc66cc;">0.0190285</span>  <span style="color:#cc66cc;">0.0062392</span>  -<span style="color:#cc66cc;">3.050</span> <span style="color:#cc66cc;">0.002892</span> **
tsq          <span style="color:#cc66cc;">0.0003623</span>  <span style="color:#cc66cc;">0.0001054</span>   <span style="color:#cc66cc;">3.439</span> <span style="color:#cc66cc;">0.000837</span> ***
DH           <span style="color:#cc66cc;">0.1812598</span>  <span style="color:#cc66cc;">0.0549094</span>   <span style="color:#cc66cc;">3.301</span> <span style="color:#cc66cc;">0.001313</span> **
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.2441</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.1876</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.1647</span>
F-statistic: <span style="color:#cc66cc;">8.162</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: 6.127e-05
 
&gt; ubb.2010.fitted &lt;- <span style="color:#cc66cc;">3.0879505</span> + <span style="color:#009900;">(</span>-<span style="color:#cc66cc;">.0190285</span><span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span><span style="color:#cc66cc;">.0003623</span><span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#cc66cc;">.1812598</span>
&gt; ubb.2010.obs &lt;- <span style="color:#cc66cc;">3.25</span> - <span style="color:#cc66cc;">.2</span>
&gt; residual.ubb &lt;- ubb.2010.obs - ubb.2010.fitted
&gt; se.ubb &lt;- <span style="color:#cc66cc;">.2441</span>
&gt; residual.ubb/se.ubb
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">1.187166</span></pre>
</div>
</div>
<p>Unintentional walks came in about 1.19 residual standard errors below the fitted value. Again, that fluctuation isn&#8217;t big enough to say the result is statistically different from our expectation.</p>
<p><strong><span style="text-decoration:underline;">Third Assumption:</span></strong> OBP drops, but by somewhat less than 3.4 standard errors.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; obp.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>OBP ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>obp.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = OBP ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.0217348</span> -<span style="color:#cc66cc;">0.0044903</span>  <span style="color:#cc66cc;">0.0002799</span>  <span style="color:#cc66cc;">0.0046695</span>  <span style="color:#cc66cc;">0.0182481</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  3.238e-01  2.230e-03 <span style="color:#cc66cc;">145.199</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -5.703e-04  1.899e-04  -<span style="color:#cc66cc;">3.003</span>  <span style="color:#cc66cc;">0.00334</span> **
tsq          1.472e-05  3.207e-06   <span style="color:#cc66cc;">4.591</span> 1.22e-05 ***
DH           8.245e-03  1.671e-03   <span style="color:#cc66cc;">4.933</span> 3.02e-06 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.00743</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.487</span><span style="color:#339933;">,</span>      Adjusted R-squared: <span style="color:#cc66cc;">0.4724</span>
F-statistic: <span style="color:#cc66cc;">33.54</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: 2.532e-15
 
&gt; obp.2010.fitted &lt;- <span style="color:#009900;">(</span>3.238e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-5.703e-04<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>1.472e-05<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + 8.245e-03
&gt; obp.2010.obs &lt;- <span style="color:#cc66cc;">.327</span>
&gt; residual.obp &lt;- obp.2010.obs - obp.2010.fitted
&gt; se.obp &lt;- <span style="color:#cc66cc;">.00743</span>
&gt; residual.obp/se.obp
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">2.593556</span></pre>
</div>
</div>
<p>OBP dropped, as expected, but by about 2.6 residual standard errors, which is quite a bit. Without more information it&#8217;s hard to judge whether a change of this magnitude is due to better pitching or power being taken away from hitters.</p>
<p><strong><span style="text-decoration:underline;">Fourth Assumption:</span></strong> Slugging average will drop.</p>
<p><strong><span style="text-decoration:underline;">Results:</span></strong></p>
<div style="overflow:auto;">
<div class="geshifilter">
<pre class="r geshifilter-R" style="font-family:monospace;">&gt; slg.lm &lt;- <a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span>SLG ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
&gt; <a href="http://inside-r.org/r-doc/base/summary"><span style="color:#003399;font-weight:bold;">summary</span></a><span style="color:#009900;">(</span>slg.lm<span style="color:#009900;">)</span>
 
Call:
<a href="http://inside-r.org/r-doc/stats/lm"><span style="color:#003399;font-weight:bold;">lm</span></a><span style="color:#009900;">(</span><a href="http://inside-r.org/r-doc/stats/formula"><span style="color:#003399;font-weight:bold;">formula</span></a> = SLG ~ <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> + tsq + DH<span style="color:#009900;">)</span>
 
Residuals:
       Min         1Q     Median         3Q        Max
-<span style="color:#cc66cc;">0.0357646</span> -<span style="color:#cc66cc;">0.0087050</span> -<span style="color:#cc66cc;">0.0007988</span>  <span style="color:#cc66cc;">0.0115133</span>  <span style="color:#cc66cc;">0.0317497</span>
 
Coefficients:
              Estimate Std. Error <a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a> value Pr<span style="color:#009900;">(</span>&gt;|t|<span style="color:#009900;">)</span>
<span style="color:#009900;">(</span>Intercept<span style="color:#009900;">)</span>  3.937e-01  4.471e-03  <span style="color:#cc66cc;">88.050</span>  &lt; 2e-16 ***
<a href="http://inside-r.org/r-doc/base/t"><span style="color:#003399;font-weight:bold;">t</span></a>           -2.058e-03  3.807e-04  -<span style="color:#cc66cc;">5.404</span> 4.04e-07 ***
tsq          5.049e-05  6.429e-06   <span style="color:#cc66cc;">7.853</span> 3.51e-12 ***
DH           1.693e-02  3.351e-03   <span style="color:#cc66cc;">5.054</span> 1.82e-06 ***
---
Signif. codes:  <span style="color:#cc66cc;">0</span> ‘***’ <span style="color:#cc66cc;">0.001</span> ‘**’ <span style="color:#cc66cc;">0.01</span> ‘*’ <span style="color:#cc66cc;">0.05</span> ‘.’ <span style="color:#cc66cc;">0.1</span> ‘ ’ <span style="color:#cc66cc;">1</span>
 
Residual standard error: <span style="color:#cc66cc;">0.01489</span> on <span style="color:#cc66cc;">106</span> degrees of freedom
Multiple R-squared: <span style="color:#cc66cc;">0.6452</span><span style="color:#339933;">,</span>     Adjusted R-squared: <span style="color:#cc66cc;">0.6352</span>
F-statistic: <span style="color:#cc66cc;">64.27</span> on <span style="color:#cc66cc;">3</span> and <span style="color:#cc66cc;">106</span> DF<span style="color:#339933;">,</span>  p-value: &lt; 2.2e-16
 
&gt; slg.2010.fitted &lt;- <span style="color:#009900;">(</span>3.937e-01<span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>-2.058e-03<span style="color:#009900;">)</span>*<span style="color:#cc66cc;">56</span> + <span style="color:#009900;">(</span>5.049e-05<span style="color:#009900;">)</span>*<span style="color:#009900;">(</span><span style="color:#cc66cc;">56</span>**<span style="color:#cc66cc;">2</span><span style="color:#009900;">)</span> + <span style="color:#009900;">(</span>1.693e-02<span style="color:#009900;">)</span>
&gt; slg.2010.obs &lt;- <span style="color:#cc66cc;">.407</span>
&gt; residual.slg &lt;- slg.2010.obs - slg.2010.fitted
&gt; se.slg &lt;- <span style="color:#cc66cc;">.01489</span>
&gt; residual.slg/se.slg
<span style="color:#009900;">[</span><span style="color:#cc66cc;">1</span><span style="color:#009900;">]</span> -<span style="color:#cc66cc;">3.137585</span></pre>
</div>
</div>
<p>A drop in slugging average of over three standard errors indicates that something has either sapped hitters&#8217; power specifically or hurt their ability to hit in general. The results so far are consistent with both explanations.</p>
<p>This isn&#8217;t evidence of steroid use. In fact, the same results would be consistent with a shift toward pitching talent. More work needs to be done on this year&#8217;s data before conclusions can be drawn. However, it does seem to indicate that, at least in the American League, the Year of the Pitcher narrative has some statistical foundation.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/22/diagnosing-the-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>What Happened to Home Runs This Year?</title>
		<link>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/</link>
		<comments>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 17:18:46 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[standard error]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=458</guid>
		<description><![CDATA[I was talking to Jim, the writer behind Apparently, I&#8217;m An Angels Fan, who&#8217;s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted &#8220;Year of the Pitcher&#8221; has affected home run production. Sure enough, on checking the AL Batting Encyclopedia at Baseball-Reference.com, production dropped [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=458&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I was talking to Jim, the writer behind <a href="http://apparentlyanangelsfan.wordpress.com">Apparently, I&#8217;m An Angels Fan</a>, who&#8217;s gamely trying to learn baseball because he wants to be just like me. Jim wondered aloud how much the vaunted &#8220;Year of the Pitcher&#8221; has affected home run production. Sure enough, on checking the <a href="http://www.baseball-reference.com/leagues/AL/bat.shtml">AL Batting  Encyclopedia</a> at <a href="http://www.baseball-reference.com">Baseball-Reference.com</a>, production dropped by about .15 home runs per game (from 1.13 to .97). Is that normal statistical variation or does it show that this year was really different?</p>
<p>In two previous posts, I <a title="Back when it was hard to hit 55…" href="http://worldsworstsportsblog.com/2010/07/08/back-when-it-was-hard-to-hit-55/">looked at the trend of home runs per game to examine Stuff Keith Hernandez Says</a> and then <a title="More on Home Runs Per Game" href="http://worldsworstsportsblog.com/2010/07/09/more-on-home-runs-per-game/">examined Japanese baseball&#8217;s data for evidence of a structural break</a>. I used the Batting Encyclopedia to run a time-series regression for a quadratic trend and added a dummy variable for the Designated Hitter. I found that the time trend and DH control account for approximately 56% of the variation in home runs per game, and that the functional form is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911++%5Ctimes+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911  &#92;times DH ' title='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911  &#92;times DH ' class='latex' /></p>
<p>with t=1 in 1955, t=2 in 1956, and so on. That means t=56 in 2010. Consequently, we&#8217;d expect home run production per game in 2010 in the American League to be approximately</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+56+%2B+.0004+%5Ctimes+3136+%2B+.0911+%5Capprox+1.25+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times 56 + .0004 &#92;times 3136 + .0911 &#92;approx 1.25 ' title='&#92;hat{HR} = .957 - .0188 &#92;times 56 + .0004 &#92;times 3136 + .0911 &#92;approx 1.25 ' class='latex' /></p>
<p>That means we expected production to increase this year, and instead it dropped precipitously, for a residual of -.28. The residual standard error on the original regression was .1092 on 106 degrees of freedom, so the critical t-value from <a href="http://www.stat.tamu.edu/stat30x/zttables.php">Texas A&amp;M&#8217;s table</a> is 1.984 (approximating using 100 df). That means we can be 95% confident that the actual number of home runs per game should fall within .1092*1.984, or about .2167, of the expected value. The lower bound would be about 1.03, meaning we&#8217;re still significantly below what we&#8217;d expect. In fact, the observed number is about 2.6 standard errors below the expected number. In other words, we&#8217;d expect a gap that large to happen by chance only about 1% of the time.</p>
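<p>For anyone who wants to retrace that arithmetic, here is a minimal sketch in base R. It plugs in the rounded coefficients, the observed .97 and the .1092 residual standard error exactly as quoted in this post; the variable names are mine. One assumption to flag: it calls qt() with the full 106 degrees of freedom instead of the 100-df row of the printed table, so the critical value comes out slightly smaller than 1.984.</p>

```r
# Rounded coefficients as quoted in the post (t = 1 in 1955).
b0   <- 0.957
b1   <- -0.0188
b2   <- 0.0004
b_dh <- 0.0911

t_2010 <- 2010 - 1955 + 1          # time index for 2010
hr_fitted   <- b0 + b1 * t_2010 + b2 * t_2010^2 + b_dh * 1   # AL uses the DH
hr_observed <- 0.97                # AL home runs per game in 2010, as quoted
rse         <- 0.1092              # residual standard error of the regression
df_resid    <- 106                 # residual degrees of freedom

resid_hr  <- hr_observed - hr_fitted        # observed minus fitted
t_crit    <- qt(0.975, df_resid)            # two-sided 95% critical value
margin    <- t_crit * rse                   # half-width of the rough band
lower     <- hr_fitted - margin
std_resid <- resid_hr / rse                 # residual in standard-error units
p_two_sided <- 2 * pt(-abs(std_resid), df_resid)

print(c(fitted = hr_fitted, residual = resid_hr, lower = lower,
        std_resid = std_resid, p = p_two_sided))
```

<p>Note that this band uses only the residual standard error, as the post does; a full prediction interval would also account for uncertainty in the estimated coefficients.</p>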
<p>Clearly, something else is in play.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/22/what-happened-to-home-runs-this-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>More on Home Runs Per Game</title>
		<link>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/</link>
		<comments>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 14:35:26 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Chow test]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Japanese baseball]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rays]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=335</guid>
		<description><![CDATA[In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=335&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used on data in an otherwise identical league, we should see similar trends but no dropoff around 1995. As usual, for replication purposes I&#8217;m going to use Japan&#8217;s Pro Baseball leagues, the Pacific and Central Leagues. They&#8217;re ideal because, just like the American Major Leagues, one league uses the designated hitter and one does not. There are some differences &#8211; the talent pool is a bit smaller because of the lower population base that the leagues draw from, and there are only 6 teams in each league as opposed to MLB&#8217;s 14 and 16.</p>
<p>As a reminder, the MLB regression gave us a regression equation of</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' title='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} ' title='&#92;hat{HR} ' class='latex' /> is the predicted number of home runs per game,<em> t</em> is a time variable starting at <em>t</em>=1 in 1955, and <em>DH</em> is a binary variable that takes value 1 if the league uses the designated hitter in the season in question.</p>
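<p>A regression of this form might be set up in R roughly as follows. This is only a sketch under stated assumptions: the data frame is simulated from the equation above plus random noise so that the example runs on its own (it is not the Baseball-Reference data), the column names are hypothetical, and I am assuming one observation per league per season from 1955 to 2009, with the DH dummy switched on for the American League from 1973.</p>

```r
# Simulated stand-in data, NOT the real Baseball-Reference figures.
set.seed(1)
leagues <- expand.grid(year = 1955:2009, league = c("AL", "NL"),
                       stringsAsFactors = FALSE)

leagues$t  <- leagues$year - 1955 + 1                      # t = 1 in 1955
leagues$dh <- as.integer(leagues$league == "AL" & leagues$year >= 1973)

# Generate home runs per game from the published equation plus noise.
leagues$hr <- 0.957 - 0.0188 * leagues$t + 0.0004 * leagues$t^2 +
  0.0911 * leagues$dh + rnorm(nrow(leagues), sd = 0.1)

# Quadratic time trend with a designated-hitter dummy.
fit <- lm(hr ~ t + I(t^2) + dh, data = leagues)
print(summary(fit))
```

<p>With two leagues over 55 seasons there are 110 observations and four estimated parameters, which leaves 106 residual degrees of freedom.</p>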
<p><a href="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg"><img class="alignright size-thumbnail  wp-image-336" title="japanhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&#038;h=82" alt="" width="150" height="82" /></a>Just examining the data on home runs per game from the Japanese leagues, the trend looks significantly different.  Instead of the rough U-shape that the MLB data showed, the Japanese data looks almost M-shaped with a maximum around 1984. (Why, I&#8217;m not sure &#8211; I&#8217;m not knowledgeable enough about Japanese baseball to know what might have caused that spike.) It reaches a minimum again and then keeps rising.</p>
<p>After running the same regression with <em>t</em>=1 in 1950, I got these results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.2462</td>
<td align="right">0.0992</td>
<td align="right">2.481</td>
<td align="right">0.0148</td>
<td align="right">0.9852</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">0.0478</td>
<td align="right">0.0062</td>
<td align="right">7.64</td>
<td align="right">1.63E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">-0.0006</td>
<td align="right">0.00009</td>
<td align="right">-7.463</td>
<td align="right">3.82E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0052</td>
<td align="right">0.0359</td>
<td align="right">0.144</td>
<td align="right">0.8855</td>
<td align="right">0.1145</td>
</tr>
</tbody>
</table>
<p>This equation shows two things, one that surprises me and one that doesn&#8217;t. The unsurprising factor is the switching of signs for the <em>t</em> variables &#8211; we expected that based on the shape of the data. The surprising factor is that the designated hitter rule is insignificant: with a p-value of .8855, we can&#8217;t reject the hypothesis that the DH has no effect on home runs per game. In addition, this model explains less of the variation than the MLB version &#8211; while that explained about 56% of the variation, the Japanese model has an <img src='http://s0.wp.com/latex.php?latex=R%5E2+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 ' title='R^2 ' class='latex' /> value of .4045, meaning it explains about 40% of the variation in home runs per game.</p>
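<p>To make the table easier to read, here is a small base R sketch that rebuilds its derived columns from the printed numbers. Judging by the values, the Signif column appears to be one minus the p-value; that reading is my inference from the figures, and the data frame and column names are hypothetical. Because the estimates and standard errors are rounded in the table, the recomputed t-ratios will not match the printed ones exactly.</p>

```r
# Values copied from the table of Japanese-league regression results.
japan <- data.frame(
  term      = c("B0", "t", "tsq", "DH"),
  estimate  = c(0.2462, 0.0478, -0.0006, 0.0052),
  std_error = c(0.0992, 0.0062, 0.00009, 0.0359),
  p_value   = c(0.0148, 1.63e-11, 3.82e-11, 0.8855)
)

japan$t_ratio        <- japan$estimate / japan$std_error  # from rounded inputs
japan$signif         <- 1 - japan$p_value                 # matches the Signif column
japan$reject_at_5pct <- japan$p_value < 0.05              # usual 5% decision rule

print(japan)
```

<p>Only the DH row fails the 5% test, which is the sense in which the designated hitter is insignificant here.</p>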
<p>There&#8217;s a slightly interesting pattern to the residual home runs per game (<img src='http://s0.wp.com/latex.php?latex=Residual+%3D+%5Chat%7BHR%7D+-+HR&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = &#92;hat{HR} - HR' title='Residual = &#92;hat{HR} - HR' class='latex' />). <a href="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg"><img class="alignright size-thumbnail wp-image-338" title="japanresidualhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&#038;h=82" alt="" width="150" height="82" /></a>Although it isn&#8217;t as pronounced, this data also shows a spike &#8211; but the spike is at <em>t</em>=55, so instead of showing up in 1995, the Japan leagues spiked around the early 2000s. Clearly the timing differs, but why might the Japanese leagues see a similar effect later than the MLB teams? It can&#8217;t be an expansion effect, since the Japanese leagues have stayed constant at 6 teams since their inception.</p>
<p>Incidentally, the Japanese league data shows some evidence of heteroskedasticity (Breusch-Pagan test p-value .0796, significant at the 10% level but not at 5%), so it might be better modeled using generalized least squares, but doing so would have skewed the results of the replication.</p>
<p>In order to show that the parameters really are different, the appropriate test is <a href="http://en.wikipedia.org/wiki/Chow_test">Chow&#8217;s test for structural change</a>. To clean it up, I&#8217;m using only the data from 1960 on. (It&#8217;s quick and dirty, but it&#8217;ll do the job.) Chow&#8217;s test takes</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%5Csim%5C+F_%7Bk%2CN_1%2BN_2-2k%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' title='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=S_C+%3D+6.3666&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_C = 6.3666' title='S_C = 6.3666' class='latex' /> is the combined sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=S_1+%3D+1.2074&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_1 = 1.2074' title='S_1 = 1.2074' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_2+%3D+2.2983&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_2 = 2.2983' title='S_2 = 2.2983' class='latex' /> are the individual (i.e. MLB and Japan) sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=k%3D4&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='k=4' title='k=4' class='latex' /> is the number of parameters, and <img src='http://s0.wp.com/latex.php?latex=N_1+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_1 = 100' title='N_1 = 100' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=N_2+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_2 = 100' title='N_2 = 100' class='latex' /> are the number of observations in each group.</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%281.2074+%2B+2.2983%29%29%2F%284%29%7D%7B%281.2074%2B2.2983%29%2F%28100%2B100-2%5Ctimes+4%29%7D+%5Csim%5C++F_%7B4%2C100%2B100-2+%5Ctimes+4%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074+2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' title='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074+2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%283.5057%29%29%2F%284%29%7D%7B%283.5057%29%2F%28192%29%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' title='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B2.8609%2F4%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B.7152%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=39.17+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='39.17 &#92;sim&#92;  F_{4,192}' title='39.17 &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p>The critical value at the 10% significance level with 4 and 192 degrees of freedom is 1.974 according to <a href="http://www.stat.tamu.edu/~west/applets/fdemo.html">Texas A&amp;M&#8217;s F calculator</a>. Our statistic of 39.17 is far above that, so we can reject the hypothesis that the MLB and Japanese parameters are the same: the two sets of leagues really do follow different trends.</p>
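<p>As a check on the arithmetic, the Chow statistic can be computed directly from the sums of squared residuals and sample sizes quoted above. This is a minimal sketch in base R; the variable names are mine, and qf() and pf() stand in for the online F calculator.</p>

```r
# Inputs as quoted in the post.
s_c <- 6.3666   # combined (pooled) sum of squared residuals
s_1 <- 1.2074   # sum of squared residuals, first group
s_2 <- 2.2983   # sum of squared residuals, second group
k   <- 4        # number of parameters per regression
n_1 <- 100      # observations in the first group
n_2 <- 100      # observations in the second group

df_num <- k
df_den <- n_1 + n_2 - 2 * k

# Chow statistic: ((S_C - (S_1 + S_2)) / k) / ((S_1 + S_2) / (N_1 + N_2 - 2k))
f_stat <- ((s_c - (s_1 + s_2)) / df_num) / ((s_1 + s_2) / df_den)

f_crit  <- qf(0.90, df_num, df_den)                        # 10% critical value
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)  # upper-tail p-value

print(c(F = f_stat, critical = f_crit, p = p_value))
```

<p>The thing to watch is the denominator: it is the sum of the two separate sums of squared residuals divided by the pooled degrees of freedom, not the sample sizes themselves.</p>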
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} = .957 - .0188 \times t + .0004 \times t^2 + .0911  \times DH </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} </media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=R%5E2+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">R^2 </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=Residual+%3D+%5Chat%7BHR%7D+-+HR&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">Residual = \hat{HR} - HR</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%7E+F&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} ~ F</media:title>
		</media:content>
	</item>
		<item>
		<title>Back when it was hard to hit 55&#8230;</title>
		<link>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/</link>
		<comments>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:06:05 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[sabermetrics]]></category>
		<category><![CDATA[Stuff Keith Hernandez Says]]></category>
		<category><![CDATA[talent pool dilution]]></category>
		<category><![CDATA[Willie Mays]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=319</guid>
		<description><![CDATA[Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about Willie Mays for some reason, and said that Mays hit 55 home [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=319&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about <a href="http://www.baseball-reference.com/players/m/mayswi01.shtml">Willie Mays</a> for some reason, and said that Mays hit 55 home runs &#8220;back when it was hard to hit 55.&#8221; Keith coyly said that, while it was easy for a while, it was &#8220;getting hard again,&#8221; at which point he abruptly stopped talking.</p>
<p>Keith&#8217;s unusual candor about drug use and Mays&#8217; career best of 52 home runs aside, this pinged my &#8220;Stuff Keith Hernandez Says&#8221; meter. After accounting for any time trend and other factors that might explain home run hitting, is there an upward trend? If so, is there a pattern to the remaining home runs?</p>
<p>The first step is to examine the data to see if there appears to be any trend. Just looking at it, there appears to be a messy U shape with a minimum around t=20, which indicates a quadratic trend. That means I want to include a term for time and a term for time squared.<a href="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg"><img class="alignright size-thumbnail  wp-image-325" title="homerunspergame" src="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150&#038;h=102" alt="" width="150" height="102" /></a></p>
<p>Using the per-game averages for home runs from 1955 to 2009, I detrended the data using t=1 in 1955. I also had to correct for the effect of the designated hitter. That gives us an equation of the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+%5Chat%7B%5Cbeta_%7B0%7D%7D+%2B+%5Chat%7B%5Cbeta_%7B1%7D%7Dt+%2B+%5Chat%7B%5Cbeta_%7B2%7D%7D+t%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta_%7B3%7D%7D+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' title='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' class='latex' /></p>
<p>The results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.957</td>
<td align="right">0.0328</td>
<td align="right">29.189</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">-0.0188</td>
<td align="right">0.0028</td>
<td align="right">-6.738</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">0.0004</td>
<td align="right">0.00005</td>
<td align="right">8.599</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0911</td>
<td align="right">0.0246</td>
<td align="right">3.706</td>
<td align="right">0.0003</td>
<td align="right">0.9997</td>
</tr>
</tbody>
</table>
<p>We can see that there&#8217;s an upward quadratic trend in predicted home runs that, together with the DH rule, accounts for about 56% of the variation in the number of home runs per game in a season (<img src='http://s0.wp.com/latex.php?latex=R%5E2+%3D+.5618&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 = .5618' title='R^2 = .5618' class='latex' />). The Breusch-Pagan test has a p-value of .1610, so we cannot reject homoskedasticity at conventional levels; there may be mild heteroskedasticity, but nothing we should get concerned about.</p>
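<p>As a rough check on the U shape, here is a minimal Python sketch (not the R code used for this post) that plugs the rounded estimates from the table into the fitted equation. The function name <em>predicted_hr</em> is hypothetical, and because the coefficients are rounded, the numbers it prints are only approximate.</p>

```python
# Rounded coefficient estimates copied from the results table above.
B0, B1, B2, B3 = 0.957, -0.0188, 0.0004, 0.0911


def predicted_hr(t, dh):
    """Fitted home runs per game for trend year t (t=1 is 1955).

    dh is 1 when the designated hitter rule applies and 0 otherwise.
    """
    return B0 + B1 * t + B2 * t ** 2 + B3 * dh


# A quadratic b0 + b1*t + b2*t^2 with b2 > 0 bottoms out at t = -b1 / (2*b2).
t_min = -B1 / (2 * B2)
year_min = 1954 + t_min

print("trend minimum at t =", round(t_min, 1), "or about", int(year_min))
print("fitted HR/game, t=1, no DH:", round(predicted_hr(1, 0), 3))
print("fitted HR/game, t=55, DH:", round(predicted_hr(55, 1), 3))
```

<p>With these rounded coefficients the trend bottoms out near t=23.5, in the late 1970s, which is in the same neighborhood as the minimum eyeballed from the plot.</p>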
<p>Then, I needed to look at the difference between the predicted number of home runs per game and the actual number of home runs per game, which is accessible by subtracting</p>
<p><img src='http://s0.wp.com/latex.php?latex=Residual+%3D+HR+-+%5Chat%7BHR%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = HR - &#92;hat{HR}' title='Residual = HR - &#92;hat{HR}' class='latex' /></p>
<p>This represents the &#8220;abnormal&#8221; number of home runs per year. The question then becomes, &#8220;Is there a pattern to the number of abnormal home runs?&#8221; There are two ways to answer this. The first way is to look at the abnormal home runs. Up until about t=40 (the mid-1990s), the abnormal home runs are pretty much scattershot above and below 0. However, at t=40, the residual jumps up for both leagues and then begins a downward trend. It&#8217;s not clear what the cause of this is, but the knee-jerk reaction is that there might be a drug use effect. On the other hand, there are a couple of other explanations.<a href="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg"><img class="alignright size-thumbnail  wp-image-331" title="homerunresiduals" src="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150&#038;h=102" alt="" width="150" height="102" /></a></p>
<p>The most obvious is a boring old expansion effect. In 1993, the National League added two teams (the Marlins and the Rockies), and in 1998 each league added a team (the AL&#8217;s Rays and the NL&#8217;s Diamondbacks). Talent pool dilution has shown up in our discussion of hit batsmen, and I believe that it can be a real effect. It would be mitigated over time, however, by the establishment and development of farm systems, in particular strong systems like the one that&#8217;s producing good, cheap talent for the Rays.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150" medium="image">
			<media:title type="html">homerunspergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150" medium="image">
			<media:title type="html">homerunresiduals</media:title>
		</media:content>
	</item>
		<item>
		<title>How often should Youk take his base?</title>
		<link>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/</link>
		<comments>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 14:55:00 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[binomial distribution]]></category>
		<category><![CDATA[Brett Carroll]]></category>
		<category><![CDATA[Greek God of Take Your Base]]></category>
		<category><![CDATA[hit batsmen]]></category>
		<category><![CDATA[hit by pitch]]></category>
		<category><![CDATA[Kevin Youkilis]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=286</guid>
		<description><![CDATA[Kevin Youkilis is sometimes called &#8220;The Greek God of Walks.&#8221; I prefer to think of him as &#8220;The Greek God of Take Your Base,&#8221; since he seems to get hit by pitches at an alarming rate. In fact, this year, he&#8217;s been hit 7 times in 313 plate appearances. (Rickie Weeks, however, is leading the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=286&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong><a href="http://www.baseball-reference.com/players/y/youklke01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Kevin  Youkilis</a></strong> is sometimes called &#8220;The Greek God of Walks.&#8221; I prefer to think of him as &#8220;The Greek God of Take Your Base,&#8221; since he seems to get hit by pitches at an alarming rate. In fact, this year, he&#8217;s been hit 7 times in 313 plate appearances. (<strong><a href="http://www.baseball-reference.com/players/w/weeksri01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Rickie  Weeks</a></strong>, however, is leading the pack with 13 in 362 plate appearances. We&#8217;ll look at him, too.) There are three explanations for this:</p>
<ol>
<li>There&#8217;s something about Youk&#8217;s batting or his hitting stance that causes him to be hit. This is my preferred explanation. Youkilis has an unusual batting grip that thrusts his lead elbow over the plate, and as he swings, he lunges forward, which exposes him to being plunked more often.</li>
<li>Youkilis is such a hitting machine that he gets hit often in order to keep him from swinging for the fences. This doesn&#8217;t hold water, to me. A pitcher could just as easily put him on base safely with an intentional walk, so unless there&#8217;s some other incentive to hit him, there&#8217;s no reason to risk ejection by throwing at Youkilis. This leads directly to&#8230;</li>
<li><a href="http://www.bareknucks.com/kevin-youkilis-is-everyones-favorite-beaning-target-because-hes-a-hitting-machine-no-because-he-bats-like-a-tool">Youk is a jerk</a>. This is pretty self-explanatory, and is probably a factor.</li>
</ol>
<p>First of all, we need to figure out whether it&#8217;s likely that Kevin is being hit by chance. To figure that out, we need to make some assumptions about hit batsmen and evaluate them using the <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>. I&#8217;m also excited to point out that Youk has been overtaken as the Greek God of Take Your Base by someone new: <strong><a href="http://www.baseball-reference.com/players/c/carrobr01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Brett  Carroll</a></strong>.<span id="more-286"></span></p>
<p>I&#8217;m going to assume that the rate of hit batsmen is constant over time. This assumption is probably justified, since the number of hit batsmen per team per American League game has stayed between .21 and .25 since 1996, and the number of plate appearances per team per game has stayed between 33.98 and 34.9 over the same time period. Based on that, I feel justified in using the 2009 hit batsman rate to evaluate Youkilis&#8217;s stats this year. It&#8217;s undesirable to use this year&#8217;s rates if 2009&#8217;s will fit, since this year has a much smaller number of occurrences. Since a number of players with only a few plate appearances might distort the average, I limited my sample to <a href="http://bbref.com/pi/shareit/fhbK6">players with 50 plate appearances or more</a>, then divided the total number of HBP by the total number of plate appearances and got .00859. (For the record, the sample of all players with at least one plate appearance had a rate of .00850.)</p>
<p>I&#8217;m also going to assume that occurrences of hit batsmen are binomially distributed. That is, they occur at a known rate, which is equivalent to the rate of hit batsmen in 2009, and that every individual hit-by-pitch is independent of all others. (I might have to relax this assumption later, but it&#8217;s good for a first approximation.) As a result, the probability of being hit by a pitch <em>k</em> times in <em>n</em> plate appearances with a known rate of <em>p</em> is</p>
<p><img src='http://s0.wp.com/latex.php?latex=f%28k%3Bn%2Cp%29+%3D+%7Bn%5Cchoose+k%7Dp%5Ek%281-p%29%5E%7Bn-k%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='f(k;n,p) = {n&#92;choose k}p^k(1-p)^{n-k}' title='f(k;n,p) = {n&#92;choose k}p^k(1-p)^{n-k}' class='latex' /></p>
<p>where</p>
<p><img src='http://s0.wp.com/latex.php?latex=%7Bn%5Cchoose+k%7D%3D%5Cfrac%7Bn%21%7D%7Bk%21%28n-k%29%21%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='{n&#92;choose k}=&#92;frac{n!}{k!(n-k)!}' title='{n&#92;choose k}=&#92;frac{n!}{k!(n-k)!}' class='latex' /></p>
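<p>Using R, I computed the binomial distribution with <em>n</em>=313 plate appearances, <em>k</em>=0,1,&#8230;,10, and <em>p</em>=.00859 to determine the probability that he&#8217;d be hit <em>k</em> times. The &#8220;Total&#8221; column is the running (cumulative) probability of being hit <em>k</em> times or fewer. My results are:</p>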
<table border="0" cellspacing="0" cellpadding="0" width="192">
<col span="3" width="64"></col>
<tbody>
<tr>
<td width="64" height="20">HBP</td>
<td width="64">p(HBP)</td>
<td width="64">Total</td>
</tr>
<tr>
<td height="19">0</td>
<td>0.06721</td>
<td>0.06721</td>
</tr>
<tr>
<td height="20">1</td>
<td>0.18225</td>
<td>0.24946</td>
</tr>
<tr>
<td height="20">2</td>
<td>0.2463</td>
<td>0.49576</td>
</tr>
<tr>
<td height="20">3</td>
<td>0.2212</td>
<td>0.71696</td>
</tr>
<tr>
<td height="20">4</td>
<td>0.14851</td>
<td>0.86547</td>
</tr>
<tr>
<td height="20">5</td>
<td>0.07951</td>
<td>0.94498</td>
</tr>
<tr>
<td height="20">6</td>
<td>0.03536</td>
<td>0.98034</td>
</tr>
<tr>
<td height="20">7</td>
<td>0.01344</td>
<td>0.99378</td>
</tr>
<tr>
<td height="20">8</td>
<td>0.00445</td>
<td>0.99823</td>
</tr>
<tr>
<td height="20">9</td>
<td>0.0013</td>
<td>0.99953</td>
</tr>
<tr>
<td height="20">10</td>
<td>0.00034</td>
<td>0.99987</td>
</tr>
</tbody>
</table>
<p>If Youkilis is a normal hitter, then it&#8217;s 98% likely that he would be hit fewer than seven times. Put another way, there&#8217;s only about a 2% chance that he&#8217;d be hit seven or more times in those 313 plate appearances by chance alone.</p>
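<p>For anyone who wants to reproduce the table, here is a minimal Python sketch of the same calculation (the numbers in this post came from R, where dbinom and pbinom compute the same quantities). The helper names <em>binom_pmf</em> and <em>binom_cdf</em> are hypothetical, and the sketch assumes the same independence and constant-rate assumptions described above.</p>

```python
from math import comb

P_HBP = 0.00859  # 2009 hit-by-pitch rate per plate appearance, from the post


def binom_pmf(k, n, p):
    """Probability of exactly k hit-by-pitches in n plate appearances."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of k or fewer hit-by-pitches in n plate appearances."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n = 313  # Youkilis's plate appearances so far

# One row per HBP count: k, p(HBP), running total.
for k in range(11):
    print(k, round(binom_pmf(k, n, P_HBP), 5), round(binom_cdf(k, n, P_HBP), 5))

# Being hit seven or more times is the complement of six or fewer.
p_seven_or_more = 1 - binom_cdf(6, n, P_HBP)
print("P(7 or more):", round(p_seven_or_more, 3))
```

<p>Changing <em>n</em> to another hitter&#8217;s plate appearances gives that hitter&#8217;s distribution under the same league rate.</p>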
<p>Youkilis has company, though: the aforementioned Rickie Weeks, who&#8217;s been hit 13 times in 362 plate appearances. I recomputed the distribution using <em>k</em>=0,1,&#8230;,15, <em>n</em>=362, <em>p</em>=.00859 and got the following results:</p>
<table style="border-collapse:collapse;width:183pt;" border="0" cellspacing="0" cellpadding="0" width="244">
<col style="width:48pt;" width="64"></col>
<col style="width:59pt;" width="79"></col>
<col style="width:76pt;" width="101"></col>
<tbody>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;width:48pt;" width="64" height="20">HBP</td>
<td class="xl65" style="width:59pt;" width="79">p(HBP)</td>
<td class="xl65" style="width:76pt;" width="101">Total</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">0</td>
<td class="xl66">4.404E-02</td>
<td class="xl65">0.04404</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">1</td>
<td class="xl66">1.381E-01</td>
<td class="xl65">0.18216</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">2</td>
<td class="xl66">2.160E-01</td>
<td class="xl65">0.39815</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">3</td>
<td class="xl66">2.245E-01</td>
<td class="xl65">0.62269</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">4</td>
<td class="xl66">1.746E-01</td>
<td class="xl65">0.79727</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">5</td>
<td class="xl66">1.083E-01</td>
<td class="xl65">0.90556</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">6</td>
<td class="xl66">5.582E-02</td>
<td class="xl65">0.96138</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">7</td>
<td class="xl66">2.459E-02</td>
<td class="xl65">0.98597</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">8</td>
<td class="xl66">9.450E-03</td>
<td class="xl65">0.99542</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">9</td>
<td class="xl66">3.220E-03</td>
<td class="xl65">0.99864</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">10</td>
<td class="xl66">9.900E-04</td>
<td class="xl65">0.99963</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">11</td>
<td class="xl66">2.731E-04</td>
<td class="xl65">0.999903127</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">12</td>
<td class="xl66">6.921E-05</td>
<td class="xl65">0.999972337</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">13</td>
<td class="xl66">1.614E-05</td>
<td class="xl65">0.99998848</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">14</td>
<td class="xl66">3.486E-06</td>
<td class="xl65">0.999991966</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">15</td>
<td class="xl66">7.007E-07</td>
<td class="xl65">0.999992667</td>
</tr>
</tbody>
</table>
<p>It&#8217;s almost impossible for Weeks to have been hit that much by chance. He&#8217;s about 96% likely to have been hit six times or fewer, and, going by the table, there&#8217;s a whopping 99.997% chance that if he&#8217;s an average hitter he&#8217;d be hit fewer than the 13 times he has been this season in as many plate appearances.</p>
<p>The king of hit batsmen, though, and the new Greek God of Take Your Base, is Florida Marlins pinch hitter and outfielder Brett Carroll. In 90 plate appearances this year, he&#8217;s been hit seven times! That&#8217;s as many as Youkilis, but far more efficient &#8211; he required less than one-third of the plate appearances to achieve the same number of plunks. Using his 90 plate appearances, the same <em>p</em>=.00859, and <em>k</em>=0,1,&#8230;,10, Carroll&#8217;s distribution is below:</p>
<table style="border-collapse:collapse;width:183pt;" border="0" cellspacing="0" cellpadding="0" width="244">
<col style="width:48pt;" width="64"></col>
<col style="width:59pt;" width="79"></col>
<col style="width:76pt;" width="101"></col>
<tbody>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;width:48pt;" width="64" height="20">HBP</td>
<td class="xl65" style="width:59pt;" width="79">p(HBP)</td>
<td class="xl65" style="width:76pt;" width="101">Total</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">0</td>
<td class="xl66">4.60E-01</td>
<td class="xl65">0.4600902</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">1</td>
<td class="xl66">3.59E-01</td>
<td class="xl65">0.8188182</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">2</td>
<td class="xl66">1.38E-01</td>
<td class="xl65">0.9571132</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">3</td>
<td class="xl66">3.51E-02</td>
<td class="xl65">0.9922562</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">4</td>
<td class="xl66">6.62E-03</td>
<td class="xl65">0.99887814</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">5</td>
<td class="xl66">9.87E-04</td>
<td class="xl65">0.99986485</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">6</td>
<td class="xl66">1.21E-04</td>
<td class="xl65">0.99998594</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">7</td>
<td class="xl66">1.26E-05</td>
<td class="xl65">0.99999852</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">8</td>
<td class="xl66">1.13E-06</td>
<td class="xl65">0.999999652</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">9</td>
<td class="xl66">8.93E-08</td>
<td class="xl65">0.999999741</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">10</td>
<td class="xl66">6.27E-09</td>
<td class="xl65">0.999999747</td>
</tr>
</tbody>
</table>
<p>Carroll, in 90 plate appearances, should have been hit fewer than <strong>twice</strong>: at the league rate he&#8217;d expect about 0.8 plunks, and there&#8217;s roughly an 82% chance he&#8217;d be hit once or not at all. His rate &#8211; .078 times hit by pitch per plate appearance &#8211; is more than <strong>nine times</strong> the league&#8217;s rate. Ascend Mount Olympus, Brett, and work on getting out of the way more often.</p>
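<p>The rate comparison for all three hitters is simple arithmetic, sketched below in Python as an illustration. The function name <em>summarize</em> is hypothetical; the HBP and plate appearance counts are the ones quoted in this post.</p>

```python
LEAGUE_RATE = 0.00859  # 2009 HBP per plate appearance, from the post

# (times hit by pitch, plate appearances) as quoted in the post
players = {
    "Kevin Youkilis": (7, 313),
    "Rickie Weeks": (13, 362),
    "Brett Carroll": (7, 90),
}


def summarize(hbp, pa, league_rate=LEAGUE_RATE):
    """Return (observed rate, expected HBP at the league rate, ratio to league rate)."""
    rate = hbp / pa
    return rate, pa * league_rate, rate / league_rate


for name, (hbp, pa) in players.items():
    rate, expected, ratio = summarize(hbp, pa)
    print(f"{name}: rate={rate:.4f}, expected={expected:.2f}, ratio={ratio:.1f}")
```

<p>Carroll&#8217;s ratio comes out a little over nine, while Youkilis and Weeks land well below that despite their higher raw totals.</p>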
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>What is the effect of the Designated Hitter?</title>
		<link>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/</link>
		<comments>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/#comments</comments>
		<pubDate>Sun, 30 May 2010 22:36:37 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[designated hitter]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=95</guid>
		<description><![CDATA[Intuitively, the designated hitter rule seems like it should increase scoring. By getting on base more often than the pitcher would have, the designated hitter helps produce runs by hitting, by being on base so that other players can drive him in, and by not accumulating outs by bunting or striking out as often as [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=95&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Intuitively, the designated hitter rule seems like it should increase scoring. By getting on base more often than the pitcher would have, the designated hitter helps produce runs by hitting, by being on base so that other players can drive him in, and by not accumulating outs by bunting or striking out as often as the pitcher does. However, there should be a corresponding effect from having pitchers left in the game longer: a better pitcher who remains in the game might get more outs than a reliever who came in simply because the manager pinch-hit for the starting pitcher because he needed offense.</p>
<p>Behind the cut, I&#8217;ll explain the testing I did to determine whether the effect of a DH is positive (hint: it is) and look at how big an effect is actually there.</p>
<p><span id="more-95"></span>MLB is the perfect setting for natural experiments about the DH rule for obvious reasons &#8211; the American League uses it, the National League doesn&#8217;t, and the talent pool is exactly the same. There are very few restrictions on player transfers between the leagues, so players are probably as good as randomly assigned to the leagues. With that in mind, if there is a difference between the leagues, then it can probably be attributed to the DH rule.</p>
<p>Using <a href="http://www.baseball-reference.com/">Baseball-Reference.com</a>, I pulled <a href="http://tomflesher.files.wordpress.com/2010/05/mlb19552009.doc">this dataset</a> of league-level batting for both leagues from 1955 on (with 1955 chosen because it&#8217;s the first year that all of B-R.com&#8217;s data was available). I changed Year to t and subtracted 1954 so that I could do a trend analysis and added a binary variable called &#8220;DH&#8221; that took value 1 if the Designated Hitter rule was used and 0 otherwise. Assuming the leagues are otherwise identical, my null hypothesis is that <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%28DH%29+%3D+0&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;beta(DH) = 0' title='&#92;beta(DH) = 0' class='latex' />; that is, the effect of the DH rule is nonexistent.</p>
<p>I used <a href="http://cran.r-project.org">R</a> to run the following regression on the data:</p>
<p><img src='http://s0.wp.com/latex.php?latex=OBP+%3D+t+%2B+t%5E2+%2B+DH&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='OBP = t + t^2 + DH' title='OBP = t + t^2 + DH' class='latex' /></p>
<p>and got the following results:</p>
<p>Call:<br />
lm(formula = OBP ~ t + tsq + DH)</p>
<p>Residuals:<br />
Min         1Q     Median         3Q        Max</p>
<p>-0.0219984 -0.0041721  0.0003126  0.0048915  0.0187776</p>
<table border="0" cellspacing="0" cellpadding="0" width="421">
<tbody>
<tr>
<td colspan="2" width="149" height="20">Coefficients:</td>
<td width="77"></td>
<td width="74"></td>
<td width="121"></td>
</tr>
<tr>
<td height="20"></td>
<td>Estimate</td>
<td>Std. Error</td>
<td>t value</td>
<td>Pr(&gt;|t|)</td>
</tr>
<tr>
<td height="20">(Intercept)</td>
<td>0.323100</td>
<td>0.002243</td>
<td>144.055</td>
<td>&lt; 2e-16 ***</td>
</tr>
<tr>
<td height="20">t</td>
<td>-0.000470</td>
<td>0.000188</td>
<td>-2.503</td>
<td>0.013827 *</td>
</tr>
<tr>
<td height="20">tsq</td>
<td>0.000013</td>
<td>0.000003</td>
<td>4.039</td>
<td>0.000101 ***</td>
</tr>
<tr>
<td height="20">DH</td>
<td>0.008036</td>
<td>0.001677</td>
<td>4.793</td>
<td>5.27e-06 ***</td>
</tr>
</tbody>
</table>
<p>The *** suffix indicates significance at the 99.9% level (p-value below .001), and the single * on t indicates significance at the 95% level. A Breusch-Pagan test for heteroskedasticity returned a BP stat of 3.0789 and a p-value of .3796, which means we cannot reject the null hypothesis of homoskedasticity (that is, the usual standard errors and t-tests are appropriate for this data).</p>
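<p>The Breusch-Pagan p-value can be checked by hand. The minimal Python sketch below assumes the statistic is compared against a chi-square distribution with 3 degrees of freedom (one per regressor: t, tsq, and DH), which is an assumption on my part since the degrees of freedom are not printed above. For 3 degrees of freedom the upper-tail probability has a closed form, so no statistics library is needed; the function name <em>chi2_sf_3df</em> is hypothetical.</p>

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


bp_stat = 3.0789  # Breusch-Pagan statistic reported in the post
p_value = chi2_sf_3df(bp_stat)
print("p-value:", round(p_value, 4))
```

<p>Under that assumption the result is about .38, in line with the reported p-value of .3796.</p>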
<p>Across MLB, OBP has been increasing with time since roughly the early 1970s (the fitted trend bottoms out near t=18), and the DH rule adds roughly .008 to the league&#8217;s average OBP after accounting for that time trend. That .008 is 0.8 percentage points, meaning you&#8217;d get slightly less than one additional trip to first in 100 plate appearances. Assuming a leaguewide mean of 38.5 plate appearances per team per game, that translates to about .3 extra trips to first per game.</p>
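<p>The arithmetic behind those figures, as a minimal Python sketch using the estimates from the regression table; the variable names are hypothetical, and the 38.5 plate appearances per team per game is the assumed leaguewide mean from above.</p>

```python
# Estimates copied from the regression table above.
B_T, B_TSQ, B_DH = -0.000470, 0.000013, 0.008036
PA_PER_TEAM_GAME = 38.5  # leaguewide mean assumed in the post

# Extra times on base per team per game attributable to the DH rule.
extra_on_base_per_game = B_DH * PA_PER_TEAM_GAME

# The quadratic trend turns upward after t = -b1 / (2*b2), with t = 1 in 1955.
t_turn = -B_T / (2 * B_TSQ)

print("extra trips to first per game:", round(extra_on_base_per_game, 2))
print("trend turns upward near t =", round(t_turn, 1))
```

<p>That works out to roughly .31 extra trips to first per team per game, with the fitted time trend turning upward around t=18, or about 1972.</p>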
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/05/30/what-is-the-effect-of-the-designated-hitter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Cy Young gives me a headache.</title>
		<link>http://tomflesher.com/2010/01/15/cy-young-gives-me-a-headache/</link>
		<comments>http://tomflesher.com/2010/01/15/cy-young-gives-me-a-headache/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 17:01:29 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Bill James]]></category>
		<category><![CDATA[Cy Young predictor]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Eric Gagne]]></category>
		<category><![CDATA[linear regression]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rob Neyer]]></category>
		<category><![CDATA[sabermetrics]]></category>
		<category><![CDATA[Tim Lincecum]]></category>
		<category><![CDATA[Weighted saves]]></category>
		<category><![CDATA[Weighted shutouts]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=71</guid>
		<description><![CDATA[As usual, I&#8217;ve started my yearly struggle against a Cy Young predictor. Bill James and Rob Neyer&#8217;s predictor (which I&#8217;ve preserved for posterity here) did a pretty poor job this year, having predicted the wrong winner in both leagues and even getting the order very wrong compared to the actual results. Inside, I&#8217;d like to [...]]]></description>
			<content:encoded><![CDATA[<p>As usual, I&#8217;ve started my yearly struggle against a Cy Young predictor. Bill James and Rob Neyer&#8217;s <a title="ESPN.com" href="http://espn.go.com/mlb/features/cyyoung">predictor</a> (which I&#8217;ve preserved for posterity <a href="http://tomflesher.com/docs/CyPredictor.pdf">here</a>) did a pretty poor job this year, having predicted the wrong winner in both leagues and even getting the order very wrong compared to the <a href="http://www.baseball-reference.com/awards/awards_2009.shtml#ALcya">actual results</a>. Inside, I&#8217;d like to share some of my pain, since I can&#8217;t seem to do much better.</p>
<p><span id="more-71"></span></p>
<p>I&#8217;m using a <a href="http://tomflesher.com/docs/pitchers0509.txt">dataset</a> I culled from baseball-reference.com&#8217;s <a href="http://www.baseball-reference.com/play-index/">Play Index</a>, to which I added Cy Young points for each year as well as a number of binary variables for team division wins, team wildcard appearances, and so on. It includes every player who pitched in the 2005 through 2009 seasons, about 3,200 observations all told. Using <a href="http://cran.r-project.org/">R</a>, I fit a number of linear regression models to see how well each one explains the voting.</p>
<p>First, I tried a variation of the James/Neyer formula, CYP = ((5*IP/9)-ER) + (SO/12) + (SV*2.5) + Shutouts + ((W*6)-(L*2)) + VB, where VB is the victory bonus. I regressed Cy Young points on IP, ER, SO, SV, SHO, W, L, and VB and got this result:</p>
<pre>Call:
lm(formula = model &lt;- cypoints ~ IP + ER + SO + SV + SHO + W +
    L + VB)

Residuals:
     Min       1Q   Median       3Q      Max
-31.2641  -1.4715   0.1084   0.9949 144.4079

Coefficients:
              Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) -0.1057887  0.2341857  -0.452    0.651
IP           0.0080245  0.0136774   0.587    0.557
ER          -0.0960892  0.0184517  -5.208 2.03e-07 ***
SO           0.0483835  0.0090107   5.370 8.45e-08 ***
SV           0.0001499  0.0218261   0.007    0.995
SHO          5.5749651  0.4340868  12.843  &lt; 2e-16 ***
W            0.5653568  0.0899062   6.288 3.64e-10 ***
L           -0.3987691  0.0901410  -4.424 1.00e-05 ***
VB          -0.0191531  0.3781868  -0.051    0.960
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.977 on 3213 degrees of freedom
Multiple R-squared: 0.1952,     Adjusted R-squared: 0.1932
F-statistic: 97.43 on 8 and 3213 DF,  p-value: &lt; 2.2e-16</pre>
<p>This isn&#8217;t promising. Over the past five years, these factors aren&#8217;t very predictive at all: the model explains only about 19% of the variation in voting. Innings pitched, saves, and the victory bonus aren&#8217;t statistically significant, and the victory bonus&#8217;s coefficient is even slightly negative. The caveat, of course, is that James and Neyer aren&#8217;t predicting <em>actual</em> Cy Young voting points but rather a statistical construct that shows the relative likelihood that a given pitcher will receive the Cy. I&#8217;m predicting actual Cy Young points. Still, the effects should be similar.</p>
<p>In fact, the James/Neyer formula grossly overestimates the proclivity of Cy Young voters for choosing relievers: it awards 2.5 points per save, while the estimated coefficient on SV above is essentially zero. A pitcher with saves as his primary statistic hasn&#8217;t been given the Cy since Eric Gagne in 2003. This is a double-edged sword. On the one hand, saves have apparently been historically significant for the Cy; on the other hand, the voting appears to be trending away from them. The five-year window I used is a compromise: enough data to fit the model without washing out that trend.</p>
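<p>To get a feel for how heavily the James/Neyer formula quoted earlier leans on saves, here is a rough sketch in Python. The function name and both stat lines are invented purely for illustration, and VB is treated as a number the caller supplies, since the formula as quoted doesn&#8217;t spell out how it is computed.</p>

```python
def cy_young_points(ip, er, so, sv, sho, w, l, vb=0.0):
    """Cy Young Predictor score, using the formula as quoted in the post."""
    return ((5 * ip / 9) - er) + (so / 12) + (sv * 2.5) + sho + ((w * 6) - (l * 2)) + vb

# Hypothetical stat lines, made up only to show how the weights behave.
starter = cy_young_points(ip=225, er=65, so=240, sv=0, sho=2, w=18, l=7)
closer = cy_young_points(ip=72, er=16, so=90, sv=45, sho=0, w=3, l=2)

print("starter:", starter)
print("closer:", closer)
print("closer points from saves alone:", 45 * 2.5)
```

<p>The made-up starter scores 176 and the made-up closer 158, with 112.5 of the closer&#8217;s points coming from saves alone.</p>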
<p>After playing with R for a little while, I ended up creating a few extra measures that seem to capture the voting a little bit better (but not much). First, to approximate the relief effect, I created a &#8220;weighted saves&#8221; statistic that multiplies SV*GF and then takes the square root. To maximize the stat for a given number of games finished, all of those games would be saves. (Every save is a game finished, by definition.) Thus, it helps show that the pitcher was relied on as a clutch player. I did the same thing for Complete Games and Shutouts &#8211; weighted shutouts is the square root of CG*SHO. Again, to maximize this, every complete game should be a shutout. It ends up being far more predictive than CG or SHO alone. Finally, to capture the added value of each marginal win and marginal strikeout and the added penalty for each marginal home run and marginal walk, I included the squares of those terms. I also tried a dummy variable for previous year winner, since Lincecum&#8217;s so-so predicted points must have been bumped up by something.</p>
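<p>As a quick illustration of the two weighted stats just described, here is a sketch in Python (the regressions themselves were run in R). The function names and the sample numbers are hypothetical; only the square-root-of-a-product definitions come from the paragraph above.</p>

```python
import math

def weighted_saves(sv, gf):
    """Square root of saves times games finished."""
    return math.sqrt(sv * gf)

def weighted_shutouts(cg, sho):
    """Square root of complete games times shutouts."""
    return math.sqrt(cg * sho)

# Two hypothetical relievers with the same 40 games finished.
all_saves = weighted_saves(sv=40, gf=40)   # every game finished was a save
few_saves = weighted_saves(sv=10, gf=40)   # only a quarter of them were saves
# Hypothetical starter: 4 complete games, 1 of them a shutout.
starter_sho = weighted_shutouts(cg=4, sho=1)

print(all_saves, few_saves, starter_sho)
```

<p>With 40 games finished, the reliever who saved all of them gets a weighted-saves value of 40, while the one who saved 10 gets 20. The stat tops out when every game finished is a save.</p>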
<p>After playing with the stats with parsimony in mind, I came up with a number of models, the best of which is:</p>
<pre>Call:
lm(formula = model &lt;- cypoints ~ W + Wsq + HR + HRsq + K + Ksq +
    BB + BBsq + weightedsv + weightedsho)

Residuals:
     Min       1Q   Median       3Q      Max
-40.7374  -1.0710  -0.1198   1.1044 122.7243

Coefficients:
              Estimate Std. Error t value Pr(&gt;|t|)
(Intercept)  1.995e-03  2.795e-01   0.007   0.9943
W           -1.295e+00  1.315e-01  -9.844  &lt; 2e-16 ***
Wsq          1.260e-01  7.371e-03  17.091  &lt; 2e-16 ***
HR           1.807e-01  7.286e-02   2.480   0.0132 *
HRsq        -1.499e-02  2.143e-03  -6.996 3.19e-12 ***
K           -8.473e-02  1.642e-02  -5.161 2.61e-07 ***
Ksq          5.972e-04  6.734e-05   8.869  &lt; 2e-16 ***
BB           2.292e-01  3.143e-02   7.292 3.82e-13 ***
BBsq        -2.826e-03  3.041e-04  -9.295  &lt; 2e-16 ***
weightedsv   7.411e-02  1.652e-02   4.487 7.49e-06 ***
weightedsho  2.443e+00  3.252e-01   7.513 7.43e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.245 on 3211 degrees of freedom
Multiple R-squared: 0.3367,     Adjusted R-squared: 0.3346
F-statistic:   163 on 10 and 3211 DF,  p-value: &lt; 2.2e-16</pre>
<p>It&#8217;s not a great predictor, explaining only about 34% of the variation in points. However, all of the regressors are statistically significant at the 95% level or better, and all but HR at the 99.9% level. Some of the other models I tried are <a href="http://tomflesher.com/docs/cymodels2009.txt">here</a>, so you can get an idea of how significant or insignificant other stats might have been at predicting the Cy Young winner.</p>
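<p>The negative coefficient on W next to a positive one on Wsq can look odd, so here is a small worked sketch in Python of what the two terms imply together, holding everything else fixed. The two coefficients are copied from the regression output above; the function names and the win totals tried are just for illustration.</p>

```python
# Coefficients on W and Wsq, copied from the regression output above.
B_W = -1.295
B_WSQ = 0.126

def wins_contribution(w):
    """Predicted Cy Young points coming from the W and Wsq terms together."""
    return B_W * w + B_WSQ * w ** 2

def marginal_win(w):
    """Slope of the wins contribution: points added per extra win at w wins."""
    return B_W + 2 * B_WSQ * w

# Win total at which the marginal effect of a win crosses zero.
turning_point = -B_W / (2 * B_WSQ)
print(f"turning point: {turning_point:.2f} wins")

for w in (5, 10, 15, 20):
    print(w, f"{wins_contribution(w):.3f}", f"{marginal_win(w):.3f}")
```

<p>Under these estimates the wins terms bottom out a little above 5 wins, are roughly a wash at 10, and are worth about 24.5 points at 20 wins, so each additional win counts for more the more wins a pitcher already has.</p>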
<p>The long and the short of it is that common statistical measures appear to have very little predictive value for Cy Young voting.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/01/15/cy-young-gives-me-a-headache/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
	</channel>
</rss>
