<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Tom Flesher &#187; Baseball</title>
	<atom:link href="http://tomflesher.com/tag/baseball/feed/" rel="self" type="application/rss+xml" />
	<link>http://tomflesher.com</link>
	<description>Mercenary Educator and Bad Economist</description>
	<lastBuildDate>Mon, 14 Mar 2011 03:02:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='tomflesher.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Tom Flesher &#187; Baseball</title>
		<link>http://tomflesher.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://tomflesher.com/osd.xml" title="Tom Flesher" />
	<atom:link rel='hub' href='http://tomflesher.com/?pushpress=hub'/>
		<item>
		<title>Home Run Derby: Does it ruin swings?</title>
		<link>http://tomflesher.com/2010/12/15/home-run-derby-does-it-ruin-swings/</link>
		<comments>http://tomflesher.com/2010/12/15/home-run-derby-does-it-ruin-swings/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 17:26:12 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Chris Young]]></category>
		<category><![CDATA[Corey Hart]]></category>
		<category><![CDATA[David Ortiz]]></category>
		<category><![CDATA[Hanley Ramirez]]></category>
		<category><![CDATA[home run derby]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[Matt Holliday]]></category>
		<category><![CDATA[Miguel Cabrera]]></category>
		<category><![CDATA[Nick Swisher]]></category>
		<category><![CDATA[Vernon Wells]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=449</guid>
		<description><![CDATA[Earlier this year, there was a lot of discussion about the alleged home run derby curse. This post by Andy on Baseball-Reference.com asked if the Home Run Derby is bad for baseball, and this Hardball Times piece agrees with him that it is not. The standard explanation involves selection bias &#8211; sure, players tend to [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year, there was a lot of discussion about the alleged home run derby curse. <a href="http://www.baseball-reference.com/blog/archives/7188">This post</a> by Andy on Baseball-Reference.com asked if the Home Run Derby is bad for baseball, and <a href="http://www.hardballtimes.com/main/fantasy/article/do-hitters-decline-after-the-home-run-derby/">this Hardball Times piece</a> agrees with him that it is not. The standard explanation involves selection bias &#8211; sure, players tend to hit fewer home runs in the second half after they hit in the Derby, but that&#8217;s because the people who hit in the Derby get invited to do so because they had an abnormally high number of home runs in the first half.</p>
<p>Though this deserves a much more thorough macro-level treatment, let&#8217;s just take a look at the density of home runs in either half of the season for each player who participated in the Home Run Derby. Those players include <a href="http://www.baseball-reference.com/players/o/ortizda01.shtml">David Ortiz</a>, <a href="http://www.baseball-reference.com/players/r/ramirha01.shtml">Hanley Ramirez</a>, <a href="http://www.baseball-reference.com/players/y/youngch04.shtml">Chris Young</a>, <a href="http://www.baseball-reference.com/players/s/swishni01.shtml">Nick Swisher</a>, <a href="http://www.baseball-reference.com/players/h/hartco01.shtml">Corey Hart</a>, <a href="http://www.baseball-reference.com/players/c/cabremi01.shtml">Miguel Cabrera</a>, <a href="http://www.baseball-reference.com/players/h/hollima01.shtml">Matt Holliday</a>, and <a href="http://www.baseball-reference.com/players/w/wellsve01.shtml">Vernon Wells</a>.</p>
<p>For each player, plus <a href="http://www.baseball-reference.com/players/c/canoro01.shtml">Robinson Cano</a> (who was of interest to Andy in the Baseball-Reference.com post), I took the percentage of games before the Derby and compared it with the percentage of home runs before the Derby. If the Ruined Swing theory holds, then we&#8217;d expect</p>
<p><img src='http://s0.wp.com/latex.php?latex=g%28HR%29+%5Cequiv+HR_%7Bbefore%7D%2FHR_%7BSeason%7D+%3E+g%28Games%29+%5Cequiv+Games_%7Bbefore%7D%2F162&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='g(HR) &#92;equiv HR_{before}/HR_{Season} &gt; g(Games) &#92;equiv Games_{before}/162' title='g(HR) &#92;equiv HR_{before}/HR_{Season} &gt; g(Games) &#92;equiv Games_{before}/162' class='latex' /></p>
<p>The table below shows that in every case except Swisher, including Cano (who did not participate), the density of home runs in the pre-Derby games was higher than in the post-Derby games.</p>
<table border="0" cellspacing="0" cellpadding="0" width="414">
<col width="64"></col>
<col width="81"></col>
<col width="77"></col>
<col span="3" width="64"></col>
<tbody>
<tr>
<td width="64" height="20">Player</td>
<td width="81">HR Before</td>
<td width="77">HR Total</td>
<td width="64">g(Games)</td>
<td width="64">g(HR)</td>
<td width="64">Diff</td>
</tr>
<tr>
<td height="20">Ortiz</td>
<td>18</td>
<td>32</td>
<td>0.54321</td>
<td>0.5625</td>
<td>0.01929</td>
</tr>
<tr>
<td height="20">Hanley</td>
<td>13</td>
<td>21</td>
<td>0.54321</td>
<td>0.619048</td>
<td>0.075838</td>
</tr>
<tr>
<td height="20">Swisher</td>
<td>15</td>
<td>29</td>
<td>0.537037</td>
<td>0.517241</td>
<td>-0.0198</td>
</tr>
<tr>
<td height="20">Wells</td>
<td>19</td>
<td>31</td>
<td>0.549383</td>
<td>0.612903</td>
<td>0.063521</td>
</tr>
<tr>
<td height="20">Holliday</td>
<td>16</td>
<td>28</td>
<td>0.54321</td>
<td>0.571429</td>
<td>0.028219</td>
</tr>
<tr>
<td height="20">Hart</td>
<td>21</td>
<td>31</td>
<td>0.549383</td>
<td>0.677419</td>
<td>0.128037</td>
</tr>
<tr>
<td height="20">Cabrera</td>
<td>22</td>
<td>38</td>
<td>0.530864</td>
<td>0.578947</td>
<td>0.048083</td>
</tr>
<tr>
<td height="20">Young</td>
<td>15</td>
<td>27</td>
<td>0.549383</td>
<td>0.555556</td>
<td>0.006173</td>
</tr>
<tr>
<td height="20">Cano</td>
<td>16</td>
<td>29</td>
<td>0.537037</td>
<td>0.551724</td>
<td>0.014687</td>
</tr>
</tbody>
</table>
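<p>The Diff column can be reproduced with a few lines of Python. This is only a sketch: the games-before counts are an assumption, back-derived from the g(Games) column (for example, .54321 &#215; 162 is about 88), and the names <code>ROWS</code> and <code>derby_diff</code> are made up for illustration.</p>

```python
# Rows copied from the table above: (player, HR before the Derby, HR for the season).
# The fourth value, games before the Derby, is an assumption: it is back-derived
# from the g(Games) column (for example, 0.54321 * 162 is about 88 team games).
ROWS = [
    ("Ortiz", 18, 32, 88),
    ("Hanley", 13, 21, 88),
    ("Swisher", 15, 29, 87),
    ("Wells", 19, 31, 89),
    ("Holliday", 16, 28, 88),
    ("Hart", 21, 31, 89),
    ("Cabrera", 22, 38, 86),
    ("Young", 15, 27, 89),
    ("Cano", 16, 29, 87),
]

SEASON_GAMES = 162


def derby_diff(hr_before, hr_total, games_before):
    """Return g(HR) - g(Games): positive means home runs were front-loaded."""
    g_hr = hr_before / hr_total
    g_games = games_before / SEASON_GAMES
    return g_hr - g_games


diffs = {name: derby_diff(hb, ht, gb) for name, hb, ht, gb in ROWS}

for name, diff in diffs.items():
    print(f"{name:10s} {diff:+.6f}")
```

<p>A positive value means the player hit a larger share of his home runs before the Derby than the share of team games played by then.</p>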
<p>Is this evidence that the Derby causes home run percentages to drop off? Certainly not. There are some caveats:</p>
<ul>
<li>This should be normalized based on games the player played, instead of team games.</li>
<li>It would probably even be better to look at a home run per plate appearance rate instead.</li>
<li>It could stand to be corrected for deviation from the mean to explain selection bias.</li>
<li>Cano&#8217;s numbers are almost identical to Swisher&#8217;s, and they play for the same team. If there were an effect to be seen, it would probably show up here, and it doesn&#8217;t.</li>
</ul>
<p>Once finals are up, I&#8217;ll dig into this a little more deeply.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/12/15/home-run-derby-does-it-ruin-swings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Burnett, Hughes, and Playoff Rotations</title>
		<link>http://tomflesher.com/2010/10/12/burnett-hughes-and-playoff-rotations/</link>
		<comments>http://tomflesher.com/2010/10/12/burnett-hughes-and-playoff-rotations/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 17:19:32 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[A.J. Burnett]]></category>
		<category><![CDATA[ALCS]]></category>
		<category><![CDATA[ALDS]]></category>
		<category><![CDATA[Andy Pettitte]]></category>
		<category><![CDATA[CC Sabathia]]></category>
		<category><![CDATA[Dustin Moseley]]></category>
		<category><![CDATA[Javier Vazquez]]></category>
		<category><![CDATA[Joe Girardi]]></category>
		<category><![CDATA[Phil Hughes]]></category>
		<category><![CDATA[playoffs]]></category>
		<category><![CDATA[rotations]]></category>
		<category><![CDATA[world series]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=435</guid>
		<description><![CDATA[There was much discussion of the Yankees&#8217; specialized playoff rotation for the American League Division Series. As is conventional in the ALDS, Joe Girardi went with a three-man rotation. CC Sabathia and Andy Pettitte were locks; the third starter could have been A.J. Burnett, Javier Vazquez, or Dustin Moseley. Girardi went with young All-Star Phil [...]]]></description>
			<content:encoded><![CDATA[<p>There was <a href="http://waswatching.com/2010/10/05/burnett-out-of-alds-rotation-pettitte-goes-in-game-2/">much discussion</a> of the Yankees&#8217; specialized playoff rotation for the American League Division Series. As is conventional in the ALDS, Joe Girardi went with a three-man rotation. <strong><a href="http://www.baseball-reference.com/players/s/sabatc.01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">CC Sabathia</a></strong> and <strong><a href="http://www.baseball-reference.com/players/p/pettian01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Andy Pettitte</a></strong> were locks; the third starter could have been <strong><a href="http://www.baseball-reference.com/players/b/burnea.01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">A.J. Burnett</a></strong>, <strong><a href="http://www.baseball-reference.com/players/v/vazquja01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Javier Vazquez</a></strong>, or <strong><a href="http://www.baseball-reference.com/players/m/moseldu01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Dustin Moseley</a></strong>. Girardi went with young All-Star <strong><a href="http://www.baseball-reference.com/players/h/hugheph01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Phil Hughes</a></strong> in the third slot. That, of course, led to a sweep of the Minnesota Twins to advance to the American League Championship Series.</p>
<p>First of all, I think it was probably the right decision. Hughes pitched 176 1/3 innings and gave up 82 earned runs, for an ER/IP of about .47. In Burnett&#8217;s 186 2/3 innings, he allowed 109 earned runs for an ER/IP of about .58. Surprisingly, Burnett allowed 9 unearned runs for a rate of about .048 unearned runs per inning pitched, whereas Hughes had only one unearned run for a rate of about .006, but of course those numbers probably don&#8217;t say anything significant. With 730 batters faced, Hughes allowed about .11 earned runs per batter, or about 1 earned run every 9 batters faced, while Burnett&#8217;s 829 batters faced work out to a similar .13 earned runs per batter, or about 1 earned run every 7.69 batters faced.</p>
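<p>For anyone who wants to check the arithmetic, here is a rough Python sketch of the rate calculations, using only the season totals quoted above. The names <code>PITCHERS</code> and <code>rates</code> are hypothetical helpers, not part of any library.</p>

```python
# Season totals quoted in the paragraph above.
# Innings are written as fractions: 176 1/3 and 186 2/3.
PITCHERS = {
    "Hughes": {"ip": 176 + 1 / 3, "er": 82, "bf": 730},
    "Burnett": {"ip": 186 + 2 / 3, "er": 109, "bf": 829},
}


def rates(line):
    """Return earned runs per inning, per batter, and batters per earned run."""
    return {
        "er_per_ip": line["er"] / line["ip"],
        "er_per_bf": line["er"] / line["bf"],
        "bf_per_er": line["bf"] / line["er"],
    }


summary = {name: rates(line) for name, line in PITCHERS.items()}

for name, r in summary.items():
    print(name, {key: round(value, 3) for key, value in r.items()})
```

<p>Dividing batters faced by earned runs directly gives Burnett about 7.6 batters per earned run; the 7.69 quoted above comes from inverting the rounded .13.</p>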
<p>Most importantly to me, Hughes was much more predictable. Burnett faced, on average, 4.68 batters per inning pitched, with a variance of .92. Hughes faced over half a batter fewer per inning &#8211; 4.13 &#8211; and had a variance of .33. That means that not only did Burnett allow more baserunners, but when he was off, he was very off. Although the decision gets tougher when you have a higher BF/IP and a lower variance, Hughes was both better and more consistent in a similar number of innings, so he has to get the nod.</p>
<p>(That said, it&#8217;s shocking that such similar numbers produced one 18-8 pitcher and one 10-15 pitcher.)</p>
<p>The only question now is what order to pitch <a href="http://nymag.com/daily/sports/2010/10/so_how_will_the_yankees_line_u.html">the announced four-man rotation</a> for the ALCS. Of the choices,</p>
<blockquote><p><strong>OPTION 3</strong><br />
Sabathia<br />
Hughes<br />
Pettitte<br />
Burnett<br />
Sabathia<br />
Hughes<br />
Pettitte</p></blockquote>
<p>seems clearly superior to me. It allows Burnett to start but avoids starting him twice, gets Hughes in play quite often, and puts the very reliable <strong><a href="http://www.baseball-reference.com/players/p/pettian01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Andy  Pettitte</a></strong> in play for a potential Game Seven. The linked article lists as a con that Pettitte is considered the number 2 starter, but at the Major League level a manager can&#8217;t be concerned with such frivolities. Besides, Pettitte is an established company man. I&#8217;d be surprised if he balked at a rotation that both maximized the team&#8217;s chances to win and put him in position to be the clutch hero.</p>
<p>Incidentally, this option lends itself to using the same rotation in the World Series. Option 2:</p>
<blockquote><p>Sabathia<br />
Pettitte<br />
Hughes<br />
Sabathia<br />
Burnett<br />
Pettitte<br />
Sabathia</p></blockquote>
<p>leaves Sabathia unavailable to start Game 1 of the World Series and might put Pettitte on short rest depending on the schedule to start Game 1. I can&#8217;t see starting the Series with Hughes or Burnett.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/10/12/burnett-hughes-and-playoff-rotations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>The 600 Home Run Almanac</title>
		<link>http://tomflesher.com/2010/07/28/the-600-home-run-almanac/</link>
		<comments>http://tomflesher.com/2010/07/28/the-600-home-run-almanac/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 15:20:04 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[600 home runs]]></category>
		<category><![CDATA[A-Rod]]></category>
		<category><![CDATA[Alex Rodriguez]]></category>
		<category><![CDATA[Babe Ruth]]></category>
		<category><![CDATA[Barry Bonds]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Hank Aaron]]></category>
		<category><![CDATA[Jim Thome]]></category>
		<category><![CDATA[Ken Griffey Jr.]]></category>
		<category><![CDATA[Manny Ramirez]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[Sammy Sosa]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Willie Mays]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=398</guid>
		<description><![CDATA[People are interested in players who hit 600 home runs, at least judging by the Google searches that point people here. With that in mind, let&#8217;s take a look at some quick facts about the 600th home run and the people who have hit it. Age: There are six players to have hit #600. Sammy [...]]]></description>
			<content:encoded><![CDATA[<p>People are interested in players who hit 600 home runs, at least judging by the Google searches that point people here. With that in mind, let&#8217;s take a look at some quick facts about the 600th home run and the people who have hit it.</p>
<p><strong>Age: </strong>There are <a href="http://bbref.com/pi/shareit/y3VbM">six players</a> to have hit #600. <strong><a href="http://www.baseball-reference.com/players/s/sosasa01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Sammy Sosa</a></strong> was the oldest at 39 years old in 2007. <strong><a href="http://www.baseball-reference.com/players/g/griffke02.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Ken Griffey</a></strong>, Jr. was 38 in 2008, as were <a href="http://www.baseball-reference.com/players/m/mayswi01.shtml"><strong>Willie Mays</strong></a> in 1969 and <strong><a href="http://www.baseball-reference.com/players/b/bondsba01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Barry Bonds</a></strong> in 2002. <strong><a href="http://www.baseball-reference.com/players/a/aaronha01.shtml">Hank Aaron</a></strong> was 37. <strong><a href="http://www.baseball-reference.com/players/r/ruthba01.shtml">Babe Ruth</a></strong> was the youngest at 36 in 1931. <strong><a href="http://www.baseball-reference.com/players/r/rodrial01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Alex Rodriguez</a></strong>, who is 35 as of July 27, will almost certainly be the youngest player to reach 600 home runs. If both <strong><a href="http://www.baseball-reference.com/players/r/ramirma02.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Manny Ramirez</a></strong> and <strong><a href="http://www.baseball-reference.com/players/t/thomeji01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Jim Thome</a></strong> hang on to hit #600 over the next two to three seasons, Thome (who was born in August of 1970) will probably be 42 in 2012; Ramirez (born in May of 1972) will be 41 in 2013. (In an earlier post that&#8217;s when I estimated each player would hit #600.) If Thome holds on, then, he&#8217;ll be the oldest player to hit his 600th home run.</p>
<p><strong>Productivity:</strong> Since 2000 (which encompasses Rodriguez, Ramirez, and Thome in their primes), the average league rate of home runs per plate appearance has been about .028. That is, a home run was hit in about 2.8% of plate appearances. Over the same time period, Rodriguez&#8217; rate was .064 &#8211; more than double the league average. Ramirez hit .059 &#8211; again, over double the league rate. Thome, for his part, hit at a rate of .065 home runs per plate appearance. From 2000 to 2009, Thome was more productive than Rodriguez.</p>
<p><strong>Standing Out:</strong> Obviously it&#8217;s unusual for them to be that far above the curve. There were 1,877,363 plate appearances (trials) from 2000 to 2009. The margin of error for a proportion like the rate of home runs per plate appearance is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Csqrt%7B%5Cfrac%7Bp%281-p%29%7D%7Bn-1%7D%7D+%3D+%5Csqrt%7B%5Cfrac%7B.028%28.972%29%7D%7B1%2C877%2C362%7D%7D+%3D+%5Csqrt%7B%5Cfrac%7B.027%7D%7B1%2C877%2C362%7D%7D+%5Capprox+%5Csqrt%7B%5Cfrac%7B14%7D%7B1%2C000%2C000%2C000%7D%7D+%3D+.00012&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;sqrt{&#92;frac{p(1-p)}{n-1}} = &#92;sqrt{&#92;frac{.028(.972)}{1,877,362}} = &#92;sqrt{&#92;frac{.027}{1,877,362}} &#92;approx &#92;sqrt{&#92;frac{14}{1,000,000,000}} = .00012' title='&#92;sqrt{&#92;frac{p(1-p)}{n-1}} = &#92;sqrt{&#92;frac{.028(.972)}{1,877,362}} = &#92;sqrt{&#92;frac{.027}{1,877,362}} &#92;approx &#92;sqrt{&#92;frac{14}{1,000,000,000}} = .00012' class='latex' /></p>
<p>Ordinarily, we expect a random individual chosen from the population to land within the space of <img src='http://s0.wp.com/latex.php?latex=p+%5Cpm+1.96+%5Ctimes+MoE&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='p &#92;pm 1.96 &#92;times MoE' title='p &#92;pm 1.96 &#92;times MoE' class='latex' /> 95% of the time. That means our interval is</p>
<p><img src='http://s0.wp.com/latex.php?latex=.028+%5Cpm+.00024&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='.028 &#92;pm .00024' title='.028 &#92;pm .00024' class='latex' /></p>
<p>That means that all three of the players are well outside that confidence interval. (However, it&#8217;s likely that home run hitting is highly correlated with other factors that make this test less useful than it is in other situations.)</p>
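<p>The margin-of-error calculation can be checked with a short Python sketch. It assumes the league rate of .028 and the 1,877,363 plate appearances quoted above, and it centers the interval on that league rate; the variable names are made up for illustration.</p>

```python
import math

# League-wide figures quoted above for 2000 to 2009.
P_LEAGUE = 0.028          # home runs per plate appearance
N_TRIALS = 1_877_363      # plate appearances

# Player rates quoted in the Productivity paragraph.
PLAYER_RATES = {"Rodriguez": 0.064, "Ramirez": 0.059, "Thome": 0.065}


def margin_of_error(p, n):
    """Standard error of a proportion, using the n - 1 denominator from the post."""
    return math.sqrt(p * (1 - p) / (n - 1))


moe = margin_of_error(P_LEAGUE, N_TRIALS)
half_width = 1.96 * moe

# How many interval half-widths each player sits above the league rate.
distance = {name: (rate - P_LEAGUE) / half_width for name, rate in PLAYER_RATES.items()}

print(f"MoE = {moe:.5f}, interval = {P_LEAGUE} +/- {half_width:.5f}")
for name, d in distance.items():
    print(f"{name}: {d:.0f} half-widths above the league rate")
```

<p>All three players land more than a hundred half-widths above the league rate, which is another way of seeing how narrow an interval built from nearly two million trials is.</p>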
<p><strong>Alex&#8217;s Drought:</strong> Finally, just how likely is it that Alex Rodriguez will go this long without a home run? He hit his last home run in his fourth plate appearance on <a href="http://www.baseball-reference.com/boxes/NYA/NYA201007220.shtml">July 22</a>. He had a fifth plate appearance in which he doubled. Since then, he&#8217;s played in five games totalling 22 plate appearances, so he&#8217;s gone 23 plate appearances without a home run. Assuming his rate of .064 home runs per plate appearance, how likely is that? We&#8217;d expect (.064*23) = about 1.5 home runs in that time, but how unlikely is this drought?</p>
<p>The binomial distribution is used to model strings of successes and failures in tests where we can say clearly whether each trial ended in a &#8220;yes&#8221; or &#8220;no.&#8221; We don&#8217;t need to break out that tool here, though &#8211; if the probability of a home run is .064, the probability of anything else is .936. The likelihood of a string of 23 non-home runs is</p>
<p><img src='http://s0.wp.com/latex.php?latex=.936%5E%7B23%7D+%3D+.218&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='.936^{23} = .218' title='.936^{23} = .218' class='latex' /></p>
<p>If his true rate is .064, a drought this long would happen by chance only about 22% of the time. The better guess is that, as Rodriguez has said, he&#8217;s distracted by the switch to marked baseballs and media pressure to finally hit #600.</p>
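<p>As a quick check, the drought probability can be computed in Python both ways: with the shortcut above and with the binomial formula for zero successes. This sketch assumes each plate appearance is independent with the same .064 chance of a home run; <code>binomial_pmf</code> is a hypothetical helper written out here, not a library function.</p>

```python
import math

P_HR = 0.064   # Rodriguez's home runs per plate appearance, from the post
N_PA = 23      # plate appearances since his last home run


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


expected_hr = P_HR * N_PA
p_drought = (1 - P_HR) ** N_PA          # the shortcut used above
p_zero = binomial_pmf(0, N_PA, P_HR)    # the same number from the binomial formula

print(f"expected home runs: {expected_hr:.2f}")
print(f"P(no home run in {N_PA} PA): {p_drought:.3f}")
```

<p>With zero successes the binomial coefficient is 1 and the success term drops out, which is why the shortcut and the full formula agree.</p>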
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/28/the-600-home-run-almanac/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>More on Home Runs Per Game</title>
		<link>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/</link>
		<comments>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 14:35:26 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Chow test]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Japanese baseball]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rays]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=335</guid>
		<description><![CDATA[In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used [...]]]></description>
			<content:encoded><![CDATA[<p>In the previous post, I looked at the trend in home runs per game in the Major Leagues and suggested that the recent deviation from the increasing trend might have been due to the development of strong farm systems like the Tampa Bay Rays&#8217;. That means that if the same data analysis process is used on data in an otherwise identical league, we should see similar trends but no dropoff around 1995. As usual, for replication purposes I&#8217;m going to use Japan&#8217;s Pro Baseball leagues, the Pacific and Central Leagues. They&#8217;re ideal because, just like the American Major Leagues, one league uses the designated hitter and one does not. There are some differences &#8211; the talent pool is a bit smaller because of the lower population base that the leagues draw from, and there are only 6 teams in each league as opposed to MLB&#8217;s 14 and 16.</p>
<p>As a reminder, the MLB regression gave us a regression equation of</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' title='&#92;hat{HR} = .957 - .0188 &#92;times t + .0004 &#92;times t^2 + .0911 &#92;times DH ' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} ' title='&#92;hat{HR} ' class='latex' /> is the predicted number of home runs per game,<em> t</em> is a time variable starting at <em>t</em>=1 in 1955, and <em>DH</em> is a binary variable that takes value 1 if the league uses the designated hitter in the season in question.</p>
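<p>To make the equation concrete, here is a small Python sketch that evaluates it for a given season. It uses only the rounded coefficients quoted above, so the outputs are approximate; the function name <code>predicted_hr_per_game</code> is made up for illustration.</p>

```python
# Coefficients from the MLB regression equation quoted above.
B0, B_T, B_TSQ, B_DH = 0.957, -0.0188, 0.0004, 0.0911
BASE_YEAR = 1955  # t = 1 in 1955


def predicted_hr_per_game(year, uses_dh):
    """Evaluate the fitted quadratic trend for a given season and league type."""
    t = year - BASE_YEAR + 1
    dh = 1 if uses_dh else 0
    return B0 + B_T * t + B_TSQ * t**2 + B_DH * dh


# Turning point of the quadratic: where the fitted trend stops falling.
t_min = -B_T / (2 * B_TSQ)

print(f"2009, DH league:    {predicted_hr_per_game(2009, True):.3f}")
print(f"2009, no-DH league: {predicted_hr_per_game(2009, False):.3f}")
print(f"fitted minimum near t = {t_min:.1f}")
```

<p>Because the coefficient on <em>t</em> is negative and the one on <em>t</em> squared is positive, the fitted curve falls first and then rises, which matches the rough U-shape in the MLB data.</p>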
<p><a href="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg"><img class="alignright size-thumbnail  wp-image-336" title="japanhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&#038;h=82" alt="" width="150" height="82" /></a>Just examining the data on home runs per game from the Japanese leagues, the trend looks significantly different. Instead of the rough U-shape that the MLB data showed, the Japanese data looks almost M-shaped with a maximum around 1984. (Why, I&#8217;m not sure &#8211; I&#8217;m not knowledgeable enough about Japanese baseball to know what might have caused that spike.) It reaches a minimum again and then keeps rising.</p>
<p>After running the same regression with <em>t</em>=1 in 1950, I got these results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.2462</td>
<td align="right">0.0992</td>
<td align="right">2.481</td>
<td align="right">0.0148</td>
<td align="right">0.9852</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">0.0478</td>
<td align="right">0.0062</td>
<td align="right">7.64</td>
<td align="right">1.63E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">-0.0006</td>
<td align="right">0.00009</td>
<td align="right">-7.463</td>
<td align="right">3.82E-11</td>
<td align="right">1</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0052</td>
<td align="right">0.0359</td>
<td align="right">0.144</td>
<td align="right">0.8855</td>
<td align="right">0.1145</td>
</tr>
</tbody>
</table>
<p>This equation shows two things, one that surprises me and one that doesn&#8217;t. The unsurprising result is the switching of signs for the <em>t</em> variables &#8211; we expected that based on the shape of the data. The surprising result is that the designated hitter rule is insignificant: with a p-value of .8855, we can&#8217;t reject the hypothesis that the DH has no effect on home runs per game. In addition, this model explains less of the variation than the MLB version &#8211; while that explained about 56% of the variation, the Japanese model has an <img src='http://s0.wp.com/latex.php?latex=R%5E2+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 ' title='R^2 ' class='latex' /> value of .4045, meaning it explains about 40% of the variation in home runs per game.</p>
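<p>The switch in signs can be made concrete by finding where each fitted quadratic turns around, which happens at <em>t</em> = -b1 / (2 &#215; b2). This is a rough sketch with hypothetical function names, and because it uses the rounded coefficients quoted in the post, the resulting years should be read as approximate.</p>

```python
def turning_point(b1: float, b2: float) -> float:
    """Return the t where b0 + b1*t + b2*t**2 has zero slope."""
    return -b1 / (2.0 * b2)


def to_year(t: float, first_year: int) -> float:
    """Convert the time index back to a calendar year (t=1 is first_year)."""
    return first_year + t - 1.0


# MLB: negative t term, positive t-squared term, so the curve has a minimum.
mlb_t = turning_point(-0.0188, 0.0004)
# Japan: positive t term, negative t-squared term, so the curve has a maximum.
japan_t = turning_point(0.0478, -0.0006)

print(f"MLB minimum near t={mlb_t:.1f} ({to_year(mlb_t, 1955):.0f})")
print(f"Japan maximum near t={japan_t:.1f} ({to_year(japan_t, 1950):.0f})")
```

<p>A single quadratic can only capture one turning point, which is part of why it fits the M-shaped Japanese data less well than the U-shaped MLB data.</p>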
<p>There&#8217;s a slightly interesting pattern to the residual home runs per game (<img src='http://s0.wp.com/latex.php?latex=Residual+%3D+HR+-+%5Chat%7BHR%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = HR - &#92;hat{HR}' title='Residual = HR - &#92;hat{HR}' class='latex' />). Although <a href="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg"><img class="alignright size-thumbnail wp-image-338" title="japanresidualhrpergame" src="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&#038;h=82" alt="" width="150" height="82" /></a>it isn&#8217;t as pronounced, this data also shows a spike &#8211; but the spike is at <em>t</em>=55, so instead of showing up around 1995 as it did in MLB, the Japanese spike comes in the early 2000s. Clearly the two spikes don&#8217;t line up, but why might the Japanese leagues see a similar effect later than the MLB teams? It can&#8217;t be an expansion effect, since the Japanese leagues have stayed constant at 6 teams since their inception.</p>
<p>Incidentally, the Japanese league data shows some evidence of heteroskedasticity (Breusch-Pagan test p-value .0796, significant at the 10% level but not at 5%), so it might be better modeled using generalized least squares, but doing so would have skewed the results of the replication.</p>
<p>In order to show that the parameters really are different, the appropriate test is <a href="http://en.wikipedia.org/wiki/Chow_test">Chow&#8217;s test for structural change</a>. To clean it up, I&#8217;m using only the data from 1960 on. (It&#8217;s quick and dirty, but it&#8217;ll do the job.) Chow&#8217;s test takes</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%5Csim%5C+F_%7Bk%2CN_1%2BN_2-2k%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' title='&#92;frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} &#92;sim&#92; F_{k,N_1+N_2-2k}' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=S_C+%3D+6.3666&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_C = 6.3666' title='S_C = 6.3666' class='latex' /> is the combined sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=S_1+%3D+1.2074&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_1 = 1.2074' title='S_1 = 1.2074' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=S_2+%3D+2.2983&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='S_2 = 2.2983' title='S_2 = 2.2983' class='latex' /> are the individual (i.e. MLB and Japan) sum of squared residuals, <img src='http://s0.wp.com/latex.php?latex=k%3D4&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='k=4' title='k=4' class='latex' /> is the number of parameters, and <img src='http://s0.wp.com/latex.php?latex=N_1+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_1 = 100' title='N_1 = 100' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=N_2+%3D+100&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='N_2 = 100' title='N_2 = 100' class='latex' /> are the number of observations in each group.</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%281.2074+%2B+2.2983%29%29%2F%284%29%7D%7B%281.2074+%2B+2.2983%29%2F%28100%2B100-2%5Ctimes+4%29%7D+%5Csim%5C++F_%7B4%2C100%2B100-2+%5Ctimes+4%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074 + 2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' title='&#92;frac{(6.3666 -(1.2074 + 2.2983))/(4)}{(1.2074 + 2.2983)/(100+100-2&#92;times 4)} &#92;sim&#92;  F_{4,100+100-2 &#92;times 4}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B%286.3666+-%283.5057%29%29%2F%284%29%7D%7B%283.5057%29%2F%28192%29%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' title='&#92;frac{(6.3666 -(3.5057))/(4)}{(3.5057)/(192)} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B2.8609%2F4%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{2.8609/4}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7B.7152%7D%7B.01826%7D+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' title='&#92;frac{.7152}{.01826} &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=39.17+%5Csim%5C++F_%7B4%2C192%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='39.17 &#92;sim&#92;  F_{4,192}' title='39.17 &#92;sim&#92;  F_{4,192}' class='latex' /></p>
<p>The critical value at the 10% significance level with 4 and 192 degrees of freedom would be 1.974 according to <a href="http://www.stat.tamu.edu/~west/applets/fdemo.html">Texas A&amp;M&#8217;s F calculator</a>. Our statistic of 39.17 is far above that, so we can reject the hypothesis that the MLB and Japanese parameters are the same: the two sets of leagues really do follow different trends.</p>
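<p>For anyone who wants to check the arithmetic, the Chow statistic is easy to script from the sums of squared residuals. This is a minimal sketch: the function name <em>chow_f</em> and its argument names are hypothetical, and it only computes the statistic, leaving the comparison against the critical value to the reader.</p>

```python
def chow_f(s_c: float, s_1: float, s_2: float,
           k: int, n_1: int, n_2: int) -> float:
    """Chow test statistic for a structural break between two groups.

    s_c      -- sum of squared residuals from the pooled regression
    s_1, s_2 -- sums of squared residuals from the two separate regressions
    k        -- number of estimated parameters in each regression
    n_1, n_2 -- number of observations in each group
    """
    numerator = (s_c - (s_1 + s_2)) / k
    denominator = (s_1 + s_2) / (n_1 + n_2 - 2 * k)
    return numerator / denominator


# Values quoted in the post for the MLB and Japanese regressions.
f_stat = chow_f(s_c=6.3666, s_1=1.2074, s_2=2.2983, k=4, n_1=100, n_2=100)
print(f"F(4, 192) statistic: {f_stat:.2f}")
```

<p>Note that the denominator uses the separate-regression residuals divided by their degrees of freedom, not the raw observation counts.</p>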
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/09/more-on-home-runs-per-game/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+%3D+.957+-+.0188+%5Ctimes+t+%2B+.0004+%5Ctimes+t%5E2+%2B+.0911+%5Ctimes+DH+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} = .957 - .0188 \times t + .0004 \times t^2 + .0911  \times DH </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Chat%7BHR%7D+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\hat{HR} </media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanhrpergame1.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=R%5E2+&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">R^2 </media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=Residual+%3D+%5Chat%7BHR%7D+-+HR&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">Residual = \hat{HR} - HR</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/japanresidualhrpergame11.jpg?w=150&#38;h=82" medium="image">
			<media:title type="html">japanresidualhrpergame</media:title>
		</media:content>

		<media:content url="http://l.wordpress.com/latex.php?latex=%5Cfrac%7B%28S_C+-%28S_1%2BS_2%29%29%2F%28k%29%7D%7B%28S_1%2BS_2%29%2F%28N_1%2BN_2-2k%29%7D+%7E+F&#38;bg=ffffff&#38;fg=000000&#38;s=0" medium="image">
			<media:title type="html">\frac{(S_C -(S_1+S_2))/(k)}{(S_1+S_2)/(N_1+N_2-2k)} ~ F</media:title>
		</media:content>
	</item>
		<item>
		<title>Back when it was hard to hit 55&#8230;</title>
		<link>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/</link>
		<comments>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:06:05 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[home runs]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[sabermetrics]]></category>
		<category><![CDATA[Stuff Keith Hernandez Says]]></category>
		<category><![CDATA[talent pool dilution]]></category>
		<category><![CDATA[Willie Mays]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=319</guid>
		<description><![CDATA[Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about Willie Mays for some reason, and said that Mays hit 55 home [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=319&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last night was one of those classic Keith Hernandez moments where he started talking and then stopped abruptly, which I always like to assume is because the guys in the truck are telling him to shut the hell up. He was talking about <a href="http://www.baseball-reference.com/players/m/mayswi01.shtml">Willie Mays </a>for some reason, and said that Mays hit 55 home runs &#8220;back when it was hard to hit 55.&#8221; Keith coyly said that, while it was easy for a while, it was &#8220;getting hard again,&#8221; at which point he abruptly stopped talking.</p>
<p>Keith&#8217;s unusual candor about drug use and Mays&#8217; career best of 52 home runs aside, this pinged my &#8220;Stuff Keith Hernandez Says&#8221; meter. After accounting for any time trend and other factors that might explain home run hitting, is there an upward trend? If so, is there a pattern to the remaining home runs?</p>
<p>The first step is to examine the data to see if there appears to be any trend. Just looking at it, there appears to be a messy U shape with a minimum around t=20, which indicates a quadratic trend. That means I want to include a term for time and a term for time squared.<a href="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg"><img class="alignright size-thumbnail  wp-image-325" title="homerunspergame" src="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150&#038;h=102" alt="" width="150" height="102" /></a></p>
<p>Using the per-game averages for home runs from 1955 to 2009, I detrended the data using t=1 in 1955. I also had to correct for the effect of the designated hitter. That gives us an equation of the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BHR%7D+%3D+%5Chat%7B%5Cbeta_%7B0%7D%7D+%2B+%5Chat%7B%5Cbeta_%7B1%7D%7Dt+%2B+%5Chat%7B%5Cbeta_%7B2%7D%7D+t%5E%7B2%7D+%2B+%5Chat%7B%5Cbeta_%7B3%7D%7D+DH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' title='&#92;hat{HR} = &#92;hat{&#92;beta_{0}} + &#92;hat{&#92;beta_{1}}t + &#92;hat{&#92;beta_{2}} t^{2} + &#92;hat{&#92;beta_{3}} DH ' class='latex' /></p>
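<p>To show what estimating this equation involves, here is a small self-contained sketch of ordinary least squares using only the Python standard library. Everything in it is an assumption made for illustration: the helper names are hypothetical, and the data are synthetic and noise-free, generated from known coefficients so the fit can be checked. It is not the code or the data behind the results in this post.</p>

```python
def design_row(year, uses_dh, first_year=1955):
    """Regressors for one league-season: intercept, t, t squared, DH dummy."""
    t = float(year - first_year + 1)
    return [1.0, t, t * t, 1.0 if uses_dh else 0.0]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# SYNTHETIC data: one DH and one non-DH league per season, no noise.
true_beta = [0.957, -0.0188, 0.0004, 0.0911]
rows = [design_row(year, dh) for year in range(1955, 2010) for dh in (False, True)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta_hat = ols(rows, y)
print([round(b, 4) for b in beta_hat])
```

<p>With real data the fit would not be exact, and a statistics package would also report the standard errors and p-values shown in the table.</p>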
<p>The results:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20"></td>
<td width="64">Estimate</td>
<td width="64">Std. Error</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif</td>
</tr>
<tr>
<td height="20">B0</td>
<td align="right">0.957</td>
<td align="right">0.0328</td>
<td align="right">29.189</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">t</td>
<td align="right">-0.0188</td>
<td align="right">0.0028</td>
<td align="right">-6.738</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">tsq</td>
<td align="right">0.0004</td>
<td align="right">0.00005</td>
<td align="right">8.599</td>
<td align="right">0.0001</td>
<td align="right">0.9999</td>
</tr>
<tr>
<td height="20">DH</td>
<td align="right">0.0911</td>
<td align="right">0.0246</td>
<td align="right">3.706</td>
<td align="right">0.0003</td>
<td align="right">0.9997</td>
</tr>
</tbody>
</table>
<p>We can see that there&#8217;s an upward-opening quadratic trend in predicted home runs that, together with the DH rule, accounts for about 56% of the variation in the number of home runs per game in a season (<img src='http://s0.wp.com/latex.php?latex=R%5E2+%3D+.5618&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='R^2 = .5618' title='R^2 = .5618' class='latex' />). The Breusch-Pagan test has a p-value of .1610, indicating a possibility of mild heteroskedasticity but nothing we should get concerned about.</p>
<p>Then, I needed to look at the difference between the actual number of home runs per game and the predicted number of home runs per game, which is found by subtracting the prediction from the actual value:</p>
<p><img src='http://s0.wp.com/latex.php?latex=Residual+%3D+HR+-+%5Chat%7BHR%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='Residual = HR - &#92;hat{HR}' title='Residual = HR - &#92;hat{HR}' class='latex' /></p>
<p><a href="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg"><img class="alignright size-thumbnail  wp-image-331" title="homerunresiduals" src="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150&#038;h=102" alt="" width="150" height="102" /></a>This represents the &#8220;abnormal&#8221; number of home runs per game in each season. The question then becomes, &#8220;Is there a pattern to the number of abnormal home runs?&#8221;  There are two ways to answer this. The first way is to look at the abnormal home runs. Up until about t=40 (the mid-1990s), the abnormal home runs are pretty much scattershot above and below 0. However, at t=40, the residual jumps up for both leagues and then begins a downward trend. It&#8217;s not clear what the cause of this is, but the knee-jerk reaction is that there might be a drug use effect. On the other hand, there are a couple of other explanations.</p>
<p>The most obvious is a boring old expansion effect. In 1993, the National League added two teams (the Marlins and the Rockies), and in 1998 each league added a team (the AL&#8217;s Rays and the NL&#8217;s Diamondbacks). Talent pool dilution has shown up in our discussion of hit batsmen, and I believe that it can be a real effect. It would be mitigated over time, however, by the establishment and development of farm systems, in particular strong systems like the one that&#8217;s producing good, cheap talent for the Rays.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/08/back-when-it-was-hard-to-hit-55/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunspergame1.jpg?w=150" medium="image">
			<media:title type="html">homerunspergame</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/07/homerunresiduals1.jpg?w=150" medium="image">
			<media:title type="html">homerunresiduals</media:title>
		</media:content>
	</item>
		<item>
		<title>Tough Losses</title>
		<link>http://tomflesher.com/2010/07/08/tough-losses-2/</link>
		<comments>http://tomflesher.com/2010/07/08/tough-losses-2/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 13:12:24 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[Dan Haren]]></category>
		<category><![CDATA[Jon Niese]]></category>
		<category><![CDATA[Roy Halladay]]></category>
		<category><![CDATA[Roy Oswalt]]></category>
		<category><![CDATA[Ubaldo Jimenez]]></category>
		<category><![CDATA[weird lines]]></category>
		<category><![CDATA[Year of the Pitcher]]></category>
		<category><![CDATA[Yovani Gallardo]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=316</guid>
		<description><![CDATA[Last night, Jonathon Niese pitched 7.2 innings of respectable work (6 hits, 3 runs, all earned, 1 walk, 8 strikeouts, 2 home runs, for a game score of 62) but still took the loss due to his unfortunate lack of run support &#8211; the Mets&#8217; only run came in from an Angel Pagan solo homer. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=tomflesher.com&amp;blog=20518139&amp;post=316&amp;subd=tomflesher&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.baseball-reference.com/boxes/NYN/NYN201007070.shtml">Last night</a>, <strong><a href="http://www.baseball-reference.com/players/n/niesejo01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Jonathon  Niese</a></strong> pitched 7.2 innings of respectable work (6 hits, 3 runs, all earned, 1 walk, 8 strikeouts, 2 home runs, for a game score of 62) but still took the loss due to his unfortunate lack of run support &#8211; the Mets&#8217; only run came in from an <strong><a href="http://www.baseball-reference.com/players/p/paganan01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Angel  Pagan</a></strong> solo homer. This is a prime example of what Bill James called a &#8220;Tough Loss&#8221;: a game in which the starting pitcher made a quality start but took a loss anyway.</p>
<p>There are two accepted measures of what a quality start is. Officially, a quality start is one with 6 or more innings pitched and 3 or fewer earned runs. Bill James&#8217; definition used his <a href="http://en.wikipedia.org/wiki/Game_Score">game score</a> statistic and used 50 as the cutoff point for a quality start. Since a pitcher gets 50 points for walking out on the mound and then adds to or subtracts from that value based on his performance, game score has the nice property of showing whether a pitcher added value to the team or not.</p>
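<p>Both definitions are simple enough to compute by hand or in a few lines of code. The sketch below uses the standard Bill James game score weights; the function names are hypothetical, the official rule is written in terms of earned runs, and treating a score of exactly 50 as a quality start is my assumption about how the cutoff is applied.</p>

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James game score for a starting pitcher.

    Start at 50, add 1 per out, 2 per inning completed after the fourth,
    and 1 per strikeout; subtract 2 per hit, 4 per earned run,
    2 per unearned run, and 1 per walk.
    """
    innings_completed = outs // 3
    bonus_innings = max(innings_completed - 4, 0)
    return (50 + outs + 2 * bonus_innings + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)


def is_official_quality_start(outs, earned_runs):
    """Official definition: at least 6 innings (18 outs), at most 3 earned runs."""
    return outs >= 18 and 3 >= earned_runs


def is_tough_loss(score, took_loss, cutoff=50):
    """Tough loss under the game score definition (cutoff is an assumption)."""
    return took_loss and score >= cutoff


# Niese's line from the post: 7.2 innings is 23 outs.
niese = game_score(outs=23, hits=6, earned_runs=3, unearned_runs=0,
                   walks=1, strikeouts=8)
print(niese, is_official_quality_start(23, 3), is_tough_loss(niese, True))
```

<p>Run on Niese&#8217;s line, this reproduces the game score of 62 quoted above, and the start qualifies under both definitions.</p>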
<p>Using the game score definition, there were 393 <a href="http://bbref.com/pi/shareit/GGFxp">losses in quality starts last year</a>, including 109 by July 7th. <strong><a href="http://www.baseball-reference.com/players/j/jimenub01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Ubaldo  Jimenez</a></strong> and <strong><a href="http://www.baseball-reference.com/players/h/harenda01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Dan  Haren</a></strong> led the league with 7, <strong><a href="http://www.baseball-reference.com/players/h/hallaro01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Roy  Halladay</a></strong> had 6, and <strong><a href="http://www.baseball-reference.com/players/g/gallayo01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Yovani  Gallardo</a></strong> (who&#8217;s quickly becoming my favorite player because he seems to show up in every category) was also up there with 6.</p>
<p>So far this year, though, it seems to be the Year of the Tough Loss. There have already been 230, and <strong><a href="http://www.baseball-reference.com/players/o/oswalro01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Roy  Oswalt</a></strong> is already at the 6-tough-loss mark. Halladay is already up at 4. This is consistent with the talk of the Year of the Pitcher, with better pitching (and potentially less use of performance-enhancing drugs) leading to lower run support. That will require a bit more work to confirm, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/07/08/tough-losses-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>How often should Youk take his base?</title>
		<link>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/</link>
		<comments>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 14:55:00 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[binomial distribution]]></category>
		<category><![CDATA[Brett Carroll]]></category>
		<category><![CDATA[Greek God of Take Your Base]]></category>
		<category><![CDATA[hit batsmen]]></category>
		<category><![CDATA[hit by pitch]]></category>
		<category><![CDATA[Kevin Youkilis]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=286</guid>
		<description><![CDATA[Kevin Youkilis is sometimes called &#8220;The Greek God of Walks.&#8221; I prefer to think of him as &#8220;The Greek God of Take Your Base,&#8221; since he seems to get hit by pitches at an alarming rate. In fact, this year, he&#8217;s been hit 7 times in 313 plate appearances. (Rickie Weeks, however, is leading the [...]]]></description>
			<content:encoded><![CDATA[<p><strong><a href="http://www.baseball-reference.com/players/y/youklke01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Kevin  Youkilis</a></strong> is sometimes called &#8220;The Greek God of Walks.&#8221; I prefer to think of him as &#8220;The Greek God of Take Your Base,&#8221; since he seems to get hit by pitches at an alarming rate. In fact, this year, he&#8217;s been hit 7 times in 313 plate appearances. (<strong><a href="http://www.baseball-reference.com/players/w/weeksri01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Rickie  Weeks</a></strong>, however, is leading the pack with 13 in 362 plate appearances. We&#8217;ll look at him, too.) There are three explanations for this:</p>
<ol>
<li>There&#8217;s something about Youk&#8217;s batting or his hitting stance that causes him to be hit. This is my preferred explanation. Youkilis has an unusual batting grip that thrusts his lead elbow over the plate, and as he swings, he lunges forward, which exposes him to being plunked more often.</li>
<li>Youkilis is such a hitting machine that pitchers hit him on purpose to keep him from swinging for the fences. This doesn&#8217;t hold water, to me. A pitcher could just as easily put him on base safely with an intentional walk, so unless there&#8217;s some other incentive to hit him, there&#8217;s no reason to risk ejection by throwing at Youkilis. This leads directly to&#8230;</li>
<li><a href="http://www.bareknucks.com/kevin-youkilis-is-everyones-favorite-beaning-target-because-hes-a-hitting-machine-no-because-he-bats-like-a-tool">Youk is a jerk</a>. This is pretty self-explanatory, and is probably a factor.</li>
</ol>
<p>First of all, we need to figure out whether it&#8217;s likely that Kevin is being hit by chance. To figure that out, we need to make some assumptions about hit batsmen and evaluate them using the <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>. I&#8217;m also excited to point out that Youk has been overtaken as the Greek God of Take Your Base by someone new: <strong><a href="http://www.baseball-reference.com/players/c/carrobr01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Brett  Carroll</a></strong>.<span id="more-286"></span></p>
<p>I&#8217;m going to assume that the rate of hit batsmen is constant over time. This assumption is probably justified, since the number of hit batsmen per team per American League game has stayed between .21 and .25 since 1996, and the number of plate appearances per team per game has stayed between 33.98 and 34.9 over the same time period. Based on that, I feel justified in using the 2009 hit batsman rate to evaluate Youkilis&#8217;s stats this year. It&#8217;s undesirable to use this year&#8217;s rates if 2009&#8217;s will fit, since this year has a much smaller number of occurrences. Since a number of players with only a few plate appearances might distort the average, I limited my sample to only <a href="http://bbref.com/pi/shareit/fhbK6">players with 50 plate appearances or more</a>, then divided the total number of HBP by the total number of plate appearances and got .00859. (For the record, the sample of all players with at least one plate appearance had a rate of .00850.)</p>
<p>I&#8217;m also going to assume that occurrences of hit batsmen are binomially distributed. That is, they occur at a known rate, which is equivalent to the rate of hit batsmen in 2009, and every individual hit-by-pitch is independent of all others. (I might have to relax this assumption later, but it&#8217;s good for a first approximation.) As a result, the probability of being hit by a pitch <em>k</em> times in <em>n</em> plate appearances with a known rate of <em>p</em> is</p>
<p><img src='http://s0.wp.com/latex.php?latex=f%28k%3Bn%2Cp%29+%3D+%7Bn%5Cchoose+k%7Dp%5Ek%281-p%29%5E%7Bn-k%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='f(k;n,p) = {n&#92;choose k}p^k(1-p)^{n-k}' title='f(k;n,p) = {n&#92;choose k}p^k(1-p)^{n-k}' class='latex' /></p>
<p>where</p>
<p><img src='http://s0.wp.com/latex.php?latex=%7Bn%5Cchoose+k%7D%3D%5Cfrac%7Bn%21%7D%7Bk%21%28n-k%29%21%7D&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='{n&#92;choose k}=&#92;frac{n!}{k!(n-k)!}' title='{n&#92;choose k}=&#92;frac{n!}{k!(n-k)!}' class='latex' /></p>
<p>Using R, I computed a binomial distribution using <em>n</em>=313 plate appearances, <em>k</em>=0,1,&#8230;,10, and <em>p</em>=.00859 to determine the probability that he&#8217;d be hit exactly <em>k</em> times, along with the running total (the probability of being hit <em>k</em> times or fewer). My results are:</p>
<table border="0" cellspacing="0" cellpadding="0" width="192">
<col span="3" width="64"></col>
<tbody>
<tr>
<td width="64" height="20">HBP</td>
<td width="64">p(HBP)</td>
<td width="64">Cumulative</td>
</tr>
<tr>
<td height="19">0</td>
<td>0.06721</td>
<td>0.06721</td>
</tr>
<tr>
<td height="20">1</td>
<td>0.18225</td>
<td>0.24946</td>
</tr>
<tr>
<td height="20">2</td>
<td>0.2463</td>
<td>0.49576</td>
</tr>
<tr>
<td height="20">3</td>
<td>0.2212</td>
<td>0.71696</td>
</tr>
<tr>
<td height="20">4</td>
<td>0.14851</td>
<td>0.86547</td>
</tr>
<tr>
<td height="20">5</td>
<td>0.07951</td>
<td>0.94498</td>
</tr>
<tr>
<td height="20">6</td>
<td>0.03536</td>
<td>0.98034</td>
</tr>
<tr>
<td height="20">7</td>
<td>0.01344</td>
<td>0.99378</td>
</tr>
<tr>
<td height="20">8</td>
<td>0.00445</td>
<td>0.99823</td>
</tr>
<tr>
<td height="20">9</td>
<td>0.0013</td>
<td>0.99953</td>
</tr>
<tr>
<td height="20">10</td>
<td>0.00034</td>
<td>0.99987</td>
</tr>
</tbody>
</table>
<p>If Youkilis is a normal hitter, then it&#8217;s 98% likely that he would be hit fewer than seven times. In other words, there&#8217;s only about a 2% chance that he&#8217;d be hit seven or more times in those 313 plate appearances by chance alone.</p>
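<p>The table above came out of R, but the calculation is easy to reproduce. Here is a minimal sketch in Python (not the original R code) that applies the binomial formula directly. The function name <code>hbp_table</code> is made up for illustration, and the rate is the 2009 figure of .00859 quoted earlier.</p>

```python
from math import comb

LEAGUE_RATE = 0.00859  # 2009 HBP per plate appearance, from the text


def hbp_table(n, p, k_max):
    """Return (k, pmf, cumulative) rows for a Binomial(n, p) count, k = 0..k_max.

    hbp_table is a hypothetical helper name, not part of any library.
    """
    rows = []
    cumulative = 0.0
    for k in range(k_max + 1):
        # f(k; n, p) = C(n, k) * p^k * (1 - p)^(n - k)
        pmf = comb(n, k) * p**k * (1 - p) ** (n - k)
        cumulative += pmf
        rows.append((k, pmf, cumulative))
    return rows


# Youkilis: 313 plate appearances, k = 0..10
for k, pmf, cumulative in hbp_table(313, LEAGUE_RATE, 10):
    print(f"{k:2d}  {pmf:.5f}  {cumulative:.5f}")
```

<p>Changing the first argument to 362 (and the last to 15) should give the corresponding rows for Rickie Weeks. Small differences in the last decimal place are possible, since the cumulative column here is summed at full precision rather than from rounded values.</p>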
<p>Youkilis has company, though: the aforementioned Rickie Weeks, who&#8217;s been hit 13 times in 362 plate appearances. I recomputed the distribution using <em>k</em>=0,1,&#8230;,15, <em>n</em>=362, <em>p</em>=.00859 and got the following results:</p>
<table style="border-collapse:collapse;width:183pt;" border="0" cellspacing="0" cellpadding="0" width="244">
<col style="width:48pt;" width="64"></col>
<col style="width:59pt;" width="79"></col>
<col style="width:76pt;" width="101"></col>
<tbody>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;width:48pt;" width="64" height="20">HBP</td>
<td class="xl65" style="width:59pt;" width="79">p(HBP)</td>
<td class="xl65" style="width:76pt;" width="101">Cumulative</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">0</td>
<td class="xl66">4.404E-02</td>
<td class="xl65">0.04404</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">1</td>
<td class="xl66">1.381E-01</td>
<td class="xl65">0.18216</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">2</td>
<td class="xl66">2.160E-01</td>
<td class="xl65">0.39815</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">3</td>
<td class="xl66">2.245E-01</td>
<td class="xl65">0.62269</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">4</td>
<td class="xl66">1.746E-01</td>
<td class="xl65">0.79727</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">5</td>
<td class="xl66">1.083E-01</td>
<td class="xl65">0.90556</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">6</td>
<td class="xl66">5.582E-02</td>
<td class="xl65">0.96138</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">7</td>
<td class="xl66">2.459E-02</td>
<td class="xl65">0.98597</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">8</td>
<td class="xl66">9.450E-03</td>
<td class="xl65">0.99542</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">9</td>
<td class="xl66">3.220E-03</td>
<td class="xl65">0.99864</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">10</td>
<td class="xl66">9.900E-04</td>
<td class="xl65">0.99963</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">11</td>
<td class="xl66">2.731E-04</td>
<td class="xl65">0.999903127</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">12</td>
<td class="xl66">6.921E-05</td>
<td class="xl65">0.999972337</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">13</td>
<td class="xl66">1.614E-05</td>
<td class="xl65">0.99998848</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">14</td>
<td class="xl66">3.486E-06</td>
<td class="xl65">0.999991966</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">15</td>
<td class="xl66">7.007E-07</td>
<td class="xl65">0.999992667</td>
</tr>
</tbody>
</table>
<p>It&#8217;s almost impossible for Weeks to have been hit that much by chance alone. Again, an average hitter is 95% or more likely to have been hit six times or fewer, and there&#8217;s a better than 99.99% chance that an average hitter would be hit fewer than 13 times in as many plate appearances.</p>
<p>The king of hit batsmen, though, and the new Greek God of Take Your Base, is Florida Marlins pinch hitter and outfielder Brett Carroll. In 90 plate appearances this year, he&#8217;s been hit seven times! That&#8217;s as many as Youkilis, but far more efficient &#8211; he required fewer than one-third of the plate appearances to achieve the same number of plunks. Using his 90 plate appearances, the same <em>p</em>=.00859, and <em>k</em>=0,1,&#8230;,10, Carroll&#8217;s distribution is below:</p>
<table style="border-collapse:collapse;width:183pt;" border="0" cellspacing="0" cellpadding="0" width="244">
<col style="width:48pt;" width="64"></col>
<col style="width:59pt;" width="79"></col>
<col style="width:76pt;" width="101"></col>
<tbody>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;width:48pt;" width="64" height="20">HBP</td>
<td class="xl65" style="width:59pt;" width="79">p(HBP)</td>
<td class="xl65" style="width:76pt;" width="101">Cumulative</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">0</td>
<td class="xl66">4.60E-01</td>
<td class="xl65">0.4600902</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">1</td>
<td class="xl66">3.59E-01</td>
<td class="xl65">0.8188182</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">2</td>
<td class="xl66">1.38E-01</td>
<td class="xl65">0.9571132</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">3</td>
<td class="xl66">3.51E-02</td>
<td class="xl65">0.9922562</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">4</td>
<td class="xl66">6.62E-03</td>
<td class="xl65">0.99887814</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">5</td>
<td class="xl66">9.87E-04</td>
<td class="xl65">0.99986485</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">6</td>
<td class="xl66">1.21E-04</td>
<td class="xl65">0.99998594</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">7</td>
<td class="xl66">1.26E-05</td>
<td class="xl65">0.99999852</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">8</td>
<td class="xl66">1.13E-06</td>
<td class="xl65">0.999999652</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">9</td>
<td class="xl66">8.93E-08</td>
<td class="xl65">0.999999741</td>
</tr>
<tr style="height:15pt;">
<td class="xl65" style="height:15pt;" height="20">10</td>
<td class="xl66">6.27E-09</td>
<td class="xl65">0.999999747</td>
</tr>
</tbody>
</table>
<p>An average hitter with Carroll&#8217;s 90 plate appearances would most likely have been hit fewer than <strong>twice</strong> (about an 82% chance, from the table). His rate &#8211; .078 times hit by pitch per plate appearance &#8211; is more than <strong>nine times</strong> the league&#8217;s rate. Ascend Mount Olympus, Brett, and work on getting out of the way more often.</p>
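<p>The rate comparison is simple arithmetic, and it can be checked for all three hitters at once. This is a rough sketch in Python. The helper name <code>hbp_summary</code> is hypothetical, and the only inputs are the HBP and plate appearance counts quoted in this post plus the 2009 rate of .00859.</p>

```python
LEAGUE_RATE = 0.00859  # 2009 HBP per plate appearance, from the text


def hbp_summary(hbp, plate_appearances, league_rate=LEAGUE_RATE):
    """Return (observed rate, expected HBP for an average hitter, ratio to league).

    hbp_summary is a hypothetical helper name used only for this illustration.
    """
    observed_rate = hbp / plate_appearances
    expected_hbp = plate_appearances * league_rate
    return observed_rate, expected_hbp, observed_rate / league_rate


# HBP and plate appearance counts as quoted in the post
hitters = [("Youkilis", 7, 313), ("Weeks", 13, 362), ("Carroll", 7, 90)]
for name, hbp, pa in hitters:
    rate, expected, ratio = hbp_summary(hbp, pa)
    print(f"{name:9s} rate={rate:.4f}  expected={expected:.2f}  ratio={ratio:.1f}x")
```

<p>The expected count is just plate appearances times the league rate, so for Carroll it comes to less than one hit-by-pitch over his 90 plate appearances.</p>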
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/30/does-kevin-youkilis-really-get-hit-that-much/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Carlos Zambrano, Ace Pinch Hitter?</title>
		<link>http://tomflesher.com/2010/06/21/carlos-zambrano-ace-pinch-hitter/</link>
		<comments>http://tomflesher.com/2010/06/21/carlos-zambrano-ace-pinch-hitter/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 01:40:27 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[baseball-reference.com]]></category>
		<category><![CDATA[bullpen]]></category>
		<category><![CDATA[Carlos Zambrano]]></category>
		<category><![CDATA[Cubs]]></category>
		<category><![CDATA[Joba Chamberlain]]></category>
		<category><![CDATA[Lou Piniella]]></category>
		<category><![CDATA[Micah Owings]]></category>
		<category><![CDATA[RE24]]></category>
		<category><![CDATA[relief]]></category>
		<category><![CDATA[setup man]]></category>
		<category><![CDATA[starter]]></category>
		<category><![CDATA[Ubaldo Jimenez]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=251</guid>
		<description><![CDATA[Earlier this year, Chicago Cubs manager Lou Piniella experimented with moving starting pitcher and relatively big hitter Carlos Zambrano to the bullpen, briefly making him the Major Leagues&#8217; best-paid setup man. Zambrano is back in the rotation as of the beginning of June. I&#8217;m curious what the effect of moving him to the bullpen was. [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year, Chicago Cubs manager Lou Piniella experimented with moving starting pitcher and relatively big hitter <strong><a href="http://www.baseball-reference.com/players/z/zambrca01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Carlos  Zambrano</a></strong> to the bullpen, briefly making him the Major Leagues&#8217; best-paid setup man. Zambrano is back in the rotation as of the beginning of June. I&#8217;m curious what the effect of moving him to the bullpen was.</p>
<p>The thing is that not only is Zambrano an excellent pitcher (though he was slumping at the time), he&#8217;s also regarded as a very good hitter for a pitcher. He&#8217;s a career .237 hitter who slumped last year to &#8220;only&#8221; .217 in 72 plate appearances (17th most in the National League). That average was still 6th in the National League among <a href="http://bbref.com/pi/shareit/X7pUg">pitchers with at least 50 plate appearances</a>. He didn&#8217;t walk enough (his OBP was 13th on the same list), but he was 9th of the 51 pitchers on the list in terms of Base-Out Runs Added (RE24), at about -5.117, or roughly 5.1 runs below what an average batter would have produced in the same situations. <strong><a href="http://www.baseball-reference.com/players/j/jimenub01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Ubaldo  Jimenez</a></strong> was also up there with a respectable .220 BA, .292 OBP, but -8.950 RE24.</p>
<p>It should be pointed out that pitcher RE24 is almost always negative for starters &#8211; the best RE24 on that list is <strong><a href="http://www.baseball-reference.com/players/o/owingmi01.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Micah  Owings</a></strong> with -2.069. Zambrano&#8217;s run contribution was negative, sure, but it was a lot less negative than most starters&#8217;. Zambrano also lost a bit of flexibility as an emergency pinch hitter (something that Owings is going through right now due to his recent move to the bullpen) &#8211; he&#8217;s more valuable as a reliever, so they won&#8217;t use him to pinch hit. As a result, he loses at-bats, which not only keeps him from amassing hits but also allows him to get rusty.</p>
<p>It&#8217;s hard to precisely value the loss of Zambrano&#8217;s contribution, although <a href="http://www.baseball-reference.com/players/gl.cgi?n1=zambrca01&amp;t=b&amp;year=2010&amp;share=1.44#272-293-sum:batting_gamelogs">he&#8217;s already on pace for -6.1 batting RE24</a>. It&#8217;s likely, in my opinion, that his RE24 will rise as he continues hitting over the course of the year. His pitching value is also negative, however, which is unusual. He&#8217;s always been <a href="http://bbref.com/pi/shareit/QEmjg">very respectable among Cubs starters</a>. It&#8217;s possible that although he was pitching very well in relief, the fact that he has the ability to go long means that it&#8217;s inefficient to use him as a reliever. This is the opposite of, say, <strong><a href="http://www.baseball-reference.com/players/c/chambjo03.shtml?utm_source=direct&amp;utm_medium=linker&amp;utm_campaign=Linker">Joba  Chamberlain</a></strong>, who is overpowering in relief but struggles as a starter.</p>
<p>As a starter, Zambrano has never been a net loss of runs. He needs to stay out of the bullpen, and Joba needs to stay there.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/21/carlos-zambrano-ace-pinch-hitter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
		<item>
		<title>Modeling Run Production</title>
		<link>http://tomflesher.com/2010/06/19/modeling-run-production/</link>
		<comments>http://tomflesher.com/2010/06/19/modeling-run-production/#comments</comments>
		<pubDate>Sat, 19 Jun 2010 18:28:39 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[run production]]></category>
		<category><![CDATA[sports economics]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=203</guid>
		<description><![CDATA[A baseball team can be thought of as a factory which uses a single crew to operate two machines. The first machine produces runs while the team bats, and the second machine produces outs while the team is in the field. This is a somewhat abstract way to look at the process of winning games, because [...]]]></description>
			<content:encoded><![CDATA[<p>A baseball team can be thought of as a factory which uses a single crew to operate two machines. The first machine produces runs while the team bats, and the second machine produces outs while the team is in the field. This is a somewhat abstract way to look at the process of winning games, because ordinarily machines have a fixed input and a fixed output. In a box factory, the input comprises man-hours and corrugated board, and the output is a finished box. Here, the input isn&#8217;t as well-defined.</p>
<p>Runs are a function of total bases, certainly, but total bases are functions of things like hits, home runs, and walks. Basically, runs are a function of getting on base and of advancing people who are already on base. Obviously, the best measure of getting on base is On-Base Percentage, and Slugging Average (expected number of bases per at-bat) is a good measure of advancement.</p>
<p>OBP wraps up a lot of things &#8211; walks, hits, and hit-by-pitch appearances &#8211; and SLG corrects for the greater effects of doubles, triples, and home runs. That doesn&#8217;t account for a few other things, though, like stolen bases, sacrifice flies, and sacrifice hits. It also doesn&#8217;t reflect batter ability directly, but that&#8217;s okay &#8211; the stats we have should represent batter ability since the defensive side is trying to prevent run production. The model might look something like this, then:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BRuns%7D+%3D+%5Chat%7B%5Cbeta_0%7D+%2B+%5Chat%7B%5Cbeta_1%7D+OBP+%2B+%5Chat%7B%5Cbeta_2%7D+SLG+%2B+%5Chat%7B%5Cbeta_3%7D+SB+%2B+%5Chat%7B%5Cbeta_4%7D+SF+%2B+%5Chat%7B%5Cbeta_5%7D+SH+&amp;bg=ffffff&amp;fg=666666&amp;s=0' alt='&#92;hat{Runs} = &#92;hat{&#92;beta_0} + &#92;hat{&#92;beta_1} OBP + &#92;hat{&#92;beta_2} SLG + &#92;hat{&#92;beta_3} SB + &#92;hat{&#92;beta_4} SF + &#92;hat{&#92;beta_5} SH ' title='&#92;hat{Runs} = &#92;hat{&#92;beta_0} + &#92;hat{&#92;beta_1} OBP + &#92;hat{&#92;beta_2} SLG + &#92;hat{&#92;beta_3} SB + &#92;hat{&#92;beta_4} SF + &#92;hat{&#92;beta_5} SH ' class='latex' /></p>
<p>This is the simplest model we can start with &#8211; each factor contributes a fixed number of runs per unit, regardless of the levels of the other factors. If we need to (and we probably will), we can add terms to capture concavity of the marginal effect of different stats, or (more likely) an interaction term for SLG and, say, SB, so that a stolen base is worth more on a team where you&#8217;re more likely to be brought home by a batter because he&#8217;s more likely to give you extra bases. As it is, however, we can test this model with linear regression. The details of it are behind the cut.<span id="more-203"></span></p>
<p>I&#8217;m using a dataset (available on request) of American League data pulled from Baseball-Reference.com&#8217;s <a href="http://www.baseball-reference.com/leagues/">Leagues page</a>.  I&#8217;m using the AL only because I don&#8217;t want to correct for the designated hitter&#8217;s differential runs.</p>
<p>The first thing I need to do is decide whether to add a trend correction.</p>
<p style="text-align:center;"><a href="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg"><img class="alignnone size-medium wp-image-218" title="Alruntrend" src="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg?w=300&#038;h=181" alt="Trend of league run total, 2000-2009" width="300" height="181" /></a></p>
<p>The league run total shows no obvious trend over 2000-2009, so I don&#8217;t have to account for a time trend and I&#8217;m just going to use the team-level data. Using linear regression, I fitted the model above and got the following output:</p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr style="text-align:center;">
<td width="64" height="20"></td>
<td width="64">Value</td>
<td width="64">Std Err</td>
<td width="64">t-value</td>
<td width="64">p-value</td>
<td width="64">Signif (1 &#8722; p)</td>
</tr>
<tr>
<td height="20">Intercept</td>
<td>-904.638</td>
<td>51.68286</td>
<td>-17.504</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">OBP</td>
<td>2893.123</td>
<td>233.7059</td>
<td>12.379</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">SLG</td>
<td>1601.076</td>
<td>122.3527</td>
<td>13.086</td>
<td>0.00000</td>
<td>1.00000</td>
</tr>
<tr>
<td height="20">SB</td>
<td>-0.01907</td>
<td>0.06415</td>
<td>-0.297</td>
<td>0.76680</td>
<td>0.23320</td>
</tr>
<tr>
<td height="20">SF</td>
<td>0.65975</td>
<td>0.25356</td>
<td>2.602</td>
<td>0.01030</td>
<td>0.98970</td>
</tr>
<tr>
<td height="20">SH</td>
<td>0.28282</td>
<td>0.17445</td>
<td>1.621</td>
<td>0.10730</td>
<td>0.89270</td>
</tr>
</tbody>
</table>
<p>Multiple R-squared: 0.9164,     Adjusted R-squared: 0.9132</p>
<p>It looks like OBP and SLG are in fact highly significant. Each sac fly corresponds to about two-thirds of a run scored (significant at the 5% level), and a sac bunt corresponds to about .28 runs scored, although with a p-value of about .11 that estimate falls just short of significance at the 10% level. A stolen base actually has a slightly negative estimated effect, but its p-value is about .77, so we can&#8217;t say it&#8217;s actually different from zero. This model explains about 91% of the variation in run scoring, which is respectable since it ignores pitching and defense entirely.</p>
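<p>To get a feel for what those coefficients mean, it helps to plug a team into the fitted equation. The sketch below is in Python and only uses the estimates from the table above. The function name <code>predict_runs</code> and the example team-season are invented for illustration and don&#8217;t come from the dataset.</p>

```python
# Fitted coefficients copied from the regression table in the text.
COEFFICIENTS = {
    "Intercept": -904.638,
    "OBP": 2893.123,
    "SLG": 1601.076,
    "SB": -0.01907,
    "SF": 0.65975,
    "SH": 0.28282,
}


def predict_runs(obp, slg, sb, sf, sh, coef=COEFFICIENTS):
    """Plug one team-season into the fitted linear model.

    predict_runs is a hypothetical helper name used only for this illustration.
    """
    return (
        coef["Intercept"]
        + coef["OBP"] * obp
        + coef["SLG"] * slg
        + coef["SB"] * sb
        + coef["SF"] * sf
        + coef["SH"] * sh
    )


# Hypothetical team-season, invented purely for illustration.
example = dict(obp=0.340, slg=0.430, sb=100, sf=45, sh=35)
print(round(predict_runs(**example), 1))

# Marginal effect of ten points of OBP, holding everything else fixed.
bumped = dict(example, obp=example["obp"] + 0.010)
print(round(predict_runs(**bumped) - predict_runs(**example), 1))
```

<p>Because the model is linear, ten points of OBP is worth the same number of runs (about 29, from the OBP coefficient) no matter what the team&#8217;s other stats look like. That is exactly the restriction an interaction term would relax.</p>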
<p>This could be tightened up a bit, but as it stands it gives us a reasonable idea of how runs are produced.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/19/modeling-run-production/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://tomflesher.files.wordpress.com/2010/06/alruntrend1.jpg?w=300" medium="image">
			<media:title type="html">Alruntrend</media:title>
		</media:content>
	</item>
		<item>
		<title>June 15 Wins Above Expectation</title>
		<link>http://tomflesher.com/2010/06/16/june-15-wins-above-expectation/</link>
		<comments>http://tomflesher.com/2010/06/16/june-15-wins-above-expectation/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 22:17:17 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
				<category><![CDATA[Baseball]]></category>
		<category><![CDATA[Angels]]></category>
		<category><![CDATA[Rays]]></category>
		<category><![CDATA[Tigers]]></category>
		<category><![CDATA[wins above expectation]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=195</guid>
		<description><![CDATA[Wins Above Expectation is a statistic computed from team wins and the Pythagorean expectation, which is in turn computed from the runs scored by and against each team. The Pythagorean expectation is the ratio of runs scored squared to the sum of runs scored squared and runs against squared. It&#8217;s interpreted as an expected winning percentage. Wins Above Expectation [...]]]></description>
			<content:encoded><![CDATA[<p>Wins Above Expectation is a statistic computed from team wins and the Pythagorean expectation, which is in turn computed from the runs scored by and against each team. The Pythagorean expectation is the ratio of runs scored squared to the sum of runs scored squared and runs against squared. It&#8217;s interpreted as an expected winning percentage.</p>
<p>Wins Above Expectation (WAE) is then the difference between Wins and Expected Wins, where Expected Wins is simply the Pythagorean expectation multiplied by games played. It&#8217;s a useful measure because it can be interpreted as wins that are due to efficiency (in economic terms) or, more simply, play that&#8217;s some combination of smart, clutch, and non-wasteful. It rewards winning close games and penalizes teams that win lots of laughers but lose close games: a blowout piles up runs that push the expected win total higher, even though all those runs were spent winning only one game.</p>
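<p>The two definitions above can be sketched in a few lines of Python. The function names are hypothetical. Detroit&#8217;s runs and wins come from the table behind the cut, but games played is not listed there, so the 63 games used here is an assumption, chosen because it reproduces Detroit&#8217;s expected wins.</p>

```python
def pythagorean_expectation(runs_scored, runs_against):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_against ** 2
    return rs_sq / (rs_sq + ra_sq)


def wins_above_expectation(wins, games, runs_scored, runs_against):
    """Actual wins minus Pythagorean expected wins."""
    expected_wins = pythagorean_expectation(runs_scored, runs_against) * games
    return wins - expected_wins


# Detroit through June 15: 278 RS, 277 RA, 34 W (from the table).
# ASSUMPTION: 63 games played; the table does not list games.
wae = wins_above_expectation(wins=34, games=63, runs_scored=278, runs_against=277)
print(round(wae, 2))
```

<p>Detroit scored almost exactly as many runs as it allowed, so its expected winning percentage is barely above .500, and nearly all of its 34 wins beyond a break-even record show up as WAE.</p>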
<p>Using <a href="http://www.baseball-reference.com/">Baseball-Reference.com</a>, I crunched the numbers for AL teams through June 15. As usual, the Los Angeles Angels of Anaheim lead the league in WAE with 3.68, with Detroit&#8217;s 2.39 a close second, but the Tampa Bay Rays are a surprising last with -1.96 WAE. Obviously, it&#8217;s too early in the season to conclude anything from this, but the complete data is behind the cut.<span id="more-195"></span></p>
<table border="0" cellspacing="0" cellpadding="0" width="384">
<col span="6" width="64"></col>
<tbody>
<tr>
<td width="64" height="20">Team</td>
<td width="64">RS</td>
<td width="64">RA</td>
<td width="64">W</td>
<td width="64">Expected W</td>
<td width="64">WAE</td>
</tr>
<tr>
<td height="20">BAL</td>
<td>211</td>
<td>342</td>
<td>18</td>
<td>17.92</td>
<td>0.08</td>
</tr>
<tr>
<td height="20">BOS</td>
<td>359</td>
<td>308</td>
<td>38</td>
<td>38.02</td>
<td>-0.02</td>
</tr>
<tr>
<td height="20">CHW</td>
<td>273</td>
<td>300</td>
<td>29</td>
<td>28.54</td>
<td>0.46</td>
</tr>
<tr>
<td height="20">CLE</td>
<td>268</td>
<td>318</td>
<td>25</td>
<td>26.16</td>
<td>-1.16</td>
</tr>
<tr>
<td height="20">DET</td>
<td>278</td>
<td>277</td>
<td>34</td>
<td>31.61</td>
<td>2.39</td>
</tr>
<tr>
<td height="20">KCR</td>
<td>300</td>
<td>334</td>
<td>28</td>
<td>29.02</td>
<td>-1.02</td>
</tr>
<tr>
<td height="20">LAA</td>
<td>316</td>
<td>332</td>
<td>36</td>
<td>32.32</td>
<td>3.68</td>
</tr>
<tr>
<td height="20">MIN</td>
<td>304</td>
<td>248</td>
<td>37</td>
<td>38.43</td>
<td>-1.43</td>
</tr>
<tr>
<td height="20">NYY</td>
<td>363</td>
<td>255</td>
<td>41</td>
<td>42.85</td>
<td>-1.85</td>
</tr>
<tr>
<td height="20">OAK</td>
<td>270</td>
<td>283</td>
<td>33</td>
<td>31.45</td>
<td>1.55</td>
</tr>
<tr>
<td height="20">SEA</td>
<td>226</td>
<td>300</td>
<td>24</td>
<td>23.53</td>
<td>0.47</td>
</tr>
<tr>
<td height="20">TBR</td>
<td>343</td>
<td>240</td>
<td>41</td>
<td>42.96</td>
<td>-1.96</td>
</tr>
<tr>
<td height="20">TEX</td>
<td>324</td>
<td>284</td>
<td>36</td>
<td>36.19</td>
<td>-0.19</td>
</tr>
<tr>
<td height="20">TOR</td>
<td>313</td>
<td>294</td>
<td>35</td>
<td>35.06</td>
<td>-0.06</td>
</tr>
</tbody>
</table>
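<p>As a quick consistency check on the table above, this Python sketch recomputes WAE as wins minus expected wins for every team and ranks the results. The variable names are hypothetical; the numbers are copied from the table.</p>

```python
# (team, W, expected wins, WAE) copied from the table above.
ROWS = [
    ("BAL", 18, 17.92, 0.08),
    ("BOS", 38, 38.02, -0.02),
    ("CHW", 29, 28.54, 0.46),
    ("CLE", 25, 26.16, -1.16),
    ("DET", 34, 31.61, 2.39),
    ("KCR", 28, 29.02, -1.02),
    ("LAA", 36, 32.32, 3.68),
    ("MIN", 37, 38.43, -1.43),
    ("NYY", 41, 42.85, -1.85),
    ("OAK", 33, 31.45, 1.55),
    ("SEA", 24, 23.53, 0.47),
    ("TBR", 41, 42.96, -1.96),
    ("TEX", 36, 36.19, -0.19),
    ("TOR", 35, 35.06, -0.06),
]

# Recompute WAE for each team as wins minus expected wins.
recomputed = {team: round(w - expected, 2) for team, w, expected, _ in ROWS}

# Rank teams from highest to lowest WAE.
ranked = sorted(recomputed, key=recomputed.get, reverse=True)
print(ranked[0], ranked[1], ranked[-1])
```

<p>The recomputed values agree with the WAE column, with the Angels first, Detroit second, and the Rays last.</p>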
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/06/16/june-15-wins-above-expectation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4cc81c8ef60cdc1c146147aed58a6174?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>
	</item>
	</channel>
</rss>
