Home Run Derby: Does it ruin swings? December 15, 2010
Posted by tomflesher in Baseball, Economics. Tags: Baseball, baseball-reference.com, Chris Young, Corey Hart, David Ortiz, Hanley Ramirez, home run derby, home runs, Matt Holliday, Miguel Cabrera, Nick Swisher, Vernon Wells
Earlier this year, there was a lot of discussion about the alleged home run derby curse. This post by Andy on Baseball-Reference.com asked if the Home Run Derby is bad for baseball, and this Hardball Times piece agrees with him that it is not. The standard explanation involves selection bias – sure, players tend to hit fewer home runs in the second half after they hit in the Derby, but that’s because the people who hit in the Derby get invited to do so because they had an abnormally high number of home runs in the first half.
Though this deserves a much more thorough macro-level treatment, let’s just take a look at the density of home runs in either half of the season for each player who participated in the Home Run Derby. Those players include David Ortiz, Hanley Ramirez, Chris Young, Nick Swisher, Corey Hart, Miguel Cabrera, Matt Holliday, and Vernon Wells.
For each player, plus Robinson Cano (who was of interest to Andy in the Baseball-Reference.com post), I took the percentage of games before the Derby and compared it with the percentage of home runs before the Derby. If the Ruined Swing theory holds, then we’d expect the share of home runs hit before the Derby to exceed the share of games played before it, so that g(HR) - g(Games) > 0.
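The table below shows that in eight of the nine cases, including Cano (who did not participate), the density of home runs in the pre-Derby games was higher than in the post-Derby games; Swisher is the lone exception. Here g(Games) is the share of team games played before the Derby, g(HR) is the share of the player’s home runs hit before it, and Diff is g(HR) minus g(Games).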
| Player | HR Before | HR Total | g(Games) | g(HR) | Diff |
| Ortiz | 18 | 32 | 0.54321 | 0.5625 | 0.01929 |
| Hanley | 13 | 21 | 0.54321 | 0.619048 | 0.075838 |
| Swisher | 15 | 29 | 0.537037 | 0.517241 | -0.0198 |
| Wells | 19 | 31 | 0.549383 | 0.612903 | 0.063521 |
| Holliday | 16 | 28 | 0.54321 | 0.571429 | 0.028219 |
| Hart | 21 | 31 | 0.549383 | 0.677419 | 0.128037 |
| Cabrera | 22 | 38 | 0.530864 | 0.578947 | 0.048083 |
| Young | 15 | 27 | 0.549383 | 0.555556 | 0.006173 |
| Cano | 16 | 29 | 0.537037 | 0.551724 | 0.014687 |
Is this evidence that the Derby causes home run percentages to drop off? Certainly not. There are some caveats:
- This should be normalized based on games the player played, instead of team games.
- It would probably even be better to look at a home run per plate appearance rate instead.
- It could stand to be corrected for deviation from the mean to explain selection bias.
- Cano’s numbers are almost identical to Swisher’s (16 of 29 home runs before the Derby versus 15 of 29). They play for the same team. If there were an effect to be seen, it would probably show up here, and it doesn’t.
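The table's last two columns can be recomputed from the first three. The sketch below copies the rows as printed; the function name `derby_diffs` is just an illustrative label, and the share of games is taken straight from the g(Games) column because the underlying game counts are not listed.

```python
# Rows copied from the table above:
# (player, HR before the Derby, HR total, share of team games before the Derby)
ROWS = [
    ("Ortiz", 18, 32, 0.54321),
    ("Hanley", 13, 21, 0.54321),
    ("Swisher", 15, 29, 0.537037),
    ("Wells", 19, 31, 0.549383),
    ("Holliday", 16, 28, 0.54321),
    ("Hart", 21, 31, 0.549383),
    ("Cabrera", 22, 38, 0.530864),
    ("Young", 15, 27, 0.549383),
    ("Cano", 16, 29, 0.537037),
]


def derby_diffs(rows):
    """Return {player: g(HR) - g(Games)} for each row."""
    return {
        name: hr_before / hr_total - g_games
        for name, hr_before, hr_total, g_games in rows
    }


diffs = derby_diffs(ROWS)
for name, diff in diffs.items():
    # A positive value means home runs were front-loaded relative to games.
    print(f"{name:<9}{diff:+.4f}")
```

A positive difference is what both the Ruined Swing theory and plain selection bias would predict, so this arithmetic alone cannot separate the two.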
Once finals are up, I’ll dig into this a little more deeply.
In Memoriam November 11, 2010
Posted by tomflesher in Baseball.
In Flanders Fields the poppies blow
Between the crosses row on row,
That mark our place; and in the sky
The larks, still bravely singing, fly
Scarce heard amid the guns below.
We are the Dead. Short days ago
We lived, felt dawn, saw sunset glow,
Loved and were loved, and now we lie
In Flanders fields.
Take up our quarrel with the foe:
To you from failing hands we throw
The torch; be yours to hold it high.
If ye break faith with us who die
We shall not sleep, though poppies grow
In Flanders fields.
Fire Up The Hot Stove November 2, 2010
Posted by tomflesher in Baseball. Tags: Aubrey Huff, Buster Posey, Cliff Lee, Giants, Rangers, Tim Lincecum, Yankees
Although I’m usually fairly heavy on the statistical content, I can’t help but mention a few impressions from Game 5 of the World Series last night.
- If I didn’t have Baseball-Reference.com to tell me different, I’d have assumed Aubrey Huff wasn’t an everyday first baseman from the way he played last night. He was competent and made some nice picks, but he didn’t seem to have the ankle-preservation instinct that most everyday 1Bs do. He seemed to have his heels back quite far on the bag most of the time.
- The rumors about the Yankees pursuing Cliff Lee strike me as cartoonish supervillainy. “If I cannot defeat you, I will simply BUY you!”
- Game 3 was the Lee vs. Tim Lincecum gem that we all assumed Game 1 would be.
- Somewhere, Bengie Molina is secretly pouring champagne all over himself.
- If the postseason came before voting, Buster Posey would be a lock for Rookie of the Year.
Quickie: Ryan Howard's Choke Index October 25, 2010
Posted by tomflesher in Baseball. Tags: baseball-reference.com, binomial distribution, Choke Index, Phillies, Ryan Howard, statistics
The Choke Index is alive and well.
Previous to 2010, Ryan Howard of the Philadelphia Phillies hit home runs in three consecutive postseasons. He managed 7 in his 140 plate appearances, averaging out to .05 home runs per plate appearance. Not too shabby. It’s a bit below his regular season rate of about .067, but there are a bunch of things that could account for that.
This year, Ryan made 38 plate appearances and hit a grand total of 0 home runs in the postseason. What’s the likelihood of that happening? I use the Choke Index (one minus the probability of hitting 0 home runs in a given number of plate appearances) to measure that. As always, the closer a player gets to 1, the more unlikely his homer-free streak is.
The binomial probability can be calculated using the formula

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

Or, since we’re looking for the probability of an event NOT occurring,

$$P(X = 0) = (1 - p)^n$$

or

$$(1 - .05)^{38} \approx .142$$

using his career postseason numbers. That means that Ryan Howard’s 2010 postseason Choke Index is 1 - .142 = .858. Pretty impressive!
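For anyone who wants to check the arithmetic, here is a minimal sketch of the Choke Index as defined above. The function name `choke_index` is my own label; the inputs are the 7 home runs in 140 plate appearances and the 38 homerless plate appearances quoted in this post.

```python
def choke_index(hr_rate, plate_appearances):
    """One minus the binomial probability of zero home runs in that many PAs."""
    p_no_homers = (1 - hr_rate) ** plate_appearances
    return 1 - p_no_homers


# Career postseason rate before 2010: 7 home runs in 140 plate appearances.
career_rate = 7 / 140
howard_2010 = choke_index(career_rate, 38)
print(round(howard_2010, 3))
```

Swapping in the regular-season rate of about .067 would push the index higher, since a better home run hitter is less likely to go 38 plate appearances without one.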
Burnett, Hughes, and Playoff Rotations October 12, 2010
Posted by tomflesher in Baseball. Tags: A.J. Burnett, ALCS, ALDS, Andy Pettitte, Baseball, CC Sabathia, Dustin Moseley, Javier Vazquez, Joe Girardi, Phil Hughes, playoffs, rotations, world series
There was much discussion of the Yankees’ specialized playoff rotation for the American League Division Series. As is conventional in the ALDS, Joe Girardi went with a three-man rotation. CC Sabathia and Andy Pettitte were locks; the third starter could have been A.J. Burnett, Javier Vazquez, or Dustin Moseley. Girardi went with young All-Star Phil Hughes in the third slot. That, of course, led to a sweep of the Minnesota Twins to advance to the American League Championship Series.
First of all, I think it was probably the right decision. Hughes pitched 176 1/3 innings and gave up 82 earned runs, for an ER/IP of about .47. In Burnett’s 186 2/3 innings, he allowed 109 earned runs for an ER/IP of about .58. Surprisingly, Burnett allowed 9 unearned runs for a rate of about .048 unearned runs per inning pitched, whereas Hughes had only one unearned run for a rate of about .006, but of course those numbers probably don’t say anything significant. With 730 batters faced, Hughes allowed about .11 earned runs per batter, or about 1 earned run every 9 batters faced, while Burnett’s 829 batters faced work out to a similar .13 earned runs per batter, or about 1 earned run every 7.6 batters.
Most importantly to me, Hughes was much more predictable. Burnett faced, on average, 4.68 batters per inning pitched, with a variance of .92. Hughes faced over half a batter less per inning – 4.13 – and had a variance of .33. That means that not only did Burnett allow more baserunners, but when he was off, he was very off. Although the decision gets tougher when you have a higher BF/IP and a lower variance, Hughes was both better and more consistent in a similar number of innings, so he has to get the nod.
(That said, it’s shocking that such similar numbers produced one 18-8 pitcher and one 10-15 pitcher.)
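The rate comparisons can be recomputed from the season totals quoted above. This is only a sketch of the arithmetic: the dictionary layout and the `rates` helper are illustrative names, and the batters-per-earned-run figure is computed from the unrounded totals rather than from the rounded per-batter rate.

```python
from fractions import Fraction

# Season totals quoted in the post: (innings pitched, earned runs, batters faced)
PITCHERS = {
    "Hughes": (176 + Fraction(1, 3), 82, 730),
    "Burnett": (186 + Fraction(2, 3), 109, 829),
}


def rates(innings, earned_runs, batters_faced):
    """Return earned runs per inning, per batter, and batters per earned run."""
    return {
        "er_per_ip": float(earned_runs / innings),
        "er_per_bf": earned_runs / batters_faced,
        "bf_per_er": batters_faced / earned_runs,
    }


for name, totals in PITCHERS.items():
    r = rates(*totals)
    print(f"{name}: ER/IP {r['er_per_ip']:.2f}, "
          f"ER/BF {r['er_per_bf']:.2f}, BF/ER {r['bf_per_er']:.2f}")
```

The per-inning variance figures are not reproduced here because they depend on game-by-game data that the post does not list.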
The only question now is what order to pitch the announced four-man rotation for the ALCS. Of the choices,
OPTION 3
Sabathia
Hughes
Pettitte
Burnett
Sabathia
Hughes
Pettitte
seems clearly superior to me. It allows Burnett to start but avoids starting him twice, gets Hughes in play quite often, and puts the very reliable Andy Pettitte in play for a potential Game Seven. The linked article lists as a con that Pettitte is considered the number 2 starter, but at the Major League level a manager can’t be concerned with such frivolities. Besides, Pettitte is an established company man. I’d be surprised if he balked at a rotation that both maximized the team’s chances to win and put him in position to be the clutch hero.
Incidentally, this option lends itself to using the same rotation in the World Series. Option 2:
Sabathia
Pettitte
Hughes
Sabathia
Burnett
Pettitte
Sabathia
leaves Sabathia unavailable to start Game 1 of the World Series and might put Pettitte on short rest depending on the schedule to start Game 1. I can’t see starting the Series with Hughes or Burnett.
Jim Thome's Marginal Value October 5, 2010
Posted by tomflesher in Baseball, Economics. Tags: Jim Thome, Manny Ramirez, White Sox
I’ve alluded to the similarity between Manny Ramirez and Jim Thome quite a bit. They both played in Cleveland for a few years before moving on to other teams. They’re each in the DH phase of their careers. Thome is about two years older than Ramirez, but otherwise they’ve had relatively similar production. That’s why it was so odd for the White Sox to let Thome go a few years back only to pick up an injured, probably going-downhill Manny for about a quarter of the season, when Ramirez is making about $18 million and Thome’s maximum salary was about $15.7 million. There’s an argument that Manny still has more productive years left than Thome, of course. (I happen to think that argument is wrong, but that’s just me.)
Just for fun, let’s take a look at their production since Manny’s trade.
In the last 24 games he played, Ramirez had 88 plate appearances, a respectable .420 OBP, and a Jeteresque .261 batting average. His win probability added was -.273, for those of you who are into that sort of thing. Meanwhile, over the same number of games, the flagging, decrepit Thome had only 79 plate appearances, with a paltry .333 batting average, and only a .494 OBP.
Thome’s salary this year for the Twins was $1.5 million.
I think the winner here is clear.
Mariano's Walk-Off Beanball September 12, 2010
Posted by tomflesher in Baseball. Tags: As, David Robertson, Derek Jeter, hit batsman, hit by pitch, Jeff Francoeur, Jose Molina, Lenny DiNardo, Mariano Rivera, Nelson Cruz, odds, probability, Rangers, Yankees
Mariano Rivera did something strange tonight: He plunked in the winning run. He hit Jeff Francoeur of the Texas Rangers to force in Nelson Cruz for the winning run in extra innings. It was his fourth hit batsman of the year and only his third loss.
A walk-off beaning requires an extraordinary set of circumstances. First of all, like all walk-off plays, it requires the home team to be at bat in the bottom of the inning. In this case, it was in extra innings rather than the bottom of the 9th. It additionally requires a tied game in the bottom of said inning. Finally, it requires the bases to be loaded when the plunking occurs.
This is all magnified by the fact that Rivera does not ordinarily load the bases. Assuming his 2010 OBP against (.214) held, the probability of the bases being loaded with two outs or fewer is:
Then, if that situation occurs, we still have to deal with the unlikely event of Mariano hitting a player with a pitch. Before this evening, Mo had hit three batters in 196 plate appearances, for a rate of about .0153. Thus, the probability of Mariano Rivera hitting a batter with a pitch after having loaded the bases is
That means that in 10,000 innings, we would expect that to occur about 4 times, assuming that Mariano wasn’t removed after having walked the bases (which would obviously introduce some bias).
Oddly, the last walk-off hit by pitch also involved the Yankees, albeit on the other side, way back on July 19 of 2008. That night, the A’s’ Lenny DiNardo hit Jose Molina with a pitch to force in Derek Jeter, again in extra innings. David Robertson grabbed the win that night.
Teixeira and Cano: Picking up slack? August 5, 2010
Posted by tomflesher in Baseball, Economics. Tags: A-Rod, Alex Rodriguez, binomial distribution, Mark Teixeira, probability, Robinson Cano, statistics, Yankees
Michael Kay, the YES broadcaster for the Yankees, often pointed out between July 22 and August 4 that the Yankees were turning up their offense to make up for Alex Rodriguez’s lack of home run production. That seems like it might be subject to significant confirmation bias – seeing a few guys hit home runs when you wouldn’t expect them to might lead you to believe that the team in general has increased its production. So, did the Yankees produce more home runs during A-Rod’s drought?
During the first 93 games of the season, the Yankees hit 109 home runs in 3660 plate appearances for rates of 1.17 home runs per game and .0298 home runs per plate appearance. From July 23 to August 3, they hit 17 home runs in 451 plate appearances over 12 games for rates of 1.42 home runs per game and .0377 home runs per plate appearance. Obviously those numbers are quite a bit higher than expected, but can it be due simply to chance?
Assume for the moment that the first 93 games represent the team’s true production capabilities. Then, using the binomial distribution, the likelihood of hitting 17 or fewer home runs in 451 plate appearances is

$$P(X \le 17) = \sum_{k=0}^{17} \binom{451}{k} (.0298)^k (1 - .0298)^{451 - k}$$

The cumulative probability is about .868, meaning the probability of hitting 17 or fewer home runs is .868 and the probability of hitting more than that is about .132. The probability of hitting 16 or fewer is .805, which means out of 100 strings of 451 plate appearances about 81 of them should end with 16 or fewer home runs and about 19 should end with at least 17. This is a perfectly reasonable number and not inherently indicative of a special performance by A-Rod’s teammates.
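The cumulative probabilities can be checked with a few lines of standard-library Python. This is a sketch under the same assumption as the text, that .0298 home runs per plate appearance is the team's true rate; `binomial_cdf` is an illustrative helper name, not a library function.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


team_rate = 0.0298        # home runs per plate appearance, first 93 games
drought_pa = 451          # plate appearances from July 23 to August 3

p_17_or_fewer = binomial_cdf(17, drought_pa, team_rate)
p_16_or_fewer = binomial_cdf(16, drought_pa, team_rate)
p_at_least_17 = 1 - p_16_or_fewer

print(f"P(X <= 17) = {p_17_or_fewer:.3f}")
print(f"P(X <= 16) = {p_16_or_fewer:.3f}")
print(f"P(X >= 17) = {p_at_least_17:.3f}")
```

The last line is the figure that speaks most directly to the question, since it is the chance of a stretch at least as homer-heavy as the one observed.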
Kay frequently cited Mark Teixeira and Robinson Cano as upping their games. Teixeira hit 18 home runs over the first 93 games and made 423 plate appearances for rates of .194 home runs per game and .0426 home runs per plate appearance. From July 23 to August 3, he had 5 home runs in 12 games and 54 plate appearances for rates of .417 per game and .0926 per plate appearance. At his earlier rate, hitting at least 5 home runs in 54 plate appearances is about 8% likely, meaning that either Teixeira did up his game considerably or he was exceptionally lucky.
Cano played 92 games up to July 21, hitting 18 home runs in 400 plate appearances for rates of .196 home runs per game and .045 per plate appearance. During A-Rod’s drought, he hit 3 home runs in 50 plate appearances over 12 games for rates of .25 and .06. At his earlier rate, hitting at least 3 home runs in 50 plate appearances is about 39% likely, which means we don’t have enough evidence to reject the idea that Cano’s performance (though better than usual) is just a random fluctuation.
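The two player-level percentages can be approximated the same way, taking "likely" to mean the chance of at least that many home runs at the player's earlier rate. That reading is an assumption on my part, and `prob_at_least` is an illustrative helper name.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), as one minus the lower tail."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below


# Earlier-season rates from the text: home runs / plate appearances.
teixeira = prob_at_least(5, 54, 18 / 423)
cano = prob_at_least(3, 50, 18 / 400)

print(f"Teixeira, at least 5 HR in 54 PA: {teixeira:.3f}")
print(f"Cano, at least 3 HR in 50 PA: {cano:.3f}")
```

With samples of about 50 plate appearances, one home run more or fewer moves these tail probabilities a lot, which is worth keeping in mind before reading much into either number.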
It will be interesting to see if Teixeira slows down as a home-run hitter now that Rodriguez’s drought is over.
Quickie: 600th Home Run for A-Rod August 4, 2010
Posted by tomflesher in Baseball. Tags: 599 home runs, 600 home runs, A-Rod, Alex Rodriguez, Choke Index
Alex Rodriguez finally hit #600 deep to center field in Yankee Stadium on the third anniversary of his 500th home run. A-Rod hit the home run in his first plate appearance. There were 51 plate appearances since #599. He had a final Choke Index of .944, but luckily he won’t run into another milestone home run for at least a few years.
The ball landed in Monument Park, so the Yankees didn’t need to negotiate with a fan to get it back. (A security guard picked it up.) According to Michael Kay, if the ball had landed in the stands, the Yankees would have been willing to pay for the person who caught the ball to have lunch with Alex Rodriguez and Cameron Diaz in exchange for getting the ball back, on top of an autographed baseball, hat, and bat. That opens interesting questions of valuation, much like those that came up after Doug Mientkiewicz attempted to keep the ball that he caught to make the final out in the 2004 World Series.
Is A-Rod's Performance Different? August 3, 2010
Posted by tomflesher in Baseball, Economics. Tags: A-Rod, Alex Rodriguez, Choke Index, OBP, p-value, probability, SLG, statistics, t-value, Yankees
In games between milestone home runs, is Alex Rodriguez’ hitting similar to other times? (This is all a very polite way of asking, “Does A-Rod choke?”) It’s difficult to answer, because there’s so little data about those milestone home runs. A-Rod, though, has some statistically improbable results and it would be interesting to look at them a bit more closely.
Over 2008-2009, Alex played in 262 games and had 1129 plate appearances with 281 hits, 65 home runs, a triple:double ratio of 1:50, an OBP of .397, and a SLG of .553. His OBP has a standard error of .0146, so we can be 95% confident that over those years his baseline production would be somewhere between .368 and .426. Absent any time or age effect, that is the range in which A-Rod should produce for any given period.
Two recent milestone home runs come to mind as examples of Rodriguez’s reputed choking. First, the stretch between home run #499 and #500 was 8 games and 36 plate appearances. (I’m intentionally ignoring extra plate appearances on the days he hit #499 and #500.) During that time, Alex had an OBP of only .306. That’s a difference of .091 over 36 plate appearances and that performance has a standard error of about .078 when compared with his regular performance, implying a t-value of about 1.16. With 35 degrees of freedom, Texas A&M’s t Calculator gives a p-value of about .127, so this difference is marginally within the realm of chance. (The usual cutoff for significance would be .05.)
A-Rod hit his last home run on July 22. Discounting the plate appearances after his last home run, he’s played in 11 games with a paltry .255 OBP and .238 SLG over 47 plate appearances. His .255 OBP has a difference of about .142 and a standard error of about .064. That implies a t-value of about 2.21, with a p-value of about .016. That is, the probability of this difference occurring by chance is less than 2%. That gives us one result as close to significant and one as probably significant.
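The post does not spell out how the standard errors were combined, so the sketch below assumes the usual unpooled formula for a difference between two proportions. Under that assumption the first stretch comes out very close to the figures quoted, while the second lands near 2.2 rather than exactly 2.21. The helper names are illustrative, and the p-values are left to a t table or calculator.

```python
from math import sqrt


def proportion_se(p, n):
    """Standard error of a proportion p observed over n trials."""
    return sqrt(p * (1 - p) / n)


def t_value(base_p, base_n, sample_p, sample_n):
    """Return (t, se) for the gap between a baseline and a sample proportion.

    Assumption: the two standard errors are combined without pooling.
    """
    se = sqrt(proportion_se(base_p, base_n) ** 2
              + proportion_se(sample_p, sample_n) ** 2)
    return (base_p - sample_p) / se, se


# Baseline: .397 OBP over 1129 plate appearances in 2008-2009.
baseline_se = proportion_se(0.397, 1129)
ci_low = 0.397 - 1.96 * baseline_se
ci_high = 0.397 + 1.96 * baseline_se

t_500, se_500 = t_value(0.397, 1129, 0.306, 36)   # between #499 and #500
t_600, se_600 = t_value(0.397, 1129, 0.255, 47)   # since #599

print(f"baseline SE {baseline_se:.4f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
print(f"#500 stretch: SE {se_500:.3f}, t {t_500:.2f}")
print(f"#600 stretch: SE {se_600:.3f}, t {t_600:.2f}")
```

Because OBP over 36 or 47 plate appearances is built from a handful of discrete events, the t approximation is rough here, which is one more reason to treat the p-values as suggestive.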
As a side note, A-Rod’s Choke Index continues to rise. He’s gone 48 plate appearances without a home run, and at a rate of .055 home runs per plate appearance the probability of that occurring by chance is about .066. That leaves his Choke Index at .934.
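Written out with the numbers in the paragraph above, and using the definition of the Choke Index as one minus the probability of going homerless, the calculation is:

$$1 - (1 - .055)^{48} \approx 1 - .066 = .934$$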