The Math

If math gives you a headache, skip ahead to the results.

The process for calculating the likelihood of a 56 game streak in a 139 game season, given Joe DiMaggio's actual batting characteristics that year, is simple to understand, if a bit laborious. It involves nothing more advanced than the math you learned in high school.

Step 1: The likelihood that one game will include a hit.

Before we can calculate the likelihood that a string of games includes a hit in each one, we need to know the likelihood that any one given game includes a hit.

Before getting into the specific DiMaggio statistics, it may be a worthwhile memory or learning exercise to review this process with simple numbers. Suppose that a player gets a hit in exactly one third of his plate appearances, and he gets four plate appearances per game. How likely is he to hit in any one game, any two consecutive games, any three, or any 56?

His chances of failure in any one plate appearance are 2/3, two out of three. Therefore, his chances of failing four times in a row (a hitless game) will be two-thirds to the fourth power, or 16 times out of every 81. As it happens, 81 is exactly half of 162, and 162 is the length of a baseball season, so our hypothetical player is likely to fail to get a hit in 32 games in his hypothetical year.

His likely statistical distribution of hits per game is as follows:

                Fractional   Approximate   Games   Hits   Plate
                likelihood   frequency                    appearances
4 hit games     1/81         1%            2       8      8
3 hit games     8/81         10%           16      48     64
2 hit games     24/81        30%           48      96     192
1 hit games     32/81        39%           64      64     256
Hitless games   16/81        20%           32      0      128
TOTAL           81/81        100%          162     216    648
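
For readers who would rather check the table than take it on faith, here is a minimal sketch in Python. It assumes exactly four plate appearances in every game, the same independent one-in-three chance of a hit in each, and a 162 game season. The function name is made up for this illustration.

```python
from fractions import Fraction
from math import comb

def hits_per_game_distribution(p_hit, plate_appearances):
    """Return {hits: probability} for one game, assuming every plate
    appearance is an independent trial with the same chance of a hit."""
    p_out = 1 - p_hit
    return {
        k: comb(plate_appearances, k) * p_hit**k * p_out**(plate_appearances - k)
        for k in range(plate_appearances + 1)
    }

# The hypothetical player: a hit in one third of four plate appearances.
dist = hits_per_game_distribution(Fraction(1, 3), 4)

season = 162  # games in the hypothetical season
for hits in sorted(dist, reverse=True):
    prob = dist[hits]
    games = prob * season
    print(f"{hits} hit games: {prob} of the time, about {float(games):.0f} games")
```

Python prints fractions in lowest terms, so 24/81 will show up as 8/27. It is the same number.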

This hypothetical player is a great star. Napoleon Lajoie actually had a hitting streak in a season quite similar to this, in 1906. The schedule was shorter then, so he played in only 152 games rather than 162. Lajoie came to the plate 4.16 times per game, and made a hit in 33.9% of his plate appearances. He amassed a 31 game hitting streak that year.

For a normal level of player, one who converts only one of every four plate appearances into a hit, the likelihood of a hitless game in four plate appearances is about 32% (three-fourths to the fourth power). For a player at the lower end of the spectrum, one who converts only one plate appearance out of five, the likelihood increases to about 41% (four-fifths to the fourth power).

For the purposes of a discussion on streaks, there is no distinction between games with four hits or games with one hit. They all count the same toward a streak. Similarly, a game with three homers counts the same as a game with an infield single. Slugging is as unimportant as multiple hit games. Nor is "on base percentage" relevant. In fact, a walk is equal to an out when calculating the likelihood of a hit streak. Therefore, the key statistic is the likelihood to achieve one or more hits in a game. For the player who hits once per three plate appearances, that likelihood is about 80%, which consists of all games except the 20% when he fails to hit. The one-for-four guy will get a hit in about 68% of his games. The one-for-five player will get a hit in only about 59% of his games.

In 1941, Joe DiMaggio came to the plate 4.44 times per game, and achieved a hit in 31.3% of his plate appearances. Since he succeeded in getting a hit in 31.3% of his plate appearances, he failed in 68.7%. His likelihood to go hitless in any given game was therefore .687 raised to the 4.44 power, or 18.9%. Looking at the full side of the glass rather than the empty side, he was likely to get a hit in 81.1% of his games.
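
The same arithmetic as a minimal Python sketch. The function name is invented for this illustration, and using the season average of 4.44 plate appearances as an exponent is the simplification made in the text, not a claim about how any particular game went.

```python
def prob_hit_in_game(hpa, pa_per_game):
    """Chance of at least one hit in a game: one minus the chance of
    failing in every plate appearance. A fractional pa_per_game is
    simply used as a fractional exponent, as in the text."""
    return 1 - (1 - hpa) ** pa_per_game

# DiMaggio's 1941 figures as given above.
dimaggio = prob_hit_in_game(0.313, 4.44)
print(f"hitless game:    {1 - dimaggio:.1%}")
print(f"game with a hit: {dimaggio:.1%}")
```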

Sidebar:

You may be asking yourself, "Why use plate appearances instead of at bats? After all, the math would eventually work out exactly the same, and I am used to thinking of every famous season in baseball history in terms of 'hits divided by at bats', not 'hits divided by plate appearances'. I think of DiMaggio as a .357 hitter that year, not a .313 hitter."

That is a fair question. Baseball has enough familiar statistics that we should not introduce unfamiliar ones unless they are absolutely necessary to our understanding of a problem, so I owe you an explanation. Plate appearances are the necessary denominator for two major reasons.

Reason 1.

A player with a higher "hits per plate appearance" ratio (hereinafter called HPA) is more likely to amass a hitting streak than a player with a lower one, all else being equal.

This same axiom is not true when applied to batting averages. Some players with high batting averages are not at all likely to produce hitting streaks. Ted Williams was a lifetime .344 hitter, but he had only a 27% likelihood to convert a plate appearance into a hit because 21% of those plate appearances ended up as walks. For the purposes of a hit streak, a walk is the same as an out.
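
The Williams figure can be roughly checked with a small Python sketch. It assumes, as a simplification, that walks are the only plate appearances that do not count as at bats (sacrifices and hit-by-pitches are ignored), so the result is an approximation. The function name is hypothetical.

```python
def approx_hpa(batting_average, walk_rate):
    """Approximate hits per plate appearance, assuming every plate
    appearance is either a walk or an at bat (a simplification)."""
    return batting_average * (1 - walk_rate)

# Williams's lifetime figures as quoted above: .344 average, 21% walks.
williams = approx_hpa(0.344, 0.21)
print(f"approximate lifetime HPA: {williams:.3f}")
```

The walks pull a .344 batting average down to roughly the 27% mentioned above.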

Consider these three seasons:

                        Batting average   HPA
Barry Bonds, 2002       .370              .246
Ted Williams, 1947      .343              .262
Benito Santiago, 1987   .300              .292

You think of Williams and Bonds as great hitters, and rightly so. If you think of Santiago at all, you think of him as a mediocre stick man, yet in 1987 he was more likely to turn a plate appearance into a hit than either of these two titans of the game was in the great seasons listed. In fact, I should not say "more likely". I should say "far more likely". With an HPA of .292 rather than .246, Santiago was 18 times more likely to amass a thirty game hitting streak than "2002 Bonds" would have been, batting in his stead.

No player has ever accumulated a 30 game hitting streak in a season with an HPA lower than .260. And the one guy who made the list with a mere .265 was able to do so because he averaged an incredible 4.83 plate appearances per game by batting leadoff for the high scoring Red Sox teams in the late 40s and early 50s. (It was DiMaggio's brother, Dom.)

That tells you why Bonds and Williams are not hit streak types, despite fat batting averages, and why Santiago was a hit streak kind of guy. Santiago actually hit in 34 consecutive games that year.

Reason 2.

Plate appearances can easily be adjusted up and down in a single, simple calculation useful for "what if" simulations.

For example, we know that any player can increase his plate appearances by moving up higher in the line-up. On average, each spot higher in the order is worth .111 additional plate appearances per game, so a #1 hitter will have .888 plate appearances per game more than a #9 hitter.

Additional plate appearances are critical to maintaining a hit streak, as shown in the Dom DiMaggio example.

This allows us to compose many hypothetical scenarios simply. We know that the average line-up position for the Boston Red Sox in 2003 had 4.41 plate appearances per game. This can be derived by dividing the team's total plate appearances by 1458 (162 games times 9 line-up spots). Once we know this, we don't even need a pencil and paper to figure out how many chances each line-up position will have. The #5 spot, in the middle of the order, will be at the average of 4.41. Therefore, the lead-off guy, four spots higher, will get 4.85 chances, and the ninth hitter, four spots lower, will get 3.97.
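
If you do want the pencil and paper, here is a minimal Python sketch of the same adjustment. It assumes the #5 spot sits exactly at the team average and that every spot differs from its neighbor by the .111 figure given above. The names are invented for this illustration.

```python
PA_STEP_PER_SPOT = 0.111  # from the text: plate appearances per game per line-up spot

def pa_by_lineup_spot(team_average_pa, spot):
    """Estimate plate appearances per game for a line-up spot (1 to 9),
    assuming the #5 spot is at the team average."""
    if not 1 <= spot <= 9:
        raise ValueError("spot must be between 1 and 9")
    return team_average_pa + (5 - spot) * PA_STEP_PER_SPOT

# 2003 Red Sox average from the text: 4.41 per line-up position.
for spot in range(1, 10):
    print(f"#{spot}: {pa_by_lineup_spot(4.41, spot):.2f}")
```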

Therefore, it is simple to plug in numbers for "what-ifs". How would Pujols's chance for a 56 game hit streak be affected if he hit exactly as he did last year, but batted lead off for the Red Sox? What if he batted ninth for the Mets? What if he simply moved up a few spots to hit lead-off for the Cardinals? How much more likely is a hit streak if he bats third rather than fourth?

These adjustments are not simple when at bats are the denominator. If I want to see Williams's chances to hit in 30 in a row batting lead-off instead of third, I have to perform several calculations involving his walks before I can figure out how many additional at bats he will obtain. With plate appearances, I just add .222 per game. Done.
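
A "what if" of this kind might be sketched in Python as follows. The scenario is purely hypothetical: it takes a hitter with DiMaggio's 1941 figures (an HPA of .313 and 4.44 plate appearances per game) and moves him up two spots in the order, assuming his HPA stays the same and each spot is worth .111 plate appearances. The function names are made up for the illustration.

```python
PA_STEP_PER_SPOT = 0.111  # from the text: plate appearances per game per line-up spot

def prob_hit_in_game(hpa, pa_per_game):
    """Chance of at least one hit in a game, given hits per plate
    appearance and average plate appearances per game."""
    return 1 - (1 - hpa) ** pa_per_game

def what_if_move_up(hpa, pa_per_game, spots_moved_up):
    """Hypothetical: the same hitter, moved up in the order.
    Assumes his HPA does not change with his line-up spot."""
    new_pa = pa_per_game + spots_moved_up * PA_STEP_PER_SPOT
    return prob_hit_in_game(hpa, new_pa)

before = prob_hit_in_game(0.313, 4.44)
after = what_if_move_up(0.313, 4.44, 2)  # two spots up adds .222 per game
print(f"chance of a hit in a game, as is:        {before:.1%}")
print(f"chance of a hit in a game, two spots up: {after:.1%}")
```

The whole adjustment is one addition, which is the point of using plate appearances as the denominator.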
