Josh Gibson, Negro National League, Cuban Winter League discussed on Effectively Wild: A FanGraphs Baseball Podcast
Seamheads.com is amazing. Without it, none of this is really possible. The reason it's not possible is that until their work went up, we didn't have much information about league totals. It's very difficult to find anything that tells you, for example, what a league's totals were in 1926; you just can't find that stuff. But you need that stuff in order to compare players against their own league so that you can then translate them into a major league. So once those league totals began appearing at Seamheads, we had the ingredients where we could really start cooking. So that's huge. It's huge. And there are holes in the data. There aren't stolen bases for every season. There isn't hit by pitch for every season. Not every box score has been uncovered, and obviously that's an ongoing task that the guys at Seamheads are still working on. In terms of missing data, there are individual players for whom we just don't have anything, because they went off to some semipro league. There's a guy named Heavy Johnson who was a tremendous hitter, and unfortunately he played only about 6 or 7 years in the Negro National League in the '20s, starting around age 26 or 27. The preceding 6 years he was with the 25th Infantry Wreckers, an Army team that basically just played baseball, so we don't have any stats for that. And we don't have any stats for 1929 and 1930, because he went off to the Northwest to play semipro ball. So there are people like that where we're just never going to get a lot of the information. But the information that could come through, which I'm pretty sure the Seamheads guys are working on, is a full accounting of the Cuban Winter League. Right now there are about 20 seasons in their database, and getting a fuller accounting of that will really help, because it's just more plate appearances.
It's a bigger sample, and the bigger the samples get, the more confidence we can have. The same is true for the Puerto Rican Winter League. There's currently no usable information available on that, and if it comes online, it will really beef up the sample for a lot of latter-day players and will give us insight into players like Perucho Cepeda, "The Bull," Orlando Cepeda's father, for whom we currently have zero stats. This guy's a legend, and I'd love to know more about him. So there are things like that that are likely on the way someday; Gary and Kevin have mentioned to me that that's a goal of theirs, and really anything we can do to increase what we know and what's documented is going to be helpful.

If you go to Seamheads or to Baseball Reference, you see our best currently available accounting of the official league games those players played. Of course, they played in many exhibition and barnstorming contests too. And the official league schedules were considerably shorter than what people are used to with AL and NL schedules, whether today's or those of that time. Of course, you want to preserve that difference, because you want to remember why these players were playing in a different league with shorter schedules; you don't want to pretend this is some alternate happy history where there was no color barrier. I think it's wonderful to have these stats available, and I hope it leads to more people discovering these players, their names, and their accomplishments. But one of the dangers of presenting the stats as they are, without anything else, is that it could inadvertently lead to some of these players being underrated, because people might look at their counting stats or their WAR and see lower totals than they expect, or than the greats of the AL and NL of that time.
So one thing I think your work allows us to do, as Adam does with the Hall of Stats, is to present these players on a somewhat comparable playing field in terms of totals and playing time. The numbers may still be conservative, as Adam explains here, but they at least look a little more like you would expect these stats, and these career WARs, to look. So do you want to run through an example or two? Take a Josh Gibson or a Satchel Paige, or any player you want, and explain what their career stats look like once you run them through this MLE process and how that could change our understanding of how these players stacked up.

Yeah, so take Josh Gibson, for example. For Josh Gibson, we have a little over 3,500 plate appearances at Seamheads, with 230-some home runs, a tremendous batting average, and all of that. When we run him through the process, what we get is a guy with about 9,000 plate appearances and 83 WAR, and that's a damn good player. We're talking about 500, almost 600, career batting runs. And man, that's not chopped liver, you know? That's high-end stuff. In terms of traditional stats, that translates to a .917 OPS, a 160 OPS+, and 435 home runs. He's a monster. He's a great hitter. And it's true that it's slightly conservative by nature, because we have to use a lot of measures of central tendency, because we don't have all the data. In some cases, in a season when somebody has fewer than 200 plate appearances, I try to beef the sample up by using surrounding seasons or their career averages, so that we're not giving a hundred home runs to a guy who has ten plate appearances and hits two or three home runs. So it is a little conservative. So could Josh Gibson have hit 500 home runs in those 9,000 plate appearances?
Absolutely. Absolutely. The fact that I've got him down for 435 simply means that that's what my math is saying right now. But as more data comes through, we could see a whole different picture. We could see totals shooting up. We just don't know, and we're not going to know until the data does come through. I think what you said is really important: seeing numbers that look familiar, putting this in a familiar context, brings Josh Gibson to life in a different way. When I look and say, geez, the only people who had hit 500 home runs by the time Gibson retired were Babe Ruth, Mel Ott, and Jimmie Foxx, and he's at 435, that's one heck of a slugger. It puts a frame of reference around Josh Gibson. We say things like, well, he hit 800 home runs against all competition. Well, Babe Ruth in all likelihood hit a thousand home runs against all competition; I don't know the exact count. So how does that all fit together? It's hard to say. But when we have a solid, internally consistent figure, at least we can say, hey, we estimate that he's around 450 home runs. Dang good player. That connects me from the legend of Josh Gibson to what kind of player he really was. And I happen to have Gibson open because we were talking about him; give me a second to pull up Satchel.
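The MLE process discussed above rests on two steps the conversation mentions: comparing a player against his own league's totals and translating that into a major league context, and padding seasons with fewer than 200 plate appearances using career averages so that a 10-PA season doesn't project to a hundred home runs. Here is a minimal Python sketch of both, assuming a simple multiplicative translation and a "phantom plate appearance" padding scheme. The function names, the padding scheme, and the sample rates are hypothetical; this is not the actual method behind the Gibson numbers quoted above.

```python
"""Hypothetical sketch of two MLE-style steps (not the method used for the quoted figures)."""


def league_relative_translation(player_rate: float, own_league_rate: float,
                                mlb_league_rate: float) -> float:
    """Translate a per-PA rate into an MLB context.

    Assumption: the player's rate relative to his own league
    (player / league) carries over unchanged to the MLB league environment.
    """
    if own_league_rate <= 0:
        raise ValueError("own_league_rate must be positive")
    return player_rate / own_league_rate * mlb_league_rate


def stabilized_rate(season_events: float, season_pa: float,
                    career_events: float, career_pa: float,
                    threshold: float = 200) -> float:
    """Per-PA rate for one season, padded toward the career rate when the sample is small.

    Seasons with at least `threshold` PA are used as-is. Smaller seasons are
    filled out to `threshold` PA with hypothetical "phantom" PA at the
    player's career rate, one simple way to damp tiny samples.
    """
    if career_pa <= 0:
        raise ValueError("career_pa must be positive")
    if season_pa >= threshold:
        return season_events / season_pa
    filler_pa = threshold - season_pa
    career_rate = career_events / career_pa
    return (season_events + career_rate * filler_pa) / threshold


if __name__ == "__main__":
    # Career baseline: roughly 230 HR in roughly 3,500 PA (approximate figures cited above).
    raw = 3 / 10  # a hypothetical 10-PA season with 3 HR
    padded = stabilized_rate(3, 10, 230, 3500)
    print(f"raw HR/PA: {raw:.3f}, padded HR/PA: {padded:.3f}")
    print(f"over 600 PA: raw {raw * 600:.0f} HR, padded {padded * 600:.0f} HR")
    # Hypothetical rates: 0.06 HR/PA in a league averaging 0.02, translated to an MLB league at 0.015.
    print(f"translated HR/PA: {league_relative_translation(0.06, 0.02, 0.015):.3f}")
```

With the roughly 230 home runs in 3,500 plate appearances as the career baseline, the hypothetical 10-PA, 3-HR season pads down to about 0.077 HR/PA, or around 46 home runs over 600 PA instead of 180 from the raw rate. Using surrounding seasons instead of career averages, the other option mentioned, would simply swap in a different baseline rate.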