| General Discussions Discuss Out of the Park Developments' games, web site, downloads, research and anything else you like. |
#1 (permalink) |
|
Bat Boy
Join Date: Jul 2008
Posts: 12
|
need some help with sabermetrics/modeling
Hey all, I have an interesting dilemma I am trying to work out, and wanted to see if anyone on these boards had any feedback that might be helpful.
I am working on a system to model baseball. This is NOT going to be any type of competition for OOTP. I am working on a very basic computer aid to help with the old Pursue the Pennant card and dice game. Specifically, I am working on a means to generate new PTP cards. Unlike the real PTP, though, I do not need to recreate specific statistical seasons for a player. These are fictional players. To keep it simple, I want to rate hitters as Excellent, Very Good, Average, Fair, or Poor in a few different categories (power: HR; contact: total hits; patience: drawing walks; eye: avoiding Ks). I will then use those very simplified ratings to churn out a random PTP card for the player. The randomization is there to add variation while still ensuring that the cards are consistent with the guy's abilities.
Here's where my dilemma is: I am trying to determine if there is some reasonable (and fairly accurate) way to determine how many doubles and triples a guy should expect to hit based on those basic ratings, or if I just need to add new ratings that measure a guy's ability to generate those types of hits. If I can show statistically that doubles (for example) are directly related to a guy's batting average and home runs, then I can just use a combination of his power and contact ratings to generate doubles.
I have tried to look through historical baseball stats, but so far I am coming up empty. I'm not sure if this is just due to my own limited knowledge of statistics mining, though. I do notice power hitters like Mark McGwire, who hit 29 HRs in 2001 but only 4 doubles. I also notice guys like Lance Berkman, who hit 39 HRs and 55 doubles. Then there are guys like Jeff Cirillo, who usually hit 12-15 HRs but 35-45 doubles. Even speed seems like a dubious indicator, as players like Vince Coleman (an obvious burner) routinely hit very few doubles (lots of triples, though). Any thoughts on this?
Should the ability to hit doubles and triples just be broken off into separate abilities, or is there some way to reasonably indicate those hits based on power/speed/hitting? Markus, I'd love to hear your thoughts on this as well. Thanks.
Last edited by HolyRomanEmperor : 07-24-2008 at 12:02 PM.
|
|
|
|
|
#2 (permalink) |
|
Hall Of Famer
Join Date: Jan 2002
Location: Orlando, FL
Posts: 3,827
|
I would think that line drive hitters are more likely to hit doubles and triples than fly ball or slap hitters. You could assign a line drive ability, and then the player's speed and/or the park dimensions would determine whether the hit is a double or triple. McGwire hit just four doubles because he hit most everything high in the air (fly outs, pop outs, or home runs).
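As a rough illustration of that idea, here is a minimal Python sketch: a line drive rating decides whether a non-home-run hit goes for extra bases, and then speed (optionally scaled by a park factor) decides double versus triple. The rating tables, the `park_triple_factor` parameter, and every probability below are placeholders invented for the example, not values taken from real data or from PTP.

```python
import random

# Hypothetical rating tables. Every number is a made-up placeholder.
# Chance that a non-home-run hit goes for extra bases, by line drive rating.
XBH_CHANCE = {
    "Excellent": 0.35, "Very Good": 0.30, "Average": 0.25,
    "Fair": 0.20, "Poor": 0.15,
}
# Share of extra-base hits that become triples, by speed rating.
TRIPLE_SHARE = {
    "Excellent": 0.30, "Very Good": 0.20, "Average": 0.10,
    "Fair": 0.05, "Poor": 0.02,
}


def roll_hit_type(line_drive, speed, park_triple_factor=1.0, rng=random):
    """Classify one non-home-run hit as 'single', 'double', or 'triple'.

    park_triple_factor is an assumed multiplier (1.0 = neutral park).
    rng only needs a random() method returning a float in [0, 1).
    """
    # Step 1: the line drive rating decides single vs. extra-base hit.
    if rng.random() >= XBH_CHANCE[line_drive]:
        return "single"
    # Step 2: speed and the park decide double vs. triple.
    triple_chance = min(1.0, TRIPLE_SHARE[speed] * park_triple_factor)
    return "triple" if rng.random() < triple_chance else "double"


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so a run is repeatable
    hits = [roll_hit_type("Very Good", "Excellent", rng=rng) for _ in range(150)]
    for kind in ("single", "double", "triple"):
        print(kind, hits.count(kind))
```

Splitting the roll into two steps keeps the ratings independent: a slow line drive hitter still gets plenty of doubles, and a fast slap hitter gets few extra-base hits but a high share of triples among them.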
__________________
"Read books, get brain." |
|
|
|
|
|
#3 (permalink) |
|
Banned
Join Date: May 2004
Posts: 3,113
|
Also because he was really, really slow.
If I were doing this (and I did something similar when I wrote a random boxer generator for Title Bout), I would get something like the Lahman database/spreadsheet, choose a given range of years, toss out every player with fewer than, say, 150 at-bats, reduce every player-season to a bunch of ratios, and then work out correlations between everything. Pretty soon (meaning, not very soon at all) you'll have a bunch of groups you can bundle the different stats into, and then you can just make stuff up from there. For example, say you're basing Power on HRs/PA and Contact on singles/PA, and you find out that singles/PA has a .30 correlation with doubles/PA and HRs/PA has a .15 correlation (note: numbers are completely made up). You could then have Contact weigh twice as heavily on doubles creation as Power. I also like working linear regressions into these equations just to make them complicated, but I like making my stuff way too complex compared to what it could be (which made me a *terrible* songwriter back when I was trying to do that in college).
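The steps above (filter by at-bats, reduce each player-season to ratios, correlate, then turn correlations into weights) can be sketched in plain Python. This is only a sketch under some assumptions: rows are dicts keyed by the Lahman batting column names (`AB`, `BB`, `H`, `2B`, `3B`, `HR`), plate appearances are approximated as AB + BB, and turning correlations into weights by simple normalization is just one possible reading of "weigh twice as heavily". The function names are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def season_ratios(row):
    """Reduce one player-season to per-PA rates (PA approximated as AB + BB)."""
    pa = row["AB"] + row["BB"]
    singles = row["H"] - row["2B"] - row["3B"] - row["HR"]
    return {
        "1B": singles / pa,
        "2B": row["2B"] / pa,
        "3B": row["3B"] / pa,
        "HR": row["HR"] / pa,
    }


def correlations_with(rows, target, predictors, min_ab=150):
    """Correlate each predictor rate with the target rate.

    Player-seasons with fewer than min_ab at-bats are tossed out first.
    """
    ratios = [season_ratios(r) for r in rows if r["AB"] >= min_ab]
    ys = [r[target] for r in ratios]
    return {p: pearson([r[p] for r in ratios], ys) for p in predictors}


def weights_from_correlations(corrs):
    """Normalize correlations into weights that sum to 1.

    Negative correlations are clamped to 0, which is an assumption of
    this sketch rather than anything required by the method.
    """
    clamped = {name: max(value, 0.0) for name, value in corrs.items()}
    total = sum(clamped.values())
    if total == 0:
        raise ValueError("no positive correlations to weight")
    return {name: value / total for name, value in clamped.items()}
```

With the made-up numbers from the post, `weights_from_correlations({"contact": 0.30, "power": 0.15})` gives Contact a weight of about 2/3 and Power about 1/3, so Contact counts twice as much as Power when generating doubles. On real data you would call `correlations_with(rows, "2B", ["1B", "HR"])` first and feed its result in.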