Who's up for a challenge to win a billion?

fl0at_ · Jan 29, 2014

kidbourbon said: ↑

Nothing shocking in there. Favorites win a lot, but not always.

Yea. Like I said. Not groundbreaking.

For whatever reason, 9s win at almost a 70% rate in the East. But not sure if statistically significant.

kidbourbon · Jan 29, 2014

I'd be interested in looking at the Vegas future odds going into the tournament vs. results. It would take some time to do, though. From the future odds, you could assign not just a ranking to each team but a score, which is more useful than ranking because it indicates not just that #1 is better than #2, but also how much better.

With this data, you could:
(1) assign a likelihood of winning for each game, and also for each team advancing through each round.
(2) Compare that with actual results.
(3)???
(4) Profit?

I know there are computer rankings that provide better predictive value than the seedings (it's well documented). And I'm confident that Vegas future odds would provide better predictive value than the seedings. But I wonder how Vegas futures stack up against some of the best algorithmic ranking systems (kenpom, LRMC, Sagarin, etc.)?

But I'm not doing the research, so I'll have to keep wondering.

kidbourbon · Jan 29, 2014

fl0at_ said: ↑

Yea. Like I said. Not groundbreaking.

For whatever reason, 9s win at almost a 70% rate in the East. But not sure if statistically significant.

I'm confident a larger sample size would smooth out the 9 seed thing and also the 5 and 6 seed percentages.

kidbourbon · Jan 29, 2014

fl0at_ said: ↑

Yea, I was incorrect.

And by number, do you mean 13 or 17? Or numbers, as in plural?

Numbers

fl0at_ · Jan 29, 2014

kidbourbon said: ↑

I'm confident a larger sample size would smooth out the 9 seed thing and also the 5 and 6 seed percentages.

I agree. Wonder if it is even worth looking at all the previous years, just for shits and giggles.

But the 8 and 9 seed should be almost 50/50 anyway, so an overall 54/46 difference doesn't seem that significant to me.
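
One way to put a number on "statistically significant" here is an exact binomial test against a 50/50 coin flip. A minimal Python sketch follows. The thread gives percentages but not the number of games behind them, so the `wins` and `games` counts below are hypothetical:

```python
from math import comb

def binomial_two_sided_p(wins: int, games: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    that is no more likely than the observed one under win probability p."""
    pmf = [comb(games, k) * p**k * (1 - p)**(games - k) for k in range(games + 1)]
    observed = pmf[wins]
    # Small tolerance so floating-point ties are counted.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Hypothetical counts, not taken from the thread's data.
print(binomial_two_sided_p(54, 100))  # a 54/46 split over 100 games
print(binomial_two_sided_p(14, 20))   # a 70% rate over only 20 games
```

With these made-up sample sizes, neither split comes out below the usual 0.05 cutoff, which fits the hunch that a bigger sample is needed before reading anything into it.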

fl0at_ · Jan 29, 2014

kidbourbon said: ↑

I'd be interested in looking at the Vegas future odds going into the tournament vs. results. It would take some time to do, though. From the future odds, you could assign not just a ranking to each team but a score, which is more useful than ranking because it indicates not just that #1 is better than #2, but also how much better.

With this data, you could:
(1) assign a likelihood of winning for each game, and also for each team advancing through each round.
(2) Compare that with actual results.
(3)???
(4) Profit?

I know there are computer rankings that provide better predictive value than the seedings (it's well documented). And I'm confident that Vegas future odds would provide better predictive value than the seedings. But I wonder how Vegas futures stack up against some of the best algorithmic ranking systems (kenpom, LRMC, Sagarin, etc.)?

But I'm not doing the research, so I'll have to keep wondering.

Drop me a link to the future odds, and if I can find a way to easily extract whatever numbers, we can throw them in a spreadsheet and see.

dknash · Jan 29, 2014

It seems like 5 seeds are generally flawed major conference squads and 12 seeds are generally sound mid-majors. Maybe less so with the expansion of the field. I don't think it's a total fluke.

kidbourbon · Jan 30, 2014

fl0at_ said: ↑

Drop me a link to the future odds, and if I can find a way to easily extract whatever numbers, we can throw them in a spreadsheet and see.

I'm gonna have to keep looking. This site here is the best for archived point spreads: http://www.goldsheet.com/gs_new/histcbb.php

But I'm not finding tournament odds.

kidbourbon · Jan 30, 2014

Boom!

Found it. This site is gold. http://www.sportsoddshistory.com/aa_php/main.php?y=2009-2010&s=cbb&a=nc&o=r

You can sort by column, so you'd wanna sort by the "odds prior to round 1" column.

It only has the past four years, but it's a start.

To convert a positive moneyline to an implied winning percentage, do: 100/(ML + 100). So let's say Syracuse was +800. 100/(800 + 100) = .11 = 11%. (For a negative moneyline, it's |ML|/(|ML| + 100).)

But what you'll find is that when you add up all those percentages, it will be well above 100%. This is because of the juice that's built into the listed odds. There's probably an easy way to strip out the juice so that the numbers add up to give you 100%.

In fact, I think you would just do this: X*(sum of all the percentages (listed of course as decimals)) = 1.
Solve for X.
Multiply each decimal by X.

I think that would do the trick. The result is "true odds"...or whatever the name is for odds that actually represent likelihood of an outcome without the addition of juice.
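
The conversion and the "solve for X" step above can be sketched in Python like this. The team names other than Syracuse and all of the odds are made up for illustration, and scaling every team by the same X is only the simplest way to strip the juice (it assumes the juice is spread evenly across teams, which may not hold for long shots):

```python
def implied_probability(moneyline: float) -> float:
    """Implied win probability (juice still included) from American moneyline odds."""
    if moneyline > 0:
        return 100 / (moneyline + 100)      # underdog, e.g. +800
    return -moneyline / (-moneyline + 100)  # favorite, e.g. -150

def strip_juice(probabilities: dict[str, float]) -> dict[str, float]:
    """Scale the implied probabilities so they sum to 1 (X = 1 / sum)."""
    x = 1 / sum(probabilities.values())
    return {team: p * x for team, p in probabilities.items()}

# Hypothetical three-team futures market; the odds are invented.
odds = {"Syracuse": 800, "Team B": -150, "Team C": 200}
raw = {team: implied_probability(ml) for team, ml in odds.items()}
fair = strip_juice(raw)
print(sum(raw.values()))  # above 1 because of the juice
print(fair)               # scaled so the values sum to 1
```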

kidbourbon · Jan 30, 2014

dknash said: ↑

It seems like 5 seeds are generally flawed major conference squads and 12 seeds are generally sound mid-majors. Maybe less so with the expansion of the field. I don't think it's a total fluke.

Eh, it's probably a fluke. I doubt the committee has a label next to the five seed slot indicating "insert flawed major conference team".

kidbourbon · Jan 30, 2014

fl0at_ said: ↑

Drop me a link to the future odds, and if I can find a way to easily extract whatever numbers, we can throw them in a spreadsheet and see.

How good are your web scraping skills? I seriously need to block off a weekend and learn how to do that. That's just a really useful thing to be able to do.

tennisabstract.com is an absolute gold mine for tennis stats. It's amazing. The guy that does it, Jeff Sackmann, just keeps adding on new shiz too. It's a great resource, but I'd love to be able to pull the data off it to use to create my own tennis rankings based on margin of victory and schedule strength.

So for example, this page here is a list of every match Rafael Nadal has played over the last 52 weeks. http://www.tennisabstract.com/cgi-bin/player.cgi?p=RafaelNadal&f=o1

What I would like to be able to do is compile a database (a spreadsheet would do the trick) that has, say, the top 50 players, and for each player it has their last [X] number of opponents, or their opponents through a given period of time, and the "dominance ratio" for each match. The dominance ratio is the % of return points won / % of serve points lost, but it doesn't even need to be calculated because he's already calculated it for each match (under the DR heading in the above link).

From there, I would create a fictional player whose stats all correspond to the tour averages. My general framework would then be to calculate the result if each actual player played this fictional player, and then for each match played over, say, the past year by each of the top 50 players, I would compare the actual result with the result of the fictional match.

So basically play the top 50 against the fictional guy in a computer simulated match (this is easy to do...you just plug in serving percentage data and returning percentage data and the computer spits out a result). Rank the top 50 based on these results. But this is just a "preliminary" ranking that gives us a jumping off point from a SOS perspective.
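
As a rough idea of what "plug in serving percentage data and the computer spits out a result" can look like, here is a Python sketch for a single service game. It assumes every point is won independently with the same probability `p`, which is a simplification, and the 0.65 is a made-up serve-points-won rate. Sets and matches would be built up from games in the same way:

```python
def hold_probability(p: float) -> float:
    """Probability the server wins a game when each point is won
    independently with probability p (standard scoring, deuce included)."""
    q = 1 - p
    # Win the game to love, to 15, or to 30.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each), then eventually win two in a row.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce

print(hold_probability(0.50))  # an even point gives an even game
print(hold_probability(0.65))  # a small edge per point grows over a game
```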

And then for every real match:
Isner is [X]% better than average player
Djokovic beat Isner with an [X] dominance ratio
Use the last two lines to come up with a single number that is basically "the data from this single match suggests Djokovic is [X]% better than the average player."

And then iterate accordingly through each player and each match. The result is a metric that ranks the players based on how good their opponents were, and what essentially amounts to their margin of victory over the opponent. And because tennis is such a "connected" sport (i.e., most players have played each other several times, so if you pick two of the top 50 players at random, and then pick a third player at random, odds are both of the first two guys will have played the third guy, and multiple times to boot), I think the end result would be a pretty damn accurate little metric. And way way way way way better than ATP rankings.

And, honestly, doing the coding -- via excel or otherwise -- for the ranking system wouldn't be too hard. But I first need the data. And I don't know how to web scrape.
So if you think you could scrape that site for the data discussed above and maybe a few other nuggets, I would compensate you for your time.
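
To make the "iterate through each player and each match" idea concrete, here is one possible reading of it as a Python sketch, not a finished ranking system. The assumptions are mine: ratings live on a log scale with 0.0 as the average player, each match suggests "my rating = opponent's rating + log(DR)", a player's rating is the average of those suggestions, and a damping factor keeps the updates from bouncing back and forth. The dominance ratios in the sample are invented:

```python
from collections import defaultdict
from math import log

def rate_players(matches, rounds=200, damping=0.5):
    """matches: iterable of (player, opponent, dominance_ratio) tuples,
    with the ratio given from the first player's point of view.
    Returns log-scale ratings where 0.0 is the average of the rated players."""
    results = defaultdict(list)
    for player, opponent, dr in matches:
        results[player].append((opponent, log(dr)))
        # The opponent's ratio for the same match is the reciprocal.
        results[opponent].append((player, -log(dr)))

    ratings = {name: 0.0 for name in results}  # everyone starts as "average"
    for _ in range(rounds):
        updated = {}
        for name, played in results.items():
            # Average of what each match implies about this player.
            implied = sum(ratings[opp] + margin for opp, margin in played) / len(played)
            updated[name] = damping * ratings[name] + (1 - damping) * implied
        mean = sum(updated.values()) / len(updated)
        ratings = {name: value - mean for name, value in updated.items()}
    return ratings

# Made-up dominance ratios, purely for illustration.
sample = [
    ("Djokovic", "Isner", 1.30),
    ("Nadal", "Isner", 1.25),
    ("Djokovic", "Nadal", 1.05),
]
for name, rating in sorted(rate_players(sample).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:+.3f}")
```

Because every match is counted from both sides, a player who only ever beats weak opponents gets pulled down by those opponents' low ratings, which is the schedule-strength part of the idea.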

Beechervol · Jan 30, 2014

Don't waste your time. I've already picked the field and winning order.

Changes are coming. Hide yo kids......

fl0at_ · Jan 30, 2014

kidbourbon said: ↑

How good are your web scraping skills? I seriously need to block off a weekend and learn how to do that. That's just a really useful thing to be able to do.

tennisabstract.com is an absolute gold mine for tennis stats. It's amazing. The guy that does it, Jeff Sackmann, just keeps adding on new shiz too. It's a great resource, but I'd love to be able to pull the data off it to use to create my own tennis rankings based on margin of victory and schedule strength.

So for example, this page here is a list of every match Rafael Nadal has played over the last 52 weeks. http://www.tennisabstract.com/cgi-bin/player.cgi?p=RafaelNadal&f=o1

What I would like to be able to do is compile a database (a spreadsheet would do the trick) that has, say, the top 50 players, and for each player it has their last [X] number of opponents, or their opponents through a given period of time, and the "dominance ratio" for each match. The dominance ratio is the % of return points won / % of serve points lost, but it doesn't even need to be calculated because he's already calculated it for each match (under the DR heading in the above link).

From there, I would create a fictional player whose stats all correspond to the tour averages. My general framework would then be to calculate the result if each actual player played this fictional player, and then for each match played over, say, the past year by each of the top 50 players, I would compare the actual result with the result of the fictional match.

So basically play the top 50 against the fictional guy in a computer simulated match (this is easy to do...you just plug in serving percentage data and returning percentage data and the computer spits out a result). Rank the top 50 based on these results. But this is just a "preliminary" ranking that gives us a jumping off point from a SOS perspective.

And then for every real match:
Isner is [X]% better than average player
Djokovic beat Isner with an [X] dominance ratio
Use the last two lines to come up with a single number that is basically "the data from this single match suggests Djokovic is [X]% better than the average player."

And then iterate accordingly through each player and each match. The result is a metric that ranks the players based on how good their opponents were, and what essentially amounts to their margin of victory over the opponent. And because tennis is such a "connected" sport (i.e., most players have played each other several times, so if you pick two of the top 50 players at random, and then pick a third player at random, odds are both of the first two guys will have played the third guy, and multiple times to boot), I think the end result would be a pretty damn accurate little metric. And way way way way way better than ATP rankings.

And, honestly, doing the coding -- via excel or otherwise -- for the ranking system wouldn't be too hard. But I first need the data. And I don't know how to web scrape.
So if you think you could scrape that site for the data discussed above and maybe a few other nuggets, I would compensate you for your time.

I'll need to read through this whole post. I skimmed, but wanted to let you know I saw it.

I use a couple different PHP modules to read all the HTML, and then parse the tags.

The one I have used for a while is simple html dom (http://simplehtmldom.sourceforge.net). It isn't very robust, but I'm most familiar with it.

I'll see what I can do. Getting the data off the site shouldn't be tough. Getting it into excel might be tricky, but there is probably a module out there I can plug in to.

It might be a few days before I can play with it, but I'll look over it today if I can get some down time.

kidbourbon · Jan 30, 2014

fl0at_ said: ↑

I'll need to read through this whole post. I skimmed, but wanted to let you know I saw it.

I use a couple different PHP modules to read all the HTML, and then parse the tags.

The one I have used for a while is simple html dom (http://simplehtmldom.sourceforge.net). It isn't very robust, but I'm most familiar with it.

I'll see what I can do. Getting the data off the site shouldn't be tough. Getting it into excel might be tricky, but there is probably a module out there I can plug in to.

It might be a few days before I can play with it, but I'll look over it today if I can get some down time.

I don't know much about web-scraping but what I noticed about the tennis abstract site is that the page source doesn't have the stats in it. I assume that means that the numbers are being calculated on the server side, and are showing up on the page as the return of function calls. (not sure if I explained that very well....my coding days are well in rearview)

fl0at_ · Jan 30, 2014

kidbourbon said: ↑

I don't know much about web-scraping but what I noticed about the tennis abstract site is that the page source doesn't have the stats in it. I assume that means that the numbers are being calculated on the server side, and are showing up on the page as the return of function calls. (not sure if I explained that very well....my coding days are well in rearview)

This is actually the file you want:
http://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches/RafaelNadal.js

But then you just have to figure out what all the noise means.

Alternatively, might be able to just pull the page, and insert a javascript directive to write the tags, rather than just display them.

I think it is somewhere around the "make" function, but rather than returning the call, you could, in theory (I'd need to look more into it) also have it execute document.write which would then just write the damn HTML tag rather than just display it.
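
If the data file is just JavaScript that assigns an array to a variable, another route is to skip the browser entirely and parse the file as text. A hedged Python sketch follows. The variable name `matches` and the row layout in `sample` are hypothetical (I haven't confirmed what the real file looks like), and it only works if the array literal also happens to be valid JSON:

```python
import json
import re

def extract_js_array(js_source: str, variable: str):
    """Pull `var <variable> = [...];` out of JavaScript source and parse it.
    Only works when the array literal is also valid JSON
    (double-quoted strings, no trailing commas, no expressions)."""
    pattern = r"var\s+" + re.escape(variable) + r"\s*=\s*(\[.*?\])\s*;"
    found = re.search(pattern, js_source, flags=re.DOTALL)
    if found is None:
        raise ValueError(f"no array assigned to {variable!r}")
    return json.loads(found.group(1))

# Invented stand-in for the contents of a per-player .js data file.
sample = (
    'var fullname = "Some Player";\n'
    'var matches = [["20130101", "Tournament A", 12, 2, 110],\n'
    '               ["20130108", "Tournament B", 8, 1, 95]];\n'
)
for row in extract_js_array(sample, "matches"):
    print(row)
```

In practice the text would come from something like `urllib.request.urlopen(url).read().decode()` instead of a hard-coded string. A string value containing `];` would trip up the regular expression, so treat this as a starting point.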

kidbourbon · Jan 31, 2014

fl0at_ said: ↑

This is actually the file you want:
http://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches/RafaelNadal.js

But then you just have to figure out what all the noise means.

Alternatively, might be able to just pull the page, and insert a javascript directive to write the tags, rather than just display them.

I think it is somewhere around the "make" function, but rather than returning the call, you could, in theory (I'd need to look more into it) also have it execute document.write which would then just write the damn HTML tag rather than just display it.

Do you know Python? I've heard it's great for these sorts of things.

kidbourbon · Jan 31, 2014

fl0at_ said: ↑

This is actually the file you want:
http://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches/RafaelNadal.js

But then you just have to figure out what all the noise means.

Alternatively, might be able to just pull the page, and insert a javascript directive to write the tags, rather than just display them.

I think it is somewhere around the "make" function, but rather than returning the call, you could, in theory (I'd need to look more into it) also have it execute document.write which would then just write the damn HTML tag rather than just display it.

And figuring out the link I needed to be looking at from the page source and pulling it from the top seems obvious in hindsight, but I didn't know to do that, and probably wouldn't have figured it out. So, biggie ups.

fl0at_ · Jan 31, 2014

kidbourbon said: ↑

Do you know Python? I've heard it's great for these sorts of things.

I'm not "up" on python. I have the basics down, but it isn't something I know well enough to pull something like this off with any speed.

I played with his code today, and it is a bugger.

I can get all the stats and load them into an array of strings. But this is the part that is pissing me off... once the page finishes loading, the variable I created seems to become non-existent. And it doesn't seem to matter when I try to write the data out so I can read it; I can never get it to write the data.

So I can dump all the stats I want into the console, but I can never manage to physically put them somewhere where I can extract that info.

It is very well written.

If I continue to play with this, I think I'm going to be forced to move to Ghost and Python and see if I can dump the data that way. But I'm not optimistic it is going to work, which means I'm hesitant to devote the time.

It would pretty much be easier to just extract that data file's proper name for each of the top 50 players, do the math to obtain the stats you want myself, and then tabulate the data. Because right now, there is just no easy way to scrape this site.

kidbourbon · Jan 31, 2014

fl0at_ said: ↑

I'm not "up" on python. I have the basics down, but it isn't something I know well enough to pull something like this off with any speed.

I played with his code today, and it is a bugger.

I can get all the stats and load them into an array of strings. But this is the part that is pissing me off... once the page finishes loading, the variable I created seems to become non-existent. And it doesn't seem to matter when I try to write the data out so I can read it; I can never get it to write the data.

(1)
So I can dump all the stats I want into the console, but I can never manage to physically put them somewhere where I can extract that info.

(2)
It is very well written.

(3)
If I continue to play with this, I think I'm going to be forced to move to Ghost and Python and see if I can dump the data that way. But I'm not optimistic it is going to work, which means I'm hesitant to devote the time.

(4)
It would pretty much be easier to just extract that data file's proper name for each of the top 50 players, do the math to obtain the stats you want myself, and then tabulate the data. Because right now, there is just no easy way to scrape this site.

(1)
I'm not sure I'm following. You can dump the data and then it goes vamoose?

(2)
I'm not surprised. That dude is smart. He's not famous, and so I'm pretty sure he has a day job, and to put up a site like that in your spare time, and to add new features as frequently as he's been adding them -- I think by March the site may actually be able to perform fellatio -- one would have to be pretty bright.

(3)
If you think it's going to take a lot of time, and you're not positive it's gonna work, then don't sweat it. His site displays more and better tennis data than any other site out there -- even sites you'd have to pay a subscription to access -- and by a comfortable margin. But he's getting the raw data from somewhere else. And I'll have to assume that the "somewhere else" is easier to scrape.

(4)
I'm not sure what you mean by this. He also gets his match data from somewhere, but I don't know where that somewhere is.

fl0at_ · Feb 1, 2014

kidbourbon said: ↑

(1)
I'm not sure I'm following. You can dump the data and then it goes vamoose?

I can get the data, but I can't dump it anywhere I can access it other than with Ctrl+C, which makes it pointless.

(2)
I'm not surprised. That dude is smart. He's not famous, and so I'm pretty sure he has a day job, and to put up a site like that in your spare time, and to add new features as frequently as he's been adding them -- I think by March the site may actually be able to perform fellatio -- one would have to be pretty bright.

I think a lot of the reason the site is built the way it is is so he can force the user to do most of the work, and thus decrease his server load.

(3)
If you think it's going to take a lot of time, and you're not positive it's gonna work, then don't sweat it. His site displays more and better tennis data than any other site out there -- even sites you'd have to pay a subscription to access -- and by a comfortable margin. But he's getting the raw data from somewhere else. And I'll have to assume that the "somewhere else" is easier to scrape.

Meh. More of a challenge at this point. I don't like losing.

(4)
I'm not sure what you mean by this. He also gets his match data from somewhere, but I don't know where that somewhere is.

He actually calculates it in the script. For example, on the link you gave me, the dominance ratio and all that jazz are calculated in JavaScript by the user's browser.

var acerate = alignRound((match.aces/match.pts), 1, 1);
var dfrate = alignRound((match.dfs/match.pts), 1, 1);

etc.

var statrow = [domratio, acerate, dfrate, firstin, fwin, swin, bksaved];

And all those calculations come from that data file, which we can easily scrape.
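
Doing the same math outside the browser and getting it somewhere Excel can open could look roughly like this in Python. The field names `aces`, `dfs` and `pts` follow the script lines above. Scaling to a percentage and rounding to one decimal are guesses at what `alignRound` does, and the raw counts are made up:

```python
import csv
import io

def match_rates(match: dict) -> dict:
    """Ace and double-fault rates as percentages of service points.
    Percent scaling and one-decimal rounding are assumptions about alignRound."""
    pts = match["pts"]
    return {
        "acerate": round(100 * match["aces"] / pts, 1),
        "dfrate": round(100 * match["dfs"] / pts, 1),
    }

def write_rows(rows, out) -> None:
    """Write a list of dicts as CSV, which Excel opens directly."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# Made-up raw counts for two matches.
raw = [
    {"opponent": "Player A", "aces": 6, "dfs": 2, "pts": 80},
    {"opponent": "Player B", "aces": 3, "dfs": 1, "pts": 64},
]
table = [{"opponent": m["opponent"], **match_rates(m)} for m in raw]
buffer = io.StringIO()  # swap for open("matches.csv", "w", newline="") to get a file
write_rows(table, buffer)
print(buffer.getvalue())
```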
