Latrinsorm
10-05-2013, 03:09 PM
The Pythagorean model (http://en.wikipedia.org/wiki/Pythagorean_expectation) is not perfectly accurate. What if one source of error we can remove comes from close games being a .500 proposition for every team, with any deviation from that being merely statistical noise?
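For reference, the model predicts winning percentage from points scored and allowed. A minimal sketch in Python, assuming an NBA exponent of 14 (published values run from Morey's 13.91 to about 16.5; I'm not certain exactly which one basketball-reference uses for its stated Pythagorean wins):

# Pythagorean expectation: win% = PF^x / (PF^x + PA^x), scaled to games played.
# The exponent of 14 here is an assumption, not necessarily basketball-reference's.
def pythagorean_wins(points_for: float, points_against: float,
                     games: int = 82, exponent: float = 14.0) -> float:
    win_pct = points_for ** exponent / (points_for ** exponent + points_against ** exponent)
    return games * win_pct

# Example: a team scoring 103.0 and allowing 100.5 points per game projects
# to roughly 48 wins over an 82-game season.
print(round(pythagorean_wins(103.0, 100.5), 1))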
So I tabulated all the NBA team-seasons where 82 games were played in a 30-team league, which gives us 238 (2005-2011 and 2013, less the Boston and Indiana team-seasons in 2013, when each played only 81 games). That gives us...
W   n   total pred W
12 1 17
13 1 16
15 2 35
17 2 43
18 2 45
19 3 57
20 2 41
21 2 33
22 4 99
23 5 114
24 6 163
25 2 52
26 6 170
27 5 148
28 2 57
29 5 148
30 3 90
31 2 67
32 6 187
33 10 338
34 10 337
35 5 174
36 5 183
37 4 147
38 4 140
39 3 112
40 7 275
41 12 490
42 7 289
43 5 209
44 8 339
45 9 400
46 4 184
47 6 272
48 3 141
49 6 291
50 10 508
51 3 156
52 5 267
53 5 267
54 8 432
55 4 209
56 5 281
57 5 279
58 4 240
59 5 304
60 2 122
61 3 174
62 3 181
63 1 61
64 1 60
65 1 61
66 3 194
67 1 61
...where the first column is actual games won, the second column is the number of team-seasons with that many wins, and the last column is the total predicted (Pythagorean) wins for those team-seasons. If our hypothesis is correct, a team that finishes well below .500 probably got there partly through coin-flip luck in close games, so we would expect teams with 40 or fewer wins (sub-.500) to have fewer actual wins than predicted, teams with 42 or more (super-.500) to have more actual wins than predicted, and 41-win teams to come out about even. As it turns out...
rec    n    actual  pred   dif (pred - actual)
<.500  109  3219    3288   +69
=.500  12   492     490    -2
>.500  117  6038    5982   -56
...that is pretty much what we see! But a few things:
1. What if a team was predicted to have 41 wins and actually had 43? The fast read would say that the sub-.500 team got 2 games closer to .500, but really it only got 1 game closer to .500 and then went even further past it. How should we count that? It's hard to say, and while those bridge-crossers (teams whose predicted and actual records end up on opposite sides of .500) only accounted for 11 of 238 team-seasons, they accounted for -38 of the -56 (but only 5 of the 69). I think the smart thing to do is to count 41 to 43 as 1 game towards .500 and 1 game away from .500, for a net result of 0 in our metric (see the sketch right after this list). That reduces the total observed value of [69+56-2 = 123] by 24, because a good part of that -38 was in the correct direction to start with. So over 238 team-seasons we see 99 games' worth of close games being crapshoots, or about two-fifths of a game per team per season.
2. More importantly, I'm not at all sure how to quantify error bars for this measurement (one possible simulation approach is sketched at the end of the post).
3. I'm using the basketball-reference stated numbers for Pythagorean Wins, but the formula itself returns long decimals, and when the differences we're adding up are this small, the rounding to whole wins may be significant.
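Here is a rough sketch of one way to formalize the split in point 1, assuming "towards/away from .500" just means distance from the 41-win line; split_at_500 is a hypothetical helper for illustration, not how the numbers above were actually computed:

# Decompose the move from predicted wins to actual wins into the portion
# toward .500 and the portion away from it, splitting at the 41-win line.
def split_at_500(predicted: float, actual: float, half: float = 41.0):
    start = abs(predicted - half)   # distance from .500 before the move
    end = abs(actual - half)        # distance from .500 after the move
    if (predicted - half) * (actual - half) < 0:
        # bridge-crosser: covered the whole gap to .500, then kept going past it
        return start, end
    if end < start:
        return start - end, 0.0     # stayed on one side, moved toward .500
    return 0.0, end - start         # stayed on one side, moved away from .500

# A symmetric crossing (say predicted 40, actual 42) splits as (1, 1): net 0.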
But for now I'm encouraged enough to go back through the earlier 82-game seasons and see what we can see.
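For point 2, one route to error bars would be a quick simulation: treat each team-season's Pythagorean percentage as its true per-game win probability, simulate a bunch of 82-game leagues, and see how much the binned (pred - actual) totals bounce around under that null. A rough sketch, assuming we still have the per-team predicted win totals handy (the tables above only show sums by win level); pyth is a hypothetical list of those totals:

import random

# Sum of (predicted - actual) wins, binned by each team's actual record,
# mirroring the <.500 / =.500 / >.500 table above.
def bin_diffs(pyth, actual):
    diffs = {"<.500": 0.0, "=.500": 0.0, ">.500": 0.0}
    for p, a in zip(pyth, actual):
        key = "<.500" if a <= 40 else ("=.500" if a == 41 else ">.500")
        diffs[key] += p - a
    return diffs

# Null distribution of the binned differences when every game is an independent
# coin flip weighted by the team's Pythagorean win percentage.
def simulate(pyth, games=82, n_sims=10000, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        sim_actual = [sum(rng.random() < p / games for _ in range(games)) for p in pyth]
        results.append(bin_diffs(pyth, sim_actual))
    return results

The spread (say, the 2.5th and 97.5th percentiles) of each bin's total across the simulated leagues would give rough error bars to compare the observed +69 / -2 / -56 against.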