The easiest way, in my opinion, to do a statistical analysis on such a project would be to take every bracket since the expansion and reduce it to seeds. So if round two went all ones, all twos, two sixes, etc., you would end up with a combination like 1, 14, 12, 4, 6, 2, etc. for each region (East, West, South, and so on). Then do the same for each following round, all the way to the final. Then rank the most common seeding orders, e.g. progression XYZ happened 10 out of 25 times, the #2 progression 4 out of 25 times, etc. Then we each play those top orders. That has probably already been done, but eventually one of them has to hit again.
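A rough sketch of the "reduce to seeds and rank" step, assuming each year's bracket has already been turned into a dict keyed by (region, round) whose value is the tuple of seeds that won in that round, in bracket order. That data shape and the function names are hypothetical, and the three "years" are made up, not real results.

```python
from collections import Counter, defaultdict

def progression_counts(years):
    """For each (region, round), count how often each seed pattern occurred."""
    counts = defaultdict(Counter)
    for bracket in years:
        for key, seeds in bracket.items():
            counts[key][tuple(seeds)] += 1
    return counts

def ranked(counts, key, total):
    """Return (pattern, 'hits/total') pairs, most frequent pattern first."""
    return [(pattern, f"{n}/{total}") for pattern, n in counts[key].most_common()]

# Toy data: three invented years for a single region and round.
years = [
    {("East", "Sweet 16"): (1, 4, 2, 3)},
    {("East", "Sweet 16"): (1, 4, 2, 3)},
    {("East", "Sweet 16"): (1, 12, 2, 6)},
]
counts = progression_counts(years)
print(ranked(counts, ("East", "Sweet 16"), len(years)))
# [((1, 4, 2, 3), '2/3'), ((1, 12, 2, 6), '1/3')]
```

Counting per region and per round, rather than whole 63-game brackets, is a deliberate choice here: an exact full-bracket seed pattern will almost never repeat across ~25 years, while a single region's Sweet 16 pattern can.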
I started coding a parser to get the seed results by year and such. I have all the teams, winners, etc., but I still have to make it carry each team's seed along with the team.
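One way to carry the seed along, assuming the parser ends up with two things per year: a table of team name to seed, and per-round lists of winners by name. Both formats and the function name are hypothetical; the team names are placeholders.

```python
def winners_to_seeds(seed_table, winners_by_round):
    """Replace each winning team's name with its seed, round by round.

    seed_table: {team_name: seed} for one year (hypothetical format).
    winners_by_round: {round_name: [team_name, ...]} in bracket order.
    A KeyError here usually means the two sources spell a team differently.
    """
    return {
        rnd: tuple(seed_table[team] for team in teams)
        for rnd, teams in winners_by_round.items()
    }

seed_table = {"Team A": 1, "Team B": 16, "Team C": 8, "Team D": 9}
winners = {"Round of 64": ["Team A", "Team D"], "Round of 32": ["Team A"]}
print(winners_to_seeds(seed_table, winners))
# {'Round of 64': (1, 9), 'Round of 32': (1,)}
```

The output per round is exactly the kind of seed tuple the counting step works on.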
I think a good way to do it is to determine the outer bounds of inclusion for the latest rounds and then work backwards. In other words, come up with a list of teams (the smaller the better) where we could say with a relatively high degree of confidence that no one outside that list is cutting the nets. As an easy example, let's assume we think only zona and cuse can win the whole thing. So you put zona as the winner, then move on to the next level (the Final Four) and for each region ask, "what teams could conceivably make it this far?" Then just iterate down the line. We'll have to be conservative with our inclusiveness, though, or we'll go past 2,000 brackets without batting an eye.
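Here's a sketch of the work-backwards enumeration for just the last two rounds. It assumes a set of allowed champions plus a candidate list per region for the Final Four, and that the semifinals pair region 0 with region 1 and region 2 with region 3 (the real pairings vary by year). "X1", "Y1", etc. are placeholder teams, and the function name is hypothetical.

```python
from itertools import product

def final_rounds(champions, ff_candidates):
    """Yield every (final_four, finalists, champion) consistent with the lists."""
    for ff in product(*ff_candidates):          # one team per region
        for s1, s2 in product(ff[:2], ff[2:]):  # semifinal winners (assumed pairings)
            for champ in (s1, s2):
                if champ in champions:          # prune titles we ruled out
                    yield ff, (s1, s2), champ

champions = {"zona", "cuse"}
ff_candidates = [
    ["zona", "X1"],  # hypothetical region containing zona
    ["Y1", "Y2"],
    ["cuse", "Z1"],  # hypothetical region containing cuse
    ["W1"],
]
outcomes = list(final_rounds(champions, ff_candidates))
print(len(outcomes))  # 16
```

Without the champion filter these lists would give 8 Final Fours times 8 semifinal/final results, or 64 outcomes; restricting the title to two teams cuts that to 16. Every extra candidate multiplies the count again at each earlier round, which is why being conservative matters.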
And because "agreeing on teams" is something that would never happen, we could delegate that task to the computers. I'll humbly nominate kenpom's rankings, with inclusion determined by a standard deviation (or just some predetermined variance) from what the historical data indicates. In other words, if, after you knock out one outlier, the worst team to make the Final Four had a kenpom ranking of [insert ranking], use that as the cutoff. Is this a terrible idea? Yeah, probably. But pretty much all of our ideas for something like this are necessarily going to be terrible.
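Two possible ways to turn past Final Four rankings into a cutoff. The first drops the worst outlier(s) and takes the worst remaining ranking. The second is one reading of "a standard deviation": mean plus k standard deviations. The input list is made-up numbers, since no real historical rankings are given here, and all function names are hypothetical. Lower rank = better team.

```python
import statistics

def cutoff_drop_outliers(ranks, n_outliers=1):
    """Worst (largest) ranking left after discarding the n worst values."""
    if n_outliers >= len(ranks):
        raise ValueError("need more rankings than outliers to drop")
    return sorted(ranks)[-(n_outliers + 1)]

def cutoff_std(ranks, k=1.0):
    """Mean ranking plus k population standard deviations."""
    return statistics.mean(ranks) + k * statistics.pstdev(ranks)

def eligible(current_ranks, cutoff):
    """Teams whose current ranking is at or better than the cutoff."""
    return sorted(team for team, r in current_ranks.items() if r <= cutoff)

past_ff_ranks = [1, 3, 5, 2, 8, 40]  # invented values for illustration only
cut = cutoff_drop_outliers(past_ff_ranks)
print(cut)  # 8
print(eligible({"Team A": 1, "Team B": 4, "Team C": 25}, cut))  # ['Team A', 'Team B']
```

The drop-the-outlier version is closer to the "knock out one outlier" idea. The standard-deviation version gets pulled around by a single long-shot run, since rankings are skewed, so k would need tuning.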
New idea: generate the brackets with a random number generator that has certain checks built in. The winner of each game is chosen randomly, but we won't let the computer take a 16 seed over a 1 seed. Or we won't let a team ranked [x spots] below its opponent in a computer ranking win the game. So random, but not ridiculous. Another terrible idea, I know.
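A minimal sketch of "random but not ridiculous" for one region. It assumes each team has a seed and a computer ranking (1 = best). The `max_gap` parameter stands in for the "[x spots]" limit, and its value is up to whoever runs it. Each game is a coin flip between whichever results pass the checks. `Team`, `allowed`, `pick_winner` and `simulate_region` are hypothetical names.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Team:
    name: str
    seed: int
    rank: int  # computer ranking, 1 = best (hypothetical kenpom-style value)

def allowed(winner, loser, max_gap):
    """Reject the 'ridiculous' results: a 16 over a 1, or too big a ranking upset."""
    if winner.seed == 16 and loser.seed == 1:
        return False
    return winner.rank - loser.rank <= max_gap

def pick_winner(a, b, max_gap, rng):
    options = [w for w, l in ((a, b), (b, a)) if allowed(w, l, max_gap)]
    if not options:
        # Both sides rejected (16-over-1 rule on one, ranking gap on the other):
        # fall back to the better seed.
        return min(a, b, key=lambda t: t.seed)
    return rng.choice(options)  # 50/50 among the allowed results

def simulate_region(teams, max_gap, rng):
    """Play out a region; teams must be in bracket order, length a power of 2.

    Returns the tuple of winning seeds for each round.
    """
    alive, rounds = list(teams), []
    while len(alive) > 1:
        alive = [pick_winner(alive[i], alive[i + 1], max_gap, rng)
                 for i in range(0, len(alive), 2)]
        rounds.append(tuple(t.seed for t in alive))
    return rounds

region = [Team("Team A", 1, 1), Team("Team B", 16, 200),
          Team("Team C", 8, 30), Team("Team D", 9, 35)]
print(simulate_region(region, max_gap=10, rng=random.Random(42)))
# first round is (1, 8) or (1, 9); the 1 seed always wins round two here
```

Passing in a seeded `random.Random` makes each generated bracket reproducible. Its per-round seed tuples can also be run through the same counting as the historical data.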