I'm no statistics expert, so others may be able to describe this in more detail, but I'm going to give a brief explanation of why the median of the "top 20" is a thoroughly biased estimate.
You can describe the total damage done by one player (class) on a given encounter as a single sample (measurement) of the quantity you are trying to estimate.
That quantity is a random variable, so it follows a probability distribution.
When you want to compare dps among different classes, you want to compare their probability distributions. The mean and variance are parameters that describe a probability distribution (but they are sufficient only for a Gaussian/normal distribution), which fits many random events in the world thanks to the central limit theorem.
But in our case the normal distribution isn't a good model. First, the normal distribution has infinite support (it can take negative values regardless of the mean). Second, the real distribution is definitely not symmetric, since everyone aims for the "best dps".
More complex distributions, such as ours, usually require more parameters than the mean and variance to be described. And most people understand that the mean isn't what interests us here (though it can be interesting for Blizzard, who care about the whole playerbase).
Modeling the right distribution to fit the scenario can be a really interesting problem, but would require of course a lot of data.
Given the measurements available (let's say the top 200 damage done by spec), you cannot estimate distribution parameters. As some people pointed out, the top 200 for one spec can be the whole data set for that class (including the worst players) but only the top 1% for another class.
You might be able to make an estimate based on the top N of your samples, but only if you artificially reduce all your base samples to the same size.
Let's say you have 2000 samples for Arcane Mages but 200 for Feral Druids. You pick 200 samples out of your 2000 Arcane Mage samples (uniformly at random), then take the top 20 out of the 200 samples for both classes. That would be an unbiased estimate of the top 10%, but once again it wouldn't describe the whole distribution.
If you don't want to discard any of your 2000 Arcane Mage samples, you simply have to take a fixed top percentage. You average the dps of the top 10% of the mages (200) and the top 10% of the druids (20). Your estimate for the druids would of course be much noisier in this case.
If you want a robust estimate of the distribution, you need the whole data set (not the top N/%). Having fewer samples for one class/spec isn't that bad in this case, as you can estimate how robust your estimate is given the number of samples.
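To make the two options above concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: both specs draw from the same normal distribution (which, as said, is not a good model of real dps), and the helper names `top_n_mean` and `top_fraction_mean` are hypothetical. Because both specs are identical by construction, any gap in the results comes from the method rather than from the specs.

```python
import random
from statistics import mean

def top_n_mean(samples, n):
    """Mean of the n largest values."""
    return mean(sorted(samples, reverse=True)[:n])

def top_fraction_mean(samples, fraction):
    """Mean of the top `fraction` of the values (at least one value)."""
    n = max(1, round(len(samples) * fraction))
    return top_n_mean(samples, n)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: same distribution for both specs, different sample counts.
mage = [rng.gauss(9000, 1000) for _ in range(2000)]
druid = [rng.gauss(9000, 1000) for _ in range(200)]

# Naive top 20: this is the top 1% of the mages but the top 10% of the druids.
naive_mage = top_n_mean(mage, 20)
naive_druid = top_n_mean(druid, 20)

# Option 1: subsample the mages down to 200, then take the top 20 of each.
mage_sub = rng.sample(mage, len(druid))
sub_mage = top_n_mean(mage_sub, 20)

# Option 2: keep every sample and use a fixed top percentage.
pct_mage = top_fraction_mean(mage, 0.10)
pct_druid = top_fraction_mean(druid, 0.10)

print(f"naive top 20:     mage {naive_mage:.0f}  druid {naive_druid:.0f}")
print(f"subsampled top 20: mage {sub_mage:.0f}  druid {naive_druid:.0f}")
print(f"top 10%:           mage {pct_mage:.0f}  druid {pct_druid:.0f}")
```

The naive top 20 for the mages can never be lower than the subsampled one, because the subsample is drawn from the same 2000 values; that gap is the bias that comes purely from having more samples.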
Overall, the top N/% is probably extremely sensitive to the variance of the distribution.
A bit more detail on the bias of your measurement.
Suppose you are measuring the average of the top 10% of samples. That corresponds to the cumulative distribution function being greater than or equal to 0.9 (1 - 10%).
Assuming a normal distribution (as I said, it's a bad model, but it can give you an idea):
The erf function is a bit complicated, so I just gave a numerical solution; you can check Wikipedia if you want more details.
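Written out, the condition looks like this. This is a sketch under the normal assumption only; the symbols $\mu$ (mean dps), $\sigma$ (standard deviation) and $x$ (measured dps) are notation I'm introducing here for illustration.

```latex
F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \ge 0.9
\quad\Longleftrightarrow\quad
x \ge \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(0.8) \approx \mu + 1.2816\,\sigma
```

So a sample lands in the top 10% once it is roughly 1.28 sigma above the mean.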
What does this mean?
Class X with 9000 average dps and 1000 dps sigma: the top 10% starts at about 10281 dps, so your measure will show at least that.
Class Y with 9000 average dps and 100 dps sigma: the top 10% starts at about 9128 dps, so your measure will show at least that.
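If you want to check these numbers yourself, here is a small sketch using Python's standard `statistics.NormalDist`. It still assumes the normal model, and the function names `top_cutoff` and `top_mean` are hypothetical. It gives both the cutoff where the top 10% starts and the average of that top 10%, which sits even higher than the cutoff.

```python
from statistics import NormalDist

def top_cutoff(mu, sigma, top=0.10):
    """Lowest dps that still falls in the top `top` share of a normal distribution."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - top)

def top_mean(mu, sigma, top=0.10):
    """Average dps of the top `top` share of a normal distribution."""
    std = NormalDist()            # standard normal (mean 0, sigma 1)
    z = std.inv_cdf(1.0 - top)    # cutoff expressed in units of sigma
    # Mean of a normal truncated from below at z: mu + sigma * pdf(z) / top
    return mu + sigma * std.pdf(z) / top

for name, mu, sigma in [("X", 9000, 1000), ("Y", 9000, 100)]:
    print(f"class {name}: cutoff {top_cutoff(mu, sigma):.0f}, "
          f"top-10% average {top_mean(mu, sigma):.0f}")
```

Both classes have the same mean here, so the whole gap between them comes from sigma alone.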
As everyone could "feel", that measurement is biased and strongly favors variance; I just tried to express by how much.
