How rank is computed
Removing bad surveys
Student surveys are filtered to remove duplicate and “invalid” surveys before ranking. Invalid surveys are those that are not self-consistent, which indicates that the data has been corrupted, either accidentally or deliberately. We have found that some students survey their “competing” schools, giving them artificially bad reviews (or giving their own school artificially good ones). While we do not wish to point fingers, we have been able to link several groups of falsified surveys to admissions staff at some universities.
We analyzed 5,000 valid surveys statistically and built a Gaussian matrix to model the patterns within and between surveys.
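One common way to use a Gaussian model for this kind of screening is sketched below. We *assume* that "Gaussian matrix" means the mean vector and covariance matrix of a multivariate normal fitted to the numeric survey fields, and that a survey is scored by its squared Mahalanobis distance from that model. The function names and the three-field layout (education, friendliness, social life) are hypothetical, and this is an illustration rather than the site's actual model:

```python
from statistics import mean


def fit_gaussian(surveys):
    """Estimate the mean vector and sample covariance matrix of the survey fields."""
    n, d = len(surveys), len(surveys[0])
    mu = [mean(s[j] for s in surveys) for j in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in surveys) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(d):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][d] / m[i][i] for i in range(d)]


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance: large values mean the fields do not covary as usual."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * zi for di, zi in zip(diff, solve(cov, diff)))
```

A survey that rates friendliness 10 but social life 1 lands far from a model in which those two fields normally rise and fall together, so it would get a much larger distance than an ordinary survey.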
We can now identify surveys that vary too little, vary too much, have fields that do not covary properly, or are inconsistent (e.g., rating the university an A for friendliness, but then complaining about the people or the social life). In addition, a rule-based system was created to identify duplicates and to model trends in surveys submitted from the same machine.
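The checks listed above can be sketched as simple per-survey rules. Everything here is hypothetical: the spread thresholds, the survey fields (`ratings`, `friendliness`, `complaint`, `machine_id`), and the single consistency rule are placeholders chosen to mirror the examples in the text, not the site's real rule base:

```python
from collections import Counter
from statistics import pstdev

# Hypothetical cutoffs on a 0-10 rating scale; the real thresholds are not published.
MIN_SPREAD = 0.3
MAX_SPREAD = 4.0


def spread_flags(ratings):
    """Flag surveys whose ratings vary too little (same answer everywhere) or too much."""
    s = pstdev(ratings)
    flags = []
    if s < MIN_SPREAD:
        flags.append("varies too little")
    if s > MAX_SPREAD:
        flags.append("varies too much")
    return flags


def consistency_flags(survey):
    """Example rule: an A for friendliness alongside a complaint about people or social life."""
    complaint = survey.get("complaint", "").lower()
    if survey.get("friendliness") == "A" and ("people" in complaint or "social life" in complaint):
        return ["inconsistent: friendliness vs. complaint"]
    return []


def duplicate_keys(surveys):
    """Return (machine, answers) pairs that were submitted more than once."""
    seen = Counter((s["machine_id"], tuple(s["ratings"])) for s in surveys)
    return {key for key, count in seen.items() if count > 1}
```

Identical answers from different machines are not treated as duplicates here; only repeats from the same (hypothetical) `machine_id` are.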
This lets us identify a person who is falsifying many surveys. FFT analysis is also used to determine the “data content” of each survey, which provides more information for modeling. Essentially, recurring patterns in survey data appear as distinct frequency components, which can then be filtered against.
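The frequency idea can be illustrated on a single sequence of answers. This sketch *assumes* the transform is applied to the ordered ratings in a survey (the text does not say which sequence is used), and it uses a naive O(n²) discrete Fourier transform from the standard library in place of an FFT (an FFT produces the same spectrum, only faster). `peak_fraction` is a hypothetical score: the share of non-constant power held by the strongest frequency, which approaches 1.0 for a mechanically repeating pattern:

```python
import cmath


def power_spectrum(values):
    """One-sided power spectrum via a naive DFT; the mean is removed first."""
    n = len(values)
    avg = sum(values) / n
    centered = [v - avg for v in values]  # drop the constant (DC) component
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))) ** 2
            for k in range(n // 2 + 1)]


def peak_fraction(values):
    """Share of non-DC power in the single strongest frequency (1.0 = one pure repeating pattern)."""
    spectrum = power_spectrum(values)[1:]
    total = sum(spectrum)
    return max(spectrum) / total if total else 0.0
```

An alternating 10, 0, 10, 0, ... answer pattern puts essentially all of its power into one frequency, while a typical mixed set of answers spreads it across several.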
The resulting filter, correlation matrix, and survey model are applied uniformly to all surveys. Of 7,500 undergraduate student surveys, 483 were rejected as invalid. Individual inspection of the rejected surveys showed that 24 of the 483 (about 5%) were actually “good”, i.e., false positives.
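The percentages follow directly from the counts above. Note that the roughly 5% figure is measured against the rejected surveys, not against all 7,500:

```python
total_surveys = 7500
rejected = 483
wrongly_rejected = 24  # found by individual inspection

rejection_rate = rejected / total_surveys   # share of all surveys rejected
false_share = wrongly_rejected / rejected   # share of rejections that were wrong

print(f"{rejection_rate:.2%}")                    # 6.44%
print(f"{false_share:.1%}")                       # 5.0%
print(f"{wrongly_rejected / total_surveys:.2%}")  # 0.32%
```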
How is rank computed?
The quick answer is that a school's rank is the average of student opinion ratings minus a “variability of future score”. The variability of future score is larger when a school has few surveys, meaning that its ranking position, whether high or low, is less trustworthy: a single new, strongly positive or negative survey could easily change the score. If there is only 1 survey, and it rates a school a 10, then 1 more survey rating it a 0 would bring the school's average down to 5 ((10 + 0)/2 = 5). Strict statistical variance is not useful here because variance is computed within a group of surveys; with only 1 survey, there is no variance to compute.
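The 10-then-0 example generalizes: if a school has n surveys with average m, one new survey rated x moves the average by (x − m)/(n + 1), so the worst-case swing shrinks as surveys accumulate. The helper below is a hypothetical illustration of that arithmetic, not the site's variability function:

```python
def average_shift(ratings, new_rating):
    """How far the average moves when one more survey with new_rating arrives."""
    n = len(ratings)
    old_avg = sum(ratings) / n
    new_avg = (sum(ratings) + new_rating) / (n + 1)
    return new_avg - old_avg  # equals (new_rating - old_avg) / (n + 1)


print(average_shift([10], 0))       # -5.0: one survey at 10, then one at 0
print(average_shift([10] * 20, 0))  # about -0.476 (= -10/21)
```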
The variability function decreases as the number of surveys grows and is applied equally to all institutions, which makes it a fair way to account for differences in sample size. After 5 surveys, the variability of the score drops below 0.3; after 10 surveys, it is below 0.1. After 20 surveys, there is no significant variability in position. In effect, each school's score converges to a stable position as the number of surveys increases.
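The exact variability function is not given here. As a purely hypothetical stand-in, a penalty of the form k/n² with k = 7 happens to meet every threshold quoted above (0.28 at 5 surveys, 0.07 at 10, about 0.018 at 20). The shape and constant are our own assumptions, chosen only to show how one decreasing curve applied to every school produces that behavior:

```python
def variability(n, k=7.0):
    """HYPOTHETICAL penalty that shrinks with the square of the survey count n.

    k = 7 is chosen only so the curve matches the thresholds quoted in the text.
    """
    return k / n ** 2


for n in (1, 5, 10, 20):
    print(n, round(variability(n), 4))
```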
The rank is then computed by multiplying the importance of each selected variable (how much we, or a prospective student, care about education quality or campus beauty, for example) by that variable's score and summing the products. These sums are then averaged over all matching surveys for a particular school. Next, a variability is computed from the number of surveys and subtracted from the school's overall score, reducing it. In this way, a school backed by many surveys is penalized less, so its average is more believable than that of a school with only 1 survey.
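The three steps (weighted sum per survey, average over the school's surveys, subtract the variability) can be sketched as follows. Two assumptions are ours: the weights are normalized to sum to 1 so the score stays on the rating scale, and the penalty defaults to a hypothetical k/n² curve with k = 7, since the site's real function is not published:

```python
def hypothetical_variability(n, k=7.0):
    """HYPOTHETICAL penalty that shrinks as the number of surveys n grows."""
    return k / n ** 2


def school_score(surveys, weights, penalty=hypothetical_variability):
    """Weighted score per survey, averaged over the school's surveys, minus the penalty."""
    total_weight = sum(weights.values())
    per_survey = [sum(w * s[var] for var, w in weights.items()) / total_weight
                  for s in surveys]
    return sum(per_survey) / len(per_survey) - penalty(len(per_survey))


weights = {"education": 3, "campus": 1}  # prospective student cares most about education
school_a = [{"education": 10, "campus": 10}]    # one glowing survey
school_b = [{"education": 8, "campus": 6}] * 10  # ten solid surveys

print(school_score(school_a, weights))  # 10.0 - 7.0 = 3.0
print(school_score(school_b, weights))  # 7.5 - 0.07, about 7.43
```

Under these assumptions, school B outranks school A even though A's single survey is perfect, which is the effect the variability term is meant to have.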