About
Filtering

Student surveys are filtered to remove duplicate and “invalid”
surveys prior to ranking. Invalid surveys are those
that are not self-consistent, reflecting corruption of the
data, whether accidental or intentional. We have found
that some students survey their “competing” schools, giving them
artificially bad reviews (or giving their own school artificially good ones).
While we do not wish to point any fingers, we
have been able to link several groups of falsified
data to admissions staff at some universities.

A set of 5,000 valid surveys was analyzed statistically, and a Gaussian
matrix was created to model the patterns within and between surveys.
We can now identify surveys that vary too little,
vary too much, have fields that do not covary properly,
or are inconsistent (e.g., rating the university as
an A for friendliness, but then complaining about either the
people or the social life). In addition, a rule-based
system was created to identify duplicates and to model trends among
surveys from the same machine. This lets us
identify whether one person is falsifying many
surveys. FFT analysis is also employed to determine the “data
content” of each survey, providing more information for
modeling. The resulting filter, correlation matrix, and survey model
are applied uniformly to all surveys.

Out of 7,500 undergraduate student surveys, 483 were flagged as invalid.
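As a rough illustration of the "vary too little", "vary too much", and "inconsistent" checks described above, here is a minimal Python sketch. It is not the actual filter: the thresholds, the 0-10 rating scale, the field names, and the `flag_survey` function are all hypothetical, and the real system relies on a statistical model rather than fixed cutoffs.

```python
from statistics import pstdev

# Hypothetical cutoffs; the real filter's thresholds are not published here.
MIN_SPREAD = 0.5   # below this, the answers "vary too little"
MAX_SPREAD = 4.0   # above this, the answers "vary too much"
TOP_RATING = 9     # assumed 0-10 scale; 9 or more counts as a top rating


def flag_survey(ratings, complaints):
    """Return a list of reasons a survey looks invalid (empty if none).

    ratings: dict mapping a field name to a 0-10 score.
    complaints: set of field names the written comments complain about.
    """
    reasons = []
    spread = pstdev(list(ratings.values()))
    if spread < MIN_SPREAD:
        reasons.append("varies too little")
    elif spread > MAX_SPREAD:
        reasons.append("varies too much")
    # Inconsistency: a top rating for a field the same survey complains about.
    for field in sorted(complaints):
        if ratings.get(field, 0) >= TOP_RATING:
            reasons.append("inconsistent: " + field)
    return reasons


if __name__ == "__main__":
    survey = {"friendliness": 10, "academics": 6, "food": 7}
    print(flag_survey(survey, {"friendliness"}))
```

A survey that gives every field the same score would be flagged as varying too little, while one that rates friendliness a 10 and then complains about the people would be flagged as inconsistent.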
Inspection of the invalid surveys revealed a failure rate
of 5% (24 of the 483 surveys were actually “good”).

How is rank computed?

The generic quick answer is that it is the average of
student opinion ratings minus a “variability of score”. The “variability
of score” is larger when there are few surveys, meaning
that the school's ranking position is less reliably high or
low. Strict statistical variance is not instructive here because
'variance' is computed within a group of surveys: with
only 1 survey, there is no variance. The 'variability' function
decreases steeply as the sample size grows and is applied
equally to all institutions, making it an acceptably fair
form of accounting. After 5 surveys, the variability of
score drops to less than .3; after 10 surveys, it
is less than .1. After 20 surveys, there is
no significant variability in position. Essentially, each school's score
converges to a position as the number of surveys increases.
More specifically, rank is computed by multiplying each selected
variable's rating by its importance and adding the products together.
The average over all matching surveys for a particular
school is then taken. Next, a 'variability' is
computed, based on the number of
surveys. If there is only 1 survey, and it
rates a school at 10, then 1 more survey
could come in with a rating of 0, which would bring the
school's average down to 5 (10/(1+1) = 5). This
is the lowest that the school 'could' be, given
1 more survey. So this 'variability' is subtracted from
the overall score, reducing it. In this manner,
schools that have more surveys have a more believable average
than schools with only 1 survey.

Actual equation:
score = average(importances[]*preferences[]) - (10*(sum(importances[])))/(#svys + 1)
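The equation above can be sketched in Python as follows. This is one reading of it, not the site's actual code: it assumes `importances` holds one weight per selected variable, that each survey supplies one 0-10 rating per variable, and that the average is taken over surveys. The name `score` and its arguments are chosen here for illustration.

```python
def score(importances, survey_preferences):
    """Score a school from its matching surveys.

    importances: one weight per selected variable.
    survey_preferences: one list of per-variable ratings for each survey.
    """
    n = len(survey_preferences)
    if n == 0:
        raise ValueError("at least one survey is required")
    # Weighted sum of the ratings in each survey.
    weighted = [
        sum(w * p for w, p in zip(importances, prefs))
        for prefs in survey_preferences
    ]
    # Average of the weighted sums across all matching surveys.
    average = sum(weighted) / n
    # Small-sample penalty: 10 * sum(importances) / (number of surveys + 1).
    variability = 10 * sum(importances) / (n + 1)
    return average - variability


if __name__ == "__main__":
    # One survey rating the school 10, with a single variable of importance 1.
    print(score([1.0], [[10]]))
```

With a single survey rating the school a 10 and one variable of importance 1, the penalty is 10/(1+1) = 5, so the score is 5, which matches the worked example in the text.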