New Year's Day 2005
StudentsReview's OFFICIAL Rankings are the first rankings to be COMPLETELY generated from student opinion, and the first to publicly publish their ranking methodology & analysis. There are no preconceptions, university administrators, group consensus, or personal expectations governing our rankings. We perform an analysis that is transparent, understandable, and hopefully informative, governed only by tabulated student opinion.
This document will explain how the data was analyzed, how the StudentsReview ranking system operates, and how “apples to apples” comparison is achieved. If you have a quick question, head on over to the 2005 NYD FAQ (Frequently Asked Questions) page.
The StudentsReview Ranking system consists primarily of filtering machinery and a public formula, which is described below. There is a minor component to the math (and we assure you, it is only math connected with analysis of student surveys) that we are keeping private to prevent exploitation. If made public, it could be exploited by third parties who want to control the rankings, and it would then be impossible to model the data, preventing any meaningful analysis in the future.
Apples to Apples
StudentsReview biases its rankings heavily towards educational quality. There are many components and aspects that you can use to try to assess the “value” or “rank” of any particular school. For many, reputation opens doors and provides opportunities right at the outset that might not otherwise (or ever) be available, or the value surfaces and is imbued in different ways than is explicitly measurable. StudentsReview recognizes this, and encourages the reader to be aware — when viewing any rankings — of the different ways that an institution can be valued, how its education may surface to help them in their career later, and of how their own priorities may change over time.
For this particular set of rankings, and in general, StudentsReview takes the approach that the quality of education will surface and provide both confidence and opportunities in the workplace once the tangible skills are revealed. While REPUTATION is fairly well known and understood, the quality of education at any particular institution is an unknown that is insightful to reveal.
Several institutions have directed designated “feeder” students to submit surveys — an attempt to bias our data in favor of their institution. Apart from the incidents we were able to detect ourselves, students have surprised us by “self reporting” — acting as ethics police and reporting those incidents of bad practice directly to us. As a result, we dropped all of the affected surveys and performed manual verification.
A number of institutions have sought to leverage threats of legal action, or a business relationship, to influence us into removing or altering data. In every contention, except those where the data was found to be genuinely invalid, the demands and solicitations were denied.
5,000 valid surveys were analyzed statistically, and a Gaussian matrix was created to model the survey patterns within and between surveys. We can now identify those surveys that vary too little, vary too much, have fields that do not covary properly, or are internally inconsistent (e.g. rating the university an A for Friendliness, but then rating Social Life an F). The filter was then trained on a marked set of valid and invalid data to set its thresholds for how much inconsistency, variance, and survey-to-survey interaction it can tolerate.
The combination of the trained statistical filter, correlation matrix, and survey model runs autonomously, applying its decisions uniformly to all surveys from all schools. For the OFFICIAL rankings, the thresholds for inconsistency were tightened a bit to prevent spurious or anomalous data from affecting the rankings too heavily.
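To make the idea concrete, here is a minimal sketch of this kind of consistency filtering, assuming hypothetical field names, thresholds, and a simple correlation model; it is an illustration only, not the private filter itself.

    import numpy as np

    # Hypothetical survey fields, each rated 0 (F) through 4 (A).
    FIELDS = ["friendliness", "social_life", "program_quality", "safety"]

    def fit_model(surveys):
        """Learn 'expected' behavior from a set of trusted surveys.
        surveys: (n_surveys, n_fields) array of ratings."""
        return surveys.mean(axis=0), np.corrcoef(surveys, rowvar=False)

    def is_suspect(survey, means, corr, var_lo=0.05, var_hi=3.0, pair_tol=3.0):
        """Flag a survey that varies too little, varies too much, or whose
        strongly correlated fields disagree (an A for Friendliness beside
        an F for Social Life). Thresholds here are illustrative; the real
        ones were trained on a hand-marked set of valid/invalid surveys."""
        v = survey.var()
        if v < var_lo or v > var_hi:      # too flat or too erratic
            return True
        dev = survey - means              # deviation from typical ratings
        for i in range(len(survey)):
            for j in range(i + 1, len(survey)):
                # Fields that usually move together should not point in
                # strongly opposite directions.
                if corr[i, j] > 0.6 and abs(dev[i] - dev[j]) > pair_tol:
                    return True
        return False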
Out of an early sampling of 7,500 undergraduate student surveys, 483 were flagged as invalid. Inspection revealed a false-positive rate of about 5% (24 of the 483 flagged surveys were actually “good”) and a miss rate of about 7% (32 invalid surveys escaped the filter): roughly 56 misclassifications out of 7,500, an overall error of about 1%.
If anything, the self-selected bias is actually a polarizing one, in that those students who feel positively tend to “reply” in the comments to those who feel negatively, creating what appears to readers to be a strongly partisan opinion of the school on our website. But even that belief is incomplete, because many of the students who do not write a comment after taking the survey are those without strong opinions. So the survey data itself (not the comments) tends to have a reasonable mix of positive, negative, and middle of the road opinion.
The second result is a positive one. The same self-selected students are also the ones with time to sit down and expound in depth about their experiences — in far greater detail than is possible with any on-campus surveying. As such, much more meaningful causal insight is gained about the physical factors leading to any satisfaction or dissatisfaction than any short quotes could provide.
Finally, it has been brought to our attention that some critics believe surveyors are fulfilling some “ulterior” motive by taking our survey (i.e. deterring applicants to reduce competition, etc.). Except in the cases of university admissions bad practice (described above), there is little to no reward or consequence that a student can achieve by taking our survey. That is, deterring prospective applicants through our site as a personal tactic would not achieve visible results for several years — most students will have graduated by then, long after most people's reward horizons. It is impossible for them to achieve any ulterior motive besides informing prospective students.
A common method to overcome this problem is to model the shape of biases across ALL surveys and ALL schools (Figure 1 - black line). That shape is used to neutralize the response biases in any particular school, so that all schools have the same modelled sample. Unfortunately, to some degree, neutralization is a bad thing — consider a school where most of the students actually ARE dissatisfied, or the educational quality really IS NOT that stellar. Neutralization would disproportionately suppress the valid (dissatisfied) opinions, and overly amplify the one or two satisfied opinions — leading to an artificially high score.
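As a hedged illustration of what such neutralization looks like in practice (the function name, grade encoding, and smoothing constant are our assumptions, not StudentsReview's method):

    import numpy as np

    def neutralize(school_ratings, global_ratings):
        """Reweight one school's ratings so its response-bias shape matches
        the shape seen across ALL surveys and ALL schools (the black line
        of Figure 1), then return the bias-corrected mean. Ratings are
        encoded as integers 0 (F) through 4 (A). Note the failure mode
        described above: if a school's students really ARE dissatisfied,
        this suppresses their valid opinions."""
        school_hist = np.bincount(school_ratings, minlength=5) + 1e-9
        global_hist = np.bincount(global_ratings, minlength=5) + 1e-9
        # Weight each grade by how under- or over-represented it is at
        # this school relative to the global response shape.
        weights = (global_hist / global_hist.sum()) / (school_hist / school_hist.sum())
        return float(np.average(school_ratings, weights=weights[school_ratings]))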
The other method, and the preferred one, is to simply acquire more surveys. As the numbers grow, the sample converges on a representative one.
Just to reiterate, 61% of our surveys are positive, so we do not maintain an overly negative bias.
Regardless of what anyone may say, schools are inherently non-comparable. They have different student bodies, different majors, different educational offerings, different locations, weather, and ultimately different people doing the rating. Ranking the schools in any fashion is the equivalent of me giving you an orange, and Joe over there an apple, and asking each of you, “how sweet is that fruit?”. You might say, “This orange is super-tangy and sweet”, and Joe might say, “My apple is really tart and sweet!”. Now which fruit is sweeter? Who knows? Two different people said something about two very different fruits — we don't know if Joe is more or less sensitive to sugar than you are, if the apple actually has more or less sugar content, or if the multitude of other flavors teasing your taste buds interfere in some way. The point is, you are two different people looking at two completely different fruits.
We overcome this
problem in three steps: Binding Free Variables, Synthesis, and Normalization.
But before diving in, it is useful to provide
an overview of what is occurring. Essentially what we
do is break apart what we know about you and
Joe into as many factors as possible — are you
male, how much do you like fruits, etc. Then
we look at what we can learn from a large
number of people like you and Joe. How similar
are your sensitivities to sugar, how do the similar people
to Joe rate oranges, and how do the similar people
to you rate apples? How do you covary? Using
what we know about how you both are related, we
synthesize a kind of “ghost” next to each of
you to act as a “stand in” for the other
person. True, it is not the same as if
Joe had tasted an orange himself, or if you had
tasted an apple, but it provides a suggestion of how
you “might have” rated it, if you did.
Without knowledge of the free variables, and the dependencies upon them, it is impossible to ensure that the sampling of data is comparable from school to school. Suppose we know (hypothetically) that, in general, women tend to rate educational quality a half-grade higher than men (i.e. women give an A-, and men a B+). Now suppose you compare two schools, one mostly female (School A) and the other mostly male (School B). If equal in education, the mostly-female school will naturally score higher in the rankings than the mostly-male school. Does this mean that School A has a better educational quality? Absolutely not. To actually compare the two, we have to leverage our knowledge of how educational quality depends upon the free variable “Gender” — that knowledge tells us what relationship Educational Quality has to gender, and allows us to conclude that Schools A & B actually have equivalent educations. Without knowledge of the free variables, the surveys we have would be both misleading and useless for concluding anything about relative educational quality — or about anything else, for that matter.
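A toy worked example of binding the Gender free variable, using entirely hypothetical numbers on a 0-4 grade scale (A- as 3.7, B+ as 3.3):

    def bound_quality(mean_by_gender, ref_mix, offset_by_gender):
        """Remove the known gender offset from each group's mean, then
        recombine with a common reference mix so schools with different
        gender balances become comparable."""
        corrected = {g: mean_by_gender[g] - offset_by_gender[g]
                     for g in mean_by_gender}
        return sum(ref_mix[g] * corrected[g] for g in corrected)

    offset  = {"F": +0.2, "M": -0.2}   # women rate +0.4 grade vs. men
    ref_mix = {"F": 0.5, "M": 0.5}     # gender mix of the entire data set
    school_a = {"F": 3.7, "M": 3.3}    # 80% women: raw mean 3.62, looks better
    school_b = {"F": 3.7, "M": 3.3}    # 80% men:   raw mean 3.38, looks worse
    print(bound_quality(school_a, ref_mix, offset))   # 3.5
    print(bound_quality(school_b, ref_mix, offset))   # 3.5: equal educations

Once the known gender offset is removed and a common reference mix applied, the two schools score identically, as they should.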
Well, missing data poses a difficult problem. If there is a missing data point in the free variables, the “lack of data” is amplified and drives the rankings incorrectly. What data synthesis does is create “dummy” data, filling the missing point with a consistent value based on the existing data and acting like a dampener on inconsistencies. The relationships we've learned from the entire data set are used to predict the filler data (analogous to our apple-orange ghost example above). It is there to prevent empty or small amounts of information from driving the analysis one way or another. Suppose we want surveys from an equal number of men and women at each school, but then find at one of the schools that only one man, or none at all, has been surveyed. We multiply the women's scores by the known relationship between women's and men's scores to determine the average score a male would give. This data point is completely generated from the women's scores at that institution, but it prevents a single poor male rating, or no male rating, from driving the school's score down. In practice, there are 30 dummy entries, one for each combination of free variables: 5 intellects × 2 genders × 3 regions.
\[
\sum_j N \times \mathrm{cor}(i,j) \times \bigl( \mathrm{Avg}(C_j) - D(\mathrm{OverallAvg}_j,\ C_i) \bigr)
\]
where i represents the missing data and j the source data. We take the weighted average of the correlative difference between the values predicted by the free variables.
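Read literally, the formula amounts to something like the following sketch; this is our interpretation only, and the distance function D and any weight normalization belong to the private component, so they are left as parameters:

    def synthesize_missing(i, js, N, cor, avg_C, overall_avg, D):
        """Fill missing cell i from observed source cells j: each source
        contributes N * cor(i, j) * (Avg(C_j) - D(OverallAvg_j, C_i)),
        summed over j (a weighted average of correlative differences).
        cor, avg_C, overall_avg, and D are callables supplied by the
        surrounding analysis; their exact forms are not public."""
        return sum(N * cor(i, j) * (avg_C(j) - D(overall_avg(j), i))
                   for j in js)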
Once the data is filled, it is still not quite comparable, in that different numbers of students with different genders, intellects, and regions are rating the schools. We overcome this problem by normalizing the distributions of intellects, genders, and regions at each school to the average distributions of the entire data set — achieving a “best fit” of the free variables from each school to the data-set average. In this way, the distributions match, artificially making all the student bodies similar. Normalization stratifies the data set across the free variables, allowing them to be bound and reweighted for each school to achieve a common contribution, and thus an artificially common comparison.
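A minimal sketch of that reweighting, assuming each survey is tagged with an (intellect, gender, region) stratum (the data layout is our assumption):

    from collections import Counter

    def normalized_score(surveys, global_dist):
        """Reweight one school's surveys so its mix of (intellect, gender,
        region) strata matches the data-set-wide distribution, then average.
        surveys: list of (stratum, score) pairs; global_dist maps each
        stratum to its share of surveys across ALL schools."""
        counts = Counter(stratum for stratum, _ in surveys)
        n = len(surveys)
        total = 0.0
        for stratum, score in surveys:
            # Weight each survey by how under- or over-represented its
            # stratum is at this school relative to the whole data set.
            total += (global_dist[stratum] / (counts[stratum] / n)) * score
        return total / n

Strata with no surveys at all at a given school are exactly where the dummy entries described above step in.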
We did not normalize or synthesize across majors, because insufficient data exists to draw any reliable conclusions about the inter-major dependencies, and over-normalization introduces HUGE artifacts into the dependent variables, such as the over-amplification of single surveys, which we observed.
As mentioned earlier in the discussion of free variables, the dependent variables (Program Quality, Social Life, etc.) are conditioned on the free variables:
\[
\mathrm{Quality} = \sum_{i,j,k} N_{ijk} \times \bigl( \mathrm{Quality} \mid \mathrm{Intellect}_i,\ \mathrm{Gender}_j,\ \mathrm{Region}_k \bigr)
\]
Our free variables carry an independence assumption — that Gender, Intelligence, Major, and region of origin are completely independent. That is, that Gender has no bearing on Intelligence, on choice of Major, or on the region the student comes from, and vice-versa. In the cases of Gender↔Major and Intelligence↔Major, there may be some correlation that breaks the independence assumption. But because this particular ranking does not stratify by Major, the independence assumption is not broken. In the case of Intelligence, there is a hidden dependency upon school and ACT/SAT score (in addition to other factors) (School, ACT|SAT → Intelligence), which is only trivially modelled as a function of school.
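One way to sanity-check such an independence assumption (our suggestion; not part of the published method) is a chi-square test on a contingency table of two free variables:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical counts of surveys by Gender (rows) x Region (columns).
    table = np.array([[120, 85, 40],
                      [100, 90, 55]])

    chi2, p, dof, expected = chi2_contingency(table)
    # A small p-value (e.g. < 0.05) would suggest the two free variables
    # are NOT independent, breaking the assumption described above.
    print(f"chi2={chi2:.2f}, p={p:.3f}")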
Not included in this ranking, despite a large number of surveys, is the University of Alabama, Tuscaloosa. An enormous number of surveys about the University of Alabama (the greater proportion of them) were crafted to defeat our statistical filters, but when evaluated manually, more than two reviewers believed them all to be written by one or two people, due to identical writing styles. Additionally, disproportionately few left email addresses or contact information, far below the average.
The contention of “insufficient information” is leveraged fairly frequently, but it often overlooks the value of a single survey. In aggregate, surveys reveal systematic failures by a school; individually, a survey provides causality and insight that would otherwise be invisible. Consider a Physics survey rating program quality an F, and friendliness in the department also an F. A large number of surveys might completely bury this one, making it seem like an aberration, but what does one learn from it? Someone in physics finds the department unfriendly, and perhaps because of it, their success in the department suffers! What kind of person is sensitive to friendliness in a department? A social, friendly person! While this school might be great in general, a social, friendly person should not come here to study physics.
Why not rank using ACT/SAT? Standardized tests are poor predictors of performance and have little provable correlation with actual intelligence. The “Self Rated Intelligence” that we use is intentionally loaded — carrying the facets of ego, ambition, multivariate intelligence, understanding, and reasonable hope.
Next Year's Analysis