StudentsReview ™ :: Undergraduate Survey Data Biases

Undergraduate Survey Data Biases

We had our survey vetted by hundreds of students at the outset (now hundreds of thousands), and had a few psychological researchers evaluate the questions for imparted "researcher bias" (or experimenter bias). Some of the experimenter bias is deliberate, which we will discuss later, but right now I want to explore a little of the response bias and intrinsic bias in the data collected so far (2015). Contentions that evaluators have with online survey data surround the incorrect bipolar bias assumption (at least for our data), and neglect some of the more positive traits of online surveys -- namely lower procedural bias, at least surrounding completion pressure. (Though survey layout and rendering on different devices now produces another bias.) Without a controlled experimentation environment, one can never be 100% certain of what the biases are, but a lack of completion pressure reveals itself in the length of the commentary responses. One can never be 100% certain of the biases in a controlled environment either, but the desire to complete (and get out to eat, use the restroom, see the sun...) definitely exists there. My point is, they are different.

Selected Undergraduate Surveys

We filtered 100,000+ undergraduate surveys down to 67,589. We eliminated duplicates, all-zero submissions, all-very-high ratings matching a "certain" (undisclosed)*1 profile, spam, erroneous university selections, surveys that users reported or challenged (and that were subsequently invalidated), and surveys that our statistical analyzer deemed less valid. Surely some noise remains.
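The filtering steps above can be sketched with pandas. The column names, the 0-9 rating scale, and the toy data below are illustrative assumptions; the real profile-matching rules are undisclosed (see note *1):

```python
import pandas as pd

# Toy survey table; field names and scale are assumptions for illustration.
surveys = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 4],
    "education": [7, 7, 0, 9, 6],
    "social":    [5, 5, 0, 9, 4],
    "friendly":  [6, 6, 0, 9, 8],
})

# Drop exact duplicate submissions.
surveys = surveys.drop_duplicates()

rating_cols = ["education", "social", "friendly"]

# Drop all-zero rows (likely abandoned or empty surveys).
surveys = surveys[~(surveys[rating_cols] == 0).all(axis=1)]

# Drop all-maximum rows (9 here) resembling a suspicious rating profile.
surveys = surveys[~(surveys[rating_cols] == 9).all(axis=1)]

print(len(surveys))  # 2 surveys survive the filters
```

The real pipeline also scores surveys with a statistical validity check, which is not reproduced here.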

The survey data consists of a number of individual control fields (Gender, Self-rated Intellect, In-state/Out-of-state/International, ACT/SAT, Graduation Year) and "dependent" school ratings such as:

  1. Education Quality
  2. Whether schoolwork is useful
  3. Whether schoolwork encourages creativity
  4. Whether academic success depends upon knowledge and material mastery
  5. Competitiveness of classmates
  6. How much their mind was challenged
  7. How much they expected to be challenged
  8. Classes taught by TAs
  9. Faculty accessibility
  10. Whether they are treated as a valued individual
  11. Friendliness
  12. The social life
  13. Extracurricular activities
  14. The surrounding city
  15. Campus safety
  16. Campus beauty
  17. Maintenance
  18. Apparent funding use
  19. Whether they would return again if given the choice

Question Bias

You will notice that StudentsReview questions are biased heavily towards educational quality and light on sports/extracurriculars, with 9 (about half) of the 19 field questions having to do with knowledge gaining, classwork, and interactions with faculty. This is by design, as we've felt that strong (or weak) athletics programs are quite apparent, but the classroom experience is not. Six of the remaining questions reflect the social life, and three describe interactions with the physical campus and facilities -- although campus safety could be considered both facilities and social life. Finally, we ask a single "sum up" question -- whether they would return again if given the chance -- which can also act as a catch-all for missed questions.

Neutrality of Respondent Bias

First, we wish to determine whether there are any large-scale correlations between the control variables and the dependent variables. For each control, we look at the maximum magnitude of its pairwise correlation coefficients with the dependent fields: $\max(|\mathrm{cor}(\mathrm{control}, \mathrm{dependent})|)$

Control:      Intellect     ACT           SAT           Gender         From Area     Graduation Year
Correlation:  0.10322610    0.13671757    0.07904559    -0.05374310    0.08894535    -0.12345948
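A minimal sketch of this check, assuming the surveys live in a pandas DataFrame; the column names and toy values below are made up for illustration:

```python
import pandas as pd

# Two control fields and two dependent fields; data is synthetic.
df = pd.DataFrame({
    "ACT":              [22, 28, 31, 25, 30, 19],  # control
    "Gender":           [0, 1, 0, 1, 1, 0],        # control
    "Extracurriculars": [5, 6, 8, 5, 7, 4],        # dependent
    "EducationQuality": [6, 5, 7, 7, 6, 5],        # dependent
})

controls = ["ACT", "Gender"]
dependents = ["Extracurriculars", "EducationQuality"]

# Pairwise Pearson correlations between each control and each dependent.
corr = df[controls + dependents].corr().loc[controls, dependents]

# max(|cor(control, dependent)|): largest magnitude anywhere in the block.
max_abs = corr.abs().max().max()
print(round(max_abs, 3))
```

On the real data set this maximum is only 0.137, which is what the table above reports.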

The maximum magnitude correlation exists in the ACT column and is between ACT and Extracurricular Activities at 0.137. This is an extremely low correlation, and it is still the largest in the data set. Variables with such a low correlation would be considered unrelated. The other control variables that StudentsReview collects are even lower -- indicating that across the data set, Intellect, Gender, Graduation Year, ACT/SAT, and where a student visits from are not "predictive"*2 of his or her ratings at a given school. On a first pass, this shows a certain neutrality in the data that might be unexpected.

Sensibility of Data

Here are a few sanity tests. What are the correlations of some of the dependent variables? Are they as expected?

Field                Field                Correlation        Expected
Education Quality    Academic Success     high (0.7301764)   yes
Education Quality    Again                med  (0.5803594)   interesting
Academic Success     Useful Schoolwork    high (0.7063110)   yes
Academic Success     Surrounding City     low  (0.3160694)   yes
Academic Success     Mind Expectations    low  (0.1359635)   surprising
Campus Beauty        Maintenance          med  (0.5886384)   yes
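The high/med/low labels in the table can be reproduced with a small helper. The cutoffs (0.65 and 0.45) are assumptions inferred from the values shown, not StudentsReview's actual thresholds:

```python
# Hypothetical cutoffs chosen to be consistent with the table above.
def label(r: float) -> str:
    """Map a correlation coefficient to a coarse high/med/low label."""
    a = abs(r)
    if a >= 0.65:
        return "high"
    if a >= 0.45:
        return "med"
    return "low"

print(label(0.7301764))  # high
print(label(0.5803594))  # med
print(label(0.3160694))  # low
```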

First, most chosen values are exactly as expected -- the rated educational quality has a high correlation with rated academic success. Campus Beauty is correlated, but not highly, with Campus Maintenance. This of course makes sense -- an unmaintained campus, no matter how "beautiful", will not be for long, but a well maintained ugly campus is still ugly. But at least clean!

Two values are noteworthy. Educational Quality is only moderately correlated with whether a student would return. This means that education quality on its own does not fully account for a student's satisfaction. Not everyone is exclusively motivated by classroom learning as I apparently am. We'll have to see whether our survey captures enough features. Second, a student's rating of "academic success" (being dependent upon mastery) is almost completely uncorrelated with how much they expected to be challenged by their coursework coming into the school. This is surprising because one might be inclined to think that a person with a high Mind Expectation would be critical of the academic grading system. Or at least, I might be.

See Figure 1 for a visualization of the correlation matrix.
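A clustered correlation matrix like the one in Figure 1 can be approximated by hierarchically clustering the fields on a (1 - correlation) distance, so that highly correlated fields end up adjacent. The field names and synthetic data below are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list

# Synthetic ratings for three dependent fields; the first two share a
# common component, so they should cluster together.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
ratings = pd.DataFrame({
    "EducationQuality": base + rng.normal(scale=0.5, size=200),
    "AcademicSuccess":  base + rng.normal(scale=0.5, size=200),
    "SocialLife":       rng.normal(size=200),
})

corr = ratings.corr()

# Treat (1 - correlation) as a distance, cluster hierarchically, and
# reorder the matrix by the dendrogram's leaf order.
dist = 1.0 - corr.values
condensed = dist[np.triu_indices_from(dist, k=1)]
order = leaves_list(linkage(condensed, method="average"))

clustered = corr.iloc[order, order]
print(list(clustered.columns))
```

Plotting `clustered` as a heatmap then gives the blue block of mutually correlated fields seen in the middle of Figure 1.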

* "Qualifications" *3

  1. Undisclosed - Some of the filtering mechanism is undisclosed to keep malicious users from exploiting the survey
  2. Predictive - "predictive" in the non-causal sense, across a closed data set. This is statistics, after all -- "linearly descriptive" is probably better, but is not the conventional term.
  3. Qualifications - This is the qualifications section, which everyone seems to have to include these days.
Fig 1 - The correlation matrix of the dependent fields, clustered by value, shows that 6 of the 9 educational fields are rated closely and are highly correlated (fields in blue in the middle). As one would expect, Faculty Accessibility is negatively correlated with TAs teaching classes, and whether a student would choose to return to a school is correlated with the Education Quality.