SAT Scores for High Schools in New York City

Link to our project repository

The plan:

  1. What were our goals? What questions did we seek to answer?
  2. Where did we get our data? What are the problems with our data?
  3. What parts of the data were interesting?
  4. Demonstration of our code.
  5. Discussion of our visualizations and results.
  6. What are the next steps?


Goals:

  1. Visualize poverty and other socioeconomic status (SES) metrics alongside scholastic performance standards.
  2. Discover and display any correlations between assessment data and community information.

Questions to Answer:

  1. Is there a correlation between SAT scores and other school information?
  2. How does school performance map onto poverty and income in New York City?
  3. How does school performance relate to how teachers, parents, and students feel about their experience with the school?

Data Sources

  1. NYC OpenData SAT Scores 2012
  2. NYC OpenData DOE High School Directory 2014-2015
  3. Income by ZipCode
  4. Income by Neighborhood
  5. Class Sizes 2010-2011
  6. School Safety Report
  7. Neighborhood Tabulation Areas Shape File

Problems / Issues / Concerns with the Data

  1. Missing data: We lose a lot of data from schools that didn't report their scores.
  2. Time: The data we're using come from different years (for example, class sizes from 2010-2011, SAT scores from 2012, and the school directory from 2014-2015).
  3. Aggregation: We had trouble choosing the level of aggregation that made the most sense. (Zip codes? Neighborhoods? School districts? Boroughs?)
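
To make the missing-data and aggregation issues concrete, here is a minimal sketch in standard-library Python (the project itself used R). The records, field names (`zipcode`, `sat_total`), and values are made up for illustration and are not taken from the project's data. It drops schools with no reported score, counts how many were lost, and averages the rest at one candidate aggregation level.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: names, zip codes, and scores are invented for this sketch.
schools = [
    {"school": "A", "zipcode": "10001", "sat_total": 1200},
    {"school": "B", "zipcode": "10001", "sat_total": None},  # did not report
    {"school": "C", "zipcode": "10002", "sat_total": 1350},
    {"school": "D", "zipcode": "10001", "sat_total": 1400},
]

def drop_missing(records, field):
    """Keep records where `field` is present; also return the number dropped."""
    kept = [r for r in records if r.get(field) is not None]
    return kept, len(records) - len(kept)

def mean_by(records, key, field):
    """Average `field` within each group defined by `key` (e.g. zip code)."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[field])
    return {k: mean(v) for k, v in groups.items()}

reported, n_dropped = drop_missing(schools, "sat_total")
zip_means = mean_by(reported, "zipcode", "sat_total")
print(n_dropped, zip_means)
```

Swapping `"zipcode"` for a neighborhood, district, or borough field changes the aggregation level without touching the rest of the code, which makes it easy to compare the options.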

Data Columns

We pulled the variables that we thought might be important.

  1. SAT scores: critical reading, math, writing, and total (on a 1600-point scale)
  2. Avg Household Income
  3. School gender ratios, average class sizes
  4. Safety concerns


Tools:

  1. For the data and exploratory plots: R
  2. For the map: CartoDB
  3. For the survey visualization: Crossfilter

Data Munging

  1. Sample Code - Joining tables
  2. Sample Code - Exploration plots
  3. Sample Code - Correlations
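
The table-joining step can be sketched as follows. This is a standard-library Python illustration, not the project's R code. The key names (`school_id`, `zipcode`), the column names, and all values are hypothetical. It inner-joins SAT scores to a school directory, then joins the result to income by zip code.

```python
def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key`.

    Assumes `key` is unique in `right`; rows of `left` without a match are dropped.
    """
    right_index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined

# Hypothetical tables: identifiers and values are invented for this sketch.
sat = [
    {"school_id": "S1", "sat_math": 450},
    {"school_id": "S2", "sat_math": 520},
    {"school_id": "S3", "sat_math": 480},  # not in the directory, so it is dropped
]
directory = [
    {"school_id": "S1", "zipcode": "10001"},
    {"school_id": "S2", "zipcode": "10002"},
]
income = [
    {"zipcode": "10001", "avg_income": 50000},
    {"zipcode": "10002", "avg_income": 80000},
]

merged = inner_join(inner_join(sat, directory, "school_id"), income, "zipcode")
for row in merged:
    print(row)
```

An inner join silently discards unmatched schools, so comparing `len(sat)` with `len(merged)` is a quick way to see how much data each join costs.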

Cluster Centers

K-means cluster center assignments for the SAT scores.
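
For readers unfamiliar with k-means, the following is a minimal sketch of Lloyd's algorithm in standard-library Python (3.8+ for `math.dist`). It is not the project's implementation: the number of clusters, the choice of features (reading, math, and writing scores), the starting centers, and the data are all assumptions made for illustration.

```python
import math

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the cluster members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: centers stopped moving
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical (reading, math, writing) scores, invented for this sketch.
scores = [(400, 410, 390), (420, 400, 410), (600, 620, 590), (610, 600, 620)]
centers, labels = kmeans(scores, centers=[scores[0], scores[2]])
print(centers, labels)
```

The result depends on the starting centers, so in practice it is common to run the algorithm from several random starts and keep the best clustering.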


Visualizations and Results

  1. CartoDB Map
  2. Exploration of Environmental Effects on SAT Score
  3. Exploration of the Survey Questions
  4. Visualization of Important Survey Questions with SAT Scores

Exploration of Environmental Effects on SAT Score

Basic Correlation Fits
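
A basic correlation fit of this kind can be sketched as a Pearson correlation plus a least-squares line. The code below is standard-library Python rather than the project's R code, and the income and score values are synthetic: they were constructed to be exactly linear for illustration and are not results from the project.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic data: average household income (in thousands) and total SAT score.
income_k = [30, 50, 70, 90]
sat_total = [1100, 1200, 1300, 1400]

r = pearson_r(income_k, sat_total)
slope, intercept = linear_fit(income_k, sat_total)
print(r, slope, intercept)
```

A correlation like this describes association only; it does not by itself show that income causes differences in scores.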
