Working with College Scorecard Data using Python and MySQL
I recently found an interesting dataset on the U.S. Department of Education website: the College Scorecard, which covers every college in the US. Each CSV holds one year of data across 2,989 columns. You can find the data here. FYI: data does not seem to be available after 2017.
I used MySQL to store and manipulate the data, and Python to run the SQL scripts, which I find easier than running them by hand.
The first script creates the stage tables, grouping the columns by the first five letters of their names.
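To give a rough idea of the grouping step, here is a minimal sketch rather than the original script. The function name `group_by_prefix` and the sample column names are only for illustration, and I am assuming the grouping key is the lowercased first five characters of each column name.

```python
from collections import defaultdict


def group_by_prefix(columns, prefix_len=5):
    """Map each lowercased prefix to the columns that start with it.

    Hypothetical helper: the real script may build its groups differently.
    """
    groups = defaultdict(list)
    for col in columns:
        # Slicing is safe even when the name is shorter than prefix_len.
        groups[col[:prefix_len].lower()].append(col)
    return dict(groups)


# Illustrative sample only; the real file has 2,989 columns.
sample_columns = [
    "UNITID", "INSTNM", "NPT41_PUB", "NPT42_PUB", "UGDS_WHITE", "UGDS_BLACK",
]
groups = group_by_prefix(sample_columns)

for prefix, cols in groups.items():
    print(prefix, cols)
```

Each key can then become the name of one stage table, which keeps every table far narrower than the full CSV.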
The "cleaners" class adds the unique IDs to each list. The lists are then used to select columns from the DataFrame.
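As a sketch of what adding the IDs could look like (not the actual "cleaners" class), the hypothetical function below puts the ID column at the front of every column list. I am assuming `UNITID` is the unique ID; swap in whichever identifier columns you rely on.

```python
def with_id_columns(groups, id_columns=("UNITID",)):
    """Return a copy of groups where every column list starts with the IDs.

    Hypothetical helper; UNITID as the unique ID is an assumption.
    """
    result = {}
    for name, cols in groups.items():
        # Skip IDs already present so no column is listed twice.
        others = [c for c in cols if c not in id_columns]
        result[name] = list(id_columns) + others
    return result


example_groups = {
    "ugds_": ["UGDS_WHITE", "UGDS_BLACK"],
    "uniti": ["UNITID"],
}
selections = with_id_columns(example_groups)

# With pandas, a list like selections["ugds_"] could then be used as
# df[selections["ugds_"]] to pull just those columns from the DataFrame.
print(selections)
```

Carrying the ID in every list means each stage table can later be joined back to the others.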
The "mysqlload" class uses the collected data to create the tables and insert the data.
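The loading step could look roughly like the sketch below. This is not the "mysqlload" class itself: the function names, the `stg_` table prefix, and the choice of `TEXT` for every non-ID column are my assumptions for illustration. `load_rows` expects an already open DB-API connection, such as one from `mysql.connector` or `pymysql`, both of which use `%s` placeholders.

```python
def create_table_sql(table, columns, id_column="UNITID"):
    """Build a CREATE TABLE statement for one stage table.

    Assumption: the ID is an integer, everything else is loaded as TEXT
    and cast later, which is a common choice for stage tables.
    """
    col_defs = []
    for col in columns:
        sql_type = "INT NOT NULL" if col == id_column else "TEXT"
        col_defs.append(f"`{col}` {sql_type}")
    return f"CREATE TABLE IF NOT EXISTS `{table}` ({', '.join(col_defs)})"


def insert_sql(table, columns):
    """Build a parameterized INSERT statement with one %s per column."""
    names = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({names}) VALUES ({placeholders})"


def load_rows(conn, table, columns, rows):
    """Create the table if needed, then insert rows (a list of tuples)."""
    cur = conn.cursor()
    try:
        cur.execute(create_table_sql(table, columns))
        cur.executemany(insert_sql(table, columns), rows)
        conn.commit()
    finally:
        cur.close()


print(create_table_sql("stg_ugds_", ["UNITID", "UGDS_WHITE"]))
print(insert_sql("stg_ugds_", ["UNITID", "UGDS_WHITE"]))
```

Table and column names are interpolated into the SQL here, so they should only ever come from the CSV header you control, never from user input; the row values go through placeholders.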
Next stop: I'll do some analysis in Python and publish the results in Tableau.
#publictableau #tableau #analysis #analytics #dataengineering #highereducation