2:00 PM - 3:00 PM
Seminar Series: Ramanathan V. Guha
Publicly available data from open sources are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets to make them useful (scraping, cleaning, normalizing, joining) is tedious and error-prone, and has to be repeated by every group.
DataCommons attempts to alleviate some of this pain by synthesizing a single Knowledge Graph from many different data sources. It links references to the same entities (cities, counties, organizations, etc.) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the DataCommons graph is open: any user can contribute data or build applications powered by the graph. We are jump-starting the graph with data from publicly available sources such as CDC, Census, BLS, and FBI, and are looking to engage with the academic community to take it further.