Cleaning Data with OpenRefine

This session covers the use of OpenRefine for data cleaning, focusing on resolving entity discrepancies:

  • Data Upload and Project Creation: Import data into OpenRefine and create a new project for analysis.
  • Faceting Data: Use text facets to group identical entries and see how often each address crumb occurs.
  • Clustering Methodology: Apply clustering algorithms to merge similar entries with minor differences, such as punctuation.
  • Manual and Automated Clustering: Learn to merge clusters manually or in one go, trusting the system’s clustering accuracy.
  • Entity Resolution: Clean and save the data by resolving multiple versions of the same entity using OpenRefine.
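
To make the faceting and clustering steps above concrete, here is a minimal Python sketch of the idea. It only approximates OpenRefine's "key collision" clustering with a fingerprint-style key and is not OpenRefine's actual implementation. The function names (`fingerprint`, `text_facet`, `cluster`, `merge_clusters`) and the sample values are hypothetical, chosen for illustration.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, accents, and word order."""
    # Fold accented characters to their closest ASCII form.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and strip ASCII punctuation.
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Split on whitespace, drop duplicate tokens, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def text_facet(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()


def cluster(values):
    """Group values whose fingerprints collide; keep groups with 2+ variants."""
    groups = defaultdict(Counter)
    for v in values:
        groups[fingerprint(v)][v] += 1
    return [variants for variants in groups.values() if len(variants) > 1]


def merge_clusters(values):
    """Replace each clustered value with its cluster's most frequent variant."""
    mapping = {}
    for variants in cluster(values):
        canonical = variants.most_common(1)[0][0]
        for variant in variants:
            mapping[variant] = canonical
    return [mapping.get(v, v) for v in values]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the session's dataset.
    sample = [
        "Acme Corp.", "Acme Corp", "acme corp", "Acme Corp.",
        "Globex Inc", "Inc. Globex", "Globex Inc", "Initech",
    ]
    print(text_facet(sample))      # frequency of each distinct value
    print(cluster(sample))         # candidate clusters to review
    print(merge_clusters(sample))  # values after merging every cluster
```

Reviewing the output of `cluster` before applying a mapping corresponds to merging manually, while calling `merge_clusters` directly corresponds to merging everything in one go and trusting the clustering.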

Here are links used in the video: