Cleaning Data with OpenRefine
This session covers the use of OpenRefine for data cleaning, focusing on resolving entity discrepancies:
- Data Upload and Project Creation: Import data into OpenRefine and create a new project for analysis.
- Faceting Data: Use text facets to group matching entries and see how frequently each address crumb occurs.
- Clustering Methodology: Apply clustering algorithms to merge similar entries with minor differences, such as punctuation.
- Manual and Automated Clustering: Learn to review and merge clusters one at a time, or merge them all in one go when you trust the system’s clustering accuracy.
- Entity Resolution: Resolve multiple versions of the same entity using OpenRefine, then save the cleaned data.
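
To make the faceting and clustering steps concrete, here is a minimal Python sketch of the idea behind key-collision clustering: normalise each value to a "fingerprint" and treat values that share a fingerprint as one entity. It is only an approximation of what OpenRefine does internally, not its actual implementation. The function names (`text_facet`, `fingerprint`, `find_clusters`, `merge_clusters`) and the sample addresses are hypothetical and chosen purely for illustration.

```python
import string
import unicodedata
from collections import Counter, defaultdict

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def text_facet(values):
    """Count how often each exact value occurs, like a text facet."""
    return Counter(values)


def fingerprint(value):
    """Build a normalised key: ASCII-folded, lowercased, punctuation-free,
    with unique tokens sorted alphabetically (an approximation)."""
    folded = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = folded.strip().lower().translate(_PUNCT_TABLE)
    return " ".join(sorted(set(cleaned.split())))


def find_clusters(values):
    """Group values sharing a fingerprint; keep groups with 2+ spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


def merge_clusters(values):
    """Replace every clustered value with its cluster's most frequent spelling."""
    replacement = {}
    for group in find_clusters(values):
        canonical = Counter(group).most_common(1)[0][0]
        for value in group:
            replacement[value] = canonical
    return [replacement.get(value, value) for value in values]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the session.
    addresses = [
        "Anna Nagar, Chennai",
        "anna nagar chennai",
        "Anna Nagar, Chennai",
        "T. Nagar",
        "T Nagar",
        "T Nagar",
        "Adyar",
    ]
    print(text_facet(addresses))
    print(find_clusters(addresses))
    print(merge_clusters(addresses))
```

Merging every cluster at once, as `merge_clusters` does, corresponds to the "in one go" option above. Inspecting the output of `find_clusters` first is the manual route, and it is the safer choice when distinct entities might share a fingerprint.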
Here are links used in the video:
