What is OpenRefine?

OpenRefine is a free, open source Java application that runs offline, using a web browser as its interface.

[Image: the OpenRefine interface]

The original creator, David Huynh, describes Refine as:

“A power tool for working with messy data”

  • more powerful than a spreadsheet
  • more interactive and visual than scripting
  • more provisional / exploratory / experimental / playful than a database

David Huynh

Exciting Trailers from Google!

If you want a visual introduction, check out the trailers that Google created for an earlier version, called Google Refine.

Tabular Data

Refine can handle all sorts of data from all sorts of sources.

Once imported, the data is represented as a table, using this basic terminology:

[Image: table parts]

Refine is efficient enough to provide comfortable performance on projects with hundreds of thousands of rows (although you may want to increase the memory allocated to Java).

Use Cases

Explore - navigate and evaluate quality with visualizations and filters that help dig deeply into the data so you can get to know it better…

Clean - efficiently discover and fix inconsistencies with faceting, clustering, cell transforms, GREL expressions…
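To give a feel for what clustering does, here is a minimal Python sketch of the "key collision" idea: each value is reduced to a normalized key, and values that share a key are grouped as likely variants of the same thing. This is a simplified illustration, not Refine's actual implementation; the function names `fingerprint` and `cluster` and the sample values are made up for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key: lowercase, strip accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining accent marks left behind by NFKD decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by key; keep only groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}


# Hypothetical messy column values.
names = [
    "Université de Montréal",
    "universite de montreal ",
    "Montreal, Universite de",
    "McGill University",
]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

In Refine itself you would review each suggested cluster and choose the value to merge to, rather than merging automatically.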

Transform - easily change formats, subset, or reshape with split/join multi-valued cells, split columns, transpose columns/rows…
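As a rough picture of what splitting multi-valued cells means, the Python sketch below turns one row holding several values in a single cell into several rows with one value each. The helper `split_multivalued` and the sample data are hypothetical, and leaving the other columns blank on the added rows is an assumption about how the result is laid out.

```python
def split_multivalued(rows, column, separator):
    """Return new rows where each value in `column` gets its own row."""
    result = []
    for row in rows:
        parts = [p.strip() for p in row[column].split(separator)]
        for i, part in enumerate(parts):
            if i == 0:
                # The first value stays on the original row.
                new_row = dict(row)
            else:
                # Assumption: added rows leave the other columns blank.
                new_row = {key: None for key in row}
            new_row[column] = part
            result.append(new_row)
    return result


# Hypothetical row with three authors packed into one cell.
rows = [{"title": "Messy Data", "authors": "Ada; Grace; Linus"}]
long_rows = split_multivalued(rows, "authors", ";")
for r in long_rows:
    print(r)
```

Joining multi-valued cells is the reverse step: the values are collected back into one cell with a separator of your choice.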

Extend - enrich data by combining files, merging projects, fetching URLs, reconciliation with online databases…

Automate - record and preserve your processing routine for transparency, then automate reuse by exporting the operation history as JSON!
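Because the exported history is plain JSON, it can be read by other tools as well. The Python sketch below lists the steps in a small history excerpt. The excerpt is a made-up example: the column names are hypothetical, and the fields shown (`op`, `columnName`, `expression`, `description`) are assumed to resemble what an export contains, so check them against your own file.

```python
import json

# Hypothetical excerpt of an exported operation history (values are made up).
history_json = """
[
  {"op": "core/text-transform", "columnName": "name",
   "expression": "value.trim()",
   "description": "Text transform on cells in column name using expression value.trim()"},
  {"op": "core/column-removal", "columnName": "notes",
   "description": "Remove column notes"}
]
"""


def summarize(history_text):
    """Return (operation id, description) pairs in the order they were recorded."""
    operations = json.loads(history_text)
    return [(step["op"], step.get("description", "")) for step in operations]


summary = summarize(history_json)
for op, description in summary:
    print(f"{op}: {description}")
```

A listing like this can sit alongside a dataset as a readable record of how it was processed.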