What is Version Control?

Do you have files like final.txt, final_revised.txt, final_revised2.txt, final_revised2_revised.txt?

final doc comic
'Piled Higher and Deeper' by Jorge Cham www.phdcomics.com

This is a “local version control system” that depends on your memory and organization to avoid errors and utter confusion. Luckily, software can handle this task, from basic “track changes” features to big centralized systems such as SVN. Automated version control WILL make your life better!

Why Git?

Git is a free, distributed version control system originally developed for coordinating huge software development projects (specifically the Linux kernel). However, it is fast and flexible enough to be used on projects of any scale, from your personal notes to your research lab’s code, and it offers many benefits beyond “track changes”.

Rather than storing a series of copies of a file with different filenames, Git captures a snapshot of your project each time you commit. Then it permanently stores this series of snapshots as your project’s history. Try to think of your changes as separate from the document itself. The current file that you see in your folder is made up of a specific set of those changes, while the complete history of your project is safely stored in a hidden .git directory.

file versions
Adapted from: Software Carpentry, Version Control with Git
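
The snapshot idea can be sketched in a few lines of Python. This is only a toy model to illustrate the concept, not how Git actually stores data: the ToyRepo class and its commit and checkout methods are hypothetical names made up for this example.

```python
class ToyRepo:
    """A toy model of snapshot-based history (not Git's real storage format)."""

    def __init__(self):
        # Append-only list: committing adds a snapshot, nothing is overwritten.
        self.commits = []

    def commit(self, files, message):
        """Record a copy of the whole project state and return its id."""
        snapshot = {
            "parent": len(self.commits) - 1 if self.commits else None,
            "message": message,
            "files": dict(files),  # copy, so later edits do not alter history
        }
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, commit_id):
        """Return the project files as they were at the given snapshot."""
        return dict(self.commits[commit_id]["files"])


repo = ToyRepo()
first = repo.commit({"final.txt": "Draft one\n"}, "Start the draft")
second = repo.commit({"final.txt": "Draft two\n"}, "Revise the draft")

# The older version is still available, with no final_revised.txt needed.
print(repo.checkout(first)["final.txt"])
print(repo.checkout(second)["final.txt"])
```

One file name, many snapshots: the history lives in the repository rather than in a pile of renamed copies.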

Each commit records the author, their email, and the changes made, providing transparency and credit for your project, as well as checksums that ensure no information can be lost or corrupted without detection. Unlike “track changes”, this history stays with the repository permanently.
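
To see why checksums catch corruption, here is a small Python sketch of how Git identifies the contents of a file. It assumes the default SHA-1 object format (newer repositories can be configured to use SHA-256 instead); the function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file contents the way Git names a blob (SHA-1 object format assumed)."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"hello world\n"
tampered = b"hello w0rld\n"  # a single character differs

print(git_blob_id(original))
print(git_blob_id(tampered))
print("ids match:", git_blob_id(original) == git_blob_id(tampered))
```

Changing even one byte produces a completely different id, so a damaged or altered file no longer matches the id recorded in the history.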

Git is distributed, meaning every copy of a repository contains the complete history. This is great for collaboration, fast performance, and offline usage. Git can efficiently branch, diff, and automatically merge different sets of changes together, enabling people to work in parallel and sync their files.

branch and merge versions
Adapted from: Software Carpentry, Version Control with Git
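
The merge idea can be sketched as a simplified three-way merge in Python. Git merges line by line within files; this toy version only compares whole files, and the function name merge_snapshots and the file names are hypothetical.

```python
def merge_snapshots(base, ours, theirs):
    """Toy three-way merge of {filename: content} dicts, at whole-file level."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o  # both sides agree
        elif o == b:
            result = t  # only their side changed the file
        elif t == b:
            result = o  # only our side changed the file
        else:
            conflicts.append(name)  # both changed it differently
            continue
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"notes.md": "v1", "code.py": "v1"}
ours = {"notes.md": "v2", "code.py": "v1"}                       # we edited notes.md
theirs = {"notes.md": "v1", "code.py": "v2", "data.csv": "new"}  # they edited code.py

merged, conflicts = merge_snapshots(base, ours, theirs)
print(merged)
print(conflicts)
```

Because each side is compared against the common starting point, changes made in parallel to different files combine automatically, and only files changed differently on both sides need a human decision.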

With Git you can make changes and experiment without fear! When you commit to a repository, Git only adds data; it never deletes information. This makes almost everything undoable!

A Git repository is often called a “repo”.

What is GitHub?

GitHub is a popular web service for hosting Git repositories, with benefits! It provides a handy web interface for editing and collaborating on repos, as well as built-in project management features and static web hosting. Free accounts can host both public and private repositories; paid plans add further features.

GitHub is where the distributed part of Git gets really cool. With a central repo hosted on GitHub, you can easily collaborate with anyone in the world, or yourself across multiple computers, and never get out of sync with your project!

There are other version control systems and cloud repository hosts; check the Resources for more options.

Plain Text Workflow

Git works best when tracking plain text files. All code (.c, .py, .r, .html, .md, etc.) is plain text. Most images, video, and proprietary document formats (such as Word’s .docx) are not. Git can tell exactly what changed in a plain text file, but it cannot understand the insides of a binary file. It will know when a binary file has changed, but it cannot show you the exact differences. Thus, Git is not optimal for managing Word docs, PDFs, or other binary files.
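
The difference is easy to see with Python's standard difflib module. This is a sketch of the general idea rather than Git's own diff implementation, and the file name and contents are made up for the example.

```python
import difflib

old = "Git tracks changes.\nIt works on any text.\n"
new = "Git tracks changes.\nIt works best on plain text.\n"

# For plain text, a line-by-line comparison shows exactly what changed.
diff = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="notes.md (old)",
    tofile="notes.md (new)",
))
print("".join(diff))

# For binary data there are no meaningful lines to compare,
# so the most we can usefully report is whether the bytes differ.
old_binary = bytes([0x50, 0x4B, 0x03, 0x04, 0x00])
new_binary = bytes([0x50, 0x4B, 0x03, 0x04, 0x01])
print("binary file changed:", old_binary != new_binary)
```

The text diff marks removed lines with a minus sign and added lines with a plus sign, which is the kind of detail you lose when a document is stored in a binary format.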

Instead of using proprietary formats, consider a plain text writing workflow using Markdown or LaTeX. It simplifies your life and makes writing easier and more sustainable!

Example Use

Software Carpentry, “Version Control with Git” lesson:

Or check the source code of this site!