Git Intro
An introduction to using Git and GitHub for collaboration (workshop outline)
Why use Git and GitHub?
Git is a free, distributed version control system originally developed to coordinate a huge software project: the Linux kernel, whose development is supported by the Linux Foundation.
However, Git paired with GitHub is also great for academic and personal uses, such as collaborating over code for your research project, drafting articles, or creating lab websites.
Intro to version control:
Git for science:
- Reproducibility: records all steps and participants, ability to share complete history.
- Backup: it is very difficult to delete history; once you commit something, Git stores it with integrity (every object is checksummed).
- Collaborate: manage complicated work in parallel with multiple people/computers.
- Experiment: branching makes you feel brave and free to explore!
GitHub:
- Project management features
- Organizations
- Community (find code, connect)
- Sharing (DOIs for citable code / data)
Basic Workflow
Git is best learned hands-on. This intro uses the command line to give a clear, step-by-step overview of how Git works; in the future you may use a GUI tool or the Git functions built into your IDE.
Keep in your heart that Git keeps track of everything! Have peace of mind that you can’t lose your history once it is committed to a Git repo, unless you try really hard.
Be sure to .gitignore cache files and temporary outputs; just commit the source code, data, and notes.
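The basic local cycle (initialize, edit, stage, commit, inspect), including a .gitignore, can be sketched as the shell script below. It is a minimal sketch that runs in a throwaway temporary directory; the project name, file names, ignore patterns, and the user name and email are hypothetical placeholders.

```shell
set -e

# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
git init -q demo-project              # hypothetical project name
cd demo-project

# Identify yourself for this repo only (placeholder name and email).
git config user.name "Demo User"
git config user.email "demo@example.com"

# Ignore cache files and temporary outputs (example patterns).
printf '%s\n' '__pycache__/' '*.tmp' '.Rhistory' > .gitignore

echo "# Demo project" > README.md
echo "scratch output" > results.tmp   # matches *.tmp, so Git ignores it

git add .                             # stage everything that is not ignored
git commit -q -m "Add README and .gitignore"

git status --short                    # working tree is clean, so this prints nothing
git log --oneline                     # one line per commit
```

Only README.md and .gitignore end up in the commit; results.tmp stays on disk but never enters the history.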
Basic GitHub Collaborating Workflows
There are two basic workflows to collaborate on a GitHub repository:
- Add collaborators to repo (simple, typical of smaller projects)
  - On GitHub, click the “Settings” tab of your repository.
  - On the left menu, click “Collaborators”.
  - Add collaborators via email or GitHub username.
  - Collaborators will need to accept the invite.
  - Clone the repo to your local machine.
  - Now you all have equal control over the repo content: push, pull, merge, etc.
  - Using feature branches may be helpful to organize your work: create a branch, do some work, push the new branch to GitHub, then create a Pull Request to discuss with your team.
- Fork and Pull Request (more complex, centralized control, typical of bigger projects)
  - Navigate to your partner’s repo on GitHub.
  - Click “Fork” in the upper right.
  - Make changes in your personal fork of the repo.
  - On your personal fork, click the “New pull request” button.
  - Review the changes, then click the “Create pull request” button.
  - Write a message explaining exactly what changes you made and why.
  - The original repo will now have a PR that collaborators can view and comment on. Only someone with write access to the original repo (such as the owner) can accept the request and merge it.
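For the first workflow (adding collaborators), the command-line side of the feature-branch routine might look like the shell sketch below. It assumes a local bare repository as a stand-in for the GitHub repo so that it runs offline; the branch name add-methods, the file names, and the user names and emails are hypothetical. Opening the Pull Request itself still happens on the GitHub website.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for a GitHub repo that already has a README:
# a seed repo with one commit, published as a bare repository.
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Owner"; git config user.email "owner@example.com"
echo "# Shared project" > README.md
git add README.md && git commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/shared.git"

# A collaborator clones the shared repo after accepting the invite.
git clone -q "$tmp/shared.git" "$tmp/collaborator"
cd "$tmp/collaborator"
git config user.name "Collaborator"; git config user.email "collab@example.com"

# Feature branch: create it, do some work, push it so a Pull Request can be opened.
git checkout -q -b add-methods        # hypothetical branch name
echo "Methods go here." > methods.md
git add methods.md
git commit -q -m "Draft methods section"
git push -q -u origin add-methods     # the branch now exists on the shared repo
```

The -u flag records origin/add-methods as the branch's upstream, so later plain git push and git pull calls on this branch know where to go.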
See GitHub Help Fork a Repo, About Pull requests, and Understanding the GitHub Flow for more info. Also, check out Atlassian’s Comparing Workflows or GitHub About collaborative development models for more options.
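The fork workflow can be sketched in the same way. Here a bare copy of a local repository stands in for clicking “Fork” on GitHub (an assumption so the script runs offline), and the original repo is added as a second remote named upstream, which is a common convention rather than a requirement. All paths, names, and emails are hypothetical.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for your partner's GitHub repo: a seed repo published as a bare repo.
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Partner"; git config user.email "partner@example.com"
echo "# Partner project" > README.md
git add README.md && git commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/original.git"

# Clicking "Fork" makes a server-side copy; a bare clone mimics that here.
git clone -q --bare "$tmp/original.git" "$tmp/fork.git"

# Clone *your fork*, and also track the original repo as "upstream".
git clone -q "$tmp/fork.git" "$tmp/mywork"
cd "$tmp/mywork"
git config user.name "You"; git config user.email "you@example.com"
git remote add upstream "$tmp/original.git"
git fetch -q upstream                 # lets you bring in later changes from the original

# Make changes and push them to your fork; the Pull Request is opened on GitHub from there.
echo "Fixed a typo." >> README.md
git commit -q -am "Fix typo in README"
git push -q origin HEAD
```

Note that the push only touches your fork; the original repo is unchanged until someone with write access merges the Pull Request.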
Collaborating Practice
Add collaborators:
- create a new repository with README
- clone to your local machine
- add partner as collaborator
- accept partner’s invite (check notifications)
- clone partner’s repo
- make a change locally to your partner’s README and push
Auto merge:
- make a change locally in your repo, creating a new file test.txt
- try to push
- pull (auto merge message)
- push
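The auto-merge exercise can be rehearsed on one machine with the sketch below, where two local clones play you and your partner and a bare repository stands in for GitHub (assumptions for an offline demo; names and file contents are placeholders). Because the two people changed different files, git pull merges without help. The --no-rebase flag asks explicitly for a merge, since recent Git versions may stop and ask how to reconcile divergent branches when no pull strategy is configured, and --no-edit accepts the default merge message.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for "create a new repository with README" on GitHub.
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Seed"; git config user.email "seed@example.com"
echo "# Practice repo" > README.md
git add README.md && git commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/shared.git"

# Two collaborators clone the shared repo (hypothetical names).
for who in you partner; do
  git clone -q "$tmp/shared.git" "$tmp/$who"
  git -C "$tmp/$who" config user.name "$who"
  git -C "$tmp/$who" config user.email "$who@example.com"
done

# Partner edits README and pushes first.
cd "$tmp/partner"
echo "Partner was here." >> README.md
git commit -q -am "Update README"
git push -q

# You add a new file; the push is rejected because the remote has moved on.
cd "$tmp/you"
echo "hello" > test.txt
git add test.txt && git commit -q -m "Add test.txt"
git push -q || echo "push rejected: pull first"

# Different files changed, so the pull merges automatically; then push again.
git pull -q --no-rebase --no-edit
git push -q
```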
Conflicts:
- pull your partner’s repo
- make a change to test.txt locally in your partner’s repo
- push
- make a change to test.txt locally in your repo
- try to push
- pull, enter merging state
- fix conflicts
- commit
- push
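The conflict exercise can be rehearsed the same way. In this sketch (again with local clones for you and your partner and a bare repository standing in for GitHub; all names and contents are placeholders), both people edit the same line of test.txt, so the pull stops in a merging state and the conflict has to be fixed by hand before committing.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the shared GitHub repo, already containing test.txt.
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Seed"; git config user.email "seed@example.com"
echo "original line" > test.txt
git add test.txt && git commit -q -m "Add test.txt"
git clone -q --bare "$tmp/seed" "$tmp/shared.git"

for who in you partner; do
  git clone -q "$tmp/shared.git" "$tmp/$who"
  git -C "$tmp/$who" config user.name "$who"
  git -C "$tmp/$who" config user.email "$who@example.com"
done

# Partner changes the line in test.txt and pushes first.
cd "$tmp/partner"
echo "partner version" > test.txt
git commit -q -am "Partner edits test.txt"
git push -q

# You change the same line; the push is rejected and the pull stops with a conflict.
cd "$tmp/you"
echo "your version" > test.txt
git commit -q -am "You edit test.txt"
git push -q || echo "push rejected: pull first"
git pull -q --no-rebase --no-edit || echo "merge conflict in test.txt"

# test.txt now holds both versions between <<<<<<<, =======, and >>>>>>> markers.
# Fix it by writing the content you actually want (here: keep both lines).
printf '%s\n' "your version" "partner version" > test.txt
git add test.txt                      # mark the conflict as resolved
git commit -q --no-edit               # finish the merge with the prepared message
git push -q
```

In practice you would open test.txt in an editor, delete the conflict markers, and keep whichever lines you and your partner agree on.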
Art of the commit: it is best to create small, targeted commits when collaborating. Each commit should do one specific thing, making it easier for others to understand your work and navigate the history if necessary.
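As a small illustration of this advice, the sketch below (a throwaway local repo with hypothetical file names) has two unrelated changes waiting in the working tree and stages each one on its own, so the history records two focused commits instead of one mixed commit. For unrelated changes inside a single file, git add -p lets you pick hunks interactively.

```shell
set -e
cd "$(mktemp -d)"
git init -q small-commits
cd small-commits
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "# Paper" > README.md
git add README.md && git commit -q -m "Initial commit"

# Two unrelated changes are sitting in the working tree...
echo "Typo fixed." >> README.md          # a README fix
echo "x <- rnorm(10)" > analysis.R       # a new analysis script (hypothetical)

# ...so stage and commit them separately: one purpose per commit.
git add README.md
git commit -q -m "Fix typo in README"
git add analysis.R
git commit -q -m "Add first analysis script"

git log --oneline                        # each line describes one specific change
```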
RStudio Integration
RStudio has built-in integration with Git, allowing you to complete the basic commands through the interface.
Basically, an RStudio “Project” corresponds to a Git repo.
You can directly clone a repo from GitHub using File > New Project > Version Control > Git.
When you open a Project that is a Git repo, you should see the “Git” menu in the interface, giving access to the Git commands.
- RStudio support, Version Control with Git and SVN
- SWC, Using Git from RStudio
- Happy Git and GitHub for the useR
Project management features
GitHub adds many handy web-based features to manage your projects / Git repositories:
- Issues. Create an issue to discuss and track ideas, bugs, projects, requests, etc. Issues can be assigned to people, labeled, and more. They also allow people outside of the project to report problems with your code. Be sure to create checklists in the first comment box ("- [ ] step"); they become clickable and show progress in the Issue view.
- Projects. Create Trello-board-like lists to organize work.
- Wiki. Simple wiki-style documentation that can be edited by your collaborators (note: written in Markdown, not wikitext).
To make the most of these features, you will want to learn Markdown because it’s a great way to write and is used everywhere on GitHub.
Issues, PRs, and commits can be mentioned in any GitHub comment (e.g., #12 for issue or PR number 12) and will be turned into reference shortlinks.
GitHub users can be mentioned using @username and will be notified of your comment.
gh-pages
GitHub also offers free web hosting for your project, organization, or personal profile. Check out the workshop Go-go gh-pages!
Also, use GitHub Gist to instantly share simple notes, outlines, snippets, etc.
Citable Code
GitHub repos can be integrated with Zenodo to issue a DOI. DOIs are persistent identifiers used in academic writing to cite articles and other works. Having a DOI for your code makes it easier to track citations and impact. Learn how at Making Your Code Citable.
License
Choosing a license is important if you want to share data, code, and content. It ensures users have the legal right to reuse your work, and reserves your rights as the creator. The FAIR Principles point out that having a clear license is a requirement for legal interoperability of data.
By convention, add a file named LICENSE or LICENSE.md to a repository so that people can easily find the license.
GitHub can automatically generate some open source licenses for you, see Adding a license to a repository.
- Choose an open source license (for software)
- Creative Commons Choose a License (more common for text and data)
Resources
- Software Carpentry, Version Control with Git
- Data Carpentry for discipline specific data lessons
- Git Book
- Git for Teams
- Safari Books Online