bsdpower.com

Learn to use git correctly

It is appalling to me how many people, despite having used git for a long time, seemingly remain ignorant of basic git operations. Please, for the sake of everyone around you if not for your own, spend a bit of time learning and understanding this tool.

Reading source

Source code exists for two purposes: writing it, which typically accomplishes some task, and reading it, which typically is done during debugging or otherwise to answer questions like "why was this done?", "when was this done?", "what was this doing before?".

You read source every day, because you don't write programs from scratch. But maybe you only read the current source. Revision history gives a temporal dimension to the source, allowing you to examine what the source was at any point in the past. I'm sure you have heard of this notion, but how often do you actually browse source history?

If the answer is "pretty much never", we have some work to do.

The first tool I am going to introduce you to is gitk. It is distributed with git, although some systems package it separately, so you may need to install it. Try it now in whichever repository is most convenient:

gitk

This gives you a tree of commits in the current branch. Depending on the project, it may be a list of commits that does not branch at all, or it may be a tree with more merges than non-merge commits, where figuring out which commit belongs where is far from trivial.

The other gitk mode of operation you need to know is viewing history of all branches at the same time:

gitk --all

If your project has a single master branch, this view will be identical to the current branch gitk view. On a project with many branches, this view will be more involved and likely more difficult to follow.
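
If gitk is not available, for example over ssh, git log can draw a rough text version of both views. The sketch below builds a throwaway repository (the branch name and commit messages are made up) and prints the current-branch view followed by the all-branches view:

```shell
set -e
# Throwaway repository so the commands have something to show (hypothetical names)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature
git commit -q --allow-empty -m "Work on feature"
git checkout -q master

# Current branch only, roughly what plain gitk shows
current=$(git log --graph --oneline --decorate)
# Every branch, roughly what gitk --all shows
all=$(git log --graph --oneline --decorate --all)
printf '%s\n\n%s\n' "$current" "$all"
```

The text graph gets hard to read much sooner than gitk does on merge-heavy history, so I would treat it as a fallback rather than a replacement.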

Now, what's the point of looking at gitk, you ask? Here are some typical use cases:

Locating related commits

Suppose you identified a commit that changes behavior in some way, but this commit is part of a larger set of changes (think of a pull request). How can you figure out which pull request the commit was part of, and what other changes were made in that pull request?

Open gitk and enter the commit's hash into the appropriate box. Look up and down the history around the commit. In a repository with clean history, the branch on which the commit was made was branched off master and merged back to master. So you can follow the branch up and down to find the pull request number and identify other commits in the branch as well as see what they changed.

In a repository with messy history, especially for long lived branches, there might be other merges present. These additional merges make it more difficult to follow history. We'll get to examples of useless merges later.
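
The same lookup can be approximated on the command line. This is a sketch under assumptions: pull requests are merged with real merge commits, the main branch is called master, and the repository, branch name, and pull request number below are all invented for the demonstration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# A hypothetical pull request branch with two commits, merged without fast-forward
git checkout -q -b fix-login
git commit -q --allow-empty -m "Validate login form"
target=$(git rev-parse HEAD)   # the commit we pretend to have identified
git commit -q --allow-empty -m "Add login tests"
git checkout -q master
git merge -q --no-ff -m "Merge pull request #12 from fix-login" fix-login
git commit -q --allow-empty -m "Later work on master"

# Oldest merge on the path from the commit to master: usually the pull request merge
merge=$(git log --merges --ancestry-path --format=%H "$target"..master | tail -n 1)
git log -1 --format=%s "$merge"

# Every commit that merge brought in (reachable from its second parent, not its first)
git log --format=%s "$merge^1..$merge^2"
```

With extra merges inside the branch, the oldest merge on the path may not be the pull request merge, which is one more reason to prefer clean history.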

Identifying common and different commits between branches

Suppose you have two branches that share some commits. How do you identify where the branches diverged, how much they have in common and how much they differ? You can do all of this with command line git tools but it is typically much more efficient to use gitk. Open it with the branches in question:

gitk branch1 branch2

and navigate to one of the branches. Follow the branch's history until you see a split point - that's where the other branch splits off. Keep following the history of the original branch back to master, or whatever the main branch is.

If the history is clean, you often can (and I do) have a single instance of gitk --all running with all branches visible, and you can locate any history information you need from that view.
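
For completeness, here is roughly how the command line version of that comparison looks. The repository is a throwaway one with made-up commits; only branch1 and branch2 come from the example above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"
git commit -q --allow-empty -m "Shared work"
git branch branch1
git branch branch2
git checkout -q branch1
git commit -q --allow-empty -m "Only on branch1"
git checkout -q branch2
git commit -q --allow-empty -m "Only on branch2, first"
git commit -q --allow-empty -m "Only on branch2, second"

# Where did the branches diverge?
base=$(git merge-base branch1 branch2)
git log -1 --oneline "$base"

# Commits unique to each side: "<" marks branch1, ">" marks branch2
git log --oneline --left-right branch1...branch2

# The same as two numbers: commits only on branch1, then commits only on branch2
counts=$(git rev-list --left-right --count branch1...branch2)
echo "$counts"
```

This answers the questions, but you have to know which question to ask first. In gitk you see the shape of the history before you know what you are looking for.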

Reviewing commits

Now that you hopefully have a bit of an appreciation for clean history, the next two commands I am going to cover are git status and git diff. I'm sure you have used them, but you should use them each time you are about to commit something. I see people working like this all the time:

... make some changes in an editor ...
git add .
git commit -m 'Fixed bug whatever'

This workflow simply commits whatever changes are in your working tree. Ideally those are exactly the changes that should be committed, but every now and again they will include some debugging output, a syntax error or a mix of changes for unrelated bugs or features that should have been separate commits.

Instead your workflow should be this:

... make some changes in an editor ...

git status
... review the list of changed files, are any irrelevant files changed?
... should any untracked files be added?
... are there any untracked files that should NOT be added?

If there are no untracked files:

git diff
... review the changes you are about to commit.
... are there any irrelevant changes? are the changes complete?

# -a stages every modified or deleted tracked file, so no separate add step is needed
git commit -am 'Fixed bug whatever'

If there are untracked files:

git add .
git diff --cached
... review the changes you are about to commit ...

git commit -m 'Fixed bug whatever'

If your review of the changes finds that there are multiple unrelated changes that you are about to commit, stop and split the commit:

git reset
# add all changes in a file
git add <file>
# interactively add changes in all files
git add -p
# interactively add changes in a file
git add -p <file>
# repeat the addition for other files as necessary
# when done, review again:
git diff --cached

I alias git diff --cached to git dc to save my fingers:

git config --global alias.dc 'diff --cached'

... along with a bunch of other aliases that you can see here.
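
To make the split concrete, here is a small end to end run in a throwaway repository. The file names and commit messages are hypothetical. It splits by whole file because git add -p needs a person at the keyboard, but the sequence is the same.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
printf 'parser v1\n' > parser.txt
printf 'docs v1\n' > docs.txt
git add .
git commit -q -m "Initial commit"

# Two unrelated edits end up in the working tree and get staged together
printf 'parser v2\n' > parser.txt
printf 'docs v2\n' > docs.txt
git add .

# The review shows two unrelated changes, so unstage everything and split
git diff --cached --stat
git reset -q

git add parser.txt
git diff --cached --name-only   # only parser.txt is staged now
git commit -q -m "Fix parser"

git add docs.txt
git diff --cached --name-only   # only docs.txt is staged now
git commit -q -m "Update docs"

git log --oneline
```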

Rebasing and interactive rebasing

Rebasing is a git feature that allows you to change development history. Some other version control systems, Subversion for example, insist that once a commit is made it cannot be altered or removed. Git permits you to do anything to any of the existing commits - combine them, split them, change their contents or commit message, change the author, and - not to be overlooked - take them from one branch and apply them to another branch instead.

Rebasing is a topic well covered in various guides on the Internet, so I will not spend time explaining how to do it. I will, however, explain why you should do it.

Commit management

Each commit should ideally do one thing, do that thing completely, and be easily understandable. But when the "thing" is a major feature, completeness is at odds with readability. A 2,000 line commit is not readable, and its commit message probably overlooks many of the subtle gotchas hiding in those 2,000 lines of changes.

Therefore, you should favor small, readable commits over "complete" commits.

As we are all human, we don't always commit often enough. Especially if we have to think "are these really all of the changes I am going to make?", it is easy to have commits that are too large. With rebasing in your tool belt, you can commit as often as you take breaks. Each tiny change that you are mentally done with goes into a commit. Later, when you are finished with development, you can perform an interactive rebase and squash together commits that belong together, for example the implementation of some feature and a trivial bug fix in the same feature, where the value of keeping the bug fix separate from the feature is nil.
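
One way to do that squash, sketched in a throwaway repository with made-up file contents. Normally git rebase -i opens an editor where you change pick to squash or fixup by hand. Here the fix is committed with --fixup so that --autosquash arranges the list, and GIT_SEQUENCE_EDITOR=true accepts the list unchanged so the script runs unattended.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

printf 'feature, first attempt\n' > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# A trivial fix to the same feature, recorded as a fixup of the previous commit
printf 'feature, fixed\n' > feature.txt
git commit -q -a --fixup HEAD

# Fold the fixup into "Add feature"; the todo list is accepted as generated
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

git log --oneline
```

The history ends up with a single "Add feature" commit that already contains the fix.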

Similarly you have the tools to take a commit that you realize is too big, or does too much, and split it into several smaller commits. This typically does not happen often on short lived branches but it becomes crucial when dealing with complex changes on long lived branches.
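
Splitting an older commit follows the recipe from the git rebase documentation: stop at the commit, undo it while keeping its changes, and recommit in pieces. The sketch below uses a throwaway repository with hypothetical files. Interactively you would change pick to edit on the commit's line yourself. The sed command does that here only so that the example runs without an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

printf 'parser\n' > parser.txt
printf 'docs\n' > docs.txt
git add .
git commit -q -m "Add parser and docs"   # this commit does two things
printf 'more\n' >> parser.txt
git commit -q -a -m "Extend parser"

# Stop the rebase at the oversized commit (first line of the todo list)
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i HEAD~2

# Undo the commit but keep its changes in the working tree, then recommit in parts
git reset -q HEAD^
git add parser.txt
git commit -q -m "Add parser"
git add docs.txt
git commit -q -m "Add docs"

# Replay the remaining commits on top of the two new ones
git rebase --continue
git log --oneline
```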

Clean history

Once you understand rebasing, you can rebase your feature branches on the main development branch, usually master, before you open pull requests for them. This gets rid of merges of master into your feature branches, which are typically nothing but distracting noise.
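
A minimal sketch of that rebase, using a local throwaway repository with made-up files. With a real remote you would typically run git fetch first and rebase on origin/master, assuming the remote is named origin.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile master moves on
git checkout -q master
printf 'main\n' > main.txt
git add main.txt
git commit -q -m "Unrelated work on master"

# Instead of merging master into the feature branch, replay the feature on top of master
git checkout -q feature
git rebase -q master

git log --graph --oneline
```

The result is a straight line: the feature commit sits directly on top of the latest master, with no merge commit in between.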

With rebasing you will no longer have merge commits between the same feature branch in different repositories, if you are using multiple computers. Such merges are nearly universally noise because you are merging with yourself and as such the probability of conflicts is nearly zero.
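
For the multiple computers case, git pull --rebase is one way to get this behavior. The sketch simulates two computers with two clones of a local bare repository. The names server.git, laptop, and desktop, as well as the files, are invented for the demonstration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A shared repository plus one clone per computer (all names hypothetical)
git init -q --bare server.git
git clone -q server.git laptop 2>/dev/null   # cloning an empty repository prints a warning
(
  cd laptop
  git symbolic-ref HEAD refs/heads/feature
  git commit -q --allow-empty -m "Start feature"
  git push -q -u origin feature
)
git clone -q -b feature server.git desktop

# Work happens on both computers; the laptop pushes first
( cd laptop && printf 'a\n' > laptop.txt && git add laptop.txt &&
  git commit -q -m "Laptop work" && git push -q )
( cd desktop && printf 'b\n' > desktop.txt && git add desktop.txt &&
  git commit -q -m "Desktop work" )

# On the desktop, replay local commits on top of the fetched ones instead of merging
( cd desktop && git pull -q --rebase && git push -q )

git -C desktop log --graph --oneline
```

A plain git pull on the desktop would have recorded a merge of the branch with itself. With --rebase the desktop commit simply lands after the laptop commit.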

Success

If you got this far, your development histories are readable, your commits are readable and other people can easily read your code. Congratulations!