"Git Started" with Git!

Tutorial courtesy of Joseph Hobbs.

This article will teach you how to use Git, a software version control system.

What is Git?

Git is a free and open-source Version Control System (VCS) used to manage software projects.

Section 1: How does Git work?

Git tracks changes made to a working directory over time, enabling multiple people to work together on projects simultaneously. Git captures changes in a series of snapshots called commits. Each commit records the state of the tracked files at that moment, along with information such as the author, a timestamp, and a message describing the change.

Git chains commits together into a history that records the work done in the working directory. A working directory together with its history is referred to as a Git repository.

Section 2: How does Git store history?

Git stores history by linking each commit to one or more parent commits. This creates a continuous "chain" of commits, starting at an initial commit.

For example, consider four commits: A, B, C, and D. Let A be the initial (first) commit to the repository. The repository may have a structure like the following.

A <- B <- C <- D

We see here that commit A was made first and has no parent. Commit B was created after commit A and uses commit A as its parent. Similarly, commit C uses commit B as its parent and commit D uses commit C as its parent.
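To see these parent links in a real repository, here is a minimal shell sketch that builds the A <- B <- C <- D chain in a throwaway directory. It assumes a Unix-like shell and Git 2.28 or later (for git init -b), and it uses a placeholder user name and email. The commits are empty (--allow-empty) because only the history matters here.

```shell
#!/bin/sh
# Sketch: build the history A <- B <- C <- D in a throwaway repository.
# Assumes Git 2.28+ (for `git init -b`); the identity below is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

for name in A B C D; do
    # --allow-empty records a commit even though no files changed.
    git commit -q --allow-empty -m "$name"
done

# One line per commit, newest first: abbreviated hash, parent hash (%p),
# and message. Commit A has an empty parent field because it has no parent.
git log --format='%h parent=%p %s'
```

Reading the output from the bottom up retraces the chain: each commit names the one before it as its parent.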

In reality, Git uses SHA-1 hashes to refer to each of its commits. SHA-1 hashes look something like this: bb112b7886eeadf91a4a2e2da230b47717a74b80. That's a mouthful! Fortunately, Git usually abbreviates hashes to their first seven characters, like this: bb112b7, and you may refer to a commit by such an abbreviation as long as it is unique within the repository.
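The following sketch prints a commit's full hash next to its abbreviation and shows that Git accepts either one. It assumes Git 2.28+, a placeholder identity, and a throwaway repository using the default SHA-1 object format.

```shell
#!/bin/sh
# Sketch: compare a full commit hash with its abbreviation.
# Assumes Git 2.28+ and the default SHA-1 object format; identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

full=$(git rev-parse HEAD)            # full 40-character SHA-1 hash
short=$(git rev-parse --short HEAD)   # unique prefix (at least 7 characters by default)
echo "full:  $full"
echo "short: $short"

# Both names resolve to the same commit.
git log -1 --format='%H %s' "$short"
```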

Practical Exercise

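Let's create a commit in a new repository. First, create an empty directory, move into it, and initialize an empty Git repository there.

$ git init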

In order to create a commit, you must first make a change.

Create a text file called hello.txt containing the text Hello, world!

Stage all changes in the directory using the following command. Git only includes staged changes in a commit, so you must always stage changes before committing.

$ git add .

Now, commit changes to the repository with the message "Create hello.txt".

$ git commit -m "Create hello.txt"

You will see a message similar to this one. The commit hash will be different, and your branch may be named main instead of master.

[master (root-commit) eb4db36] Create hello.txt
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt
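The whole exercise can also be run as one script. This is a sketch that assumes a Unix-like shell and Git 2.28+; it works in a temporary directory, sets a placeholder identity for that repository only, and writes the file with printf.

```shell
#!/bin/sh
# Sketch of the exercise above, run in a throwaway directory.
# Assumes Git 2.28+ (for `git init -b`); the identity is a placeholder
# set for this repository only.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

# Create hello.txt containing the line "Hello, world!".
printf 'Hello, world!\n' > hello.txt

git add .                              # stage every change in the directory
git commit -q -m "Create hello.txt"    # record the staged snapshot

# Show the new commit's hash and message plus a per-file change summary.
git log --stat --format='%h %s'
```

The final git log --stat command prints the commit's abbreviated hash and message, followed by a summary showing that hello.txt gained one line.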

Section 3: Creating Branches

We will now learn a critical Git concept: branches.

Most simply, a branch is a pointer to a commit.

Recall our previous example.

A <- B <- C <- D (master)

Every Git repository has at least one branch. The first branch is usually called master or main, depending on the developer's preference and Git configuration. We see in this situation that the master branch points to commit D.

When we create commit E with parent commit D, Git automatically moves the master branch pointer forward to commit E.

A <- B <- C <- D <- E (master)
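A quick way to watch a branch pointer move is to compare what it points to before and after a commit. This sketch, assuming Git 2.28+ and a placeholder identity, records master's commit, creates commit E, and checks master again.

```shell
#!/bin/sh
# Sketch: watch the master pointer move when a commit is made.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "D"
before=$(git rev-parse master)   # commit D

git commit -q --allow-empty -m "E"
after=$(git rev-parse master)    # commit E, whose parent is D

echo "master: $before -> $after"
```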

Let's say you've been making commits to your repository for a while and it's now time to introduce a new feature. It is common practice to do this work on a separate branch and merge your changes in later. This lets you work out any bugs in the new feature before releasing it with the rest of the software.

Before we get our hands dirty, let's look at an example.

Suppose we want to add a feature called foo to our code. Implementing foo will likely introduce a large number of bugs, and we want to work out all of these bugs before introducing the feature to the master branch. We will do so by creating a new branch called foo.

A <- B <- C <- D <- E (master, foo)

We see that we haven't actually changed the structure of our repository; we've only added a second pointer to commit E. However, let's see what happens when we create commit F on branch foo.

                      --- F (foo)
                     /
A <- B <- C <- D <- E (master)

We now have two diverging branches, and we can make changes to the repository on branch foo without affecting the working code on master.

Let's create three more commits on foo.

                      --- F <- G <- H <- I (foo)
                     /
A <- B <- C <- D <- E (master)

We've added commits G, H, and I. These three commits allowed us to fully implement the feature foo without any disruptions to the master branch.
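The same history can be reproduced in a scratch repository. This sketch assumes Git 2.28+ and a placeholder identity, and uses empty commits named after the letters in the diagrams. Because master has not moved since E, git log --graph draws the history as a single line, with master labeling E and foo labeling I.

```shell
#!/bin/sh
# Sketch: reproduce the foo diagram with empty commits named after its letters.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "E"
git branch foo             # foo and master both point at E
git checkout -q foo        # future commits go to foo

for name in F G H I; do
    git commit -q --allow-empty -m "$name"
done

# Draw every branch's history as a text graph; master still points at E.
git log --oneline --graph --all
```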

Practical Exercise

Let's create a new branch called foo.

$ git branch foo

Now, we will check out this new branch. In Git, checking out means "moving" to a given commit or branch and updating the files in the working directory to match the state of that commit. Because foo points to the same commit as master, checking out foo won't change any files. However, we still need to check it out so that all future commits are made on foo and not on master.

$ git checkout foo

We are now checked out to foo and can modify, stage, and commit code as usual.

After making our changes, such as fixing a bug, we stage them, like so.

$ git add .

We then commit changes to the repository on the foo branch.

$ git commit -m "Fixed the bug"
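Creating and checking out a branch is common enough that git checkout -b does both in one step (Git 2.23 and later also offer git switch -c for the same purpose). The sketch below, assuming Git 2.28+ and a placeholder identity, uses the shortcut and then commits on foo; the file name feature.txt is hypothetical.

```shell
#!/bin/sh
# Sketch: create and check out a branch in one step.
# Assumes Git 2.28+; the identity and feature.txt are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b foo     # same effect as `git branch foo` then `git checkout foo`
git branch --show-current  # prints the checked-out branch: foo

printf 'new feature\n' > feature.txt
git add .
git commit -q -m "Add feature file"
```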

Section 4: Checking Out

Very often, when using Git, we want to change where we are in the revision history. Let's look at an example.

A <- B <- C <- D (master)

Right now, we are looking at (checked out to) the master branch, which is currently pointing to commit D. The commit we are checked out to is referred to as HEAD. Because we are currently checked out to a branch, we have an attached HEAD.

A <- B <- C <- D (master <- HEAD)

Let's detach HEAD by checking out to commit B. Notice that, even though B is contained within the master branch, we will not be checked out to the master branch because master points at commit D.

$ git checkout B

Here, B stands for commit B's hash (or its seven-character abbreviation). Without modifying commit D, this will change the files in our working directory to reflect commit B.

A <- B (HEAD) <- C <- D (master)
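The following sketch, assuming Git 2.28+, a placeholder identity, and empty commits named A to D, detaches HEAD at commit B and then reattaches it. It uses git symbolic-ref to test the HEAD state: with -q, that command succeeds only while HEAD refers to a branch.

```shell
#!/bin/sh
# Sketch: detach HEAD at commit B, then reattach it to master.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C D; do
    git commit -q --allow-empty -m "$name"
done

b=$(git rev-parse --short HEAD~2)   # hash of B, two commits before D
git checkout -q "$b"                # HEAD now holds B's hash, not a branch name

# `git symbolic-ref -q HEAD` succeeds only while HEAD refers to a branch.
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD is $state at $(git rev-parse --short HEAD)"

git checkout -q master              # reattach HEAD to master
echo "HEAD is attached to $(git symbolic-ref --short HEAD)"
```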

Again, this is referred to as a detached HEAD. We can easily reattach HEAD by checking out any branch (for example, master). If we create commit E in this state, the following will occur.

       --- E (HEAD)
      /
A <- B <- C <- D (master)

Notice that we have branched our commit history, but we haven't actually created another branch because we haven't created a branch pointer. If we wanted to create another branch here named bar and point it at commit E, we could easily do so using the following command.

$ git branch bar

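This creates the branch bar at commit E. HEAD, however, remains detached: it points directly at commit E rather than at the branch bar, so we would also need to run git checkout bar to attach it.

       --- E (bar, HEAD)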
      /
A <- B <- C <- D (master)
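This sketch, assuming Git 2.28+ and a placeholder identity, reproduces the steps above: it detaches HEAD at B, creates commit E, and runs git branch bar. It then confirms that HEAD stays detached until bar is checked out.

```shell
#!/bin/sh
# Sketch: commit on a detached HEAD, then name that commit with a branch.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C D; do
    git commit -q --allow-empty -m "$name"
done

git checkout -q HEAD~2                 # detach HEAD at B
git commit -q --allow-empty -m "E"     # E's parent is B; no branch moves
git branch bar                         # bar now points at E, but is not checked out

if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "after git branch bar, HEAD is $state"

git checkout -q bar                    # attach HEAD to bar
```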

As a quick note, it is possible to be checked out directly to commit D (for example, by running git checkout with D's hash) and still have a detached HEAD, like so.

       --- E (bar)
      /
A <- B <- C <- D (master, HEAD)

Notice that, because HEAD doesn't point to master, we are currently in a detached HEAD state. Any commits made in this state will not update the master branch. For example, creating commit F in this state will have the following effect.

       --- E (bar)
      /
A <- B <- C <- D (master) <- F (HEAD)
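To confirm that committing on a detached HEAD leaves master where it was, this sketch, assuming Git 2.28+ and a placeholder identity, checks out D by hash, creates F, and compares the pointers. Because no branch points to F, switching away from it without creating a branch leaves F reachable only through Git's reflog, and it can eventually be garbage-collected.

```shell
#!/bin/sh
# Sketch: commit on a detached HEAD that sits at the same commit as master.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C D; do
    git commit -q --allow-empty -m "$name"
done

d=$(git rev-parse master)
git checkout -q "$d"                   # detached HEAD at D, not attached to master
git commit -q --allow-empty -m "F"     # F's parent is D

echo "master: $(git rev-parse --short master)"   # still D
echo "HEAD:   $(git rev-parse --short HEAD)"     # F
```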

Section 5: Merging Branches

Having finished our work on foo, we now want to merge foo into master.

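Because master has not moved since foo branched off at commit E, a plain git merge foo would simply fast-forward master to commit I without creating a new commit. To record the merge explicitly, we can run git merge --no-ff foo, which creates commit J, called a merge commit, on the master branch and moves the master branch pointer up to it.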

                      --- F <- G <- H <- I (foo) <--
                     /                              \
A <- B <- C <- D <- E <----------------------------- J (master)

Notice that commit J has two parents: I and E. This means that the master branch now contains the work from both branches.
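Here is a minimal sketch of that merge, assuming Git 2.28+ and a placeholder identity. Because master has not moved since E, a plain git merge foo would fast-forward; the sketch passes --no-ff so that Git creates a merge commit like J, and --no-edit so that it accepts the default merge message without opening an editor.

```shell
#!/bin/sh
# Sketch: merge foo into master with an explicit merge commit.
# Assumes Git 2.28+; the identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "E"
git checkout -q -b foo
for name in F G H I; do
    git commit -q --allow-empty -m "$name"
done
git checkout -q master

# master has not moved since E, so a plain `git merge foo` would fast-forward.
# --no-ff forces a merge commit (J); --no-edit accepts the default message.
git merge -q --no-ff --no-edit foo

# Print J's hash followed by its parents' hashes (E, then I).
git rev-list --parents -n 1 HEAD
```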

Unfortunately, merges can also result in merge conflicts. A conflict happens when Git doesn't know how to automatically combine the work performed on two separate branches. Let's look at an example.

                      --- F (foo)
                     /
A <- B <- C <- D <- E <- G (master)

Notice that, in the diagram above, commits F and G have both been made downstream of the most recent common ancestor, commit E. If commits F and G modify the same lines of the same file in different ways, Git won't know which change to accept. This is called a merge conflict. Merge conflicts typically have to be resolved manually.

Practical Exercise

Let's check out the master branch.

$ git checkout master

Now, we attempt to merge foo.

$ git merge foo

Git will tell us via a terminal message if there are merge conflicts, and in which files they occurred. We can inspect the conflicting changes by running the diff command.

$ git diff

We can now enter each conflicted file and manually resolve the conflict. A Git merge conflict looks like this.

<<<<<<< HEAD
content on the `master` branch
=======
content on the `foo` branch
>>>>>>> foo

Between <<<<<<< HEAD and ======= is the content on the current branch (in this case, master). Between ======= and >>>>>>> foo is the content on the incoming branch (in this case, foo). To resolve the conflict, remove the markers and replace the content with your desired resolution. This may involve keeping the current branch's changes, keeping the incoming branch's changes, discarding both, or combining them. Once every conflict is resolved, stage the resolved files with git add and run git commit to complete the merge.
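The sketch below creates a conflict on purpose and resolves it from start to finish. It assumes Git 2.28+ and a placeholder identity; the file notes.txt and its contents are hypothetical, and the resolution simply keeps both changes.

```shell
#!/bin/sh
# Sketch: create a merge conflict on purpose, then resolve it.
# Assumes Git 2.28+; the identity and notes.txt are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'original line\n' > notes.txt
git add notes.txt
git commit -q -m "E: add notes.txt"

git checkout -q -b foo
printf 'change made on foo\n' > notes.txt
git commit -q -a -m "F: edit notes.txt on foo"

git checkout -q master
printf 'change made on master\n' > notes.txt
git commit -q -a -m "G: edit notes.txt on master"

# Both branches changed the same line, so the merge stops with a conflict.
git merge foo || echo "merge stopped; resolve the conflict"

cat notes.txt                          # shows the <<<<<<< / ======= / >>>>>>> markers
git diff --name-only --diff-filter=U   # lists files that are still conflicted

# Resolve by writing the desired content; here we keep both changes.
printf 'change made on master\nchange made on foo\n' > notes.txt
git add notes.txt                      # mark the conflict as resolved
git commit -q --no-edit                # complete the merge with Git's default message
```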

Section 6: Blaming

Sometimes, we want to know exactly who last changed each line of a certain file. To do this, we use the blame command. Let's run git blame on a hypothetical file called README.md.

$ git blame README.md

This prints the following to the terminal.

00000000 (John 2001-02-28 16:00:00 -0500  1) # An example README
00000000 (John 2001-02-28 16:00:00 -0500  2) 
abcdef99 (Jill 2002-01-01 13:20:10 -0500  3) This is a great README file
deadbeef (Jack 2017-01-01 17:47:24 -0500  4) with lots of useful information!

On each line, we see a shortened commit hash, an author name, a timestamp, a line number, and the line contents. This tells us who last modified each line, when they modified it, and in which commit the modification was introduced.
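To reproduce output shaped like the example above, the sketch below makes three commits with the hypothetical authors John, Jill, and Jack using git commit --author. It assumes Git 2.28+ and a placeholder committer identity; the hashes and dates it prints will differ from the example.

```shell
#!/bin/sh
# Sketch: build a README with three authors, then blame it.
# Assumes Git 2.28+; John, Jill, and Jack are hypothetical authors, and the
# committer identity is a placeholder.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

printf '# An example README\n\n' > README.md
git add README.md
git commit -q --author="John <john@example.com>" -m "Add README heading"

printf 'This is a great README file\n' >> README.md
git commit -q -a --author="Jill <jill@example.com>" -m "Add description"

printf 'with lots of useful information!\n' >> README.md
git commit -q -a --author="Jack <jack@example.com>" -m "Add more detail"

git blame README.md
```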

Section 7: Differences Between Commits and Files

Another nice tool to have in our toolbox is the ability to find the difference between two branches or between two files. We will do this using the git diff command.

We've previously seen git diff in Section 5 when resolving merge conflicts, but git diff can do a lot more than that. On its own, git diff shows the changes in the working directory that have not yet been staged, while git diff HEAD shows everything that has changed since the last commit. With a few more arguments, we can also see how a file has changed over time.
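The difference between these forms is easiest to see with one staged and one unstaged change. This sketch assumes Git 2.28+ and a placeholder identity; notes.txt and its lines are hypothetical. It also uses git diff --cached, which compares the staging area with the last commit.

```shell
#!/bin/sh
# Sketch: compare unstaged, staged, and since-last-commit changes.
# Assumes Git 2.28+; the identity and notes.txt are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

printf 'second\n' >> notes.txt
git add notes.txt                  # "second" is now staged
printf 'third\n' >> notes.txt      # "third" is not staged

git diff            # working directory vs. staging area: only "third"
git diff --cached   # staging area vs. last commit: only "second"
git diff HEAD       # working directory vs. last commit: both lines
```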

To compare two specific points in history, we specify two commits (or branches!) like so.

$ git diff A B

This will show the differences between commits A and B, where A and B stand for commit hashes or branch names. We can narrow this down to README.md alone using the following syntax.

$ git diff A B -- README.md

This will print to the terminal a summary of all the changes made to README.md from commit A to commit B.
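The following sketch builds two commits, A and B, and compares them as a whole, for README.md alone, and as a per-file summary with --stat. It assumes Git 2.28+ and a placeholder identity; the file names and contents are hypothetical, and A and B are captured as full hashes in shell variables.

```shell
#!/bin/sh
# Sketch: compare two commits, for all files and for README.md alone.
# Assumes Git 2.28+; the identity, file names, and contents are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Hello\n' > README.md
printf 'v1\n' > other.txt
git add .
git commit -q -m "A"
a=$(git rev-parse HEAD)

printf 'Hello, world!\n' > README.md
printf 'v2\n' > other.txt
git commit -q -a -m "B"
b=$(git rev-parse HEAD)

git diff "$a" "$b"                # every change from A to B
git diff "$a" "$b" -- README.md   # only the changes to README.md
git diff --stat "$a" "$b"         # one summary line per changed file
```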

Practice with the Git Started Game!

Let's practice all of this with the Git Started game!

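First, clone the Git Started game to your computer. The command below clones over SSH, which requires an SSH key registered with your GitHub account; if you don't have one, you can clone https://github.com/MASLAB/git-started.git instead.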

$ git clone git@github.com:MASLAB/git-started.git

Enter the git-started directory.

$ cd git-started

Using the cat command, print the README to the terminal and begin!

$ cat README.md