"Git Started" with Git!
Tutorial courtesy of Joseph Hobbs.
This article will teach you how to use Git, a software version control system.
What is Git?
Git is a free and open-source Version Control System (VCS) used to manage software projects.
Section 1: How does Git work?
Git tracks changes made to a working directory over time, enabling multiple people to work together on projects simultaneously. Git captures changes in a series of snapshots called commits. Each commit contains information about changes made to the working directory.
Git chains commits together into a history, containing the work done in the working directory. A working directory and its history are together referred to as a Git repository.
Section 2: How does Git store history?
Git stores history by linking each commit to one or more parent commits. This creates a continuous "chain" of commits, starting at an initial commit.
For example, consider four commits: A, B, C, and D. Let A be the initial (first) commit to the repository. The repository may have a structure like the following.
A <- B <- C <- D
We see here that commit A was made first and has no parent. Commit B was created after commit A and uses commit A as its parent. Similarly, commit C uses commit B as its parent and commit D uses commit C as its parent.
In reality, Git uses SHA-1 hashes to refer to each of its commits. SHA-1 hashes look something like this: bb112b7886eeadf91a4a2e2da230b47717a74b80. That's a mouthful! Fortunately, when using Git, you may refer to a commit by an abbreviated prefix of its hash (commonly the first seven characters), as long as that prefix is unique within the repository, like this: bb112b7.
Practical Exercise
Let's create a commit in our new repository.
In order to create a commit, you must first make a change.
Create a text file called hello.txt containing the string Hello, world! and save it.
Stage all changes to the directory using the following command. You must always stage changes like this before committing.
$ git add .
Now, commit changes to the repository with the message "Create hello.txt".
$ git commit -m "Create hello.txt"
You will see a message similar to this one. The commit hash will be different.
[master eb4db36] Create hello.txt
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt
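The exercise above can be tried end to end in a throwaway repository. The following is a minimal sketch, not part of the original exercise: the temporary directory, the placeholder identity (Example User), and the explicit choice of master as the first branch name are all assumptions made so the commands run on a machine with no existing Git setup.

```shell
# Scratch repository in a temporary directory (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in this article

# Placeholder identity so committing works without any Git configuration.
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# Make a change, stage it, and commit it.
echo "Hello, world!" > hello.txt
git add .
git commit -q -m "Create hello.txt"

# Every commit has a full hash and an abbreviated form of it.
full_hash="$(git rev-parse HEAD)"
short_hash="$(git rev-parse --short HEAD)"
echo "full:  $full_hash"
echo "short: $short_hash"

# One line per commit: abbreviated hash followed by the message.
git log --oneline
```

The abbreviated hash printed by git rev-parse --short is a prefix of the full hash, which is why either form can be used to refer to the commit.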
Section 3: Creating Branches
We will now learn a critical Git concept: branches.
Most simply, a branch is a pointer to a commit.
Recall our previous example.
A <- B <- C <- D (master)
Every Git repository has at least one branch. This is usually called master or main, though different developers have different preferences as to what their first branch is named. We see in this situation that the master branch points to commit D.
When we create commit E with parent commit D, Git automatically moves the master branch downstream.
A <- B <- C <- D <- E (master)
Let's say you've been making commits to your repository for a while now and it's now time to introduce a new feature. Git users consider it best practice to do this work on a separate branch and then merge your changes in later. This allows you to work out any bugs in your new feature before releasing it with the rest of the software. This is accomplished using Git's branching procedures.
Before we get our hands dirty, let's look at an example.
Suppose we want to add a feature called foo to our code. Implementing foo will likely introduce a large number of bugs, and we want to work out all of these bugs before introducing the feature to the master branch. We will do so by creating a new branch called foo.
A <- B <- C <- D <- E (master, foo)
We see that we've not actually changed our repository structure... we've only added a second pointer to commit E. However, let's see what happens when we create commit F on branch foo.
                      --- F (foo)
                     /
A <- B <- C <- D <- E (master)
We now have two concurrent branches, and we can make changes to the repository on branch foo without having to worry about changing our functional code present on master.
Let's create three more commits on foo.
                      --- F <- G <- H <- I (foo)
                     /
A <- B <- C <- D <- E (master)
We've added commits G, H, and I. These three commits allowed us to fully implement the feature foo without any disruptions to the master branch.
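The idea that a branch is only a pointer can be checked directly. The sketch below assumes a scratch repository with a placeholder identity and uses empty commits (git commit --allow-empty) purely to keep the example short; the commit messages E and F mirror the diagrams above but are otherwise arbitrary.

```shell
# Scratch repository (hypothetical setup, not part of the tutorial's repository).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# Commit E on master.
git commit -q --allow-empty -m "E"

# Creating foo adds a second pointer to the same commit.
git branch foo
master_before="$(git rev-parse master)"
foo_before="$(git rev-parse foo)"
echo "master: $master_before"
echo "foo:    $foo_before"

# Commit F on foo: only the foo pointer moves.
git checkout -q foo
git commit -q --allow-empty -m "F"
master_after="$(git rev-parse master)"
ahead="$(git rev-list --count master..foo)"
echo "foo is $ahead commit(s) ahead of master"
```

Right after git branch foo, both names resolve to the same hash. After commit F, master still resolves to the old hash while foo is one commit ahead.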
Practical Exercise
Let's create a new branch called foo.
$ git branch foo
Now, we will check out this new branch. When using Git, the process of checking out involves "moving" to a given commit, modifying the files in the working directory to match the state of that commit. Because foo doesn't have any changes to it yet, checking out foo won't change our code. However, we still need to check it out in order to ensure that all future changes are made to foo and not to master.
$ git checkout foo
We are now checked out to foo and can modify, stage, and commit code as usual.
After fixing the bug, we stage all changes, like so.
$ git add .
We then commit changes to the repository on the foo branch.
$ git commit -m "Fixed the bug"
Section 4: Checking Out
Very often, when using Git, we want to change where we are in the revision history. Let's look at an example.
A <- B <- C <- D (master)
Right now, we are looking at (checked out to) the master branch, which is currently pointing to commit D. The commit we are checked out to is referred to as HEAD. Because we are currently checked out to a branch, we have an attached HEAD.
A <- B <- C <- D (master <- HEAD)
Let's detach HEAD by checking out commit B. Notice that, even though B is contained within the master branch, we will not be checked out to the master branch because master points at commit D.
$ git checkout B
Recall that, in practice, B will be replaced with a commit hash. Without modifying commit D, this will change the files in our working directory to reflect commit B.
A <- B (HEAD) <- C <- D (master)
Again, this is referred to as a detached HEAD. We can easily reattach HEAD by checking out any branch (for example, master). If we create commit E in this state, the following will occur.
       --- E (HEAD)
      /
A <- B <- C <- D (master)
Notice that we have branched our commit history, but we haven't actually created another branch because we haven't created a branch pointer. If we wanted to create another branch here named bar and point it at commit E, we could easily do so using the following command.
$ git branch bar
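This would have the following effect. Note that git branch only creates the pointer; HEAD stays detached until we check out bar (for example, with git checkout bar).
       --- E (bar, HEAD)
      /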
A <- B <- C <- D (master)
As a quick note, it is possible to be checked out directly to commit D and still have a detached HEAD, like so.
       --- E (bar)
      /
A <- B <- C <- D (master, HEAD)
Notice that, because HEAD doesn't point to master, we are currently in a detached HEAD state. Any commits made in this state will not update the master branch.
For example, creating commit F in this state will have the following effect.
       --- E (bar)
      /
A <- B <- C <- D (master) <- F (HEAD)
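Whether HEAD is attached can also be asked of Git itself. The sketch below is one possible way to do that, under the assumption of a scratch repository with a placeholder identity and empty commits: git symbolic-ref succeeds only when HEAD points at a branch, so its exit status distinguishes the two states.

```shell
# Scratch repository (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"

# Attached: HEAD is a symbolic reference to a branch.
attached_to="$(git symbolic-ref --short -q HEAD)"
echo "HEAD is attached to: $attached_to"

# Detach HEAD at the commit master points to.
git checkout -q --detach master
if git symbolic-ref -q HEAD >/dev/null; then
  state="attached"
else
  state="detached"
fi
echo "HEAD is now: $state"

# A commit made while detached does not move master.
git commit -q --allow-empty -m "C (made while detached)"
beyond_master="$(git rev-list --count master..HEAD)"
echo "commits reachable from HEAD but not from master: $beyond_master"

# Keep the work: create a branch pointer here, then check it out to reattach.
git branch bar
git checkout -q bar
reattached_to="$(git symbolic-ref --short -q HEAD)"
echo "HEAD is attached to: $reattached_to"
```

Creating bar before leaving the detached state matters: without a branch pointer, the commit made while detached would no longer be reachable from any branch.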
Section 5: Merging Branches
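Having fixed the bug on foo, we now want to merge foo into master.
The merge process creates commit J, called a merge commit, on the master branch. The merge process also moves the master branch pointer up to the merge commit. (When master has gained no commits of its own since foo branched off, as in the diagram below, Git by default fast-forwards master to commit I instead of creating J; passing --no-ff to git merge requests a merge commit.)
                      --- F <- G <- H <- I (foo) <--
                     /                              \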
A <- B <- C <- D <- E <----------------------------- J (master)
Notice that commit J has two parents: I and E. This means that the master branch now contains all the work in the repository.
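The two parents of a merge commit can be inspected from the command line. The following sketch assumes a scratch repository, a placeholder identity, and empty commits; it passes --no-ff so that Git records a merge commit even though master has not diverged from foo.

```shell
# Scratch repository (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# E on master, then F on foo.
git commit -q --allow-empty -m "E"
git checkout -q -b foo
git commit -q --allow-empty -m "F"

# Merge foo into master, forcing a merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge branch foo" foo

# %P prints the parent hashes of a commit, separated by spaces.
parents="$(git log -1 --pretty=%P)"
parent_count="$(echo "$parents" | wc -w | tr -d ' ')"
echo "merge commit parents: $parents"
echo "number of parents:    $parent_count"
```

The first parent is the commit master pointed to before the merge and the second is the tip of foo, which can also be written as HEAD^1 and HEAD^2.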
Unfortunately, Git merges often result in merge conflicts. This happens when Git doesn't know how to automatically merge work performed on two separate branches. Let's show an example of this.
                      --- F (foo)
                     /
A <- B <- C <- D <- E <- G (master)
Notice that, in the diagram above, commits F and G have both been made downstream of their most recent common ancestor, commit E. If commits F and G modify the same part of the same file in different ways, then Git won't know which change to accept. This is called a merge conflict. Merge conflicts typically have to be resolved manually.
Practical Exercise
Let's check out the master branch.
$ git checkout master
Now, we attempt to merge foo.
$ git merge foo
Git will tell us via a terminal message if there are merge conflicts. If there are merge conflicts, Git will also tell us in which files they occurred. We can see a summary of this by running the diff command.
$ git diff
We can now enter each conflicted file and manually resolve the conflict. A Git merge conflict looks like this.
<<<<<<< HEAD
content on the `master` branch
=======
content on the `foo` branch
>>>>>>> foo
Between <<<<<<< HEAD and ======= is the content on the current branch (in this case, master). Between ======= and >>>>>>> foo is the content on the incoming branch (in this case, foo). To resolve the conflict, manually remove the dividers and replace the code with your desired resolution. This may involve discarding the current branch's changes, discarding the incoming branch's changes, removing both, or integrating them.
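A conflict like the one described above can be reproduced safely in a scratch repository. In this sketch, the file name notes.txt, the file contents, and the placeholder identity are all hypothetical. The conflicted files are listed with git diff --name-only --diff-filter=U, where U selects unmerged paths.

```shell
# Scratch repository (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# Commit E: the common ancestor.
echo "original line" > notes.txt
git add .
git commit -q -m "E"

# Commit F on foo and commit G on master change the same line differently.
git checkout -q -b foo
echo "content on the foo branch" > notes.txt
git commit -q -am "F"
git checkout -q master
echo "content on the master branch" > notes.txt
git commit -q -am "G"

# The merge stops with a non-zero exit status when there is a conflict.
if git merge foo >/dev/null 2>&1; then
  merge_status="clean"
else
  merge_status="conflict"
fi
conflicted="$(git diff --name-only --diff-filter=U)"
echo "merge status:     $merge_status"
echo "conflicted files: $conflicted"

# Resolve by hand: write the desired content, stage it, and finish the merge.
echo "content from both branches, integrated by hand" > notes.txt
git add notes.txt
git commit -q -m "Merge foo into master"
```

Staging the resolved file is what tells Git the conflict is settled; the final commit then becomes the merge commit with two parents.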
Section 6: Blaming
Sometimes, we want to know exactly who made which changes to a certain file. In order to do this, we use the blame command. Let's run git blame on a fictitious file called README.md.
$ git blame README.md
This prints the following to the terminal.
00000000 (John 2001-02-28 16:00:00 -0500 1) # An example README
00000000 (John 2001-02-28 16:00:00 -0500 2)
abcdef99 (Jill 2002-01-01 13:20:10 -0500 3) This is a great README file
deadbeef (Jack 2017-01-01 17:47:24 -0500 4) with lots of useful information!
On each line, we see a shortened commit hash, an author name, a timestamp, a line number, and the line contents. This allows us to understand exactly who last modified each line, when they modified it, and in which commit the modification was introduced.
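To see blame attribute different lines to different people, a small repository with two authors is enough. The sketch below is illustrative only: the author names reuse John and Jill from the output above, the email addresses are placeholders, and the -L option restricts git blame to a range of lines.

```shell
# Scratch repository (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# First commit, authored by John: line 1 of the README.
echo "# An example README" > README.md
git add README.md
git commit -q --author="John <john@example.com>" -m "Add README"

# Second commit, authored by Jill: line 2 of the README.
echo "This is a great README file" >> README.md
git commit -q -a --author="Jill <jill@example.com>" -m "Describe README"

# Blame the whole file, then only line 2.
git blame README.md
blame_line1="$(git blame -L 1,1 README.md)"
blame_line2="$(git blame -L 2,2 README.md)"
echo "$blame_line2"
```

Each line is attributed to the author of the commit that last touched it, so line 1 is credited to John and line 2 to Jill.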
Section 7: Difference between Two Files
Another nice tool to have in our toolbox is the ability to find the difference between two branches or between two files. We will do this using the git diff command.
We've previously seen git diff in Section 5 when resolving merge conflicts, but git diff can do a lot more than that. git diff on its own tells us what's changed in the working directory but has not yet been staged (git diff HEAD compares against the last commit instead). And if we get fancy with the command, we can understand how a file's changed over time.
To use git diff like this, we must specify two commits (or branches!) like so.
$ git diff A B
This will tell us the differences between commits A and B. We can get even more specific by looking at the differences in README.md between commits A and B using the following syntax.
$ git diff A B -- README.md
This will print to the terminal a summary of all the changes made to README.md from commit A to commit B.
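Here is a small sketch of both forms of the command on real commits. It assumes a scratch repository with a placeholder identity; the second file, other.txt, is hypothetical and exists only to show that the path after -- limits the output to README.md.

```shell
# Scratch repository (hypothetical setup).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

# Commit A: two files.
echo "version one" > README.md
echo "unrelated" > other.txt
git add .
git commit -q -m "A"
a="$(git rev-parse HEAD)"

# Commit B: both files change.
echo "version two" > README.md
echo "also changed" > other.txt
git commit -q -am "B"
b="$(git rev-parse HEAD)"

# All files that differ between A and B.
all_changed="$(git diff --name-only "$a" "$b")"
echo "$all_changed"

# Only the changes to README.md between A and B.
readme_diff="$(git diff "$a" "$b" -- README.md)"
echo "$readme_diff"
```

In the diff output, lines starting with - come from commit A and lines starting with + come from commit B. Branch names such as master or foo can be used in place of the hashes.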
Practice with the Git Started Game!
Let's practice all of this with the Git Started game!
First, clone the Git Started game to your computer.
$ git clone git@github.com:MASLAB/git-started.git
Enter the git-started directory.
$ cd git-started
Using the cat command, print the README to the terminal and begin!
$ cat README.md