Before We Begin
Disclaimer: In our book with Harry Paarsch I have written an introduction to working with Git for absolute beginners. If you have never worked with Git before, I strongly encourage you to read that write-up first, or at least some version of “Git for absolute beginners”. To get the most out of this write-up, you need to be familiar with the basics of working with Git, including:
- tracking files,
- committing changes to files,
- creating branches to experiment with changes to files, and
- merging branches and potentially resolving conflicts.
Cloning A Git Repository
There are a few additional things to understand about Git that arise when
working with others on a shared codebase.
In such a situation, there is usually a central Git repository to which everyone
has access.
It is commonly hosted on GitHub, but any remote server can be used to host a
Git repository. We will not discuss how to set one up from scratch:
between GitHub, Bitbucket, GitLab, and internal corporate solutions such as
Azure DevOps, you will almost never have to configure one yourself.
In Git, such a central repository is usually called a remote, and the
default name for the remote repository is origin.
Every user who is working on the codebase starts by cloning the origin
repository to their local machine.
The first thing to understand is that by default cloning creates an entire
local replica of the repository with the complete history of all changes to all
files that are tracked as part of the codebase.
However, the cloned repository will only be identical to the central source at
the time of cloning.
Any subsequent changes to the origin repository will not get automatically
reflected in the local clone.
It is our job to periodically synchronize the local version of the repository
with the origin, and Git will help us minimize the pain of doing this.
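As a minimal sketch of this point, the shell session below builds a throwaway central repository and two clones inside a temporary directory. The directory names seed, origin.git, home, and work, the file example.R, and the identity variables are assumptions made for this illustration and are not part of the original example.

```shell
set -e
# Everything happens in a throwaway directory; all names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Create a small repository and publish it as a bare "central" repository.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # use the branch name from this guide
echo "x <- 3" > example.R
git add example.R
git commit -q -m "first commit"
cd "$sandbox"
git clone -q --bare seed origin.git

# Two collaborators clone the central repository.
git clone -q origin.git home
git clone -q origin.git work

# "work" commits and pushes a change to the central repository.
cd "$sandbox/work"
echo "y <- 4" >> example.R
git commit -q -a -m "second commit"
git push -q origin master

# "home" does not see the new commit until it synchronizes explicitly.
cd "$sandbox/home"
before=$(git rev-list --count HEAD)   # still 1
git pull -q
after=$(git rev-list --count HEAD)    # now 2
echo "commits before pull: $before, after pull: $after"
```

The clone in home keeps reporting one commit until git pull is run, which is exactly the synchronization work described above.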
Working Example
Setup
For the example we will assume the identity of the user named “Konstantin at Home”. Most of the action will be done by him, and to illustrate the process of collaboration we will introduce another user, “Konstantin at Work”.
We will work through the example from the book, which involves two small R
files: GitExampleMain.R and GitExampleFunctions.R.
Below are their initial contents:
> cat GitExampleMain.R
#############################
### GitExampleMain.R ###
#############################
rm(list = ls())
source("functions.R")
x <- 3
y <- 4
z <- AddArguments(x, y)
print(z)
> cat GitExampleFunctions.R
#############################
### GitExampleFunctions.R ###
#############################
AddArguments <- function(x, y) {
  return(x + y)
} # end AddArguments
For illustration purposes we will make a change. Notice that GitExampleMain.R
sources the GitExampleFunctions.R file, but uses an incorrect name,
functions.R, to refer to it.
We will fix this and also get rid of the rm(list = ls()) line.
Here is the updated version of GitExampleMain.R:
> cat GitExampleMain.R
#############################
### GitExampleMain.R ###
#############################
source("GitExampleFunctions.R")
x <- 3
y <- 4
z <- AddArguments(x, y)
print(z)
Assuming that these are the only changes we made, this is how the commit history currently looks on “Konstantin at Home”’s machine:
> git log
commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2 (HEAD -> master, origin/master)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:28:13 2020 -0700
Fixed imports, got rid of rm(ls)
commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:25:08 2020 -0700
first commit
Notice this piece at the top of the latest commit:
(HEAD -> master, origin/master).
Here is what the pieces mean.
HEAD is Git's name for the commit that is currently checked out, i.e. the
current state of affairs; normally this is the latest commit on the current
branch.
It is currently pointing to the master branch of the local repository.
Next, origin/master is a remote-tracking reference: it records the commit that
the master branch on the origin remote pointed to the last time we
communicated with it.
Essentially the part master, origin/master should be read as “local branch
master is at the same commit as branch master on the origin remote
repository”.
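To make these three names more concrete, here is a small sketch that asks Git what each of them resolves to in a fresh clone, and then again after a purely local commit. The sandbox layout, the file example.R, and the identity variables are hypothetical and exist only for this illustration.

```shell
set -e
# Throwaway setup; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "x <- 3" > example.R
git add example.R
git commit -q -m "first commit"
cd "$sandbox"
git clone -q --bare seed origin.git
git clone -q origin.git home
cd "$sandbox/home"

# Right after cloning, all three names resolve to the same commit.
current_branch=$(git symbolic-ref --short HEAD)   # branch that HEAD points to
head_commit=$(git rev-parse HEAD)
local_master=$(git rev-parse master)
remote_master=$(git rev-parse origin/master)
echo "HEAD -> $current_branch"
echo "HEAD=$head_commit master=$local_master origin/master=$remote_master"

# A local commit moves HEAD and master, but origin/master stays where it was,
# because nothing has been pushed or fetched.
echo "y <- 4" >> example.R
git commit -q -a -m "local only"
new_master=$(git rev-parse master)
unchanged_remote=$(git rev-parse origin/master)
echo "master=$new_master origin/master=$unchanged_remote"
```

After the second commit the decoration in git log would show HEAD -> master on the newest commit and origin/master one commit behind it.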
To complete the setup, we will introduce another change to the code, this time
by the “Konstantin at Work” user.
He will introduce default arguments to the AddArguments() function.
Here is the new GitExampleFunctions.R file:
> cat GitExampleFunctions.R
#############################
### GitExampleFunctions.R ###
#############################
AddArguments <- function(x=1, y=2) {
  return(x + y)
} # end AddArguments
And here is how the commit history looks in “Konstantin at Work”’s version of the repository:
> git log
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Thu Oct 15 21:58:48 2020 -0700
Added default arguments to function
commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:28:13 2020 -0700
Fixed imports, got rid of rm(ls)
commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:25:08 2020 -0700
first commit
Let us now work through the example of making a change in the codebase in detail.
Overview
When we work on making changes to a shared code repository, we will be expected to go through the following set of steps:
- Catch up our local master branch with the remote.
- Make a new feature branch locally off master.
- Implement the necessary code changes, making as many commits as we deem necessary, all to our local feature branch.
- Once feature development is complete, catch up our local master branch with the remote again and merge it into our feature branch. If the development work took more than a couple of days, this is absolutely essential, but it is good practice to do this step every time.
- Push our local feature branch to the remote repository, making a feature branch on the remote in the process.
- Open a pull request, also known as a code review request, in which we
propose to merge our feature branch into the master branch on the origin
repository.
- Wait for someone else to approve our code changes.
- Perform the merge once the pull request is approved.
- Once the merge is done, go back to our local repository and update the
local master branch to catch it up with the origin one final time.
- At this point we can optionally delete the local feature branch.
That is a lot of steps, but most of them take very little time. Usually the majority of time is spent in actual feature development and addressing feedback from the code review.
Step 1: Get Our Repo Up To Speed
We start by pulling from the origin repository:
> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:kgolyaev/work_with_remotes
3d89e35..2d13a36 master -> origin/master
Updating 3d89e35..2d13a36
Fast-forward
GitExampleFunctions.R | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Running git log will let us see what changes happened on the remote server:
> git log
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (HEAD -> master, origin/master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Thu Oct 15 21:58:48 2020 -0700
Added default arguments to function
commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:28:13 2020 -0700
Fixed imports, got rid of rm(ls)
commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Thu Oct 15 21:25:08 2020 -0700
first commit
We can see that our collaborator, “Konstantin at Work”, added default arguments
to the AddArguments() function.
We are now up-to-date with the central repository, so it is time to start
development.
Step 2: Create Feature Branch
This is very fast:
> git checkout -b kghome/print_to_cat
Switched to a new branch 'kghome/print_to_cat'
The command git checkout -b creates a new branch and switches to it.
Notice the name we chose for the branch: it starts with our user name, followed
by a summary of the code change that we plan to implement.
This is a good practice to stick to. If we work with a group of people,
they may have their own preferred convention for naming branches.
If so, we should follow it.
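For readers who like to see the shortcut spelled out, the sketch below shows that git checkout -b does the same work as git branch followed by git checkout. The branch names kghome/long_form and kghome/short_form, the file example.R, and the sandbox directory are hypothetical.

```shell
set -e
# Throwaway repository; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo "print(z)" > example.R
git add example.R
git commit -q -m "first commit"

# Long form: create the branch, then switch to it.
git branch kghome/long_form
git checkout -q kghome/long_form

# Short form: create and switch in one command, starting again from master.
git checkout -q master
git checkout -q -b kghome/short_form

current=$(git symbolic-ref --short HEAD)
start_long=$(git rev-parse kghome/long_form)
start_short=$(git rev-parse kghome/short_form)
start_master=$(git rev-parse master)
echo "now on $current"
```

Both branches start at the commit that master pointed to when they were created, which is why Step 1 comes before Step 2.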
Step 3: Develop New Feature
For our example, feature development will be trivial: we will replace the call
to print() with a call to cat(). Here is the revised version of the
GitExampleMain.R file:
#############################
### GitExampleMain.R ###
#############################
source("GitExampleFunctions.R")
x <- 3
y <- 4
z <- AddArguments(x, y)
cat(z)
And here is what our local Git repository will look like after we commit these
changes to the kghome/print_to_cat branch:
> git commit -a -m "refactored: replaced print() with cat()"
[kghome/print_to_cat 08c4d08] refactored: replaced print() with cat()
1 file changed, 1 insertion(+), 1 deletion(-)
> git log -n 2
commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a (HEAD -> kghome/print_to_cat)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 21:56:01 2020 -0700
refactored: replaced print() with cat()
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (origin/master, master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Thu Oct 15 21:58:48 2020 -0700
Added default arguments to function
Notice that we used git log -n 2 to only show the last two commits.
Also notice that the latest commit is on the kghome/print_to_cat branch, and the
latest commit on the master branch is still the one where function
AddArguments() received default values for input arguments.
This should make sense: we have not added any new commits to the master
branch since we ran git pull in Step 1.
Step 3 often takes a while if we are developing a non-trivial feature.
Let us imagine that in the meantime our collaborator, “Konstantin at Work”,
changed the master branch by replacing the values of the default input
arguments to the AddArguments() function.
This is how the last two commits now look at the origin:
> git log -n 2
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Sun Oct 18 22:09:34 2020 -0700
changed default argument values to x=3 and y=4
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Thu Oct 15 21:58:48 2020 -0700
Added default arguments to function
Step 4: Get Up-to-Date Again
At this point, “Konstantin at Home” is done with feature development, and because it took a while, our local repository has fallen behind. So we have to catch up:
> git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:kgolyaev/work_with_remotes
2d13a36..cb3f71d master -> origin/master
Updating 2d13a36..cb3f71d
Fast-forward
GitExampleFunctions.R | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
This step can be confusing, so make sure you fully understand what is going on
here.
First, we switch from feature branch kghome/print_to_cat back to master.
Then we run git pull to get the latest commits from the master branch
on the remote repository and catch up our version of master.
Here is what the last three commits are in “Konstantin at Home”’s repository:
> git log -n 3 --all
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (HEAD -> master, origin/master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Sun Oct 18 22:09:34 2020 -0700
changed default argument values to x=3 and y=4
commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a (kghome/print_to_cat)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 21:56:01 2020 -0700
refactored: replaced print() with cat()
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Thu Oct 15 21:58:48 2020 -0700
Added default arguments to function
The second and the third commits are exactly the ones we saw in Step 3, and
the most recent commit is the one from “Konstantin at Work”.
Notice that we had to add the --all option to the git log command to see commits
from all branches, because by default it only shows us commits from the current
branch.
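The difference between the default view and --all can be checked by counting commits, as in the sketch below. The repository, the NOTES.txt file, and the commit messages are hypothetical; git rev-list --count is used here simply as a way to count what git log would list.

```shell
set -e
# Throwaway repository; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo "print(z)" > example.R
git add example.R
git commit -q -m "first commit"

# One commit on a feature branch...
git checkout -q -b kghome/print_to_cat
echo "cat(z)" > example.R
git commit -q -a -m "replaced print() with cat()"

# ...and one more commit on master.
git checkout -q master
echo "notes" > NOTES.txt
git add NOTES.txt
git commit -q -m "added notes"

on_master=$(git rev-list --count HEAD)   # commits reachable from master: 2
on_all=$(git rev-list --count --all)     # commits on all branches: 3
echo "current branch: $on_master, all branches: $on_all"

# A compact picture of how the branches relate to each other.
git log --oneline --graph --decorate --all
```

The --oneline, --graph, and --decorate options are optional, but they tend to make a history with several branches easier to read than the default output.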
It is important to understand that a plain git pull does not work while we are
on our feature branch, because that branch does not yet track any branch on the
origin.
Here is what happens if we try:
> git checkout kghome/print_to_cat
Switched to branch 'kghome/print_to_cat'
> git pull
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=origin/<branch> kghome/print_to_cat
This makes sense: feature branch kghome/print_to_cat was created in our
local replica of the repository, and so far we have not attempted to push
changes made locally to the origin.
This will come later, in Step 5.
For now, we need to finish catching up: our local master branch is now in sync
with the remote, but our local feature branch is not.
To fix this, we merge the master branch into the feature branch:
> git checkout kghome/print_to_cat
Switched to branch 'kghome/print_to_cat'
> git merge master
Merge made by the 'recursive' strategy.
GitExampleFunctions.R | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
We have deliberately selected a simple example and were careful to avoid
changes that may cause merge conflicts.
Realistically we should be prepared to handle conflicts every time we run the
git merge or git pull commands, but this is not the focus of our guide.
Also, Git may ask us to provide a commit message when we run the
git merge master command; here we entered “Catching up feature branch work
with master”.
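A possible variation on this step, sketched under assumptions rather than taken from the original walkthrough, is to stay on the feature branch, fetch from the origin, and merge origin/master directly. The clones home and work, the files main.R and functions.R, and the branch name kghome/feature are hypothetical; the --no-edit flag accepts the default merge message instead of asking for one.

```shell
set -e
# Throwaway setup; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo 'cat("main")' > main.R
echo 'f <- function() 1' > functions.R
git add main.R functions.R
git commit -q -m "first commit"
cd "$sandbox"
git clone -q --bare seed origin.git
git clone -q origin.git home
git clone -q origin.git work

# "home" commits to a local feature branch.
cd "$sandbox/home"
git checkout -q -b kghome/feature
echo 'print("main")' > main.R
git commit -q -a -m "feature work"

# Meanwhile "work" pushes a change to master, touching a different file.
cd "$sandbox/work"
echo 'f <- function() 2' > functions.R
git commit -q -a -m "changed functions"
git push -q origin master

# "home" catches up without leaving the feature branch.
cd "$sandbox/home"
git fetch -q origin
git merge -q --no-edit origin/master

merged_parent=$(git rev-parse "HEAD^2")    # second parent of the merge commit
remote_master=$(git rev-parse origin/master)
local_master=$(git rev-parse master)       # not touched by fetch or merge
echo "merged $merged_parent; local master is still at $local_master"
```

The trade-off is that the local master branch stays behind until we switch to it and pull, so the two-step approach shown above keeps the local picture tidier.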
Here are the three latest commits in “Konstantin at Home”’s repository after
this step:
> git log -n 3 --all
commit 71b4e1aa8557a53b0f76236039b6a85c2c43afd5 (HEAD -> kghome/print_to_cat)
Merge: 08c4d08 cb3f71d
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 22:30:53 2020 -0700
Catching up feature branch work with master
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (origin/master, master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Sun Oct 18 22:09:34 2020 -0700
changed default argument values to x=3 and y=4
commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 21:56:01 2020 -0700
refactored: replaced print() with cat()
Step 5: Push Feature Branch to Remote
At this point, our feature branch is up-to-date with master and also has the
latest feature code that we wanted to develop in the first place.
It is now time to push it to the origin:
> git push --set-upstream origin kghome/print_to_cat
Enumerating objects: 9, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 8 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 588 bytes | 588.00 KiB/s, done.
Total 5 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
remote:
remote: Create a pull request for 'kghome/print_to_cat' on GitHub by visiting:
remote: https://github.com/kgolyaev/work_with_remotes/pull/new/kghome/print_to_cat
remote:
To github.com:kgolyaev/work_with_remotes.git
* [new branch] kghome/print_to_cat -> kghome/print_to_cat
Branch 'kghome/print_to_cat' set up to track remote branch 'kghome/print_to_cat' from 'origin'.
Notice that we had to use the --set-upstream flag to git push.
Without it, Git does not know which branch on the remote should receive the
contents of the new branch that we created locally.
We have to spell out origin explicitly because it is possible to work with
multiple remote repositories in Git, but this is out of scope for this guide.
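To see what --set-upstream actually records, the sketch below pushes a new branch with the short form of the flag, -u, and then reads the two configuration entries Git stores for the branch. The sandbox layout and the file contents are hypothetical; only the branch name is borrowed from the example.

```shell
set -e
# Throwaway setup; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "print(z)" > example.R
git add example.R
git commit -q -m "first commit"
cd "$sandbox"
git clone -q --bare seed origin.git
git clone -q origin.git home
cd "$sandbox/home"

# Create the feature branch, commit, and push it with -u (--set-upstream).
git checkout -q -b kghome/print_to_cat
echo "cat(z)" > example.R
git commit -q -a -m "replaced print() with cat()"
git push -q -u origin kghome/print_to_cat

# The upstream is stored as two entries in the repository configuration.
tracked_remote=$(git config --get branch.kghome/print_to_cat.remote)
tracked_ref=$(git config --get branch.kghome/print_to_cat.merge)
echo "remote: $tracked_remote, branch: $tracked_ref"

# With the upstream recorded, a plain "git push" is enough for later commits.
echo "cat(z, fill = TRUE)" > example.R
git commit -q -a -m "end output with a newline"
git push -q
local_head=$(git rev-parse HEAD)
pushed=$(git --git-dir="$sandbox/origin.git" rev-parse refs/heads/kghome/print_to_cat)
```

This is also why the bare git pull in Step 4 failed on the feature branch: at that point neither of the two configuration entries existed yet.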
Step 6: Open Pull Request
We have pushed the feature code to the remote repository, but our job is not
yet done.
The feature code is in the feature branch kghome/print_to_cat, and we want
to merge that code into master on the origin, so that this
change propagates to all our collaborators.
Generally we would not be able to merge directly into the master branch
on the central repository.
Instead, we need to open a pull request, also known as a PR or a “code
review request”, for others to evaluate the changes we propose.
Every Git hosting system will have a slightly different interface for creating
a pull request.
On GitHub, we will usually be prompted to create a PR whenever GitHub detects a
push to a new feature branch.
By default, we will be creating the PR from the kghome/print_to_cat branch into
the master branch.
We will need to provide a title and a description of the changes our code
proposes, and at the bottom we will see what lines in which files will actually
be affected.
In our example, the changes are minimal:
x <- 3
y <- 4
z <- AddArguments(x, y)
- print(z)
+ cat(z)
The minus in front of the line means the line will be removed, and the plus means the line will be added. On the GitHub website, the “-” lines have a red background and the “+” lines have a green background, which makes them easier to tell apart.
Once the PR is created, GitHub will check whether the proposed merge would cause any conflicts. Often creating a PR also kicks off the tests that are developed as part of the project, as well as code quality and formatting checks. Depending on their complexity, these tests may take some time to run, and if any of them fail, we will be expected to change the code on our feature branch until all tests pass. But this is a topic for another guide.
Step 7: Wait for PR Approval
Usually at least one other collaborator has to manually approve a pull request
into the master branch.
Often the reviewers will ask us to make changes to our feature branch code
and withhold their approval until such changes are made.
On a real project, this step can really take a while, sometimes longer than
Step 3 in which we actually developed the feature.
But for our guide we will fast forward to the point when the PR gets approved.
Step 8: Merge PR Into Master
Once all tests and code checks have succeeded, and a reviewer approved the PR,
we can click the “Merge Pull Request” button to actually perform the merge.
Some teams have preferences as to whether to use the “squash merge” or “rebase
merge”, which will impact how the final commit history will look.
I personally prefer “squash merge”, where all commits to the feature branch are
combined into one, because I tend to make a lot of tiny commits as I develop
code.
We choose “squash and merge” and confirm the choice, which should take a second
or two.
At this point, GitHub offers to delete the kghome/print_to_cat branch, and I
suggest you do this by clicking the “Delete Branch” button.
Once feature code is merged into master, there is no reason to keep the
feature branch around, particularly if we might want to reuse its name later.
Pruning code branches that are no longer being used is a good practice for
keeping the repository healthy and organized.
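GitHub performs the squash on its servers, but a rough local analogue can help build intuition for what “squash and merge” does to the history. The sketch below uses git merge --squash followed by a regular commit; the branch name kghome/feature, the file notes.txt, and the commit messages are hypothetical, and this is not a description of how GitHub implements the button.

```shell
set -e
# Throwaway repository; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo "start" > notes.txt
git add notes.txt
git commit -q -m "first commit"

# Three tiny commits on a feature branch.
git checkout -q -b kghome/feature
for i in 1 2 3; do
  echo "step $i" >> notes.txt
  git commit -q -a -m "tiny commit $i"
done

# Squash them into a single new commit on master.
git checkout -q master
git merge -q --squash kghome/feature
git commit -q -m "feature work (squashed)"

master_commits=$(git rev-list --count master)          # 2, not 4
master_tree=$(git rev-parse "master^{tree}")
feature_tree=$(git rev-parse "kghome/feature^{tree}")  # same file contents
if git merge-base --is-ancestor kghome/feature master; then
  considered_merged=yes
else
  considered_merged=no
fi
echo "commits on master: $master_commits, feature seen as merged: $considered_merged"
```

The files on master end up identical to those on the feature branch, but the feature branch commits themselves are not part of the history of master, which matters when we delete the local branch later.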
Step 9: Catch Up Locally With Remote One Last Time
Careful readers will realize that Steps 6 through 8 did not involve our local
copy of the repository.
To finish up, we should catch it up with the origin:
> git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
Unpacking objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 2 (delta 2), pack-reused 0
From github.com:kgolyaev/work_with_remotes
cb3f71d..2d88747 master -> origin/master
Updating cb3f71d..2d88747
Fast-forward
GitExampleMain.R | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
And to keep track, here are the last three commits:
> git log -n 3 --all
commit 2d887472481fb8745ad34de8adeaef3e00c11a97 (HEAD -> master, origin/master)
Author: Konstantin Golyaev <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 23:23:05 2020 -0700
refactored: replaced print() with cat() (#1)
commit 71b4e1aa8557a53b0f76236039b6a85c2c43afd5 (origin/kghome/print_to_cat, kghome/print_to_cat)
Merge: 08c4d08 cb3f71d
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 22:30:53 2020 -0700
Catching up feature branch work with master
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Sun Oct 18 22:09:34 2020 -0700
changed default argument values to x=3 and y=4
Notice that the author of the latest commit is “Konstantin Golyaev” and not “Konstantin at Home”. This is because the merge was performed through the GitHub web interface under my GitHub account.
And this is how the last two commits would look for our collaborator, “Konstantin at Work”:
> git log -n 2
commit 2d887472481fb8745ad34de8adeaef3e00c11a97 (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin Golyaev <konstantin.golyaev@gmail.com>
Date: Sun Oct 18 23:23:05 2020 -0700
refactored: replaced print() with cat() (#1)
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date: Sun Oct 18 22:09:34 2020 -0700
changed default argument values to x=3 and y=4
Notice that “Konstantin at Work” never sees the kghome/print_to_cat branch
in his commit history, because he never created one.
Step 10: Delete Local Feature Branch
The final cleanup step is to delete the local feature branch, which no longer serves any purpose. This can be hard to undo, so we will make sure we are deleting the right branch:
> git fetch -p
From github.com:kgolyaev/work_with_remotes
- [deleted] (none) -> origin/kghome/print_to_cat
> git branch -v | grep "\[gone\]"
kghome/print_to_cat 71b4e1a [gone] Catching up feature branch work with master
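The first command, git fetch -p, tells Git to prune remote-tracking branches
that no longer exist on the remote repository.
Because we deleted the feature branch on the central repository in Step 8,
this removes origin/kghome/print_to_cat.
The second command is a double-check: it searches for local branches marked
[gone], which means that the remote branch they were tracking no longer
exists.
Now we are sure that kghome/print_to_cat can be deleted. We use the -D flag to
force the deletion, because after a squash merge the commits of the feature
branch are not part of master, and Git would otherwise refuse to delete a
branch it considers unmerged: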
> git branch -D kghome/print_to_cat
Deleted branch kghome/print_to_cat (was 71b4e1a).
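The cleanup can be rehearsed safely in a sandbox first. In the sketch below, the “Delete Branch” button is simulated by deleting the branch directly in a hypothetical bare repository, origin.git; the sandbox layout, file contents, and the LC_ALL setting (used so that the [gone] marker is not translated) are assumptions made for this illustration.

```shell
set -e
export LC_ALL=C   # keep Git's "[gone]" marker in English so grep can find it
# Throwaway setup; all names are assumptions made for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "print(z)" > example.R
git add example.R
git commit -q -m "first commit"
cd "$sandbox"
git clone -q --bare seed origin.git
git clone -q origin.git home
cd "$sandbox/home"

# Push a feature branch, then return to master.
git checkout -q -b kghome/print_to_cat
echo "cat(z)" > example.R
git commit -q -a -m "replaced print() with cat()"
git push -q -u origin kghome/print_to_cat
git checkout -q master

# Simulate the "Delete Branch" button on the central repository.
git --git-dir="$sandbox/origin.git" branch -q -D kghome/print_to_cat

# Prune, then list local branches whose upstream no longer exists.
git fetch -q -p
gone=$(git branch -v | grep "\[gone\]" | awk '{print $1}')
echo "branches whose upstream is gone: $gone"

# Try the safe delete first and fall back to the forced one.
if git branch -d "$gone" 2>/dev/null; then
  echo "deleted with -d"
else
  git branch -q -D "$gone"
  echo "deleted with -D"
fi
remaining=$(git branch --list "kghome/*")
```

Trying -d before -D is a cautious habit: the safe form succeeds only when Git can see the branch as merged, so a refusal is a prompt to double-check before forcing the deletion.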
And with this, we are finally done!
Conclusion
The above process may seem needlessly convoluted, and it can indeed be, particularly in a situation where you collaborate on the codebase but rarely touch the same files as your peers. But if you and at least one other person are editing the same file concurrently, following the above process can save you a lot of headaches when you and your partner make conflicting changes to the same file and try to keep both your features working at the same time. Happy coding!