Git: Working With Pull Requests

Before We Begin

Disclaimer: In our book with Harry Paarsch I have written an introduction to working with Git for absolute beginners. If you have never worked with Git before, I would strongly encourage you to read that write-up first, or at least some version of “Git for absolute beginners”. To get the most out of this write-up, you need to be familiar with the basics of working with Git, including:

  • tracking files,
  • committing changes to files,
  • creating branches to experiment with changes to files, and
  • merging branches and potentially resolving conflicts.

Cloning A Git Repository

There are a few additional things to understand about Git that arise when working with others on a shared codebase. In such a situation, there is usually a central Git repository to which everyone has access. Commonly it is hosted on GitHub, but any remote server can be used to host a Git repository, and we will not discuss how to set one up from scratch. Between GitHub, Bitbucket, GitLab, and internal corporate solutions such as Azure DevOps, you will almost never have to configure this yourself. In Git terminology, such a central repository is called a remote, and the default name for the remote that a repository was cloned from is origin.

Every user who is working on the codebase starts by cloning the origin repository to their local machine. The first thing to understand is that by default cloning creates an entire local replica of the repository with the complete history of all changes to all files that are tracked as part of the codebase. However, the cloned repository will only be identical to the central source at the time of cloning. Any subsequent changes to the origin repository will not get automatically reflected in the local clone. It is our job to periodically synchronize the local version of the repository with the origin, and Git will help us minimize the pain of doing this.
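
To make the cloning behaviour concrete, here is a minimal sketch that runs entirely on the local disk. It assumes a throwaway bare repository (origin.git) stands in for the hosted one; the directory names, the file example.R, and the identity are hypothetical and not part of the example from the book.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"   # throwaway directory
cd "$sandbox"

# Build a one-commit repository and publish it as a bare "central" repository.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo 'print("hello")' > seed/example.R
git -C seed add example.R
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed origin.git

# Cloning copies the full history and names the source "origin".
git clone --quiet origin.git home
remote_name="$(git -C home remote)"
commits_at_clone="$(git -C home rev-list --count HEAD)"

# A second user pushes a new commit to the central repository...
git clone --quiet origin.git work
echo 'print("world")' >> work/example.R
git -C work commit --quiet -a -m "second commit"
git -C work push --quiet origin master

# ...which the first clone does not see until it synchronizes explicitly.
commits_before_pull="$(git -C home rev-list --count HEAD)"
git -C home pull --quiet
commits_after_pull="$(git -C home rev-list --count HEAD)"

echo "remote=$remote_name at_clone=$commits_at_clone before_pull=$commits_before_pull after_pull=$commits_after_pull"
# prints: remote=origin at_clone=1 before_pull=1 after_pull=2
```

The clone only learns about the second commit when git pull is run, which is the synchronization job described above.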

Working Example

Setup

For the example we will assume the identity of the user named “Konstantin at Home”. Most of the action will be done by him, and to illustrate the process of collaboration we will introduce another user, “Konstantin at Work”.

We will work through the example from the book, which involves two small R files: GitExampleMain.R and GitExampleFunctions.R. Below are their initial contents:

> cat GitExampleMain.R
#############################
###   GitExampleMain.R    ###
#############################

rm(list = ls())
source("functions.R")

x <- 3
y <- 4
z <- AddArguments(x, y)
print(z)
> cat GitExampleFunctions.R
#############################
### GitExampleFunctions.R ###
#############################

AddArguments <- function(x, y) {
  return(x + y)
} # end AddArguments

For illustration purposes we will make a change. Notice that the GitExampleMain.R file sources the GitExampleFunctions.R file, but uses an incorrect name to refer to it. We will fix this and at the same time get rid of the rm(list = ls()) line. Here is the updated version of GitExampleMain.R:

> cat GitExampleMain.R
#############################
###   GitExampleMain.R    ###
#############################

source("GitExampleFunctions.R")

x <- 3
y <- 4
z <- AddArguments(x, y)
print(z)

Assuming that these are the only changes we made, this is how the commit history currently looks on “Konstantin at Home”’s machine:

> git log
commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2 (HEAD -> master, origin/master)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:28:13 2020 -0700

    Fixed imports, got rid of rm(ls)

commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:25:08 2020 -0700

    first commit

Notice this piece at the top of the latest commit: (HEAD -> master, origin/master). Here is what the pieces mean. HEAD is Git's pointer to whatever is currently checked out, which is normally the latest commit on the current branch, i.e. the current state of affairs. It is currently pointing to the master branch on the local repository. Next, origin/master is the local record of where the master branch on the origin remote was the last time we communicated with it. Essentially the part master, origin/master should be read as “local branch master is at the same commit as branch master on the origin remote repository”.
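
The same information can be queried directly instead of being read off the git log decoration. The sketch below assumes a throwaway sandbox with a local bare repository standing in for GitHub; the file name and identity are hypothetical.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"

# One commit, published to a bare "central" repository, then cloned.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo 'print("hello")' > seed/example.R
git -C seed add example.R
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed origin.git
git clone --quiet origin.git home
cd home

current_branch="$(git symbolic-ref --short HEAD)"              # branch that HEAD points to
head_commit="$(git rev-parse HEAD)"                             # commit that HEAD resolves to
tracked_commit="$(git rev-parse origin/master)"                 # last known state of master on origin
upstream="$(git rev-parse --abbrev-ref 'master@{upstream}')"    # remote branch that local master tracks

echo "HEAD -> $current_branch, which tracks $upstream"
# prints: HEAD -> master, which tracks origin/master
```

Right after cloning, head_commit and tracked_commit hold the same hash, which is exactly what the (HEAD -> master, origin/master) decoration expresses.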

To complete the setup, we will introduce another change to the code, this time by the “Konstantin at Work” user. He will introduce default arguments to the AddArguments() function. Here is the new GitExampleFunctions.R file:

> cat GitExampleFunctions.R
#############################
### GitExampleFunctions.R ###
#############################

AddArguments <- function(x=1, y=2) {
  return(x + y)
} # end AddArguments

And here is how the commit history looks in “Konstantin at Work”’s version of the repository:

> git log
commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Thu Oct 15 21:58:48 2020 -0700

    Added default arguments to function

commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:28:13 2020 -0700

    Fixed imports, got rid of rm(ls)

commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:25:08 2020 -0700

    first commit

Let us now work through the example of making a change in the codebase in detail.

Overview

When we work on making changes to a shared code repository, we will be expected to go through the following set of steps:

  1. Catch up our local master branch with the remote.
  2. Make a new feature branch locally off master.
  3. Implement the necessary code changes, making as many commits as we deem necessary, all to our local feature branch.
  4. Once feature development is complete, catch up our local master branch with the remote again. If the development work took more than a couple of days, this is absolutely essential, but it is a good practice to do this step every time.
  5. Push our local feature branch to the remote repository, making a feature branch on the remote in the process.
  6. Open a pull request, also known as a code review request, in which we will propose to merge our feature branch into the master branch on the origin repository.
  7. Wait for someone else to approve our code changes.
  8. Perform the merge once the pull request is approved.
  9. Once the merge is done, go back to our local repository and update the local master branch to catch it up with the origin one final time.
  10. At this point we can optionally delete the local feature branch.

That is a lot of steps, but most of them take very little time. Usually the majority of time is spent in actual feature development and addressing feedback from the code review.

Step 1: Get Our Repo Up To Speed

We start by pulling from the origin repository:

> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:kgolyaev/work_with_remotes
   3d89e35..2d13a36  master     -> origin/master
Updating 3d89e35..2d13a36
Fast-forward
 GitExampleFunctions.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Running git log will let us see what changes happened on the remote server:

> git log

commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (HEAD -> master, origin/master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Thu Oct 15 21:58:48 2020 -0700

    Added default arguments to function

commit 3d89e35054d6e1a7ab1b9aa2fbf922bc4f569ba2
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:28:13 2020 -0700

    Fixed imports, got rid of rm(ls)

commit a8d1b6b92bf3e3e034a1f133cace92ddd6947080
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Thu Oct 15 21:25:08 2020 -0700

    first commit

We can see that our collaborator, “Konstantin at Work”, added default arguments to the AddArguments() function. We are now up to date with the central repository, so it is time to start development.

Step 2: Create Feature Branch

This is very fast:

> git checkout -b kghome/print_to_cat
Switched to a new branch 'kghome/print_to_cat'

The command git checkout -b creates a new branch and switches to it. Notice the name we chose for the branch: it starts with our user name, and is followed by a summary of the code change that we plan to implement. This is a good practice to stick to, and if we work with a group of people, they may have their own preferred taxonomy for naming branches. If so, we should stick to it.
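
As a small illustration of what the shortcut does, the sketch below creates one branch with git checkout -b and another with the two separate commands it combines. The sandbox repository, the identity, and the second branch name kghome/long_form are hypothetical.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"
git init --quiet repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo 'print(z)' > main.R
git add main.R
git commit --quiet -m "first commit"

# Shortcut used in the text: create the branch and switch to it in one step.
git checkout --quiet -b kghome/print_to_cat
shortcut_branch="$(git symbolic-ref --short HEAD)"

# Long form: the same two actions as separate commands.
git checkout --quiet master
git branch kghome/long_form            # hypothetical branch name
git checkout --quiet kghome/long_form
long_form_branch="$(git symbolic-ref --short HEAD)"

# Both new branches start at the commit that master points to.
master_commit="$(git rev-parse master)"
shortcut_commit="$(git rev-parse kghome/print_to_cat)"
long_form_commit="$(git rev-parse kghome/long_form)"
echo "$shortcut_branch and $long_form_branch both start at $master_commit"
```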

Step 3: Develop New Feature

For our example, feature development will be trivial: we will replace the call to print() with a call to cat(). Here is the revised version of the GitExampleMain.R file:

#############################
###   GitExampleMain.R    ###
#############################

source("GitExampleFunctions.R")

x <- 3
y <- 4
z <- AddArguments(x, y)
cat(z)

And here is what our local Git repository looks like after we commit these changes to the kghome/print_to_cat branch:

> git commit -a -m "refactored: replaced print() with cat()"
[kghome/print_to_cat 08c4d08] refactored: replaced print() with cat()
 1 file changed, 1 insertion(+), 1 deletion(-)

> git log -n 2
commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a (HEAD -> kghome/print_to_cat)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 21:56:01 2020 -0700

    refactored: replaced print() with cat()

commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad (origin/master, master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Thu Oct 15 21:58:48 2020 -0700

    Added default arguments to function

Notice that we used git log -n 2 to only show the last two commits. Also notice that the latest commit is on the kghome/print_to_cat branch, and the latest commit on the master branch is still the one where the AddArguments() function received default values for its input arguments. This should make sense: we have not added any new commits to the master branch since we ran git pull in Step 1.

Step 3 will often take a while if we are developing a non-trivial feature. Let us imagine that in the meantime our collaborator, “Konstantin at Work”, changed the master branch by replacing the values of the default input arguments to the AddArguments() function. This is how the last two commits now look at the origin:

> git log -n 2
commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Sun Oct 18 22:09:34 2020 -0700

    changed default argument values to x=3 and y=4

commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Thu Oct 15 21:58:48 2020 -0700

    Added default arguments to function

Step 4: Get Up-to-Date Again

At this point, “Konstantin at Home” is done with feature development, and because it took us a while, our local repository has fallen behind. So we have to catch up:

> git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:kgolyaev/work_with_remotes
   2d13a36..cb3f71d  master     -> origin/master
Updating 2d13a36..cb3f71d
Fast-forward
 GitExampleFunctions.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

This step can be confusing, so make sure you fully understand what is going on here. First, we switch from feature branch kghome/print_to_cat back to master. Then we run git pull to get the latest commits from the master branch on the remote repository and catch up our version of master. Here is what the last three commits are in “Konstantin at Home”’s repository:

> git log -n 3 --all

commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (HEAD -> master, origin/master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Sun Oct 18 22:09:34 2020 -0700

    changed default argument values to x=3 and y=4

commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a (kghome/print_to_cat)
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 21:56:01 2020 -0700

    refactored: replaced print() with cat()

commit 2d13a3629f2391aca6cad7675003b8b1e7ed35ad
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Thu Oct 15 21:58:48 2020 -0700

    Added default arguments to function

The second and the third commits are exactly the same ones we saw in Step 3, and the most recent commit is the one from “Konstantin at Work”. Notice that we had to add the --all option to the git log command to see commits from all branches, because by default it only shows us commits from the current branch.
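
The effect of --all is easy to check in a sandbox. The sketch below assumes a hypothetical local repository with one commit on master and one extra commit on a feature branch, and counts the lines that git log prints with and without the option.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"
git init --quiet repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo 'print(z)' > main.R
git add main.R
git commit --quiet -m "first commit"

# One extra commit that lives only on the feature branch.
git checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > main.R
git commit --quiet -a -m "replaced print() with cat()"
git checkout --quiet master

# Without --all we see only the current branch; with it, every branch.
current_only="$(git log --oneline | wc -l | tr -d ' ')"
all_branches="$(git log --oneline --all | wc -l | tr -d ' ')"
echo "master only: $current_only, all branches: $all_branches"
# prints: master only: 1, all branches: 2
```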

It is important to understand that a plain git pull does not work while we are on our feature branch, because that branch does not yet exist on the origin. Here is what happens if we try:

> git checkout kghome/print_to_cat 
Switched to branch 'kghome/print_to_cat'

> git pull
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> kghome/print_to_cat

This makes sense: feature branch kghome/print_to_cat was created in our local replica of the repository, and so far we have not attempted to push changes made locally to the origin. This will come later, in Step 5. For now, we need to finish catching up: our local master branch is now in sync with the remote, but our local feature branch is not. To fix this, we merge the master branch into the feature branch:

> git checkout kghome/print_to_cat 
Switched to branch 'kghome/print_to_cat'

> git merge master
Merge made by the 'recursive' strategy.
 GitExampleFunctions.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

We have deliberately selected a simple example and were careful to avoid changes that may cause merge conflicts. Realistically, we should be prepared to handle conflicts every time we run the git merge or git pull commands, but this is not the focus of our guide. Also, we may be asked to provide a commit message when running the git merge master command. Here are the three latest commits in “Konstantin at Home”’s repository after this step:

> git log -n 3 --all
commit 71b4e1aa8557a53b0f76236039b6a85c2c43afd5 (HEAD -> kghome/print_to_cat)
Merge: 08c4d08 cb3f71d
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 22:30:53 2020 -0700

    Catching up feature branch work with master

commit cb3f71dfd677754a8284f1a7f613c188647d2f9e (origin/master, master)
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Sun Oct 18 22:09:34 2020 -0700

    changed default argument values to x=3 and y=4

commit 08c4d08892469b5fecd40e5dd2fa77c80cb35b1a
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 21:56:01 2020 -0700

    refactored: replaced print() with cat()
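
All of Step 4 can be replayed in a sandbox. The sketch below assumes a local bare repository in place of GitHub and two clones, home and work, playing the two users; the file names main.R and functions.R and the identity are hypothetical. Passing -m to git merge supplies the commit message up front, so Git does not open an editor to ask for one.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"

git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo 'print(z)' > seed/main.R
echo 'add <- function(x, y) x + y' > seed/functions.R
git -C seed add main.R functions.R
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed origin.git
git clone --quiet origin.git home
git clone --quiet origin.git work

# Home: feature branch with one commit touching main.R.
git -C home checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > home/main.R
git -C home commit --quiet -a -m "replaced print() with cat()"

# Work: meanwhile a commit touching functions.R lands on the central master.
echo 'add <- function(x = 3, y = 4) x + y' > work/functions.R
git -C work commit --quiet -a -m "added default arguments"
git -C work push --quiet origin master

# Step 4: catch up master, then merge it into the feature branch.
cd home
git checkout --quiet master
git pull --quiet
git checkout --quiet kghome/print_to_cat
git merge --quiet -m "Catching up feature branch work with master" master

# A merge commit has two parents: the feature commit and the new master commit.
parent_count="$(git log -1 --format=%P | wc -w | tr -d ' ')"
echo "parents of the latest commit: $parent_count"
# prints: parents of the latest commit: 2
```

After the merge, the feature branch contains both the cat() change and the collaborator's new default arguments.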

Step 5: Push Feature Branch to Remote

At this point, our feature branch is up-to-date with master and also has the latest feature code that we wanted to develop in the first place. It is now time to push it to the origin:

> git push --set-upstream origin kghome/print_to_cat 
Enumerating objects: 9, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 8 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 588 bytes | 588.00 KiB/s, done.
Total 5 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
remote: 
remote: Create a pull request for 'kghome/print_to_cat' on GitHub by visiting:
remote:      https://github.com/kgolyaev/work_with_remotes/pull/new/kghome/print_to_cat
remote: 
To github.com:kgolyaev/work_with_remotes.git
 * [new branch]      kghome/print_to_cat -> kghome/print_to_cat
Branch 'kghome/print_to_cat' set up to track remote branch 'kghome/print_to_cat' from 'origin'.

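Notice that we had to use the --set-upstream flag to git push. Our new branch exists only locally and has no counterpart on the remote yet, so a plain git push has no destination to send it to. The flag creates the branch on the remote and links the two, so that later git pull and git push calls on this branch work without extra arguments. We have to spell out origin explicitly because it is possible to work with multiple remote repositories in Git, but this is out of scope for this guide.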
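
Here is a minimal sandbox sketch of what --set-upstream changes, again assuming a local bare repository in place of GitHub and a hypothetical file and identity. It checks whether the feature branch has an upstream before and after the push.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"

git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo 'print(z)' > seed/main.R
git -C seed add main.R
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed origin.git
git clone --quiet origin.git home
cd home

git checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > main.R
git commit --quiet -a -m "replaced print() with cat()"

# A brand-new local branch has no upstream, which is why "git pull" failed on it earlier.
if git rev-parse --abbrev-ref '@{upstream}' >/dev/null 2>&1; then
  upstream_before="set"
else
  upstream_before="none"
fi

# Push the branch and record origin/kghome/print_to_cat as its upstream.
git push --quiet --set-upstream origin kghome/print_to_cat
upstream_after="$(git rev-parse --abbrev-ref '@{upstream}')"

# The branch now exists on the central repository at the same commit.
remote_tip="$(git -C "$sandbox/origin.git" rev-parse refs/heads/kghome/print_to_cat)"
local_tip="$(git rev-parse HEAD)"
echo "before: $upstream_before, after: $upstream_after"
# prints: before: none, after: origin/kghome/print_to_cat
```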

Step 6: Open Pull Request

We have pushed the feature code to the remote repository, but our job is not yet done. The feature code is in the feature branch kghome/print_to_cat, and we want to merge that code into master on the origin, so that this change propagates to all our collaborators. Generally we would not be able to make a straight merge into the master branch on the central repository. Instead, we need to open a pull request, also known as a PR or a “code review request”, for others to evaluate the changes we propose.

Every Git hosting system will have a slightly different interface for creating a pull request. On GitHub, we will usually be prompted to create a PR whenever it detects a push to a new feature branch. By default, we will be creating the PR from kghome/print_to_cat branch into the master branch. We will need to provide a title and a description of the changes our code proposes, and at the bottom we will see what lines in which files will actually be affected. In our example, the changes are minimal:

   x <- 3
   y <- 4
   z <- AddArguments(x, y)
 - print(z)
 + cat(z)

The minus in front of the line means the line will be removed, and the plus means the line will be added. On the GitHub website, the “-” lines will have a red background and the “+” lines will have a green background, so telling them apart will be easier.
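
A similar diff can be previewed locally before opening the PR. The sketch below uses the three-dot form of git diff, which compares the feature branch against the point where it diverged from master; to my understanding this is also the comparison GitHub shows by default, but treat that as an assumption. The sandbox repository, file names, and identity are hypothetical.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"
git init --quiet repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo 'print(z)' > main.R
echo 'x + y' > functions.R
git add main.R functions.R
git commit --quiet -m "first commit"

# Feature branch: the change we want reviewed.
git checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > main.R
git commit --quiet -a -m "replaced print() with cat()"

# Meanwhile master moves on in an unrelated file.
git checkout --quiet master
echo 'x + y + 0' > functions.R
git commit --quiet -a -m "unrelated change on master"

# Three dots: only the changes made on the feature branch since it left master.
pr_diff="$(git diff master...kghome/print_to_cat)"
pr_files="$(git diff --name-only master...kghome/print_to_cat)"
printf '%s\n' "$pr_diff"
```

The output contains the -print(z) and +cat(z) lines and lists only main.R, even though master has since changed functions.R.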

Once the PR is created, GitHub will check if the proposed merge would cause any conflicts. Often creating a PR also kicks off all the tests that are usually developed as part of the project, as well as code quality and formatting checks. Depending on the complexity, these tests may take some time to run, and if any of them fail, we will be expected to change the code on our feature branch to make sure all tests pass. But this is a topic for another guide.

Step 7: Wait for PR Approval

Usually at least one other collaborator has to manually approve a pull request into the master branch. Often the reviewers will ask us to make changes to our feature branch code and withhold their approval until such changes are made. On a real project, this step can really take a while, sometimes longer than Step 3 in which we actually developed the feature. But for our guide we will fast forward to the point when the PR gets approved.

Step 8: Merge PR Into Master

Once all tests and code checks have succeeded, and a reviewer approved the PR, we can click the “Merge Pull Request” button to actually perform the merge. Some teams have preferences as to whether to use the “squash merge” or “rebase merge”, which will impact how the final commit history will look. I personally prefer “squash merge”, where all commits to the feature branch are combined into one, because I tend to make a lot of tiny commits as I develop code. We choose “squash and merge” and confirm the choice, which should take a second or two. At this point, GitHub offers to delete the kghome/print_to_cat branch, and I suggest you do this, by clicking the “Delete Branch” button. Once feature code is merged into master, there is no reason to keep the feature branch around, particularly if we might want to reuse its name later. Pruning code branches that are no longer being used is a good practice for keeping the repository healthy and organized.
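
GitHub performs the squash on the server, but the idea can be approximated locally with git merge --squash. The sketch below is only a local analogue under that assumption, not what GitHub runs; the sandbox repository, commit messages, and identity are hypothetical.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"
git init --quiet repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo 'print(z)' > main.R
git add main.R
git commit --quiet -m "first commit"
base="$(git rev-parse HEAD)"

# Two tiny commits on the feature branch.
git checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > main.R
git commit --quiet -a -m "wip: switch to cat()"
echo 'cat(z, sep = "")' > main.R
git commit --quiet -a -m "wip: tweak separator"

# Squash: stage the combined change on master, then record it as ONE commit.
git checkout --quiet master
git merge --quiet --squash kghome/print_to_cat
git commit --quiet -m "refactored: replaced print() with cat()"

new_commits="$(git rev-list --count "$base"..master)"
parent_count="$(git log -1 --format=%P | wc -w | tr -d ' ')"
if git merge-base --is-ancestor kghome/print_to_cat master; then
  feature_in_history="yes"
else
  feature_in_history="no"
fi
echo "new commits: $new_commits, parents: $parent_count, feature in master history: $feature_in_history"
# prints: new commits: 1, parents: 1, feature in master history: no
```

Because the squashed commit has a single parent, the history of master carries no link back to the feature branch, which keeps the log short at the cost of losing the individual feature commits.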

Step 9: Catch Up Locally With Remote One Last Time

Careful readers will realize that steps 6 through 8 did not involve our local copy of the repository. To finish up, we should catch it up with the origin:

> git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

> git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
Unpacking objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 2 (delta 2), pack-reused 0
From github.com:kgolyaev/work_with_remotes
   cb3f71d..2d88747  master     -> origin/master
Updating cb3f71d..2d88747
Fast-forward
 GitExampleMain.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

And to keep track, here are the last three commits:

> git log -n 3 --all

commit 2d887472481fb8745ad34de8adeaef3e00c11a97 (HEAD -> master, origin/master)
Author: Konstantin Golyaev <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 23:23:05 2020 -0700

    refactored: replaced print() with cat() (#1)

commit 71b4e1aa8557a53b0f76236039b6a85c2c43afd5 (origin/kghome/print_to_cat, kghome/print_to_cat)
Merge: 08c4d08 cb3f71d
Author: Konstantin at Home <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 22:30:53 2020 -0700

    Catching up feature branch work with master

commit cb3f71dfd677754a8284f1a7f613c188647d2f9e
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Sun Oct 18 22:09:34 2020 -0700

    changed default argument values to x=3 and y=4

Notice that the latest commit was made by “Konstantin Golyaev” and not “Konstantin at Home”. This is because the merge was performed through the GitHub web interface, under my GitHub account.

And this is how the last two commits would look for our collaborator, “Konstantin at Work”:

> git log -n 2
commit 2d887472481fb8745ad34de8adeaef3e00c11a97 (HEAD -> master, origin/master, origin/HEAD)
Author: Konstantin Golyaev <konstantin.golyaev@gmail.com>
Date:   Sun Oct 18 23:23:05 2020 -0700

    refactored: replaced print() with cat() (#1)

commit cb3f71dfd677754a8284f1a7f613c188647d2f9e
Author: Konstantin at Work <konstantin.golyaev@microsoft.com>
Date:   Sun Oct 18 22:09:34 2020 -0700

    changed default argument values to x=3 and y=4

Notice that “Konstantin at Work” never sees the kghome/print_to_cat branch in his commit history, because he never created one.

Step 10: Delete Local Feature Branch

The final cleanup step is to delete the local feature branch, which no longer serves any purpose. This can be hard to undo, so we will make sure we are deleting the right branch:

> git fetch -p
From github.com:kgolyaev/work_with_remotes
 - [deleted]         (none)     -> origin/kghome/print_to_cat

> git branch -v | grep "\[gone\]"
  kghome/print_to_cat 71b4e1a [gone] Catching up feature branch work with master

The first command, git fetch -p, tells Git to prune its local references to remote branches that no longer exist on the remote repository. Because we deleted the feature branch on the central repository in Step 8, this removes origin/kghome/print_to_cat. The second command is a double-check: it searches for local branches marked [gone], which is how git branch -v flags a branch whose remote counterpart has been deleted.
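
The pruning check can be rehearsed safely in a sandbox. The sketch below assumes a local bare repository in place of GitHub, and it simulates the “Delete Branch” button by deleting the branch directly in that bare repository; file names and identity are hypothetical.

```shell
set -eu
# Hypothetical identity, so commits work even where Git is not configured.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox="$(mktemp -d)"
cd "$sandbox"

git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo 'print(z)' > seed/main.R
git -C seed add main.R
git -C seed commit --quiet -m "first commit"
git clone --quiet --bare seed origin.git
git clone --quiet origin.git home
cd home

# Create the feature branch and push it with an upstream, as in Step 5.
git checkout --quiet -b kghome/print_to_cat
echo 'cat(z)' > main.R
git commit --quiet -a -m "replaced print() with cat()"
git push --quiet --set-upstream origin kghome/print_to_cat
git checkout --quiet master

# Stand-in for the "Delete Branch" button: remove the branch on the central repository.
git -C "$sandbox/origin.git" branch --quiet -D kghome/print_to_cat

# Prune stale remote-tracking references, then list local branches whose upstream is gone.
git fetch --quiet -p
gone_branches="$(git branch -v | grep '\[gone\]')"
printf '%s\n' "$gone_branches"
```

Only kghome/print_to_cat is listed, while master is not, because its upstream origin/master still exists.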

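Now we are sure that kghome/print_to_cat can be deleted. We use the capital -D flag to force the deletion, because after a squash merge Git does not recognize the branch as merged: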

> git branch -D kghome/print_to_cat 
Deleted branch kghome/print_to_cat (was 71b4e1a).

And with this, we are finally done!

Conclusion

The above process may seem needlessly convoluted, and it can indeed be, particularly in a situation where you collaborate on the codebase but rarely touch the same files as your peers. But if you and at least one other person are editing the same file concurrently, following the above process can save you a lot of headaches when you and your partner make conflicting changes to the same file and try to keep both your features working at the same time. Happy coding!