Installing Python on Ubuntu

For reasons similar to those mentioned in my previous post, I had to install Python on bare Ubuntu boxes, be they Amazon EC2 instances or Azure Virtual Machines. While most Ubuntu systems come with some version of Python pre-installed, those versions are frequently old and do not support some of the useful packages from the Python data science ecosystem, such as numpy, pandas and scikit-learn. Having done this a few times, I decided to write down the steps to avoid going through the pain again in the future. Before we start, I have to point out that in the last year or so a very nice solution to this problem has emerged in the form of conda. If you can use it, you absolutely should, since it will save you a ton of time. The rest of this post assumes that you, for whatever reason, want to install Python from scratch.
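
Before deciding to build from scratch, it can help to check what the pre-installed interpreter actually is and which of the packages mentioned above it can already import. The following is a minimal sketch, not part of the scripts described later in this post; the 3.8 minimum version is an assumption chosen purely for illustration, and "sklearn" is the import name of scikit-learn.

```python
import importlib.util
import sys

# Import names of the packages mentioned above (scikit-learn is imported as sklearn).
DATA_SCIENCE_PACKAGES = ["numpy", "pandas", "sklearn"]

# Assumed minimum version for illustration only; adjust to what your packages require.
MIN_VERSION = (3, 8)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=DATA_SCIENCE_PACKAGES):
    """Return the subset of `names` that this interpreter cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    if not python_is_recent_enough():
        print(f"Older than {MIN_VERSION[0]}.{MIN_VERSION[1]}; a newer Python may be needed.")
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running it with the system interpreter (for example python3 check_python.py, where the file name is hypothetical) gives a quick picture of how far the stock installation is from what you need.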

Installing software on Linux is usually either super-simple or involves massive amounts of pain. This has a lot to do with the Linux software paradigm, which I like to explain with a house construction analogy. A Windows program is like a manufactured home: all the necessary building blocks are packaged together into an application, which is delivered to the user as a bundle. While this limits flexibility and increases bundle size, the program will typically work on any Windows machine because it is self-contained. A Linux program, in contrast, is more like a brick house. A Linux system is composed of many small programs, such as grep, each of which is designed to do one thing well. These small programs, like bricks, are the building blocks for more complex programs. It is not always possible to list every brick the finished house rests on, so portability of Linux programs can be a problem.

The simple path almost always involves invoking a command such as sudo apt-get install package-name. In Ubuntu, apt is a program whose job is to manage dependencies for other programs, i.e., to figure out which other bricks you need when you install a new piece of software. The pain starts when apt-get either cannot locate a necessary brick or, worse, is unaware of a certain dependency, usually one nested a few layers down. At that point one has to perform an indeterminate number of Google searches to uncover the name of the missing brick and install it via apt-get. (Unfortunately, the name of the missing package is not always obvious. For instance, when scipy refused to install because freetype was missing, the package I actually had to install was libfreetype6-dev.)
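
To make the brick analogy concrete, here is a toy model of what dependency resolution means: starting from one package, walk its dependencies recursively and produce an order in which every brick is installed before anything that rests on it. This is a minimal sketch with a hypothetical dependency table named after the analogy; it is not how apt is implemented, since the real resolver also handles versions, conflicts and alternatives, and this sketch does not detect dependency cycles.

```python
from typing import Dict, List, Set

# Hypothetical dependency table in the spirit of the brick-house analogy.
DEPENDS: Dict[str, List[str]] = {
    "house": ["walls", "roof"],
    "walls": ["bricks", "mortar"],
    "roof": ["beams", "tiles"],
    "bricks": [],
    "mortar": [],
    "beams": [],
    "tiles": [],
}


class MissingBrick(Exception):
    """Raised when a required dependency is not listed in the table."""


def install_order(package: str, depends: Dict[str, List[str]] = DEPENDS) -> List[str]:
    """Return `package` plus all transitive dependencies, dependencies first."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited:
            return  # already scheduled; each brick is installed once
        if name not in depends:
            raise MissingBrick(name)  # the "cannot locate the brick" failure
        visited.add(name)
        for dep in depends[name]:
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(package)
    return order


print(install_order("house"))
# ['bricks', 'mortar', 'walls', 'beams', 'tiles', 'roof', 'house']
```

The sketch can only report a brick that is referenced but missing from the table. If the table simply omits a dependency edge, the resolver has no way of knowing, which mirrors the nastier failure described above, where the package manager is unaware of a dependency altogether.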

The exact steps I had to take to install Python were numerous, so rather than listing them one by one in this blog post, I committed them to my public GitHub repository for anyone to use, including future me. I provide two files. The first script was used to install all the Python components on a VirtualBox VM that I made for our book on computing. The second script was used more recently to install Python on an Azure Ubuntu VM. Its highlight is TensorFlow: I wanted to play around with it and, for whatever reason, ended up building it from source rather than installing a pre-built version. It is likely possible to get TensorFlow without a source build, but that is not the route I ended up taking.