I recently gave an interview to the AI Time Journal. I decided to preserve a copy of this interview for posterity. The published version was slightly edited for better exposition.
At what point did you realise that you wanted to pursue a career in data science (data & AI), and how did you get into it?
I went to graduate school to study Economics, and I ended up studying econometrics and working with data. I always enjoyed coding even though I was entirely self-taught. When I graduated, I was offered a job at the then-nascent Central Economics team at Amazon. This was back in early 2011, before most tech companies started hiring economists. Once I joined Amazon, I realized there was a lot to learn about machine learning and data science, and got hooked.
Could you please elaborate on the different career paths in Data Science?
I really like the way this post from Airbnb categorizes data science jobs: analytics, inference, and algorithms.
Analytics jobs tend to focus on producing actionable insights from data. For example: if an Amazon customer signs up for Prime, how much more will they end up spending on Amazon in the next few months?
Inference jobs are focused on design, execution, and analysis of experiments, or A/B tests. For example, does showing customer recommendations for related products distract them from their checkout experience, or does it result in additional upsell?
Algorithms jobs are concerned with development of predictive models and deploying them as part of a production workflow, e.g. developing a new recommendation system to promote item discovery.
To be a successful analytics DS, you need a solid grasp of statistics, strong SQL skills, and a willingness to spend ample time acquiring domain knowledge relevant to the problem. The best results usually come not from algorithmic wizardry, but from finding and using the key subset of data. Domain knowledge helps you know what to look for.
To succeed at an inference DS job, you would need to feel comfortable talking to multiple different stakeholders such as PMs who come up with ideas for experiments and engineers who understand what can and cannot be experimented on. You also have to be able to understand the basics of experiment design and explain these basics to those who aren’t well versed in it.
A strong algorithms DS can be thought of as an ML engineer. Predictive models that only work inside Jupyter notebooks are not particularly useful for the business. To be useful, they must be integrated into the rest of the software stack. Being able to work side-by-side with engineers on model operationalization and deployment requires you to speak the engineering language and be comfortable with modern software development practices such as version control, unit tests, code reviews, CI/CD, and monitoring.
In your opinion, what have been the most relevant breakthroughs in data science impacting our world in the last 1-2 years, and what trends do you see emerging going forward?
The biggest innovation in the past decade has come from understanding how to use Transfer Learning. Previously, models were trained from scratch for every problem and were nearly impossible to customize. Deep Learning made it possible for us to train models on one set of data and subsequently fine-tune them for another, similar problem. In 2010, it would have been very difficult to develop a model that can tell dogs apart from cats in photos. Today, I can achieve this with about 10 example photos and have a near-perfectly accurate model.
In the past couple of years, we were able to achieve similarly impressive results on natural language problems. Modern state-of-the-art models such as GPT-3 can generalize across domains with mind-blowing results. For example, these models can generate working Python code just from a plain-text description of what the function should be doing, such as “return all strings from a collection that are palindromes and are at least 8 symbols long”.
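To make the quoted prompt concrete, here is a hand-written sketch of the kind of function such a description asks for. It is not actual model output; the name long_palindromes and the min_length parameter are my own assumptions, and "symbols" is interpreted as characters, with spaces and letter case counted as-is.

```python
from typing import Iterable, List


def long_palindromes(strings: Iterable[str], min_length: int = 8) -> List[str]:
    """Return the strings that are palindromes and at least min_length characters long.

    A string counts as a palindrome only if it equals its exact reverse,
    so spaces, punctuation, and letter case all matter.
    """
    return [s for s in strings if len(s) >= min_length and s == s[::-1]]


if __name__ == "__main__":
    candidates = ["racecar", "abcddcba", "step on no pets", "notapalindrome"]
    # "racecar" is a palindrome but has only 7 characters, so it is filtered out.
    print(long_palindromes(candidates))
```

A different reading of the prompt (for example, ignoring spaces and case) would call for normalizing each string before the comparison.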
So far, transfer learning has not been heavily utilized outside of the vision and text domains, most likely due to the absence of large, commonly used benchmark datasets such as ImageNet or the Wikipedia corpus. I for one would love to see us make some progress on transfer learning for time series forecasting.
While hiring a Data Science Engineer, what are the few impressive skills you look at in their resume or profile?
My team tends to hire people with a fairly specific set of skills at the intersection of data science and ML engineering. We don’t work on analytics or inference problems as much as people do in other parts of Microsoft or elsewhere in the industry. One of the primary things we seek in candidates is a track record of building ML models from scratch. We don’t expect every candidate to be able to train a new GPT-3 model, and we like to reuse existing solutions whenever possible. But people who succeed on our DS team can recreate existing solutions from scratch if it becomes truly necessary. To use an analogy: if we’re in the business of shipping vehicles to customers, then my team seeks to hire mechanics rather than just drivers.
Another skill we look for is being comfortable with the engineering aspects of data science. We don’t develop awesome predictive models just to see them get stale inside Jupyter notebooks. Instead, we modularize the model code and develop a battery of end-to-end tests so that we can deploy the models into production by means of CI/CD workflows.
Finally, I know it’s not a particularly sexy skill, but I’m still convinced that the biggest return on invested learning effort in Data Science comes from getting good at SQL. It is almost impossible to avoid interacting with some sort of a database if you’re working on a real business problem, and all databases speak SQL. Most data preparation and cleaning logic can and should be implemented inside a database (read: in SQL) whenever possible.
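As a small illustration of pushing data preparation into the database, the sketch below deduplicates rows, normalizes a text column, casts a string column to a number, and drops rows with missing values, all in one SQL query. It uses Python's built-in sqlite3 module with an in-memory database. The raw_orders table, its columns, and the sample rows are made up for the example and do not come from any real system.

```python
import sqlite3

# Hypothetical raw data: a duplicated row, inconsistent casing and
# whitespace, a missing amount, and amounts stored as text.
SETUP_SQL = """
CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT);
INSERT INTO raw_orders VALUES
    (1, ' Alice ', '10.50'),
    (1, ' Alice ', '10.50'),
    (2, 'BOB', NULL),
    (3, 'carol', '7');
"""

# All cleaning logic lives in SQL, so it runs where the data lives.
CLEAN_SQL = """
SELECT DISTINCT
    order_id,
    LOWER(TRIM(customer)) AS customer,
    CAST(amount AS REAL) AS amount
FROM raw_orders
WHERE amount IS NOT NULL
ORDER BY order_id
"""


def load_clean_orders():
    """Create the toy table in memory and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP_SQL)
        return conn.execute(CLEAN_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in load_clean_orders():
        print(row)
```

The same query would need only minor dialect changes to run on most other SQL databases, which is part of the appeal of keeping this logic in SQL rather than in notebook code.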
And yes, we are actively hiring, of course :)
What advice would you give to other business leaders who would like to step into realising data science use cases? What advice should they ignore?
I think one of the best pieces of advice that I heard in my career was “no AI before BI”. This is a succinct way of saying that you must invest time and effort into setting up data infrastructure before starting to infuse your business with ML and AI. Without proper infrastructure, it is possible to hack some prototypes together quickly to demonstrate possible value, but it will be extremely challenging to realize said value because it will be nearly impossible to deploy models reliably. Unfortunately, data infrastructure work is almost always less glamorous and less exciting for most stakeholders involved, and many businesses that are new to ML want to minimize it. I am convinced that such an approach is a big mistake: whatever is built on a shaky foundation rarely lasts.
In terms of advice to ignore, I want to agree with Andrew Ng: these days it is not necessary for most businesses to focus on the latest and greatest algorithmic developments. Instead, businesses are much more likely to realize ROI from investing in cleaning data and acquiring higher-quality datasets. Modern models are usually fine for most practical purposes.
According to you, what are the traits that make a good leader? In this technology era, how do you synchronize yourself with your team to excel in technology innovations?
Everyone has an opinion on leadership, and most opinions have at least some truth to them. I am still convinced that Andy Grove got it right when he said that a good leader must be a force multiplier for their team. (His book, “High Output Management”, is a classic on the topic, and for good reasons.) In addition, I think that the best leaders with whom I had a chance to work shared another trait: they deeply cared about their people on a human level. This became even more apparent in 2020 when so many things went wrong for many of us.
In terms of staying on top of the latest innovations in technology, I found that following a number of ML thought leaders on Twitter is amazingly effective. Often I have to time-box my Twitter time to make sure I get some actual work done. But with basic time management skills, Twitter can be unreasonably useful.