Previously I mentioned what machine learners do better than most economists. Now I want to bring up a slightly more controversial distinction. Many machine learning practitioners are fundamentally less willing to make assumptions. This may sound a bit abstract and probably warrants an explanation.

At a very high level of abstraction, a large portion of statistics can be reduced to constructing a model that defines a relationship between a dependent variable and a set of independent variables. It is in defining this relationship that one chooses which assumptions to make. It may sound like I am drawing a distinction between parametric and nonparametric statistics, but in fact the point is more general.

In economics, we construct models to understand specific phenomena. Every little piece of our models usually has a clear and fairly straightforward interpretation. The famous model-building process of Alfred Marshall – write the question in English, translate into math, use math to obtain the answer, translate the answer back into English, burn the math – is still a good rule of thumb. But we cannot make this work without imposing a myriad of assumptions, many of which are so deeply ingrained in our thought process that we no longer consider them to be assumptions per se; rather, we implicitly treat them like axioms.

In contrast, machine learners become deeply suspicious the moment someone starts making assumptions. They raise a valid point: any result that is obtained under an assumption goes out the window once the assumption is violated. Many find such a predicament to be completely unacceptable, and refuse to go down that path. Strangely enough, the ability to make informed assumptions that are mostly true is still highly valued in the field, and has a special term reserved for it: «domain knowledge».

I feel that in economics we had to acquire «domain knowledge» early because our data is fundamentally very noisy. Without making assumptions, it is impossible to learn anything from, say, macroeconomic data – there are simply too many moving parts and too few degrees of freedom. Imposing a structure on the data can sometimes let us ignore a subset of the explanatory variables and make the remaining variables more informative. I feel that this idea has not yet made it to the machine learning mainstream.
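The point about structure paying off in small, noisy samples can be sketched with a toy simulation (my own hypothetical example, not from the text above): with nearly as many regressors as observations, an agnostic least-squares fit overfits badly, while a model that assumes only one regressor matters – an assumption that happens to be true here – predicts far better out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 observations, 25 candidate regressors: noisy, macro-style data.
n_train, n_test, p = 30, 1000, 25
beta = np.zeros(p)
beta[0] = 2.0  # in truth, only the first regressor matters

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(size=n_train)
y_test = X_test @ beta + rng.normal(size=n_test)

# Agnostic fit: OLS on all 25 regressors.
b_full, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
mse_full = np.mean((y_test - X_test @ b_full) ** 2)

# Structured fit: impose the (correct) assumption that only
# the first regressor belongs in the model.
b_restr, *_ = np.linalg.lstsq(X_train[:, :1], y_train, rcond=None)
mse_restr = np.mean((y_test - X_test[:, :1] @ b_restr) ** 2)

print(f"out-of-sample MSE, agnostic:   {mse_full:.2f}")
print(f"out-of-sample MSE, structured: {mse_restr:.2f}")
```

Of course, the same restriction would hurt if the assumption were false – which is exactly the trade-off both camps are arguing about.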