by Patrick O’Shaughnessy
My guest this week is one of my best and oldest friends, Jeremiah Lowin. Jeremiah has had a fascinating career, starting with advanced work in statistics before moving into risk management in the hedge fund world. Throughout his career he has studied data, risk, statistics, and machine learning, the last of which is the topic of our conversation today.
He has now left the world of finance to found a company called Prefect, which offers a framework for building data infrastructure. Prefect was inspired by observing frictions between data scientists and data engineers, and solves these problems with a functional API for defining and executing data workflows. These problems, while wonky, are ones I can relate to from working in quantitative investing, and others who suffer from them will be nodding their heads. In full and fair disclosure, both my family and I are investors in Jeremiah’s business.
You won’t have to worry about that potential conflict of interest in today’s conversation, though, because our focus is on the deployment of machine learning technologies in the realm of investing. What I love about talking to Jeremiah is that he is an optimist and a skeptic. He loves working with new statistical learning technologies, but often thinks they are overhyped or entirely unsuited to the tasks they are being used for. We get into some deep detail on how tests are set up, the importance of data, and how the minimization of error is a guiding light in machine learning and perhaps all of human learning, too. Let’s dive in.
2:06 – (First Question) – What do people need to think about when considering using machine learning tools
3:19 – Types of problems that AI is perfect for
6:09 – Walking through an actual test and understanding the terminology
11:52 – Data in training: training set, test set, validation set
13:55 – The difference between machine learning and classical academic finance modelling
16:09 – What will the future of investing look like using these technologies
19:53 – The concept of stationarity
21:31 – Why you shouldn’t take for granted label formation in tests
24:12 – Ability for a model to shrug
26:13 – Hyperparameter tuning
28:16 – Categories of types of models
30:49 – Idea of a nearest neighbor or k-means algorithm
34:48 – Trees as the ultimate utility player in this landscape
38:00 – Features and data sets as the driver of edge in machine learning
40:12 – Key considerations when working through time series
42:05 – Pitfalls he has seen when folks try to build predictive market investing models
44:36 – Getting started
46:29 – Looking back at his career, what are some of the frontier vs settled applications of machine learning he has implemented
49:49 – Does interpretability matter in all of this
52:31 – How gradient descent fits into this whole picture