**ReSolve Riffs with ReSolve’s own Dr. Andrew Butler on Integrating Prediction with Optimization**

Our guest this week was Dr. Andrew Butler, Chief Investment Officer here at ReSolve. He holds a PhD from the University of Toronto's Department of Mechanical & Industrial Engineering, an M.A. in Applied Mathematics from York University, and an Honours B.Sc. in Applied Mathematics & Physics from Memorial University. Our conversation covered:

- Mentors, studying optimization, machine learning and emulating simulators
- Comparing methodologies used in calculating oil reserves with empirical finance
- Mathematics, known truths, soft axioms and assumptions
- Data science, tools and hypothesis tests in finance
- Comparing the definitions of “*noise*” and different scientific methods
- Regression types, relationships and features
- Existing models *vs* creating models with data
- Hyperparameters, universes, good decision-making and edges
- The fallacy of “*The Efficient Frontier*” in Modern Portfolio Theory
- Methods that make for good choices, and setting expectations
- Neural Networks, financial data extraction and optimization science
- Building complex models, integrating and optimizing predictions, and making decisions
- Counterintuitive predictions and “*explainability*” – integrated *vs* optimized models
- Optimizing for decision accuracy by minimizing prediction error
- Using *Decision Trees*, *Ensembles* and *Boosting*
- Market data and making predictions using de-coupled *vs* integrated approaches
- Training models and optimizing, going forward
- Optimal covariance shrinkage and minimizing error
- Constrained *vs* unconstrained mean variance portfolios, de-coupled *vs* integrated, in a managed futures context
- The effect of constraints in limiting models from being “*different*”
- The practical takeaways about integrated solutions for Advisors and Portfolio Managers
- The value of humility in optimizing for financial markets – knowledge, wisdom and experience
- Stochastic dominance *vs* t-tests
- Time horizons – size matters
- The Consultant Industry – monthly data *vs* understanding process
- And much more

This is “ReSolve Riffs” – live on YouTube every Friday afternoon to debate the most relevant investment topics of the day, hosted by Adam Butler, Mike Philbrick and Rodrigo Gordillo of ReSolve Global* and Richard Laterman of ReSolve Asset Management Inc.


You can also read the transcript below.


**Andrew Butler**, CIO, ReSolve Asset Management Inc.

Andrew is Chief Investment Officer and Portfolio Manager for ReSolve Asset Management Inc. He leads the research, development and implementation of ReSolve’s machine learning and portfolio optimization ecosystem.

Andrew holds a PhD from the University of Toronto's Department of Mechanical & Industrial Engineering, an M.A. in Applied Mathematics from York University, and an Honours B.Sc. in Applied Mathematics & Physics from Memorial University, and is a CFA Charterholder.

Review Andrew’s research page here: https://butl3ra.github.io

**TRANSCRIPT**

**Adam:00:01:56**Okay, welcome.

**Andrew:00:01:59**Good morning.

**Rodrigo:00:01:59**Welcome, welcome.

**Adam:00:02:00**Yeah.

**Rodrigo:00:02:01**Welcome team. Welcome, Andrew. It’s been only a few years since we last had you on the podcast.

**Andrew:00:02:06**It’s been a long time. It’s been like, what, four years, five years?

**Rodrigo:00:02:09**Yeah. And I got to say, the — we got feedback on a lot of the podcasts that we’ve done, but for those who are watching this and haven’t listened to the original podcast, I think you should. It was — a lot of people have commented on it being one of their favorite podcasts that we’ve done. Of course, those people tend to be more technically oriented, it’s about machine learning. Love the title, *Machine Learning – Silver Bullet or Pandora’s Black Box*, which is — that was pretty neat.

**Adam:00:02:43**If you do say so yourself, having you come up with the title.

**Rodrigo:00:02:44**Well, no. Well, we all brainstorm. It was pretty good. Actually, that was a conversation with Cory Hoffstein that came — where the title emerged. So, thanks again for joining us, Andrew. I’ll let Adam do most of the questioning here, because he’s been working with you all these years on developing kind of your quantitative techniques. But yeah, why don’t we start off by getting a little bit of your background again since it’s been so long and kind of your journey from soup to nuts?

**Andrew:00:03:17**Sure. Well, I mean, don’t we need to do a – is there a disclaimer that needs to be done here? Who wants to…

**Adam:00:03:23**That’s a good call. You know, it’s a good thing he’s holding our feet to the fire, Rodrigo.

**Andrew:00:03:27**That’s right.

**Adam:00:03:29**This is not being recorded on Friday afternoon with drinks, but it’s still important to recognize that this is not for investment advice. This is for information and hopefully educational purposes only. For investment advice, you should seek out a registered advisor who’s going to learn all about you and make recommendations based on your own personal situation. With that said —

**Andrew:00:03:55**Beautiful.

### Backgrounder

**Adam:00:03:56**— Andrew, go ahead and give us a little bit of background.

**Andrew:00:03:59**Yeah, so — Sure. I mean, I’m a math guy. My formal training is in Applied Mathematics. I had an undergrad, a really great opportunity, and I’ve talked about this in the past, but a really great opportunity to work with just some really great mentors and supervisors in the math department at Memorial here in Newfoundland, who are doing some great work. This is back in 2012, on optimization and machine learning. And so we were doing work on oil reservoir optimization and using, basically using simulators to simulate kind of the dynamics, the physical dynamics of the oil reservoir optimization. And understanding that that simulation process is quite computationally expensive to compute. And so thinking about ways in which we can reduce that computational overhead.

And so we did a lot of great work on emulating a simulator. So, basically trying to emulate a simulator, which means replicate or create kind of a fast, computationally fast version of that simulator. And at that point, I mean, I was hooked. It was just such a great blend of both machine learning and optimization science. I was really fortunate. I actually missed all of the deadlines to be taken on as an honors student, but my supervisor was very kind, and took me on, despite the fact that I didn’t have any funding to do the research. And so I owe him such a debt of gratitude. And that really set me off on kind of this journey that I’ve been on for the better part of a decade at this point, which is really at the heart, is at the intersection of optimization. And so the mathematics behind optimization, and machine learning or statistical learning, and how those two concepts interact across a wide variety of domains; engineering, statistics, finance. So, that’s kind of the start.

And then I pursued a master’s in applied mathematics in financial engineering, and at that time, had the wonderful opportunity to work with you guys as a student. And I think Adam, you started sending me problems very early on when I was just kind of cutting my teeth on finance. And I was seeing a lot of the same types of challenges and difficulties that I saw in other spaces, almost maybe even through an extra order of magnitude, right, the amount of noise and the amount of uncertainty that exists within financial datasets is staggering. And so I found that to be a very interesting avenue to apply the same type of skills that I have been working with in other domains of kind of applied math and engineering. And it’s been a wild ride ever since. I mean, that’s the long/short of it. I mean, we’ve been …

**Adam:00:07:49**Well, at some point you did your CFA too. Right? And so I think the CFA, they sort of — we overlook that sometimes amongst all your academic accolades. But that was no small endeavor. And I think also helped to sort of set up the basic training for finance, that you could kind of build a quantitative framework around.

**Andrew:00:08:13**Well, exactly. I mean, I remember first reading — I had literally no training in finance, and never took a finance course prior to that. And I remember reading the CFA books. And my first pass through, being just so incredibly lost in so many different ways, because of the different lingo. Trying to find context was difficult, having not been immersed in that field ever. Right? And so yeah, that was just a great learning experience. And it was so great that I was able to do that also while working, while working with you guys back when it was Butler, Philbrick, Gordillo and Associates.

**Rodrigo:00:09:03**My God.

**Andrew:00:09:04**Yeah. And — …

**Adam:00:09:06**That was a long time ago, yeah.

**Rodrigo:00:09:07**Yeah.

**Andrew:00:09:07**That is a long time ago, but it was great, because being able to do the CFA Program, while simultaneously applying a lot of the stuff that was being discussed, like, in particular, a lot of the portfolio optimization theory. To be able to apply it in real time and to consolidate that learning, was just a great way to go about learning a topic that I really knew very little about at the time.

### Scientific Method

**Adam:00:09:36**I think this is a really good segue actually to — we can kind of pause here because… So, you had spent a few years doing optimization in an oil reservoir context learning the ins and outs of data science, and best practices in that field. And then you do your CFA, you’re immersed in a background of empirical finance. And in the CFA there’s a large reading list, right? So, you’ve got to go and read, cover a lot of the seminal literature in finance as part of the CFA.

And so it’s an interesting opportunity here to sort of say, to sort of juxtapose the methodologies brought to bear in your data science work with oil reservoirs against the methodologies brought to bear in the canon of empirical finance. Let’s maybe pause here and get your sense of — because I think there was a fair amount of dissonance that occurred at that point, and we didn’t sort of circle back to that for a few years, but maybe let’s dwell here for a minute.

**Andrew:00:10:51**Sure. I mean, my kind of first — my first take of the whole field, because keep in mind, I came from a background that is very much built on axioms, right? So, there’s this — the whole foundation of mathematics is built off of kind of known truths, and then building up, based on things that you can show to be provably true.

**Rodrigo:00:11:19**Hard sciences like…

**Andrew:00:11:22**Yeah. And in finance you could argue that there aren’t any strong axioms, right. It’s this weird kind of in between of a social science and a quantitative science, right? And so there aren’t any hard axioms. And so when you read through kind of traditional financial literature, you have to — I often find myself wondering, well, why did they make this particular assumption? For example, why did they assume that return was proportional to volatility, right, as kind of a blanket statement, and then being like, well, is that necessarily true? Right? And if you actually look, right, because if you assume that to be true, and you build kind of the scaffolding around it, well, then you get into things like, well, a max diversification portfolio would be *max Sharpe optimal* if returns are proportional to volatility. And ergo, you want to run a maximum diversification portfolio in the absence of being able to predict future returns, but you can predict volatilities. Okay. Well, those are a series of leaps that you’ve made in, based on one fundamental assumption, which is that returns were proportional to volatility.

And so then when you go back, and you take a data science approach to it, and you say, okay, well, is that, is that necessarily true? A lot of the times what you’ll see is like, well, no, it’s not. And then a lot of the work that I’ve looked at, it’s like, if anything, the inverse of that relationship is true. Right? And so this became very difficult and eye-opening for me, because it makes you question assumptions that are kind of just presented as conventional wisdom. And my natural tendency was therefore to question and then analyze them with a kind of a data science mindset.
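Andrew's chain of leaps can be checked directly with a few lines of linear algebra. The sketch below (illustrative only, with made-up numbers; not ReSolve's code) shows that if expected returns are assumed proportional to volatility, the unconstrained maximum-Sharpe portfolio and the most-diversified portfolio coincide:

```python
import numpy as np

# Three-asset example with invented numbers (illustrative only).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
sigma = np.sqrt(np.diag(cov))      # asset volatilities: 20%, 30%, 40%
mu = 0.5 * sigma                   # the "returns proportional to volatility" assumption

w_sharpe = np.linalg.solve(cov, mu)      # unconstrained max-Sharpe direction
w_sharpe /= w_sharpe.sum()

w_maxdiv = np.linalg.solve(cov, sigma)   # most-diversified portfolio direction
w_maxdiv /= w_maxdiv.sum()

print(np.allclose(w_sharpe, w_maxdiv))   # True: same portfolio under this assumption
```

Because `mu` is an exact scalar multiple of `sigma`, `inv(cov) @ mu` and `inv(cov) @ sigma` point in the same direction, so the normalized weights are identical. Drop that one assumption and the two portfolios diverge, which is exactly the scaffolding Andrew is questioning.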

**Rodrigo:00:13:46**Right. So, these — Go ahead, Adam.

**Adam:00:13:48**Yeah, it’s okay. No, I think it was just — there’s a dissonance between having a background in science and the scientific method, right? I don’t mean like, science exclamation mark. I mean, science as in following the scientific method in order to determine if a hypothesis can be rejected, right. And in finance, they seem to have sort of skipped a lot of the scientific method. And I think part of that is that prior to the introduction of data science as a kind of a hard science field, there really weren’t good tools to run hypothesis tests in a field like finance or economics.

**Andrew:00:14:51**Yeah. Well, yeah, to — the challenge that I think where you’re getting at there is that the data can be so noisy, that if you use traditional statistical test methodologies, you will often find, well, I cannot reject the null hypothesis that there is no relationship here, right? Which means, which doesn’t give you a whole lot. And if you’re trying to be either an academic or a practitioner in the field, especially if you’re trying to be a practitioner in the field, you need to make decisions, and you’re trying to make decisions that are informed based on data. But traditional statistical tests are telling you that there’s nothing here. With kind of the emergence of robust statistical methods, and machine learning and statistical learning, you’re able to kind of extract with more confidence some of these relationships that do actually exist, but they’re just hidden in so much noise.

**Rodrigo:00:15:59**Can we talk a little bit about what you mean by noise and contrast that to what I imagined would be less noise in the hard sciences, maybe in the field of oil sands and optimization and what you were doing before?

**Andrew:00:16:17**Yeah. Well, the noise — I mean, there’s different ways I think that you can think about noise. And I think in traditional quantitative finance people often state that what makes financial modeling so difficult is because the financial time series data is time variant, right. So, that is an aspect, that time varying aspect is a quality of the data that may not necessarily exist in other sciences where the relationship — so, if you’re trying to do, say, modeling tidal flows, there’s got to be maybe some degree of non-stationarity. But there’s going to be a large amount of stationary relationships, right? Seasonality patterns that exist within the data set that are predictable.

People often say, well, what makes financial time series data so difficult, is because the data is non-stationary. And that is true to a certain degree. Kind of more recently, what I’ve been thinking about is that, well, you can actually control, to a large degree, a lot of these non-stationary effects. So, if you think about kind of traditional non-stationarity effects, such as the stylized facts that were reported back in the work of *ARCH* and *GARCH* modeling. So, these would be things like non-stationarities in the data, like time-varying volatility. You can actually look at the data, you can use the — you can use your naked eye and look at the data and observe that the time series data exhibits different periods where volatility is higher and lower. But you can control for that effect to a large degree, right? And you can control for other forms of non-stationarity.

What makes financial data, I think, really difficult is yes, there may be some non-stationarity that you’re not controlling for, but there’s just such little signal to noise. So, a particular feature, so if you’re trying to model financial time series data, a particular feature will explain very little of the variance of that time series, right, in particular, on scales of the data that most people would work with, like daily or weekly or monthly scales. It explains …
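One concrete way to see what "controlling for" time-varying volatility can mean, as Andrew describes: rescale returns by a trailing volatility estimate so the series is closer to homoskedastic. The sketch below uses simulated data with alternating volatility regimes; all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
# Alternate between a calm regime (0.5% daily vol) and a stressed one (2%):
vol = np.where((np.arange(n) // 500) % 2 == 0, 0.005, 0.02)
rets = rng.normal(0.0, vol)

# Trailing 60-day realized volatility, lagged so it uses no same-day data:
window = 60
trailing_vol = np.array([rets[max(0, t - window):t].std() if t >= 5 else np.nan
                         for t in range(n)])
scaled = rets / trailing_vol   # volatility-standardized returns

print("raw std, calm vs stressed regime:   ",
      round(rets[:500].std(), 4), round(rets[500:1000].std(), 4))
print("scaled std, calm vs stressed regime:",
      round(float(np.nanstd(scaled[:500])), 2),
      round(float(np.nanstd(scaled[500:1000])), 2))
```

The raw series shows a roughly fourfold volatility difference between regimes; the standardized series is much closer to uniform (apart from a lag around regime changes), which is the sense in which this particular non-stationarity can be controlled for by eye-level methods.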

**Rodrigo:00:18:59**So, when you say a particular feature you’re talking about, let’s say like the value, like a P/E, and you’re trying to use that to predict some signal, to capture some signal. You’re saying that it’s overwhelmed by noise.

**Andrew:00:19:17**It’s just overwhelmed by noise, and actually can give sometimes, I think, the illusion of non-stationarity as well, because there’s just so much noise. And so you get with any particular feature, like say, price to earnings, or if you were using trend following, some momentum or if you’re using some seasonality features, you’re actually only explaining at best 0.1% of the variance. If you can explain 1% of the variance of the time series like you’re doing phenomenal. And so what that means is that a lot of the times what you’re observing is really difficult to distinguish if the artifact that you’re observing is signal, or if it’s noise. And so what we’ve spent the last several years working on is trying our best to understand how much of that signal we can actually extract.
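For a sense of scale on that 0.1% figure, here is a toy simulation (numbers invented purely for illustration) of a feature whose true explanatory power is 0.1% of daily-return variance:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2520                              # roughly ten years of daily observations
x = rng.normal(size=n)                # the feature (a value score, a momentum score, ...)
beta = np.sqrt(0.001)                 # chosen so the true R^2 is exactly 0.1%
y = beta * x + rng.normal(size=n)     # return = tiny signal + unit-variance noise

r2 = np.corrcoef(x, y)[0, 1] ** 2     # in-sample R^2 of a simple regression of y on x
print(f"true R^2 = 0.0010, sample R^2 = {r2:.4f}")
```

Re-running with different seeds shows the sample R² swinging between essentially zero and several times the true value, which is Andrew's point: at this signal-to-noise ratio it is genuinely hard to tell whether the artifact you are observing is signal or noise.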

**Adam:00:20:26**And how — So, this is kind of getting to the crux of where I was trying to go before with this segue or with this — …

**Rodrigo:00:20:36**Pause.

**Adam:00:20:36**How would the methods of traditional empirical finance go about trying to explain time series returns? And how does it contrast with the approaches that you learned during your oil reservoir optimization, and just generally being brought up steeped in a field that embraced the scientific method?

**Andrew:00:21:16**Sure. Well, I would say kind of the backbone of empirical finance, has been regression, has been in particular, linear regression. And this comes from the work of Fama and French, and all of the work that they did in kind of extracting factor returns from stock datasets. But embedded in that so — and regression, don’t get me wrong, has a lot of strengths. But it also comes with it a lot of assumptions about the dataset, in particular, that the relationship between the response variable or what you’re trying to predict, and what you’re using to predict it, which we call a feature, is linear, or created by some linear combination. It doesn’t take into account the fact that there could be — or it doesn’t control for the fact that there could be kind of just different distribution assumptions at the tails of the distribution than what are observed kind of in, let’s call it, the middle or normal market environments.

What statistical learning and the machine learning as well, they allow for kind of much more nuanced descriptions of the data sets. So, for example, if a relationship exists between, and I’m going to speak in X and Y a lot, because it’s just the way I tend to speak. But X would be the feature and Y would be the thing that you’re trying to predict, right. If a relationship exists between X and Y, but it only exists in a particular domain of X, right, so if you were to cut X, say, at its left tail, and you observe that there’s a strong relationship, but only when X is say, less than one, or less than zero, let’s say X. When X is negative there’s a strong relationship that exists.

Well, you’re able to actually be nuanced about finding and identifying that relationship using the machinery of these kind of, let’s call them advanced statistical methods. Whereas if you were to use traditional finance methods of linear regression, you’re assuming that the relationship is linear, and it exists throughout the entire domain of X. Right? When in reality, there could be a lot more descriptive things going on if you were to slice and dice the data in a more thoughtful manner.
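A small simulated example of the situation Andrew describes, where the relationship lives only in one region of the feature (hypothetical data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=5000)
# The true relationship exists only when x is negative (slope 2 there, 0 elsewhere):
y = np.where(x < 0, 2.0 * x, 0.0) + rng.normal(size=5000)

slope_global = np.polyfit(x, y, 1)[0]                  # one linear fit, whole domain
slope_left = np.polyfit(x[x < 0], y[x < 0], 1)[0]      # fit where the signal lives
slope_right = np.polyfit(x[x >= 0], y[x >= 0], 1)[0]   # fit where there is none

print(f"global slope {slope_global:.2f}, x<0 slope {slope_left:.2f}, "
      f"x>=0 slope {slope_right:.2f}")
```

The single global fit lands near a slope of 1, which misstates both regimes; methods that can partition the domain (trees, splines, local regression) recover the piecewise structure instead of averaging it away.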

**Rodrigo:00:24:26**And I guess from the perspective of what most people have seen and understood, most financial advisors and practitioners, a common thing with regards to data and assumptions we’re making about equity markets is, like you said, the idea of a normally distributed data series. I think most advisors at this point, especially after 08, have recognized that there is a big left tail, that a four standard deviation event happens in financial markets more often than it would in a Gaussian distribution, and have had to adjust for that, right. So, whether they’re using third party managers, using tail protection, or CTAs, a lot of people have identified it, accepted that it’s different and had to create different tools in order to fill in those gaps in those particular extremes, right? That’s just one example of empirical finance failing us, in the industry.

**Andrew:00:25:27**Exactly. And it allows you to do other things. Like you can — there’s a whole field of what’s called non-parametric methods, which just allow you to make as few assumptions about the relationships that exist between data as possible. And rather than assuming, for example, that the relationship is linear, the relationship instead kind of emerges naturally from the data. So, this is the nice thing. And this is where I think a lot of science in general has changed in the last 30 years or so. Which is prior to that, people have been proposing models, and then observing how well those models actually fit reality. And the movement, certainly more recently has been, well, let’s start with data and let the data tell us about what exists there. So, it’s kind of inverting that whole workflow process, which is interesting.

**Adam:00:26:32**Another thing I think that many practitioners of traditional empirical finance, maybe don’t think about enough, are hyper-parameters. So, for example, somebody is investigating the idea of value, or fundamental value being a predictor of equity returns. And there’s just so many different decisions that are made as part of that type of investigation, right. A short list is what universe are you selecting from? Are you using all stocks? Are you using only large cap stocks? Are you sorting stocks into sectors or doing some other kind of sort? How are you defining value? Are you going to bin them into deciles? So, you’re long the top decile, short the bottom decile, or long the top quintile, short the bottom quintile.

There’s all of these different types of decisions that are made and we don’t sort of stop to think about the impact of each of those decisions on how explanatory this eventual model is. Maybe talk a little bit about how data science methods explicitly account for those different dimensions of the problem that are often taken for granted as part of traditional experimental empirical finance.

**Andrew:00:28:32**Yeah, yeah. So, you make a really good point there. You know, traditionally hypotheses are proposed, and then as you describe, a series of decisions are made to get at the final product, right? So, a hypothesis, so in your example you use whatever, price to book as an explanatory feature, and then you — what universe do you use, what is the way in which you convert predictions into decisions? Are you going to take the top half and the bottom half, are you going to take the top quartile and the bottom quartile?

**Adam:00:29:14**Are you going to hold them in equal weight or cap weight?

**Andrew:00:29:17**That’s right. Are you going to hold them in risk adjusted proportion? So, all of these extra meta decisions that are being made that result in the final process that are rarely accounted for. So, you have to — if you were a good scientist, you’d have to question and you’d actually want to understand what is the impact of each of those incremental decisions that are made along that process? And rather than getting kind of a point estimate of the final product of what it may look like, you now get a distribution so you’re able to kind of iterate through all of those potential design decisions. You now are presented with a distribution of potential outcomes.

And if you found that the particular or particular model set up that you landed on is at the 99th percentile, and all the other potential paths that you could have taken in model design decisions result in considerably lower performance, well, that should be concerning to you, right. And that should be raising red flags. And you should know that while you may have tried to take every special care in not overfitting the data, design decisions just kind of creep into the process. And you can — and because, unfortunately, we use, or traditionally quantitative modelers rely so heavily on back tests, it becomes very easy to inadvertently inflate your back test performance based on these design decisions, which in reality, the out of sample results will show you that they are no better than all of the other potential design decisions that you could have taken, right. Just, you’ve basically inadvertently done in sample fitting. And so …
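The "distribution of design decisions" point can be illustrated with pure noise: give every variant of a strategy zero true edge and look at the spread of backtest Sharpe ratios across the design grid. The setup below is entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_variants = 2520, 64       # ~10 years daily, 64 combinations of meta-choices
# Every variant is pure noise: zero mean, 1% daily vol, no true edge anywhere.
returns = rng.normal(0.0, 0.01, size=(n_days, n_variants))

sharpes = returns.mean(axis=0) / returns.std(axis=0) * np.sqrt(252)
print(f"backtest Sharpes across {n_variants} no-edge variants: "
      f"{sharpes.min():.2f} to {sharpes.max():.2f}")
```

Even with no edge anywhere, the best-looking variant typically posts a respectable-looking Sharpe ratio. If the configuration you landed on sits near the top of that spread, the backtest is reporting design luck, not skill, which is why checking where your setup falls in the full distribution matters.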

**Rodrigo:00:31:31**Yeah, you subconsciously bias your process over time. Yeah.

**Andrew:00:31:37**And so the nice thing, and this is kind of the data science mindset is, there’s parameters of the model, and then you know, because computation has become so quick, we can also now do evaluation of the hyper-parameters. So, these types of design decisions. And what you want to do is you want to set up an experimental design that tests, what decisions can I actually make? Right. So, for example, if I’m actually able to determine that choosing the top decile is better than choosing the top half/bottom half, I can actually test that. And the way that I would go about testing that would be using the kind of data dividing techniques that are very common in data science, which is you have kind of a training set on which you make your decisions. So, in this case, you’re trying to say, can I make the decision between using the top decile or using the top half? And then I have these validation sets and of course, I have a final holdout set. But I have these validation sets, which will tell me, well, can I actually make that decision in a clean setting?

**Adam:00:32:53**Do I have an edge in making that decision?

**Andrew:00:32:55**Do I have an edge, precisely. Do I have an edge in making that decision? And so you’re actually able to set up, if you set up the experimental design thoughtfully, and this is kind of really a lot of what we’ve been doing in the last several years, is that you want to know what decisions can you actually make in a clean setting, and to the extent to which you cannot make a decision? Well, then you may want to diversify. Right. So, kind of like the antithesis of the lack of decisions would be diversification, right. And so — but the, I guess, the main takeaway is the testing methodologies that are just so common in data science are rarely used in quantitative finance.

Especially they — I think they’re being used more and more at the model level, but not at the meta-model level. Which is like all of the extra design decisions that came into creating the final product, but they should be. And so trying to be humble about what decisions you can make and what decisions you can’t make in these testing techniques from the data science literature are very helpful in assessing that.
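A stylized version of that "do I have an edge in this meta-decision?" test, with simulated fold-level Sharpe ratios (all parameters invented for illustration): on each training fold, pick the better of two sort rules, then check whether the pick also wins on the paired validation fold.

```python
import numpy as np

rng = np.random.default_rng(9)
n_folds = 1000
# Column 0 is the training fold, column 1 the paired validation fold.
# Decile sorting is given a genuinely higher mean Sharpe, buried in noise:
decile = rng.normal(0.9, 1.0, size=(n_folds, 2))
half = rng.normal(0.1, 1.0, size=(n_folds, 2))

picked_decile = decile[:, 0] > half[:, 0]          # decision made on training data only
validation_win = np.where(picked_decile,
                          decile[:, 1] > half[:, 1],   # did the pick win out-of-fold?
                          half[:, 1] > decile[:, 1])
win_rate = validation_win.mean()
print(f"training-fold pick wins on validation {win_rate:.0%} of the time")
```

A win rate reliably above 50% says the decision is learnable; one indistinguishable from a coin flip says you have no edge in that meta-decision, and, as Andrew suggests, diversifying across the options is the humbler choice.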

**Rodrigo:00:34:18**Well, can we give a concrete example? You talked about Fama/French, for example, right? Can you walk through how you think they came to the conclusions that they came to as professors and academics and where we might find flaws in that approach?

**Andrew:00:34:37**I’m actually not a good student of a lot of the Fama/French work. I don’t know maybe Adam, you …

**Rodrigo:00:34:43**Well, let’s just do an abstract. Yeah, maybe Adam, you can talk about the idea of — …

**Adam:00:34:47**Well, I mean, the value factor from the original Fama/French papers in 91 and 92 is a really good example. Now, you’ve got to give them a break because they were dealing with some antiquated technology by our standards and brand new datasets that had very low granularity, and a very limited number of like equity based characteristics, right. So, they didn’t have long earnings histories. They had long price histories, or market cap histories and long book-to-market histories, right. So, that’s really all they had to work with. So, in terms of choosing the value metric, that’s like, we got to kind of give them a break there, right. But one of the design decisions that they did make was in creating their value factor, which is still, by the way, used today. This is the *Fama/French Value Factor*.

They’d have a large cap universe, which is above the 40th percentile by market cap, and a small cap universe that is below the 40th percentile by market cap. So, they run price to book decile sorts for the large cap universe, they evaluate the, or they generate the returns for that strategy. They do exactly the same thing for the small cap universe. And then they average the returns at each month, because they were using monthly data at each month from the small cap, long minus short, and the large cap long minus short. And that becomes the value factor. Now, trading stocks below the 40th market cap percentile means that you’re trading micro caps, right. You’re trading these little tiny stocks that trade by appointment, that are going to have massive slippage, that just are not practical to trade.

So, half of the returns that are — that form that omnipresent Fama/French value factor, you actually cannot generate as a practitioner, right? They’re completely useless. Right? You may be able to mostly replicate what goes on in the top 40 percentile, right, or 40% by market cap. So, I mean, that was a design decision that they made, right? Now, how did they make that? How did they decide on the 40th percentile being the breakpoint? How did they decide on trading deciles versus quintiles or tertiles or top half/bottom half? Are they holding by market cap? All of these different decisions. Well, these can be made in a way that acknowledges that at each of these decision points, you’re making assumptions, right? So, for example, they could have tested using the exact same methodology at different breakpoints. So, we’re going to use the 20th or the 80th percentile as a breakpoint, the 60th percentile and the 40th percentile, right.

Now, how does that change the results? Now, can I use methods of data science to run a test at the 80th, 60th and 40th percentile breakpoints, observe the results in one segment of the dataset, use those results to then make a decision about whether I want to set the breakpoint at 80, 60 or 40, make that decision and then run it in the out of sample data. Do that a bunch of times and then, are my results better from making that decision? Or are they worse, or are they the same? Now, if they’re worse or the same, you can say well, actually, I have no skill. That’s completely arbitrary. This 40th percentile breakpoint is completely arbitrary. And I need to run it, run this experiment at all these different breakpoints and include all of those results in my experimental outcome, and then that will show a much truer sense of the actual dispersion in expectation for applying that methodology. And that’s just one dimension of the experiment, which is the market cap breakpoint. Right?

There’s obviously a variety of other types. You could run the same experiment, should I use deciles versus quintiles? Let’s see if I have an edge in making that decision, in an in sample out of sample framework, right? All of these different decisions. Now that’s a fairly simple experiment because they didn’t have a lot of data to work with. Modern experimenters are varying a much larger number of dimensions. What day am I rebalancing on? What value metric am I using? How am I specifying that value metric? Am I specifying profits as EBIT, EBITDA, GAAP earnings, operating earnings, etc, right? There’s all these different variables that can make a substantial difference on the outcome that experimenters are varying as part of their experimental process, but they’re not publishing all of the different results. So, the reader of the journal perceives it as a much higher level of precision in expectations about the application of that methodology, than there actually is in practice, when people actually go to use this out of sample.
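The construction Adam walks through can be sketched for a single rebalance date, with the breakpoint and sort fraction exposed as parameters so they can be stress-tested like any other design decision. This is a schematic with random illustrative data, not the actual Fama/French code:

```python
import numpy as np

def value_factor_return(mktcap, btm, fwd_ret, cap_breakpoint=0.40, q=0.10):
    """Average of small-cap and large-cap long-minus-short value returns.

    mktcap, btm, fwd_ret: 1-D arrays over the stock universe
    cap_breakpoint: market-cap percentile splitting small from large
    q: sort fraction (0.10 = deciles, 0.20 = quintiles, 0.50 = halves)
    """
    split = np.quantile(mktcap, cap_breakpoint)
    legs = []
    for mask in (mktcap <= split, mktcap > split):
        b, r = btm[mask], fwd_ret[mask]
        lo, hi = np.quantile(b, [q, 1 - q])
        long_ret = r[b >= hi].mean()    # cheap (high book-to-market) stocks
        short_ret = r[b <= lo].mean()   # expensive stocks
        legs.append(long_ret - short_ret)
    return np.mean(legs)

# Hypothetical cross-section with a built-in value effect, for illustration:
rng = np.random.default_rng(3)
mktcap = rng.lognormal(10, 2, 1000)
btm = rng.normal(1.0, 0.3, 1000)
fwd_ret = 0.02 * btm + rng.normal(0, 0.10, 1000)

for bp in (0.20, 0.40, 0.60, 0.80):
    v = value_factor_return(mktcap, btm, fwd_ret, bp)
    print(f"breakpoint {bp:.0%}: factor return {v:+.3f}")
```

Sweeping `cap_breakpoint` over 20/40/60/80 and `q` over deciles, quintiles and halves, inside a proper in-sample/out-of-sample split, is exactly the robustness exercise being described: if the published configuration is an outlier among its siblings, that is a red flag.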

**Rodrigo:00:40:32**Right. When you read the abstract, you get a certain conclusion that has been handcrafted by the practitioners, whether they do it consciously or subconsciously, right. Think about how modern finance works. You have a theory that deep value is the way to invest, and will have better results than market cap, right, or have any sort of alpha on top of your beta. You’ve made a decision, and then you use up all of your data; you test the whole dataset multiple times using different parameters. You identify, using all of your data, the ones that work and the ones that don’t. You publish the ones that do, without the reader ever knowing that the same theory applied to the experimental designs that didn’t work, right. They probably used value metrics that didn’t.

I’m not — I have no idea what they did. But I know what I’ve seen in the field, right, which is exactly what I’m describing, where you have a wide variety of designs that all stem from the same *a priori* theory about how markets work, but you will naturally gravitate towards the results in the back tests that make the most sense to your prior bias. And then you publish on that, people read the abstract and then start buying products or strategies, or implementing their own strategies, because what they’ll do is they’ll grab the solutions in that paper, test them themselves, right, get the results and say, I replicated that, right? They haven’t replicated the value factor, they’ve replicated the book to price factor, not the many other ones that might not have worked out at all, right.

So, this is where modern finance, I think, continues to be an issue when you talk to practitioners today, how they understand their own results, and their back tests and how little they think the history of modern finance has influenced their thinking about investing as well.

**Adam:00:42:31**Yeah. I think it’s — just to put a bow on this whole section of the conversation, I ran a really simple test. So, Lu Zhang and his team publish a huge dataset of daily factor returns. I think it’s about 160 different factors on the Russell — or the top 1000 by market cap at each point in time, so implementable strategies. And you’ve got 160-odd return series with decile sorts, quintile sorts, etc. This is all free. I encourage people to go and perform this experiment themselves. So, I took all of the series that were defined as value strategies, and I ran an in sample/out of sample kind of test, where I take 90% of the data and I observe the performance characteristics of each of these different series, each of these different value strategies. I’m going to say there are 45 or 50 different value strategies, right? And I say, I’m going to choose the value strategies with the highest Sharpe ratio, the highest compound return, the highest arithmetic mean return, the highest CAPM alpha.

I use a wide variety of different types of objectives to select a set of the best, or the number one best, value specification, right? So, I select it and then I carry that value selection into the out of sample and say, okay, how well can I select — …

**Rodrigo:00:44:25**You apply those rules to the out of sample.

**Adam:00:44:27**— based on this objective. Everything I tried, and now I’m data mining, right? I’m data mining all these different objectives to see if I can use the in sample best objective to select the best value strategy out of sample. Whatever I tried was completely ineffective. Holding all of the different value strategies in equal weight outperformed any of my attempts to select any set of value strategies using an in sample dataset, and using a wide variety of different objectives to try to make that selection, right.
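
Adam's selection experiment can be sketched on synthetic data. This is a hypothetical stand-in for the real factor dataset: 50 "value strategy" series with identical true means, so in-sample selection by any objective has no genuine edge over equal weighting.

```python
import numpy as np

rng = np.random.default_rng(1)

# 50 hypothetical "value strategy" daily return series, same true mean.
n_days, n_strats = 5000, 50
R = rng.normal(0.0003, 0.01, size=(n_days, n_strats))

split = int(n_days * 0.9)          # 90% in sample, 10% out of sample
ins, oos = R[:split], R[split:]

# A few of the selection objectives Adam mentions.
objectives = {
    "sharpe":   lambda X: X.mean(0) / X.std(0),
    "mean":     lambda X: X.mean(0),
    "compound": lambda X: (1 + X).prod(0),
}
for name, f in objectives.items():
    best = int(np.argmax(f(ins)))  # pick the in-sample winner...
    # ...and score only that pick out of sample.
    print(f"{name:>8}: picked OOS mean {oos[:, best].mean():+.5f}")

# Benchmark: equal weight across all strategies, no selection at all.
print(f"equal wt: OOS mean {oos.mean():+.5f}")
```

Run repeatedly with different seeds, the equal-weight benchmark is at least as good on average as any in-sample selection rule, mirroring Adam's finding.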

So, all these papers that make claims about why you should use EBITDA to enterprise value, or EBIT instead of GAAP earnings, etc., because of some theory, and then show a full sample back test, that’s all rubbish. The fact is, we cannot make any decisions based on the actual return streams that we evaluate. And if experimenters had performed their experiments in this in sample/out of sample way, then we would have known this 30 years ago. But instead we’ve got all of these different experiments conducted using the same flawed methodology, and practitioners drawing unhelpful conclusions from that entire canon of research.

**Rodrigo:00:46:06**You know, all the incentives are misaligned in many respects, right. Even starting from modern portfolio theory and CAPM, and the gusto with which they were adopted in the 70s and 80s to create an efficient frontier of equities and bonds, right? That was where we first started. I mean, that’s the first chapter in our book, really, right? Talking about modern portfolio theory, people want to throw it out the window. It’s a decent statistical tool that has improper inputs, based on *a priori* assumptions about equities and bonds, short-term, mid-term and long-term, right? So, you build a whole ideology from it, the 60/40 portfolio. And here we are today: it is canon and very difficult to change. I guess that leaves opportunities for firms like us, so we shouldn’t complain too much. But it is very interesting how that has evolved over the years.

### Neural Networks

**Adam:00:47:04**Yeah. So, I just want to segue, because I think it was important to go through this whole background, because we’re going to talk about how we apply methods that do make choices. Right? We make a lot of choices with our methods. The point is, we are very explicit about using experimental procedures that allow us to determine which kinds of decisions we have skill at making, and which kinds of decisions we don’t have skill at making, and setting expectations appropriately. Right? So, Andrew, maybe let’s start there. What motivated you to want to go and do your PhD, right? Like, what kind of research, or what kind of thinking, had we begun to adopt? And then how did that motivate you to go and pursue your PhD?

**Andrew:00:48:02**Sure, yeah. And I would just like to say, to your comment, Rodrigo, about CAPM and mean variance optimization, there’s obviously a lot there. I guess one of the big takeaways is to just always be careful about what you read. I think the theory behind mean variance optimization, as a convex optimization that maximizes a particular utility, is just great work. You know, the empirical results should always be interpreted as a proof of concept of the theory, right, and not necessarily prescriptive, and not necessarily what you should expect if you were to do this yourself on some other, different dataset, right?

The theory is actually great. And the motivation for a lot of work is pushing the boundary and the frontier forward, and then supporting it with some evidence that shows where it works, where it does not, and where the potential flaws may exist, right. And so that’s really how I read research papers. In particular, when you read the empirical evidence section of a paper, you take it with a grain of salt, and you should do your own experimentation, especially if you’re a practitioner in the field, and be aware of all of these decisions that are being made. You know, with that said, how did I — Yeah, so onto…

**Rodrigo:00:50:18**So, notice that I changed — I updated your name. You can’t just be Andrew, you’re Dr. Andrew Butler. Just want to make sure that everybody understands. Newly minted.

**Andrew:00:50:27**Yeah.

**Rodrigo:00:50:28**Go on.

**Andrew:00:50:29**Sure. So, in 2018, I was really missing some of the — the research that you do in an industrial setting can often be different from the research that you would do in an academic setting, just because a lot of the time you don’t necessarily have the space to sit with a problem, because there are competing resources and objectives. So, I was really missing the more formal research process that allows you to just sit with a particular problem for quite some time, and to work things out on pen and paper, and whatnot. So, I reached out to a really insightful, kind and generous professor at U of T, who gladly took me on as a part time student in the Department of Engineering. And he was doing really exciting work on financial engineering, quantitative finance, operations research, and data science. So, it was all things that I was very interested in doing.

And I always wanted to do something with neural networks. You know, if I were to fill a gap in my knowledge, and to pursue something that I would find really interesting, it would have been in the neural network space. And for a long time, I had this idea of using neural networks, in particular convolutional neural networks, for doing feature extraction on financial time series data. That was the original thinking.

**Rodrigo:00:52:48**Yeah. And by the way, that’s University of Toronto, not University of Texas, which people get caught up on when you say U of T. And, of course, University of Toronto is leading a lot of the machine learning research in the world right now. So, very, very good academic pedigree in that space. But anyway, go on.

**Andrew:00:53:11**Yeah. So, I wanted to do research in neural networks. And I had a nice idea, which I still actually think is a pretty good idea, which would use convolutional neural networks to perform feature extraction on noisy financial time series data. That was my original proposal. And the challenge with that is that I don’t think it had enough depth. You know, you may get some interesting findings from that, but there probably wasn’t enough depth to fill out a full PhD dissertation. So, then you’d have to go on to some other tangential direction, maybe related, maybe not, to fill out a particular body of research.

My supervisor actually pointed me in the direction of a few really neat papers that I was immediately hooked on. He shared with me just some great work. And this is the work of Elmachtoub and Grigas out of Columbia on their *Smart “Predict, then Optimize”* framework, and some great work out of Carnegie Mellon by Brandon Amos and Priya Donti on task-based end-to-end learning. And I was immediately hooked, because it had just this perfect blend of machine learning and optimization science. And we’ll see shortly that neural networks play a big part in this as well. I saw a lot of really great opportunities. So, I think it may be helpful to talk about what that actually was in more detail.

**Adam:00:55:17**Yeah. No, I agree. And I also think, or well, correct me if I’m wrong, but I think you really wanted to bring your experience working in industry to your research as well, right, so that it wasn’t just a pure application of theory. But it was a merger of really promising and interesting new theory with a rich spectrum of industrial applications.

**Andrew:00:55:49**Yeah, yeah, exactly. But, I mean, to talk about the main idea. So, in many different fields of engineering, operations research, statistics, and also finance, if you’re building complex models, those models have to do two tasks. The first task is a prediction task. A prediction task takes in feature data, so it’s taking in features, and it’s making predictions about unknown quantities of interest, like, for example in finance, the future return or the future risk of a particular asset. So, that’s the prediction model.

But predictions on their own are not particularly useful in a lot of applications. There’s often also a decision task; that would be the second task. And the decision task takes as input the predictions and converts them into some kind of action or decision, some kind of actionable quantity that allows you to act in the world. Right? So, there’s this decision task that is reading in your predictions. And traditionally, these two processes are completely independent of each other. Prediction models are fit on training data using their own particular objective functions and constraints, and they’re completely unaware of how their predictions are actually going to impact the decisions that are ultimately being made based on those predictions. And so, based on the work that I just mentioned, the work out of Columbia, the work out of Carnegie Mellon, and others, there’s been a more modern push to integrate these two tasks. And — …

**Adam:00:58:05**Can I press pause? Because I think it would be helpful to provide an example, right. And you did already, with the idea that you’re going to use, let’s say, some sort of regression modeling. You’ve got a feature or a few features, and you’re going to use those to predict the returns of an asset, maybe the risk adjusted returns of an asset over the next few days. So, now, what do you do with that? Right? Great. What is your optimal allocation, given that we have this forecast of asset returns, right? How much capital should we commit to that? Is this the only decision that you’re making at the same time? Or is this decision competing for resources with other decisions that you’re making at the same time, right? So — yeah, just wanted to make sure.
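
The decoupled pipeline Adam describes can be sketched in a few lines. Everything here is synthetic and hypothetical, not anyone's actual model: step one fits a return forecast by least squares (its own objective, mean squared error), and step two separately converts the forecast into a crude risk-penalized, long-only allocation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: one feature driving next-period returns of three assets.
n_obs, n_assets = 500, 3
X = rng.normal(size=(n_obs, 1))
true_beta = np.array([[0.02, -0.01, 0.005]])          # made-up sensitivities
R = X @ true_beta + rng.normal(0, 0.03, (n_obs, n_assets))

# Task 1 (prediction): least-squares fit, unaware of the downstream decision.
beta_hat = np.linalg.lstsq(X, R, rcond=None)[0]
mu_hat = (np.array([[1.0]]) @ beta_hat).ravel()       # forecast at feature = 1

# Task 2 (decision): a simple risk-penalized allocation using the forecasts.
Sigma = np.cov(R, rowvar=False)
raw = np.linalg.solve(Sigma + 1e-4 * np.eye(n_assets), mu_hat)
w = np.clip(raw, 0, None)
w = w / w.sum()                                       # long-only, fully invested
print(np.round(w, 3))
```

Note that the regression never sees the allocation rule; that separation is exactly the inefficiency the integrated approach discussed next is meant to address.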

**Andrew:00:58:58**Yeah, yeah. So, I mean, that’s a great example. And you can give other examples that don’t even live within the realm of finance. So, for example, this is an example that’s often cited. Imagine you’re doing route planning. So, something like Google Maps. It’s not necessarily how Google Maps works, but let’s use it to be illustrative. You’re trying to do route planning; you’re trying to go from point A to point B in a city, from start to destination. Your goal is, of course, to go from point A to point B in the most efficient time, so minimize the expected travel time. Now, there’s a whole bunch of potential paths that you can take to go from point A to point B, right? There’s a whole bunch of different roads in the city that you can actually navigate through that will get you to your destination. You want to find the particular path that gets you there the fastest.

To do this, or one way to go about doing it, is that you would have a prediction model first. And that prediction model is going to be making predictions about the expected travel time of each road, or each edge in that network, to be technical. For each edge in that network, we need to predict the expected time to traverse that edge, right? And that’s going to be based on a prediction model that’s taking in seasonality features maybe, known accident data, weather data, who knows what, to make a prediction about the expected travel time. And then, well, that’s great, you have these expected travel times, but they’re not that useful on their own. They’re only useful if they’re passed to a decision-making process. And that decision-making process is going to be solving an optimization problem.

It’s going to be solving, in this case, the shortest path problem. But it’s going to be using as input the predicted travel times from your prediction model, right. So, there’s this natural feed-forward from predictions into final decision making, right. And the traditional way that this is done is that they’re completely separate from one another. So, predictions are made, and they’re fit to maximize some kind of prediction quality, say prediction accuracy. Technically, that would be, you know, maybe to minimize the mean squared error of the prediction, or something to that effect, right. So, you’re trying to maximize prediction quality, or accuracy. And then once you have those predictions, you’re going to pass them to your decision making model. And that decision making model is doing something different; it’s solving a shortest path problem, which actually has a different objective. That objective is: give me the path that minimizes travel time. Right.
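
The decision half of this pipeline can be made concrete with a toy sketch: the graph and "predicted" edge travel times below are invented for illustration, standing in for the output of some upstream prediction model, and Dijkstra's algorithm plays the role of the decision step.

```python
import heapq

# Hypothetical predicted travel times (minutes) per directed edge,
# as if produced by an upstream prediction model.
predicted_minutes = {
    ("A", "B"): 4, ("A", "C"): 2,
    ("C", "B"): 1, ("B", "D"): 5, ("C", "D"): 8,
    ("B", "E"): 10, ("D", "E"): 2,
}

def dijkstra(edges, start, goal):
    """Decision step: shortest path given the predicted edge times."""
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, []).append((v, w))
    # Min-heap of (cost so far, node, path taken).
    heap, seen = [(0, start, [start])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None

print(dijkstra(predicted_minutes, "A", "E"))  # → (10, ['A', 'C', 'B', 'D', 'E'])
```

The point of the transcript's argument is that the quality of this output is judged by travel time, while the upstream model was trained on prediction error, two different objectives.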

And so there are some potential inefficiencies in this kind of decoupled or independent process. The fact that the prediction model is not aware of how the decision is ultimately being made, and the fact that it also has an objective function that is potentially different from, or inconsistent with, how decisions are being made, lends itself to some opportunity. And the opportunity is to integrate the two. So, the whole body of work that we worked on was on integrated methods. And by integrating, what I mean is that you want to fit prediction models not necessarily to maximize prediction accuracy, but rather so that, when predictions are passed to a decision model, they induce decisions that optimize your decision objective. It’s a different way of thinking about the whole prediction modeling process.

**Rodrigo:01:03:18**Right. So, we’re talking about, let’s say, corn: finding an accurate way to predict the price of corn tomorrow, or having a model that predicts the price of corn well, versus the ultimate outcome, which is, I want a portfolio that, let’s say, has a high information ratio. And one might not be useful for the other, or one needs to integrate the other, right. So, we were talking about the example of the documentary on Netflix about the Event Horizon Telescope, capturing the first image of a black hole. There are hundreds of telescopes that are part of this discovery, right. And you can make it so that each one of those telescopes is as accurate as possible about the thing that it captured, right? So, one telescope might be capturing X-rays, another might be capturing visible light.

And for the individual telescope, you can ask it to capture the most precise version of that parameter. But the ultimate goal of this international team was to provide an image of a black hole that society could see in the visual spectrum for the first time, right? So, you have the ultimate goal, and then you have all these instruments that could optimize for the individual parts. And they do separate them into two teams, to try not to be biased in creating an ultimate outcome that is as useful as possible to their goal and to society. So, I think that was an interesting example. If you haven’t seen it, it’s on Netflix; it’s actually fascinating. But there’s a big difference between the individual parameter, what we’re trying to maximize there, let’s say in the case of our investment context, versus the ultimate outcome. And integrating those two is an interesting approach, which is, I guess, what you’ve been working on, right?

**Andrew:01:05:22**Yeah, yeah. So, that’s a really good example. Right? You’ve got all of these teams who are optimizing local, we’ll call them local objectives. And actually, the end result ended up being great. So, it’s not as though one is necessarily better than the other; it’s not universal that integrated methods are better. And I’m quite clear about that in the thesis. But in this particular example that you described about the telescope imagery, you’ve got a whole bunch of teams optimizing local objective functions for a particular task, when in reality, at the end of the day, all of those candidate images are going to be collated together to produce a final output.

So, another way to think about performing this task would be: well, we know what the end goal is; how do we optimize each of the individual pieces of machinery so that, when they work in concert, they accomplish our end goal? And that’s a slightly different way of thinking about the whole modeling process. Right. And this, I think, the work of Elmachtoub and the work of … on what’s called *task-based end-to-end learning*, was really the modern seminal work in this field.

**Rodrigo:01:07:01**And you’re seeing right now there’s a debate between the two sides? Or not a debate, but there are two teams, two groups, two ways of thinking about it that are going at it, right, academically?

**Andrew:01:07:12**Yeah, that’s right. There are kind of two schools of thought. Because, in reality, if you were to optimize your individual components perfectly, then this whole problem of integration goes away. What I mean by that is, if you have perfect forecast accuracy, then you will invariably make optimal decisions, or invariably produce an optimal output. So, suppose you were very micro-focused on optimizing each individual task to extract the most information that you can, and the result of all that was perfect forecast accuracy. Say you have perfect forecasts of asset returns. If you pass those perfect forecasts to a mean variance optimization, you will actually produce the optimal mean variance portfolio. Right?

The reality is that, in practical terms, almost every model produces some form of prediction error. Right? It’s very rare that you’re going to get perfect forecast accuracy. And so that’s where there actually is opportunity for a different kind of modeling paradigm, which is this integrated approach that tries to optimize models so that they produce optimal decisions, irrespective of how well they’re actually doing at forecasting.

**Adam:01:08:53**Right. So, there are lots of situations where you may want to calibrate each individual model so that it has slightly higher error than it would if you were simply optimizing each model individually to minimize that model’s forecast error, right? Because when you put all of the different models together, where each model is slightly miscalibrated for its own objective, in aggregate, the total error based on the decision objective is much lower.

**Andrew:01:09:41**That’s right. That’s right. The decision errors, which is ultimately, if that’s all that you care about, the decision errors are lower, right. And in this industry, the decision errors are the errors that you make at the portfolio asset allocation level, right. Now, there are obviously some drawbacks to using these integrated methods, one of which is that predictions can be very counterintuitive, right? You’re basically forgoing prediction accuracy, because all you care about is decision accuracy. Well, by virtue of that, you can actually end up with prediction models that look nothing like the data, right? And if you want some explainability, or you want to be able to look at the models and intuit what they’re doing under the hood, you will lose a lot of that in these integrated methods, because the predictions can be vastly different from what the underlying data looks like. Which you don’t necessarily get if you were to just isolate for prediction accuracy, right? I actually have some interesting examples; if I share my screen, we can take a look.

**Rodrigo:01:11:13**Let’s do it.

**Andrew:01:11:22**Okay, can everyone see my screen?

**Adam:01:11:25**Yes, sir.

**Rodrigo:01:11:26**We can.

**Andrew:01:11:27**Yeah. So, let’s focus on the graph. I’m going to explain what’s going on in these charts in a second. But let’s focus on the graph on the left here for a second. What we have going on here is we’ve got feature values on the X axis, and we have asset returns on the Y axis. So, think of this as, say, the Z score of your book to price, and this is how the returns of two stocks would respond to changes in book to price. So, we have this for two assets, in this case the red asset and the blue asset. Now, the dark red and dark blue dots illustrate how the actual returns of those assets would respond to changes in the feature value, to changes in X, right. And suppose that we were going to fit a linear model to this dataset. Then, by virtue of the fact that we’re fitting an ordinary least squares model, we get predictions that are optimal with respect to mean squared error, right? So, they are optimal with respect to, let’s call it, prediction accuracy.

But if our end objective was to, say, maximize the return of a portfolio, in this case a portfolio of two assets, then in reality what we would want to do is hold, say, 100% of the red asset when the feature value is slightly greater than one, and that’s denoted here by this orange line. And then we would want to hold 100% of the blue asset when the feature value is less than roughly one, right? And so you’ve got this optimal decision boundary.

Now, if you take a look at what our predictions are doing, our predictions intersect at roughly X equals zero. And so despite the fact that these are the quote-unquote best predictions from a prediction accuracy standpoint, they ultimately lead to suboptimal decision making, because the point at which they intersect is around X equals zero. So, you actually hold the red asset in the range of, say, X equals zero to one, when in reality, what you would want to be holding to maximize your end objective would be the blue asset. Right?

Now, contrast that with the graph on the right; this is an integrated solution. The integrated solution doesn’t actually look at how well it’s doing in terms of mapping to the data. All it’s optimized for is inducing optimal decision making. And you can see that, because the point at which the two prediction lines intersect is precisely the point of intersection of the underlying data. So, this model would actually induce optimal decision making, right.
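
A rough numerical sketch of this chart: the data below is invented to mimic the situation Andrew describes (the true "which asset is better" boundary sits near X = 1, not X = 0), and the "integrated" step is simplified to a direct one-parameter search over the decision boundary that maximizes realized portfolio return, rather than a full integrated training procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-asset data: red's response to the feature has a kink,
# so the asset ranking flips near x ≈ 1.1 rather than at x = 0.
x = rng.uniform(-2, 3, 400)
red = 0.2 * x - 0.1 + 0.3 * np.maximum(x - 1, 0) + rng.normal(0, 0.02, x.size)
blue = 0.1 * x + 0.05 + rng.normal(0, 0.02, x.size)

# Decoupled: least-squares line per asset, decide by comparing predictions.
pr = np.polyfit(x, red, 1)                      # [slope, intercept]
pb = np.polyfit(x, blue, 1)
ols_boundary = (pb[1] - pr[1]) / (pr[0] - pb[0])  # where the OLS lines cross

# "Integrated" (simplified): pick the boundary c that directly maximizes the
# realized return of the induced rule (hold red when x > c, blue otherwise).
candidates = np.linspace(-2, 3, 501)
realized = [np.where(x > c, red, blue).mean() for c in candidates]
integrated_boundary = candidates[int(np.argmax(realized))]

print(f"OLS-implied boundary:      {ols_boundary:+.2f}")
print(f"decision-optimal boundary: {integrated_boundary:+.2f}")
```

The least-squares lines cross well below the true flip point, so the decoupled rule holds the wrong asset over a whole range of feature values, while optimizing the decision objective directly recovers a boundary near the true one.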

So, that’s an obvious advantage of the integrated models, right; one of the core advantages is that your prediction objective and your decision objective are aligned. Right? One of the disadvantages, which I alluded to before, is that the predictions can be very counterintuitive compared to the underlying data. And if you look at this example here on the right, this prediction model also results in optimal decision making, but the predictions look nothing like the data. Right? So, there is a drawback here: the predictions can be very counterintuitive. In fact, the blue line predictions have an inverted slope in comparison to what the underlying data suggests.

So, if you were a portfolio manager, and you were looking at, well, what’s my prediction model doing today, and you saw this, you may be a little concerned. Unless, of course, you peel back the onion a little bit more and realize that, oh, well, the model it landed on is, in fact, optimizing for decision accuracy. And in this particular case, this model instantiation would lead to optimal decision making.

**Adam:01:16:37**It is definitely interesting and counterintuitive. I mean, clearly, the slope of the blue function is positive. But in an integrated context, you actually flip the sign on the slope of the predictive function for the blue asset, right. So, yeah, it’s a good illustration of how the interpretability of this kind of prediction decision method can be counterintuitive and impair explanatory value.

**Andrew:01:17:14**Exactly. Another drawback, which may be obvious, is that the solutions are not unique, right. So, for example, this is also an optimal solution; the one on the left, the one on the right, the one that we saw up here, these are all equally optimal from a decision accuracy standpoint, right? So, now you’ve got this other challenge of non-uniqueness of solutions, which for a lot of prediction modeling problems you don’t necessarily have. For example, if you were to minimize the mean squared error, and your model is linear, well, we know that that’s a convex function with a unique global minimum. So, no matter how you initialize the seed of your optimization algorithm, you will always land on the same solution. That’s not necessarily the case for these integrated solutions.

And so there are definitely challenges, some of the counterintuitiveness and the lack of interpretability, that are still being addressed, because this research is in its relatively early stages.

**Adam:01:18:28**So, then, when you optimize for decision accuracy, and let’s say you find multiple minima in terms of solutions for decision accuracy, would you want to choose the solution that also minimizes individual prediction error, or form some kind of Pareto frontier of those two objectives?

**Andrew:01:18:57**Yeah, yeah. So, that’s a great question. And there’s definitely been a lot of work that looks at some kind of Pareto frontier, or some trade off, between mapping closely to the data in terms of prediction accuracy, and also satisfying your desire to accomplish optimal decision making, right? So, on optimizing that frontier, there’s definitely some interesting work that has been published and is continuing to be published. You could also reframe it slightly and say you want to maximize your decision accuracy, subject to constraints on how much you want to deviate from prediction quality, right. And there are some interesting formulations there.

So, for example, if you have strong priors that the slope of the regression line should be positive, well, maybe you want to embed that as a constraint in the optimization. And so then you’re going to converge on solutions that at least from a sign perspective, are consistent with your priors on what you know about the underlying data.

**Rodrigo:01:20:11**So, can we just pull on that from a plain-English perspective? If you believe that momentum is a real thing, that things that have positive momentum are likely to continue to have positive results, and things that have negative momentum are going to continue to have negative results, that would be a positively sloping linear regression, right? So, you have a strong belief that momentum is a real thing; you don’t want to deviate away from that. But your modeling tells you that there are limits to this, that it’s not linear, that it might actually be the complete opposite at certain points.

What you’re saying is that, with your strong priors from a modeling perspective, you as the practitioner are going to create some sort of trade off or blend between the two. I guess, in order to keep yourself sane, and we talked about interpretability; you know, in our industry, it seems to be an important thing. So, is that kind of what you mean? Yeah.

**Andrew:01:21:10**Yeah, absolutely. You’re absolutely right, and your example about momentum is spot on. I would say also, just on interpretability, there’s definitely been some research on this; there’s a case to be made that the integrated methods, in particular examples or particular use cases, can be more interpretable than the decoupled alternative, right? So, for example, forget about linear regression models for a second. Imagine that instead you are building a decision tree model. Right? And this is actually work also by Elmachtoub, on *Smart “Predict, then Optimize”* trees. What they show is that, depending on your problem setup, if your end goal is to optimize for decision accuracy, then in a lot of cases you require decision trees that have much smaller tree depth, right.

In some cases, a tree depth of one, and a tree of depth one is very interpretable. You can actually pull up that model and see exactly how that prediction model is making decisions. In contrast, in order to get the same level of decision accuracy using a decoupled approach, in some cases you would require much deeper trees, which, of course, would be much more difficult for, say, a human to pull up and know precisely how predictions are being made, right? So, a tree depth of 10, for example, is much more difficult to parse than a tree depth of one, right? And so there are certainly some instances, and the field is fresh and the research definitely ongoing, where they show that in some cases interpretability could actually be improved, which is interesting.

**Adam:01:23:22**Very neat. And this also seems like a good use case for stuff like *boosting*, *bagging*, etc, right?

**Andrew:01:23:29**Yeah, yeah, exactly. We actually wrote a paper that uses boosted decision trees as the underlying prediction model. For anyone who's not super familiar with how boosted decision trees work, the idea is sometimes called *residualization*. Effectively what happens is that you have a base learner, often called a weak learner, that is a decision tree. And usually it's a very shallow decision tree, depth one, for example. So, a decision stump. Your objective is to build an ensemble of these decision trees, which in the traditional case would be to maximize prediction accuracy, for example. So, minimize mean squared error. And the way that you go about doing this is you fit a first model, you then residualize, or perform what's called gradient boosting, in which case you are then trying to fit subsequent decision models that further improve your objective function.
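
The residualization loop described here can be sketched in a few lines; this is a toy numpy implementation with depth-one stumps as weak learners, and the data and hyper-parameters are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.1, size=300)

def fit_stump(x, y):
    """Depth-one tree: one threshold, one least-squares value per side."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z, t=t, lv=lv, rv=rv: np.where(z <= t, lv, rv)

# Boosting: each new stump is fit to the residual of the ensemble so far
# (the negative gradient of squared error), hence "residualization".
# To predict on new data you would sum lr * stump(z) over the ensemble.
ensemble, lr, pred = [], 0.2, np.zeros_like(y)
for _ in range(100):
    stump = fit_stump(x, y - pred)
    ensemble.append(stump)
    pred += lr * stump(x)

mse_baseline = np.mean((y - y.mean()) ** 2)  # constant predictor
mse_boosted = np.mean((y - pred) ** 2)
```

The point of the sketch is only the mechanics: each round fits a stump to what the current ensemble gets wrong, and a small learning rate shrinks each stump's contribution.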

Interestingly, so we wrote a paper that very simply replaces the traditional objective function of most boosted decision tree regression models, which is usually something like mean squared error, with the final end decision objective that you're trying to accomplish. And this poses a whole bunch of interesting technical challenges. We were not the first to propose doing this, but we were the first to propose doing it in a very general setting. We actually did it for a very general class of optimization problems known as *convex cone problems*.

But yeah, I mean, it lends itself very nicely to different types of prediction modeling classes, like gradient boosting models. And you can show that in some cases you require far fewer iterations of gradient boosting to yield the same decision accuracy, whereas a traditional approach would be completely unaware of how many trees it actually needs to reach a certain level of decision accuracy. And so, again, there are some advantages there in interpretability, as well as in sparseness, or keeping model complexity low.

**Adam:01:26:33**Okay. Well, let's not bury the lede too far. So, you applied this framework to actual market data, right, and you sort of compared common use cases: you've got a bunch of futures, you're going to try to predict the next period return of each market, and then you're going to form a portfolio of all these markets based on those predictions, right, using a *predict, then decide* versus an integrated prediction-decision context, right. So, maybe walk us through some of the — because I know you've got some different constrained use cases, right. So, let's walk through some of the results.

**Andrew:01:27:16**Sure, yeah. I mean, one of the key findings there was for very specific types of portfolio optimization problems. So, let me back up for a second. We're trying to do predictions; the prediction methods that we're using are simple linear regression models. And what we want to do is pass those regression predictions to a mean variance optimization portfolio, to make optimal decisions. And as you say, Adam, we were looking at the decoupled approach, which first fits by ordinary least squares and then optimizes, versus an integrated approach.

And one of the key findings was that for particular constraint sets, in particular when the mean variance portfolio is either unconstrained or has only linear equality constraints, you actually get closed form optimal solutions for the integrated regression coefficients, which was interesting. You know, this was inspired by the work of Gould on differentiating parameterized argmin problems, but effectively we took it one extra step, which is that you can actually reformulate the whole problem, and what falls out of all that is regression coefficients that you can compute effectively just as fast as you would compute ordinary least squares, which is great.
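
As a hedged illustration of why the unconstrained case admits a closed form (this is a simplified sketch, not the formulation from the paper: synthetic data, with the covariance and risk aversion treated as fixed and known), note that for unconstrained weights `w_t = (1/gamma) * inv(Sigma) @ F_t @ beta`, the average realized mean variance utility is a concave quadratic in the regression coefficients `beta`, so the integrated coefficients solve a GLS-like linear system that costs about the same as OLS:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, k = 500, 10, 2     # periods, assets, features (say, trend and carry)
gamma = 5.0              # risk aversion in the mean variance utility

F = rng.normal(size=(T, n, k))       # feature panel
Sigma = np.eye(n) + 0.3              # fixed, known covariance (a simplification)
beta_true = np.array([0.02, -0.01])
r = np.einsum('tnk,k->tn', F, beta_true) + rng.multivariate_normal(np.zeros(n), Sigma, T)
Si = np.linalg.inv(Sigma)

# Decoupled: pooled OLS of returns on features.
X = F.reshape(T * n, k)
b_ols = np.linalg.lstsq(X, r.ravel(), rcond=None)[0]

# Integrated: choose beta to maximize the average realized utility of the
# downstream unconstrained portfolio w_t = (1/gamma) * Si @ F_t @ beta.
# That objective is a concave quadratic in beta, so the optimum solves a
# GLS-like linear system -- no iterative training needed.
A = sum(F[t].T @ Si @ F[t] for t in range(T))
rhs = sum(F[t].T @ Si @ r[t] for t in range(T))
b_ipo = np.linalg.solve(A, rhs)

def avg_utility(beta):
    mu = np.einsum('tnk,k->tn', F, beta)          # predicted returns
    w = (mu @ Si) / gamma                         # unconstrained MV weights
    gain = np.einsum('tn,tn->t', w, r)            # realized portfolio return
    risk = np.einsum('tn,nm,tm->t', w, Sigma, w)  # portfolio variance
    return np.mean(gain - 0.5 * gamma * risk)
```

By construction, `avg_utility(b_ipo)` is at least `avg_utility(b_ols)` in sample, since `b_ipo` maximizes exactly that criterion.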

From the more empirical side of things, we did some prototyping on futures data that compared the decoupled versus the integrated approach. We were using features such as long-term trend, so like the 252-day trend, or the carry of a particular market, as input features for the regression model. And what the results illustrated is that in a lot of instances you could drastically improve the out of sample decision accuracy, as measured by, say, the out of sample mean variance objective. But from a more practical standpoint, things like the out of sample Sharpe ratio, or the risk adjusted performance characteristics, also improved quite significantly in this particular use case.

**Adam:01:30:09**So, what’s the — maybe just spend a couple minutes describing the experimental setup?

**Andrew:01:30:15**Sure, yeah. Well, the experimental setup was actually very straightforward. You have data that goes back, let's say, roughly 30 years. You use the initial 10 years of that data set to train your models, and then you perform a walk forward test, a very standard walk forward testing and evaluation framework, where every two years you incorporate that new two years' worth of data, refit your entire models, and then evaluate the efficacy of that trained model over the subsequent two years.
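
The walk forward scheme can be sketched as a simple split generator (illustrative; the period counts are assumptions, here monthly data with a 10-year initial window and 2-year refits):

```python
def walk_forward(n_periods, initial_train, step):
    """Expanding-window walk-forward splits: train on everything seen so
    far, evaluate on the next `step` periods, fold them into the training
    set, and repeat."""
    splits, start = [], initial_train
    while start < n_periods:
        end = min(start + step, n_periods)
        splits.append((range(0, start), range(start, end)))
        start = end
    return splits

# ~30 years of monthly data: train on the first 10 years, refit every 2 years.
splits = walk_forward(n_periods=360, initial_train=120, step=24)
```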

**Adam:01:31:02**Right. So, you've had your first 10 years. In the decoupled instance, you're going to find the optimal regression betas for forecasting next period returns based on some set of features, trend, carry-type stuff. You're going to apply those models to forecast returns at each time step for the next two years. When you forecast those returns, you're going to feed them into a standard mean variance optimization to form a portfolio. And then you're going to observe the returns of that portfolio at each time step for the next two years. After that two years, you're going to go back and refit all those regression betas on the new full historical sample, and just continue to walk that forward, re-calibrating the betas every two years, right? That's the decoupled approach. And then the IPO approach is to find the optimal regression betas based on the final decision objective, which presumably was mean variance utility? Or was it Sharpe ratio?

**Andrew:01:32:24**Yeah, it was actually mean variance utility. Exactly.

**Adam:01:32:26**Right. So, optimizing for the mean variance utility of the portfolio, and then do the same kind of walk forward process, recalibrating every two years.

**Andrew:01:32:36**Right. Exactly. Yep. Yep. So, that's spot on. And so these results were very promising. Also, it's not limited to mean variance optimization. We explored more risk based optimizations as well. So, these would be cases where, rather than trying to forecast future returns, you're actually trying to build a forecast for your covariance model, right, or something that is going to be informative for the risk of the portfolio. In this case, we used covariance. And we actually explored a wide variety of risk based portfolio optimizations. So, we explored minimum variance portfolios, maximum diversification portfolios, and equal risk contribution portfolios. And those results were a little bit more mixed.

There were some very persistent results on the minimum variance aspect. In particular, we performed a study where we looked at forming stock portfolios of different sizes. So, stock portfolios of, say, 25 stocks, 50 stocks, 100 stocks, randomly drawn from, I think it was 300 or 250 potential stocks. So, you can imagine there are a lot of different combinations of 50 stocks from 300. And we effectively performed the exact same tests as you just described, but where instead of forecasting an expected return, you're forecasting a covariance, and comparing the decoupled approach, which uses something like multivariate GARCH to do the covariance estimation, versus the integrated approach, which fits the GARCH parameters based on the decision error that they induce.

And what we showed there was that, in particular, as the number of assets in the portfolio grows, kind of the persistence of the ability to minimize out of sample variance of the integrated method was quite high, in comparison to the decoupled alternative.

Now, interestingly, we also, as I said, attempted to do this for maximum diversification as well as for the ERC portfolio. And you really saw no benefit to the integrated approach whatsoever. And we were trying to optimize for objectives that were consistent with the objective of the underlying portfolio optimization routine. So, we were trying to maximize ex post diversification ratios, or to minimize ex post risk dispersion, right. This was an interesting finding, but it could very well speak to just the stability of some of these risk parity-like optimizations. They were, especially ERC, the Equal Risk Contribution, very resilient to errors in the covariance matrix. And so you just didn't see a benefit. So, these are the interesting findings: in some instances you saw positive effects from using an integrated approach, in other instances there was no effect, and in some cases the traditional decoupled alternative did better, right?

**Adam:01:36:19**Am I right in interpreting the IPO-like covariance estimation as finding kind of an optimal shrinkage parameter on the covariance matrix? I mean, it seems, interpreting those results, it’s a little easier if I think about it that way, if you sort of think about ERC as introducing quite a large shrinkage parameter towards the equal weight portfolio as an example, right? Or towards equal correlations or equal covariances type thing, right? Is there any utility in thinking about it from that direction?

**Andrew:01:37:12**Yeah, yeah. I think thinking of it from that direction is very helpful. In particular, if you think of, say, the work of … and Wolf, they've done a tremendous amount of work on linear and non-linear shrinkage estimators for covariance matrices. And the way in which they determine the optimal amount of shrinkage is usually by minimizing the error between the in sample covariance matrix and the subsequent out of sample one, right. So, give me the shrinkage factor that's going to minimize my error to the ex post, which would be like minimizing prediction error. Right? And in contrast, you could do what you just stated.

Now, this isn't necessarily what we did, but you could, in theory, find a shrinkage factor that, rather than minimizing covariance prediction error, just minimizes your subsequent downstream objective, which is actually to form a portfolio that subsequently has as low an out of sample variance as possible.
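
A minimal sketch of that contrast (toy data; in practice the intensity would be chosen on a held-out validation window rather than the evaluation window used here for brevity): pick the shrinkage intensity not by covariance error, but by the realized variance of the downstream minimum variance portfolio:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T_in, T_out = 20, 60, 250

# One common factor plus noise, so the sample covariance is noisy
# relative to the number of assets.
B = rng.normal(size=(n, 1))
def draw(T):
    f = rng.normal(size=(T, 1))
    return 0.01 * (f @ B.T) + rng.normal(scale=0.01, size=(T, n))

R_in, R_out = draw(T_in), draw(T_out)
S = np.cov(R_in, rowvar=False)
target = np.diag(np.diag(S))       # shrink toward the diagonal of S

def minvar_weights(C):
    w = np.linalg.solve(C, np.ones(n))
    return w / w.sum()

# Pick the shrinkage intensity by the *downstream* objective, i.e. the
# realized variance of the resulting minimum variance portfolio, rather
# than by covariance prediction error.
best_delta = min(
    np.linspace(0, 1, 21),
    key=lambda d: (R_out @ minvar_weights((1 - d) * S + d * target)).var(),
)
```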

**Adam:01:38:29**Right, right, right. Yeah.

**Andrew:01:38:30**Right. And there's actually, I think, some fairly recent research describing that even the known out of sample covariance, or rather, the Oracle covariance, is not necessarily the covariance that's going to lead to the minimum variance portfolio, which is very interesting. It talks about constraint configuration, and they have kind of a unique way in which they define the Oracle. But yeah, I mean, it …

**Rodrigo:01:39:09**That is counterintuitive.

**Adam:01:39:12**So, on the IPO, I think it’s useful to showcase some of the results for the actual commercial style applications that many of our listeners will be interested in, right? Like, so, when you actually compared the decoupled approach to the integrated approach in a full mean variance context on, for example, a futures universe using trend and carry features; how did that play out? You know, are you able to show any examples from the papers or is it better to just describe it qualitatively?

**Andrew:01:39:54**Yeah, I can describe it qualitatively. I actually don't have them pulled up locally here. But yeah, it depends on the experiment setup, and this is the interesting thing. For unconstrained mean variance portfolios, you would see the biggest boost. And by that I mean, we were building unconstrained mean variance portfolios on, I think, roughly a 25 or 30 asset futures universe, and comparing the integrated versus the decoupled. And you will often — …

**Adam:01:40:29**Let's stop there, because I think it's useful. What kind of constraints, in a futures context, would say a managed futures manager or a CTA style manager typically enforce? Does it more resemble the unconstrained problem? Does it more resemble the constrained problem? Or how would the listener maybe think about that?

**Andrew:01:40:53**Sure. At the end product level, the types of constraints that you would enforce would be exposure constraints. So, for example: I can't be more than 10% exposed to crude oil, long or short; or the sum of my energy exposure cannot be in excess of 20%, just as examples. So, these will be constraints applied at the portfolio exposure or weight level. There are other forms of constraints, things like: the estimated volatility of the portfolio cannot exceed 20%. So, that would be a volatility constraint. Or you can get really fancy: my estimated value at risk cannot be in excess of negative 4% to the downside, or something like that. So, there's a whole bunch of different types of constraints that you can add, but they all fall nicely into the mean variance optimization framework.
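
Those constraint types drop straightforwardly into a numerical mean variance solver. Here is a hedged sketch using scipy's general-purpose SLSQP solver (toy data; the 10% per-asset cap, 20% energy-sum cap, and 20% volatility cap mirror the examples above):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 6
mu = rng.normal(scale=0.05, size=n)          # expected returns (toy)
A = rng.normal(size=(n, n))
Sigma = 0.04 * (A @ A.T) / n + 0.01 * np.eye(n)
gamma = 5.0
energy = np.array([1.0, 1.0, 0, 0, 0, 0])    # first two assets are "energy"

def neg_utility(w):
    return -(w @ mu - 0.5 * gamma * w @ Sigma @ w)

constraints = [
    # net energy exposure between -20% and +20%
    {"type": "ineq", "fun": lambda w: 0.20 - w @ energy},
    {"type": "ineq", "fun": lambda w: 0.20 + w @ energy},
    # estimated volatility at most 20%, i.e. variance at most 0.04
    {"type": "ineq", "fun": lambda w: 0.04 - w @ Sigma @ w},
]
bounds = [(-0.10, 0.10)] * n                 # each asset within +/-10%

res = minimize(neg_utility, np.zeros(n), method="SLSQP",
               bounds=bounds, constraints=constraints)
w = res.x
```

A dedicated quadratic programming solver would be the more idiomatic production choice; SLSQP is used here only because it accepts the nonlinear volatility constraint directly.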

**Adam:01:42:08**And that's in contrast to the constrained use cases that you explore in the paper, which I think cover the most common dimensions of constraints, right? Typically: are we imposing long only, right? Are we imposing sum to one? Those would be more common for, for example, a stock selection type problem. You're trying to create the most efficient portfolio, whatever that means: maximize information ratio, maximize Sharpe ratio, subject to a long only constraint or a no leverage constraint, or both. But in a managed futures context, the typical constraints map much more closely to the unconstrained case than to the constrained case, when you think about typical constraints as being long only or sum to one, right?

So, I think in a managed futures context, while absolutely all managers presumably are imposing some kinds of constraints on maximum risk levels, maximum concentration levels, maximum leverage ratios, CFTC max exposure levels, or liquidity type constraints, those are typically more around the edges, right? So, would you agree that the unconstrained case is the closest analog, and the most useful for thinking about the expected benefit of IPO versus the decoupled approach in a managed futures context?

**Andrew:01:43:46**In a managed futures context, yes and no, and it depends on how you're constructing it. So, usually, quantitative managers have more than one model, right? And often, and this is a tactic that we use, it can be very helpful to run a bunch of models in unconstrained format. And they're going to be specified differently. Maybe they use different features, maybe they use a slightly different universe mix, what have you. But you have all of these unconstrained models that, at the end of the day, you can then combine together. Now, any one of those models may have portfolio exposures that would be, you know, uncomfortable or intolerable from a — … Yeah, exactly, from a max exposure limit, or may not adhere to CFTC limits, all of these very nuanced details. Any one of these models may not be appropriate.

But when you average them together, you often get this netting effect that takes place. And so then it may be very helpful to run unconstrained models and then subsequently apply constraints at the final end product, right. So, that's what I mean. In terms of any one particular model, probably not appropriate, right. But the results were meant to be illustrative of this type of benefit. And in the unconstrained case, you saw a pretty substantial improvement in Sharpe ratio. You had Sharpe ratios that were roughly 50% larger than with the decoupled approach, which is very interesting. And then we performed a whole bunch of statistical tests to see how persistent the improvement in Sharpe ratio was.

So, if you were to randomly draw 252 days, or roughly one year, from the out of sample distribution, how likely would it be that your integrated solution had a higher Sharpe ratio than your traditional solution? And we would see ratios in the 65 to 70% range for the unconstrained case, right. So, what that means is that in 65% of samples, the integrated approach is likely to have a higher out of sample Sharpe ratio than the traditional approach.
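
That resampling exercise is easy to sketch (synthetic return streams stand in for the integrated and decoupled out of sample results; the means and volatilities are assumptions chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 2500  # ~10 years of daily out-of-sample returns

# Hypothetical daily return streams standing in for the two approaches.
integrated = rng.normal(0.0008, 0.01, T)
decoupled = rng.normal(0.0004, 0.01, T)

def sharpe(r):
    return np.sqrt(252) * r.mean() / r.std()

# Repeatedly draw 252-day samples and count how often the integrated
# stream posts the higher Sharpe ratio.
n_draws = 2000
wins = 0
for _ in range(n_draws):
    idx = rng.integers(0, T, size=252)   # same sampled days for both
    wins += sharpe(integrated[idx]) > sharpe(decoupled[idx])
win_rate = wins / n_draws
```

Even with a meaningfully higher mean, the winning fraction over one-year samples sits well below 100%, which is the point: a real edge still loses a large share of one-year draws.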

**Adam:01:46:32**And that’s sort of single period, Andrew, or is that over the full simulation horizon?

**Andrew:01:46:39**That would be over the full out of sample simulation horizon. Now, the point on constraints is a very important one to make, because constraints limit the opportunity for the decision model to be different. This is a helpful exercise: think about the limit in which your constraints are so restrictive that they only yield a singleton, one potential weight solution, irrespective of the input. Well, in that limiting case, the traditional approach and the integrated approach would produce the exact same solution, right? The constraints are that tight. The more that you begin to relax the constraints, the more opportunity there is for dispersion between the two approaches, because the influence of the predictions becomes greater.

But what we saw, generally speaking, was that in the presence of constraints, the dispersion in outcomes between the integrated solution and the traditional solution was much smaller, right? So, rather than having 65% of samples, you would have something like 55 or 60% of samples. It depends on how tight the constraints were. So, constraints actually play a very important role, and will dictate, to a very large degree, how likely it is for you to even observe differences in outcomes depending on the prediction models. And this is intuitive. Constraints are known to act as *regularizers* in the process. There's good literature on the fact that different types of constraints attenuate different specific types of estimation error, right. So, this is all very consistent with that line of thinking.

**Adam:01:48:56**Right. So, I know because you kind of worked for, well, 10 years in the industry before you defended your thesis on this, right. What would you say are the practical takeaways for portfolio managers, right? How are we seeking to include some of these findings in what we do? And what are some practical takeaways for people who manage stock portfolios or other people who manage less constrained or global macro or managed futures style portfolios, do you think?

**Andrew:01:49:40**Right. I think the biggest takeaway for me is that you get different types of models. And in a world like financial modeling, where there's so much uncertainty, the nice thing is that you have models that are interpreting the data through a different lens, right. So, while it may not be the case, and this really depends on problem setup, that an integrated solution dominates a traditional solution or vice versa, what is certainly the case is that you get different solutions. And so the process is likely to be different. And this is actually illustrated nicely: in the case of the regression problem, we examined the regression coefficients. And what we observed is that the ordinary least squares solution had negative coefficients where the integrated solution had positive coefficients, right.

And so you get this nice level of, call it *process diversification*, call it *model diversification*, that I think could be very beneficial in a practical setting. So, it's not a *this or that*, but a *yes and* type of solution. And integrated solutions, I think the space is very early. If you think about the space of statistical learning, which is 50-plus years of literature, the modern lens of integrated solutions has really only been developed in the last, call it, five to 10 years. So, there's, I think, a lot of neat, promising research, and a lot of neat technology available in open source to do this type of experimentation.

**Rodrigo:01:51:54**Andrew, can you talk — I mean, I think the listener would find it informative: you show a 50% improvement in Sharpe ratio when you're optimizing for that, and then the conclusion is, hey, you should use both. So, why don't you expand on that a little bit and help us understand where the value of humility comes into this whole process of investing and optimizing for financial markets?

**Adam:01:52:25**Yeah, this is a really good question. This is now the difference between knowledge and wisdom and experience. I love this. Yeah.

**Andrew:01:52:33**Yeah. Well, there’s so much. I think this goes back to what we were just discussing at the start of the podcast, which is that there’s so many different micro decisions that are being made in any particular experiment, right? And I have this mental model of idea space where if two models share, this is an abstract idea, but if they share a lot of the same DNA or ingredients in terms of their ideas, well, then they live in idea space kind of in the same neighborhood, they live together. And if they have very different outcomes, well, then you kind of know that what you’re observing is not necessarily that one instantiation dominates the other, but rather, that the expected out of sample performance is likely somewhere in between those two, right?

And so when you observe a bunch of these tests, and there are more tests that we haven't discussed, you see that there are problem instances where the traditional method outperforms the integrated one. So, for example, in our gradient boosting tree tests, we did the integrated solution versus gradient boosted decision trees for a portfolio optimization problem. And the traditional approach produced lower out of sample decision errors than the integrated solution, right. And so you can see that there are going to be cases. The results that you get are going to be so dependent on the datasets that you're using, the problem assumptions that you make, and the features that you use. And what you will often see is that it's very rare that one particular method universally dominates another.

**Adam:01:54:51**So, talk a little bit about that, Andrew. Talk about the difference between kind of the idea of stochastic dominance versus typical kind of T tests that are employed in traditional empirical finance, and which are the source of like, the vast majority of decision making in the factor space, for example?

**Andrew:01:55:10**Yeah, yeah. So, in very general terms, the concept of stochastic dominance means that, you can say, technically with probability one, you're going to observe that this particular process produces a higher, let's just say, objective function value than another. You can say with effectively 100% certainty that no matter how you instantiate the problem, you're always going to dominate the alternative. And you typically don't get that level of confidence, especially in finance. In finance, you're actually dealing, more often than not, with very mediocre T tests. So, in a T test, you're evaluating, say, in this case, the performance expectation of one process versus another, and determining how statistically significant the difference in means is, usually …

**Adam:01:56:16**Based on an in sample fit, typically.

**Andrew:01:56:19**Based on — yeah, usually based on some kind of quasi in sample fit, right. And more often than not, those two distributions are highly overlapping, which means they're nowhere near the kind of quality of a stochastic dominance process, and they have P values that, in the best case scenario, sometimes reach the 5% significance level. Which still means that 5% of the time, you're expecting that the other process will outperform. And this is shrouded in in sample fit and various other assumptions. More often than not, the test statistics are much weaker than that.
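
The gap between the two standards can be illustrated directly (toy distributions; the means and overlap are assumptions): a comparison can produce a large t-statistic while remaining far from stochastic dominance, with the "worse" process still winning a large share of head-to-head draws:

```python
import numpy as np

rng = np.random.default_rng(6)
# Two hypothetical out-of-sample objective distributions: A is better on
# average, but the distributions overlap heavily (the usual finance case).
a = rng.normal(0.6, 1.0, 5000)
b = rng.normal(0.5, 1.0, 5000)

# First-order stochastic dominance: A dominates B if A's empirical CDF
# sits at or below B's everywhere (A puts more mass on higher values).
grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), 200)
cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
dominates = bool(np.all(cdf_a <= cdf_b + 1e-9))

# Welch t-statistic on the difference in means (the usual comparison).
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_stat = (a.mean() - b.mean()) / se

# Despite a sizeable t-stat, B still beats A in many head-to-head draws.
win_share = np.mean(a > b)
```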

**Adam:01:57:04**Yeah. The other issue for asset managers is time horizon, right. So, that's why I asked you: when you performed your quantile/quantile analysis, in certain practical use cases, 60 to 65% of the time the integrated approach outperformed the decoupled approach, right? But that is over a 30-year horizon. Over the typical six months to maybe three years that investors will grant you to determine whether your process is better than some other random assortment of managers that they're putting you up against, those numbers are much closer to 50/50. Right. Which is why there's always this tension; the investors need to stay with you long enough to realize the benefit of your excess edge, right.

And so there's this tension where, while you may fundamentally believe that over 30 years this method is going to be better than that method 60% of the time, that still means that 40% of the time it'll underperform, right? And if your better methodology over 30 years randomly underperforms over three years, you don't get the chance to show how good your performance will be over 30 years, right?

**Andrew:01:58:39**Precisely.

**Adam:01:58:40**So, there’s this path dependency that is always an interesting tension in asset management. And this is not purely selfish. This is in order for the end investor to benefit from your better methods, you need to provide them with a path of returns that is competitive enough to keep them invested in your product, rather than being attracted into alternative products in the meantime, so they never get to benefit from it.

**Andrew:01:59:07**Yeah.

**Rodrigo:01:59:08**Well, this is why it's so useful when you think about multi-strats, right, the … working the best back tests. In quantitative finance, you see a lot of momentum being at the top based on that single back test, and this is why the industry, advisors and whatnot, I think, gravitates toward it as a preference. Whereas this is based on a single momentum factor back test that is then used, right? Rather than using multiple approaches to momentum, implementing other factors, knowing that these things work over time but not all the time, and really creating an ensemble approach to your alternative sleeve, if you will, right. So, if you like a manager, and this other manager is trying to do a similar thing to what you do, but you don't know the granularity of what each one is doing, you might want to use them both and just stick with them, for good or bad, right?

But it also goes — it's very interesting, when you think about everything we just discussed, how the consultant industry works, and what they actually garner from the monthly data that they extract from managers, right? It almost seems like a silly exercise to focus on that data, rather than trying to focus on the process the manager goes through. I mean, it's a really tough thing. Obviously, we have all these databases, and these databases lead to the consultants, and consultants pick based on what they're looking at from 100 observations, or maybe 200 observations, and then they start talking to you about the process, versus the other way around. I don't have a solution for this problem. I think it's going to be very difficult to find a consultant that merely picks a manager based on what they hear. But it's almost shocking how useless that is.

**Adam:02:01:06**Yeah, it's, for the most part, a random number generator, right. We can demonstrate that empirically, but we also know it, you know, even theoretically. And if an allocator is not asking a quantitative manager, especially: what is your experimental design; how do you think about in sample versus out of sample; are you using leave-one-out versus K-fold, and what are the relative benefits; what is your hyper-parameter space; how are you accounting for that hyper-parameter space in your expectation cone? Like, all of these — these are the questions that matter to quantitative manager selection. And I just think that the vast majority of consultants or allocators out there just don't have a deep enough understanding of the quantitative space to ask the right questions and get at the most meaningful and salient decision factors to allow them to add value.

I mean it’s — and it’s a hard job anyways. As an allocator you’re at the second derivative from the actual manager in terms of understanding the process. The manager doesn’t want to share the things that he or she feels are most valuable and differentiated about their approach, you know. It’s a hard problem. You know, I have sympathy for people at each stage in the allocation process.

**Rodrigo:02:02:36**Yeah. Yeah, I mean, it's an evolution. I think we're going to eventually get there. We have spoken to a few consultants that do take that approach; they put a lot of weight on the process, and understanding the process, over the outcome. But at the end of the day, as a consultant, you're still going to be fired if that manager doesn't perform for three years versus their peers, right. So, it's a tough job, but hopefully we can get more and more people understanding this area, how data science works, and how to understand back testing. Because as you roll through the people that are systematic, quantitative, rules-driven, and you speak to them, you realize what percentage of people don't know how data-mined and overfit their own back tests are.

And it's not malicious, it simply is a lack of background. These are traders that started trading using fundamental analysis. They realized that was kind of tough, so they did technical analysis, and that led them to quantitative investing, using programming to implement the rules. They have no background in information sciences to help them understand why decisions they're making based on a single back test aren't going to be useful in their live trading. And yet we continue to be asked to share back tests with institutions, without being asked how we put together that back test. So, if that's the playground, it's just riddled with noise. And — …

**Adam:02:04:10**Yeah, and you know, imagine software engineers who are obviously excellent programmers, right, and are able to program up back tests really well, but they’re not steeped in the scientific method or experimental design. Or engineers who obviously are steeped in the scientific method, but most of their experiments are causal, right. Like, if you’re measuring shear forces or all the things that you use to build a bridge, those are mechanical relationships, right, and the signal-to-noise ratio in decision making is extremely high.

So, finance is this very strange field where having a strong background in quantitative methods can be helpful but is insufficient. Having a strong background in software can be very helpful but is insufficient. Having a strong background in data science is helpful but insufficient. It requires overlap across a wide variety of domains in order to fully understand the problem, right? And it’s a lifelong journey.

**Rodrigo:02:05:23**You remember, 10 years ago we sat down with an asset management firm that had a well-respected professor in the data sciences who had helped them put together a model. They wanted us to review it, and when we did we realized that he had no background in the non-stationarity of financial markets. He approached the problem like it was a hard science, like it was geology. We articulated that this might be a problem in the future, and indeed it became a very large problem within months. And that’s just an example of somebody who is well-respected, well-educated, you know, a top guy, who had worked for parts of Google, and still didn’t understand the problem fully, right.
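The non-stationarity Rodrigo describes can be shown with a toy sketch: a parameter estimated in one market regime can carry the wrong sign in the next, which is exactly what a hard-science mindset misses. The data and the `rolling_mean` helper below are synthetic and purely illustrative:

```python
# Toy illustration of non-stationarity: the "true" mean of the process
# changes halfway through, so an estimate calibrated on regime 1 is
# not just imprecise in regime 2 -- it has the wrong sign entirely.

def rolling_mean(xs, window):
    """Trailing moving average: the kind of estimate a stationary
    model would trust to stay put out of sample."""
    return [sum(xs[i - window:i]) / window
            for i in range(window, len(xs) + 1)]

# Regime 1: returns centered at +0.1. Regime 2: centered at -0.1.
series = [0.1] * 50 + [-0.1] * 50

est = rolling_mean(series, 20)
# est[0] is the estimate at the end of regime 1 (about +0.1);
# est[-1] is the estimate deep inside regime 2 (about -0.1).
```

In a truly stationary, geology-like problem, `est[0]` and `est[-1]` would agree up to sampling noise; here the sign flip is the whole point.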

So, this particular financial area is just riddled with people with specialties in these different areas, and the true value is in kind of putting them all together, and in recognizing the humility that you need to approach this space with, right. So, you know, this is — I can’t believe I’m going to say this, but I have a lot of sympathy for the regulators trying to push back on back testing. And there might be some ways to help the industry standardize, and to articulate how it should be standardized, so that we can actually get more accurate views on what we’re looking at, and so that the indices being put out there make sense.

**Adam:02:06:50**Yeah. I don’t know if it was Rudyard Kipling, or who — I forget who said it but *make every effort to not fool yourself and you’re the easiest person to fool*.

**Andrew:02:07:00**It was Feynman, I believe.

**Rodrigo:02:07:01**Feynman. Richard Feynman.

**Adam:02:07:02**Feynman, there you go.

**Rodrigo:02:07:03**Yeah, that’s right.

**Adam:02:07:05**I think that’s a good place to leave it unless, Andrew, did you have anything else you wanted to — we’re two hours in. Did you have anything else you really wanted to put a pin in here today or?

**Andrew:02:07:14**No, this has been great. You know, we didn’t get to talk about the neural networks, but that’s going to be at least another hour or more, so we’ll have to leave that for another time.

**Rodrigo:02:07:25**I like that. I like that. We’ll bring it back to the neural network. It’s beautiful. All right. Well, thank you Andrew for your time, Dr. Andrew Butler. And …

**Adam:02:07:33**Congratulations on a successful defense.

**Andrew:02:07:36**Yeah, thank you.

**Adam:02:07:37**And very, very well earned.

**Rodrigo:02:07:39**Where can people find your work? I guess we’ll put it on the show notes as well. But is there an easy place to kind of find the — your thesis and other work that you — …

**Andrew:02:07:48**Yeah, I have a website. We’ll post it in the notes. It’s a GitHub website, you know, ButlerA.GitHub.io. But we’ll put it in the show notes.

**Rodrigo:02:08:01**Excellent. Okay. Well, thanks — …

**Adam:02:08:03**And they can find your paper on arXiv, Andrew, or your thesis or your papers or?

**Andrew:02:08:09**All of the papers will be linked on that site.

**Adam:02:08:13**Perfect. All right.

**Rodrigo:02:08:14**Excellent. All right. Thanks all.

*ReSolve Global refers to ReSolve Asset Management SEZC (Cayman) which is registered with the Commodity Futures Trading Commission as a commodity trading advisor and commodity pool operator. This registration is administered through the National Futures Association (“NFA”). Further, ReSolve Global is a registered person with the Cayman Islands Monetary Authority.*