Accuracy and the fundamentals of lending

A first principles approach explaining why accuracy, and therefore machine learning, is so important in lending.

Paul Gu

Upstart News & Insights

Preface

A few weeks ago, Dave wrote a blog post about Upstart’s important truth. He emphasized the belief that machine learning will entirely reshape the credit industry in the next 10 years. I want to elaborate on this from a first principles approach and explain just why accuracy, and therefore machine learning, is so important.

This document is meant to be descriptive and prescriptive, but not necessarily persuasive. In other words, I’m writing down the conclusions I’ve reached, but won’t be able to fit all of the data and reasoning for them here. The evidence does exist (I promise!), but compiling it all would take too much time and bloat the document.

Core Beliefs

The following core beliefs describe businesses that are primarily based on the evaluation of consumer risk. Lending is the best example (and within it, personal loans the best starting point), and insurance is a close second. Most of my writing will focus on lending, but I believe it applies to any such business.

  1. A loan is basically a simple equation
  2. There is a lot of inaccuracy to be reduced
  3. Reducing inaccuracy is good
  4. Reducing inaccuracy is constrained by ease for the user

Core Belief #1 – a loan is basically a simple equation

Loans are weirdly simple. Unlike other products, the main feature of a loan is its price tag. More precisely, the product is money now, and the price tag is the extra money (interest) the customer has to pay back later.

This feature simplicity makes lending commodity-like and highly competitive. Money is money, and company A’s money is identical to company B’s money. This means that contrary to the popular business advice of “don’t compete on price”, loans as a product compete on just that, price.

Businesses that compete primarily on price aren’t inherently undifferentiated or doomed to unprofitability. Rather, they have to differentiate on the cost side of the ledger rather than the value side. Cost-side advantages can be insanely powerful and have created some of the biggest businesses in the world. Think Amazon, Walmart, Southwest, etc.

So fundamentally, a loan is its price, which is its cost (plus some profit margin). And the cost of a loan is the sum of (1) cost of capital, (2) defaults, (3) operating expenses.

Loan = APR = cost_of_capital + defaults + op_ex

In practice, cost_of_capital is just whatever the capital markets charge us to get access to money to loan out. It doesn’t matter whether that is VC money, our own past profits reinvested, or loan buyers’ money; each expects a certain return. In theory, however, cost_of_capital can be understood as a risk-free rate (sometimes proxied by the cost for the US government to borrow) plus a risk premium (investors want a higher expected return for taking on higher risk). The defaults term is the actual risk itself.

Loan = APR = risk_free_rate + actual_risk + risk_premium + op_ex
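
As a quick illustration, the equation can be written out in a few lines of Python. This is only a sketch: the class name `LoanCost` and every number in it are made up for the example and are not Upstart figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanCost:
    """Cost components of a loan, each as an annualized rate (0.05 = 5%)."""

    risk_free_rate: float  # e.g. proxied by US government borrowing cost
    actual_risk: float     # expected losses from defaults
    risk_premium: float    # extra return investors demand for bearing risk
    op_ex: float           # operating expenses

    @property
    def cost_of_capital(self) -> float:
        # Theoretical view: risk-free rate plus risk premium.
        return self.risk_free_rate + self.risk_premium

    @property
    def apr(self) -> float:
        # Loan = APR = risk_free_rate + actual_risk + risk_premium + op_ex
        # (profit margin is left out of this sketch).
        return self.cost_of_capital + self.actual_risk + self.op_ex


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = LoanCost(risk_free_rate=0.03, actual_risk=0.05,
                   risk_premium=0.02, op_ex=0.02)
print(f"cost of capital: {example.cost_of_capital:.2%}")
print(f"APR:             {example.apr:.2%}")
```

Because every term is additive, a one-point drop in actual_risk flows straight through to a one-point drop in APR.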

Side notes:

  • You may have noticed that I have mostly ignored the profit margin in this section. That is because net value (value of the product minus its cost) is the most important thing. The more net value there is, the more there is to go around between us and our customers. While there is certainly an art and science to setting the right profit margin to fulfill needs in the short run and maximize profits in the long run, I tend to be focused on maximizing net value creation.
  • I also tend to be less focused on whether we, Upstart, or our loan buyers are capturing value. Instead, most of the time we should simply think about total platform value. When there is a lot of total platform value, there is a lot to go around between us and our investors. And in the long run, the capital markets will set a price for money (the required investor return), we will have to achieve that for them, and the rest will go to us.

total_platform_value = investor_return + upstart_profit

Core Belief #2 – a lot of inaccuracy

Lending is one of the oldest industries, but somehow there’s still a lot of room for improvement. The core activity of lending is underwriting – predicting whether a potential borrower will pay you back. And it turns out that lenders guess wrong all the time. The majority of applicants rejected by most lenders would have paid them back, and a nontrivial fraction of applicants approved by most lenders don’t actually pay them back. The fact that loans almost always have interest rates higher than the risk-free rate is itself evidence of how much inaccuracy exists.

Going back to core belief #1, this means that the cost of risk (“defaults” or “actual_risk + risk_premium”) in the fundamental loan equation is one that can be reduced a lot! This is central to the business opportunity we’re going after.

This inaccuracy isn’t a one-time shot. It’s not as if there were a single undiscovered truth about the world that we found and can now milk for all it’s worth. Rather, the inaccuracies are many and complex, almost bottomless. The opportunity is to create a machine that continuously drills deeper into the truth in order to reveal more and more undervalued (and therefore overpriced) borrowers.
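
The two kinds of underwriting error described in this section (good borrowers rejected, bad borrowers approved) can be made concrete with a small sketch. The function name `underwriting_errors` and the ten-applicant data set are hypothetical. In real life the outcome for a rejected applicant is never observed, so the `would_repay` column here is an assumption made purely for illustration.

```python
from typing import Dict, Sequence


def underwriting_errors(approved: Sequence[bool],
                        would_repay: Sequence[bool]) -> Dict[str, float]:
    """Measure the two ways a lender can guess wrong.

    approved[i]    -- did the lender approve applicant i?
    would_repay[i] -- would applicant i have repaid? (assumed known here;
                      in practice this is unobservable for rejected applicants)
    """
    if len(approved) != len(would_repay):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(approved, would_repay))
    n_rejected = sum(1 for a, _ in pairs if not a)
    n_approved = sum(1 for a, _ in pairs if a)
    rejected_good = sum(1 for a, r in pairs if not a and r)
    approved_bad = sum(1 for a, r in pairs if a and not r)

    return {
        # Share of rejected applicants who would have paid back.
        "rejected_who_would_repay": rejected_good / n_rejected if n_rejected else 0.0,
        # Share of approved applicants who end up defaulting.
        "approved_who_default": approved_bad / n_approved if n_approved else 0.0,
    }


# Ten made-up applicants: four approved, six rejected.
approved = [True, True, True, True, False, False, False, False, False, False]
would_repay = [True, True, True, False, True, True, True, True, False, False]

errors = underwriting_errors(approved, would_repay)
print(errors)
```

A more accurate model shrinks both numbers at once, which is different from simply loosening or tightening an approval cutoff, where one number improves at the expense of the other.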

Side notes:

  • There is an ongoing debate internally about the limits of accuracy. Is perfect or close-to-perfect prediction possible? Maybe perfection should be understood as perfection holding macroeconomics constant. Maybe quantum effects doom us, or maybe physicists just haven’t figured out how to model them deterministically. In any event, the core belief here is that accuracy can still be improved a lot.

Core Belief #3 – accuracy is good

Accuracy sometimes seems like a mathy thing that mostly data scientists care about. It can feel zero-sum, or morally and/or financially neutral, just shifting the order of people’s rankings around without really creating anything extra in the world.

That impression is wrong (at least in our business). If you knew exactly who would pay you back and who wouldn’t, you’d lend to only the former and charge them the risk-free rate. In fact, every cent of excess interest everyone pays (mostly the good borrowers) goes towards covering the actual risk (defaults) and risk premium created by the borrowers who default. This sucks.

Even the defaulters are arguably harmed – they spent money they couldn’t afford and now could be facing bankruptcy. Even if you think defaults are good for defaulting borrowers (yay free money), they are at best a strange form of charity that should probably get separated from the activity of lending.

Furthermore, it turns out that there are a lot more actually-good borrowers than defaulting borrowers. This means that in practice, accuracy boosts tend to increase approval rates and access to credit.

So, the point is, accuracy saves borrowers money. In business terms, that means we lower the cost of risk in the loan equation, and have more total economic value.
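
To see how defaults turn into excess interest, here is a deliberately simplified sketch. It assumes one-period loans, defaulters who repay nothing, and no risk premium or operating expenses; the function name `break_even_rate` and the 3% risk-free rate are hypothetical. Under those assumptions a lender who cannot tell borrowers apart breaks even when (1 - d) * (1 + r) = 1 + rf, where d is the default rate in the approved pool.

```python
def break_even_rate(default_rate: float, risk_free_rate: float) -> float:
    """Rate at which a pool of one-period loans earns exactly the risk-free rate.

    Solves (1 - d) * (1 + r) = 1 + rf for r, assuming defaulters repay nothing.
    """
    if not 0.0 <= default_rate < 1.0:
        raise ValueError("default_rate must be in [0, 1)")
    return (1.0 + risk_free_rate) / (1.0 - default_rate) - 1.0


RISK_FREE = 0.03  # hypothetical risk-free rate

# As the approved pool contains fewer defaulters, the rate good borrowers
# must pay falls toward the risk-free rate.
for d in (0.10, 0.05, 0.0):
    r = break_even_rate(d, RISK_FREE)
    print(f"default rate {d:.0%}: break-even rate {r:.2%}, "
          f"excess over risk-free {r - RISK_FREE:.2%}")
```

With a default rate of zero the break-even rate equals the risk-free rate, which is the perfect-knowledge case described above.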

Are there downsides to accuracy? If properly managed, the answer is no. Accuracy in lending should be strictly good, i.e. there are various ways to distribute the benefits of accuracy gains, and no free lunches from accuracy losses. “Properly managing” accuracy, however, is no trivial task, and itself requires significant investment. This is because the constant pursuit of accuracy reduces predictability and consistency for the rest of the business. For example, marketing sees it in the difficulty of targeting the right applicants, and the overall business sees it in occasional fluctuations in conversion rates.

Side notes:

  • What about fairness? There are important questions about this both legally and morally. People understandably worry about the risk of discrimination, especially along racial dimensions. Like before, the popular intuition is backwards on this issue, at least in our business. This is because less accurate decisioning systems by definition overgeneralize people. Overgeneralizations, in addition to being sort of definitionally unfair, tend to be highly correlated with the kinds of unfairness our society is most worried about. For example, in the pre-data / pre-FICO score days, a lender would have to fall back on underwriting systems like “lend to everyone who lives in this neighborhood”. At Upstart, we’ve done a lot of work to quantify and study this effect in our own business, and observe a similar pattern – on balance, accuracy tends to improve fairness. We’ll share more on this topic in a future post.

Core Belief #4 – ease matters

In addition to price, potential borrowers are also sensitive to effort. If it is too hard to get a loan, many applicants will simply drop out of the flow. Not surprisingly, people are more sensitive to effort than they probably rationally should be (one back-of-the-envelope estimate was that our users implicitly value their time at $300/hour).

Because accuracy requires collection and verification of data from the user, there is a trade-off curve between accuracy and effort, and therefore, between price and effort. This is true both within lenders and between lenders. For example, in-branch lenders are sometimes quite good at lowering default risk, but do so at the cost of substantial friction for the user (having to go in, show your face, talk to a person, etc).

This trade-off curve is a pretty good model for the business. Our growth is primarily driven by conversion rate improvements, which are either the result of better rates or lower effort. Hence our mission statement – “Effortless credit based on true risk”! Our objective is to improve our product and capabilities such that we can upshift the whole curve.

Side notes:

  • Lowering effort is an area where we must be especially careful to avoid deceiving ourselves about the existence of a free lunch. Many possible effort-reducing ideas undermine accuracy, but sometimes in a way that is initially hidden. This is why we favor initiatives to lower effort that fundamentally give us new information (e.g. a new data service, or models that derive more insight from existing data services) rather than mere changes to our verification policies.
  • That ease matters may seem to contradict my earlier claim that a loan product’s key feature is its price. It doesn’t. Ease is a feature of how one gets the product (e.g. 2-day shipping), rather than of the product itself (e.g. a pair of Nike shoes). While ease of access can significantly increase the chances you actually buy the product, you still have to want the product itself first!

Conclusion

The path described above is slow and difficult. The feedback cycle in lending is long, meaning irrational behavior or inaccuracy can persist for 12 to 18 months before competitors have to deal with the consequences. But Upstart has a unique blend of people, resources, and culture that makes us good at pursuing the above opportunities. Among those –

  • We have an independent data science team with a focus on accuracy rather than making short-term business numbers.
  • We think in financially foundational terms, i.e. economics > accounting.
  • We hail from the nerds-first culture of Google. We recognize the importance of building a company (tech, process, people) to maximize engineering and data science efficiency.
  • We understand and are okay with variance and uncertainty. Something that didn’t end up working out (ex-post) may still have been the right bet to make (ex-ante). Beliefs are probabilistic and we update them regularly in response to new information.

If successfully executed, this focus on improving accuracy will allow us to dominate every market where the accuracy-ease tradeoff is important. For the vast majority of people, Upstart will be the place to go for the most affordable (and still easy) credit.