This is part of my 5 Minute Concepts series, which is designed to help you understand fundamental concepts about subjects like learning, memory and competition in the shortest time possible. Each episode is available in video format on my YouTube channel and audio via my podcast. If you prefer to read, the transcript is below.
I’ve written about the exploration-exploitation dilemma before, but only in a long-form essay format. Since I think this is such a critical concept, and I realize that not everyone has the time to read a big essay, I’ve created this simplified explanation.
Just a warning: like any other 5 Minute Concepts piece, there’s always more to the story. I’m just trying to give you the most important parts in 5 minutes or less.
Let’s start with a stripped-down definition: the exploration-exploitation dilemma is the choice we all have to make between learning more and taking action with the knowledge we already possess.
Learning more is exploration; acting with current knowledge is exploitation.
With either action you’re trying to find some way to maximize what’s often referred to as “reward,” or some end-state that you find desirable.
The reason this is a dilemma is simple: you can’t explore or exploit exclusively and win in the long run.
If all you do is explore, you’ll never take action in the world — which means you get a predictable payoff of exactly zero. There isn’t much to be gained from passively gathering information until you die.
On the other hand, taking action without learning anything is also a long-term losing strategy. You do get some kind of reward by exploiting a known path, but that means you’re giving up any chance at a higher payoff that might be staring you in the face without you knowing about it.
The real kicker here is that exploring is what drives the value of exploitation, and vice versa. You need to explore in order to find good paths for exploitation, and you need to exploit in order to get a reward for your exploration. Both actions are dependent on each other.
What you’re balancing in either case is opportunity cost. You have a limited amount of resources, such as time and money, to work with over the course of your life. If you explore, you’re by definition not exploiting, and vice versa.
Consider this example: Let’s say you’re scrolling through Netflix, looking for something to watch for the next couple of hours.
You notice that a movie you’ve seen a dozen times is one of the choices and consider watching it. Right next to that is a movie you’ve never seen before.
Choosing the movie you’ve seen provides a specific emotional payoff for you. You know all the best parts and you’re well aware of how the entire experience will make you feel.
Choosing the movie you’ve never seen means taking a certain amount of risk. There’s an unknown payoff for watching this new movie, and it might end up being a waste of two hours. Those two hours will be gone, never to return.
But you might also discover a new favorite movie, or genre, or director, that you never knew about.
It’s easy to get sucked into either extreme. I’ve known people who spent their whole lives reading, accumulating a veritable library’s worth of knowledge in their heads, but never tried to do anything with it.
And, of course, I’m sure we both know people who have never read a book or stopped to think for even a moment about whether their beliefs and actions should be altered in some way.
While this is an unsolved problem (and trust me, many people have tried to figure it out), there are some good rules of thumb to run with. First of all, don’t favor a binary approach. Only exploring or only exploiting doesn’t work in the long run.
Second, it pays to spend a lot of time exploring early on and then shift more and more toward exploitation over time. But, and this is critical, you never stop exploring completely. In the real world, exploring should always be part of your strategy.
Always leave some room for learning new things. This pattern of heavy exploration early on that gradually tapers off is known as the epsilon-decreasing algorithm, and if you just want a simple heuristic for managing this dilemma, it’s a pretty good place to start.
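For readers curious what the epsilon-decreasing algorithm looks like in practice, here is a minimal Python sketch on a toy "multi-armed bandit": a handful of options with unknown average payoffs, a bit like the movies in the Netflix example. Epsilon is the probability of exploring (picking a random option) instead of exploiting (picking the option with the best average so far), and it shrinks after every choice. The payoff values, the decay rate and the floor `eps_min` are hypothetical choices for illustration; the floor is there to reflect the idea that you never stop exploring completely.

```python
import random


def epsilon_decreasing_bandit(true_means, steps, eps_start=1.0, decay=0.99,
                              eps_min=0.05, seed=0):
    """Play a toy multi-armed bandit with an epsilon-decreasing policy.

    true_means: hypothetical average payoff of each option (hidden from the player).
    Returns (estimated payoffs, number of times each option was chosen, total reward).
    """
    rng = random.Random(seed)      # seeded so runs are repeatable
    n = len(true_means)
    counts = [0] * n               # how many times each option has been tried
    estimates = [0.0] * n          # running average payoff observed for each option
    total = 0.0
    epsilon = eps_start

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                            # explore: try a random option
        else:
            arm = max(range(n), key=lambda i: estimates[i])   # exploit: best option so far
        reward = rng.gauss(true_means[arm], 1.0)              # noisy payoff around the true mean
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # update the running average
        total += reward
        epsilon = max(eps_min, epsilon * decay)  # explore less over time, but never below the floor

    return estimates, counts, total
```

Calling `epsilon_decreasing_bandit([1.0, 1.5, 3.0], steps=2000)` should end up spending most of its choices on the third option, while still occasionally sampling the other two. Without the `eps_min` floor, exploration would fade toward zero and the player would eventually stop checking whether something better had come along.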
Third, there are always inflection points where it makes sense to shift from one to the other. Sometimes it’s the moment you realize you’ve finally gained enough knowledge to be genuinely competent, and the time to put it to use has come. Passing a professional exam is a simple example of this.
Other times you might suffer a bitter defeat and receive an unfiltered signal that it’s time to explore. If a big project you’ve been working on fails, for example, you might need to go back to the drawing board and evaluate how to improve for your next attempt.
I could talk about this for hours, but in general I want you to understand this: figuring out how to spread your time between exploration and exploitation is perhaps the most important problem you’ll ever face.
Don’t push this into the background — be conscious and deliberate about it. Doing that might just change your life in ways you never saw coming.