Balancing explore v exploit data trade-offs
Should I stay or should I go?
Bees perform the waggle dance to tell other bees where the best source of pollen is. This enables the hive to find food efficiently. Most bees act on this information, but around 15% don’t. Why do some bees go their own way? The answer relates to the long-term survival of the hive. Some of these maverick bees find new sources of pollen, which they then communicate to other bees through their own waggle dance. This is an example of the Explore v Exploit data trade-off: to what extent should we explore new options rather than exploit what we already know?
Explore v exploit trade-off
Do not follow where the path may lead. Go instead where there is no path and leave a trail – Ralph Waldo Emerson
The Explore v Exploit trade-off arises in many contexts, for businesses and individuals alike. Exploring is gathering new information; exploiting is using existing information to optimise an outcome. How do we find the optimal balance between exploring and exploiting?
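One common way to frame this balance in code is an "epsilon-greedy" rule: with a small probability (epsilon) try an option at random, otherwise pick the option that has looked best so far. The sketch below is an illustration only. The function name, the success rates and the default values are assumptions made for the example, not figures from any study.

```python
import random


def epsilon_greedy(success_rates, epsilon, steps=1000, seed=0):
    """Toy explore/exploit loop over options with hidden success rates.

    success_rates: hypothetical chance that each option pays off (unknown to the chooser).
    epsilon: share of choices made at random (exploring).
    Returns (total_reward, times_each_option_was_chosen).
    """
    rng = random.Random(seed)
    n = len(success_rates)
    counts = [0] * n        # how often each option has been tried
    estimates = [0.0] * n   # running average reward seen per option
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.randrange(n)  # explore: pick any option
        else:
            # exploit: pick the best-looking option (ties go to the first)
            choice = max(range(n), key=lambda i: estimates[i])
        reward = 1 if rng.random() < success_rates[choice] else 0
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
        total += reward
    return total, counts


if __name__ == "__main__":
    for eps in (0.0, 0.15, 1.0):
        total, counts = epsilon_greedy([0.2, 0.5, 0.8], epsilon=eps)
        print(f"epsilon={eps}: reward={total}, choices={counts}")
```

With epsilon set to 0 this chooser never leaves the first option it tries, however poor that option is. With epsilon set to 1 it never uses what it has learned.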
To illustrate the optimisation challenge, using the bees example:
Explore only: The knowledge of existing pollen sources is not shared with other bees via a waggle dance. Some bees would get lucky and find a pollen source, but it would not be long before the hive starved.
Exploit only: All bees obey the waggle dance. The hive would thrive in the short term. However, the known pollen sources will be exhausted at some point. When this happens the hive faces starvation, as it is unaware of alternative options.
Between these two extremes lies the optimal solution, i.e. a balance between explore and exploit that optimises the hive's chance of survival.
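The two extremes and the middle ground can be sketched with a deliberately simple, deterministic toy model. Every number here (100 bees, 3,000 units of food per source, 200 scout-days to find a new source, 60 units eaten per day) is an assumption chosen for illustration, not a measurement of real hives. In this toy, "explore only" simply means nobody harvests.

```python
def days_survived(explore_fraction, bees=100, days=365,
                  source_stock=3000.0, scout_days_per_find=200.0,
                  daily_need=60.0):
    """Return how many days a toy hive survives (at most `days`).

    explore_fraction: share of bees that scout instead of harvesting.
    All other parameters are made-up values for illustration.
    """
    scouts = bees * explore_fraction
    foragers = bees - scouts
    known_stock = source_stock  # one source is known at the start
    scouting_effort = 0.0
    store = 0.0                 # food held in the hive
    for day in range(days):
        # Foragers each bring back one unit, if known sources still have food.
        harvest = min(foragers, known_stock)
        known_stock -= harvest
        store += harvest - daily_need
        if store < 0:
            return day          # the hive has run out of food
        # Scouts accumulate effort; enough effort uncovers a new source.
        scouting_effort += scouts
        while scouting_effort >= scout_days_per_find:
            scouting_effort -= scout_days_per_find
            known_stock += source_stock
    return days


if __name__ == "__main__":
    print(days_survived(0.0))   # exploit only: 50
    print(days_survived(0.15))  # mostly exploit, some explore: 365
    print(days_survived(1.0))   # explore only: 0
```

Under these assumptions the exploit-only hive does well until its single source runs dry, the explore-only hive never gathers enough to eat, and the mixed hive lasts the full year.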
Explore v exploit optimisation
Man can learn nothing except by going from the known to the unknown. – Claude Bernard
There are a number of factors which influence the explore v exploit trade-off. These include: degree of uncertainty, time and resources available, cost of exploration, and the potential rewards of exploitation.
While the optimal solution depends on the situation, there are some general principles we can apply:
Explore more when we have a lot of uncertainty, the cost of exploration is low and the potential rewards high.
Exploit more when we have less uncertainty, the cost of exploration is high and the potential rewards low.
With finite time, we should explore early on so we have the most time to exploit what we learn.
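The last principle can be checked with simple arithmetic. The sketch below compares trying new options at the start of a fixed number of visits with trying the same options at the end. The scores and the function name are hypothetical, chosen only to make the sums easy to follow.

```python
def total_enjoyment(known, new_options, visits, explore_first):
    """Total score over a fixed number of visits.

    known: score of the option we already know.
    new_options: scores of untried options (each is tried exactly once).
    explore_first: True to try the new options at the start, False at the end.
    """
    explore_scores = list(new_options)
    remaining = visits - len(explore_scores)
    if remaining < 0:
        raise ValueError("not enough visits to try every new option")
    if explore_first:
        # After exploring, every remaining visit goes to the best option found.
        best = max([known] + explore_scores)
        return sum(explore_scores) + remaining * best
    # Exploring last: the remaining visits were all spent on the known option.
    return remaining * known + sum(explore_scores)


if __name__ == "__main__":
    print(total_enjoyment(6, [4, 9, 5], visits=20, explore_first=True))   # 171
    print(total_enjoyment(6, [4, 9, 5], visits=20, explore_first=False))  # 120
```

The same three experiments are run in both cases. The only difference is how many visits are left to benefit from the best discovery.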
A few specific tactics can help:
A/B testing allows us to see how people respond to different versions of a product.
Diversify by investing in activities with small downside risk and potentially large upside. How to Benefit from Disorder explores this further.
Iterate to enable continual learning and improvement.
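For the A/B testing tactic, one standard way to judge whether two versions really differ is a two-proportion z-test. The sketch below uses only the Python standard library. The function name and the visitor numbers are made up for illustration, and a real test would also need a sample size decided in advance.

```python
import math


def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test. Returns (z, two_sided_p_value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("no variation in the data, so the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: version B converts 150 of 1000 visitors, A converts 100 of 1000.
    z, p = ab_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone, which is the signal to start exploiting the better version.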
Explore v exploit examples
I’m more likely to try a new restaurant when I move to a city than when I’m leaving it. – Chris Stucchio
In business, the Explore v Exploit trade-off can be seen in the decision of whether to focus on existing products and services (exploiting) or to develop new products (exploring). Focusing only on existing products leads to stagnation. Focusing exclusively on new products, however, carries too much risk and invites failure. Google asked employees to spend 20% of their time working on what they thought would most benefit the company. This gave rise to Gmail and other products.
As individuals, we also need to decide whether to stick with what we know (exploit) or try new things and step outside our comfort zone (explore). Should we go to the same place on holiday each year or try somewhere new? Go to the same supermarket each week or not? Stay in our jobs or explore something new?
Algorithms to Live By talk by Brian Christian and Tom Griffiths
When to Stop Searching and Choose post by Phil Martin
Life Games to Play, Win and Exit post by Phil Martin
Sometimes it is good to ignore someone’s waggle dance and go our own way. As J. R. R. Tolkien put it, Not all those who wander are lost.