Impact above Replacement

2022-11-30 • 16 min read • comment via EAF


Summary #

There are at least four ways of thinking about replaceability:

  1. The naive view, where your impact is the value you produced using some amount of resources.
  2. The single comparison view, where your impact is the value you produced using some amount of resources, minus the value someone else would’ve produced using those resources had you not done so.
  3. The replacement view, where your impact is the value you produced using some amount of resources, minus the value the “replacement-level person” of that reference group would’ve produced using those resources. The replacement-level person is the person who’s only barely talented enough to enter a field.
  4. The God view, where your impact is the total utility in the world where you do the thing, minus the total utility in the world where you don’t do the thing. I assume for the purposes of this post that this is the normatively correct view, in order to have a benchmark.

Key takeaways:

What Replaceability Is #

You want to do some good in the world. But to do good you need resources, and some of those will be in limited supply. If you use scarce resources, that means someone else won’t. Now they can’t do as much good as they could’ve. So how should you decide which scarce resources to use, or whether to use them at all?

Replaceability is a partial answer to this. Replaceability is usually taken to mean something like “the extent to which someone else would do what you’d do in a job if you don’t take the job”. I interpret it more generally, as “the extent to which other people would’ve produced similar value with a scarce resource had you not used it”.[1]

The paradigmatic example is choosing a career: if you’re thinking of becoming a doctor to save hundreds of lives, you should perhaps keep in mind that someone else might’ve saved those lives had you chosen a different career. If so, being a doctor is highly replaceable. But this is just one of many situations where replaceability may be a factor:

| Decision | Scarce resource |
| --- | --- |
| Choosing a job | Salary, opportunities for direct impact and support from employer and colleagues |
| Asking someone to be a mentor | Mentor's time and energy |
| Applying for a grant | Grant, grantmaker's time and grantmaker's connections |
| Marketing a resource to the community | Prestige and community attention |

Four Views on Impact #

We can look at impact in at least four different ways:

  1. The naive view. Your impact is the value you produced directly and indirectly using some amount of resources.
    • This view, which I think is common outside effective altruism, is incorrect because it fails to consider the counterfactual.
  2. The single comparison view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value someone else would’ve produced using those same resources had you not done so.
    • This view, which I think is common within effective altruism, is incorrect because it fails to consider trickle-down effects. That is, the person who replaces you would’ve been replaced by someone less able, who in turn would’ve been replaced by someone less able, and so on. Each of these steps may involve an additional loss of impact.
  3. The replacement view. Your impact is the value you produced directly and indirectly using some amount of resources, minus the value the “replacement-level person” of that reference group would’ve produced using those same resources.
    • This is an effective altruist variant of what sabermetricians call Wins Above Replacement, a stat that aims to measure how many more wins a baseball player contributes to a team than the replacement-level player. The replacement-level player is the substitute who would’ve been called up to join the team had the player being measured not participated.
    • The replacement view seems confused, at least as a parallel to sports, because in sports the stat is designed to compare players, not actions.
    • Benjamin Todd proposes something similar: that your impact is the value you produce, minus the value someone else would’ve produced in your position, plus the value of any additionally freed-up resources. But it seems hard to me to estimate the value of those freed-up resources – it feels like punting.
  4. The God view. Your impact is the value produced by everyone (including you) in the world where you use some amount of resources, minus the value that would’ve been produced by everyone in the world where you didn’t use those resources. (Because God is omniscient.)
    • This view is perhaps normatively correct but requires perfect information and computing power.
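To make the four definitions concrete, here’s a minimal sketch with made-up numbers (the variable names and values are mine, purely for illustration):

```python
# Hypothetical values produced over a career, in arbitrary "good done" units.
value_you_produce = 100        # value you'd produce with the scarce resource
value_next_best = 90           # value the person who'd replace you would produce
value_replacement_level = 60   # value the barely-good-enough entrant would produce
world_total_with_you = 1_000   # total value in the world where you act
world_total_without_you = 950  # total value in the world where you don't

naive_impact = value_you_produce                                  # view 1
single_comparison_impact = value_you_produce - value_next_best    # view 2
replacement_impact = value_you_produce - value_replacement_level  # view 3
god_impact = world_total_with_you - world_total_without_you       # view 4

print(naive_impact, single_comparison_impact, replacement_impact, god_impact)
# 100 10 40 50
```

Note how, with these made-up numbers, the four views give very different answers for the same action.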

Benjamin Todd, Paul Christiano and others have thought and written about replaceability, but I think it’s fair to say no one has reached any definitive conclusion: it’s a hard problem. Todd recommends focusing on other, more robustly predictable factors – like personal fit, scale and solvability – when choosing a career. In fact, there’s been remarkably little discussion of replaceability in the past few years, I think partly because people have realised that replaceability differs less across career options than those other factors do, and partly because people have tactically retreated from a difficult problem.

I think this is somewhat unfortunate, as replaceability seems to me to be decision-relevant and somewhat tractable, even if not as important as the problems Todd and Christiano have moved on to.

Why Does Replaceability Matter? #

There are ~1M licensed physicians in the US alone, which is ~2 orders of magnitude more than the number of effective altruists globally. I’d expect there to be a difference of ≥1 order of magnitude in the number of people working on different problems within effective altruism too. If the number of people working in a field affects replaceability in a way that matters when estimating counterfactual impact, it seems important to know how it matters.

To take a rather extreme scenario: imagine someone choosing between two jobs – one in a field with lots of people, and one doing something that basically only they can do. Having heard of replaceability, they might think, “Though this thing that only I can do seems less important than this thing that many other people are doing, if I don’t work on the thing that many people are doing, someone else nearly as good as me would take my place, so instead I should work on this thing that only I can do.” But if something like the replacement view is a better representation of the truth than the single comparison view, this person may be making a grievous error!

Sports seems somewhat analogous here. Sports teams are also making decisions about how to allocate scarce resources (e.g. playing time), and they do so by explicitly considering replaceability. (It’s easier for them; they have good measures of how teams and players perform.)

For example, suppose a hockey team is deciding whether to sign a star left winger or an equally talented star right winger. Suppose its current top left winger is near star calibre, whereas its top right winger is middling. Suppose its worst left winger is so bad as to be a liability, whereas its worst right winger is pretty good. If the team only compared each potential addition to the player whose spot they’d take, they’d go with the star right winger (who’s much better than the team’s middling top right winger). But it may be better for the team to sign the star left winger (to get rid of the worst left winger, who’s a liability).[2]
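To see how the numbers can flip, here’s a toy calculation with made-up ratings (my numbers, purely illustrative): three roster spots per wing, and signing a star pushes the weakest player at that position off the roster.

```python
# Made-up per-game value ratings for a toy three-spot-per-wing roster.
left_wingers = [9, 6, -2]    # near-star, decent, outright liability
right_wingers = [5, 4, 4]    # all middling
star_rating = 10             # either star would add this much value

# Single comparison: weigh the star only against the incumbent whose spot they'd take.
gain_lw_single = star_rating - max(left_wingers)    # 10 - 9 = 1
gain_rw_single = star_rating - max(right_wingers)   # 10 - 5 = 5

# Whole-roster accounting: everyone shifts down a line; the weakest player drops off.
def roster_gain(incumbents, star):
    new_roster = sorted(incumbents + [star], reverse=True)[:len(incumbents)]
    return sum(new_roster) - sum(incumbents)

gain_lw_total = roster_gain(left_wingers, star_rating)    # 25 - 13 = 12
gain_rw_total = roster_gain(right_wingers, star_rating)   # 19 - 13 = 6

print(gain_lw_single, gain_rw_single, gain_lw_total, gain_rw_total)  # 1 5 12 6
```

The single comparison favours the right winger (5 vs 1), but counting the whole roster flips the decision (12 vs 6), because signing the left winger also removes the liability from the lineup.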

I think it’s currently unclear how we do – or should – look at replaceability. My impression is that the single comparison view was circulated long ago and that, ever since its flaws became apparent, there’s been something of a vacuum. With what do people fill this vacuum? I have no idea. But we make decisions, so it must be something.

Simulations #

I’ve run Monte Carlo simulations with the aim of seeing how these views perform in four simple, idealised scenarios. The scenarios pit two fields (as in, spheres of activity) against each other. (For more on how the simulations were run, see the Appendix.) The percentages signify how often you’d choose Field A if you took a given view:

| # | Field A | Field B | Naive | Single Comparison | Replacement | God |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1K ppl (medium talent), 100 jobs | 1K ppl (medium talent), 100 jobs | 49.9% | 50.7% | 50.0% | 50.1% |
| 2 | 1K ppl (high talent), 100 jobs | 1K ppl (medium talent), 100 jobs | 69.6% | 62.3% | 61.8% | 62.4% |
| 3 | 1K ppl (medium talent), 100 jobs | 100 ppl (medium talent), 10 jobs | 50.7% | 26.0% | 53.4% | 49.4% |
| 4 | 1K ppl (medium talent), 100 jobs | 200 ppl (medium talent), 100 jobs | 67.9% | 55.8% | 56.2% | 56.5% |

I interpret the results thusly:

  1. The fields are identical so the decision is a toss-up. Indeed, all four views choose Field A ~50% of the time.
  2. Field A has more talented people than Field B so we should pick it more often. That’s because you’re drawn randomly from each field’s talent pool, so you’ll usually end up being more talented in Field A than in Field B.
    • Most views capture this pretty well (using the God view as our ground truth), ending up at 62%.
      • But the naive view is kind of off at 70%. It’s too optimistic about higher-talent fields when jobs are scarce. I think that’s because it ignores trickle-down effects. This doesn’t matter when everyone has a job – in that case, the naive view is nearly identical to the replacement view. But when jobs are scarce there’s a difference between these, and the difference grows as the overall talent of the field grows.
  3. People are equally talented in both fields, but Field B has 10% as many people and jobs as Field A.
    • This is also a toss-up. The naive (51%) and replacement (53%) views do well here with the God view (49%) as ground truth.
      • But the single comparison view does poorly (26%). It’s too optimistic about fields with fewer people since the distance to the next most talented person will tend to be larger there. But this distance doesn’t matter much if there’s a trickle-down effect.
  4. People are equally talented in both fields, but Field B has fewer people relative to its number of jobs (2x) than does Field A (10x).
    • This isn’t a toss-up – we should choose Field A about 57% of the time. That’s because we stipulated that you always get a job in whichever field you choose, and since Field B has fewer people per job, a randomly chosen job-holder there is less talented than a randomly chosen job-holder in the more competitive Field A. So we’ll tend to pick Field A more.
      • The single comparison (56%) and replacement (56%) views do well here.
        • But the naive view does poorly (68%). It’s too pessimistic about fields where there’s less competition in the labour market, though I’m not sure why.

Like any model, this one rests on a number of simplifying assumptions. None of these results are guaranteed to hold outside the model’s world. Still …

The Replacement View Seems Useful, Maybe #

Though I don’t feel sure enough to actually make recommendations to people faced with career decisions, I’m tentatively bullish on the replacement view. Arguments for:

Arguments against:

Replaceability is also more complicated than this post makes it out to be. For example, in the real world:

The problem that replaceability is meant to address is a pretty rare one. There doesn’t seem to be much research on it. There aren’t that many situations where people (1) pursue the same goal[5], (2) don’t usually coordinate, (3) use shared scarce resources with substantial supply and demand and (4) are able to use those resources to varying degrees of efficiency. But effective altruists are in this rare situation.

Appendix: Monte Carlo Simulations #

Here is the procedure I used to simulate career choices based on the four replaceability views:

  1. For two different fields, Field A and Field B, generate N people with different Talent Levels, and M jobs with different “Effort Multipliers”.[6] Assume these are lognormally distributed.[7]
    • Effort Multipliers represent the fact that some jobs allow a person to get more work done towards solving a problem than other jobs, e.g. by providing more opportunities or better support.
    • We can describe the effort added by a person working a job with the formula Effort = Talent Level × Effort Multiplier. So the total effort of people working on a problem (corresponding to the God view) is given by the sum of Talent Level × Effort Multiplier for all person-job pairs in both fields. (We assume people who don’t get a job produce zero effort.)
  2. For each field, assign the most talented people to the jobs with the highest Effort Multipliers, one at a time, until there are no more available jobs (or people).
  3. Select a person at random from each field, excluding those people who didn’t get a job.[8] This is you and your talent at each thing. You’re now going to decide which field to work in.
  4. For each of the four views, calculate the value of choosing Field A, and the value of choosing Field B.
    • We get a person’s naive effort in a field by multiplying their Talent Level in that field by the Effort Multiplier of the job they’d get, i.e. Your Effort = Your Talent Level × Your Effort Multiplier.
    • Your effort in each field according to the naive view is simply Your Effort.
    • Your effort in each field according to the single comparison view is Your Effort - The Person Who’d Replace You’s Effort.
    • Your effort in each field according to the replacement view is Your Effort - The Replacement-Level Person’s Effort. Remember, the replacement-level person is the person who’d barely get a job in the field.
    • Your effort in each field according to the God view is the total effort produced by everyone in that field if you work there, minus the total effort produced if you don’t (with your job filled by the next person down the talent order).
  5. Repeat steps (1) to (4) 10,000 times for each scenario.
  6. Estimate P(choose Field A) for each of the views.
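The post doesn’t include the simulation code, so here’s a minimal NumPy sketch of how I read the procedure. The lognormal parameters, the rule for what happens when you leave a field, and the assumption that the replacement-level person would have used your job’s Effort Multiplier are my own interpretations, so the exact numbers won’t reproduce the table above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_field(n_people, n_jobs, talent_mean):
    """Step 1: lognormal Talent Levels and Effort Multipliers, sorted best-first.
    The lognormal parameters here are placeholders, not the post's."""
    talents = np.sort(rng.lognormal(mean=talent_mean, sigma=1.0, size=n_people))[::-1]
    multipliers = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=n_jobs))[::-1]
    return talents, multipliers

def total_effort(talents, multipliers):
    """Step 2: assortative assignment - the most talented people take the best jobs."""
    k = min(len(talents), len(multipliers))
    return float(np.sum(talents[:k] * multipliers[:k]))

def field_values(talents, multipliers):
    """Steps 3-4: pick a random job-holder ('you') and score the field under each view."""
    k = min(len(talents), len(multipliers))
    i = int(rng.integers(k))                      # your rank among job-holders
    your_effort = talents[i] * multipliers[i]
    # Single comparison: the person who'd replace you is the next one down the
    # talent order (zero talent if there's no one left).
    replacer_talent = talents[i + 1] if i + 1 < len(talents) else 0.0
    # Replacement view: the replacement-level person is the least talented person
    # who still gets a job; I assume they'd use your job's Effort Multiplier.
    replacement_talent = talents[k - 1]
    naive = your_effort
    single = your_effort - replacer_talent * multipliers[i]
    replacement = your_effort - replacement_talent * multipliers[i]
    # God view: the field's total effort with you, minus its total effort without
    # you (everyone below you shifts up one job when you leave).
    god = total_effort(talents, multipliers) - total_effort(np.delete(talents, i), multipliers)
    return np.array([naive, single, replacement, god])

def run_scenario(field_a, field_b, trials=10_000):
    """Steps 5-6: estimate P(choose Field A) under each of the four views."""
    choose_a = np.zeros(4)
    for _ in range(trials):
        a = field_values(*simulate_field(*field_a))
        b = field_values(*simulate_field(*field_b))
        choose_a += a > b
    return choose_a / trials

# Scenario 3: equal talent, Field B has 10% as many people and jobs as Field A.
# (Fewer trials than the post's 10,000, just to keep the example quick.)
print(run_scenario((1000, 100, 0.0), (100, 10, 0.0), trials=1_000))
```

Swapping in the other scenarios’ people/job counts and talent means should give numbers in the same ballpark as the table, though not identical, since the distribution parameters are guesses.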

References #

Page, Scott E. 2018. The Model Thinker: What You Need to Know to Make Data Work for You. Basic Books.
Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. 2016. Causal Inference in Statistics: A Primer. John Wiley & Sons.

Footnotes #

  1. Replaceability is different from counterfactuals. Pearl, Glymour, and Jewell (2016) describe a counterfactual as “an ‘if’ statement in which the ‘if’ portion is untrue or unrealized”. This involves tallying up all the ways a thing would’ve gone differently. Replaceability is a special kind of counterfactual reasoning, dealing only with the use (or non-use) of a scarce resource. ↩︎

  2. True, ice time makes this a more subtle calculation. Signing the star left winger means the near-star-calibre left winger gets pushed down to the second line, meaning their (considerable) impact is reduced. But I think it serves as an example of these kinds of considerations mattering in practice. ↩︎

  3. I frame it as “how much effort you add”, not “how much impact you have”, because impact also depends on other things, in particular the problem areas’ relative scale (defined, after 80,000 Hours, as Good Done ÷ % of Problem Solved) and solvability (% of Problem Solved ÷ % Increase in Effort). Focusing on effort alone is cleaner as we can bracket those other concepts. As far as this post is concerned, all problems have the same scale and solvability.

    NB. “Increase in Effort” is called by 80,000 Hours “Increase in Resources”, but since I’m already using the word “resource” to refer to labour, time and money, I’m calling it “Increase in Effort” instead. ↩︎

  4. Some posts, like this one, point to income and researcher citation count as evidence of this (emphasis mine): “If job performance is like income, or the number of citations people have on academic papers, it is more like a log normal distribution[.] That is, most aspiring academics have few citations, while some have thousands, tens of thousands, or even hundreds of thousands. […] We’re very unsure about this question, and would like to see more research into it. Some evidence we’ve seen suggests that output is normally distributed even in ‘complex’ jobs, like being a doctor. However, for the most difficult and creative work, like academic research, we suspect that the variance is high in the tails. Even there, it’s hard to be confident since many measures of output (such as citation count) are likely to overstate differences in productivity.”

    As alluded to in the quoted passage, I think income and citation count aren’t good evidence. Even if talent is normally distributed (i.e. follows a bell curve), salary and citations could well have heavy tails due to nonlinear effects later in the causal chain. The Matthew effect – where having an advantage gets you further advantages – applies here too, as well-cited papers are more likely to get further citations independently of quality, and richer people are more likely to get more money regardless of talent. ↩︎

  5. Benjamin Todd calls this a “shared aim community”. ↩︎

  6. This model implicitly takes neglectedness and personal fit into account – neglectedness (and importance) is captured by a job’s impact level, and personal fit is captured by a person’s talent level. ↩︎

  7. Page (2018) writes: “In some cases, we may know the mean of the distribution and also know that all values must be positive. Given those constraints, the maximal entropy distribution must have a long tail, and as we spread the distribution across more values, we must balance high values with many low-value outcomes.”

    I don’t know the means of these talent distributions, but it does seem likely to me that (a) talent can’t be negative and (b) the distance between the average talent and zero talent is smaller than the distance between the average talent and the greatest talent. That seems like a pretty good justification for a heavy-tailed distribution. ↩︎

  8. Note that this means that, if the number of people exceeds the number of jobs, you’ll tend to have an above-average talent level. If there are 10x as many people as jobs, for example, you’re randomly selected from the top 10% of the talent distribution (i.e. at or above the 90th percentile). ↩︎