<template>
    <div>

		<div class="title">
			<p class="main">Numbercrunch by Professor Oliver Johnson</p>
			<div class="subheadings">
				<p class="published-date">21st March 2024</p>
			</div>
		</div>
		<div class="article">
			<p>I first came across Oliver Johnson, the author of Numbercrunch, on Covid Twitter. Covid Twitter was a kind of brilliant place where people of varying disciplines, suddenly all confronted with the same problem, collaborated on explaining what was happening. From case rates to how RNA vaccines worked, I learned a lot from Covid Twitter and found it a helpful space compared to the h̵̝̐͊́͜e̴̦̠̘͋̔̚l̸͍͉͉͐͘͠l̴̺͕͛̒̀s̸̻͙͇̿̾̒c̵̡̙͕͊͛͝a̴̟͊͘͜͠p̵͎̟̈́̔e̵̠̙̺͑͐͊ that is modern Twitter.</p>

			<p>Numbercrunch for me forms my Annual Re-Learn Of Basic Statistics. Last year David Spiegelhalter did the honours and this year it’s Professor Johnson. Both are great science communicators with a knack for explaining probabilities, exponentials, and things like game theory in a way that is digestible and (if you’re like me) very readable. The structure of Numbercrunch, though, is more like Four Ways of Thinking from last week than books I’ve read on a similar theme in previous years. Johnson splits the book into three parts: Structure, Randomness, and Information. Claude Shannon makes repeated appearances, which somewhat reinforces the sense of déjà vu that I had while reading.</p>

			<p>I’m loath to explain several of the concepts that Johnson runs through in the book, but I do want to touch on a few that I feel are important before drawing a conclusion. This will be brief and powerfully non-mathsy. The first is something we’re probably all familiar with: finding a line that explains several dots on a graph. This is referred to as The Line Of Best Fit, and it can be tempting to draw a straight line that gets as close as possible to all the data points. Often this is helpful, but Johnson urges caution. The first thing to note is that the scale is important. If all your points sit between 100 and 110, then starting your graph’s Y axis at 90 and ending it at 120 will make the line look far more dramatic than if you start the Y axis at 0. Zeroing the Y axis isn’t always the best thing to do, but the point made is to be careful of overfitting analysis to data points. You can prove anything with facts after all.</p>

			<p>The second concept worth thinking about is Bayes’ Theorem. Every time I read about Bayes’ Theorem it’s like I’m finding out about it for the first time. I read it, I understand it, I internalise it, and the next day it’s gone. My hope is that by trying to explain it myself I’ll retain it because it is so unbelievably useful. Johnson uses the example of Covid tests to explain it: if you have a Covid test with 98% sensitivity, what are the chances that someone who tests positive is actually infected? To know this, we have to consider the overall infection rate in the population. Let’s say it’s about 10%. We also need to know how often the test wrongly flags someone who isn’t infected; I’m assuming it gets those people right 98% of the time too. Knowing these things (10% of people have Covid, and this test will find 98% of cases) we can work out that the chance of that person having Covid is about 85%, if I got my maths right. Which is unlikely. Out of 1,000 people, 98 of the 100 who are infected test positive, but so do 18 of the 900 who aren’t, so only 98 of the 116 positives are real. Another way of thinking about this is to think about set sizes. Imagine a person who is very bookish, shy and retired, and likes order and structure in their life. Is this person more likely to be a librarian or a farmer? The answer, somewhat counterintuitively, is farmer. For the simple reason that there are way more farmers than librarians. Bayes’ Theorem helps us to reason about things that are conditional.</p>
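
			<p>For anyone who wants to check the Covid test arithmetic, here is a rough sketch in JavaScript. The 10% infection rate and 98% sensitivity come from the example above; the 98% specificity (how often the test correctly clears someone who isn’t infected) is my own assumption, and the function and parameter names are made up for illustration.</p>

<pre><code>
```javascript
// Bayes' theorem for a diagnostic test.
// prevalence:  share of the population that is infected (the prior)
// sensitivity: chance an infected person tests positive
// specificity: chance an uninfected person tests negative
//              (ASSUMED to be 98% below; the example in the text does not give it)
function chanceInfectedGivenPositive(prevalence, sensitivity, specificity) {
  const truePositives = prevalence * sensitivity;
  const falsePositives = (1 - prevalence) * (1 - specificity);
  return truePositives / (truePositives + falsePositives);
}

const posterior = chanceInfectedGivenPositive(0.10, 0.98, 0.98);
console.log((posterior * 100).toFixed(1) + "%"); // prints 84.5%
```
</code></pre>

			<p>Try dropping the infection rate to 1% and the answer falls to roughly one in three, which is the whole point: the prior matters as much as the test.</p>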

			<p>In what is becoming a pattern for me this year, this book also focussed a lot on the work of Claude Shannon and information theory. I won’t rehash what I talked about in the last review, but Johnson gives a compelling overview of why Shannon’s work is so helpful when thinking about how we get information, and how ‘additions to the total sum of human knowledge’ (to coin a phrase) are developed.</p>

			<p>One of the more interesting things that comes up a lot in this kind of reading for me is the idea that uncommon things are actually pretty common. Bayes’ Theorem tells us this by explaining how priors can affect the odds of something dramatically, and in ways that are counterintuitive. A great example of this is the birthday problem. That problem says that with 75 people, you approach (but don’t quite reach) 100% likelihood that there is a shared birthday among them. This goes against our basic intuition about the question, right? But the certainty comes from the number of pairs involved. We can get this by taking n people and asking how many ways they can be paired. The answer is that we have n × (n - 1) ÷ 2 possible pairs. For 75 people, this is 2,775. The chance that at least one of those pairs shares a calendar day is astronomically high. All of this is to say, coincidences are more common than you think.</p>
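
			<p>The birthday problem is easy to poke at with a few lines of JavaScript. This is only a sketch: it assumes 365 equally likely birthdays (no leap years, no seasonal bumps), and the function names are mine rather than anything from the book.</p>

<pre><code>
```javascript
// Number of distinct pairs among n people: n(n - 1) / 2.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

// Chance that at least two of n people share a birthday,
// ASSUMING every one of daysInYear days is equally likely.
// It works out the chance that everyone is different, then subtracts from 1.
function chanceOfSharedBirthday(n, daysInYear = 365) {
  let chanceAllDifferent = 1;
  for (let k = n - 1; k >= 0; k -= 1) {
    // One factor per person: the share of days not already taken by k others.
    chanceAllDifferent *= (daysInYear - k) / daysInYear;
  }
  return 1 - chanceAllDifferent;
}

console.log(pairCount(75)); // 2775
console.log(chanceOfSharedBirthday(75)); // very close to, but not quite, 1
console.log(chanceOfSharedBirthday(23)); // already just over a half
```
</code></pre>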

			<p>This takes us nicely into the more unusual part of Johnson’s book and why I wanted to read it. His commentary on Covid during the pandemic was invaluable to me in staying sane during an insane time. I think of myself as quite maths-y generally but the numbers involved were so immense and so hard to put into context that I found his updates extremely valuable. A good example of this is his section here on comparisons between different countries on various Covid metrics. It seems obvious in hindsight that comparing raw case numbers is meaningless. Putting to one side different testing capacity leading to higher or lower numbers based primarily on health service maturity, there were also variations in days of the week. In the UK, Monday’s figures were typically lower and Tuesday’s were typically higher because fewer tests were processed at weekends. This holds true for most countries - until you realise that not every country agrees on when the weekend is, or has the same 5-2 week structure as western countries.</p>

			<p>An example I think about a lot but have real trouble explaining to people is the concept of exponential growth, and in particular doubling times, in the context of the pandemic. Many people during the pandemic were extremely in favour of border closures. I was sceptical of these (and not for the probably expected Liberal Metropolitan Elite Woke Nonsense reason that people may attribute to someone of my background and politics). If you were a country like New Zealand, which could meaningfully prevent ‘seeding events’ with border closures, then they could make sense. If you are a country, like the UK, which is a hub for global travel, they are not only pointless, but can be actively harmful. There is an opportunity cost. Everyone deployed manning borders, guarding hotels, and administering daily tests is someone who isn’t in the community doing vaccines, or community tests. All the money you spend on infrastructure is money not being spent on PPE, or hospital beds. But the sheer maths of it is just overwhelmingly against border closures. Let’s imagine that instead of people and Covid, you have a pond that has some algae on it. There is a patch of algae at either end of the pond, each patch doubles in size every day, and it takes 10 days for the algae to cover the pond. If you instead only had algae at one end, it would cover the pond in 11 days. You may have thought 20. I did, and people often do. But because we are talking about doubling times, halving the starting amount only buys you one extra day: with a single patch, half the pond is covered on the 10th day. It doubles on day 11 and covers the whole surface. If you investigate a little further, the reasons for decisions become straightforward. That doesn’t mean every decision was correct, but often there is a reason. And it’s helpful to look for it and understand it, even if in retrospect it was wrong, or at the time seems harmful.</p>
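
			<p>The pond is worth running through by hand, or in a few lines of JavaScript. This is a sketch under my own assumptions: the patches start out the same size, everything doubles once a day, and the starting size is a made-up number picked so that two patches take exactly 10 days. On those assumptions, starting with half the algae costs it one doubling, nothing more.</p>

<pre><code>
```javascript
// Days for algae that doubles once a day to cover the whole pond.
// startingPatches: how many equal-sized patches exist on day 0
// patchShare:      share of the pond one patch covers on day 0
//                  (HYPOTHETICAL value, see below)
function daysToCover(startingPatches, patchShare) {
  let covered = startingPatches * patchShare;
  if (!(covered > 0)) return Infinity; // no algae, no cover
  let days = 0;
  while (1 > covered) {
    covered *= 2; // one day of doubling
    days += 1;
  }
  return days;
}

// Chosen so that two patches together start at 1/1024 of the pond.
const patchShare = 1 / 2 ** 11;

console.log(daysToCover(2, patchShare)); // 10
console.log(daysToCover(1, patchShare)); // 11
```
</code></pre>

			<p>Swap the 1 for a 4 and you get 9 days. Doubling or halving the number of seeds only ever moves the finish line by a single day.</p>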

			<p>The basic conclusion that Johnson seeks, I think compellingly, to draw, is that maths isn’t as hard as you think it is. Lots of it is, but more often than not the simple stuff is what’s useful. With some rough estimates and basic arithmetic, you can get 90% of the way there. I think that’s a good thing to think about. 90% is really good and gives you a starting point. And if nothing else, it makes you a bit better at pub quiz questions like “How many post boxes are there in Britain?”</p>
		</div>

    </div>
</template>


<script>

	export default {
		name: "NumbercrunchArticle"
	}

</script>


<style>

</style>