A look at the Bitcoin P2P network topology

December 12, 2019

The original Bitcoin whitepaper states that the point of a blockchain is to solve the double-spend problem for a digital currency running on a P2P network.

The double-spend problem

For a physical currency, we can track who has money by who physically holds it. They can only spend a coin once because they have to hand it over to someone. But how do we deal with this online? If I have 20 bucks, and I send you 20 bucks, that’s fine. But what if I also send the same 20 bucks to Bob, or to anyone else for that matter? Which transaction, if any, is valid? This is known as the double-spend problem.

A centralised authority solution

Usually, we have a bank that records our balance. When we spend money the balance goes down, and the bank refuses any payment that the balance can’t cover. This prevents anyone from spending more money than they have, solving the double-spend problem. In practice, if Alice wants to send money to Bob, Alice tells her bank to transfer money to Bob, her bank tells Bob’s bank, and both banks adjust the balances of the accounts involved. Note that this system requires both Alice and Bob to trust both Alice’s bank and Bob’s bank, and the banks to trust each other. There may also be fees associated with this service.
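
To make the bank’s role concrete, here is a minimal sketch of a central ledger in Python. The `Bank` class and its `transfer` method are hypothetical names made up for illustration, and a single bank stands in for the two-bank exchange described above. The point is only that one trusted record of balances is enough to reject the second spend.

```python
class Bank:
    """Hypothetical central ledger: one trusted party records every balance."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, sender, recipient, amount):
        """Move money if the sender can cover it; return True on success."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False  # rejected: the sender does not have this money
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


bank = Bank({"Alice": 20})
first = bank.transfer("Alice", "Bob", 20)     # Alice spends her 20
second = bank.transfer("Alice", "Carol", 20)  # Alice tries to spend it again
print(first, second, bank.balances)
```

The second transfer fails because the ledger already shows Alice’s balance as zero. Everything depends on everyone trusting that one ledger.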

A decentralised solution

But what if we don’t want to introduce trust into the system? We don’t want to rely on any specific third party, but if someone sends you money, how can you trust that the money hasn’t been spent before? The blockchain introduced with Bitcoin, based on proof-of-work, was the first successful solution to this problem. It requires users to have computers that communicate directly with other computers on the network, rather than going through any trusted third party. This is called a peer-to-peer (P2P) network. There are countless resources out there for understanding how Bitcoin works, but a thesis by Itsi Weinstock and this video by Grant Sanderson are excellent ones.

Bitcoin architecture

In the Bitcoin P2P network, your computer isn’t directly linked to every other computer, only to a small group of them. Think of it as a social network where each computer, or node, is connected only to its friends, or peers, which are the only nodes it communicates with directly. There are around 10,000 nodes on the Bitcoin network. According to a snapshot in the seminal Coinscope paper, published in 2015, each node is connected to 8-12 other nodes on average. That paper was the first to study the Bitcoin P2P network topology, that is, which nodes are connected to which.
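
As a rough picture of what “topology” means as a data structure, the sketch below stores each node’s peers in a dictionary of sets and computes the average number of connections. The function name `build_peer_graph`, the network size of 200 and the choice of 8 outgoing connections per node are all assumptions for illustration. Picking peers uniformly at random is a simplification, not a claim about how the real network is wired.

```python
import random


def build_peer_graph(num_nodes, outbound, rng):
    """Toy model: every node opens `outbound` links to randomly chosen peers."""
    peers = {node: set() for node in range(num_nodes)}
    for node in range(num_nodes):
        others = [n for n in range(num_nodes) if n != node]
        for peer in rng.sample(others, outbound):
            # Links are two-way: each side lists the other as a peer.
            peers[node].add(peer)
            peers[peer].add(node)
    return peers


rng = random.Random(1)  # fixed seed so the toy network is reproducible
peers = build_peer_graph(num_nodes=200, outbound=8, rng=rng)
average_degree = sum(len(p) for p in peers.values()) / len(peers)
print(f"average connections per node: {average_degree:.1f}")
```

Each node ends up with at least its 8 outgoing links plus whatever incoming links other nodes happened to open to it, so the average sits somewhere between 8 and 16 in this toy model.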

Built into Bitcoin is a notion of anonymity. When transactions are sent through the network, they refer to money moving between Bitcoin addresses, or wallets. However, when nodes refer to other nodes, they do so through their IP addresses. The idea is that by keeping your IP address and your Bitcoin addresses separate, it’ll be hard to track which wallets belong to which computers. So Bitcoin is ‘anonymous’ because you don’t know which computers are sending and receiving money, only which wallets, and wallets are abstract and not necessarily linked to the real world. Deanonymising Bitcoin is then about correlating Bitcoin addresses with IP addresses, so we know which computers send which transactions.

Finding the topology

In 2014, it was discovered that knowledge of the P2P network topology could be used to deanonymise Bitcoin. This triggered an arms race between academic researchers on one side, developing techniques to find the topology, and Bitcoin Core developers on the other, patching those techniques out of effectiveness. This has been going on for some years with various papers (here, here, here, here – by far the funniest method – and here) which can be seen summarised in my thesis.

Interestingly, they all focus on how transactions propagate throughout the network. I found three useful ways to classify them. They either actively send new transactions into the network, or passively receive transactions already in the network. They either focus on specific nodes to learn their connections, or try to infer the whole network at once. And they either exploit idiosyncrasies of the Bitcoin protocol, or employ a statistical framework.

Fortunately, the first paper gave a snapshot of the network in 2015 that let us challenge some assumptions. Notably, the authors found that the network did not resemble a “random graph” (there is some notion of purposeful structure), and that around 2% of all nodes generated over 75% of all the blocks.

Looking at the papers, what really needs to be developed is an approach that is both passive and statistical: one that infers the topology purely from timing measurements of transactions already on the network. That way no one needs to spend money on transactions, and the technique is far harder to patch out.
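
To give a flavour of what timing-based inference could look like, here is a toy simulation in Python. It is not the method from any of the papers. The six-node topology, the delay model (a fixed 0.1 plus exponential jitter per hop) and the function names are all assumptions, and the observer is idealised: it sees the exact time every node first receives each transaction. The heuristic relies on one simple fact: if every hop takes a strictly positive time, the second node to see a transaction must be a direct peer of the node that created it.

```python
import heapq
import random

# Hypothetical toy topology: undirected peer links between six nodes.
TRUE_EDGES = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)}


def neighbours(edges):
    """Turn a set of undirected links into a node -> peers mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def simulate_arrivals(adj, origin, rng):
    """Return the time each node first sees a transaction created at origin."""
    arrival = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > arrival[node]:
            continue  # stale entry: this node already heard the news earlier
        for peer in adj[node]:
            # Strictly positive hop delay: fixed latency plus random jitter.
            t_peer = t + 0.1 + rng.expovariate(1.0)
            if t_peer < arrival.get(peer, float("inf")):
                arrival[peer] = t_peer
                heapq.heappush(queue, (t_peer, peer))
    return arrival


def infer_edges(observations):
    """Link the earliest node (assumed origin) to the second-earliest node."""
    inferred = set()
    for arrival in observations:
        ordered = sorted(arrival, key=arrival.get)
        origin, first_follower = ordered[0], ordered[1]
        inferred.add(tuple(sorted((origin, first_follower))))
    return inferred


rng = random.Random(0)  # fixed seed so the run is reproducible
adj = neighbours(TRUE_EDGES)
observations = [
    simulate_arrivals(adj, rng.choice(sorted(adj)), rng) for _ in range(500)
]
inferred = infer_edges(observations)
print(f"recovered {len(inferred)} of {len(TRUE_EDGES)} links: {sorted(inferred)}")
```

Under these idealised assumptions, every link the heuristic reports is a real one, and observing more transactions fills in more of the graph. A real observer only sees noisy timestamps from its own connections, which is where a proper statistical framework would have to do the work.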