Bitcoin Simulator
December 20, 2019
Find the GitHub repository here.
As part of my experimentation into learning the Bitcoin P2P network topology, I thought it would be useful to be able to generate transaction data. What I ended up with was a simulator that is a bit more general than how Bitcoin would actually work, and in some ways more random. Hopefully this will let us test how well our techniques will work when applied to other P2P networks and cryptocurrencies.
We model a transaction passing through the network as an infection cascade. One node starts out with the transaction, introducing it to the system, passing it to its peers who pass it to theirs and so on until the network is completely flooded with the transaction. We take a delay measurement that is based on their reception time, with an added component for the nodes to send it to our monitor node. The way that nodes send on the transaction has a number of random effects that influence it.
We have $n$ nodes and $N$ links between those nodes. We will include randomness for each node $v_i$ through a relay delay $\Lambda_i\sim\text{Exponential}(\lambda_i)$ and processing delay $\Pi_i\sim\text{Uniform}(a,\pi_i)$, where the parameters are drawn with $\lambda_i\sim\text{Gamma}(\alpha,\beta)$ and $\pi_i\sim\text{Pareto}(\gamma,\delta)$. Each $\lambda_i$ and $\pi_i$ are drawn once for each node at the start of the simulation. Note that the relay delay in the real Bitcoin network, the delay distribution for each node is drawn from a common distribution.
When a node $v_i$ receives a transaction, it draws a relay delay for each of its peers $v_j$ from $\Lambda_i$. Each of those peers then adds a delay drawn from $\Pi_j$ before it can send the transaction onto its own peers. In practice, we start at the first node to generate delay times for each node, sort by time, move to the next node to generate more delay times taking the minimum delay time for each node as the new delay time, and so on until all nodes have been moved to. At the end we add $\Lambda_i$ to each delay time to give a reception time at the monitor node.
We may also want to model a situation in which a small number of nodes generate a disproportionate quantity of the data. For this we will draw nodes from a discretised truncated Pareto distribution. With $n$ nodes, the probability we pick the $i^{th}$ node is given by $$P(D=i|n) = \frac{1}{1-\left(\frac{1}{n+1}\right)^c}\left[\frac{1}{i^c}-\frac{1}{(i+1)^c}\right]$$ where for $d_n$ proportion of nodes accounting for $d_s$ proportion of transaction sources, $c$ is calculated numerically as the solution to $$d_s = P(D\le d_n\cdot n|n) = \frac{1}{1-\left(\frac{1}{n+1}\right)^c}\left[1-\frac{1}{(d_n\cdot n + 1)^c}\right].$$
So the simulator can be set up by specifying the parameters $a,\alpha,\beta,\gamma,\delta,d_n,d_s)$ along with the number of nodes, edges and transactions.