Thanks to a quite a bit of luck I won this years version of the Underhanded Crypto Contest focused on crypto currencies. The submission consists of a writeup and – of course – backdoored code.
Data-Driven De-Anonymization in Bitcoin
Abstract
We analyse the performance of several clustering algorithms in the digital peer- to-peer currency Bitcoin. Clustering in Bitcoin refers to the task of finding addresses that belongs to the same wallet as a given address. In order to assess the effectiveness of clustering strategies we exploit a vulner- ability in the implementation of Connection Bloom Filtering to capture ground truth data about 37,585 Bitcoin wallets and the addresses they own. In addition to well-known clustering techniques, we introduce two new strategies, apply them on addresses of the collected wallets and evaluate precision and recall using the ground truth. Due to the nature of the Connection Bloom Filtering vulnerability the data we collect is not without errors. We present a method to correct the performance metrics in the presence of such inaccuracies. Our results demonstrate that even modern wallet software can not protect its users properly. Even with the most basic clustering technique known as multi- input heuristic, an adversary can guess on average 68.59% addresses of a victim. We show that this metric can be further improved by combining several more sophisticated heuristics.
A Validation-cost Metric for Bitcoin
This is the transcript of a talk I gave at the Scaling Bitcoin Conference 2015 in Hong Kong. See this mailing list post by Greg Maxwell for a general summary of the scaling measures that Bitcoin Core developers are adopting. The slides accompanying the transcript can be found here.
Exploiting CSGOJackpot’s Weak RNG
CSGOJackpot is a gambling website where players bet and win Counter Strike Go ‘skins’ (weapon textures).
Because these items can only be found by playing a lot of CSGo, they are quite rare and valuable,
and can be exchanged for example in Steam’s own Marketplace.
What is fascinating about CSGOJackpot and initially captured my attention is the sheer amount
of value that is gambled away. On average, more than 20,000$ are thrown into the pots per hour.
TL;DR: CSGOJackpot is a node.js app that uses Math.random() to determine the winning ticket. Of course, it’s not cryptographically secure and trivial to predict the next number given two outputs of the random number generator. I did not try to profit from this vulnerability but for the lulz I set up a twitch stream and revealed the next winning percentage in exchange for a drawing of Gabe Newell. See the submission gallery and a recording of the stream.
EDIT: There was quite some discussion about this issue on /r/GlobalOffensive.
EDIT 2: This vulnerability does not exist anymore in CSGOJackpot and I don’t know a similar site which is vulnerable.
Can Miners Exploit a Block Size Increase?
We don’t know yet. But modelling is in my opinion a useful tool to investigate potential effects. Therefore, I used Gavin Andresen’s mining simulator to create some more or less plausible scenarios. You can find the resulting plots on github.
Fuzzing Bitcoin Consensus
TLDR I ran afl-fuzz against libbitcoinconsensus to discover interesting Bitcoin scripts and used them to search for Bitcoin reimplementations vulnerable to forking. This discovered two bugs in btcd by Conformal. See the bitcoinconsensus_testcases repository for the discovered Bitcoin scripts.
Ethereum Bug Bounty Submissions
I found two vulnerabilities in the crypto currency Ethereum: negative transactions and predictable ECDSA nonce.
As part of the bug bounty program I was awarded with 20 Bitcoin.
Bitcoin Seeder DoS Vulnerability
The DNS parser of the Bitcoin Seeder was vulnerable to a denial of service attack. A specially crafted DNS request could trigger infinite recursive function calls that lead to a stack overflow. See the exploit. The vulnerability was fixed in commit 11e935b.
Guessing Bitcoin’s P2P Connections
The paper Deanonymisation of clients in Bitcoin P2P network (2014) by Biryukov, Khovratovich and Pustogarov (BKP), who describe an attack on Bitcoin Core clients, has started some discussion lately. The main idea of the paper is to first get a set of nodes $E_v$ to which your victim $v$ is directly connected to (“entry nodes”). Second, for each transaction $t$ record the $10$ nodes $P_t$ which first propagate $t$. The authors experimentally show that if $|E_v \cap P_t|$ is bigger than $3$ then there is a pretty high chance that $t$ actually originated from $v$. However, both attack stages basically require a Sybil attack – the attacker has to be connected to a lot of nodes a lot of times. ‘A lot’ means that in their experiments they had 50 connection to each full node (~250) in the test network. As a result, such an attack seems to be powerful, but certainly won’t be undetected.
In this post I show that the first stage of the attack, namely learning the nodes a victim is directly connected to can be done with a single connection to the victim. In addition to BKP’s attack, knowing all outbound peers of a client could significantly increase the success probability of a double spend. Note that all experiments are based on Bitcoin Core 0.9.4, but 0.10.0 shows the same behavior.
TLDR The attacker can reliably guess all of the outbound connections of a victim by making a selection from the known addresses of a victim based on the timestamp of the addresses.
Update A fix has been merged to bitcoind. The timestamp is not updated anymore when receiving a message from a connected peer. Instead, it is only updated when the peer disconnects. The fix is released in bitcoin core 0.10.1.
Privacy in BitcoinJ
As part of my epic quest to apply supervised machine learning to the blockchain in order to discover transaction patterns, I reviewed various wallet implementations in the hope of finding privacy leaks.
tl;dr If you are using a wallet that is built upon BitcoinJ, such as Android Wallet, Multibit and Hive Wallet, you have almost zero wire privacy. An attacker who manages to connect to your wallet is easily able to figure out all addresses you control. This is not very likely to get fixed in the near future.
Update: Mike Hearn’s reply addresses additional problems and improvements. There was also accompanying discussion on reddit.