At Breaking Bitcoin 2019 in Amsterdam I gave a talk about how to build secure protocols on bip-taproot, or more specifically, how to avoid the dangers we have learned about so far. There was not enough time to cover everything. The talk also gives an introduction to using our MuSig implementation in libsecp256k1-zkp. The video recording is on YouTube (slides). Thanks to kanzure there’s also a transcript of the talk.
Erratum: MuSig nonces cannot be pre-shared, only nonce commitments can. See https://github.com/ElementsProject/secp256k1zkp/pull/73 for details.
nix-bitcoin (github.com/fortnix/nixbitcoin) is a project I contribute to in my spare time that provides Nix packages and NixOS modules for easily installing Bitcoin nodes and higher-layer protocols. The initial idea was to build myself a Lightning node in a reproducible way. I talked more about the motivation and how to use it at the LightningHackdayMUC (video, slides).
Last weekend a bunch of hackers assembled for the 3rd Lightning Network Hackday in Berlin. The event was packed with interesting sessions, neat hacks and exciting discussions, which were concluded with the traditional dinner & drinks at ROOM77. I gave a talk about “Schnorr and Taproot in Lightning” (slides, video) focusing on privacy and security implications.
At the recent Building on Bitcoin conference in Lisbon I gave a talk about a few new ideas in the scriptless scripts framework. The first part was mainly about blind coinswaps, which are a way to swap bitcoins with a tumbler without revealing which coins are swapped. The second part was about how to exchange ecash tokens peer-to-peer using scriptless scripts and Brands credentials. You can find the talk on YouTube and the slides here. Thanks to kanzure there’s also a transcript of the talk.
EDIT: I’ve added a note about the security of Blind Schnorr signatures against forgery to the slides.
In short, a naive implementation of the scheme is vulnerable to Wagner’s attack.
An attacker can forge a signature using 65536 parallel signing sessions and O(2^32)
work.
There are already a few good explanations of the bug, for example at the Monero StackExchange and the modern crypto mailing list. This article gives additional background about the signature scheme and the properties of the curve that allowed this bug to slip in.
Apart from the value at stake, this bug is interesting because it shows the risks of breaking a specialized cryptosystem such as Ed25519 apart and applying the parts in other contexts. Ed25519 is designed for plain cryptographic signatures, and the curve it is based on is used in CryptoNote to implement one-time ring signatures. In contrast to a regular signature scheme, one-time ring signatures using the curve require that part of the signature does not generate a small subgroup. Ensuring this is necessary when using curves with a cofactor. CryptoNote did not do that.
A ring signature proves that the signer is among a set of public keys (aka “the ring”), without revealing which public key belongs to the signer. The construction used in CryptoNote/Monero and also for example in Confidential Transactions is based on hash rings.
For simplicity, assume for now that the ring consists of only one key, which essentially reduces the scheme to a Schnorr signature.
As usual, G is a generator of a cyclic group in which the discrete logarithm is hard, and we’re using additive notation for group operations.
Then the signature scheme consists of the following three algorithms (keygen, sign, verify):
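A minimal sketch of such a scheme in Python. It is an illustration under assumptions: the group is secp256k1, the hash is SHA-256 over the message and the point, and the function names are made up for this example. The signature is the pair (e, s), so that verification recomputes k*G as s*G + e*P.

```python
import hashlib
import secrets

# secp256k1 domain parameters (prime-order curve, cofactor 1).
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two affine points; None represents the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, A):
    """Scalar multiplication k*A by double-and-add (not constant time)."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def H(m, R):
    """Hash a message and a point to a scalar (assumed encoding)."""
    data = m + R[0].to_bytes(32, "big") + R[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def keygen():
    x = secrets.randbelow(n - 1) + 1      # private key
    return x, mul(x, G)                   # (x, P = x*G)

def sign(x, m):
    k = secrets.randbelow(n - 1) + 1      # fresh nonce
    e = H(m, mul(k, G))
    s = (k - e * x) % n
    return e, s

def verify(P, m, sig):
    e, s = sig
    R = add(mul(s, G), mul(e, P))         # equals k*G for an honest signature
    return R is not None and e == H(m, R)
```

For example, `x, P = keygen()` followed by `verify(P, b"msg", sign(x, b"msg"))` should return `True`, while changing the message makes verification fail.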

Let’s get a basic informal understanding of why such Schnorr-based schemes work by taking the perspective of Eve, who does not know the discrete logarithm x of P.
Obviously, Eve would invalidate the signature when attempting to just change the message m.
Further, when trying to fake a signature without knowing x, Eve cannot just set s as in the regular signing algorithm.
But to make a signature that passes verification for some public key P, Eve must find s such that k*G = s*G + e*P.
We can rearrange that in the following way:
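Written out as a short derivation (assuming Eve knows the nonce k she picked and that e is invertible modulo the group order):

```latex
\begin{align*}
k \cdot G &= s \cdot G + e \cdot P \\
e \cdot P &= (k - s) \cdot G \\
P &= e^{-1} (k - s) \cdot G
\end{align*}
```

So the discrete logarithm of P would be x = e^{-1}(k - s).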

That means that if she found such an s, she could compute the discrete logarithm of P.
This contradicts the assumption that Eve does not know x.
Note that during verification the output of the hash function e
is also part of the input to the hash function.
How about during signing Eve chooses s
at random and then simply hashes s*G + e*P
?
The problem is that the properties of a cryptographic hash function prevent Eve from knowing e before evaluating the hash function.
So e cannot be fed into the hash function, and as a consequence s must be chosen to account for e only after hashing.
Rings of size one naturally don’t make a lot of sense but are sufficient for this post’s purpose. The curious can for example have a look at the explanation in the Borromean signature paper (section 2.2).
One-time ring signatures are used in CryptoNote to allow combining the privacy properties of ring signatures with a mechanism to detect double spending.
This is done by introducing the concept of a “key image”.
The key image is a group element that is deterministically derived from a key but in itself doesn’t reveal anything about the key.
Define hashp
to be a hash function that hashes to an element in the group.
Then the key image I
for the key pair (x, P=x*G)
is I = x*hashp(P)
so P
and I
have the same discrete logarithm.
A one-time ring signature includes the key image belonging to the signer.
The CryptoNote protocol allows using ring signatures when spending coins by enforcing that each key image can occur only once in the blockchain.
Let’s for example assume there are two unspent coins – in our case just represented by public keys P1
and P2
.
Alice knows the private key to P1, so she can spend the coin by providing a one-time ring signature with P1, P2 and the key image I corresponding to P1.
An observer cannot tell whether P1 or P2 was spent.
But if Alice attempted to spend P1 again (even with a different ring), she would need the same key image, which is rejected by the network.
However, P2 can still be spent because the signature uses a different key image.
Now the concrete one-time ring signature scheme, again shown only for rings of size 1:
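A sketch of such a scheme for a ring of size 1, in Python. Everything concrete here is an assumption for illustration: the curve is secp256k1 (not Ed25519), hashp is a simple try-and-increment hash to the curve, and the hash input encoding is made up. The structure follows the description in the text: the same s proves knowledge of x for both P = x*G and I = x*hashp(P).

```python
import hashlib
import secrets

p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two affine points; None represents the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, A):
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def enc(Q):
    return Q[0].to_bytes(32, "big") + Q[1].to_bytes(32, "big")

def hashp(P):
    """Hash a point to a curve point with unknown discrete log (try-and-increment)."""
    ctr = 0
    while True:
        digest = hashlib.sha256(enc(P) + ctr.to_bytes(4, "big")).digest()
        x = int.from_bytes(digest, "big") % p
        y2 = (x * x * x + 7) % p
        y = pow(y2, (p + 1) // 4, p)      # square root candidate, p = 3 mod 4
        if y * y % p == y2:
            return (x, y)
        ctr += 1

def H(m, R1, R2):
    return int.from_bytes(hashlib.sha256(m + enc(R1) + enc(R2)).digest(), "big") % n

def keygen():
    x = secrets.randbelow(n - 1) + 1
    P = mul(x, G)
    return x, P, mul(x, hashp(P))         # key image I = x*hashp(P)

def sign(x, P, m):
    k = secrets.randbelow(n - 1) + 1
    e = H(m, mul(k, G), mul(k, hashp(P)))
    return e, (k - e * x) % n

def verify(P, I, m, sig):
    e, s = sig
    R1 = add(mul(s, G), mul(e, P))
    R2 = add(mul(s, hashp(P)), mul(e, I))
    return R1 is not None and R2 is not None and e == H(m, R1, R2)
```

A verifier would additionally reject any key image that has already appeared; that bookkeeping is left out of the sketch.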

Intuitively, it’s very hard to create a signature where I != x*hashp(P)
because the same s
that is used to prove knowledge of the discrete logarithm x
of P
is also used for I
.
The one-time ring signature scheme for rings larger than 1 is described in the CryptoNote whitepaper (section 4.4), although in a less space-efficient way. The construction is also related to the proof of discrete log equality used in perfectly binding Confidential Transactions. Note that by now a more general scheme based on one-time ring signatures, called Ring Confidential Transactions, has replaced regular one-time ring signatures in Monero.
As mentioned in the beginning, CryptoNote uses Ed25519’s curve (also referred to simply as “Ed25519”) to represent its group elements.
One of Ed25519’s properties is that the number of points on the curve (curve order) is larger than the number of points in the group generated by the base point G
(group order).
The group order is the prime l = 2^252 + 27742317777372353535851937790883648493 and the curve order is 8*l.
The ratio of the curve order and the group order is known as the cofactor, which is 8 in the case of Ed25519.
This is, for example, different from the curve secp256k1 used in Bitcoin, which has cofactor 1.
The cofactor indicates that there are groups of low order on the curve.
For example, let P
be the point represented by 26e8958fc2b227b045c3f489f2ef98f0d5dfac05d3c63339b13802886d53fc05
then P
generates the (unique) group of order 8
which implies 8*P = 0
.
There are only a few points on the curve with low order.
One way to find them is to generate a random point on the curve, which has order a*l for some a dividing 8, and then multiply it by l to get a point of order a.
On the other hand, hashp
ensures that the resulting point is in the prime order group by multiplying it by 8
before outputting it.
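The low order of the point given above can be checked with a few lines of Python. This is a sketch with plain affine Edwards arithmetic; the helper names are made up, and the decoding follows the usual Ed25519 convention (little-endian y with the sign of x in the top bit), which is an assumption about how the hex string is meant to be read.

```python
p = 2**255 - 19
d = -121665 * pow(121666, -1, p) % p
SQRT_M1 = pow(2, (p - 1) // 4, p)         # a square root of -1 mod p
IDENTITY = (0, 1)

def add(A, B):
    """Complete addition law on -x^2 + y^2 = 1 + d*x^2*y^2."""
    (x1, y1), (x2, y2) = A, B
    t = d * x1 * x2 * y1 * y2 % p
    x3 = (x1 * y2 + x2 * y1) * pow((1 + t) % p, -1, p) % p
    y3 = (y1 * y2 + x1 * x2) * pow((1 - t) % p, -1, p) % p
    return (x3, y3)

def mul(k, A):
    R = IDENTITY
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def decode(hexstr):
    """Decode a 32-byte point encoding into affine (x, y)."""
    y = int.from_bytes(bytes.fromhex(hexstr), "little")
    sign = y >> 255
    y &= (1 << 255) - 1
    x2 = (y * y - 1) * pow(d * y * y + 1, -1, p) % p
    x = pow(x2, (p + 3) // 8, p)
    if (x * x - x2) % p != 0:
        x = x * SQRT_M1 % p
    if (x * x - x2) % p != 0:
        raise ValueError("not a point on the curve")
    if (x & 1) != sign:
        x = p - x
    return (x, y)

def low_order(A):
    """Return the order of A if it divides 8, otherwise None."""
    for o in (1, 2, 4, 8):
        if mul(o, A) == IDENTITY:
            return o
    return None

P = decode("26e8958fc2b227b045c3f489f2ef98f0d5dfac05d3c63339b13802886d53fc05")
print(low_order(P))
```

The same helper reports order 2 for the point (0, -1) and order 4 for (sqrt(-1), 0), two of the other low order points.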
In the case of the regular Ed25519 signature algorithm it doesn’t matter if a public key is of low order.
But Ed25519 implementations make sure that a private key is a multiple of 8 (“clamping”), so when multiplying it with a low order point then the result is always 0 (instead of leaking bits from the private key).
Therefore, after running the Diffie-Hellman protocol on such a curve, the key shared between the two parties can be 0 if one party doesn’t behave “contributory”.
Assume the attacker owns a coin and therefore can create a regular one-time ring signature to spend it with key image I = x*hashp(P).
The attacker can spend the coin again with the key image I' = I + L, where L is a low order point with order o.
Remember the verification equation includes s*hashp(P) + e*I
.
If o divides e (e = e'*o), then e*I' = e*I + e'*o*L = e*I, so a valid signature with I' can be created in the same way as with I (except that the message m now has to commit to I').
Since o is at most 8, it is easy to retry signing until there is a suitable hash e.
Interestingly, in ByteCoin this was exploited in a much less effective way.
The attacker used a low order public key P
requiring a low order key image I
.
Because there are only 8 low order points on the curve (multiple representations of the same point are disallowed in CryptoNote) this attack can be only performed 8 times in one blockchain.
The fix implemented in Monero is to verify that each key image I
generates a group of the prime order l
by checking that l*I = 0
.
If I
actually had order l' != l
then for l*I = 0
to hold, l
must be divisible by l'
which is a contradiction because l
is prime.
So the additional cost introduced by the fix is one scalar multiplication per transaction input.
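The check can be sketched in Python as follows. This is an illustration, not Monero's code: the arithmetic is naive affine Edwards arithmetic, the function names are made up, and the key image is simply some multiple of the base point. It shows that I' = I + L behaves like I when multiplied by a multiple of the order of L, but fails the l*I = 0 test.

```python
p = 2**255 - 19
d = -121665 * pow(121666, -1, p) % p
l = 2**252 + 27742317777372353535851937790883648493
SQRT_M1 = pow(2, (p - 1) // 4, p)
IDENTITY = (0, 1)

def add(A, B):
    (x1, y1), (x2, y2) = A, B
    t = d * x1 * x2 * y1 * y2 % p
    x3 = (x1 * y2 + x2 * y1) * pow((1 + t) % p, -1, p) % p
    y3 = (y1 * y2 + x1 * x2) * pow((1 - t) % p, -1, p) % p
    return (x3, y3)

def mul(k, A):
    R = IDENTITY
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def recover_x(y, sign):
    """Recover x from y for a point on the curve (standard Ed25519 method)."""
    x2 = (y * y - 1) * pow(d * y * y + 1, -1, p) % p
    x = pow(x2, (p + 3) // 8, p)
    if (x * x - x2) % p != 0:
        x = x * SQRT_M1 % p
    if (x & 1) != sign:
        x = p - x
    return x

def has_prime_order(I):
    """The fix: accept a key image only if l*I is the neutral element."""
    return mul(l, I) == IDENTITY

# Base point of Ed25519: y = 4/5, x even.
By = 4 * pow(5, -1, p) % p
B = (recover_x(By, 0), By)

I = mul(12345, B)             # stand-in for an honest key image
L = (0, p - 1)                # low order point of order 2
I_prime = add(I, L)

print(mul(8, I_prime) == mul(8, I))   # indistinguishable when 2 divides e
print(has_prime_order(I), has_prime_order(I_prime))
```

Note that the neutral element itself also satisfies l*I = 0, so a real implementation may want to treat it separately.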
This article gave yet another example of how cryptographic parts cannot be easily repurposed. In particular, when implementing more complex protocols based on curves with a cofactor (for example Ed25519 or Curve25519), the group order of user-supplied generator points should always be verified. Deciding case by case whether that’s necessary is quite dangerous in practice. There is, however, potential to get rid of this bug class for some cryptosystems by eliminating cofactors through point compression (see Decaf). Alternatively, when designing a new cryptosystem, one should consider using a prime order curve such as secp256k1.
I am a complete outsider to Monero and especially the Monero development community, but having reviewed the CT design and implementation (in libsecp256k1) extensively during my day job, I was very interested in the design decisions underlying RingCT. Very quickly I found a red flag in the ring signature scheme called ASNL used in the range proofs. This scheme is a new contribution by the paper and indeed turned out to be exploitable such that an attacker would be able to create coins from nothing. You can find the exploit code on GitHub and a detailed explanation in this post.
While writing the exploit code and preparing this blog post I learned that an anonymous person called RandomRun reported a flaw in the security proof of ASNL, which convinced the Monero devs to publish a bugfix release that switches to Borromean signatures (good call!). As a result the upcoming hard fork will not be vulnerable to this exploit. Interestingly, the error in the security proof is exactly the flipside of the vulnerability discussed in this post.
EDIT: The Monero community reacted to this article (see reddit) but they didn’t like its style. Also, they got the timeline of the discovery of the bug wrong.
I have the highest respect for RandomRun and parts of the Monero community. It takes an incredibly strong character to drop an 0day worth tens of millions USD. However, that the original hard fork schedule of RingCT remains unchanged despite a complete break of the system raises more than a few questions. Even more so when the author of RingCT called for more review by the end of October.
Confidential transactions include a range proof to prevent negative amounts.
These range proofs use a generalization of ring signatures in which
the conjunction of multiple rings is proven, for example that the prover knows the discrete logarithm of (Pk1 OR Pk2) AND (Pk1 OR Pk3) AND ...
The original CT scheme introduced Borromean signatures for that purpose which are based on rings of hashes and provide space savings when public keys appear more than once.
Instead, the RingCT paper proposes a new scheme called Aggregate Schnorr Non-linkable Ring Signature (ASNL) because it has “perhaps simpler security proofs” (RingCT paper).
An ASNL signature consists of tuples (P1_j, P2_j, L1_j, s2_j) for j = 1, ..., n and a scalar s, which is supposed to prove that the signer knows the DL of (P1_1 OR P2_1) AND ... AND (P1_n OR P2_n).
Let’s consider the n = 1
case (no conjunction) informally.
The verifier checks that L1 = s*G + H(s2*G + H(L1)P2)P1, where H is a hash function.
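So either the signer knows the DL x of P1, then sets L1 = a*G for a random a, picks s2 at random and computes s = a - H(s2*G + H(L1)P2)*x,
or the signer knows the DL x of P2, then sets L2 = a*G for a random a, picks s at random and computes L1 = s*G + H(L2)P1 and s2 = a - H(L1)*x.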
In the case of multiple conjunctions (n > 1), the verifier computes LHS = L1_1 + ... + L1_n and RHS = s*G + H(s2_1*G + H(L1_1)P2_1)P1_1 + ... + H(s2_n*G + H(L1_n)P2_n)P1_n and checks that LHS = RHS.
In short, this is vulnerable because you can just choose some L1_j
such that it cancels out the summand on the right hand side where both DLs of P1 and P2 are unknown.
In contrast, the “proof” of security of ASNL assumes that any adversary knows a such that a*G = L1_j for all j.
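The cancellation can be illustrated for n = 2 with the following Python sketch. It is my own reconstruction from the verification equation above, not the exploit code referenced in this post; the curve (secp256k1), the hash encoding and all function names are assumptions. The forger knows the DL of P1 in the first ring only and knows neither DL in the second ring.

```python
import hashlib
import secrets

p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def neg(A):
    return None if A is None else (A[0], (-A[1]) % p)

def mul(k, A):
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def H(Q):
    """Hash a point to a scalar (assumed encoding)."""
    data = Q[0].to_bytes(32, "big") + Q[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def summand(P1, P2, L1, s2):
    """H(s2*G + H(L1)P2)P1, the per-ring term on the right-hand side."""
    return mul(H(add(mul(s2, G), mul(H(L1), P2))), P1)

def verify(rings, s):
    """rings is a list of tuples (P1, P2, L1, s2)."""
    lhs, rhs = None, mul(s, G)
    for P1, P2, L1, s2 in rings:
        lhs = add(lhs, L1)
        rhs = add(rhs, summand(P1, P2, L1, s2))
    return lhs == rhs

def forge(x, P2_known, Q1, Q2):
    """Forge a 2-ring signature knowing only x, the DL of the first ring's P1."""
    # Second ring: no DL known. Pick L1 and s2 freely and compute its summand.
    L1_q = mul(secrets.randbelow(n - 1) + 1, G)
    s2_q = secrets.randbelow(n)
    T_q = summand(Q1, Q2, L1_q, s2_q)
    # First ring: choose L1 so that the second ring's terms cancel out.
    a = secrets.randbelow(n - 1) + 1
    L1 = add(mul(a, G), add(T_q, neg(L1_q)))
    s2 = secrets.randbelow(n)
    P1 = mul(x, G)
    c = H(add(mul(s2, G), mul(H(L1), P2_known)))
    s = (a - c * x) % n
    return [(P1, P2_known, L1, s2), (Q1, Q2, L1_q, s2_q)], s
```

With this choice both sides of the check equal a*G plus the second ring's summand, so verification passes although the assumption that the adversary knows a DL of every L1_j does not hold.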

Abstract
We analyse the performance of several clustering algorithms in the digital peer-to-peer currency Bitcoin. Clustering in Bitcoin refers to the task of finding addresses that belong to the same wallet as a given address. In order to assess the effectiveness of clustering strategies we exploit a vulnerability in the implementation of Connection Bloom Filtering to capture ground truth data about 37,585 Bitcoin wallets and the addresses they own. In addition to well-known clustering techniques, we introduce two new strategies, apply them to addresses of the collected wallets and evaluate precision and recall using the ground truth. Due to the nature of the Connection Bloom Filtering vulnerability, the data we collect is not without errors. We present a method to correct the performance metrics in the presence of such inaccuracies. Our results demonstrate that even modern wallet software cannot protect its users properly. Even with the most basic clustering technique, known as the multi-input heuristic, an adversary can guess on average 68.59% of the addresses of a victim. We show that this metric can be further improved by combining several more sophisticated heuristics.
As we’ve seen over the last two days, scalability is a multidimensional problem. One of the main topics of research is increasing the blocksize to increase transaction throughput. The assumption is that as technological progress continues and transaction throughput is increased accordingly, the cost of running a fully validating node stays constant.
However, blocksize proposals usually only directly change one aspect of full node costs: the blocksize. The actual costs for a node are composed of multiple factors, such as the resources required to validate a block or to store the utxos. And these factors are not necessarily in a linear relationship with each other. This has been discussed in more detail in Mark’s talk at the previous Scaling Bitcoin conference.
The most prominent example for showing nonlinear relationships consists of putting as many OP_CHECKSIG operations into a single transaction as possible. For each checksig operation, the whole transaction is hashed and a signature is verified. Assuming 1MB blocks, it is possible to create a block that takes more than 10 minutes to validate on my 2014 laptop. It is clear that each proposal that increases blocksize also needs a strategy to deal with these nonlinearities.
One of those strategies is to put a hard limit on the number of signature verifications and the number of bytes that are hashed for a block to be valid. We see some problems with this approach: First, as it stands there is no intuitive way to choose these limits nor how they grow with the blocksize. Second, there are other factors that influence validation cost, which might not be relevant now but could become significant in bigger blocks if not properly limited. For example, it is possible to create a 1MB block that takes 5 seconds to validate on my laptop, which just consists of as many HASH opcodes as possible. And third, placing hard limits on certain factors completely ignores the relationship between those factors.
These relationships exist because those factors influence validation cost in some way. This brings us to the concept of cost metrics.
The goal of the cost metric approach is to tie consensus rules to actual resource requirements. The idea is that cost of a block is a function of certain block properties. As an example, the block cost could be represented by a weighted sum of block size, validation cost and utxo growth.
When we have agreed on such a cost metric, we can get rid of the hard limits and instead introduce a new consensus rule that blocks need to cost less than a threshold to be valid.
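As a sketch, such a rule could look like this in Python. The weights, the threshold and the feature names are hypothetical and chosen only to make the example concrete; the talk does not fix any of them.

```python
from dataclasses import dataclass

@dataclass
class BlockFeatures:
    size_bytes: int          # serialized block size
    validation_ms: float     # validation cost on the reference machine
    utxo_growth_bytes: int   # net change of the utxo set size

# Hypothetical weights and threshold, for illustration only.
W_SIZE = 1.0
W_VALIDATION = 100.0
W_UTXO = 2.0
COST_THRESHOLD = 2_000_000.0

def block_cost(b: BlockFeatures) -> float:
    """Weighted sum of block size, validation cost and utxo growth."""
    return (W_SIZE * b.size_bytes
            + W_VALIDATION * b.validation_ms
            + W_UTXO * b.utxo_growth_bytes)

def is_valid(b: BlockFeatures) -> bool:
    """Consensus rule: a block must cost less than the threshold."""
    return block_cost(b) < COST_THRESHOLD

typical = BlockFeatures(size_bytes=1_000_000, validation_ms=100.0, utxo_growth_bytes=11_000)
heavy = BlockFeatures(size_bytes=1_000_000, validation_ms=12_000.0, utxo_growth_bytes=0)
print(is_valid(typical), is_valid(heavy))
```

With these made-up numbers a full block with ordinary validation time is accepted, while a full block that also takes 12 seconds to validate is rejected.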
One aspect of a full cost function is validation cost. We can view validation cost as the time it takes to validate a block on a reference machine. Then we can introduce a threshold saying that a block is not allowed to exceed 30 seconds validation time on a reference machine. In other words, we want to find a function from block features, like the number of bytes that are hashed for signature validation, to validation time on the reference machine. To do that, we assume a simple model function that states that the validation duration is a linear combination of block features, collect data about the actual validation duration on that machine and then fit the model to the data.
The one-dimensional situation is depicted on the right: there is one data point for each block, consisting of the number of bytes that were hashed and the time it took to validate. With this data it is possible to determine the effect, or coefficient, of hashing on validation time, which is represented as a line in the plot. This coefficient can then be used in a consensus rule.
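The one-dimensional fit can be sketched with ordinary least squares in plain Python. The data points below are made up so that the result is easy to check; they are not measurements from the reference machine.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical samples: kilobytes hashed per block and validation time in ms.
hashed_kb = [0, 1000, 2000, 4000]
validation_ms = [2.0, 7.0, 12.0, 22.0]

slope, intercept = fit_line(hashed_kb, validation_ms)
print(slope, intercept)
```

The slope is the coefficient of hashing on validation time. With several block features the same idea carries over to a multiple linear regression.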
MAYBE: If we assume that the resources involved grow at the same speed, this kind of metric can be naturally scaled by multiplying the whole equation with the inverse of the growth factor.
Validation cost is affected first and foremost by OP_CHECKSIG, that is signature verification and hashing the transaction. Bitcoin Core already limits the number of OP_CHECKSIGs but this is insufficient for our case because what counts are the number of OP_CHECKSIGs that are executed. We built on Gavin Andresen’s code to count those factors while validating transactions. We also record hashing via the OP_HASH opcodes, and how many bytes are written and removed from the stack. And the number of inputs which loosely corresponds to the number of lookups in the utxo set. And of course we also measured our dependent variable, the ConnectBlock duration on the reference machine.
As a reference machine we used my laptop, which has two 3 GHz i7 cores. To collect block feature data and the corresponding ConnectBlock duration, we reindexed the mainchain, the testchain and custom regtest chains, which for example consisted of hard-to-validate blocks. I found out that I could comfortably use the computer while using only 5GB of 8GB RAM, so I set the dbcache option to 3GB. dbcache determines how much data is cached in memory. We ran Bitcoin Core version 0.11.2 with libsecp validation and disabled checkpoints.
After estimating the coefficients using linear regression, we get useful information such as: for each kilobyte of hashing, validation takes 0.005 milliseconds longer, and for each signature verification it takes 0.1 milliseconds longer. Other features do not play a comparably significant role at the moment, even though it is possible to create a block that takes around 5 seconds to validate and only consists of hash opcodes.
The validation cost function fit is very accurate: for a random test selection of testnet and mainnet blocks we get an average absolute error of less than 4 ms. Most importantly, the estimated function is able to predict hard-to-validate blocks very accurately: the one tested example was a block that took 130.4 ms to validate, and 131.7 ms was predicted.
So now we have derived a validation cost metric that corresponds to validation time on a reference machine, and we can define a new consensus rule that would require a block to have a smaller validation cost than some threshold. After picking a threshold, there would be a situation like in this plot, where the x-axis is the block size, the y-axis is the validation time and the green area represents the space of valid blocks.
However, picking another threshold is difficult because there is no one-size-fits-all solution: (1) you don’t want to constrain potential use cases, but (2) you also don’t want to sum validation time and bandwidth worst cases.
On the other hand, we can try to relate bandwidth requirements and validation cost using a simple weighted sum for example and then pick a single threshold.
And this is exactly the idea behind the cost metric: find all factors affecting node cost and how exactly they influence it, and then pick a reasonable cost threshold. What this idea really entails is moving away from blocksize proposals to arguing about total node costs.
Now the question is how exactly you convert bandwidth requirements and validation time to cost. Does it make sense to trade off one second of network latency with one second of validation duration? How do we bring additional cost factors in, like utxo set size? How future-proof is that solution?
There is certainly no single correct answer to these questions. We can, however, show the advantages of a cost function while building on existing block size proposals. Most block size proposals consider average use at a specific maximum block size. So in terms of a cost threshold it would make a lot of sense to allow maximum sized blocks only in combination with average validation time. In this way we can prevent blocks that have both a worst-case size and worst-case validation time. We get the average validation duration for a specific block size using the data we collected earlier with the reference machine.
Also we set a hard validation cost limit of 10 seconds, which seems reasonable because the maximum validation time on the reference machine was 6 seconds. Then we linearly interpolate from the maximum validation time at half of the maximum blocksize to the average validation time at the maximum blocksize.
This shows an advantage of a cost metric: we constrain the worst case by bringing it closer to the average case, and still allow possible future use cases which require a lot of validation resources.
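The resulting limit can be written as a small piecewise linear function. This Python sketch assumes the shape described above (hard limit up to half the maximum block size, then a linear decrease to the average validation time at the maximum block size); the average of 500 ms used in the example is a placeholder, not a measured value.

```python
def max_validation_ms(block_size, max_size, avg_ms_at_max, hard_limit_ms=10_000.0):
    """Maximum allowed validation time for a block of the given size."""
    if not 0 <= block_size <= max_size:
        raise ValueError("block size out of range")
    half = max_size / 2
    if block_size <= half:
        return hard_limit_ms
    # Linear interpolation between (half, hard limit) and (max, average).
    frac = (block_size - half) / (max_size - half)
    return hard_limit_ms + frac * (avg_ms_at_max - hard_limit_ms)

MAX_SIZE = 1_000_000   # 1MB
AVG_MS = 500.0         # hypothetical average validation time at 1MB

for size in (250_000, 500_000, 750_000, 1_000_000):
    print(size, max_validation_ms(size, MAX_SIZE, AVG_MS))
```

A full-size block is then only valid with at most average validation time, while a half-size block may use the whole 10 seconds.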
So far, the cost of maintaining the utxos has not played a role in Bitcoin. In fact, with a 1MB block, the worst-case utxo set size increase is almost 1MB, whereas the average over the past year is an increase of around 11 kilobytes. Finding a reasonable place in the cost function is even more complicated than for validation and bandwidth resources, in part because these are long-term costs. The current situation with Bitcoin is that there is no incentive to avoid increasing the utxo set size if possible. This can be as simple as moving bytes from the scriptSig to the scriptPubKey. What we can do with the cost function is to place a slight incentive on including transactions that reduce the utxo set size and thereby make them cheaper. The proposed way to do this is to allow larger validation costs when the block reduces the utxo set size. This aligns well with the fact that blocks that sweep a lot of utxos have rather extreme validation costs due to the high ratio of inputs to outputs, and we want these blocks to be valid because they are extremely beneficial.
In order to determine a specific function one can compute the maximum possible decrease of utxo set size for a block of maximum size. Then linearly interpolate such that for each byte the utxo set is reduced the maximum allowed validation costs are increased until we reach let’s say half of the remaining validation cost. This rule does not give the utxo set size the prominent place in the cost function it would deserve but at least moves incentives in the right direction.
This cost function can trivially grow with the blocksize, by multiplying the validation cost limit and the average validation cost with the same scaling factor. So if the blocksize is doubled, then the block size at which the maximum validation cost applies, the maximum validation cost and the average validation cost are doubled as well.
This situation is shown in the plot for 1MB, 2MB and 4MB maximum block sizes.
It ensures that the worst case validation time scales as fast as the block size, which is an implicit assumption underlying many blocksize proposals. Also it guarantees that average blocks are always allowed to have the maximum block size.
In conclusion, Bitcoin places various resource requirements on full nodes. And it is essential that blocksize proposals account at least for the most important ones, or extreme worst cases are . A cost metric helps with that because it sets the requirements in relation to each other.
We’ve seen that estimating a function for validation cost only is straightforward when assuming a reference machine, collecting data and fitting a linear function.
A more complete cost function that includes bandwidth, validation and utxo requirements is difficult to derive from the bottom up. But as we showed, we can build on existing blocksize proposals to get some of the advantages of a cost metric, such as
* confining the worst case while
* allowing to trade off various block aspects
* and setting the right incentives.
CSGOJackpot is a gambling website where players bet and win Counter Strike Go ‘skins’ (weapon textures).
Because these items can only be found by playing a lot of CSGo, they are quite rare and valuable,
and can be exchanged for example in Steam’s own Marketplace.
What is fascinating about CSGOJackpot and initially captured my attention is the sheer amount
of value that is gambled away. On average, more than 20,000$ are thrown into the pots per hour.
TL;DR: CSGOJackpot is a node.js app that uses Math.random() to determine the winning ticket. Of course, it’s not cryptographically secure and trivial to predict the next number given two outputs of the random number generator. I did not try to profit from this vulnerability but for the lulz I set up a twitch stream and revealed the next winning percentage in exchange for a drawing of Gabe Newell. See the submission gallery and a recording of the stream.
EDIT: There was quite some discussion about this issue on /r/GlobalOffensive.
EDIT 2: This vulnerability does not exist anymore in CSGOJackpot and I don’t know a similar site which is vulnerable.
CSGOJackpot works like this:
In addition to guessing the winning percentage, an attacker has to know the total number of tickets to be sure to win the pot. So, he has to try to place the last bet which can be tricky and is very difficult during times of high traffic because of huge lags.
The HTML showed some signs of node.js, so my hypothesis was that the site simply uses javascript’s Math.random() to determine the winning percentage. Fortunately, the full winning percentage with up to 16 digits is published after the end of a round, which is exactly the amount of digits I got when I executed Math.random() on my machine. Node.js uses the V8 javascript engine and its implementation of Math.random() (nodejs 0.12.X) is as follows:
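A Python port of that generator, as a sketch. The multipliers 18273 and 36969 and the way the two 32-bit halves are combined are written down from my recollection of the V8 source of that period, so treat them as assumptions; the class and function names are made up.

```python
MASK32 = 0xFFFFFFFF

class MWC:
    """Two 32-bit multiply-with-carry generators combined into one output."""

    def __init__(self, s0, s1):
        self.s0 = s0 & MASK32
        self.s1 = s1 & MASK32

    def random(self):
        self.s0 = (18273 * (self.s0 & 0xFFFF) + (self.s0 >> 16)) & MASK32
        self.s1 = (36969 * (self.s1 & 0xFFFF) + (self.s1 >> 16)) & MASK32
        # Low 16 bits of s0 become the high half, low 16 bits of s1 the low half.
        x = ((self.s0 << 16) + (self.s1 & 0xFFFF)) & MASK32
        return x * 2.3283064365386962890625e-10   # x / 2**32

def leaked_bits(output):
    """Recover the low 16 bits of both state words from a single output."""
    x = int(output * 2**32)
    return x >> 16, x & 0xFFFF

rng = MWC(0x12345678, 0x9ABCDEF0)
out = rng.random()
print(out, leaked_bits(out))
```

The helper `leaked_bits` shows why half of the 64-bit state is revealed by every published number: the output is just the concatenation of the lower halves of the two state words.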

This is known as Marsaglia’s Multiply-with-Carry. Note that the implementation used in nodejs 0.10.X uses a very similar algorithm, but it’s implemented in C and the conversion to floating point is done differently.
So the RNG state has 64 bits, and 32 bits immediately leak from a single output. Given two subsequent outputs, one can brute-force the remaining 32 bits of the state, which takes about 30 seconds on a 3.0 GHz i7 core (implemented in C). However, this failed to produce the correct state, so my guess was that there are some calls to Math.random() in between two winning percentages. It turned out that the number of calls in between varies between 8 and 35, and brute-forcing this required a third winning percentage and around 5 hours in expectation. So now I had the correct state, which I verified by creating the next 50 numbers and checking if they contained the next winning percentages. But I didn’t find any pattern by which I could determine which of the next random numbers is going to be the winning percentage. Fortunately, there is another feature of CSGOJackpot which made this trivial.
The site claims to be provably fair. But this is not really the case. What they are doing is a simple commitment to the winning percentage, by publishing a hash md5(blinding + winning percentage) before the round (where the blinding is a uniformly random hex string) and revealing the blinding and winning percentage at the end of the round. Thus, they cannot adjust the winning percentage to their liking during or after the round. But, naturally, provable fairness implies that even the server does not know the winning percentage ahead of time.
However, this feature made it possible to reliably predict the next winning percentage.
I observed that the blinding just consists of two calls to Math.random() which were converted to hex with toString(16).substr(2,4)
and then concatenated. So now I just had to step through the next winning percentage candidates and the next blinding candidates and
check if their hash matched the commitment.
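In Python the search can be sketched like this. The exact string formatting of the blinding and the winning percentage before hashing is an assumption here, and the candidate lists stand in for the values derived from the predicted Math.random() outputs.

```python
import hashlib

def commit(blinding, percentage):
    """md5(blinding + winning percentage), both given as strings."""
    return hashlib.md5((blinding + percentage).encode()).hexdigest()

def find_opening(commitment, blinding_candidates, percentage_candidates):
    """Try all candidate pairs and return the one matching the commitment."""
    for blinding in blinding_candidates:
        for percentage in percentage_candidates:
            if commit(blinding, percentage) == commitment:
                return blinding, percentage
    return None

# Hypothetical values for illustration.
published = commit("a1b2c3d4", "42.1234567890123456")
blindings = ["deadbeef", "a1b2c3d4"]
percentages = ["1.0", "42.1234567890123456"]
print(find_opening(published, blindings, percentages))
```

Because both the blinding and the percentage come from the same predictable generator, only a handful of candidates have to be tried per round.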
One more word on provable fairness. It’s quite annoying to see CSGOJackpot and the many other sites that work similarly make exactly the same false claim. I’m not a cryptographer, so take the following with a grain of salt, and I’d be happy to learn if I’m missing something important. A truly fair scheme seems to be possible although much more complex to implement. The underlying problem is known as coin flipping. In a two-player setting you can have each player commit to a value and then XOR the values in the reveal phase to get a statistically independent result. This is how for example Satoshi Dice achieves some level of fairness.
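The two-player commit-then-XOR idea can be sketched as follows. This is a generic illustration, not how any particular site implements it; SHA-256 with a random nonce is used as the commitment here, and the function names are made up.

```python
import hashlib
import secrets

def commit(value: bytes):
    """Commit to a value; returns (commitment, nonce needed to open it)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + value).hexdigest(), nonce

def check(commitment, nonce, value: bytes):
    """Verify that (nonce, value) opens the commitment."""
    return hashlib.sha256(nonce + value).hexdigest() == commitment

def combine(a: bytes, b: bytes):
    """XOR both revealed values to obtain the shared result."""
    return bytes(x ^ y for x, y in zip(a, b))

# Phase 1: both players publish their commitments.
alice_value, bob_value = b"\x01\x02", b"\xff\x00"
alice_commitment, alice_nonce = commit(alice_value)
bob_commitment, bob_nonce = commit(bob_value)

# Phase 2: both reveal; each side checks the other's opening, then combines.
assert check(alice_commitment, alice_nonce, alice_value)
assert check(bob_commitment, bob_nonce, bob_value)
print(combine(alice_value, bob_value).hex())
```

As long as one player chooses their value uniformly at random and both openings are checked, neither side can bias the combined result, although a player can still abort instead of revealing.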
However, in a multiparty setting (assuming the existence of a broadcast channel), this can be trivially Sybil attacked. An attacker could create multiple identities and refuse to reveal one of his commitments if another one of his identities wins the pot. A trivial Sybil-resistant construction would have each player lose more when not revealing than what is in the pot, but this does not seem really practical. Another approach is to use time-lock encryption instead of commitments, which means that after some time everybody can decrypt the value without having access to the key.
I didn’t play this game at all (it would have been unfair :) ), but for the lulz I had to at least troll them a bit. So I set up a twitch stream where I was revealing the next winning percentages in exchange for a drawing of Gabe Newell. I privately disclosed the bug to the administrator the moment I started the stream.
Sorry for the bad quality in the beginning of the recording, it gets better at the 5:04 minute mark.
See also the submission gallery.
After 2 hours of fun they fixed the issue.
Interestingly, googling “nodejs cryptographically secure random number generator” did not really result in plug-and-play solutions for me.
Without knowing about the pitfalls of javascript I suggested to use crypto.randomBytes(4).readUIntLE(0, 4) / 0xFFFFFFFF
(if this is somehow wrong please
write me a message).
Unfortunately, so far they didn’t remove the “provable fairness” claim.
One of the things that must not happen during regular Bitcoin operation is a fork. A fork occurs when there is a new block $B_{i+1}$ which is a valid successor to block $B_i$ for some set of Bitcoin nodes $N_v$ and invalid for the remaining nodes $N_{\neg v}$. Therefore, miners in $N_v$ will mine new blocks on top of $B_{i+1}$ and miners in $N_{\neg v}$ will still mine on $B_i$. As long as the majority of hashpower is in $N_{\neg v}$, the chain divergence will be resolved after some time, because $N_{\neg v}$’s chain will eventually get longer than $N_v$’s chain and then the nodes in $N_v$ will switch to $N_{\neg v}$’s chain. This is due to the nature of the blockchain: nodes always trust the longest valid chain (more exactly: the chain with the most proof of work).
Consider for example the case of an update to the Bitcoin reference implementation that restricts valid signature encodings. $N_v$ are the nodes running the old Bitcoin version and $N_{\neg v}$ run the new version. As soon as the hash power of $N_{\neg v}$ exceeds some threshold the new consensus rule can be safely activated. In the context of Bitcoin updates this is called a softfork: a valid block becomes invalid in the new version. On the other hand, a hardfork occurs when an invalid block is valid in a new version, for example by raising the maximum block size limit. Then nodes that run the old version are represented by $N_{\neg v}$. Even if the majority of hashpower is in $N_v$, the nodes in $N_{\neg v}$ can never switch to $N_v$’s chain because some blocks are invalid for them. Therefore, in the case of a hardfork all nodes are required to update.
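To make the two definitions concrete, here is a small Python sketch that classifies a rule change by comparing an old and a new validity predicate on a set of sample blocks. The block fields (`size`, `strict_sigs`) and the rule functions are hypothetical stand-ins chosen for this example, not real consensus code.

```python
def classify_rule_change(old_is_valid, new_is_valid, sample_blocks):
    """Softfork: some old-valid block becomes invalid. Hardfork: some old-invalid block becomes valid."""
    tightened = any(old_is_valid(b) and not new_is_valid(b) for b in sample_blocks)
    loosened = any(new_is_valid(b) and not old_is_valid(b) for b in sample_blocks)
    if tightened and loosened:
        return "softfork and hardfork"
    if tightened:
        return "softfork"
    if loosened:
        return "hardfork"
    return "no change on these blocks"

# Hypothetical rules: a block is a dict with a size and a flag for strict signature encoding.
def old_rules(block):
    return block["size"] <= 1_000_000

def stricter_signatures(block):
    return block["size"] <= 1_000_000 and block["strict_sigs"]

def bigger_blocks(block):
    return block["size"] <= 2_000_000

sample_blocks = [
    {"size": 500_000, "strict_sigs": True},
    {"size": 500_000, "strict_sigs": False},
    {"size": 1_500_000, "strict_sigs": True},
]

soft = classify_rule_change(old_rules, stricter_signatures, sample_blocks)
hard = classify_rule_change(old_rules, bigger_blocks, sample_blocks)
```

Here `soft` is `"softfork"` because only the second block changes status (valid before, invalid after), and `hard` is `"hardfork"` because only the third block changes status (invalid before, valid after).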
Forks in practice do not only happen deliberately because of updating mechanisms but can also be triggered by bugs. Bitcoin reimplementations such as libbitcoin, btcd, bitcore and toshi are particularly vulnerable to these bugs because they have to match exactly the behavior of the Bitcoin reference implementation. In order to abstract part of the consensus critical code and allow other projects to use it, Bitcoin Core developers created the bitcoinconsensus library. I am not aware of any reimplementation that already adopted libbitcoinconsensus. Right now, it only has a single function bitcoinconsensus_script_verify, which takes an output script and a transaction and returns whether the transaction is allowed to spend the output.
Among other conditions, a transaction is valid if the top stack item is different from 0 after script execution. Bitcoin script is much more powerful than just verifying signatures and therefore I was curious to find interesting scripts, i.e. scripts that trigger unusual edge cases. I had recently heard about successes with afl-fuzz, whose coverage-guided heuristic seemed to be particularly well suited for the task. It is also able to minimize a set of inputs such that the code coverage stays the same. After fuzzing libbitcoinconsensus for two weeks I supplied the inputs to btcd’s txscript, a reimplementation in Go, and checked whether the outputs differ.
The first bug I found was in btcd’s implementation of the OP_IFDUP opcode. This opcode pushes the top stack element on the stack if it differs from 0. Because of a type conversion in btcd, a stack element that exceeds 4 bytes would have never been copied, which differs from bitcoinconsensus’ implementation of the opcode. The second bug concerned the representation of the result of OP_EQUAL. This opcode compares the two top stack elements and pushes the result on the stack. In Bitcoin Core, if the comparison fails an empty byte array is pushed on the stack. Btcd however pushed a byte array containing 0. This means that the following script would be valid in bitcoinconsensus and invalid in btcd (Note that OP_0 pushes an empty byte array to the stack):
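A script with this property can be sketched as follows. It is my own illustration built from the behavior described above, not necessarily the script from the original listing: `OP_1 OP_0 OP_EQUAL OP_0 OP_EQUAL`. The tiny interpreter below is a hypothetical model that supports only the three opcodes needed and takes the value pushed on a failed comparison as a parameter.

```python
def cast_to_bool(item: bytes) -> bool:
    """Script truthiness: false for empty, all-zero, or negative-zero byte arrays."""
    for i, byte in enumerate(item):
        if byte != 0:
            # 0x80 in the last position encodes negative zero, which is false.
            return not (i == len(item) - 1 and byte == 0x80)
    return False

def run(script, false_value: bytes) -> bool:
    """Evaluate a script; false_value is what OP_EQUAL pushes when the comparison fails."""
    stack = []
    for op in script:
        if op == "OP_0":
            stack.append(b"")          # OP_0 pushes an empty byte array
        elif op == "OP_1":
            stack.append(b"\x01")
        elif op == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(b"\x01" if a == b else false_value)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return bool(stack) and cast_to_bool(stack[-1])

SCRIPT = ["OP_1", "OP_0", "OP_EQUAL", "OP_0", "OP_EQUAL"]
valid_with_empty_array = run(SCRIPT, false_value=b"")      # behavior described for Bitcoin Core
valid_with_zero_byte = run(SCRIPT, false_value=b"\x00")    # behavior described for the btcd bug
```

The first comparison fails in both models. With an empty array as the failure value, the second comparison against `OP_0` succeeds and the script is valid; with a one-byte `0x00` it fails and the script is invalid.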

Both bugs would have triggered hardforks. An attacker could simply broadcast a transaction with the affected scripts and it would be mined subsequently. Btcd would not have been able to include the block into its chain and would have become stuck on the last block. Therefore, an attacker could create a block on top of btcd’s chain paying a merchant running btcd without affecting his ‘real’ coins on the main chain. Note that the attacker would not race against the hashpower of Bitcoin miners.
Dave Collins from the btcd team fixed these issues very fast and additionally improved the test coverage in Bitcoin Core for the affected and more opcodes. Additionally, he was so kind as to award me 0.5 bitcoin for the find.
You can find the results of the fuzzing, the code to produce them and to test reimplementations in the bitcoinconsensus_testcases repository. If you are interested you can start fuzzing yourself and submit a pull request with new scripts you found. Also, I’ve executed the testcases only with btcd and bitcore so far.
]]>As part of the bug bounty program I was awarded with 20 Bitcoin.
]]>In this post I show that the first stage of the attack, namely learning the nodes a victim is directly connected to, can be done with a single connection to the victim. In addition to BKP’s attack, knowing all outbound peers of a client could significantly increase the success probability of a double spend. Note that all experiments are based on Bitcoin Core 0.9.4, but 0.10.0 shows the same behavior.
TLDR The attacker can reliably guess all of the outbound connections of a victim by making a selection from the known addresses of a victim based on the timestamp of the addresses.
Update A fix has been merged to bitcoind. The timestamp is no longer updated when receiving a message from a connected peer. Instead, it is only updated when the peer disconnects. The fix is released in Bitcoin Core 0.10.1.
When a node $n$ connects to another peer $p$ in the network it advertises its address using the “addr” message. The peer will select a number of its own peers at random which are “responsible” for $n$’s address. Then the address is forwarded to responsible peers to spread the knowledge about $n$ in the network. The number of responsible peers is either $1$ or $2$ depending on whether the address is reachable by $p$.
BKP’s attack works by recording the set of peers that first propagated a victim’s address. In order to have a good chance of being in the set of responsible peers for the address, the attacker has to hold a significant number of connections to each full node in the network. Note that it is possible to have multiple connections from a single public address to a peer.
It turns out that an attacker can simply infer the peers of a victim by sending getaddr messages to him.
In bitcoin, the address structures that are sent via the addr message do not only contain the IP address and port but also a timestamp. The timestamp’s role is to ensure that terminated nodes vanish from the network’s knowledge, and it is regularly refreshed by the nodes which have an interaction (more about that later) with the peer at that address. Bitcoin nodes usually record the addresses they hear about and send them in a reply to a getaddr using the addr message.
The following experiments show that an attacker can guess some or all of the direct peers of a victim by sorting the known addresses of the victim based on the timestamp.
A minor obstacle is that a node replies to a single getaddr message with at most 2500 addresses selected uniformly at random. In order to get a certain percentage $\tau$ of the known addresses of a node, the attacker has to send multiple getaddr messages and record the percentage that is new to her.
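The collection loop can be sketched in Python as follows. This is a simulation under stated assumptions rather than the code used in the experiments: the victim is modeled as a function that returns 2500 addresses drawn uniformly at random from 13,500 known ones, and the attacker stops once a reply contains at most a fraction $1 - \tau$ of new addresses. All names are made up for this example.

```python
import random

def collect_known_addresses(send_getaddr, tau=0.95, max_requests=1000):
    """Repeat getaddr until a reply contains at most a (1 - tau) share of new addresses."""
    seen = set()
    requests = 0
    while requests < max_requests:
        reply = send_getaddr()
        requests += 1
        new = set(reply) - seen
        seen |= new
        # The share of new addresses in a reply estimates the share still unknown.
        if reply and len(new) / len(reply) <= 1 - tau:
            break
    return seen, requests

def make_simulated_victim(n_known=13_500, reply_size=2_500, seed=0):
    """Hypothetical victim: each getaddr reply is a uniform random sample of its known addresses."""
    rng = random.Random(seed)
    addresses = [f"addr-{i}" for i in range(n_known)]
    return lambda: rng.sample(addresses, min(reply_size, n_known))

seen, requests = collect_known_addresses(make_simulated_victim())
print(f"{len(seen)} addresses after {requests} getaddr requests")
```

A real client would additionally wait between requests, as in the experiment described here.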

Experiments show that if we wait 10 seconds after each getaddr request it takes around $3.5$ minutes to collect $\tau$ percent of the addresses ($13,500$ in this case).
I set up a victim node $v$, which is just a regular bitcoin node. The attacker $a$ is a node that connects to $v$ via the P2P network and queries the known nodes of $v$. Second, $a$ connects to $v$ via the RPC interface and gets the true peers.
The attacker code (btcP2PStruct) is available on github. Thanks to the btcwire package it is very simple to write this kind of code.
You can find all the data to produce the graphs in the project repository.
First we consider the case where $v$ does not accept incoming connections (“client” in BKP’s terms). $v$ was running for 2 days and I recorded data for every hour but I will only discuss the last measurement because the data is very similar.
Note that $v$ returned $12,868$ known addresses. Also, a client usually has at most 8 peers due to the default maximum number of outbound connections. This implies that an attacker cannot start this attack on a client that is not connected to her. Here we see that the attacker obtains all peers of $v$ (without any false positives in this case).
Next, the case for the full node, which I left running for 8 days.
Again it is evident that an attacker can reliably determine all outbound connections of the victim using a threshold of 20 minutes. However, inbound peers can only be detected very poorly.
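The selection step itself is simple. A minimal Python sketch, assuming the attacker already holds the victim’s known addresses together with their timestamps (the `AddrEntry` type, the function name and the example addresses are made up for illustration):

```python
from typing import Iterable, List, NamedTuple

class AddrEntry(NamedTuple):
    address: str
    timestamp: int  # seconds since the Unix epoch, as carried in the addr message

def guess_outbound_peers(entries: Iterable[AddrEntry], now: int,
                         threshold_seconds: int = 20 * 60) -> List[AddrEntry]:
    """Return the entries refreshed within the threshold, most recent first."""
    recent = [e for e in entries if now - e.timestamp <= threshold_seconds]
    return sorted(recent, key=lambda e: e.timestamp, reverse=True)

now = 1_425_000_000
entries = [
    AddrEntry("192.0.2.1:8333", now - 60),         # refreshed a minute ago
    AddrEntry("192.0.2.2:8333", now - 3 * 3600),   # three hours old
    AddrEntry("192.0.2.3:8333", now - 15 * 60),    # refreshed 15 minutes ago
    AddrEntry("192.0.2.4:8333", now - 2 * 86400),  # two days old
]
guessed = [e.address for e in guess_outbound_peers(entries, now)]
```

With these made-up entries, only the first and third address fall within the 20-minute threshold.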
The reason for finding all outbound peers is this logic in Bitcoin Core, which refreshes the timestamp on every message from an outbound node.
BKP mention a neat trick for determining whether two nodes $v_1$ and $v_2$ are connected. First, the attacker connects to $v_1$ and $v_2$ and sends addr messages containing bogus addresses to $v_1$. Then, she counts the number of times one of these addresses is received from $v_2$. However, the authors leave open how many messages you need to send to be certain about the hypothesis.
As we already know, the address is forwarded only to two responsible nodes so we have to compute the probabilities of our node being responsible. Using the binomial distribution we can compute the likelihood of receiving a certain number of addresses back given that we sent a certain number of addresses.
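As a rough illustration of this kind of calculation, here is a Python sketch under a deliberately simplified model that is my own assumption and not the model behind the numbers reported in this post: each bogus address reaches $v_2$ only if $v_2$ is one of the two responsible peers among the $n_1$ peers of $v_1$, and comes back only if the attacker is one of the two responsible peers among the $n_2$ peers of $v_2$. Other paths through the network are ignored.

```python
from math import ceil, comb, log

def return_probability(n1: int, n2: int, responsible: int = 2) -> float:
    """Chance that one bogus address sent to v1 comes back from v2 (simplified model)."""
    return (responsible / n1) * (responsible / n2)

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of getting exactly k addresses back out of n sent."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def messages_for_detection(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that at least one address comes back with the given probability."""
    return ceil(log(1 - confidence) / log(1 - p))

p_example = return_probability(8, 8)          # two hypothetical nodes with 8 peers each
n_example = messages_for_detection(p_example)
```

Distinguishing "connected" from "not connected" also requires modeling how often an address comes back when the nodes are not directly connected, which this sketch leaves out.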
I’ve done the math using this code and some assumptions regarding the structure (edges are uniformly iid). Also, the attacker has to know or approximate the number of peers of a node, which can be done with a method similar to the one described. Connect two times to the victim, send addr messages, and note the ratio of returned addr messages. If you cannot connect to the node, it will most likely have 8 peers.
This theoretical model shows that if $v_1$ is a full node and $v_2$ is a client then we need about 2000 messages to determine if they are connected with 95% probability. Similarly, if $v_1$ and $v_2$ are full nodes, the attacker needs to send 20000 messages.
However, in order to remain polite in the network, this attack needs to start from a candidate set of nodes. Therefore, it could be a useful method to remove the false positives which were obtained with the “getaddr” fingerprint.
It should be pointed out that even if you know a victim’s entry nodes you can not simply connect to those few and listen for transactions. This is because “trickling” prevents estimating the origin of a transaction without further assumptions or doing BKP’s Sybil attack. However, knowing all outbound peers of a client could significantly increase the success probability of a double spend.
Update The fix removes the update every 20 minutes and updates on disconnect
]]>tl;dr If you are using a wallet that is built upon BitcoinJ, such as Android Wallet, Multibit and Hive Wallet, you have almost zero wire privacy. An attacker who manages to connect to your wallet is easily able to figure out all addresses you control. This is not very likely to get fixed in the near future.
Update: Mike Hearn’s reply addresses additional problems and improvements. There was also accompanying discussion on reddit.
A Bloom filter is a probabilistic data structure that is used to test whether an element is a member of a set.
Bitcoin SPV nodes that use BIP 37 (we call them thin clients from now on) put
all public keys they are interested in into the Bloom filter and send the filter to their peers. Upon receiving a new transaction, peers query
the Bloom filter and only relay the transaction to the BIP 37 node if the query returned true.
Thus, thin clients normally only receive the transactions they are really interested in, i.e. mostly transactions that include one of the wallet’s keys.
The advantage of using a Bloom filter instead of just broadcasting all your pubkeys is that a Bloom filter is faster and more space-efficient at the cost of some false positives.
That means the thin client will receive transactions that include pubkeys which were not put into the filter.
Usually, the parameters of a Bloom filter are computed such that a certain target false positive rate (fp) is achieved.
We want the fp rate to be relatively small (say 0.05%) to reduce bandwidth usage.
BIP 37 states:
Privacy: Because Bloom filters are probabilistic, with the false positive rate chosen by the client, nodes can trade off precision vs bandwidth usage. A node with access to lots of bandwidth may choose to have a high fp rate, meaning the remote peer cannot accurately know which transactions belong to the client and which don’t.
This has created a misunderstanding about what is ideally possible with Bloom filters versus what reality looks like. I’ll focus on BitcoinJ because it is the most widely used implementation of BIP 37, but similar vulnerabilities might exist in other implementations as well. Unfortunately, in the current BitcoinJ implementation Bloom filters are just as bad for your privacy as broadcasting your pubkeys directly to your peers.
The main idea behind this vulnerability is that BitcoinJ puts both pubkey and pubkeyhash into the Bloom filter which substantially reduces the false positive rate.
If you create a completely fresh wallet, BitcoinJ holds 271 pubkeys and computes the parameters of the Bloom filter such that the fp rate for (271*2)+100 elements is equal to 0.05%.
Because BitcoinJ initially puts only 271*2 elements into the filter (pubkey and corresponding pubkeyhash) the effective false positive rate is fp=0.000146.
The vulnerability is that if a pubkey is truly in the filter then querying both pubkey and pubkeyhash must return true.
Because the pubkeyhash is just another almost uniformly random string, the probability of a false positive for the attacker is fp' = fp^2 = 0.0000000213.
I obtained around 56 million pubkeys from the blockchain (mid-January), which theoretically results in 56 million * fp' = 1.19 expected false positives when scanning the blockchain.
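The effect of the double query can be reproduced with a toy Bloom filter. The following Python sketch is an illustration under my own assumptions, not BitcoinJ’s or BIP 37’s implementation: it uses salted SHA-256 instead of the murmur3 hash specified in BIP 37, random 32-byte strings instead of real pubkeys, and a single SHA-256 as a stand-in for the pubkeyhash.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, n_elements: int, fp_rate: float):
        # Standard sizing formulas for a target false positive rate.
        self.m = max(8, int(-n_elements * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_elements * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def fake_pubkey(label: bytes, i: int) -> bytes:
    return hashlib.sha256(label + i.to_bytes(4, "little")).digest()

def key_hash(pubkey: bytes) -> bytes:
    return hashlib.sha256(b"hash" + pubkey).digest()  # stand-in for the pubkeyhash

# Fresh wallet: sized for 271*2+100 elements at 0.05%, but only 271*2 are inserted.
wallet = [fake_pubkey(b"wallet", i) for i in range(271)]
bloom = BloomFilter(271 * 2 + 100, 0.0005)
for pk in wallet:
    bloom.add(pk)
    bloom.add(key_hash(pk))

# Attacker: test foreign keys with a single query and with the double query.
foreign = [fake_pubkey(b"foreign", i) for i in range(100_000)]
single_hits = sum(pk in bloom for pk in foreign)
pair_hits = sum(pk in bloom and key_hash(pk) in bloom for pk in foreign)
wallet_hits = sum(pk in bloom and key_hash(pk) in bloom for pk in wallet)

# Arithmetic from the text: expected false positives over 56 million pubkeys.
expected_fp = 56_000_000 * 0.000146 ** 2
```

All wallet keys pass the double query, while foreign keys that pass the single query are almost always rejected by the second one, so the attacker is left with close to no noise.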
I ran 20 crawlers since the beginning of December and collected 70,000 distinct filters until now. These crawlers just listen for a filterload message and try to be really polite by disconnecting after 2 minutes and not sending anything. The probability that a randomly selected DNS seed returns at least one of the crawlers is 4.3%.
In fact, most of the Bloom filters from recent BitcoinJ versions show an experimental false positive rate around 0.000146. The experimental fp rate is computed by querying the filter with millions of elements which are certainly not pubkeys. Android Wallet 4.16, 4.17, 4.18 for example use the most recent BitcoinJ version (12.2) and make up 52% of the data. However, there is also MultiBit 0.5.18 whose effective fp rate is smaller than 0.00000001.
We are currently starting to analyze all filters using the described “attack” and we expect that this will take several weeks.
What we’ve already seen is that the theoretical fp' really holds, i.e. if you create a fresh wallet and scan the whole blockchain you most likely get one false positive pubkey.
You might think that the problem is easily fixed by trading off bandwidth for more privacy and increasing the fp rate to fp = sqrt(0.0005) = 0.0224.
On the one hand this might seriously impact the bandwidth of mobile clients, and on the other, there is another general class of vulnerabilities concerning Bloom filters:
If an attacker manages to obtain multiple, different filters from the same wallet, he can compute the intersection of pubkeys that match the filters and thereby remove the false positive noise, similar to the “simple attack”.
Different filters means that they have a different total size or a different nonce.
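The intersection attack can be illustrated with a short simulation. The Python sketch below abstracts the filters away and only models their outcome, assuming that each filter matches all true wallet keys plus an independent random set of false positives; the key universe, the wallet size and the 2% rate are hypothetical numbers chosen for the example.

```python
import random

def simulate_matches(true_keys, universe, fp_rate, rng):
    """Keys that match one filter: every true key plus independent false positives."""
    return set(true_keys) | {k for k in universe if rng.random() < fp_rate}

rng = random.Random(1)
universe = range(100_000)      # stand-in for all pubkeys seen on the blockchain
true_keys = set(range(50))     # hypothetical wallet keys

matches_a = simulate_matches(true_keys, universe, 0.02, rng)  # first filter
matches_b = simulate_matches(true_keys, universe, 0.02, rng)  # different size or nonce
intersection = matches_a & matches_b

noise_single = len(matches_a - true_keys)
noise_both = len(intersection - true_keys)
```

Each additional independent filter multiplies the remaining noise by roughly the false positive rate, while the true keys always survive the intersection.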
Sending different filters can happen in BitcoinJ due to multiple reasons, for example
I do think this is a critical privacy leak as it doesn’t require a sophisticated attack and wallets have practically been broadcasting all their pubkeys for years. Not only the addresses you see in your wallet, but also a lot of your future addresses have been exposed. From now on you should assume that the kind of bulk data collection I did is happening. It is difficult to say how accurate and stealthy targeted attacks would be.
According to Mike Hearn, the creator of BitcoinJ, the problems have been known from the start but fixing these issues is far from trivial because “lying consistently is hard”. I fully agree with this. Someone needs to make it their project for a few months.
There are some simple ideas to slightly improve the current status such as deploying nodes that broadcast fake bloom filters. Arthur Gervais et al., 2014 were the first to publish an academic paper on the topic and propose some more or less vague suggestions. One idea I find interesting is that thin clients should be able to install multiple filters at their peers such that no pubkey is shared between the filters. Thus, instead of recomputing the filter when the wallet creates new addresses, it would create an entirely fresh filter for the next keys. One disadvantage is that at the moment multiple filters per peer is not supported by the bitcoin wire protocol. Another issue with Bloom filters is that an attacker could safely assume that the probability is higher for two pubkeys to belong to the same person if they are closer in the transaction graph. As a countermeasure the wallet could deliberately put existing foreign pubkeys that are close into the filter.
I feel sorry for the people whose privacy has been potentially compromised unknowingly by malicious parties and we certainly won’t give away the data set but nonetheless it is really exciting what can be found in the data. If you have suggestions what to look out for and what would be interesting (not necessarily concerning machine learning) feel free to contact me.
]]>This project is a Python port of Coinciding Walk Kernels (CWK) [1] and introduces an extension of the model called FeatureCWK (FCWK). If you want to jump right into some code see the benchmark.
CWKernels deal with the problem of node classification (aka link-based classification) in which a set of features and labels for items are given just as in regular classification. In addition, a node classification algorithm accepts a graph of items and item-item links. It has been shown that the additional information that is inherent in the network structure improves performance for certain algorithms and datasets.
My bachelor thesis in Cognitive Science. Unfortunately, I am currently not allowed to release the data nor the analysis scripts, because the dataset is still under active research.
Abstract: This thesis evaluates psycholinguistic theories about the cognitive processing of words. Consequently, the time course of compound reading is analyzed using generalized additive models in a dataset of eye movements. The theories to be contrasted are sublexical (Taft and Forster, 1975), supralexical (Giraudo and Grainger, 2001) vs. dual route processing (Schreuder and Baayen, 1995) and form-then-meaning (e.g. Rastle and Davis, 2008) vs. form-and-meaning (e.g. Feldman et al., 2009) processing.
As the goal is to find the best model given various predictors, some general mechanisms of eye movements will be demonstrated, e.g. the position in the line has substantial effects, single fixations last longer, are on shorter words, more in the center of the word and influenced differently by frequency measures.
Inspired by Kuperman et al. (2009) it is shown that already the early eye fixations on words are guided by first constituent and compound frequency, providing evidence for parallel dual route models.
Similar to Baayen et al. (2013), Latent Semantic Analysis (LSA) similarity scores (Landauer and Dumais, 1997) permit investigating the time point of semantic processing. The effect of LSA similarity not only shows up in the earliest word fixations, but the data reveals that semantics plays a role even before a word is fixated. In particular, the fixation position in the word is more to the right when the semantic transparency, i.e. the similarity between compound and second constituent, is high. This evidence of parafoveal semantic processing challenges opposing findings obtained with the eye-contingent boundary paradigm (Rayner et al., 1986). In the framework of naive discriminative learning (Baayen et al., 2011), the effect of transparency on fixation position reflects optimization of the landing position for accessing the orthographic information that is most discriminative for the compound.
Keywords: reading, eye movements, compounds, semantic similarity, morphological processing, generalized additive model
Pragmatics is a subfield in linguistics, defined as “dealing with the origins, uses, and effects of signs within the total behavior of the interpreters of signs” (Morris, 1946). Pragmatics tries to explain why a simple sentence like “It’s raining.” has a lot of different interpretations, for example (Franke, 2009):
Herbert Grice introduced certain assumptions (Grice, 1975) that people rely on when making pragmatic inferences in normal circumstances. He formulated the Cooperative Principle: “Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” Using this principle, he derived four Maxims of Conversation, presented as guidelines:
Grice showed that hearers can systematically interpret utterances and infer additional information that goes beyond the semantic meaning of the uttered sentence, based on the assumption that the speaker obeys the Maxims.
The data for studying the influence of reputation on the maxims stems from the question and answer website Stackoverflow (SO). Users of the site pose programming-related questions which others try to answer. They are encouraged to vote on the usefulness of a question or an answer, thereby directly affecting others’ reputation scores. Because SO is a collaboratively edited website, reputation directly determines the privileges of a user, ranging from voting down and editing to voting on closing or deleting questions and answers. Thus, reputation on SO is, among other things, a measure of how much the community trusts a user. Stackoverflow provides monthly data dumps.
In Figure 1 we see that there are a lot of users with low reputation, higher reputation is getting more and more uncommon. The dashed line represents the mean. This can be explained in part by the fact that users start out with a reputation score of one. It looks like the distribution is following a power law, which is strengthened by Figure 2 showing that the distribution is approximately lognormal, when users with reputation scores equal to one are excluded.
In the following we will measure the effect of reputation by focussing on whether a question was closed or left open. This classification task was posted on kaggle.
When investigating the density of reputations given that the question was closed or left open, we can see that closed questions are posed mainly by users with low reputation (Figure 3). One interpretation is that a user with low reputation belongs to one of two different user categories whose members have an incentive to choose low effort. Those are users who have low reputation because they are not trustworthy, and new users who discount the future immensely because they have a single specific question. The inverse argument, that questions posed by users with higher reputation have a lower probability of ending up closed, is strengthened using a logistic regression model with reputation being the only predicting variable. This model was estimated using a dataset of 50% closed questions, whereas normally about 6% of questions end up closed. The decision boundary is where the model estimates a 50% probability of a closed question; it lies at a reputation of 491. The result is that reputation is a significant influence and this model alone has an accuracy of 59.44% on test data.
When closing a question a moderator specifies a reason for doing so, namely off topic, not constructive, not a real question, or too localized. Interestingly, there seems to be a relation between the Gricean Maxims and the reasons for closing a question. Questions labeled off topic (not related to programming) and too localized (unlikely to help future visitors) clearly violate the maxim of relevance. Questions labeled not a real question are those that are ambiguous, vague, incomplete, overly broad, or rhetorical, hence the maxims of manner and quantity are both violated. The maxim of quality is violated by questions labeled not constructive because they are not supported by facts. Rather, they solicit debate, since there is no true answer.
Figure 4 reveals that reputation influences which maxims are violated. Most questions that are incomplete are posed by low reputation users, while controversial questions are posed by high reputation users. In other words, violations of the maxim of quality are more likely from users with high reputation, whereas the opposite is true for the maxims of quantity and manner. Not shown is that questions that are labeled too localized are in a similar reputation range as not a real question, and off topic questions do not differ much from open questions.
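To see where such a decision boundary comes from, assume (this parametrization is my assumption for illustration; the text does not state whether reputation entered the model raw or transformed) a logistic regression with intercept $\beta_0$ and a single coefficient $\beta_1$ on reputation $r$:
$$ P(\text{closed} \mid r) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 r)}} $$
The estimated probability equals $0.5$ exactly when the exponent is zero, so the boundary $r^*$ satisfies
$$ \beta_0 + \beta_1 r^* = 0 \quad\Rightarrow\quad r^* = -\frac{\beta_0}{\beta_1} $$
With $\beta_1 < 0$, questions from users with reputation below $r^*$ are predicted to be closed and those above it to stay open.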
In conclusion, even though we trust high reputation people, they are not precise about truth. This is by no means a bad thing, as long as we take this characteristic into account when interpreting their intent.
]]>An introduction written as a term paper for the seminar “Maschinelles Lernen” (Machine Learning) at the University of Tübingen. It covers the basics of linear, nonlinear, logistic and Bayesian regression and presents methods for estimating the model parameters from statistical assumptions. Logistic regression is then applied to the Titanic dataset, showing among other things that the motto “women and children first” held true during the disaster.
Every plot is produced using the open source statistics software R inside the $\LaTeX$ file (Sweave). Code is here.
Characteristic of supervised machine learning, to which regression also belongs, is describing the relationship between a target variable and an explanatory variable from available data, i.e. from realizations of random variables.
The regression model represents $y$ as the sum of a hypothesis of $x$ and an error term $\epsilon$.
$$ y = h(x) + \epsilon $$