# Bitcoin Seeder DoS Vulnerability

The DNS parser of the Bitcoin Seeder was vulnerable to a denial of service attack. A specially crafted DNS request could trigger infinite recursive function calls that lead to a stack overflow. See the exploit. The vulnerability was fixed in commit 11e935b.

# Guessing Bitcoin’s P2P Connections

The paper Deanonymisation of clients in Bitcoin P2P network (2014) by Biryukov, Khovratovich and Pustogarov (BKP), who describe an attack on Bitcoin Core clients, has started some discussion lately. The main idea of the paper is to first get a set of nodes $E_v$ to which your victim $v$ is directly connected to (“entry nodes”). Second, for each transaction $t$ record the $10$ nodes $P_t$ which first propagate $t$. The authors experimentally show that if $|E_v \cap P_t|$ is bigger than $3$ then there is a pretty high chance that $t$ actually originated from $v$. However, both attack stages basically require a Sybil attack – the attacker has to be connected to a lot of nodes a lot of times. ‘A lot’ means that in their experiments they had 50 connection to each full node (~250) in the test network. As a result, such an attack seems to be powerful, but certainly won’t be undetected.

In this post I show that the first stage of the attack, namely learning the nodes a victim is directly connected to can be done with a single connection to the victim. In addition to BKP’s attack, knowing all outbound peers of a client could significantly increase the success probability of a double spend. Note that all experiments are based on Bitcoin Core 0.9.4, but 0.10.0 shows the same behavior.

TLDR The attacker can reliably guess all of the outbound connections of a victim by making a selection from the known addresses of a victim based on the timestamp of the addresses.

Update A fix has been merged to bitcoind. The timestamp is not updated anymore when receiving a message from a connected peer. Instead, it is only updated when the peer disconnects. The fix is released in bitcoin core 0.10.1.

# Privacy in BitcoinJ

As part of my epic quest to apply supervised machine learning to the blockchain in order to discover transaction patterns, I reviewed various wallet implementations in the hope of finding privacy leaks.

tl;dr If you are using a wallet that is built upon BitcoinJ, such as Android Wallet, Multibit and Hive Wallet, you have almost zero wire privacy. An attacker who manages to connect to your wallet is easily able to figure out all addresses you control. This is not very likely to get fixed in the near future.

# FCW Kernel

A Feature Coinciding Walk Kernel classifies nodes in a graph.

This project is a python port of Coinciding Walk Kernels (CWK) [1] and introduces an extension of the model called Feature-CWK (FCWK). If you want to jump right into some code see the benchmark.

CW-Kernels deal with the problem of node classification (aka link-based classification) in which a set of features and labels for items are given just as in regular classification. In addition, a node classification algorithm accepts a graph of of items and item-item links. It has been shown that the additional information that is inherent in the network structure improves performance for certain algorithms and datasets.

### Early Influences of Compound Frequency and Semantic Transparency

My bachelor thesis in Cognitive Science. Unfortunately, I am currently not allowed to release the data nor the analysis scripts, because the dataset is still under active research.

Abstract: This thesis evaluates psycholinguistic theories about the cognitive processing of words. Consequently, the time-course of compound reading is analyzed using generalized additive models in a dataset of eye movements. The theories to be contrasted are sublexical (Taft and Forster, 1975), supralexical (Giraudo and Grainger, 2001) vs. dual route processing (Schreuder and Baayen, 1995) and form-then-meaning (e.g. Rastle and Davis, 2008) vs. form-and-meaning (e.g. Feldman et al., 2009) processing.

As the goal is to find the best model given various predictors, some general mechanisms of eye movements will be demonstrated, e.g. the position in the line has substantial effects, single fixations last longer, are on shorter words, more in the center of the word and influenced differently by frequency measures.

Inspired by Kuperman et al. (2009) it is shown that already the early eye fixations on words are guided by first constituent and compound frequency, providing evidence for parallel dual route models.

Similar to Baayen et al. (2013), Latent Semantic Analysis (LSA) similarity scores (Landauer and Dumais, 1997) permit investigating the time point of semantic processing. The effect of LSA similarity not only shows up in the earliest word fixations, but the data reveals that semantics plays a role even before a word is fixated. In particular, the fixation position in the word is more to the right, when the semantic transparency, i.e. the similarity between compound and second constituent is high. This evidence of parafoveal semantic processing challenges opposing findings obtained with the eye-contingent boundary paradigm (Rayner et al., 1986). In the framework of naive discriminative learning (Baayen et al., 2011), the effect of transparency on fixation position reflects optimization of the landing position for accessing the orthographic information that is most discriminative for the compound.

# Influence of Reputation on Gricean Maxims

As part of the course “Game Theory and Pragmatics” by Jason Quinley at the Institute of Linguistics, I wanted to explore the influence of reputation on obeying the Gricean Maxims using data from the Q&A website Stackoverflow.

Pragmatics is a subfield in linguistics, defined as “dealing with the origins, uses, and effects of signs within the total behavior of the interpreters of signs” (Morris, 1946). Pragmatics tries to explain why a simple sentence like “It’s raining.” has a lot of different interpretations, for example(Franke, 2009):

• The speaker advises we should take an umbrella.
• The speaker declares the picnic cancelled.
• The speaker is sick of living in amsterdam.

# Regressionsmodelle Und Parameterschätzverfahren

Eine Einführung, die als Ausarbeitung für das Seminar “Maschinelles Lernen” an der Universität Tübingen entstanden ist. Die Grundlagen linearer, nichtlinearer, logistischer und Bayes Regression werden behandelt, sowie Verfahren zur Schätzung der Modellparameter aus statistischen Annahmen vorgestellt. Anschließend wird die logistische Regression auf den Titanic Datensatz angewandt und unter anderem gezeigt, dass das Motto “Frauen und Kinder zuerst” bei der Katastrophe zutraf.

Every plot is produced using the open source statistic software R inside the $\LaTeX$ file (Sweave). Code is here.

Charakteristisch für überwachtes maschinelles Lernen, zu der auch die Regression gehört, ist das Beschreiben der Beziehung von Zielvariable und erklärender Variable aus vorliegenden Daten, also Realisierungen von Zufallsvariablen.

Das Regressionsmodell stellt $y$ durch die Summe einer Hypothese von $x$ und einem Fehlerterm $\epsilon$ dar.

$$y = h(x) + \epsilon$$

PDF weiterlesen

# Visualizing Receptive Fields

The following article will visualize some mathematical models for brain cell activity. In some regions of the brain, neurons are excited or inhibited by neurons of a preceding input layer. They are called receptive field of that neuron. Since the visual area uses receptive fields as feature detectors (such as edge and edge-orientation detection) for natural images, the application of different receptive field functions on images can be nicely examined. The ipython notebook file to play with the parameters can be found on GitHub.

In [2]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import scipy.signal as signal
import numpy as n


We examine the effect on the following images. In the visual pathway the images can be seen as input from the retina to higher visual areas.

# A* for MarioAI

Check out my A* Implementation. Your nodes can be easily added by inheriting the ASearchNode class. The purpose of the A* is to handle WorldNodes in the interface of the Mario AI Competion. The aim of the competition is to develop an artificial agent in the game Infinite Mario Bros who completes a level fastest. Like in Robin Baumgartens competition submission the WorldNodes are generated using a simulation and then searched with the A*. Our current goal at the Chair of Cognitive Modeling Tübingen is to connect the the planning with a rudimentary brain module which is capable of learning object interactions. Therefore the agent will wander around to collect knowledge about his environment and then exploit it according to its motivation.

Update Some of my successors enhanced Mario’s capabilities and submitted a great video demonstration that went viral to the AAAI Video Competition 2015.