papers

These are my notes from research papers I read. Each page’s title is also a link to the abstract or PDF.

Learning neural causal models from unknown interventions

This is a follow-on to A meta-transfer objective for learning to disentangle causal mechanisms. The paper describes an algorithm for predicting the causal graph structure of a set of visible random variables, each possibly causally dependent on any of the other variables.

The algorithm has two sets of parameters, the structural parameters and the functional parameters. The structural parameters form a matrix $$\gamma$$, where $$\sigma(\gamma_{ij})$$ (with $$\sigma$$ the sigmoid function) represents the belief that variable $$X_j$$ is a direct cause of $$X_i$$.
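To make the role of the structural parameters concrete, here is a minimal sketch of turning a matrix of $$\gamma_{ij}$$ values into edge beliefs. The function names, the example values, the choice to rule out self-edges, and the step of sampling a graph by treating each belief as an independent coin flip are my own assumptions for illustration, not details taken from these notes.

```python
import math
import random


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def edge_beliefs(gamma):
    """Return beliefs[i][j] = sigmoid(gamma[i][j]): belief that X_j causes X_i.

    Self-edges are set to 0 here (an assumption of this sketch).
    """
    n = len(gamma)
    return [
        [0.0 if i == j else sigmoid(gamma[i][j]) for j in range(n)]
        for i in range(n)
    ]


def sample_graph(gamma, rng):
    """Sample a 0/1 adjacency matrix, one independent coin flip per edge.

    Hypothetical helper: the sampling scheme is an illustrative assumption.
    """
    return [
        [1 if rng.random() < p else 0 for p in row]
        for row in edge_beliefs(gamma)
    ]


# Made-up structural parameters for three variables X_0, X_1, X_2.
gamma = [
    [0.0, 3.0, -3.0],  # X_0: probably caused by X_1, probably not by X_2
    [0.0, 0.0, 0.0],   # X_1: undecided about X_0 and X_2 (belief 0.5)
    [-3.0, 3.0, 0.0],  # X_2: probably caused by X_1, probably not by X_0
]

beliefs = edge_beliefs(gamma)
graph = sample_graph(gamma, random.Random(0))
print(beliefs)
print(graph)
```

A value of $$\gamma_{ij} = 0$$ corresponds to a belief of exactly 0.5, so initializing the matrix at zero would encode having no opinion about any edge.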

A meta-transfer objective for learning to disentangle causal mechanisms

Theoretically, models should be able to predict on out-of-distribution data if their understanding of causal relationships is correct. The toy problem they use in this paper is that of predicting temperature from altitude. If a model is trained on data from Switzerland, the model should ideally be able to correctly predict on data from the Netherlands, even though it hasn’t seen elevations that low before. The main contribution of this paper is the finding that models tend to transfer faster to a new distribution when they learn the correct causal relationships, and when those relationships are sparsely represented, meaning they are represented by relatively few nodes in the network.
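One way to see why the correct causal factorization might adapt faster is to count how many of its parameters have to change after a distribution shift. The sketch below is my own illustration, not the paper's meta-learning objective. It assumes two binary variables, A (think "high altitude") and B (think "cold"), with A causing B, and all probabilities are made up. Only the marginal of A is changed between the two distributions.

```python
def joint(p_a, p_b_given_a):
    """Joint table P(A=a, B=b) for binary variables with A -> B."""
    return {
        (a, b): (p_a if a else 1 - p_a)
        * (p_b_given_a[a] if b else 1 - p_b_given_a[a])
        for a in (0, 1)
        for b in (0, 1)
    }


def causal_params(j):
    """Parameters of the factorization P(A) P(B|A)."""
    return {
        "P(A=1)": j[1, 0] + j[1, 1],
        "P(B=1|A=0)": j[0, 1] / (j[0, 0] + j[0, 1]),
        "P(B=1|A=1)": j[1, 1] / (j[1, 0] + j[1, 1]),
    }


def anticausal_params(j):
    """Parameters of the factorization P(B) P(A|B)."""
    return {
        "P(B=1)": j[0, 1] + j[1, 1],
        "P(A=1|B=0)": j[1, 0] / (j[0, 0] + j[1, 0]),
        "P(A=1|B=1)": j[1, 1] / (j[0, 1] + j[1, 1]),
    }


def changed(before, after, tol=1e-9):
    """Names of parameters that differ between two distributions."""
    return [k for k in before if abs(before[k] - after[k]) > tol]


mechanism = {0: 0.2, 1: 0.9}      # P(B=1 | A=a), identical in both places
train = joint(0.7, mechanism)     # mostly high altitude
shifted = joint(0.1, mechanism)   # mostly low altitude, same mechanism

causal_changes = changed(causal_params(train), causal_params(shifted))
anticausal_changes = changed(anticausal_params(train), anticausal_params(shifted))
print("causal factorization:", causal_changes)
print("anti-causal factorization:", anticausal_changes)
```

In this toy setting the causal factorization only needs to update one parameter, while every parameter of the anti-causal factorization moves, which gives some intuition for why the causally correct model has less to relearn.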

Deep learning generalizes because the parameter-function map is biased towards simple functions

The theoretical value in talking about the parameter-function map is that this map lets us talk about sets of parameters that produce the same function. In this paper they used some recently proven results from algorithmic information theory (AIT) to show that for neural networks the parameter-function map is biased toward functions with low Kolmogorov complexity, meaning that simple functions are more likely to appear given a random choice of parameters. Since real-world problems are also biased toward simple functions, this could explain the generalization/memorization results found by Zhang et al.
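The parameter-function map can be probed empirically by sampling parameters at random and recording which function each sample computes. The sketch below does this for a tiny one-hidden-layer network on boolean inputs. The network size, the Gaussian parameter distribution, the sample count, and all function names are assumptions made for illustration, not the paper's experimental setup.

```python
import math
import random
from collections import Counter
from itertools import product


def sample_function(n_inputs, n_hidden, rng):
    """Draw random parameters and return the boolean function they compute.

    The function is encoded as a truth-table string, one character per
    input in the order produced by itertools.product.
    """
    w1 = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    b2 = rng.gauss(0, 1)

    bits = []
    for x in product([0, 1], repeat=n_inputs):
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)
        ]
        out = sum(w * h for w, h in zip(w2, hidden)) + b2
        bits.append("1" if out > 0 else "0")
    return "".join(bits)


def function_frequencies(n_samples=2000, n_inputs=3, n_hidden=4, seed=0):
    """Count how often each truth table appears under random parameters."""
    rng = random.Random(seed)
    return Counter(
        sample_function(n_inputs, n_hidden, rng) for _ in range(n_samples)
    )


counts = function_frequencies()
print("distinct functions seen:", len(counts), "of", 2 ** 8, "possible")
for table, count in counts.most_common(5):
    print(table, count)
```

If the bias described above shows up in this toy setting, a handful of truth tables should account for a large share of the samples rather than all 256 possible functions appearing equally often. Many different parameter draws map to the same truth table, which is exactly the "sets of parameters that produce the same function" the map lets us talk about.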