TY - JOUR
T1 - Ultra-scalable and efficient methods for hybrid observational and experimental local causal pathway discovery
AU - Statnikov, Alexander
AU - Ma, Sisi
AU - Henaff, Mikael
AU - Lytkin, Nikita
AU - Efstathiadis, Efstratios
AU - Peskin, Eric R.
AU - Aliferis, Constantin F.
PY - 2015/12
Y1 - 2015/12
AB - Discovery of causal relations from data is a fundamental objective of several scientific disciplines. Most causal discovery algorithms that use observational data can infer causality only up to a statistical equivalency class, thus leaving many causal relations undetermined. In general, complete identification of causal relations requires experimentation to augment discoveries from observational data. This has led to the recent development of several methods for active learning of causal networks that utilize both observational and experimental data in order to discover causal networks. In this work, we focus on the problem of discovering local causal pathways that contain only direct causes and direct effects of the target variable of interest and propose new discovery methods that aim to minimize the number of required experiments, relax common sufficient discovery assumptions in order to increase discovery accuracy, and scale to high-dimensional data with thousands of variables. We conduct a comprehensive evaluation of new and existing methods with data of dimensionality up to 1,000,000 variables. We use both artificially simulated networks and in-silico gene transcriptional networks that model the characteristics of real gene expression data.
KW - Causality
KW - Experimental data
KW - Large-scale experimental design
KW - Local causal pathway discovery
KW - Observational data
KW - Randomized experiments
UR - http://www.scopus.com/inward/record.url?scp=84960517903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960517903&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84960517903
SN - 1532-4435
VL - 16
SP - 3219
EP - 3267
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -