Finding Coexpressed Genes in Counts-Based Data: An Improved Measure with Validation Experiments

Abstract

Motivation: Expressed sequence tag (EST) data reflect variation in gene expression, but previous methods for finding coexpressed genes in EST data are subject to bias and vastly overstate the statistical significance of putatively coexpressed genes.

Results: We introduce a new method (LNP), based on a log-normal prior on the distribution of expression levels, that reports reasonable $p$-values and also detects more biological relationships in human dbEST than previous methods do. In simulations with human dbEST library sizes, previous methods report $p$-values as low as $10^{-30}$ for 1 in 1,000 uncorrelated pairs, while LNP reports significance correctly. We validate the analysis on real human genes by comparing coexpressed pairs to GO annotations and find that LNP is more sensitive than three previous methods. We also find a small but statistically significant level of coexpression between interacting proteins relative to randomized controls.

Availability: Source code in Java or R is available at http://ests.sourceforge.net/