Entry Date:
March 23, 2004

Gene and Protein Microarray Analysis

Principal Investigator Roy Welsch


Today's high-throughput genomic technologies generate thousands of data points on gene expression levels that require hundreds of simultaneous comparisons. The quantity and nature of microarray data present interesting challenges for statistical analysis. For example, much of the existing statistical theory on multiple comparisons assumes that the data points are independent, but microarray data may have underlying dependence structures that must be taken into account in order to maximize the power of multiple comparison procedures. Another data analysis challenge is the development of new methods to manage measurement errors, which are inherent in individual data points and can be compounded many times over in large-scale comparisons. In addition, many microarray experiments involve more inputs (genes) than samples, which means that the analytical methods must place some constraints on model complexity. "Penalty" methods drawn from other data-mining applications, such as classification, may be useful for model construction in genomic analysis. If this approach is successful, then penalty methods should be considered an integral part of the initial experimental design so that the data are amenable to further analysis.
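
To make the independence issue concrete, the sketch below compares two standard false discovery rate adjustments: Benjamini-Hochberg, whose guarantee relies on independent (or positively dependent) tests, and Benjamini-Yekutieli, which remains valid under arbitrary dependence at the cost of power. These are well-known textbook procedures chosen for illustration, not necessarily the methods developed in this project. The function name `fdr_adjust`, its `arbitrary_dependence` flag, and the p-values are hypothetical.

```python
def fdr_adjust(pvalues, arbitrary_dependence=False):
    """Return step-up FDR-adjusted p-values in the original input order.

    arbitrary_dependence=False gives the Benjamini-Hochberg adjustment;
    arbitrary_dependence=True gives the more conservative
    Benjamini-Yekutieli adjustment.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Correction factor: 1 for BH, the harmonic sum 1 + 1/2 + ... + 1/m for BY.
    c_m = sum(1.0 / i for i in range(1, m + 1)) if arbitrary_dependence else 1.0
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four genes, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
bh = fdr_adjust(example_p)
by = fdr_adjust(example_p, arbitrary_dependence=True)
for gene, (p, a, b) in enumerate(zip(example_p, bh, by)):
    print(f"gene {gene}: p={p:.3f}  BH={a:.4f}  BY={b:.4f}")
```

The Benjamini-Yekutieli values are never smaller than the Benjamini-Hochberg values, which illustrates the power lost when the dependence structure is ignored rather than modeled.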