Epicenter Software




Epicenter Software Genetrix

Statistical Methods


Data Input

Probeset modeling
Genetrix incorporates three algorithms for probe set modeling: Probe Profiler (Corimbia, Inc.), RMA and a dChip-based approach.
Probe Analysis
View data from each of the perfect match (PM) and mismatch (MM) probes for a gene. Plots show the signal level for each probe and the PM-MM differences. Examine sample-to-sample consistency in probe patterns by cycling through each sample.
Impute missing
Missing values may be imputed using a k-nearest-neighbor algorithm.
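The page does not say which distance measure or how many neighbors Genetrix uses. As a minimal sketch, assume a genes × samples matrix with `None` marking missing entries, a root-mean-square distance over the samples both genes have observed, and a hypothetical default of k = 3. Each missing value is replaced by the mean of that sample's value in the k most similar genes:

```python
import math

def knn_impute(matrix, k=3):
    """Fill None entries of a genes x samples matrix with the mean of the
    k nearest genes (RMS distance over commonly observed samples).
    k=3 is an illustrative default, not a documented Genetrix setting."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Root-mean-square difference, so genes with fewer shared samples stay comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors must have an observed value in sample j
            candidates = [
                (distance(row, other), other[j])
                for g, other in enumerate(matrix)
                if g != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Neighbors are always taken from the original matrix, so the order in which gaps are filled does not affect the result.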
Transformations
Pre-process expression data with standardization, normalization, log transform and/or data permutation. The null distribution of any test statistic can be checked by permuting either samples or genes.
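Two of these transformations, a log transform followed by per-gene standardization, can be sketched as below. The log base (2) and the offset of 1 that guards against zero signals are assumptions made for illustration. Standardization here means rescaling each gene to mean 0 and unit sample standard deviation:

```python
import math
import statistics

def log2_transform(values, offset=1.0):
    # The offset is a hypothetical choice that avoids taking log of zero
    return [math.log2(v + offset) for v in values]

def standardize(values):
    # Center to mean 0 and scale to unit sample standard deviation (z-scores)
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def preprocess_gene(values, offset=1.0):
    """Log-transform, then standardize, one gene's values across samples."""
    return standardize(log2_transform(values, offset))
```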

Evaluate a group of samples

Categorical characteristic
Compare a group of samples (individuals) with respect to a dichotomous covariate, such as gender, or a k-group categorization, such as race, in a 2×k table. Test for homogeneity using a chi-square test or an exact p-value.
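The chi-square homogeneity statistic for such a 2×k table can be sketched as follows. In this sketch, row 0 holds the counts for the selected samples and row 1 the counts for all other samples. The p-value would then come from a chi-square distribution with k - 1 degrees of freedom (for example `scipy.stats.chi2.sf(chi2, dof)`), which the standard library does not provide:

```python
def chi_square_2xk(table):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    table[0]: counts for the selected samples in each of the k categories
    table[1]: counts for all other samples
    """
    if len(table) != 2:
        raise ValueError("expected exactly two rows")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity of the two rows
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof
```

For small expected counts, an exact test is the safer choice, which is why the text offers both.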
Continuous characteristic
Compare a group of samples (individuals) with respect to a continuous covariate, such as age or serum cholesterol, using a histogram of values (these samples vs. all others), and test for equality with a t-test, a Wilcoxon rank-sum test and logistic regression.
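The Wilcoxon rank-sum comparison can be sketched as follows. Ties receive average ranks, and the two-sided p-value uses the large-sample normal approximation without a tie correction to the variance. That simplification is an assumption of this sketch; exact small-sample p-values would need a different calculation:

```python
import math
from statistics import NormalDist

def midranks(values):
    # 1-based ranks; tied values share the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def rank_sum_test(group, others):
    """Rank sum W of `group`, its z score and a two-sided normal-approximation p-value."""
    n1, n2 = len(group), len(others)
    ranks = midranks(list(group) + list(others))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```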
Survival analysis
Compare the survival-type outcome (survival, event-free survival, etc.) of a group of patients with that of all other patients, or of k groups of patients defined by a covariate, using a Kaplan-Meier plot and logrank statistics.
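The Kaplan-Meier curve behind such a plot is the product-limit estimate. The sketch below assumes follow-up times paired with an event indicator (1 = event, 0 = censored). At each distinct event time, the estimate multiplies in the fraction of at-risk patients who survived that time:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event occurred, 0 if the observation was censored
    Returns [(time, survival), ...] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients censored at t are still counted as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```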
Key covariates
Define a set of sample covariates to be used routinely when assessing a list of samples; samples in the list are contrasted to all others, for each key covariate, to provide a single display of key categorical, continuous and survival comparisons.

A key covariate thumbnail can be used to annotate sample subgroups on a variety of graphical displays.
Scatterplots
Scatterplots can plot the expression values of two (or three) genes against each other across samples, the values of each gene in 2 or 3 samples (or groups of samples) against each other, expression values against a selected covariate, one principal component against another, or average expression over one defined group of genes versus another. Plots show lines of best fit and confidence intervals, plus the correlation coefficient and p-value.
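The line of best fit and the correlation summary can be sketched as an ordinary least-squares fit plus Pearson's r. The function below also returns the t statistic for testing r = 0. Converting that statistic to a p-value requires a t distribution on n - 2 degrees of freedom (for example `scipy.stats.t.sf`); the sketch assumes this is how the plotted p-value is obtained, and it omits the confidence bands:

```python
import math

def fit_and_correlate(x, y):
    """Least-squares line y = intercept + slope * x, Pearson r, and the
    t statistic for H0: r = 0 (t distribution with n - 2 degrees of freedom)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    # A perfect correlation gives an infinite t statistic with the sign of r
    t = r * math.sqrt((n - 2) / (1 - r ** 2)) if abs(r) < 1 else math.copysign(math.inf, r)
    return intercept, slope, r, t
```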

Correlations between pairs of samples, computed across many genes, can be summarized in a similarity matrix.
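A similarity matrix of this kind can be sketched as the pairwise Pearson correlations between expression profiles. This sketch assumes Pearson correlation as the similarity measure, which the page does not state. Pass one profile per sample (values across genes) for sample-to-sample similarity, or one profile per gene (values across samples) for gene-to-gene similarity:

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def similarity_matrix(profiles):
    """Symmetric matrix of pairwise Pearson correlations between profiles."""
    m = len(profiles)
    sim = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            sim[i][j] = sim[j][i] = pearson(profiles[i], profiles[j])
    return sim
```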

Evaluate genes

2-group
Compare the distribution of gene expression values (in a histogram) for 2 groups of samples, defined by a covariate. Values can be for a single gene or the average over a set of genes. Test the significance of differences in distribution with the t-test and the Wilcoxon non-parametric test.
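The sketch below shows a gene-set average and a two-sample t statistic. Per-sample gene-set averages are computed first, and the two groups are then compared with a Student t statistic using a pooled variance. The pooled-variance form, as opposed to Welch's unequal-variance form, and the `{gene_id: values}` input layout are assumptions of this sketch:

```python
import math
import statistics

def gene_set_average(expression, gene_ids):
    """Average expression over a set of genes, for each sample.
    expression: {gene_id: [value per sample]} (illustrative layout)."""
    rows = [expression[g] for g in gene_ids]
    return [sum(col) / len(col) for col in zip(*rows)]

def two_sample_t(a, b):
    """Student two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2
```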
k-group
Compare the distribution of gene expression values (in a histogram) for k subgroups of samples, defined by a covariate. Values can be for a single gene or the average over a set of genes. Test the significance of differences in distribution with ANOVA and the Kruskal-Wallis non-parametric test.

Alternatively, differences can be displayed as a box plot.
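Both k-group tests are sketched below: the one-way ANOVA F statistic and the Kruskal-Wallis H statistic. H is computed without the tie correction, which is a simplification in this sketch. F is referred to an F distribution with (k - 1, n - k) degrees of freedom, and H to a chi-square distribution with k - 1 degrees of freedom:

```python
def one_way_anova(groups):
    """F statistic and (df_between, df_within) for k groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), (df_b, df_w)

def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Average (mid) rank for each distinct value
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    h = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)
```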
Continuous covariate
Correlate the distribution of gene expression values (in a scatterplot) with values of a covariate (e.g. mitotic index) or with expression of a specified gene. Values can be for a single gene or the average over a set of genes. Determine correlation coefficient and p-value.
Time series
Display gene expression values for an ordered set of observations (e.g. a time series), optionally grouped by a covariate (e.g. treated vs. control). Values can be for a single gene or the average over a set of genes.
Survival analysis
Display a Kaplan-Meier plot comparing patients with high vs. low expression (above vs. below the median value) or by tertiles. Assess significance with logrank chi-square tests for homogeneity and trend. Expression values can be for a single gene or the average over a set of genes.
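The two-group case can be sketched as follows. Patients are split at the median expression, and the split is tested with a logrank chi-square on 1 degree of freedom. The tertile split and the trend test are not covered by this sketch. Event indicators are assumed to be 1 for an observed event and 0 for censoring:

```python
import statistics

def median_split(expression):
    # 1 = above the median ("high"), 0 = at or below it ("low")
    m = statistics.median(expression)
    return [1 if v > m else 0 for v in expression]

def logrank_chi_square(times, events, groups):
    """Two-group logrank chi-square (1 degree of freedom); groups are 0/1 labels."""
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({time for time, event in zip(times, events) if event}):
        at_risk = [g for time, g in zip(times, groups) if time >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(e for time, e in zip(times, events) if time == t)
        d1 = sum(e for time, e, g in zip(times, events, groups) if time == t and g == 1)
        # Observed minus expected events in group 1 at this event time
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance
```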
Key comparisons
Define a set of sample covariates to be used routinely when assessing a list of genes; genes in the list are contrasted to all others, for each key comparison, to provide a single display of key categorical, continuous, time series and survival comparisons.

A key comparison thumbnail can be used to annotate single genes or gene subgroups on a variety of graphical displays, including scatterplots, pathways and gene clusters.
Attribute analysis
Attributes are gene classifications in a tree structure, which include: GO codes (Molecular Function, Biological Process and Cellular Component), protein classifications (InterPro and SCOP protein classes), chromosomal band, pathway membership and user-defined gene sets.

Genes in a list are compared to all genes to determine Attributes that are over-represented (with odds ratio and statistical significance), using Chi-square tests or Fisher's exact test.
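For a single Attribute, the over-representation test can be sketched with Fisher's exact test as a one-sided hypergeometric tail. The argument names below are hypothetical. The 2×2 table contrasts genes in the list with the remaining genes, and the odds ratio is infinite when an off-diagonal cell is zero. `scipy.stats.fisher_exact` with `alternative="greater"` is a library alternative:

```python
from math import comb

def overrepresentation(list_with, list_size, all_with, all_size):
    """Odds ratio and one-sided Fisher exact p-value for an Attribute.

    list_with: genes in the list carrying the Attribute
    list_size: genes in the list
    all_with:  genes on the array carrying the Attribute
    all_size:  genes on the array
    """
    a = list_with                   # in list, with Attribute
    b = list_size - list_with       # in list, without
    c = all_with - list_with        # not in list, with
    d = all_size - list_size - c    # not in list, without
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # P(X >= a) for X ~ hypergeometric: list_size genes drawn from all_size
    upper = min(list_size, all_with)
    hits = sum(comb(all_with, x) * comb(all_size - all_with, list_size - x)
               for x in range(a, upper + 1))
    return odds_ratio, hits / comb(all_size, list_size)
```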

Attribute analysis can be obtained as a summary table of most significant Attributes, a comprehensive list for all Attributes, or as labeling information on graphical displays.
Scatterplot
Scatterplots show expression values plotted for two (or three) samples against each other, for expression values against a selected covariate, for one principal component against another, or for average expression over a defined group of samples versus another. Plots show lines of best fit and confidence intervals, plus the correlation coefficient and p-value.

Correlations between pairs of genes can be summarized for many genes in a similarity matrix.
Chromosome map
Genes mapped to a chromosomal ideogram are grouped by their proximity in the genome, and the properties of these groups can be statistically evaluated and graphically displayed. The statistical evaluations include:

Statistics for a selected gene covariate or for average expression values (mean, median, minimum, maximum, standard deviation, proportion missing), each represented either as a value or as a test statistic that compares the genes in the region to all other genes.

Results of application of a selected GeneScreen statistical method to the average expression in a region.

Bayesian determination of the probability of loss of heterozygosity (LOH) in each region, based on SNP chip data, for each sample and averaged over multiple samples.

Comparison of statistical properties with biological data, such as the characteristics of genes in a region, presence of SNPs, and karyotype information.
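The first of these evaluations, comparing a region's average expression with all other genes, can be sketched as follows. The grouping rule here is a fixed-width genomic window, and the 5 Mb width and the choice of Welch's t statistic are both hypothetical; the page only says that genes are grouped by proximity:

```python
import math
import statistics
from collections import defaultdict

def region_statistics(genes, window=5_000_000):
    """Group genes into fixed-width windows and compare each region with all
    other genes using Welch's t statistic.

    genes:  list of (chromosome, position_bp, value) tuples
    window: hypothetical region width in base pairs
    Returns {(chromosome, window_index): (region_mean, t)}.
    """
    regions = defaultdict(list)
    for chrom, pos, value in genes:
        regions[(chrom, pos // window)].append(value)
    results = {}
    for key, inside in regions.items():
        outside = [v for k, vals in regions.items() if k != key for v in vals]
        if len(inside) < 2 or len(outside) < 2:
            continue  # too few genes for a variance estimate
        se = math.sqrt(statistics.variance(inside) / len(inside)
                       + statistics.variance(outside) / len(outside))
        t = (statistics.mean(inside) - statistics.mean(outside)) / se
        results[key] = (statistics.mean(inside), t)
    return results
```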

Screen individual genes

GeneScreen
Screens all genes for significant association with a selected covariate: a dichotomous covariate, such as male/female or responder/non-responder; a k-group covariate; a continuous covariate (e.g. age); or a survival time.

Available statistical methods include matched and unmatched t-tests, Wilcoxon 2-sample tests, unconditional and conditional logistic regression, analysis of variance, Kruskal-Wallis tests, multiple linear regression and Cox regression.
Analysis
Results include (a) a list of the genes most strongly associated with the specified outcome, with the statistic value and p-value, (b) a histogram of test statistics compared to the theoretical null distribution, (c) a Q-Q plot that shows how the observed quantiles differ from theoretical ones, (d) a scatterplot of the strength of association against its statistical significance for each gene, and (e) a graphical display of p-value thresholds and selected genes for given false discovery rates.
Selected-gene display
Clicking on a single gene displays a detailed single-gene analysis:

Survival outcome - Cox regression statistics and a survival plot. The distribution of expression values for the gene is also shown, and the user can divide these values into groups to examine the Kaplan-Meier curves for each group.
Members of each curve can be obtained, for more detailed analysis, by clicking on them.

Dichotomous outcome - Test statistics appropriate for 2-group comparison are tabulated, and the distribution of expression levels for this gene is shown for each of the two groups.
The samples that belong in one or more bars of the two histograms can be selected and listed.

Continuous outcome - A scatterplot of the selected gene against the continuous covariate.

k-group outcome - Distributions for each group shown in separate histograms, along with test statistics for homogeneity of the groups.
Permutation distribution
Used to derive a more robust null distribution and more accurate p-values.
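A permutation p-value for a single gene can be sketched by shuffling the sample labels. The statistic used here (the absolute difference in group means), the number of permutations and the add-one correction are illustrative choices, not documented Genetrix settings:

```python
import random

def permutation_p_value(group, others, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def mean_diff(values, n1):
        a, b = values[:n1], values[n1:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    n1 = len(group)
    pooled = list(group) + list(others)
    observed = mean_diff(pooled, n1)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign sample labels at random
        # Small tolerance so rounding does not hide permutations as extreme as the data
        if mean_diff(pooled, n1) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (extreme + 1) / (n_permutations + 1)
```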
Multiple comparison adjustment
Comparing the observed distribution of the statistic with the expected one indicates whether there are more statistically significant associations than might be expected by chance. The false discovery rate can be used to estimate the number of false positive results within a group of significant outcomes, and p-values can be adjusted to reflect the multiple testing.
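The page does not name a specific false-discovery-rate procedure. The Benjamini-Hochberg adjustment is one widely used choice and is sketched here as an illustration. Genes whose adjusted value is at or below q form the selected set at false discovery rate q:

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, with p-values `[0.01, 0.04, 0.03, 0.5]` only the first gene has an adjusted value below 0.05.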
Multivariate Analyses
For multivariate analytic methods, variables can be added to the analyses as adjusting factors. A variable can be used unchanged, can be automatically converted to a dichotomous value (with a specified cut-point) or can be used to create a series of dummy variables representing each distinct value of the variable.
Matched analyses
For some methods of analysis it is possible to link samples as matched pairs or groups (strata).
Stepwise gene selection
Automatically screen all genes for the one most significantly associated with the outcome, select that gene as an adjusting covariate, and then re-screen all remaining genes to see which is most significant (after adjustment).
Repeat this process until the minimum adjusted p-value exceeds a user-specified threshold.
Predictions/leave-n-out
Screen using all but a set of n samples, and use the results to predict the outcome for the n excluded samples. Repeat, leaving a different set of n samples out each time.
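The leave-n-out loop can be sketched as follows, assuming the held-out units are samples, each with its own outcome. Contiguous blocks of n samples are held out in turn. The `fit` and `predict` callables are hypothetical placeholders for whatever screening-based predictor is used:

```python
def leave_n_out(n_samples, n):
    """Yield (training, held_out) index lists, leaving out a different block
    of n samples each time (the final block may be smaller)."""
    indices = list(range(n_samples))
    for start in range(0, n_samples, n):
        held_out = indices[start:start + n]
        training = indices[:start] + indices[start + n:]
        yield training, held_out

def cross_validated_predictions(x, y, n, fit, predict):
    """Out-of-sample prediction for every sample; `fit` and `predict` are
    caller-supplied (hypothetical) model functions."""
    predictions = [None] * len(y)
    for training, held_out in leave_n_out(len(y), n):
        model = fit([x[i] for i in training], [y[i] for i in training])
        for i in held_out:
            predictions[i] = predict(model, x[i])
    return predictions
```

For example, `fit=lambda xs, ys: sum(ys) / len(ys)` with `predict=lambda model, xi: model` predicts each held-out outcome as the mean of the training outcomes.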

Clustering/high-dimensional analysis

Principal components
Find coordinates (vectors) that capture the most variance in the data, thereby reducing dimensionality and filtering out noise.
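The leading principal component can be sketched with power iteration on the covariance matrix. Power iteration is one simple way to obtain it; the page does not say how Genetrix computes PCs. Rows are observations, and the Rayleigh quotient gives the variance captured along the returned direction:

```python
import math
import random

def first_principal_component(rows, iterations=200, seed=0):
    """Leading principal component of `rows` (observations x variables).
    Returns (unit direction, variance captured)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        # Repeated multiplication by the covariance matrix converges to its top eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance
```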
PC shaving
Finds the first principal component (PC) and attempts to find a small, representative group of homogeneous genes by progressively shaving off the genes that contribute least to the PC. The procedure is then repeated for additional PCs.
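One shaving pass can be sketched as follows. Each gene's contribution is measured as the absolute projection of its centered profile onto the leading direction. At every step a fixed fraction of the remaining genes is removed, producing nested gene sets. The 10% shaving fraction, the stopping size and the projection-based contribution score are assumptions of this sketch. Choosing the best set size and moving on to further PCs, for example by orthogonalizing against the first, are not shown:

```python
import math
import random

def leading_direction(rows, iterations=200, seed=0):
    # Leading right singular vector of `rows` (genes x samples) by power iteration
    p = len(rows[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in rows]          # X v
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def shave(expression, fraction=0.1, min_size=2):
    """One shaving pass over {gene_id: [value per sample]}.
    Returns the nested gene sets visited, largest first."""
    centred = {g: [x - sum(vals) / len(vals) for x in vals] for g, vals in expression.items()}
    current = list(centred)
    nested = [current[:]]
    while len(current) > min_size:
        v = leading_direction([centred[g] for g in current])
        loading = {g: abs(sum(a * b for a, b in zip(centred[g], v))) for g in current}
        drop = min(max(1, int(len(current) * fraction)), len(current) - min_size)
        # Keep the genes that contribute most to the leading direction
        current = sorted(current, key=lambda g: loading[g], reverse=True)[:len(current) - drop]
        nested.append(current[:])
    return nested
```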



All contents ©2004-2007 Epicenter Software. All rights reserved.