
Probeset Modeling: Genetrix incorporates three algorithms for probe set modeling: Probe Profiler (Corimbia, Inc.), RMA and a dChip-based approach.
Probe Analysis: View data from each of the perfect match (PM) and mismatch (MM) probes for a gene. Plots show the signal level for each probe, and the PM-MM differences. Examine sample-to-sample consistency in probe patterns by cycling through each sample.
Impute Missing: Missing values may be imputed using a k-nearest neighbor algorithm.
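As an illustration of k-nearest neighbor imputation, here is a minimal sketch. It assumes a genes-by-samples matrix with None for missing entries, a root-mean-square distance over the columns both rows share, and a plain average of the k nearest rows. The function name knn_impute and these choices are hypothetical; the source does not describe how Genetrix implements the method.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries using the k most similar rows (genes) of the matrix."""
    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not pairs:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows that have a value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Assumption: unweighted mean of the neighbours' values.
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

expression = [
    [1.0, 2.0, None],   # gene with a missing value in the third sample
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]
filled = knn_impute(expression, k=2)
print(filled[0])
```

In the example, the two rows closest to the incomplete gene supply the replacement value, while the distant fourth row is ignored.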
Transformations: Pre-process expression data with standardization, normalization, log transform and/or data permutation. The null distribution of any test statistic can be checked by permuting either samples or genes.
Categorical Characteristic: Compare a group of samples (individuals) with respect to a dichotomous covariate, such as gender, or a k-group categorization, such as race, in a 2xk table. Test for homogeneity using a chi-square or exact p-value.
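The chi-square test of homogeneity on a 2xk table can be sketched as follows. The table values and the function name chi_square_homogeneity are made up for illustration. The p-value line uses the closed form that holds only for 2 degrees of freedom; a statistics library would normally supply the tail probability for other cases.

```python
import math

def chi_square_homogeneity(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the rows share the same column distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: selected samples vs. all others; columns: k = 3 categories (illustrative).
table = [[10, 20, 30], [20, 20, 20]]
statistic, df = chi_square_homogeneity(table)
# For df == 2 the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2) if df == 2 else None
print(statistic, df, p_value)
```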
Continuous Characteristic: Compare a group of samples (individuals) with respect to a continuous covariate, such as age or serum cholesterol, using a histogram of values (these samples vs. all others), and test for equality with the t-test, Wilcoxon rank-sum test and logistic regression.
Survival Analysis: Compare a survival-type outcome (survival, event-free survival, etc.) for a group of patients vs. all other patients, or for k groups of patients defined by a covariate, with a Kaplan-Meier plot and logrank statistics.
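The Kaplan-Meier curve behind such a plot is the product-limit estimate of survival. A minimal sketch, assuming follow-up times with an event indicator (1 = event observed, 0 = censored); the function name and the data are illustrative, not taken from Genetrix:

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]   # 0 marks a censored observation
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients leave the risk set without lowering the curve, which is why the step at time 3 divides by 3 patients at risk rather than 4.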
Key Covariates: Define a set of sample covariates to be used routinely when assessing a list of samples; samples in the list are contrasted to all others, for each key covariate, to provide a single display of key categorical, continuous and survival comparisons.
A key covariate thumbnail can be used to annotate sample subgroups on a variety of graphical displays.
Scatterplots: Plot expression values with one point per sample for two (or three) genes against each other, or with one point per gene for 2 or 3 samples (or groups of samples). Scatterplots can also show expression values against a selected covariate, one principal component against another, or average expression over one defined group of genes versus another. Plots show lines of best fit and confidence intervals, plus the correlation coefficient and p-value.
Correlations between pairs of samples can be summarized for many genes in a similarity matrix.
2-Group: Compare the distribution of gene expression values (in a histogram) for 2 groups of samples, defined by a covariate. Values can be for a single gene or the average over a set of genes. Test significance of differences in distribution with the t-test and Wilcoxon non-parametric test.
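For the non-parametric comparison, a Wilcoxon rank-sum test can be sketched with a normal approximation. This version assigns mid-ranks to ties but applies no tie or continuity correction, so it is an approximation for illustration only; the names midranks and rank_sum_test and the data are hypothetical.

```python
import math

def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, z score, two-sided p-value)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                      # rank sum of the first group
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return u, z, p

low_group = [1.2, 1.5, 1.1, 1.4]
high_group = [2.3, 2.8, 2.5, 2.6]
u, z, p = rank_sum_test(low_group, high_group)
print(u, z, p)
```

With groups this small an exact test would be preferable; the normal approximation is used here only to keep the sketch short.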
k-Group: Compare the distribution of gene expression values (in a histogram) for k subgroups of samples, defined by a covariate. Values can be for a single gene or the average over a set of genes. Test significance of differences in distribution with ANOVA and Kruskal-Wallis non-parametric test. Alternatively display differences in the form of a box plot.
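The ANOVA comparison for k groups reduces to an F ratio of between-group to within-group variance. A minimal sketch with made-up data; the function name is hypothetical, and converting F to a p-value would require the F distribution from a statistics library:

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Expression values for k = 3 subgroups of samples (illustrative numbers).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```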
Continuous Covariate: Correlate the distribution of gene expression values (in a scatterplot) with values of a covariate (e.g. mitotic index) or with expression of a specified gene. Values can be for a single gene or the average over a set of genes. Determine correlation coefficient and p-value.
Time Series: Display gene expression values for an ordered set of observations (e.g. a time series), optionally grouped by a covariate (e.g. treated vs. control). Values can be for a single gene or the average over a set of genes.
Survival Analysis: Display a Kaplan-Meier plot comparing patients with high vs. low expression (above vs. below median value) or in tertiles. Assess significance with homogeneity and trend logrank chi-squares. Expression values can be for a single gene or the average over a set of genes.
Key Comparisons:
Attribute Analysis:
Scatterplot:
Chromosome Map: Genes mapped to a chromosomal ideogram are grouped by their proximity in the genome, and the properties of these groups can be statistically evaluated and graphically displayed. The statistical evaluations include:
GeneScreen:
Analysis: Results include (a) a list of the genes most strongly associated with the specified outcome, with the statistic value and p-value, (b) a histogram of test statistics, compared to the theoretical null distribution, (c) a Q-Q plot that shows how the observed quantiles differ from theoretical ones, (d) a scatterplot of the strength of association against its statistical significance, for each gene, and (e) a graphical display of p-value thresholds and selected genes for given false discovery rates.
Selected-Gene Display: Clicking on a single gene displays a detailed single-gene analysis:
Permutation Distribution: Used to derive a more robust null distribution and more accurate p-value.
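A permutation null distribution can be sketched by shuffling sample labels and recomputing the test statistic each time. The example below assumes a difference in group means as the statistic and a two-sided p-value with an add-one correction; the statistic, the number of permutations and all names are illustrative choices, not details confirmed by the source.

```python
import random
import statistics

def mean_difference(values, labels):
    """Difference in mean expression between label 1 and label 0 samples."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    return statistics.mean(a) - statistics.mean(b)

def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling labels breaks any real association with the values.
        rng.shuffle(shuffled)
        if abs(mean_difference(values, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

values = [5.1, 4.9, 5.3, 5.0, 2.1, 1.9, 2.2, 2.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = permutation_p_value(values, labels)
print(observed, p_value)
```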
Multiple Comparison Adjustment: Comparison of the observed statistic distribution against the expected distribution indicates whether there are more statistically significant associations than might be expected by chance. The false discovery rate can be used to estimate the number of false positive results within a group of significant outcomes, and p-values can be adjusted to reflect the multiple testing.
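One widely used way to adjust p-values for the false discovery rate is the Benjamini-Hochberg procedure. The source does not say which procedure Genetrix applies, so the sketch below is an assumption chosen for illustration, with made-up p-values:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Genes whose adjusted value falls below a chosen rate (for example 0.05) form a list in which roughly that fraction is expected to be false positives.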
Multivariate Analyses: For multivariate analytic methods, variables can be added to the analyses as adjusting factors. A variable can be used unchanged, can be automatically converted to a dichotomous value (with a specified cut-point) or can be used to create a series of dummy variables representing each distinct value of the variable.
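The two recodings described above can be sketched as follows. The helper names, the "greater than" rule at the cut-point, and the option to drop the first level as a reference category are assumptions made for the example.

```python
def dichotomize(values, cut_point):
    """Code each value as 1 if it is above the cut-point, else 0."""
    return [1 if v > cut_point else 0 for v in values]

def dummy_variables(values, drop_first=True):
    """Create one 0/1 indicator per distinct value of a covariate."""
    levels = sorted(set(values))
    if drop_first:
        # Assumption: the first level serves as the reference category.
        levels = levels[1:]
    return {level: [1 if v == level else 0 for v in values] for level in levels}

ages = [34, 61, 47, 70]
stages = ["II", "I", "III", "II"]
age_over_50 = dichotomize(ages, cut_point=50)
stage_dummies = dummy_variables(stages)
print(age_over_50)
print(stage_dummies)
```

Dropping one level avoids redundant columns when the model already includes an intercept; passing drop_first=False keeps an indicator for every distinct value.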
Matched Analyses: For some methods of analysis it is possible to link samples as matched pairs or groups (strata).
Stepwise Gene Selection: Automatically screen all genes for the one most significantly associated with the outcome, select that gene as an adjusting covariate, and then re-screen all remaining genes to see which is most significant (after adjustment).
Repeat this process until the minimum adjusted p-value exceeds a user-specified threshold.
Predictions Leave-n-Out: Screen genes using all but a set of n samples, and use the results to predict the outcome for the n excluded samples. Repeat, leaving a different set of n samples out each time.
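A leave-n-out loop can be sketched as below, assuming the held-out units are samples, as in standard cross-validation. The screening rule (largest difference in group means) and the predictor (nearest group mean on the selected gene) are deliberately simple stand-ins, not the methods Genetrix uses, and every name is hypothetical.

```python
import statistics

def screen_best_gene(expr, labels, train_idx):
    """Pick the gene with the largest absolute difference in group means."""
    best_gene, best_score = None, -1.0
    for g, row in enumerate(expr):
        ones = [row[i] for i in train_idx if labels[i] == 1]
        zeros = [row[i] for i in train_idx if labels[i] == 0]
        score = abs(statistics.mean(ones) - statistics.mean(zeros))
        if score > best_score:
            best_gene, best_score = g, score
    return best_gene

def leave_n_out_accuracy(expr, labels, n=1):
    """Hold out n samples at a time; screen and predict without them."""
    n_samples = len(labels)
    correct = 0
    for start in range(0, n_samples, n):
        held_out = list(range(start, min(start + n, n_samples)))
        train = [i for i in range(n_samples) if i not in held_out]
        # Screening is repeated inside each fold so held-out samples never
        # influence which gene is selected.
        gene = screen_best_gene(expr, labels, train)
        row = expr[gene]
        mean1 = statistics.mean(row[i] for i in train if labels[i] == 1)
        mean0 = statistics.mean(row[i] for i in train if labels[i] == 0)
        for i in held_out:
            predicted = 1 if abs(row[i] - mean1) < abs(row[i] - mean0) else 0
            correct += predicted == labels[i]
    return correct / n_samples

# Genes in rows, samples in columns (illustrative values).
expr = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],   # informative gene
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],   # uninformative gene
]
labels = [1, 1, 1, 0, 0, 0]
accuracy = leave_n_out_accuracy(expr, labels, n=1)
print(accuracy)
```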
Principal Components: Find coordinates (vectors) which capture the most variance in the data, thereby reducing dimensionality and providing significant noise reduction.
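To show what "capturing the most variance" means, the first principal component can be found by power iteration on the covariance matrix. This is a small pure-Python sketch for two variables; real analyses would use a linear algebra library. The function name, the iteration count and the data are assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """rows: samples x variables. Returns (unit vector, fraction of variance)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance

# Two strongly correlated variables measured on four samples (illustrative).
rows = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]]
component, explained = first_principal_component(rows)
print(component, explained)
```

Because the two variables move together, a single component carries nearly all of the variance, which is the dimensionality reduction the feature describes.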
Principal Component (PC) Shaving: Finds the first principal component (PC) and attempts to find a small, representative group of homogeneous elements by progressively shaving off the genes that contribute least to the PC. The procedure repeats for additional PCs.