|
On this page I share
some of the results and tools I have developed for the analysis of gene
expression data. I have developed the R package SAGx (Statistical Analysis of the GeneChip).
Note that the
package has moved from CRAN to Bioconductor.
The latest Bioconductor release of SAGx is here.
For those who want to take advantage of the latest changes, the version in the development
branch is recommended. Any comments are
appreciated; just send me an e-mail.
The ideas behind the samroc function in SAGx are explained in this document. A simulation script for testing statistical methods for
identifying differentially expressed genes is also provided. Additionally,
the function pava.fdr,
which calculates an estimate of FDR using isotonic regression, is explained
in this
article; see also the deposited
manuscript.

Comments on Statistical methods for ranking differentially expressed genes

In the article a goodness
criterion C for a ranking is introduced; see the reference. This criterion
was chosen because it is increasing in the false positive and false negative
rates, and, based on a small set of simulations, it
turned out to be easier to estimate than e.g. the sum of the false positive
and false negative rates. However, the above-mentioned software outputs this
sum. This makes it possible for the analyst to choose the size of the top
list such that this sum reaches its minimum. To do so, rank the genes by p-value
and then choose the cut-off where ‘error’ reaches its minimum.
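
As a rough illustration of the cut-off rule just described, the sketch below ranks p-values and picks the top-list size that minimises an estimated sum of false positive and false negative rates. It is not the SAGx implementation. The function name `choose_cutoff`, the plug-in estimate, and the assumptions that both rates are expressed as proportions of all genes and that the proportion `p0` of unchanged genes is known are mine, for illustration only.

```python
def choose_cutoff(p_values, p0):
    """Return (best_k, errors) for a list of p-values.

    Assumptions (illustrative, not taken from SAGx):
    - p0 is the proportion of unchanged genes, treated as known;
    - null p-values are uniform, so a cut-off alpha yields an expected
      false positive proportion of p0 * alpha among all genes;
    - the false negative proportion is (1 - p0) minus the estimated
      proportion of true positives in the top list.
    """
    m = len(p_values)
    ranked = sorted(p_values)
    errors = []
    for k, alpha in enumerate(ranked, start=1):
        fp = p0 * alpha                 # estimated false positives
        tp = k / m - fp                 # estimated true positives
        fn = (1.0 - p0) - tp            # estimated false negatives
        errors.append(fp + fn)
    # Top-list size whose estimated error sum is smallest.
    best_k = min(range(1, m + 1), key=lambda k: errors[k - 1])
    return best_k, errors


# Hypothetical p-values: three clearly small ones, the rest null-like.
example = [0.001, 0.002, 0.004, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
best_k, errors = choose_cutoff(example, p0=0.7)
print(best_k)
```

Under these assumptions the estimated sum starts near 1 - p0 at the top of the list and equals p0 when every gene is included.
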
Typically, the sum decreases initially as one goes down the ranked gene list,
and then starts to grow until it reaches p0, the proportion of unchanged genes.

Comments on A comparative review of estimates of the proportion unchanged genes and the false discovery rate

The Averaging Theorem
implies that any estimate of LFDR implicitly
defines an estimate of FDR. Therefore, I calculated SEP.FDR
by integration (or summing, as it were) of the SEP LFDR
estimate. Note that the function twilight
presents an FDR estimate based on qvalue. However,
I present SEP.FDR, not qvalue,
under the SEP heading, since I want to keep the issues separate so that one can reach
conclusions regarding the ideas.

To retrieve my PubMed entries from the last ten years, click here.
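
The remark above about the Averaging Theorem, that an LFDR estimate implicitly defines an FDR estimate, can be illustrated with a small sketch. It assumes the local FDR values are already ordered by increasing p-value, and takes the FDR of the top k genes to be the mean of their LFDR values. The function name `fdr_from_lfdr` and the input values are hypothetical; the actual SEP.FDR computation may differ in detail.

```python
def fdr_from_lfdr(lfdr_sorted):
    """Running mean of local FDR values ordered by increasing p-value.

    Element k-1 of the result is the average LFDR over the top k genes,
    used here as an empirical FDR estimate for that top list.
    """
    fdr = []
    total = 0.0
    for k, value in enumerate(lfdr_sorted, start=1):
        total += value
        fdr.append(total / k)
    return fdr


# Hypothetical LFDR estimates for five genes, most significant first.
lfdr = [0.0, 0.1, 0.2, 0.5, 0.7]
print([round(x, 3) for x in fdr_from_lfdr(lfdr)])
```

Because a running mean never exceeds the largest value averaged so far, the FDR at any cut-off is at most the LFDR at that cut-off when the LFDR is non-decreasing down the list.
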
|