False Discovery Rate for 2X2 Contingency Tables

This tool estimates the False Discovery Rate (FDR) for 2X2 contingency tables, based on Fisher's exact test.

Overview

When testing a large number of hypotheses, it can be helpful to
estimate or control the false discovery rate (FDR), the expected
proportion of tests called significant that are truly null. We use Fisher's exact test to compute the exact null distribution over a set of 2X2 contingency tables.
Using these statistics we compute pooled p-values, estimate the ratio of true nulls, and compute q-values.
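
For intuition, here is a minimal Python sketch of the kind of computation involved: a Fisher exact p-value for each table, the "2 X avg(p-value)" estimate of PI described below, and a standard monotone mapping from p-values to q-values. The example tables and function name are hypothetical, and this is not the tool's exact algorithm (which pools the exact null distributions; see the technical report).

    # Illustrative sketch only, not the tool's implementation.
    from scipy.stats import fisher_exact

    def q_values(tables):
        # Two-sided Fisher exact p-value for each 2X2 table [[a, b], [c, d]].
        pvals = [fisher_exact(t)[1] for t in tables]
        m = len(pvals)
        # "2 X avg(p-value)" estimate of PI (ratio of true nulls), capped at 1.
        pi = min(1.0, 2.0 * sum(pvals) / m)
        # q-value: estimated FDR when calling all tests with p-value <= p
        # significant, made monotone by walking from the largest p-value down.
        order = sorted(range(m), key=lambda i: pvals[i])
        qvals = [0.0] * m
        prev = 1.0
        for rank in range(m, 0, -1):
            i = order[rank - 1]
            prev = min(prev, pi * m * pvals[i] / rank)
            qvals[i] = prev
        return pvals, pi, qvals

    # Hypothetical example: three 2X2 tables of counts.
    print(q_values([[[12, 3], [2, 14]], [[5, 5], [6, 4]], [[20, 1], [3, 18]]]))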

Application

We provide two versions of the application: a command line interface (FalseDiscoveryRate.exe) and a graphical user interface (FalseDiscoveryRateUI.exe). For a full description of the available options, please refer to the technical report.

The graphical user interface supports the following options:
  • Input file - a tab-delimited input file containing the descriptions of all the tables. Each line in this file must contain one table, except for the first line, which may contain column headers (an illustrative layout is shown after this list).
  • Output file - the results of the computation will be saved as a tab-delimited text file.
  • Data properties
    • First row contains column headers - use this option to ignore the first line of the input file.
    • Table counts start at column - use this to ignore a number of columns before the table counts, such as the description of the table.
  • Operation properties
    • Filter irrelevant tables - use this option to ignore tables whose marginals cannot possibly achieve a significant p-value.
    • PI evaluation method - the method used to evaluate the ratio of true nulls in the test set. We support the following options:
      • PI = 1 - do not evaluate PI.
      • Sum observed p-values / expected - estimate PI as the ratio between the number of observed p-values in the test set and the expected number of tests with that p-value under the null hypothesis. When filtering is selected, this is the only valid option.
      • 2 X avg(p-value) - estimate PI as twice the average p-value.
    • Compute Positive FDR (pFDR) - use this option to compute the positive false discovery rate.
    • Use sampling - use this option when the set of tables is too large. The computation will be executed only over a sample of the input tables, and the resulting mapping between p-values and q-values will then be applied to all tables. You can either specify the sample size or select Automated sampling, which starts with a small sample and grows it until no significant change in the p-value to q-value mapping is detected.
    • Huge dataset mode - use this mode to aggregate tables with identical counts. This reduces memory consumption at the cost of slower computation.
  • Output properties
    • Report progress while running - when this option is selected, a dialog reporting the progress of the computation will be displayed.
    • Output all the computed statistics - when this option is selected, all the computed statistics will be saved to the output file. Otherwise, only the input data and the q-values will be written to the output file.
    • Output only tables with q-value less than - use this to restrict the output to tables with a q-value below the specified threshold.
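
As referenced above, the following is an illustrative input file layout (columns are separated by tab characters; the table names, headers, and count order are hypothetical - refer to the technical report for the exact expected format). With this layout, the first row contains column headers and the table counts start at the second column:

    Name    a    b    c    d
    gene1   12   3    2    14
    gene2   5    5    6    4

Here a, b, c, and d would be the four cell counts of each 2X2 table.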

The command line application supports similar options. Run the program without any parameters to see the available options.

References

Jonathan Carlson, David Heckerman, and Guy Shani. False Discovery Rate for 2X2 Contingency Tables. Technical report, Microsoft Research, 2009.

Web Version

http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/

Source Code Files

  • Solution file: FalseDiscoveryRate\FalseDiscoveryRate.sln
  • Build directory: FalseDiscoveryRate\bin\Release

Compiling the project requires
  • Visual Studio 2008 SP1

