Project Description

Computational biology tools from Microsoft Research.

The Tools

Different tools have different licenses. Check each tool's license individually.

  • FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for efficiently performing association studies, prediction, and heritability estimation on large data sets. FaST-LMM runs on both Windows and Linux, and has been tested on data sets with over one million samples.

    Software versions:

    FaST-LMM (Python): This version is our most up-to-date release and is available on GitHub.

    FaST-LMM (C++): This version supports univariate GWAS and epistatic tests. The release includes Windows binary, Linux binary, and source.

    FaST-LMM-EWASher: This version supports corrections for cellular heterogeneity in methylation and similar data. The release includes a Python version and an R version.

  • eLMM
    • eLMM (Eliminate Confounding in eQTL studies with Linear Mixed Models) is a program for performing eQTL analysis in the presence of two confounders: (1) population structure and (2) expression heterogeneity.
  • PhyloD
    • This project has moved to GitHub
    • A web server exposing this tool is available at
    • Pathogens live and reproduce inside the human host, whose immune system continually tries to rid the body of these pathogens. This leads to a tug-of-war between the pathogen and the human host, where the pathogen tries to adapt so as to “escape” the immune system, while the immune system learns to recognize and eliminate new foreign pathogens. A set of key players for the immune system are the HLA proteins, each of which can recognize specific short fragments of foreign (e.g., HIV) proteins, called epitopes, in infected cells and then alert the immune system to their presence. For rapidly evolving pathogens like HIV, a key defense mechanism is to evolve mutations that prevent the HLA proteins from recognizing the viral epitopes. This evolution takes place anew in each patient, as each patient has a different set of HLA proteins that recognize different epitopes. PhyloD is a statistical tool that can identify HIV mutations that defeat the function of the HLA proteins in certain patients, thereby allowing the virus to escape elimination by the immune system. By applying this tool to large studies of infected patients, researchers are now able to start decoding the complex rules that govern the HIV mutations, in the hope of one day creating a vaccine to which the virus is unable to develop resistance.
  • Epitope Prediction
    • This tool computes the probability that a given k-mer is a T-cell epitope restricted to a given HLA allele. The tool can scan for epitopes of length 8, 9, 10, and 11 across all common HLA alleles.
  • HLA Completion
    • A web server exposing this tool is available at
    • HLA sequence typing sometimes yields uncertain results. For example, an allele may be identified as A6801/6802 or simply A02. This tool takes as input HLA typing data (loci A, B, and C) and probabilistically resolves the typing ambiguities (i.e., probabilistically “completes” the data to 4-digit resolution).
  • HLA Assignment
    • One way to find epitopes is to do lab studies such as ELISPOT. One problem with this approach is that, if you see a reaction in a patient, you don’t know which of the patient’s HLA genes is responsible for the reaction. This tool takes lab data from a series of patients and determines (probabilistically) which HLA genes are responsible for the reaction.
  • Create Epitome
    • This tool takes, as input, a weighted list of amino acid sequences. It creates epitomes of all lengths.
  • False Discovery Rate
    • Estimates the false discovery rate for 2x2 contingency tables, based on Fisher's exact test.
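
To make the False Discovery Rate entry above concrete, here is a minimal sketch in pure Python of how a false discovery rate might be estimated for a batch of 2x2 contingency tables. It is not the tool's actual implementation: the description does not say which FDR procedure the tool uses, so the sketch substitutes the standard Benjamini-Hochberg adjustment applied to two-sided Fisher's exact test p-values. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: BH is a stand-in here; the tool may use a different estimator.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical tables (a, b, c, d); the counts are made up for illustration.
    tables = [(8, 2, 1, 5), (3, 3, 3, 3), (10, 0, 0, 10)]
    p_values = [fisher_exact_two_sided(*t) for t in tables]
    for table, p, q in zip(tables, p_values, benjamini_hochberg(p_values)):
        print(table, f"p={p:.4g}", f"q={q:.4g}")
```

Each q-value is the smallest FDR level at which that table would be called significant under the Benjamini-Hochberg procedure. Because Fisher's exact test p-values are discrete, this adjustment tends to be conservative, which is one reason a dedicated tool may use a different estimator.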

Web Versions of the Tools

Pre-Compiled Programs for Windows

General Practical Information


Last edited Aug 8 at 4:49 PM by heckerma, version 57