Normtools

Written by Chris Barnes, updates by Sam Robson

Introduction

A collection of C++ normalization and preprocessing tools. Most of the tools were designed with CNV analyses in mind, but some may be useful in more general applications. The emphasis is on flexibility: the programs mimic Unix tool behaviour, reading input from stdin and printing to stdout, which allows the data to be piped efficiently from one tool to the next.

Installation

1) To check out the package use anonymous CVS access:

cvs -z3 -d:pserver:anonymous@cnv-tools.cvs.sourceforge.net:/cvsroot/cnv-tools co -P Normtools

2) Change into the Normtools directory and type make

cd Normtools
make

This should build four applications: univariate_quantile_norm, medianIQR, make_matrix and population_medianIQR.

3) Run the tests to make sure everything is working

make test

You should see "Test completed successfully". The script test/test.sh contains examples of the data formats and of how to run the programs.

univariate_quantile_norm

Performs quantile normalization on a univariate signal. Quantile normalization is described in this paper:

A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.
Bolstad BM, Irizarry RA, Astrand M, Speed TP.
Bioinformatics. 2003 Jan 22;19(2):185-93

Input

Quantile normalization is essentially a two-step process: first the target distribution is generated, then each sample is normalized to the target.
In generation mode (-gen, see below) the input via stdin is a list of full paths to the data files whose distributions are used to estimate the target distribution. The data files are assumed to contain tab-delimited columns of data.
In correction mode the input via stdin is the data file containing the distribution to be normalized to the target distribution.
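
The two steps can be sketched in C++ as follows. This is a minimal illustration of quantile normalization in the style of Bolstad et al., not the code of univariate_quantile_norm itself. The function names make_target and normalize_to_target are hypothetical, and the sketch assumes that all samples have the same length and that the target is the mean of the sorted samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Step 1 (hypothetical helper): build a target distribution as the
// element-wise mean of the sorted samples. Assumes equal-length samples.
std::vector<double> make_target(std::vector<std::vector<double>> samples) {
    assert(!samples.empty());
    const std::size_t n = samples.front().size();
    std::vector<double> target(n, 0.0);
    for (auto& s : samples) {
        assert(s.size() == n);
        std::sort(s.begin(), s.end());
        for (std::size_t i = 0; i < n; ++i) target[i] += s[i];
    }
    for (double& t : target) t /= static_cast<double>(samples.size());
    return target;
}

// Step 2 (hypothetical helper): replace each value in the sample by the
// target value that has the same rank, keeping the original order.
std::vector<double> normalize_to_target(const std::vector<double>& sample,
                                        const std::vector<double>& target) {
    assert(sample.size() == target.size());
    std::vector<std::size_t> order(sample.size());
    std::iota(order.begin(), order.end(), 0);
    // order[rank] is the index of the value with that rank in the sample.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return sample[a] < sample[b];
                     });
    std::vector<double> out(sample.size());
    for (std::size_t rank = 0; rank < order.size(); ++rank)
        out[order[rank]] = target[rank];
    return out;
}
```

In this sketch ties are broken by input order. A real implementation may instead average the target values over tied ranks, and may interpolate when the sample and target lengths differ.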

Output

In generation mode the target distribution will be written to -targetfile [file].
In correction mode the corrected distribution (along with any columns specified by -ext_columns [n1 n2 n3]) will be output to stdout.

Options


medianIQR

Performs median and/or interquartile normalization within samples.
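
As a rough illustration of what such a normalization can look like, the C++ sketch below centres a sample on its median and scales it by its interquartile range. It is written under assumptions and does not reproduce medianIQR itself: the function names quantile and median_iqr_normalize are hypothetical, and quartiles are computed by linear interpolation between order statistics, which may differ from the convention the tool uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile by linear interpolation between order statistics.
// This convention is an assumption, not taken from the tool.
double quantile(std::vector<double> v, double p) {
    assert(!v.empty() && p >= 0.0 && p <= 1.0);
    std::sort(v.begin(), v.end());
    const double pos = p * static_cast<double>(v.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    return v[lo] + (pos - static_cast<double>(lo)) * (v[hi] - v[lo]);
}

// Subtract the sample median and divide by the interquartile range.
// If the IQR is zero the values are only centred, to avoid dividing by zero.
std::vector<double> median_iqr_normalize(const std::vector<double>& x) {
    const double med = quantile(x, 0.5);
    const double iqr = quantile(x, 0.75) - quantile(x, 0.25);
    std::vector<double> out;
    out.reserve(x.size());
    for (double value : x)
        out.push_back(iqr > 0.0 ? (value - med) / iqr : value - med);
    return out;
}
```

For the sample 1, 2, 3, 4, 5 the median is 3 and the interquartile range is 2 under this convention, so the normalized values are -1, -0.5, 0, 0.5 and 1.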

Options


make_matrix

Combines data from individuals to make matrices of intensities.
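
The idea of combining per-individual data into a matrix can be sketched as below. This is only an illustration under assumptions: the function name make_matrix_text is hypothetical, rows are taken to be probes and columns individuals, and the output is tab-delimited. The actual orientation and file format used by make_matrix should be checked against test/test.sh.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Combine one intensity vector per individual into a tab-delimited matrix.
// Assumed layout: one row per probe, one column per individual.
std::string make_matrix_text(
        const std::vector<std::vector<double>>& individuals) {
    assert(!individuals.empty());
    const std::size_t n_probes = individuals.front().size();
    std::ostringstream out;
    for (std::size_t p = 0; p < n_probes; ++p) {
        for (std::size_t i = 0; i < individuals.size(); ++i) {
            // Every individual must provide a value for every probe.
            assert(individuals[i].size() == n_probes);
            if (i > 0) out << '\t';
            out << individuals[i][p];
        }
        out << '\n';
    }
    return out.str();
}
```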

Options


population_medianIQR

Applies a cross-population median and/or interquartile normalization to matrices of intensities.

Options