WaveNorm
Introduction
This program implements loess normalization to correct wave-like correlations in signal intensities across the genome.
The method is described in the paper:
Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization.
Marioni JC, Thorne NP, Valsesia A, Fitzgerald T, Redon R, Fiegler H, Andrews TD, Stranger BE, Lynch AG, Dermitzakis ET, Carter NP, Tavaré S, Hurles ME.
Genome Biology, 2007;8(10):R228
Most of this application consists of the loess normalization code of Cleveland, Grosse, and Shyu, obtained from:
http://netlib.bell-labs.com/netlib/a/dloess.gz
Installation
1) To check out the package use anonymous CVS access:
cvs -z3 -d:pserver:anonymous@cnv-tools.cvs.sourceforge.net:/cvsroot/cnv-tools co -P WaveNorm
2) If your system has BLAS installed, change into the WaveNorm directory and type make:
cd WaveNorm
make
This should build the WaveNorm application.
If your system does not have BLAS installed, you will need to download the BLAS libraries and install them on your system. You will then need to edit the Makefile so that the BLAS variable points to the correct directory.
Usage
Invoking WaveNorm without any arguments, or with the -h flag, should print a list of options. For example:
./WaveNorm -h
To run on the test data set, which consists of one individual's LRR (log R ratio) values for chromosome 21q on the Illumina 370 platform, use:
./WaveNorm -f test/test_21q.dat -c 21 -r 3 -w 2000000 -l 50 -t 1.0 > results.dat
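Before running WaveNorm on your own data, the input has to be a tab-delimited file with no NaN values (see the -f option). The following is a minimal Python sketch of one way to filter such rows, not part of WaveNorm itself. The function name drop_nan_rows is hypothetical, and the sketch assumes the chr, coord, ratios... column layout. Note that it also drops any row with a non-numeric ratio, which would include a header line.

```python
import math


def drop_nan_rows(lines):
    """Yield only rows whose ratio columns (3rd column onward) are finite numbers.

    Assumes tab-separated rows laid out as: chr, coord, ratio columns.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or truncated line, no ratio column present
        try:
            ratios = [float(v) for v in fields[2:]]
        except ValueError:
            continue  # non-numeric ratio, e.g. "NA" or an empty cell
        if all(math.isfinite(r) for r in ratios):
            yield "\t".join(fields)


# Small in-memory demonstration with made-up values.
sample = ["21\t100\t0.12\n", "21\t200\tNaN\n", "21\t300\t-0.05\n"]
for row in drop_nan_rows(sample):
    print(row)
```

To clean a real file, the same function could be applied to an open file handle and the surviving rows written to a new file that is then passed to -f.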
Options
-
-f [input file name]
Tab-delimited file with the format: chr coord ratios(1) ratios(2) ratios(3)...
Any number of ratio columns is permitted. NaN values should be removed before running WaveNorm.
-
-c [chromosome name]
Can be anything that matches the chr column in the input file.
-
-s [start coord]
-e [end coord]
Not implemented.
-
-r [ratio value column number]
Column of input file containing ratio values.
-
-p [span value]
The proportion of the sequence contained in each window during fitting. Ideally, the window should be larger than most expected CNVs.
-
-w [window size, in bp]
Use instead of the -p option. Specify a window size in bp and the span will be calculated automatically from the number of data points and the coordinate range. The recommended value is 2000000 (2 Mb).
-
-l [end padding values]
Number of simulated data points by which to extend the chromosome ends.
-
-d [dynamic end padding]
Use instead of the -l option. Sets the length of the end padding in proportion to the number of points in the dataset, e.g. 0.01 for 1% padding at each end.
-
-t [outlier cutoff]
Values above this threshold are not used in the fit, i.e. their weight is set to zero.
-
-h
Displays help message.
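To get a feel for how the -w, -d and -t values relate to a data set, the arithmetic can be sketched in Python. This is an illustration under stated assumptions, not WaveNorm's actual code: the function names are hypothetical, the span is approximated as window size divided by coordinate range (which assumes roughly evenly spaced probes), and the cutoff is applied to the raw value exactly as the -t description is worded.

```python
def span_from_window(coords, window_bp):
    """Approximate loess span for a window of window_bp base pairs.

    Assumption: probes are spread roughly evenly, so a window covers about
    window_bp / (coordinate range) of the data points. WaveNorm's own
    calculation may differ.
    """
    coord_range = max(coords) - min(coords)
    if coord_range <= 0:
        return 1.0  # all points share one position: use every point
    return min(1.0, window_bp / coord_range)


def dynamic_padding(n_points, fraction):
    """Number of simulated points added at each end for a -d style fraction."""
    return int(round(n_points * fraction))


def fit_weights(values, cutoff):
    """Weight 0 for values above the cutoff, 1 otherwise."""
    return [0.0 if v > cutoff else 1.0 for v in values]


# Made-up example: three probes spanning 20 Mb, with a 2 Mb window.
coords = [10_000_000, 20_000_000, 30_000_000]
print(span_from_window(coords, 2_000_000))
print(dynamic_padding(5000, 0.01))
print(fit_weights([0.2, 1.5, -2.0, 1.0], 1.0))
```

Under these assumptions, a 2 Mb window on a 20 Mb region corresponds to a span of about one tenth of the data points, and 1% dynamic padding on 5000 points adds 50 simulated points at each end.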
Advanced usage
The loess tools within WaveNorm can be used inside your own C++ code. Include the header file WaveNorm.hpp and link against libWaveNorm.a. See WaveNorm.hpp for the available functions and WaveNormApp.cpp for examples of how to call them.