Jerry Halpern
(650)723-5705
funn@stanford.edu
July 23, 2004
COVTDT
This program implements analyses described in Whittemore AS, Halpern J,
and Ahsan H. 2004. The documentation for COVTDT is this note and the
Whittemore, Halpern, and Ahsan (WHA) paper. It and the COVTDT program
are available at http://www.stanford.edu/dept/HRP/epidemiology/COVTDT.
NOTE:NOTE:NOTE:
Using the notation of WHA, the program COVTDT assumes that in
equation (4) of WHA, nu_10 = nu_20 = 0. The user must specify all p
covariates as columns in the file named wcovars.dat (see below). For
most applications, an additional column of ones is needed. If this
column is omitted, the results are obtained under the assumption that
Mendelian inheritance holds at the baseline (zero) levels of the
covariates.
=================================================================
INSTALLING COVTDT
=================================================================
COVTDT is written in C. Despite the "exe" suffix, covTDTzip.exe is
actually a "zip" archive. It is named covTDTzip.exe because some browser
configurations open a file whose name ends in ".zip" with the zip
software instead of simply downloading it unaltered.
0)The files in covTDTzip.exe are:
covTDT_README.htm
covTDT_README.htm is this "README" file.
2aux.c
2include.h
covTDT.c
2include.h, 2aux.c, and covTDT.c are the source code for covTDT. The
main program is in covTDT.c.
covTDT.sol
covTDT.win
covTDT.sol is an executable for Solaris sparc machines.
covTDT.win is an executable for Windows2000 machines. It will
likely run on any Windows machine NT or later.
makefile.sol
makefile.win
These are the "makefiles" for solaris and for windows that I used.
They are used by the "make" command to process the source code using
facilities that are present on my machines and output the executables.
wcovars.dat
wpedigree.dat
wweights.dat
These files are example input files for covTDT.
1)Download covTDTzip.exe from
http://www.stanford.edu/dept/HRP/epidemiology/COVTDT and rename it
covTDT.zip after it is downloaded.
2)If you are using WINDOWS, open a Command Window.
3)Move to a directory, call it "dirTDT", in which you want to work with
COVTDT.
4)Move covTDT.zip to "dirTDT" and unzip the covTDT.zip file.
a)On a windows machine "move covTDT.win covTDT.exe".
This executable was compiled using gcc 3.3.1 under cygwin with the
"-mno-cygwin" option (and with the mingw gcc libraries also
installed) on a Dell machine running WINDOWS2000.
b)On a Unix machine "mv covTDT.sol covTDT". This executable
was compiled using gcc3.2 on a SUN sunblade1000 running
Solaris 2.8.
5)After you have created in the current directory the files
wpedigree.dat, wcovars.dat (if you have one or more covariates), and
wweights.dat which are described below, you can execute COVTDT with the
command: "covTDT"
=================================================================
USER RESPONSES FOR COVTDT
=================================================================
1) "Enter the number of covariates(Including the
Intercept if there is one)-->"
Enter the number of covariates. If there are no
covariates the user may enter 0. The analysis is then
almost the same as that done by other versions of the
TDT without covariates, such as FBAT (Rabinowitz & Laird 2000), FGAP
(Whittemore & Halpern 2003), and TRANSMIT (Clayton 1999).
2)"Enter a model_type number for effect of genotype on status
Recessive(1), Dominant(2), or Additive(3)-->"
Enter 1, 2, or 3.
3)"Enter a model_type number for effect of genotype on covariates.
For further explanation of this, see the documentation.
None(0), Recessive(1), Dominant(2), or Additive(3)-->"
Enter 0, 1, 2, or 3.
0 corresponds to putting no constraints on the parameters nu_i.
1 corresponds to the constraint nu_1 = 0, with nu_2 arbitrary.
2 corresponds to the constraint nu_1 = nu_2, with the common value arbitrary.
3 corresponds to the constraint nu_1 = .5 * nu_2, with nu_2 arbitrary.
An examination of the model shows that imposing these
constraints gives rise to relationships between genotypes and covariates
that are analogous to those termed "Recessive", "Dominant", and
"Additive" when applied to relationships between genotypes and
disease status.
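The four choices can be summarized in a few lines of C. This is only an
illustrative sketch and is not taken from the COVTDT source: the function
name implied_nu1 and its arguments are hypothetical, and it simply returns
the value of nu_1 that each model_type forces once nu_2 is given.

```c
/* Hypothetical helper, not part of COVTDT: return the nu_1 implied by
   the chosen covariate model_type for a given nu_2.
   For model_type 0 (None) nu_1 is unconstrained, so the caller's own
   free value nu_1_free is returned unchanged. */
double implied_nu1(int model_type, double nu_2, double nu_1_free)
{
    switch (model_type) {
    case 1:  return 0.0;         /* Recessive: nu_1 = 0        */
    case 2:  return nu_2;        /* Dominant:  nu_1 = nu_2     */
    case 3:  return 0.5 * nu_2;  /* Additive:  nu_1 = .5 * nu_2 */
    default: return nu_1_free;   /* None(0):   no constraint   */
    }
}
```

For instance, with nu_2 = 0.8 the Additive choice fixes nu_1 at 0.4,
while the None choice leaves nu_1 at whatever value is supplied.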
=================================================================
THE INPUT FILES
=================================================================
A)wpedigree.dat
COVTDT always requires that a file, named "wpedigree.dat", be located
in the directory from which COVTDT is executed.
wpedigree.dat is a file with a constrained version of a format
which has been used with many programs, LINKAGE
and GENEHUNTER among others, for calculating statistics relevant
to "linkage" and/or "association" between "markers" and genes which
may be partly responsible for a phenotype of interest (for example,
breast cancer).
a)Only nuclear families, consisting of one father, one mother, and
their children, may be represented in wpedigree.dat.
b)The number of families must be <= 6200.
c)The number of individuals in a family must be <= 10.
d)Each individual is represented by a line in wpedigree.dat.
i)The lines of all individuals in a family are consecutive lines of the
file.
ii)The first line in the group of lines for the family is for the father.
iii)The second line in the group of lines for the family is for the mother.
iv)The other lines in the group of lines for the family are for children.
Each line of wpedigree.dat has nine items:
1)A Family ID: An integer which uniquely identifies the family.
2)A Within Family Individual ID: An integer consecutive from 1 for
the father line to N for the line for the last child in
a family with N members.
3)Father's ID: 1 (the within-family ID of the father) for a child line,
0 for a parent line.
4)Mother's ID: 2 (the within-family ID of the mother) for a child line,
0 for a parent line.
5)Sex: 1 If the line is for a male.
2 If the line is for a female.
6)A Status Number. This will be used only when the user, on the COVTDT
command line, specifies a dichotomous phenotype.
It is: 1 or 2 depending on which of the
two states of the phenotype
was noted in the individual.
0 if the individual's phenotype
is unknown.
In any case it must be coded as 1, 2, or 0.
Commonly 1 is taken to mean that a phenotype possibly
partly caused by an uncommon allele (commonly coded as
markerAlleleID = 2) at a location of interest
was not observed in the individual, and 2 is taken to
mean that it was observed.
7)The "Liability Class": 1. This is strictly
a placeholder, included only to maintain the
same form as the LINKAGE input file as it usually
appears. It is not used by COVTDT in any way.
8)The Allele ID: 1 If the marker allele from one haplotype is the
first of the two possible alleles.
2 If it is the other one.
0 If unknown and a parent. It is required in
the current version of COVTDT that each Allele of
a child's marker be known.
9)The Other Allele ID: 1 If the marker allele from the other
haplotype is the
first of the two possible alleles.
2 If it is the other one.
0 If unknown and a parent. It is required in
the current version of COVTDT that each Allele of
a child's marker be known.
EXAMPLE OF wpedigree.dat:
1 1 0 0 1 0 1 0 0
1 2 0 0 2 1 1 0 0
1 3 1 2 1 1 1 1 1
1 4 1 2 1 2 1 1 1
2 1 0 0 1 0 1 0 0
2 2 0 0 2 1 1 0 0
2 3 1 2 1 2 1 1 1
2 4 1 2 1 1 1 1 1
2 5 1 2 1 1 1 1 1
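Before running COVTDT it can help to check each line of wpedigree.dat
against the rules above. The following C fragment is a sketch of such a
check, not code from COVTDT itself: the struct ped_line, its field names,
and the function parse_ped_line are hypothetical, and the fragment assumes
that the nine items are separated by white space as in the example.

```c
#include <stdio.h>

/* One line of wpedigree.dat; the field names are illustrative only. */
struct ped_line {
    int family_id, indiv_id, father_id, mother_id, sex;
    int status, liability, allele1, allele2;
};

/* Parse one line. Return 1 if it holds nine integers that satisfy the
   rules described above, 0 otherwise. The liability class is read but
   not checked, since it is only a placeholder. */
int parse_ped_line(const char *text, struct ped_line *p)
{
    int n = sscanf(text, "%d %d %d %d %d %d %d %d %d",
                   &p->family_id, &p->indiv_id, &p->father_id,
                   &p->mother_id, &p->sex, &p->status,
                   &p->liability, &p->allele1, &p->allele2);

    if (n != 9) return 0;                          /* nine items required */
    if (p->indiv_id < 1 || p->indiv_id > 10) return 0;
    if (p->sex != 1 && p->sex != 2) return 0;
    if (p->status < 0 || p->status > 2) return 0;
    if (p->allele1 < 0 || p->allele1 > 2) return 0;
    if (p->allele2 < 0 || p->allele2 > 2) return 0;

    if (p->indiv_id <= 2) {
        /* parent line (1 = father, 2 = mother): both parent IDs are 0 */
        if (p->father_id != 0 || p->mother_id != 0) return 0;
    } else {
        /* child line: father is 1, mother is 2, both alleles known */
        if (p->father_id != 1 || p->mother_id != 2) return 0;
        if (p->allele1 == 0 || p->allele2 == 0) return 0;
    }
    return 1;
}
```

A caller would read the file line by line (for example with fgets) and
report the first line for which parse_ped_line returns 0.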
B)wweights.dat
wweights.dat must have one line for each line in wpedigree.dat, and
the lines in wweights.dat must be in the same order (ordered by
familyID and WithinFamilyIndividualID) as the lines of wpedigree.dat.
The weights for father and mother are only placeholders and
are not used, but they must be in the file.
Each line of wweights.dat has a single item:
The individual's "phenotype" weight. How it is constructed depends on
the specific nature of the analysis being done. In the case of a
binary outcome (no_disease or has_disease), the weight is
-prob(disease|covariates) when the individual's status is no_disease,
and 1-prob(disease|covariates) when the individual's status is
has_disease. Here prob(disease|covariates) is the probability of
disease given the covariates under the null hypothesis that disease is
independent of genotype.
For indications of how to construct weights for other types of
outcomes see the discussion of analyses with no covariates in Shih &
Whittemore(2002). In that paper the weight needed is "a(\phi)"
defined in equation (12). Examples of a(\phi) are given in equations
(13), (15), and (16) of that paper. For example, in the case of a
binary outcome, 1 (no_disease) or 2 (has_disease), and no covariates,
the weight is -phenocopyRate when the outcome is 1 and
1-phenocopyRate when the outcome is 2.
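The binary-outcome rule can be written as a short C function. This is a
sketch only: the name binary_weight is hypothetical, the probability
p_disease must come from the user's own null model (or be the phenocopy
rate when there are no covariates), and the value returned for an unknown
status is an assumption made here, not something COVTDT specifies.

```c
/* Hypothetical helper, not part of COVTDT.
   status:    1 = no_disease, 2 = has_disease (as coded in wpedigree.dat)
   p_disease: prob(disease|covariates) under the null hypothesis,
              or the phenocopy rate when there are no covariates */
double binary_weight(int status, double p_disease)
{
    if (status == 2) return 1.0 - p_disease;  /* has_disease */
    if (status == 1) return -p_disease;       /* no_disease  */
    return 0.0;  /* ASSUMPTION: placeholder value for unknown status (0) */
}
```

With p_disease = 0.25, an unaffected child gets the weight -0.25 and an
affected child gets 0.75.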
C)wcovars.dat
wcovars.dat need not exist if the number of covariates in the
analysis is 0.
wcovars.dat must have one line for each line in wpedigree.dat, and
the lines in wcovars.dat must be in the same order (ordered by
familyID and WithinFamilyIndividualID) as the lines of
wpedigree.dat. The lines in wcovars.dat for father and mother are
only place holders and are not used, but they must be in the file.
Each line of wcovars.dat has as many items as there are
covariates. There is no provision for missing covariates.
If the user wants to include an intercept in the model, one column
(usually the 1st column, for clarity) must have the same value
(usually 1) for all subjects.
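As an illustration of the intercept convention, the C fragment below
formats one line of wcovars.dat with a leading 1 followed by the
subject's covariates. It is a sketch under stated assumptions: the
function name format_covar_line is hypothetical, and it assumes that
items on a line are separated by single spaces, as in the wpedigree.dat
example.

```c
#include <stdio.h>

/* Hypothetical helper, not part of COVTDT: write "1 c0 c1 ..." into buf.
   Returns the number of columns written (ncov + 1, the count to enter at
   the first COVTDT prompt), or -1 if buf is too small. */
int format_covar_line(char *buf, size_t size, const double *cov, int ncov)
{
    int i, n, used;

    used = snprintf(buf, size, "1");           /* intercept column */
    if (used < 0 || (size_t)used >= size) return -1;

    for (i = 0; i < ncov; i++) {
        n = snprintf(buf + used, size - (size_t)used, " %g", cov[i]);
        if (n < 0 || (size_t)n >= size - (size_t)used) return -1;
        used += n;
    }
    return ncov + 1;
}
```

Since the intercept counts as a covariate, a file built this way from two
measured covariates has three columns, and 3 is the number to give when
COVTDT asks for the number of covariates.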
NOTE: A number of the quantities specified for wpedigree.dat,
wweights.dat, and wcovars.dat are redundant or not needed for the
calculations of the current version of COVTDT. They are nevertheless
required as specified above to i)maintain the form of files
required by a number of other widely used programs which process
this kind of data, ii)maintain items which make the manipulation of
the data easy, iii)make it possible to generalize the program, and
iv)provide means for checking the input for errors.
=================================================================
REFERENCES
=================================================================
1)Clayton D. 1999. Am J Hum Genet 65:1170-1177.
2)Rabinowitz D, Laird NM. 2000. A unified approach to adjusting
association tests for population admixture with arbitrary pedigree
structure and arbitrary missing marker information. Hum Hered
50(4):227-233.
3)Shih MC, Whittemore AS. 2002. Tests for genetic
association using family data. Genet Epidemiol 22:128-145.
4)Whittemore AS, Halpern J. Genetic association tests for
family data with missing parental genotypes: A comparison. Genet
Epidemiol. 2003 Jul;25(1):80-91.
5)Whittemore AS, Halpern J, and Ahsan H. 2004. Covariate Adjustment
in Family-based Association Studies. Submitted for publication.