Health Research and Policy

covTDT Documentation

Jerry Halpern
(650) 723-5705

July 23, 2004



covTDT

This program implements the analyses described in Whittemore AS, Halpern J, and Ahsan H. 2004. The documentation for COVTDT consists of this note and the Whittemore, Halpern, and Ahsan (WHA) paper. It and the COVTDT program are available at

COVTDT_abs

NOTE:
Using the notation of WHA, the program COVTDT assumes that in equation (4) of WHA, nu_10 = nu_20 = 0. The user must specify all p covariates as columns in the file named wcovars.dat (see below). For most applications, an additional column of ones is needed. If this column is omitted, the results are obtained under the assumption that Mendelian inheritance holds at the baseline (zero) levels of the covariates.

=================================================================
INSTALLING COVTDT
=================================================================

COVTDT is written in C. Despite the "exe" suffix, covTDTzip.exe is actually a "zip" archive. It is named covTDTzip.exe because some browser configurations open a file whose name ends in ".zip" with the zip software instead of simply downloading it unaltered when asked to do so.

0)The files in covTDTzip.exe are:
covTDT_README.htm
covTDT_README.htm is this "README" file.

2aux.c
2include.h
covTDT.c
2include.h, 2aux.c, and covTDT.c are the source code for covTDT. The main program is in covTDT.c.

covTDT.sol
covTDT.win
covTDT.sol is an executable for Solaris sparc machines.
covTDT.win is an executable for Windows 2000 machines. It will likely run on any Windows machine running NT or later.

makefile.sol
makefile.win
These are the "makefiles" for Solaris and for Windows that I used. The "make" command uses them to compile the source code with the facilities present on my machines and to produce the executables.

wcovars.dat
wpedigree.dat
wweights.dat
These files are example input files for covTDT.

1)Download covTDTzip.exe from
http://www.stanford.edu/dept/HRP/epidemiology/COVTDT and rename it covTDT.zip after it is downloaded.

2)If you are using WINDOWS, open a Command Window.

3)Move to a directory, call it "dirTDT", in which you want to work with COVTDT.

4)Move covTDT.zip to "dirTDT" and unzip it.
a)On a Windows machine, run "move covTDT.win covTDT.exe". This executable was compiled using gcc 3.3.1 under cygwin with the "-mno-cygwin" option (and with the mingw gcc libraries also installed) on a Dell machine running Windows 2000.
b)On a Unix machine, run "mv covTDT.sol covTDT". This executable was compiled using gcc 3.2 on a Sun Blade 1000 running Solaris 2.8.

5)In the current directory, create the files wpedigree.dat, wweights.dat, and (if you have one or more covariates) wcovars.dat, which are described below. You can then execute COVTDT with the command: "covTDT"
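
Before running COVTDT it can be useful to confirm that the input files named in step 5 are present and have matching line counts. The following Python sketch is not part of COVTDT. The function name "preflight" is hypothetical, and ignoring blank lines when counting is an assumption, since this note does not say how COVTDT treats them.

```python
import os


def preflight(directory, n_covariates):
    """Return a list of problems with the COVTDT input files in `directory`.

    Hypothetical helper, not part of COVTDT. It only checks the documented
    requirements: the files exist and have one line per individual.
    """
    required = ["wpedigree.dat", "wweights.dat"]
    if n_covariates > 0:
        # wcovars.dat is needed only when there is at least one covariate.
        required.append("wcovars.dat")

    problems = []
    counts = {}
    for name in required:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"{name} is missing")
            continue
        with open(path) as handle:
            # Assumption: blank lines do not count as individuals.
            counts[name] = sum(1 for line in handle if line.strip())

    # Every file must have one line for each line of wpedigree.dat.
    if len(set(counts.values())) > 1:
        problems.append(f"line counts differ: {counts}")
    return problems
```

Calling preflight(".", 2) from "dirTDT" returns an empty list when all three files are present with equal line counts.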

=================================================================
USER RESPONSES FOR COVTDT
=================================================================

1) "Enter the number of covariates(Including the Intercept if there is one)-->"

Enter the number of covariates. If there are no covariates, the user may enter 0. The analysis done is then almost the same as that done by other versions of the TDT without covariates, such as FBAT (Rabinowitz & Laird 2000), FGAP (Whittemore & Halpern 2003), and TRANSMIT (Clayton 1999).

2)"Enter a model_type number for effect of genotype on status Recessive(1), Dominant(2), or Additive(3)-->"

Enter 1, 2, or 3.

3)"Enter a model_type number for effect of genotype on covariates. For further explanation of this, see the documentation. None(0), Recessive(1), Dominant(2), or Additive(3)-->"

Enter 0, 1, 2, or 3.
0 corresponds to putting no constraints on the parameters nu_i.
1 corresponds to the constraint nu_1 = 0, with nu_2 arbitrary.
2 corresponds to the constraint nu_1 = nu_2, with nu_2 arbitrary.
3 corresponds to the constraint nu_1 = .5 * nu_2, with nu_2 arbitrary.
An examination of the model shows that imposing these constraints gives rise to relationships between genotypes and covariates that are analogous to those termed "Recessive", "Dominant", and "Additive" when applied to relationships between genotypes and disease status.
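
The four choices for response 3 can be read as a mapping from the model_type number to a constrained pair (nu_1, nu_2). The Python sketch below only restates the constraints listed above. The function name "constrained_nu" is hypothetical, and nu_1 and nu_2 are shown as plain numbers for simplicity; COVTDT applies the constraints internally.

```python
def constrained_nu(model_type, nu_1, nu_2):
    """Return (nu_1, nu_2) after applying the constraint for model_type.

    Hypothetical illustration of response 3; not taken from the COVTDT source.
    """
    if model_type == 0:   # None: nu_1 and nu_2 are both unconstrained
        return (nu_1, nu_2)
    if model_type == 1:   # Recessive: nu_1 = 0, nu_2 arbitrary
        return (0.0, nu_2)
    if model_type == 2:   # Dominant: nu_1 = nu_2
        return (nu_2, nu_2)
    if model_type == 3:   # Additive: nu_1 = .5 * nu_2
        return (0.5 * nu_2, nu_2)
    raise ValueError("model_type must be 0, 1, 2, or 3")
```

For types 1, 2, and 3 only nu_2 is free, so the nu_1 argument is ignored.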

=================================================================
THE INPUT FILES
=================================================================

A)wpedigree.dat

COVTDT always requires that a file, named "wpedigree.dat", be located in the directory from which COVTDT is executed. wpedigree.dat is a file with a constrained version of a format which has been used with many programs, LINKAGE and GENEHUNTER among others, for calculating statistics relevant to "linkage" and/or "association" between "markers" and genes which may be partly responsible for a phenotype of interest (for example, breast cancer).

a)Only nuclear families, each consisting of one father, one mother, and their children, may be represented in wpedigree.dat.
b)The number of families must be <= 6200.
c)The number of individuals in a family must be <= 10.
d)Each individual is represented by a line in wpedigree.dat.
i)The lines of all individuals in a family are consecutive lines of the file.
ii)The first line in the group of lines for the family is for the father.
iii)The second line in the group of lines for the family is for the mother.
iv)The other lines in the group of lines for the family are for children.

Each line of wpedigree.dat has nine items:

1)A Family ID: An integer which uniquely identifies the family.

2)A Within Family Individual ID: An integer consecutive from 1 for the father line to N for the line for the last child in a family with N members.

3)Father's ID: 1 (the within-family individual ID of the father) for a child line; 0 for a parent line.

4)Mother's ID: 2 (the within-family individual ID of the mother) for a child line; 0 for a parent line.

5)Sex: 1 If the line is for a male.
2 If the line is for a female.

6)A Status Number: This is used only when the user, on the COVTDT command line, specifies a dichotomous phenotype. It must always be coded as 1, 2, or 0:
1 or 2 Depending on which of the two states of the phenotype was noted in the individual.
0 If the individual's phenotype is unknown.
Commonly 1 is taken to mean that a phenotype possibly partly caused by an uncommon allele (commonly coded as markerAlleleID = 2) at a location of interest was not observed in the individual, and 2 is taken to mean that it was observed.

7)The "Liability Class": 1 This is strictly a placeholder, included only to maintain the same form as the LINKAGE input file as it usually appears. It is not used by COVTDT in any way.

8)The Allele ID: 1 If the marker allele from one haplotype is the first of the two possible alleles.
2 If it is the other one.
0 If unknown and a parent. The current version of COVTDT requires that each allele of a child's marker be known.

9)The Other Allele ID: 1 If the marker allele from the other haplotype is the first of the two possible alleles.
2 If it is the other one.
0 If unknown and a parent. The current version of COVTDT requires that each allele of a child's marker be known.

EXAMPLE OF wpedigree.dat:

1 1 0 0 1 0 1 0 0
1 2 0 0 2 1 1 0 0
1 3 1 2 1 1 1 1 1
1 4 1 2 1 2 1 1 1
2 1 0 0 1 0 1 0 0
2 2 0 0 2 1 1 0 0
2 3 1 2 1 2 1 1 1
2 4 1 2 1 1 1 1 1
2 5 1 2 1 1 1 1 1
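
The rules for wpedigree.dat can be checked mechanically before a run. The Python sketch below is not part of COVTDT. The function name "check_pedigree" is hypothetical, the items are assumed to be separated by whitespace as in the example above, and the checks that the father line has sex 1 and the mother line has sex 2 are inferred from the item descriptions rather than stated as a rule.

```python
from itertools import groupby

MAX_FAMILIES = 6200   # limit b) above
MAX_MEMBERS = 10      # limit c) above


def check_pedigree(lines):
    """Return a list of problems found in wpedigree.dat-style lines."""
    problems = []
    rows = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        items = line.split()
        if len(items) != 9:
            problems.append(f"line {n}: expected 9 items, found {len(items)}")
            continue
        try:
            rows.append((n, [int(x) for x in items]))
        except ValueError:
            problems.append(f"line {n}: all items must be integers")

    # Consecutive lines with the same family ID form one family.
    families = [(fid, list(g)) for fid, g in groupby(rows, key=lambda r: r[1][0])]
    if len(families) > MAX_FAMILIES:
        problems.append(f"{len(families)} families exceeds {MAX_FAMILIES}")

    seen = set()
    for fid, members in families:
        if fid in seen:
            problems.append(f"family {fid}: lines are not consecutive")
        seen.add(fid)
        if len(members) > MAX_MEMBERS:
            problems.append(f"family {fid}: more than {MAX_MEMBERS} members")
        for pos, (n, row) in enumerate(members, start=1):
            _, ind, father, mother, sex, status, _liability, a1, a2 = row
            is_parent = pos <= 2  # father first, mother second
            if ind != pos:
                problems.append(f"line {n}: individual ID should be {pos}")
            expected = (0, 0) if is_parent else (1, 2)
            if (father, mother) != expected:
                problems.append(f"line {n}: father/mother IDs should be {expected}")
            if pos == 1 and sex != 1:
                problems.append(f"line {n}: father line should have sex 1")
            elif pos == 2 and sex != 2:
                problems.append(f"line {n}: mother line should have sex 2")
            elif sex not in (1, 2):
                problems.append(f"line {n}: sex must be 1 or 2")
            if status not in (0, 1, 2):
                problems.append(f"line {n}: status must be 0, 1, or 2")
            # Only parents may have unknown (0) alleles.
            allowed = (0, 1, 2) if is_parent else (1, 2)
            if a1 not in allowed or a2 not in allowed:
                problems.append(f"line {n}: allele IDs must be in {allowed}")
    return problems
```

Passing the lines of the example file above to check_pedigree returns an empty list.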

B)wweights.dat

wweights.dat must have one line for each line in wpedigree.dat, and the lines in wweights.dat must be in the same order (ordered by familyID and WithinFamilyIndividualID) as the lines of wpedigree.dat. The weights for the father and mother are only placeholders and are not used, but they must be in the file.

Each line of wweights.dat has a single item: the individual's "phenotype" weight. How it is constructed depends on the specific nature of the analysis being done. In the case of a binary outcome (no_disease or has_disease), the weight is -prob(disease|covariates) when the individual's status is no_disease, and 1-prob(disease|covariates) when the individual's status is has_disease. Here prob(disease|covariates) is the probability of disease given the covariates under the null hypothesis that disease is independent of genotype. For indications of how to construct weights for other types of outcomes, see the discussion of analyses with no covariates in Shih & Whittemore (2002). In that paper the weight needed is "a(\phi)", defined in equation (12). Examples of a(\phi) are given in equations (13), (15), and (16) of that paper. For example, in the case of a binary outcome coded 1 (no_disease) or 2 (has_disease) with no covariates, the weight is -phenocopyRate when the outcome is 1 and 1-phenocopyRate when the outcome is 2.
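
For the binary case, the weight rule above can be written as a small function. The Python sketch below is not part of COVTDT. The name "binary_weight" is hypothetical, and the value of prob(disease|covariates) is assumed to have been obtained elsewhere. This note does not say what weight to use for an individual whose status is 0 (unknown), so the sketch rejects that case rather than guessing.

```python
def binary_weight(status, prob_disease):
    """Weight for one individual with a binary outcome.

    status: 1 = no_disease, 2 = has_disease (as coded in wpedigree.dat).
    prob_disease: prob(disease|covariates) under the null hypothesis,
        computed elsewhere (assumption: supplied by the user).
    """
    if not 0.0 <= prob_disease <= 1.0:
        raise ValueError("prob_disease must be between 0 and 1")
    if status == 1:
        return -prob_disease
    if status == 2:
        return 1.0 - prob_disease
    # Status 0 (unknown) is not covered by the rule in this note.
    raise ValueError("status must be 1 or 2")
```

With no covariates, passing the phenocopy rate as prob_disease reproduces the -phenocopyRate and 1-phenocopyRate weights described above.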

C)wcovars.dat
wcovars.dat need not exist if the number of covariates in the analysis is 0.

wcovars.dat must have one line for each line in wpedigree.dat, and the lines in wcovars.dat must be in the same order (ordered by familyID and WithinFamilyIndividualID) as the lines of wpedigree.dat. The lines in wcovars.dat for the father and mother are only placeholders and are not used, but they must be in the file.

Each line of wcovars.dat has as many items as there are covariates. There is no provision for missing covariates. If the user wants to include an intercept in the model, one column (the 1st column will usually be used for clarity) must have the same value (usually 1) for all subjects.
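
As an illustration of the layout just described, the Python sketch below formats covariate rows as wcovars.dat lines with an optional leading column of ones for the intercept. It is not part of COVTDT. The function name "covariate_lines" is hypothetical, and separating items with single spaces is an assumption based on the wpedigree.dat example.

```python
def covariate_lines(covariates, intercept=True):
    """Format covariate rows as wcovars.dat lines.

    covariates: one list of numbers per individual, in wpedigree.dat order
        (parents included, as placeholders).
    intercept: if True, prepend a column of ones.
    """
    width = None
    lines = []
    for row in covariates:
        # float() fails on None or text: missing covariates are not allowed.
        values = ([1.0] if intercept else []) + [float(v) for v in row]
        if width is None:
            width = len(values)
        elif len(values) != width:
            raise ValueError("every line must have the same number of items")
        lines.append(" ".join(f"{v:.10g}" for v in values))
    return lines
```

The number of items per line, including the column of ones, is the number of covariates to enter at the first COVTDT prompt.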

NOTE: A number of the quantities specified for wpedigree.dat, wweights.dat, and wcovars.dat are redundant or not needed for the calculations of the current version of COVTDT. They are nevertheless required as specified above in order to i)maintain the form of files required for a number of other widely used programs which process this kind of data, ii)maintain items which make the manipulation of the data easy, iii)make it possible to generalize the program, and iv)provide means for checking the input for errors.

=================================================================
REFERENCES
=================================================================
1)Clayton D. 1999. Am J Hum Genet 65:1170-1177.

2)Rabinowitz D, Laird NM. 2000. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered 50(4):227-233.

3)Shih MC, Whittemore AS. 2002. Tests for genetic association using family data. Genet Epidemiol 22:128-145.

4)Whittemore AS, Halpern J. 2003. Genetic association tests for family data with missing parental genotypes: A comparison. Genet Epidemiol 25(1):80-91.

5)Whittemore AS, Halpern J, and Ahsan H. 2004. Covariate Adjustment in Family-based Association Studies. Submitted for publication.

Covariate adjustment in family-based association studies. Whittemore AS, Halpern J, Ahsan H. Division of Epidemiology, Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California.

Family-based tests of association between a candidate locus and a disease evaluate how often a variant allele at the locus is transmitted from parents to offspring. These tests assume that in the absence of association, an affected offspring is equally likely to have inherited either one of the two homologous alleles carried by a parent. However, transmission distortion was documented in families in which the offspring are unselected for phenotype. Moreover, if offspring genotypes are associated with a risk factor for the disease, transmission distortion to affected offspring can occur in the absence of a causal relation between gene and disease risk. We discuss the appropriateness of adjusting for established risk factors when evaluating association in family-based studies. We present methods for adjusting the transmission/disequilibrium test for risk factors when warranted, and we apply them to data on CYP19 (aromatase) genotypes in nuclear families with multiple cases of breast cancer. Simulations show that when genotypes are correlated with risk factors, the unadjusted test statistics have inflated size, while the adjusted ones do not.

The covariate-adjusted tests are less powerful than the unadjusted ones, suggesting the need to check the relationship between genotypes and known risk factors to verify that adjustment is needed. The adjusted tests are most useful for data containing a large proportion of families that lack disease-discordant sibships, i.e., data for which multiple logistic regression of matched sibships would have little power.

Software for performing the covariate-adjusted tests is available at Genet. Epidemiol. (c) 2004 Wiley-Liss, Inc.

PMID: 15593089 [PubMed - as supplied by publisher]
