Jerry Halpern
(650)723-5705
funn@stanford.edu
July 23, 2004 


                       COVTDT

This program implements analyses described in Whittemore AS, Halpern J,
and Ahsan H. 2004.  The documentation for COVTDT is this note and the
Whittemore, Halpern, and Ahsan (WHA) paper. It and the COVTDT program
are available at http://www.stanford.edu/dept/HRP/epidemiology/COVTDT.

NOTE:NOTE:NOTE:
Using the notation of WHA, the program COVTDT assumes that in
equation (4) of WHA, nu_10 = nu_20 = 0.  The user must specify all p
covariates as columns in the file named wcovars.dat (see below). For
most applications, an additional column of ones is needed. If this
column is omitted, the results are obtained under the assumption that
Mendelian inheritance holds at the baseline (zero) levels of the
covariates.


=================================================================
                     INSTALLING COVTDT
=================================================================

COVTDT is written in C. Despite the "exe" suffix, covTDTzip.exe is
actually a "zip" archive. It is named covTDTzip.exe because some
browser configurations open a file with a ".zip" suffix in the zip
software instead of simply downloading it unaltered when asked to
do so.

0)The files in covTDTzip.exe are:
     covTDT_README.htm
covTDT_README.htm is this "README" file.

     2aux.c
     2include.h
     covTDT.c
2include.h, 2aux.c, and covTDT.c are the source code for covTDT. The
main program is in covTDT.c.

     covTDT.sol
     covTDT.win
covTDT.sol is an executable for Solaris SPARC machines.
covTDT.win is an executable for Windows 2000 machines. It will
likely run on any Windows machine running NT or later.

     makefile.sol
     makefile.win
These are the "makefiles" for Solaris and for Windows that I used.
The "make" command uses them to process the source code with
facilities that are present on my machines and to output the executables.

     wcovars.dat
     wpedigree.dat
     wweights.dat
These files are example input files for covTDT.


1)Download covTDTzip.exe from
  http://www.stanford.edu/dept/HRP/epidemiology/COVTDT and rename it
  covTDT.zip after it is downloaded.

2)If you are using Windows, open a Command Window.

3)Move to a directory, call it "dirTDT", in which you want to work with
  COVTDT.

4)Move covTDT.zip to "dirTDT" and unzip the covTDT.zip file.
  a)On a Windows machine: "move  covTDT.win  covTDT.exe".
      This executable was compiled using gcc 3.3.1 under cygwin with the
      "-mno-cygwin" option (and with the mingw gcc libraries also
      installed) on a Dell machine running Windows 2000.
  b)On a Unix machine: "mv covTDT.sol covTDT".  This executable
      was compiled using gcc 3.2 on a Sun Blade 1000 running
      Solaris 2.8.

5)After you have created, in the current directory, the files
  wpedigree.dat, wcovars.dat (if you have one or more covariates), and
  wweights.dat, which are described below, you can execute COVTDT with the
  command: "covTDT"


=================================================================
                    USER RESPONSES FOR COVTDT 
=================================================================

1) "Enter the number of covariates(Including the 
              Intercept if there is one)-->"

    Enter the number of covariates.  If there are no
    covariates, the user may enter 0.  The analysis done is then
    almost the same as that done by other versions of the
    TDT without covariates, such as FBAT (Rabinowitz & Laird 2000), FGAP
    (Whittemore & Halpern 2003), and TRANSMIT (Clayton 1999).

2)"Enter a model_type number for effect of genotype on status
           Recessive(1), Dominant(2), or Additive(3)-->"

    Enter 1, 2, or 3.

3)"Enter a model_type number for effect of genotype on covariates.
For further explanation of this, see the documentation.
           None(0), Recessive(1), Dominant(2), or Additive(3)-->"

   Enter 0, 1, 2, or 3.
   0 corresponds to putting no constraints on the parameters nu_i.
   1 corresponds to the constraint nu_1 = 0, with nu_2 arbitrary.
   2 corresponds to the constraint nu_1 = nu_2, with nu_2 arbitrary.
   3 corresponds to the constraint nu_1 = .5 * nu_2, with nu_2 arbitrary.
 An examination of the model shows that imposing these
 constraints gives rise to relationships between genotypes and covariates
 that are analogous to those termed "Recessive", "Dominant", and
 "Additive" when applied to relationships between genotypes and
 disease status.
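
The three constrained choices can be summarized in a few lines of C.
This is only an illustrative sketch and is not taken from the COVTDT
source; the function name implied_nu1 and the enum constants are
hypothetical. Model type 0 is not covered because it leaves nu_1
unconstrained.

```c
#include <assert.h>

/* Model-type codes as entered at the COVTDT prompt (1, 2, or 3). */
enum { MODEL_RECESSIVE = 1, MODEL_DOMINANT = 2, MODEL_ADDITIVE = 3 };

/* Hypothetical helper, not part of COVTDT: returns the value of nu_1
   implied by nu_2 under the constraint for the given model type. */
double implied_nu1(int model_type, double nu2)
{
    assert(model_type >= MODEL_RECESSIVE && model_type <= MODEL_ADDITIVE);
    switch (model_type) {
    case MODEL_RECESSIVE: return 0.0;        /* nu_1 = 0          */
    case MODEL_DOMINANT:  return nu2;        /* nu_1 = nu_2       */
    default:              return 0.5 * nu2;  /* nu_1 = .5 * nu_2  */
    }
}
```

For example, with nu_2 = 2.0 the helper gives 0.0, 2.0, and 1.0 for
model types 1, 2, and 3 respectively.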


=================================================================
                     THE INPUT FILES
=================================================================

A)wpedigree.dat

COVTDT always requires that a file, named "wpedigree.dat", be located
  in the directory from which COVTDT is executed. 
wpedigree.dat is a file with a constrained version of a format
 which has been used with  many programs, LINKAGE 
 and GENEHUNTER among others,  for calculating statistics relevant 
 to "linkage" and/or "association" between "markers" and genes which
 may be partly responsible for a phenotype of interest (for example,
 breast cancer).

a)Only nuclear families, each consisting of one father, one mother, and
  their children, may be represented in wpedigree.dat.
b)The number of families must be <= 6200.
c)The number of individuals in a family must be <= 10.
d)Each individual is represented by a line in wpedigree.dat. 
    i)The lines of all individuals in a family are consecutive lines of the
       file.
   ii)The first line in the group of lines for the family is for the father. 
  iii)The second line in the group of lines for the family is for the mother. 
   iv)The other lines in the group of lines for the family are for children. 


Each line of wpedigree.dat has nine items:

1)A Family ID: An integer which uniquely identifies the family.

2)A Within Family Individual ID: An integer consecutive from 1 for 
              the father line to N for the line for the last child in
              a family with N members.  

3)Father's ID:  1 (the within-family individual ID of the father) for a child line.
                0 for a parent line.

4)Mother's ID:  2 (the within-family individual ID of the mother) for a child line.
                0 for a parent line.

5)Sex:   1 If the line is for a male.
         2 If the line is for a female.

6)A Status Number.  This will be used only when the user, on the COVTDT
                      command line, specifies a dichotomous phenotype.
                        It is:  1 or 2 depending on which of the
                                       two states of the phenotype
                                       was noted in the individual.
                                     0 if the individual's phenotype
                                       is unknown.
               In any case it must be coded as 1, 2, or 0.
               Commonly 1 is taken to mean that a phenotype possibly
                partly caused by an uncommon allele (commonly coded as
                markerAlleleID = 2) at a location of interest
                was not observed in the individual, and 2 is taken to
                mean that it was observed.

7)The "Liability Class":  1  This is strictly
                 a placeholder, which is included only to maintain the
                 same form as the LINKAGE input file as it usually
                 appears. It is not used by COVTDT in any way.

8)The Allele ID: 1  If the marker allele from one haplotype is the
                         first of the two possible alleles.
                 2  If it is the other one.

                 0  If unknown and a parent.  It is required in
                    the current version of COVTDT that each Allele of
                    a child's marker be known. 

9)The Other Allele ID: 1  If the marker allele from the other
                                haplotype is the
                                first of the two possible alleles.
                       2  If it is the other one.
  
                       0  If unknown and a parent.  It is required in
                          the current version of COVTDT that each Allele of
                          a child's marker be known. 

EXAMPLE OF wpedigree.dat:

1 1 0 0 1 0 1 0 0
1 2 0 0 2 1 1 0 0
1 3 1 2 1 1 1 1 1
1 4 1 2 1 2 1 1 1
2 1 0 0 1 0 1 0 0
2 2 0 0 2 1 1 0 0
2 3 1 2 1 2 1 1 1
2 4 1 2 1 1 1 1 1
2 5 1 2 1 1 1 1 1
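
The line rules above can be checked mechanically before running
COVTDT. The following is a minimal sketch in C under the assumptions
stated in this note; it is not part of the COVTDT source, and the
function name check_pedigree_line and its return codes are
hypothetical. It checks a single line only and does not verify that
the lines of a family are consecutive or that the family limits are
respected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical checker, not part of COVTDT.
   Returns 0 if one line of wpedigree.dat satisfies the rules described
   above, otherwise a positive code for the first rule that fails. */
int check_pedigree_line(const char *line)
{
    int fam, ind, father, mother, sex, status, liab, a1, a2;
    char extra;

    assert(line != NULL);

    /* Exactly nine integer items; a tenth token makes sscanf return 10. */
    if (sscanf(line, "%d %d %d %d %d %d %d %d %d %c",
               &fam, &ind, &father, &mother, &sex,
               &status, &liab, &a1, &a2, &extra) != 9)
        return 1;
    if (ind < 1 || ind > 10)                  return 2;
    if (sex != 1 && sex != 2)                 return 3;
    if (status < 0 || status > 2)             return 4;
    if (a1 < 0 || a1 > 2 || a2 < 0 || a2 > 2) return 5;

    if (ind <= 2) {
        /* Parent lines: father is individual 1 (sex 1), mother is
           individual 2 (sex 2), and both parent IDs are 0. */
        if (father != 0 || mother != 0 || sex != ind) return 6;
    } else {
        /* Child lines: parents are individuals 1 and 2, and both
           marker alleles must be known. */
        if (father != 1 || mother != 2)       return 7;
        if (a1 == 0 || a2 == 0)               return 8;
    }
    return 0;
}
```

A caller would typically read wpedigree.dat one line at a time (for
example with fgets) and report the line number whenever the function
returns a nonzero code.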
                                 

B)wweights.dat

wweights.dat must have one line for each line in wpedigree.dat, and
  the lines in wweights.dat must be in the same order (ordered by
  familyID and WithinFamilyIndividualID) as the lines of wpedigree.dat. 
  The weights for the father and mother are only placeholders and
  are not used, but they must be in the file.

Each line of wweights.dat has a single item:
  The individual's "phenotype" weight.  How it is constructed depends on
  the specific nature of the analysis being done.  In the case of a
  binary outcome (no_disease or has_disease), the weight is
  -prob(disease|covariates) when the individual's status is no_disease,
  and 1-prob(disease|covariates) when the individual's status is
  has_disease.  Here prob(disease|covariates) is the probability of
  disease given the covariates under the null hypothesis that disease
  is independent of genotype.
  For indications of how to construct weights for other types of
  outcomes, see the discussion of analyses with no covariates in Shih &
  Whittemore (2002). In that paper the weight needed is "a(\phi)",
  defined in equation (12).  Examples of a(\phi) are given in equations
  (13), (15), and (16) of that paper.  For example, in the case of a
  binary outcome, 1 (no_disease) or 2 (has_disease), and no covariates,
  the weight is -phenocopyRate when the outcome is 1 and
  1-phenocopyRate when the outcome is 2.
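
For the binary-outcome case, the weight rule can be written as a short
C function. This is a sketch only and is not part of the COVTDT
source; the name binary_weight is hypothetical, and the caller is
assumed to supply prob(disease|covariates) under the null hypothesis
(or the phenocopy rate when there are no covariates). Placeholder rows
for parents and individuals of unknown status are not covered.

```c
#include <assert.h>

/* Hypothetical helper, not part of COVTDT.
   status:    1 = no_disease, 2 = has_disease.
   p_disease: prob(disease|covariates) under the null hypothesis,
              supplied by the caller. */
double binary_weight(int status, double p_disease)
{
    assert(status == 1 || status == 2);
    assert(p_disease >= 0.0 && p_disease <= 1.0);
    return (status == 2) ? 1.0 - p_disease : -p_disease;
}
```

With p_disease = 0.25, an unaffected child gets weight -0.25 and an
affected child gets weight 0.75.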


C)wcovars.dat
wcovars.dat need not exist if the number of covariates in the
analysis is 0.

wcovars.dat must have one line for each line in wpedigree.dat, and
  the lines in wcovars.dat must be in the same order (ordered by
  familyID and WithinFamilyIndividualID) as the lines of
  wpedigree.dat.  The lines in wcovars.dat for father and mother are
  only place holders and are not used, but they must be in the file.

Each line of wcovars.dat has as many items as there are
  covariates.  There is no provision for missing covariates.
  If the user wants to include an intercept in the model, one column
  (usually the 1st, for clarity) must have the same value (usually 1)
  for all subjects.
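
As an illustration of the intercept convention, the following C sketch
builds one line of wcovars.dat with a leading column of ones. It is
not part of the COVTDT source; the function name format_covar_line,
the space-separated layout, and the "%g" number format are assumptions
made for the example. Note that the intercept column counts as a
covariate, so with n measured covariates the number entered at the
COVTDT prompt would be n+1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of COVTDT.
   Writes "1 x[0] x[1] ... x[n-1]" into buf, i.e. an intercept column
   of ones followed by the n measured covariates.
   Returns the number of characters written, or -1 if buf is too small. */
int format_covar_line(char *buf, size_t size, const double *x, int n)
{
    int used, k;

    assert(buf != NULL && size > 0 && n >= 0 && (n == 0 || x != NULL));

    used = snprintf(buf, size, "1");
    if (used < 0 || (size_t)used >= size)
        return -1;
    for (k = 0; k < n; k++) {
        size_t left = size - (size_t)used;
        int w = snprintf(buf + used, left, " %g", x[k]);
        if (w < 0 || (size_t)w >= left)
            return -1;
        used += w;
    }
    return used;
}
```

The same call is made for every line of wpedigree.dat, including the
father and mother lines, whose values are only placeholders.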
   

NOTE: A number of the quantities specified for wpedigree.dat,
  wweights.dat, and wcovars.dat are redundant, or not needed for the
  calculations of the current version of COVTDT, but they are required
  nevertheless, as specified above, to i)maintain the form of files
  required for a number of other widely used programs which process
  this kind of data, ii)maintain items which make the manipulation of
  the data easy, iii)make it possible to generalize the program, and
  iv)provide means for checking the input for errors.


=================================================================
                         REFERENCES
=================================================================
1)Clayton. 1999. Am J Hum Genet 65:1170-7.

2)Rabinowitz D, Laird NM. 2000. A unified approach to adjusting
    association tests for population admixture with arbitrary pedigree
    structure and arbitrary missing marker information. Hum Hered
    50(4):227-233.

3)Shih MC, Whittemore AS. 2002. Tests for genetic
    association using family data. Genet Epidemiol 22:128-145.

4)Whittemore AS, Halpern J. 2003. Genetic association tests for
    family data with missing parental genotypes: A comparison.  Genet
    Epidemiol 25(1):80-91.

5)Whittemore AS, Halpern J, and Ahsan H. 2004. Covariate Adjustment
    in Family-based Association Studies.  Submitted for publication.