# FGAP Documentation

Jerry Halpern
(650) 723-5705

June 6, 2002

=================================================================
INSTALLING FGAP
=================================================================

NOTE: If there is a serious problem, first check the version of Java that you are using to run FGAP. FGAP is written in Java. It requires that the Java 2 Runtime Environment, Standard Edition, version 1.3.1 or later be installed on your computer and that the "java" command be available in the directory in which you will use FGAP. The present version was compiled with j2sdk1.4.0 and seems to run properly under 1.3.1 or 1.4.0.

The Java 2 Runtime Environment is freely available from Sun at http://developer.java.sun.com

2)Then include both FGAP.jar and the "current directory" in the "CLASSPATH" environment variable.
   In UNIX you can do this by entering, at the prompt, either
      a)CLASSPATH="$CLASSPATH: :."
   or
      b)setenv CLASSPATH "$CLASSPATH: "
   depending on which shell you are using.
   In WINDOWS, an environment variable is most easily set by using the suite of programs which is usually called "system" and which is usually located in the control panel. In some cases you will need to have the system administrator do this for you, although it is usually possible to set CLASSPATH for your own account by opening a command window. You will need to enter something like
      set CLASSPATH="%CLASSPATH%; ;."
   [In Windows, %CLASSPATH% designates the present CLASSPATH before any changes are made.]

3)If you are using WINDOWS, open a Command Window and move to a directory in which you want to work with FGAP.

4)After you have created in the current directory the files pedigree.dat and (if necessary) phenotype.dat, which are described below, you can execute FGAP with the command:
      a)"java FGAP" -- This will open a Graphical User Interface.
      b)"java FGAP c" -- This will start up FGAP in command line mode.
   A particularly useful way to use b) is at the prompt or in a shell script in the form:
      "java FGAP c < fileWithFGAPinstructions >> outputFile"
   where fileWithFGAPinstructions is a simple text file and outputFile is where you want the results of the analyses to be stored. The command file, fileWithFGAPinstructions, should have no more than one FGAP command on a line, and it should end with a line giving the "run" command (see below).
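
The command-file rules in step 4 (no more than one command per line, ending with "run") can be sketched in a few lines of Python. This is only an illustration and is not part of FGAP; the function name build_command_file is an assumption, and the two commands used are taken from the FGAP COMMANDS section.

```python
def build_command_file(commands):
    """Return FGAP command-file text: one command per line, ending with run."""
    lines = []
    for command in commands:
        command = command.strip()
        if not command:
            continue  # skip empty entries
        if "\n" in command:
            raise ValueError("put no more than one FGAP command on a line")
        lines.append(command)
    # The command file should end with a line giving the run command.
    if not lines or lines[-1] not in ("r", "run"):
        lines.append("run")
    return "\n".join(lines) + "\n"


# Build the text for a binary phenotype with a sporadic rate of .06.
print(build_command_file(["phenotype b", "enullphi .06"]), end="")
```

If the printed text is saved in a file (for instance under the hypothetical name fgap_commands.txt), that file can take the place of fileWithFGAPinstructions in the command shown in step 4.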

=================================================================
FGAP COMMANDS:
=================================================================

1)MODE OF INHERITANCE
   Summary:  modeltype
   Argument: a, d, or r
   Default:  a
   ModeOfInheritance is:
      a for an additive model.
      d for a dominant model.
      r for a recessive model.

2)TYPE OF OUTCOME
   Summary:  phenotype
   Argument: typeOfOutcome indicates whether the phenotype is binary or not and, if so, that the user will specify a sporadic rate for this phenotype.
   Default:  blank, which means unspecified. If this command is given then the argument must be "b".
   If "phenotype b" is specified then the phenotype is binary/qualitative: "disease is present" or "disease is absent", for example. In this case FGAP reads only pedigree.dat (see below for the definition of "StatusNumber") and computes the IndividualPhenotypeWeights. IndividualPhenotypeWeights is set equal to
      ((StatusNumber - 1) - sporadicRate) if StatusNumber is 1 or 2, or
      0 if StatusNumber is 0.
   sporadicRate is a floating point number from 0 to 1. It is the sporadic penetrance, the probability of having a StatusNumber = 2 when no MarkerAlleleID = 2. Also note that when "phenotype b" is specified, the program expects that a sporadicRate will be specified in a Sporadic Rate command.
   When the type of outcome is not specified to be binary by this command, the program expects there to be a phenotype.dat file (see below) and reads it to get the IndividualPhenotypeWeights.

3)SPORADIC RATE
   Summary:  enullphi
   Argument: A number from 0 to 1 which gives the probability that there will be a positive binary outcome only very weakly related, if at all, to inheritance.
   Default:  -1, which means "unspecified"
   This command is used only when "phenotype b" has been specified.

4)ASSUME HARDY-WEINBERG (HW) EQUILIBRIUM OR NOT HW
   Summary:  hwnhw
   Argument: hwOrNotHw indicates whether or not to assume that the population is in HW equilibrium.
   Default:  1
   If hwnhw = 1
      It is assumed that Hardy-Weinberg equilibrium holds and that the population genotype frequencies (aa, aA, and AA) are a function of a single parameter, gamma, which is the population frequency of allele A, the allele presumed to cause the phenotype with StatusNumber = 2. So the genotype frequencies of (aa, aA, and AA) are [(1-gamma)*(1-gamma), 2*gamma*(1-gamma), and gamma*gamma] respectively.
   If hwnhw = 2
      It is assumed that Hardy-Weinberg equilibrium does not hold. Thus the genotype frequencies of aa, aA, and AA are given by ((1-gamma1-gamma2), gamma1, and gamma2), where gamma1 and gamma2 are the population frequencies not of alleles but of the genotypes aA and AA.
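
The two parameterizations described for hwnhw can be written out as a short Python sketch. It is an illustration only, not code from FGAP; the function name genotype_frequencies and its argument order are assumptions.

```python
def genotype_frequencies(hwnhw, gamma1, gamma2=None):
    """Return the frequencies of (aa, aA, AA) for the given hwnhw setting."""
    if hwnhw == 1:
        # Hardy-Weinberg: gamma1 plays the role of gamma, the frequency of allele A.
        gamma = gamma1
        if not 0.0 <= gamma <= 1.0:
            raise ValueError("gamma must be from 0 to 1")
        return ((1 - gamma) * (1 - gamma), 2 * gamma * (1 - gamma), gamma * gamma)
    if hwnhw == 2:
        # Not Hardy-Weinberg: gamma1 and gamma2 are the frequencies of aA and AA.
        if gamma2 is None:
            raise ValueError("gamma2 is required when hwnhw is 2")
        if gamma1 < 0.0 or gamma2 < 0.0 or gamma1 + gamma2 > 1.0:
            raise ValueError("gamma1 and gamma2 must be >= 0 and sum to <= 1")
        return (1 - gamma1 - gamma2, gamma1, gamma2)
    raise ValueError("hwnhw must be 1 or 2")


print(genotype_frequencies(1, 0.5))
print(genotype_frequencies(2, 0.25, 0.25))
```

In both cases the three returned frequencies sum to 1, which is a quick check on any values entered by hand.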

5)GENOTYPE FREQUENCIES WILL BE COMPUTED OR SPECIFIED BY THE USER
   Summary:  freqs
   Argument: indicates whether the program will compute the genotype frequencies of aa, aA, and AA, or whether the user will specify them.
   Default:  1
   If freqs = 1
      the program will compute the maximum likelihood estimates for the genotype frequencies of aa, aA, and AA.
   If freqs = 2
      the user must provide enough information to specify the genotype frequencies of aa, aA, and AA. In this case a valid value for freq1 must be given, and if "hwnhw 2" has been given, then a valid value for freq2 must also be given.

6)PARAMETER1 WHEN REQUIRED TO BE SPECIFIED BY THE USER
   Summary:  freq1
   Argument: parameter1 is needed in some instances to specify the genotype frequencies of aa, aA, and AA. parameter1 is a number from 0 to 1.
   Default:  -1, which means "unspecified"
   This command is used if and only if the command "freqs 2" has been given previously.
   When hwnhw has the value 1, the frequencies of aa, aA, and AA are taken to be (1-parameter1)*(1-parameter1), 2*parameter1*(1-parameter1), and parameter1*parameter1 respectively.
   When hwnhw has the value 2, parameter1 is the frequency of aA.

7)PARAMETER2 WHEN REQUIRED TO BE SPECIFIED BY THE USER
   Summary:  freq2
   Argument: parameter2 is needed in some instances to specify the genotype frequencies of aa, aA, and AA. parameter2 is a number from 0 to 1.
   Default:  -1, which means "unspecified"
   This command is used if and only if the commands "freqs 2" and "hwnhw 2" have been given previously.
   When hwnhw has the value 2, parameter2 is the frequency of AA. In this case the frequency of aa is 1-parameter1-parameter2, and parameter1+parameter2 must be <= 1.

8)TO RUN AN ANALYSIS
   Summary:  r or run
   Enter "r" or "run" at the prompt to do the analysis associated with the input parameters that you have specified.

9)TO END THE SESSION
   Summary:  q or quit
   Enter "q" or "quit" at the prompt to end the session. When using the GUI, terminate the session by closing the main window.
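
The dependencies among commands 1) through 7) can be collected into one consistency check. The Python sketch below is not part of FGAP and does not reproduce its actual checks; the function name check_settings, the dictionary layout, and the message wording are assumptions, and the rules encoded are only those stated in the command descriptions above.

```python
# Defaults as listed for each command; -1 means "unspecified".
DEFAULTS = {"modeltype": "a", "phenotype": "", "enullphi": -1.0,
            "hwnhw": 1, "freqs": 1, "freq1": -1.0, "freq2": -1.0}


def in_unit_interval(value):
    return 0.0 <= value <= 1.0


def check_settings(overrides):
    """Return a list of problems with the given settings (empty if none)."""
    s = {**DEFAULTS, **overrides}
    problems = []
    if s["modeltype"] not in ("a", "d", "r"):
        problems.append("modeltype must be a, d, or r")
    if s["phenotype"] not in ("", "b"):
        problems.append("phenotype must be b when it is given")
    if s["hwnhw"] not in (1, 2):
        problems.append("hwnhw must be 1 or 2")
    if s["freqs"] not in (1, 2):
        problems.append("freqs must be 1 or 2")
    # enullphi goes together with "phenotype b".
    if s["phenotype"] == "b":
        if not in_unit_interval(s["enullphi"]):
            problems.append("phenotype b needs enullphi from 0 to 1")
    elif s["enullphi"] != -1.0:
        problems.append("enullphi is used only with phenotype b")
    # freq1 is needed with "freqs 2"; freq2 is needed with "freqs 2" and "hwnhw 2".
    if s["freqs"] == 2:
        if not in_unit_interval(s["freq1"]):
            problems.append("freqs 2 needs freq1 from 0 to 1")
        if s["hwnhw"] == 2:
            if not in_unit_interval(s["freq2"]):
                problems.append("freqs 2 with hwnhw 2 needs freq2 from 0 to 1")
            elif in_unit_interval(s["freq1"]) and s["freq1"] + s["freq2"] > 1.0:
                problems.append("freq1 + freq2 must be <= 1")
    return problems


print(check_settings({"phenotype": "b", "enullphi": 0.06}))
print(check_settings({"freqs": 2, "hwnhw": 2, "freq1": 0.7, "freq2": 0.6}))
```

Running such a check before writing a command file catches, for example, a "phenotype b" command that is not accompanied by a sporadic rate.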

=================================================================
THE INPUT FILES
=================================================================

A)pedigree.dat
   FGAP always requires that a file named "pedigree.dat" be located in the directory from which FGAP is executed. pedigree.dat is a file with a constrained version of a format which has been used with many programs (LINKAGE, FGAPLINK, and GENEHUNTER among others) for calculating statistics relevant to "linkage" and/or "association" between "markers" and genes which may be partly responsible for a phenotype of interest (for example, breast cancer).
      a)Only nuclear families, ones consisting of one father, one mother, and their children, may be represented in pedigree.dat.
      b)The number of families must be <= 6200.
      c)The number of individuals in a family must be <= 10.
      d)Each individual is represented by a line in pedigree.dat.
         i)The lines of all individuals in a family are consecutive lines of the file.
         ii)The first line in the group of lines for the family is for the father.
         iii)The second line in the group of lines for the family is for the mother.
         iv)The other lines in the group of lines for the family are for children.
   Each line of pedigree.dat has nine items:

1)A Family ID: An integer which uniquely identifies the family.

2)A Within Family Individual ID: An integer consecutive from 1 for the father line to N for the line for the last child in a family with N members.

3)Father's ID:
      1 (the Within Family Individual ID of the father) for a child line.
      0 for a parent line.

4)Mother's ID:
      2 (the Within Family Individual ID of the mother) for a child line.
      0 for a parent line.

5)Sex:
      1 If the line is for a male.
      2 If the line is for a female.

6)A Status Number. This is used only when the user specifies a dichotomous phenotype with the "phenotype b" command. It is:
      1 or 2 Depending on which of the two states of the phenotype was noted in the individual.
      0 If the individual's phenotype is unknown.
   In any case it must be coded as 1, 2, or 0. Commonly 1 is taken to mean that a phenotype possibly partly caused by an uncommon allele (commonly coded as markerAlleleID = 2) at a location of interest was not observed in the individual, and 2 is taken to mean that it was observed.

7)The "Liability Class":
      1 This is strictly a placeholder, included only to maintain the same form as the LINKAGE input file as it usually appears. It is not used by FGAP in any way.

8)The Allele ID:
      1 If the marker allele from one haplotype is the first of the two possible alleles.
      2 If it is the other one.
      0 If unknown and a parent.
   It is required in the current version of FGAP that each allele of a child's marker be known.

9)The Other Allele ID:
      1 If the marker allele from the other haplotype is the first of the two possible alleles.
      2 If it is the other one.
      0 If unknown and a parent.
   It is required in the current version of FGAP that each allele of a child's marker be known.

EXAMPLE OF pedigree.dat:

   1 1 0 0 1 0 1 0 0
   1 2 0 0 2 1 1 0 0
   1 3 1 2 1 1 1 1 1
   1 4 1 2 1 2 1 1 1
   2 1 0 0 1 0 1 0 0
   2 2 0 0 2 1 1 0 0
   2 3 1 2 1 2 1 1 1
   2 4 1 2 1 1 1 1 1
   2 5 1 2 1 1 1 1 1

B)phenotype.dat
   When FGAP is run and the outcome phenotype is not specified as b (binary), FGAP reads a second file, named "phenotype.dat", located in the directory from which FGAP is executed. phenotype.dat must have one line for each line in pedigree.dat, and the lines in phenotype.dat must be in the same order (ordered by familyID and WithinFamilyIndividualID) as the lines of pedigree.dat.
   Each line of phenotype.dat has three items:

1)A Family ID which uniquely identifies the family. This is the same Family ID as the corresponding Family ID in pedigree.dat.

2)A Within Family Individual ID, consecutive from 1 for the father line to N for the line for the last child in a family with N members. This is the same Within Family Individual ID as the corresponding Within Family Individual ID in pedigree.dat.

3)The Individual's Phenotype Weight. This is the weight, a(\phi), defined in equation (12) of Shih & Whittemore (2001). Examples of a(\phi) are given in equations (13), (15), and (16).

EXAMPLE OF phenotype.dat:

This is the phenotype.dat file which corresponds to the pedigree.dat file above if the analysis is being run for a binary/qualitative phenotype with a background penetrance of .06 (the probability of having a StatusNumber = 2 when no MarkerAlleleID = 2):

   1 1 0
   1 2 -0.06
   1 3 -0.06
   1 4 0.94
   2 1 0
   2 2 -0.06
   2 3 0.94
   2 4 -0.06
   2 5 -0.06

If the example files for pedigree.dat and phenotype.dat given above are present in the directory from which FGAP is called, then there are two ways to obtain the same results from FGAP:

1E) Use the default parameters except use "phenotype b" and "enullphi .06". This calls for an additiveModel, a binaryPhenotype, and a backgroundPenetrance of .06. Since a binaryPhenotype is specified, FGAP reads only pedigree.dat. It does not read phenotype.dat.

The IndividualPhenotypeWeights is set equal to
      ((StatusNumber - 1) - .06) if StatusNumber is 1 or 2,
      0 if StatusNumber is 0.

2E) Use the default parameters. The default model, additive, is used. Since the binary phenotype is not specified, FGAP reads phenotype.dat, where the IndividualPhenotypeWeights (which are, in this example, the same as those which would be calculated in 1E) are stored.

The output from either 1E) or 2E) is:
------------------------------------------------------------
i)For all statistics, it is assumed that gamma is estimated from the data.
ii)The ExactVariance is used for each statistic.
iii)For the NF statistic the distribution of the missing parental genotypes is taken to be conditional on the observed family genotype.
iv)gamma is the population frequency of the "bad" allele

gamma= 0.0000
BackgroundPenetrance= 0.0600
Number of Families is 2
Model is Additive
Phenotype is Dichotomous

Non Est.
Total Founder Founder gamma std-gamma
OUT 1 2 -0.0070 -0.1771 -0.0033 0.0000 [ 0.00103]
------------------------------------------------------------
END OF OUTPUT for "java FGAP a b .06"
------------------------------------------------------------
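
The agreement between 1E) and 2E) follows from the weight rule: applying ((StatusNumber - 1) - .06) to the example pedigree.dat reproduces the example phenotype.dat. The Python sketch below illustrates this; it is not code from FGAP, and the names phenotype_weight and phenotype_lines are assumptions.

```python
PEDIGREE = """\
1 1 0 0 1 0 1 0 0
1 2 0 0 2 1 1 0 0
1 3 1 2 1 1 1 1 1
1 4 1 2 1 2 1 1 1
2 1 0 0 1 0 1 0 0
2 2 0 0 2 1 1 0 0
2 3 1 2 1 2 1 1 1
2 4 1 2 1 1 1 1 1
2 5 1 2 1 1 1 1 1
"""


def phenotype_weight(status_number, sporadic_rate):
    """Weight for a binary phenotype: 0 if unknown, else (status - 1) - rate."""
    if status_number == 0:
        return 0.0
    if status_number in (1, 2):
        return (status_number - 1) - sporadic_rate
    raise ValueError("StatusNumber must be 0, 1, or 2")


def phenotype_lines(pedigree_text, sporadic_rate):
    """Return phenotype.dat lines (family ID, individual ID, weight)."""
    lines = []
    for line in pedigree_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Item 6 of a pedigree.dat line is the Status Number.
        weight = phenotype_weight(int(fields[5]), sporadic_rate)
        lines.append(f"{fields[0]} {fields[1]} {weight:g}")
    return lines


for line in phenotype_lines(PEDIGREE, 0.06):
    print(line)
```

The printed lines match the example phenotype.dat above, which is why FGAP can be given either "phenotype b" with "enullphi .06" or the phenotype.dat file.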

NOTE: A number of the quantities specified for pedigree.dat and phenotype.dat, above, are redundant or not needed for the calculations of the current version of FGAP, but they are nevertheless required as specified above to i)maintain the form of files required for a number of other widely used programs which process this kind of data, ii)maintain items which make the manipulation of the data easy, iii)make it possible to generalize the program, and iv)provide means for checking the input for errors.
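
As an illustration of the input checking mentioned in the note, the layout rules for pedigree.dat can be tested before FGAP is run. The Python sketch below is not part of FGAP and does not reproduce its actual checks; the function name check_pedigree and the message wording are assumptions, and the rules are those listed under pedigree.dat.

```python
from itertools import groupby

MAX_FAMILIES = 6200
MAX_FAMILY_SIZE = 10

EXAMPLE = """\
1 1 0 0 1 0 1 0 0
1 2 0 0 2 1 1 0 0
1 3 1 2 1 1 1 1 1
1 4 1 2 1 2 1 1 1
"""


def check_pedigree(text):
    """Return a list of problems found in pedigree.dat text (empty if none)."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            row = tuple(int(item) for item in line.split())
        except ValueError:
            row = ()
        if len(row) != 9:
            return [f"line {number}: expected nine integer items"]
        records.append(row)

    problems = []
    seen = set()
    # groupby yields runs of consecutive lines that share a Family ID.
    for family_id, run in groupby(records, key=lambda row: row[0]):
        members = list(run)
        if family_id in seen:
            problems.append(f"family {family_id}: lines are not consecutive")
        seen.add(family_id)
        if len(members) > MAX_FAMILY_SIZE:
            problems.append(f"family {family_id}: too many individuals")
        for position, row in enumerate(members, start=1):
            _, individual, father, mother, sex, status, _, allele1, allele2 = row
            where = f"family {family_id}, line {position}"
            is_parent = position <= 2
            if individual != position:
                problems.append(f"{where}: individual ID should be {position}")
            if (father, mother) != ((0, 0) if is_parent else (1, 2)):
                problems.append(f"{where}: wrong father/mother IDs")
            # Line 1 is the father (sex 1) and line 2 is the mother (sex 2).
            if sex not in ((position,) if is_parent else (1, 2)):
                problems.append(f"{where}: unexpected sex code")
            if status not in (0, 1, 2):
                problems.append(f"{where}: status must be 0, 1, or 2")
            # Only a parent may have an unknown (0) allele.
            allowed = (0, 1, 2) if is_parent else (1, 2)
            if allele1 not in allowed or allele2 not in allowed:
                problems.append(f"{where}: allele ID not allowed")
    if len(seen) > MAX_FAMILIES:
        problems.append("too many families")
    return problems


print(check_pedigree(EXAMPLE))
```

An empty list means that none of the listed rules is violated; the Liability Class item is not examined because FGAP does not use it.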