/*
Please cite the following reference when using this software:
Reference: NL Tintle, D Gordon, F McMahon, SJ Finch
Using duplicate genotyped data in genetic analyses: testing association and estimating error rates, SAGMB, 2007
Example: using the macro that tests genetic association with duplicate genotype data
Example: Consider a situation where 100 sample units are genotyped once
and the other 100 sample units are genotyped twice. Assume we have a case-control
study and so there are 100 cases and 100 controls. Thus, there are 50 cases
that are genotyped once, 50 cases that are genotyped twice, 50 controls that
are genotyped once and 50 controls that are genotyped twice.
Assume that, of the singly genotyped cases, 35 are classified AA, 10 are classified AB,
and 5 are classified BB. Of the re-genotyped cases, 27 are genotyped AA both times,
5 are genotyped AA once and AB once, 7 are genotyped AB both times, 0 are genotyped
AA once and BB once, 2 are genotyped AB once and BB once, and 9 are genotyped BB
both times.
Assume that, of the singly genotyped controls, 20 are classified AA, 20 are classified AB,
and 10 are classified BB. Of the re-genotyped controls, 17 are genotyped AA both times,
3 are genotyped AA once and AB once, 19 are genotyped AB both times, 1 is genotyped
AA once and BB once, 6 are genotyped AB once and BB once, and 4 are genotyped BB
both times.
Then, we have a 2x3 table of singly genotyped data as follows:
         | AA | AB | BB | Total |
---------------------------------
Cases    | 35 | 10 |  5 |    50 |
---------------------------------
Controls | 20 | 20 | 10 |    50 |
---------------------------------
Total    | 55 | 30 | 15 |   100 |
We also have a 2x6 table of re-genotyped data as follows:
         | AA both | AA once     | AB both | AA once     | AB once     | BB both | Total |
         | times   | and AB once | times   | and BB once | and BB once | times   |       |
------------------------------------------------------------------------------------------
Cases    |      27 |           5 |       7 |           0 |           2 |       9 |    50 |
------------------------------------------------------------------------------------------
Controls |      17 |           3 |      19 |           1 |           6 |       4 |    50 |
------------------------------------------------------------------------------------------
Total    |      44 |           8 |      26 |           1 |           8 |      13 |   100 |
These cell counts are the values that go into the SAS dataset which is passed to
the SAS macro.
For example, we could create a small SAS dataset named temp as follows:
data temp;
input snpnumb na1 na2 na3 nu1 nu2 nu3 na11 na12 na22 na13 na23 na33 nu11 nu12 nu22 nu13 nu23 nu33;
datalines;
1011023 35 10 5 20 20 10 27 5 7 0 2 9 17 3 19 1 6 4
;
run;
NOTE: In the above example the SNP number is 1011023.
NOTE: The input statement should match what is shown above exactly. However, you can
have multiple lines of data. Each line of data will return a test statistic (ts) and p-value.
Thus, if you had many SNPs, each SNP would get one line in the data file temp.
NOTE: In the macro, A=case, U=control, AA=1, AB=2 and BB=3. For example, na12 is the
number of cases genotyped AA once and AB once, and nu33 is the number of controls
genotyped BB both times.
*/
**THE FOLLOWING CODE CAN BE RUN TO DECLARE THE NEW DATASET EXAMPLE
AND RECEIVE A DATASET, EXAMPLEOUT, WHICH CONTAINS ALL OF THE INPUT
VARIABLES, PLUS THE F-TEST STATISTIC AND ASSOCIATED P-VALUE, PLUS
THE CHI-SQ AND P-VALUE FOR THE TEST IGNORING INCONSISTENTS
Note: The dataset can ONLY have ONE SNP at a time. Multiple SNPs need to be
run in batch calls of the macro.
;
data example;
input snpnumb na1 na2 na3 nu1 nu2 nu3 na11 na12 na22 na13 na23 na33 nu11 nu12 nu22 nu13 nu23 nu33;
datalines;
1011023 35 10 5 20 20 10 27 5 7 0 2 9 17 3 19 1 6 4
;
run;
/******************
At this point, we could use either the permutation test macro or the regeno_test
macro to compute the MANOVA test statistic. Here we assume that we also want the
permutation p-value, so we use the permutation test macro. We decide to base the
permutation p-value on 500 permutations (4-9 minutes of computing time for output),
so we pass 500 as the number of permutations.
The parameters are ordered as follows:
1- input dataset (in this case example)
2- output dataset (in this case exampleout)
3- the number of permutations used to compute the permutation p-value (in this case 500)
********/
%permtest(example,exampleout,500);
proc print data=exampleout;run;
/*THE CODE ABOVE YIELDS A MANOVA F TEST STATISTIC VALUE OF
8.41440 with a p-value of 0.000311461.
THE TS/P-VALUE IGNORING INCONSISTENTS IS 14.9108/0.000578306
Each time you run the macro you will get a different permutation
p-value. When I ran it, I got a p-value of 0.002 for n=500 permutations.
Note that 500 permutations are not enough if your alpha is much below 0.01.
THUS WE WOULD REJECT THE NULL HYPOTHESIS AT ALPHA=0.01 AND CONCLUDE
THAT THERE IS EVIDENCE OF GENETIC ASSOCIATION WITH CASE/CONTROL STATUS
AT SNP 1011023
*/
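/*
OPTIONAL CROSS-CHECK (a minimal sketch, not part of the original macro).
It assumes that the test "ignoring inconsistents" is an ordinary Pearson
chi-square on the 2x3 table obtained by pooling the singly genotyped counts
with the consistently re-genotyped counts (AA both times, AB both times,
BB both times) and dropping the inconsistent pairs. The dataset name pooled
and the variables status, genotype and count are hypothetical names chosen
for this illustration only.
For the example data the pooled table is 62/17/14 for cases and 37/39/14 for
controls, and the chi-square on 2 df works out to about 14.91, which agrees
with the value reported above.
*/
data pooled;
  set example;
  length status $8 genotype $2;
  /* singly genotyped count + consistent duplicate count, per genotype */
  status = "case";    genotype = "AA"; count = na1 + na11; output;
  status = "case";    genotype = "AB"; count = na2 + na22; output;
  status = "case";    genotype = "BB"; count = na3 + na33; output;
  status = "control"; genotype = "AA"; count = nu1 + nu11; output;
  status = "control"; genotype = "AB"; count = nu2 + nu22; output;
  status = "control"; genotype = "BB"; count = nu3 + nu33; output;
  keep snpnumb status genotype count;
run;

/* Pearson chi-square for status by genotype, weighting each cell by its count */
proc freq data=pooled;
  weight count;
  tables status*genotype / chisq;
run;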