An algorithm to apply Fisher's exact test to any two-dimensional contingency table
There are
two main difficulties in applying Fisher's
exact test to contingency tables with more than two rows and two
columns. Taken
together, these two difficulties have led some textbook authors to say
categorically that the test can be used only for 2 × 2 tables. The first difficulty is the lack
of an algorithm that is both simple to program and known to be
logically sound.
The second difficulty is that the computation becomes extremely
time-consuming
when the table increases in size. This paper presents solutions to both
difficulties, and an approximate method of estimating the magnitude of
the
second difficulty.
Consider
the following 3 × 4 contingency table, which shows how many
students (randomly sampled from a fictitious population) fall into four
mutually exclusive subject categories and three mutually exclusive
voting
categories.
|         | Arts | Science | Divinity | Maths | Total |
| Party 1 | 3    | 1       | 0        | 6     | 10    |
| Party 2 | 1    | 4       | 4        | 2     | 11    |
| Party 3 | 0    | 1       | 5        | 0     | 6     |
| Total   | 4    | 6       | 9        | 8     | 27    |
Given the
marginal totals, and the null hypothesis
that the row categories are independent of the column categories, the
expected
frequencies within the table would be as follows.
| 1.48 | 2.22 | 3.33 | 2.96 |
| 1.63 | 2.44 | 3.67 | 3.26 |
| 0.89 | 1.33 | 2.00 | 1.78 |
The observed departure from the expected frequencies could be measured, but not reliably tested, by Pearson's χ² statistic, which in this example has a value of 17.675.
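As a concrete check, the following minimal sketch in Python (an illustration, not part of the author's software) reproduces the expected frequencies and the Pearson statistic quoted above.

```python
# Expected frequencies and Pearson's chi-squared statistic for the 3 x 4
# example table above, in plain Python.

observed = [
    [3, 1, 0, 6],   # Party 1
    [1, 4, 4, 2],   # Party 2
    [0, 1, 5, 0],   # Party 3
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
print(chi_squared)   # about 17.675, as quoted above
```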
In
principle, Fisher’s method is quite simple. We
generate every possible table that is compatible with the given
marginal
totals, and calculate the exact probability p
of each table, using Fisher's (1934) formula. In the above example
there are
8634 possible tables that would give the same marginal totals. We then
need to
partition the tables into those that are more extreme than the observed
table
(i.e. less compatible with the null hypothesis) and those that are less
extreme. By summing the probabilities of the extreme tables we obtain a
P value that is used in the usual
classical way as a test of the null hypothesis.
One
obvious approach is to calculate the exact
probability that, given these marginal totals, the departure from the
expected
frequencies, as measured by χ², would be equal to or
greater than 17.675.
An
alternative and intuitively appealing approach
suggested by Anscombe (1981) was that the value of p
calculated by Fisher’s formula for a particular possible table
should be used not only for calculating the sum P of
the probabilities of extreme tables, but also as a measure of
how extreme the table is. Those values of p
that are smaller than the p value of
the observed table are summed to find the total probability of all
tables more
extreme than the observed one.
A further
detail remains to be decided: what do we
do with tables that are exactly as extreme as the observed table? I
have
adopted another of Anscombe's suggestions here: the P value used to test H0 is the sum of the p
values smaller than that of the
observed table plus half the sum of the p
values equal to that of the observed table. This decision has been the
subject
of a number of studies, so I shall not elaborate on it here.
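As an illustration, here is a minimal sketch of this rule in Python; the function name and the tolerance used to detect ties are my own choices.

```python
def mid_p_value(table_probs, p_observed, tol=1e-12):
    """Anscombe-style mid-P: the sum of the p values smaller than that of the
    observed table, plus half the sum of the p values equal to it."""
    smaller = sum(p for p in table_probs if p < p_observed - tol)
    equal = sum(p for p in table_probs if abs(p - p_observed) <= tol)
    return smaller + 0.5 * equal
```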
For
contingency tables more complex than 2 × 2 tables, the difficulty is to design a
systematic way of generating every possible table, so that it can be
tested for
extremity and its exact probability calculated.
The
systematic method developed here is to attach a
unique numerical value L to each possible table, and work through the
tables in
order of their values of L, so that none of the tables can be
overlooked or
counted twice. Consider the cells of the table, i.e. the elements of an
array
A, to be read like a book from left to right and from top to bottom.
These
elements, when read in that order, represent the digits of the number L (using arithmetic to any base that is greater than the largest element). When we are considering a particular element, we call the preceding elements the higher-order elements of the array, and the later elements the lower-order
elements. We start with the table that has the smallest possible value
of L,
and then repeatedly increment L by the smallest possible increment
consistent
with the constraints of the given marginal totals, and making sure that
no
element is ever less than zero. This algorithm depends on two
subroutines
described below.
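Before turning to those subroutines, the ordering itself can be written down in a few lines of Python (a sketch with my own function name; any base greater than the largest element, for example the grand total plus one, will do).

```python
def table_to_L(table, base):
    """Read the cells like a book and interpret them as the digits of L."""
    value = 0
    for row in table:
        for cell in row:
            value = value * base + cell   # append the next digit
    return value
```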
The first
subroutine (labelled as Minimise in the source code)
minimises
the value of the low-order elements of L, starting from row y
and column x, without altering the higher-order
elements, and without altering
the marginal totals.
This
subroutine depends on the following theorem,
whose proof is rather too obvious to be worth spelling out:
the
element
A(y,x) can be reduced without altering the higher-order elements or the
marginal
totals if and only if there is a non-zero element A(y+v, x+u) where u,
v are
positive integers.
The second
subroutine (labelled as Increment in the source code)
is
designed to increment by 1 the lowest-order element of A that is
capable of
being incremented without altering the higher-order elements or the
marginal
totals.
This
subroutine depends on the following theorem,
whose proof is even more obvious:
an
element
A(y,x) can be incremented without altering the higher-order elements or
the
marginal totals if and only if there exist two non-zero elements
A(y+v,x) and
A(y,x+u) where u and v are positive integers.
The
subroutines Increment
and Minimise are used alternately (in
the UK sense of the word).
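To show how the two operations cooperate, here is a minimal sketch in Python of one way to realise the same enumeration; the function names are my own and the code is an illustration rather than the author's source code. It relies on the fact that, with the higher-order elements fixed, the feasible values of a cell form an interval, so the next table in order of L is obtained by incrementing the lowest-order cell that can still be incremented and then minimising everything after it.

```python
def minimal_fill(table, row_totals, col_totals, start):
    """Fill the cells from reading-order position `start` onward with the
    smallest values compatible with the margins; earlier cells stay fixed."""
    R, C = len(row_totals), len(col_totals)
    rem_col = list(col_totals)
    for k in range(start):                              # subtract the fixed prefix
        rem_col[k % C] -= table[k // C][k % C]
    for k in range(start, R * C):
        i, j = divmod(k, C)
        rem_row = row_totals[i] - sum(table[i][:j])
        value = max(0, rem_row - sum(rem_col[j + 1:]))  # smallest feasible digit
        table[i][j] = value
        rem_col[j] -= value


def next_table(table, row_totals, col_totals):
    """Return the next table in increasing order of L, or None when finished.
    This plays the role of the Increment/Minimise pair used alternately."""
    R, C = len(row_totals), len(col_totals)
    for k in range(R * C - 1, -1, -1):                  # lowest-order cells first
        i, j = divmod(k, C)
        rem_row = row_totals[i] - sum(table[i][:j])                     # row i, from column j on
        rem_col_j = col_totals[j] - sum(table[r][j] for r in range(i))  # column j, from row i down
        if table[i][j] + 1 <= min(rem_row, rem_col_j):
            nxt = [row[:] for row in table]
            nxt[i][j] += 1                              # smallest possible increment of L
            minimal_fill(nxt, row_totals, col_totals, k + 1)
            return nxt
    return None


def all_tables(row_totals, col_totals):
    """Yield every table with the given margins, in increasing order of L."""
    R, C = len(row_totals), len(col_totals)
    table = [[0] * C for _ in range(R)]
    minimal_fill(table, row_totals, col_totals, 0)
    while table is not None:
        yield [row[:] for row in table]
        table = next_table(table, row_totals, col_totals)
```

Applied to the 3 × 4 example at the start of the paper, this enumeration should produce the 8634 tables mentioned earlier, each exactly once.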
There are
reasonable grounds for believing that this
systematic method of working through the possible contingency tables
does, in
fact, generate every possible table exactly once. Firstly, the program
based on
it has given the same answers as other software packages when applied
to 2 × 2 tables.
Secondly, it agrees exactly with
the published analyses of Yates (1934) for 2 × 3 tables. Thirdly, when the
probabilities calculated by equation 1 are summed for all possible
contingency
tables consistent with a given set of marginal totals, the answer found
has
invariably been between 0.99999 and 1.00001 for numerous analyses in
which the
number of possible tables varies from 6 to 2954315. Fourthly, I have
made the
source code available so that users can check its logic (see
http://www.statistics.org.uk).
For each
of the possible contingency tables
generated by the subroutine Increment,
the calculation of its exact probability p
is fairly simple, and is based on the work of Fisher (1934) and Yates
(1934).
If n1, n2, n3 . . . are the row totals, m1, m2, m3 . . . are the column totals, a1, a2, a3 . . . are the elements of the array, and G is the grand total, then

$$p = \frac{n_1!\,n_2!\,n_3!\cdots\;m_1!\,m_2!\,m_3!\cdots}{G!\;a_1!\,a_2!\,a_3!\cdots} \qquad (1)$$
Apart from the difficulty of logical design,
there remains
the difficulty of lengthy computation. One method that I have used to
reduce
the computation time is to create a look-up table of logarithms of
factorials
at the start, so that the above formula can be evaluated quickly.
Another obvious method is to calculate the logarithm of n1!n2!n3!… m1!m2!m3!…/G! only once at the start of the calculation, because it is a common factor in the value of p for all possible contingency tables with the given marginal totals.
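A minimal sketch of this arrangement in Python (my own function names), combining equation (1) with the look-up table of logarithms of factorials:

```python
import math

def log_factorials(n):
    """Look-up table: entry k holds ln(k!) for k = 0 .. n."""
    table = [0.0] * (n + 1)
    for k in range(1, n + 1):
        table[k] = table[k - 1] + math.log(k)
    return table

def fisher_probability(table, log_fact):
    """Exact probability of one contingency table, equation (1), via logarithms."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    log_p = (sum(log_fact[n] for n in row_totals)
             + sum(log_fact[m] for m in col_totals)
             - log_fact[grand_total]
             - sum(log_fact[a] for row in table for a in row))
    return math.exp(log_p)

# Example: the observed 3 x 4 table from the beginning of the paper.
observed = [[3, 1, 0, 6], [1, 4, 4, 2], [0, 1, 5, 0]]
log_fact = log_factorials(27)            # the grand total is 27
print(fisher_probability(observed, log_fact))
```

Summing this probability over every table produced by an enumeration such as the one sketched earlier should give a total very close to 1, which is the check described above.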
A much more important method of reducing the computation time depends on recognising whole sets of possible tables that are all more extreme than the observed table, or are all less extreme. This saves us from having to generate them all one by one.
Table 1. Some examples of contingency tables analysed by the Exact Test. All these tables took less than 1 second of computing time with a 1.8 GHz Pentium 4 processor, except the last table, which took 62 seconds.
| Observed contingency table (rows separated by /) | Estimated number of possible tables with same marginal totals | Actual number of possible tables with same marginal totals | P value by Fisher's exact test |
| 25 1 / 18 4 | 6 | 6 | 0.109 |
| 50 4 12 / 2 2 3 | 56 | 35 | 0.017 |
| 50 4 2 / 2 3 5 | 64 | 54 | 7.3 × 10⁻⁶ |
| 6 9 2 / 7 2 4 | 84 | 74 | 0.101 |
| 25 8 14 / 55 27 9 | 864 | 798 | 0.0098 |
| 1 6 0 / 0 12 3 / 2 6 1 / 1 14 0 | 1736 | 1224 | 0.157 |
| 0 7 0 / 0 11 4 / 3 6 0 / 1 14 0 | 1736 | 1224 | 0.011 |
| 3 1 0 6 / 1 4 4 2 / 0 1 5 0 | 7503 | 8634 | 0.0033 |
| 3 1 0 6 / 1 4 4 2 / 0 1 5 0 / 2 5 3 6 | 5799660 | 2954315 | 0.020 |
When only
part of a possible contingency table has
been filled in, it is sometimes possible to recognise that the table
when
completed is bound to be extreme, or is bound not to be extreme. It is
also
possible to calculate the exact probability of a partly specified
table. So it
is not necessary to calculate the exact probability of every particular
table:
a whole set of tables can often be processed in one short calculation.
How
do we
recognise when a partly specified table T0 is bound to lead
only to
extreme tables or only to non-extreme tables?
Suppose we
were using Fisher’s p as a measure of how extreme a
table is.
This expression can be factorised into the margin-dependent factor n1!n2!… m1!m2!…/G! and a set of separate factors 1/(a1!a2!…) for each row or column. For a given row or column with c cells still to be allocated and a total of N observations to be distributed among them, the maximum conceivable value of p will occur when a1 = a2 = a3 . . . = N/c, and the minimum conceivable value of p will occur when one of the cells contains N and the others contain zero. Thus the contribution of a particular row or column to ln(p) will always lie between c ln((N/c)!) and ln(N!), these amounts being subtracted from ln(p).
If the
column totals vary more than the row totals,
this formula is best applied column by column; otherwise it is best
applied row
by row. This will help to make the limits narrower and more
informative.
For
example, suppose we have the partly specified
table as shown below.
| 3  | 1  |    |    | 11 |
| 4  | 6  |    |    | 20 |
|    |    |    |    | 8  |
|    |    |    |    | 4  |
|    |    |    |    | 6  |
| 12 | 11 | 14 | 12 | 49 |
Rows 2, 3, 4 and 5 respectively have 10, 8, 4 and 6 observations still to be distributed. The contributions of these rows to ln(p) therefore lie between the following values.
| Row no. | N  | c | N/c | c ln((N/c)!) = min contribution to ln(p) | ln(N!) = max contribution to ln(p) |
| 2       | 10 | 2 | 5   | 9.574983 | 15.10441 |
| 3       | 8  | 4 | 2   | 2.772589 | 10.6046  |
| 4       | 4  | 4 | 1   | 0        | 3.178054 |
| 5       | 6  | 4 | 1.5 | 5.3174   | 6.579251 |
| Total   |    |   |     | 17.66497 | 35.46632 |
The
contribution U of the unspecified
cells to ln(p) is therefore somewhere between
17.66497
and 35.46632.
The contribution S of the six specified cells to ln(p) is ln(3!) + ln(1!) + ln(2!) + ln(5!) + ln(4!) + ln(6!) = 17.02970, again subtracted from ln(p). We also use the fact that the contribution M of the marginal totals (the logarithm of the common factor mentioned earlier) is 18.30205.
The conclusion is that ln(p) = M − S − U, which lies between 18.30205 − 17.02970 − 35.46632 = −34.194 and 18.30205 − 17.02970 − 17.66497 = −16.393, and so p = 1.412 × 10⁻¹⁵ to 7.599 × 10⁻⁸.
If the value of Fisher’s p for the observed table lies outside this range then we do not need to construct all possible tables that start with 3, 1, 2, 5, 4, 6 . . . because we know whether they are more extreme or less extreme than the observed table. We merely need to calculate the sum of their p values, and this can be done very quickly. It is the same as the a priori probability that the table will start with 3, 1, 2, 5, 4, 6.
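Here is a minimal sketch in Python of the bounding step (my own function names; where N/c is not a whole number it evaluates ln((N/c)!) through the gamma function, which is one way of handling fractional N/c).

```python
import math

def ln_fact(x):
    """ln(x!) for non-negative, possibly fractional, x."""
    return math.lgamma(x + 1)

def ln_p_range(specified_cells, unspecified_rows, row_totals, col_totals):
    """Bounds on ln(p) over all completions of a partly specified table.

    specified_cells  -- values of the cells already filled in
    unspecified_rows -- (N, c) pairs: N observations still to be distributed
                        over c empty cells of a given row
    """
    m = (sum(ln_fact(n) for n in row_totals)
         + sum(ln_fact(t) for t in col_totals)
         - ln_fact(sum(row_totals)))                   # contribution M of the margins
    s = sum(ln_fact(a) for a in specified_cells)       # amount subtracted by filled cells
    u_min = sum(c * ln_fact(n / c) for n, c in unspecified_rows)
    u_max = sum(ln_fact(n) for n, c in unspecified_rows)
    return m - s - u_max, m - s - u_min                # (lowest, highest) possible ln(p)
```

For the partly specified table above, the specified cells would be 3, 1, 2, 5, 4 and 6, and the unspecified rows would be passed as the pairs (10, 2), (8, 4), (4, 4) and (6, 4).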
The
following procedure is designed only for tables
that are filled in from left to right and from top to bottom, like
writing a
paragraph. The empty rows are all at the bottom, and the empty cells in
the
partly completed row are all to the right.
First,
complete any row or column that has only one
cell unspecified: this can be done in only one way that is compatible
with the
marginal totals.
Then,
starting from the partly completed contingency
table T0, construct two complete contingency tables T1
and T2 as follows. T1 has the same column totals
as T0
and comprises all the complete rows of T0, plus
one
complementary row, which can be filled in in only one way that is
compatible
with the column totals.
Construct T2
from the partly filled row
of T0 plus one complementary row. The column totals of T2
are equal to the contents of the last row of T1.
At this
stage, the rationale of the calculation
begins to be clear. The hypothetical allocation of numbers in all
possible ways
to complete the table T0 could be done in two stages, with
calculable probabilities. First, the numbers could be allocated to the
cells of
T1, and then the last row of T1 could be further
divided
among the cells of T2. The exact probability of T0
is
therefore the product of the probabilities of T1 and T2.
The exact
probability p1 of T1 is
calculated using Fisher’s
formula. Before doing the same with T2, the empty columns to
the
right of T2 are merged into one empty column, whose contents
can be
filled in in only one way that is compatible with the marginal totals.
We can
then calculate the exact probability p2
of T2 and hence calculate p
= p1 × p2,
which is the exact probability of the partly specified table T0.
A worked
example is shown below. Suppose our partly
completed table T0 is as follows.
| 3  | 1  |    |    | 11 |
| 4  | 6  |    |    | 20 |
|    |    |    |    | 8  |
|    |    |    |    | 4  |
|    |    |    |    | 6  |
| 12 | 11 | 14 | 12 | 49 |
Then table
T1 is as follows, and its
exact probability p1 is 0.0059862.
| 3  | 1  | 2  | 5  | 11 |
| 9  | 10 | 12 | 7  | 38 |
| 12 | 11 | 14 | 12 | 49 |
Table T2
is initially
| 4 | 6  | ?  | ? | 20 |
| 5 | 4  | ?  | ? | 18 |
| 9 | 10 | 12 | 7 | 38 |
and when
condensed it becomes
| 4 | 6  | 10 | 20 |
| 5 | 4  | 9  | 18 |
| 9 | 10 | 19 | 38 |
And its
exact probability is 0.0727953.
The exact
probability of table T0 is
therefore 0.0059862 × 0.0727953 = 0.000435767.
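The worked example can be checked with a short sketch (again with my own function names, restating equation (1) through math.lgamma rather than a look-up table).

```python
import math

def ln_fact(n):
    return math.lgamma(n + 1)            # ln(n!)

def fisher_p(table):
    """Exact probability of a complete table, equation (1)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return math.exp(sum(ln_fact(n) for n in rows) + sum(ln_fact(m) for m in cols)
                    - ln_fact(sum(rows)) - sum(ln_fact(a) for r in table for a in r))

t1 = [[3, 1, 2, 5],
      [9, 10, 12, 7]]                    # complete rows of T0 plus the complementary row
t2 = [[4, 6, 10],
      [5, 4, 9]]                         # partly filled row of T0, empty columns merged

p1 = fisher_p(t1)                        # about 0.0059862
p2 = fisher_p(t2)                        # about 0.0727953
print(p1 * p2)                           # about 0.000436, the exact probability of T0
```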
One
special case needs to be catered for explicitly:
if table T0 has no complete rows, then T1 is
empty and p1
= 1.
Counting the Cost
It is wise before embarking on a calculation to try to count the cost. If we started a Fisher’s exact test with no estimate of the time required, we could be in the tantalising position of having run the program for several days and be reluctant to abandon the attempt since the solution may be only minutes away. Our decision would be easier if we had an approximate measure of the number of possible tables that would be compatible with the marginal totals.
Gail and
Mantel proposed an approximate method of
estimating the total number of possible tables that are compatible with
given
marginal totals. Their algorithm has been used by Mehta and Patel in
their
calculation of exact probabilities. I have used a similar but not
identical
method. My software agrees with the worked example of Gail and Mantel,
but does
not agree with the calculations of Mehta and Patel: I suspect that they
have
made a programming error.
A slight
but useful refinement can be made to Gail
and Mantel’s method. Their formulae do not treat rows and columns as
equivalent. The predicted number of possible tables changes when the
table is
rotated or transposed so that the rows become columns and the columns
become
rows. I have found that the prediction is substantially better if Gail
and
Mantel’s method is carried out twice, rotating the table between runs,
and then
using the mean of the two estimates. See fig. 1.
As the
number of rows and columns increases, and as
the counts within the cells increase, the calculation soon becomes
impracticably slow. But it may be noticed that it is a calculation that
lends
itself very easily to vector processing or parallel processing, since
there is
no need to complete the analysis of one possible contingency table
before
beginning the analysis of other possible tables with the same marginal
totals.
We may expect therefore that when parallel and vector processors become
readily
available it should become easier to analyse large contingency tables
in this
way.
Fisher, R. A. (1934). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98: 39-54.
Yates, F. (1934). Contingency Tables involving small numbers and the Chi-squared test. Journal of the Royal Statistical Society, Supplement 1, pages 217-235.
Anscombe, F. J. (1981) Computing in Statistical Science through APL. New York: Springer-Verlag. Page 295.