An algorithm to apply Fisher's exact test to any two-dimensional contingency table

 

Ian C. McKay, August 2002

 

There are two main difficulties in applying Fisher's exact test to contingency tables with more than two rows and two columns. Taken together, these two difficulties have led some textbook authors to say categorically that the test can be used only for 2 × 2 tables. The first difficulty is the lack of an algorithm that is both simple to program and known to be logically sound. The second difficulty is that the computation becomes extremely time-consuming as the table increases in size. This paper presents solutions to both difficulties, and an approximate method of estimating the magnitude of the second difficulty.

 

Consider the following 3 × 4 contingency table, which shows how many students (randomly sampled from a fictitious population) fall into four mutually exclusive subject categories and three mutually exclusive voting categories.

 

 

            Arts   Science   Divinity   Maths   Total
Party 1        3         1          0       6      10
Party 2        1         4          4       2      11
Party 3        0         1          5       0       6
Total          4         6          9       8      27

 

Given the marginal totals, and the null hypothesis that the row categories are independent of the column categories, the expected frequencies within the table would be as follows.

 

 

            Arts   Science   Divinity   Maths
Party 1     1.48      2.22       3.33    2.96
Party 2     1.63      2.44       3.67    3.26
Party 3     0.89      1.33       2.00    1.78

 

The observed departure from the expected frequencies could be measured, but not reliably tested, by Pearson's χ² statistic, which in this example has a value of 17.675.
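The following short calculation (a sketch in Python; the author's own program is not reproduced here) recomputes the expected frequencies and the Pearson χ² statistic for the table above.

observed = [[3, 1, 0, 6],
            [1, 4, 4, 2],
            [0, 1, 5, 0]]

row_totals = [sum(row) for row in observed]            # 10, 11, 6
col_totals = [sum(col) for col in zip(*observed)]      # 4, 6, 9, 8
grand_total = sum(row_totals)                          # 27

# expected frequency of a cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e
                  for obs_row, exp_row in zip(observed, expected)
                  for o, e in zip(obs_row, exp_row))
print(round(chi_squared, 3))                           # 17.675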

 

In principle, Fisher’s method is quite simple. We generate every possible table that is compatible with the given marginal totals, and calculate the exact probability p of each table, using Fisher’s formula (1934). In the above example there are 8634 possible tables that would give the same marginal totals. We then need to partition the tables into those that are more extreme than the observed table (i.e. less compatible with the null hypothesis) and those that are less extreme. By summing the probabilities of the extreme tables we obtain a P value that is used in the usual classical way as a test of the null hypothesis.

 

One obvious approach is to calculate the exact probability that, given these marginal totals, the departure from the expected frequencies, as measured by χ², would be equal to or greater than 17.675.

 

An alternative and intuitively appealing approach suggested by Anscombe (1981) was that the value of p calculated by Fisher's formula for a particular possible table should be used not only for calculating the sum P of the probabilities of extreme tables, but also as a measure of how extreme the table is. Those values of p that are smaller than the p value of the observed table are summed to find the total probability of all tables more extreme than the observed one.

 

A further detail remains to be decided: what do we do with tables that are exactly as extreme as the observed table? I have adopted another of Anscombe's suggestions here: the P value used to test the null hypothesis H0 is the sum of the p values smaller than that of the observed table plus half the sum of the p values equal to that of the observed table. This decision has been the subject of a number of studies, so I shall not elaborate on it here.
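In code, the rule can be expressed as in the sketch below (this is not the author's source code). It assumes that probs holds Fisher's exact probability p of every table consistent with the marginal totals and that p_obs is the p value of the observed table; a small tolerance allows for rounding error when deciding whether two p values are equal.

def mid_p_value(probs, p_obs, tol=1e-12):
    # sum of the p values of tables more extreme than the observed table,
    # plus half the sum of the p values of tables exactly as extreme
    smaller = sum(p for p in probs if p < p_obs - tol)
    equal = sum(p for p in probs if abs(p - p_obs) <= tol)
    return smaller + 0.5 * equal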

 

For contingency tables more complex than 2 × 2 tables, the difficulty is to design a systematic way of generating every possible table, so that it can be tested for extremity and its exact probability calculated.

 

The systematic method developed here is to attach a unique numerical value L to each possible table, and to work through the tables in order of their values of L, so that no table can be overlooked or counted twice. Consider the cells of the table, i.e. the elements of an array A, to be read like a book, from left to right and from top to bottom. Read in that order, these elements represent the digits of the number L (using arithmetic to any base that is greater than the largest element). When we are considering a particular element, we call the preceding elements the higher-order elements of the array, and the later elements the lower-order elements. We start with the table that has the smallest possible value of L, and then repeatedly increment L by the smallest possible increment consistent with the constraints of the given marginal totals, making sure that no element is ever less than zero. This algorithm depends on two subroutines described below.
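As an illustration of this ordering (not taken from the author's source code), the value of L attached to a table could be computed as follows. All tables that share the same marginal totals share the same grand total, so using the grand total plus one as the base guarantees that every table is compared in the same base and that every element fits into one digit.

def table_order_value(table):
    # read the cells like a book and treat them as the digits of a number L
    digits = [cell for row in table for cell in row]
    base = sum(digits) + 1               # grand total + 1 exceeds every cell
    value = 0
    for digit in digits:
        value = value * base + digit
    return value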

 

The first subroutine (labelled as Minimise in the source code) minimises the value of the low-order elements of L, starting from row y and column x, without altering the higher-order elements, and without altering the marginal totals.

 

This subroutine depends on the following theorem, whose proof is rather too obvious to be worth spelling out:

the element A(y,x) can be reduced without altering the higher-order elements or the marginal totals if and only if there is a non-zero element A(y+v, x+u) where u, v are positive integers.
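A possible implementation of Minimise is sketched below in Python (this is a sketch, not the author's source code). Rather than applying the reduction moves of the theorem one at a time, it fills every cell from row y, column x onwards, in book order, with the smallest value that the marginal totals allow, which yields the same minimal arrangement of the lower-order elements.

def minimise(table, row_totals, col_totals, y, x):
    # fill the cells from (y, x) onwards, in book order, with the smallest
    # values compatible with the marginal totals; cells before (y, x) are
    # treated as fixed and are not altered
    n_rows, n_cols = len(row_totals), len(col_totals)
    for i in range(y, n_rows):
        for j in range(x if i == y else 0, n_cols):
            row_left = row_totals[i] - sum(table[i][:j])
            # how much the columns to the right of column j can still absorb
            spare = sum(col_totals[k] - sum(table[r][k] for r in range(i))
                        for k in range(j + 1, n_cols))
            table[i][j] = max(0, row_left - spare)
    return table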

 

The second subroutine (labelled as Increment in the source code) is designed to increment by 1 the lowest-order element of A that is capable of being incremented without altering the higher-order elements or the marginal totals.

 

This subroutine depends on the following theorem, whose proof is even more obvious:

an element A(y,x) can be incremented without altering the higher-order elements or the marginal totals if and only if there exist two non-zero elements A(y+v,x) and A(y,x+u) where u and v are positive integers.
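Increment can be sketched in the same spirit (again not the author's source code): scan the cells in reverse book order and add 1 to the first cell that satisfies the condition of the theorem, leaving the lower-order cells to be re-minimised by the caller. Cells in the last row or the last column can never satisfy the condition, so they are skipped.

def increment(table, row_totals, col_totals):
    # add 1 to the lowest-order cell that can be incremented without
    # disturbing the higher-order cells or the marginal totals; return its
    # position, or None if no cell can be incremented
    n_rows, n_cols = len(row_totals), len(col_totals)
    for i in range(n_rows - 2, -1, -1):
        for j in range(n_cols - 2, -1, -1):
            non_zero_to_right = any(table[i][k] > 0 for k in range(j + 1, n_cols))
            non_zero_below = any(table[v][j] > 0 for v in range(i + 1, n_rows))
            if non_zero_to_right and non_zero_below:
                table[i][j] += 1
                return i, j
    return None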

 

The subroutines Increment and Minimise are used alternately: each call to Increment is followed by a call to Minimise, so that the table obtained is the next one in order of L.
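Putting the two subroutines together gives the enumeration loop sketched below; it assumes the minimise and increment functions sketched above are in scope. For the 3 × 4 table considered earlier it should visit each of the 8634 possible tables exactly once.

def all_tables(row_totals, col_totals):
    # start from the table with the smallest possible value of L
    table = [[0] * len(col_totals) for _ in row_totals]
    minimise(table, row_totals, col_totals, 0, 0)
    while True:
        yield [row[:] for row in table]          # a copy of the current table
        position = increment(table, row_totals, col_totals)
        if position is None:
            return                               # the largest L has been reached
        i, j = position
        minimise(table, row_totals, col_totals, i, j + 1)

print(sum(1 for _ in all_tables([10, 11, 6], [4, 6, 9, 8])))   # 8634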

 

There are reasonable grounds for believing that this systematic method of working through the possible contingency tables does, in fact, generate every possible table exactly once. Firstly, the program based on it has given the same answers as other software packages when applied to 2 × 2 tables. Secondly, it agrees exactly with the published analyses of Yates (1934) for 2 × 3 tables. Thirdly, when the probabilities calculated by equation 1 are summed for all possible contingency tables consistent with a given set of marginal totals, the answer found has invariably been between 0.99999 and 1.00001 for numerous analyses in which the number of possible tables varies from 6 to 2954315. Fourthly, I have made the source code available so that users can check its logic (see http://www.statistics.org.uk).

 

For each of the possible contingency tables generated by the subroutine Increment, the calculation of its exact probability p is fairly simple, and is based on the work of Fisher (1934) and Yates (1934). If n1, n2, n3 . . . are the row totals and m1, m2, m3 . . . are the column totals and a1, a2, a3 . . . are the elements of the array, and G is the grand total, then

p = n1! n2! n3! . . . m1! m2! m3! . . . / (G! a1! a2! a3! . . .)          (1)

 

Apart from the difficulty of logical design, there remains the difficulty of lengthy computation. One method that I have used to reduce the computation time is to create a look-up table of logarithms of factorials at the start, so that the above formula can be evaluated quickly. Another obvious method is to calculate the logarithm of n1! n2! n3! . . . m1! m2! m3! . . . / G! only once at the start of the calculation, because it is a common factor in the value of p for all possible contingency tables with the given marginal totals.
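The sketch below (not the author's program) evaluates equation 1 with a look-up table of logarithms of factorials; for simplicity it rebuilds the look-up table for each call, whereas the method described above builds it, and the logarithm of the common marginal factor, once at the start. The two complete tables used in the worked example later in the paper serve as a check.

import math

def log_factorial_table(n):
    # look-up table of ln(k!) for k = 0 .. n
    table = [0.0] * (n + 1)
    for k in range(2, n + 1):
        table[k] = table[k - 1] + math.log(k)
    return table

def fisher_p(table):
    # exact probability of a complete contingency table, by equation 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    lf = log_factorial_table(grand_total)
    log_p = (sum(lf[n] for n in row_totals) + sum(lf[m] for m in col_totals)
             - lf[grand_total] - sum(lf[a] for row in table for a in row))
    return math.exp(log_p)

print(round(fisher_p([[3, 1, 2, 5], [9, 10, 12, 7]]), 7))   # about 0.0059862
print(round(fisher_p([[4, 6, 10], [5, 4, 9]]), 7))          # about 0.0727953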

 

A much more important method of reducing the computation time depends on recognising whole sets of possible tables that are all more extreme than the observed table, or are all less extreme. This saves us from having to generate them all one by one.

 


Table 1. Some examples of contingency tables analysed by the Exact Test. All these tables took less than 1 second of computing time with a 1.8 GHz Pentium 4 processor, except the last table, which took 62 seconds.

 

Observed contingency table      Estimated number of    Actual number of    P value by
                                possible tables with   possible tables     Fisher's
                                same marginal totals   with same           exact test
                                                       marginal totals

25   1                                      6                    6            0.109
18   4

50   4  12                                 56                   35            0.017
 2   2   3

50   4   2                                 64                   54            7.3 × 10⁻⁶
 2   3   5

 6   9   2                                 84                   74            0.101
 7   2   4

25   8  14                                864                  798            0.0098
55  27   9

 1   6   0                               1736                 1224            0.157
 0  12   3
 2   6   1
 1  14   0

 0   7   0                               1736                 1224            0.011
 0  11   4
 3   6   0
 1  14   0

 3   1   0   6                           7503                 8634            0.0033
 1   4   4   2
 0   1   5   0

 3   1   0   6                        5799660              2954315            0.020
 1   4   4   2
 0   1   5   0
 2   5   3   6

A faster algorithm

 

When only part of a possible contingency table has been filled in, it is sometimes possible to recognise that the table when completed is bound to be extreme, or is bound not to be extreme. It is also possible to calculate the exact probability of a partly specified table. So it is not necessary to calculate the exact probability of every particular table: a whole set of tables can often be processed in one short calculation.

 

How do we recognise when a partly specified table T0 is bound to lead only to extreme tables or only to non-extreme tables?

 

Suppose we were using Fisher’s p as a measure of how extreme a table is.

 

This expression can be factorised into the constant factor n1! n2! n3! . . . m1! m2! m3! . . . / G! and a set of separate factors 1/(a1! a2! a3! . . .), one for each row (or, alternatively, one for each column). For a given row or column with c cells still to be allocated and a total of N observations to be distributed among them, the maximum conceivable value of that factor occurs when a1 = a2 = a3 . . . = N/c, and the minimum occurs when one of the cells contains N and the others contain zero. Thus the contribution of a particular row or column to ln(p) always lies between -ln(N!) and -c ln((N/c)!).

 

If the column totals vary more than the row totals, this formula is best applied column by column; otherwise it is best applied row by row. This will help to make the limits narrower and more informative.
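These limits can be computed as in the sketch below (not the author's code). For the smallest possible value of the sum of ln(a!) it uses the most even whole-number allocation of the N remaining observations over the c unallocated cells, which is the attainable form of the c ln((N/c)!) bound when N/c is not a whole number; the largest possible value is ln(N!), with everything in one cell.

import math

def ln_factorial(k):
    return math.lgamma(k + 1)

def contribution_bounds(n_remaining, cells_unallocated):
    # bounds on the contribution of one row (or column) to ln(p), i.e. on
    # minus the sum of ln(a!) over its unallocated cells
    low, extra = divmod(n_remaining, cells_unallocated)
    min_sum = ((cells_unallocated - extra) * ln_factorial(low)
               + extra * ln_factorial(low + 1))
    max_sum = ln_factorial(n_remaining)
    return -max_sum, -min_sum

print(contribution_bounds(10, 2))   # about (-15.1044, -9.5750), as in row 2 of the worked example below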

 

For example, suppose we have the partly specified table as shown below.

 3   1   2   5  |  11
 4   6   ?   ?  |  20
 ?   ?   ?   ?  |   8
 ?   ?   ?   ?  |   4
 ?   ?   ?   ?  |   6
----------------+----
12  11  14  12  |  49

(The question marks denote cells whose contents have not yet been specified; the final column and the final row give the marginal totals.)

 

Rows 2, 3, 4 and 5 respectively have 10, 8, 4 and 6 observations still to be distributed among 2, 4, 4 and 4 unallocated cells. The sum of ln(a!) over the unallocated cells of each of these rows therefore lies between the following limits.

 

Row no.    N    c    N/c    c ln((N/c)!) =          ln(N!) =
                            smallest possible       largest possible
                            sum of ln(a!)           sum of ln(a!)

   2      10    2     5         9.574983              15.10441
   3       8    4     2         2.772589              10.6046
   4       4    4     1         0                       3.178054
   5       6    4     1.5       5.3174                  6.579251
Total                          17.66497               35.46632

 

 

The sum of ln(a!) over the unallocated cells therefore lies somewhere between 17.66497 and 35.46632, and so their contribution U to ln(p) lies between -35.46632 and -17.66497.

 

The contribution S of the six specified cells to ln(p) is -[ln(3!) + ln(1!) + ln(2!) + ln(5!) + ln(4!) + ln(6!)] = -17.02970. We also use the fact that the contribution M of the marginal totals, ln(n1! n2! . . . m1! m2! . . . / G!), is 18.30205.

The conclusion is that ln(p) = M + S + U lies between 18.30205 - 17.02970 - 35.46632 = -34.194 and 18.30205 - 17.02970 - 17.66497 = -16.393,

and so p lies somewhere between 1.412 × 10⁻¹⁵ and 7.599 × 10⁻⁸.

 

If the value of Fisher’s p for the observed table lies outside this range then we do not need to construct all possible tables that start with 3, 1, 2, 5, 4, 6 . . . because we know whether they are more extreme or less extreme than the observed table. We merely need to calculate the sum of their p values, and this can be done very quickly. It is the same as the a priori probability that the table will start with 3, 1, 2, 5, 4, 6.

 

How do we calculate the exact probability of a partly specified table?

The following procedure is designed only for tables that are filled in from left to right and from top to bottom, like writing a paragraph. The empty rows are all at the bottom, and the empty cells in the partly completed row are all to the right.

 

First, complete any row or column that has only one cell unspecified: this can be done in only one way that is compatible with the marginal totals.

 

Then, starting from the partly completed contingency table T0, construct two complete contingency tables T1 and T2 as follows. T1 has the same column totals as T0 and comprises all the complete rows of T0, plus one complementary row, which can be filled in in only one way that is compatible with the column totals.

 

Construct T2 from the partly filled row of T0 plus one complementary row. The column totals of T2 are equal to the contents of the last row of T1.

 

At this stage, the rationale of the calculation begins to be clear. The hypothetical allocation of numbers in all possible ways to complete the table T0 could be done in two stages, with calculable probabilities. First, the numbers could be allocated to the cells of T1, and then the last row of T1 could be further divided among the cells of T2. The exact probability of T0 is therefore the product of the probabilities of T1 and T2.

 

The exact probability p1 of T1 is calculated using Fisher's formula. Before doing the same with T2, the empty columns to the right of T2 are merged into one empty column, whose contents can be filled in in only one way that is compatible with the marginal totals. We can then calculate the exact probability p2 of T2 and hence calculate p = p1 × p2, which is the exact probability of the partly specified table T0.

 

A worked example is shown below. Suppose our partly completed table T0 is as follows.

 

 3   1   2   5  |  11
 4   6   ?   ?  |  20
 ?   ?   ?   ?  |   8
 ?   ?   ?   ?  |   4
 ?   ?   ?   ?  |   6
----------------+----
12  11  14  12  |  49

 

Then table T1 is as follows, and its exact probability p1 is 0.0059862.

 

 3   1   2   5  |  11
 9  10  12   7  |  38
----------------+----
12  11  14  12  |  49

 

Table T2 is initially

 4   6   ?   ?  |  20
 5   4   ?   ?  |  18
----------------+----
 9  10  12   7  |  38

 

 and when condensed it becomes

 

 4   6  10  |  20
 5   4   9  |  18
------------+----
 9  10  19  |  38

 

Its exact probability p2 is 0.0727953.

 

The exact probability of table T0 is therefore 0.0059862 × 0.0727953 = 0.000435767.

 

One special case needs to be catered for explicitly: if table T0 has no complete rows, then T1 is empty and p1 = 1.
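The whole procedure can be sketched as follows (in Python; this is not the author's source code, and the argument names complete_rows, partial_row, partial_total and col_totals are introduced here purely for illustration). It assumes the fisher_p function sketched earlier is in scope.

def partial_table_p(complete_rows, partial_row, partial_total, col_totals):
    # T1: the complete rows of T0 plus one complementary row that restores
    # the column totals; if T0 has no complete rows, p1 = 1
    if complete_rows:
        col_sums = [sum(col) for col in zip(*complete_rows)]
        complement = [c - s for c, s in zip(col_totals, col_sums)]
        p1 = fisher_p(complete_rows + [complement])
    else:
        complement = list(col_totals)
        p1 = 1.0

    # T2: the partly filled row plus one complementary row, with the columns
    # to the right of the specified cells merged into a single column; the
    # column totals of T2 are the complementary (last) row of T1
    k = len(partial_row)
    top = partial_row + [partial_total - sum(partial_row)]
    bottom = [complement[i] - partial_row[i] for i in range(k)]
    bottom.append(sum(complement) - partial_total - sum(bottom))
    p2 = fisher_p([top, bottom])

    return p1 * p2

# the worked example above: p1 = 0.0059862, p2 = 0.0727953
print(partial_table_p(complete_rows=[[3, 1, 2, 5]],
                      partial_row=[4, 6],
                      partial_total=20,
                      col_totals=[12, 11, 14, 12]))   # about 0.000436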

 

 

Counting the Cost

It is wise before embarking on a calculation to try to count the cost. If we started a Fisher’s exact test with no estimate of the time required, we could be in the tantalising position of having run the program for several days and be reluctant to abandon the attempt since the solution may be only minutes away. Our decision would be easier if we had an approximate measure of the number of possible tables that would be compatible with the marginal totals.

 

Gail and Mantel proposed an approximate method of estimating the total number of possible tables that are compatible with given marginal totals. Their algorithm has been used by Mehta and Patel in their calculation of exact probabilities. I have used a similar but not identical method. My software agrees with the worked example of Gail and Mantel, but does not agree with the calculations of Mehta and Patel: I suspect that they have made a programming error.

 

A slight but useful refinement can be made to Gail and Mantel’s method. Their formulae do not treat rows and columns as equivalent. The predicted number of possible tables changes when the table is rotated or transposed so that the rows become columns and the columns become rows. I have found that the prediction is substantially better if Gail and Mantel’s method is carried out twice, rotating the table between runs, and then using the mean of the two estimates. See fig. 1.

 

 

 

 

As the number of rows and columns increases, and as the counts within the cells increase, the calculation soon becomes impracticably slow. But the calculation lends itself very easily to vector processing or parallel processing, since there is no need to complete the analysis of one possible contingency table before beginning the analysis of other possible tables with the same marginal totals. We may therefore expect that, as parallel and vector processors become more readily available, it will become easier to analyse large contingency tables in this way.

 


References

 

Fisher, R. A. (1934). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98: 39-54.

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Journal of the Royal Statistical Society, Supplement 1: 217-235.

Anscombe, F. J. (1981). Computing in Statistical Science through APL. New York: Springer-Verlag. Page 295.