## Saturation of the PCR Survey

### Saturation Curves

The completeness of the sample can be estimated using a first-order
saturation model

$$ N_{new}(n) = N_{total} \left[ 1 - \exp(-K n) \right] $$

where *N*_{new}(n) is the number of distinct Hox genes
encountered after sequencing *n* clones. Saturation curves are
shown below:

**Figure.** Saturation curve for the three PCR series.
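
As a rough illustration of the model, the sketch below evaluates the saturation curve at the final sample size of each PCR series. It uses the fitted *N*_{total} and *K* values and the clone counts *n* reported in the table in the next section. The function name `expected_distinct` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
import math

def expected_distinct(n, n_total, k):
    """First-order saturation model: N_new(n) = N_total * (1 - exp(-K * n))."""
    return n_total * (1.0 - math.exp(-k * n))

# (n, N_total, K) per PCR series, copied from the table in the next section.
series = {
    "5E5/3F":   (29, 8.02, 0.1575),
    "5E5-2/3F": (64, 14.52, 0.080),
    "5E/3F":    (144, 22.88, 0.027),
}

for name, (n, n_total, k) in series.items():
    predicted = expected_distinct(n, n_total, k)
    # Fraction of the fitted asymptote that the model predicts has been reached.
    print(f"{name}: predicted {predicted:.2f} of {n_total} genes "
          f"({predicted / n_total:.1%} of the asymptote)")
```

The closer the predicted value is to *N*_{total}, the closer the model places the series to saturation at its final sample size.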

### Probability of Missing a Gene

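The probability of missing a given gene, given the expected number
*N*_{total} of distinct genes and a sample of *n* sequenced PCR
products (which are Hox genes), is

$$ \mathrm{Prob[miss]} = \left( 1 - \frac{1}{N_{total}} \right)^{n} $$

This treats each sequenced product as an independent draw in which
every gene is equally likely to appear.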

| PCR series | *n* | *N*_{total} | *N*_{found} | *K* | corr. | χ^{2} | Prob[miss] |
|---|---|---|---|---|---|---|---|
| 5E5/3F | 29 | 8.02 | 8 | 0.1575 | 0.988 | 2.55 | 0.021 |
| 5E5-2/3F | 64 | 14.52 | 14 | 0.080 | 0.991 | 18.15 | 0.010 |
| 5E/3F | 144 | 22.88 | 24 | 0.027 | 0.972 | 283.82 | 0.002 |
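
The Prob[miss] column can be checked directly from *n* and *N*_{total}. The following is a minimal sketch; the function name `prob_miss` is an illustrative choice, and the calculation assumes, as the formula does, that every gene is equally likely to be sampled.

```python
def prob_miss(n, n_total):
    """Probability that one particular gene is absent from n independent draws
    when each of n_total genes is equally likely to be drawn."""
    return (1.0 - 1.0 / n_total) ** n

# (n, N_total, tabulated Prob[miss]) per PCR series, copied from the table.
rows = {
    "5E5/3F":   (29, 8.02, 0.021),
    "5E5-2/3F": (64, 14.52, 0.010),
    "5E/3F":    (144, 22.88, 0.002),
}

for name, (n, n_total, tabulated) in rows.items():
    print(f"{name}: computed {prob_miss(n, n_total):.4f}, tabulated {tabulated}")
```

The computed values agree with the tabulated ones to the precision shown in the table.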

Note that the 5E/3F series does not fit the saturation curve well
(χ^{2} = 283.82). This was noted earlier; see also Misof, M.Y. and
Wagner, G.P., Evidence for Four Hox Clusters in the Killifish
*Fundulus heteroclitus* (Teleostei), *Mol. Phyl. Evol.* **5**: 309-322 (1996).

### Summary

The survey is fairly well saturated in the sense that it is unlikely
that:

(1) any further sequences would be found in **5E5/3F**,

(2) more than one more sequence would be found in **5E5-2/3F**,

(3) any further sequences would be found in **5E/3F**.