Sample Sizes to Detect Election Errors
We especially want to detect errors which change an election winner. The same
arithmetic applies whether errors are accidental or hacks.
If a winning margin is 2%, an error of just 1% could have caused it: taking 1% of
the total votes from candidate A and adding them to candidate B changes
the winning margin by 2 percentage points.
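The arithmetic can be checked with a small worked example. This is only an illustrative sketch: the 1,000-ballot contest and the function name `margin_after_shift` are made up for this example and do not come from any real election.

```python
def margin_after_shift(votes_a, votes_b, total_ballots, shifted):
    """Return A's winning margin, as a fraction of all ballots,
    after `shifted` ballots are moved from A's tally to B's."""
    return ((votes_a - shifted) - (votes_b + shifted)) / total_ballots

# Hypothetical contest: 1,000 ballots, A leads B by 2% (510 to 490).
before = margin_after_shift(510, 490, 1000, 0)
# Moving 1% of all ballots (10 ballots) from A to B erases the whole margin.
after = margin_after_shift(510, 490, 1000, 10)
print(before, after)
```

Each moved ballot lowers A's tally by one and raises B's by one, which is why the margin moves twice as fast as the error.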
A. Random samples of individual ballots.
How many ballots do we need to sample individually, to
make it likely we'll find any errors which changed outcomes?
Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
You can word that as 1 error in this many records | 1 in 160 | 1 in 80 | 1 in 40 | 1 in 20 | 1 in 10 | 1 in 5 |
This random sample of ballots gives 63.2% probability of detecting that level of error (sample = 2 divided by the winning margin) | 160 | 80 | 40 | 20 | 10 | 5 |
A bigger random sample of ballots gives 90% probability of detecting that level of error (sample ≈ 4.5 divided by the winning margin) | 368 | 184 | 91 | 45 | 22 | 11 |
Tables are calculated in a spreadsheet.
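The sample-size rows can be recomputed outside a spreadsheet. The sketch below is one way to do it, assuming simple random sampling and using the sample-size formula from section C; the function name `sample_size` and the variable names are illustrative choices, not part of the original spreadsheet.

```python
import math

def sample_size(error_fraction, risk):
    """Smallest simple random sample whose chance of missing every
    erroneous item is at most `risk`: n = log(risk) / log(1 - e)."""
    return math.ceil(math.log(risk) / math.log(1.0 - error_fraction))

margins = [0.0125, 0.025, 0.05, 0.10, 0.20, 0.40]
# An outcome-changing error needs half the margin in erroneous ballots.
error_fractions = [m / 2 for m in margins]

# "1 error in this many records" row: 2 divided by the winning margin.
one_in = [round(2 / m) for m in margins]
# 90% detection row: at most a 10% risk of missing the error.
ninety = [sample_size(e, 0.10) for e in error_fractions]

print(one_in)
print(ninety)
```

Rounding up with `math.ceil` is an assumption about how the table was built; it matches the 90% row shown above.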
So a sample of 184 individual ballots has a 90%
chance of detecting errors, if 1.25% of all ballots were counted erroneously.
This sample reassures the public that contests with winning margins of 2.5% or
more weren't created by erroneous counts.
This same sample is also bigger than the 160 ballots needed to check
winning margins down to 1.25% at the lower, 63% detection level. However, the
probability that it will find such a small error level (0.63% of all ballots)
is only about 63% to 68%, not 90%, which is not as reassuring.
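These probabilities can also be checked by simulation instead of by formula. The sketch below is a rough Monte Carlo check under assumed conditions: a hypothetical jurisdiction of 100,000 ballots with 625 (0.625%) miscounted, samples drawn without replacement, and a fixed seed so runs repeat. The function name `simulate_detection` is illustrative.

```python
import random

def simulate_detection(total_ballots, erroneous, sample_size, trials, seed=0):
    """Estimate how often a random sample (drawn without replacement)
    contains at least one erroneous ballot."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Treat ballots numbered 0..erroneous-1 as the erroneous ones.
        sample = rng.sample(range(total_ballots), sample_size)
        if any(ballot < erroneous for ballot in sample):
            hits += 1
    return hits / trials

# Hypothetical jurisdiction: 100,000 ballots, 625 of them miscounted.
for n in (160, 184):
    print(n, simulate_detection(100_000, 625, n, trials=2000))
```

With a few thousand trials the estimates should land near 63% for 160 ballots and near 68% for 184, with some run-to-run noise.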
The only way to be 100% sure of finding small
levels of error is to check all ballots, not a sample. Image audits
check them all for errors in the software which interprets and
tallies the votes. These still need a sample to check for some kinds of scanner
errors which are not visible on the scanned image. Fortunately that kind of
scanner error has not been found in elections so far.
How likely is it that each sample of individual ballots finds error,
if present?
Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
Size of random sample | Probability that sample size at left will detect the error rate above |
1 | 0.63% | 1.25% | 2.5% | 5.0% | 10.0% | 20.0% |
2 | 1.25% | 2.5% | 4.9% | 9.8% | 19.0% | 36.0% |
11 | 6.7% | 12.9% | 24.3% | 43.1% | 68.6% | 91.4% |
22 | 12.9% | 24.2% | 42.7% | 67.6% | 90.2% | 99.3% |
45 | 24.6% | 43.2% | 68.0% | 90.1% | 99.1% | 100.0% |
91 | 43.5% | 68.2% | 90.0% | 99.1% | 100.0% | 100.0% |
184 | 68.5% | 90.1% | 99.1% | 100.0% | 100.0% | 100.0% |
368 | 90.0% | 99.0% | 100.0% | 100.0% | 100.0% | 100.0% |
500 | 95.6% | 99.8% | 100.0% | 100.0% | 100.0% | 100.0% |
1,000 | 99.8% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
As in the table above, a sample of 184
individual ballots has a 90% chance of detecting errors, if 1.25% of all ballots were
counted erroneously. This table shows the probabilities for bigger and smaller
samples. A sample of 500 has a 99.8% chance of catching this 1.25% error level.
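The probabilities in this table follow from the rule that a sample misses the error only if every sampled item is accurate. A minimal sketch, assuming simple random sampling and using the illustrative function name `detection_probability`:

```python
def detection_probability(error_fraction, sample_size):
    """Chance that a simple random sample of `sample_size` items
    includes at least one erroneous item: 1 - (1 - e)^n."""
    return 1.0 - (1.0 - error_fraction) ** sample_size

error_fractions = [0.00625, 0.0125, 0.025, 0.05, 0.10, 0.20]
for n in (1, 2, 11, 22, 45, 91, 184, 368, 500, 1000):
    row = [f"{100 * detection_probability(e, n):.1f}%" for e in error_fractions]
    print(f"{n:>5}", " | ".join(row))
```

Printed values may differ from the table in the last digit where a probability falls close to a rounding boundary.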
B. Random samples of precincts, voting machines, or other batches
of ballots.
How many batches do we need to sample, to make it likely
we'll find any errors which changed outcomes?
Many states don't sample individual ballots. They keep ballots in batches. Each
batch may be a precinct, voting machine, or a group of ballots which went
through a scanner together. Each batch usually has a few hundred ballots.
These states choose a sample of batches. They tally all ballots in each sampled
batch, and compare to the original election machine's tally of the same batch.
This finds any errors in the sampled batches. Tallying hundreds of ballots in each
batch is costly and time-consuming, so sampling individual ballots can save
work where it's possible.
If we do re-tally a sample of batches, how big a sample do we need? The hardest
errors to find are where a hack or error happened in a few scanners, precincts,
etc., so it might affect all or a big fraction of ballots in some batches, and
no ballots in other batches, and we need to find the few erroneous batches.
Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
If worst batches are 100% wrong, this random sample of batches gives 90% chance of detecting error | 368 | 184 | 91 | 45 | 22 | 11 |
If worst batches are 50% wrong, this random sample of batches gives 90% chance of detecting error | 184 | 91 | 45 | 22 | 11 | 5 |
If worst batches are 25% wrong, this random sample of batches gives 90% chance of detecting error | 91 | 45 | 22 | 11 | 5 | 2 |
So a sample of 45 batches has a 90% chance of
detecting errors, if 1.25% of all ballots were counted erroneously, and if error
levels averaged 25% in the worst batches. This sample reassures the public that
contests with winning margins of 2.5% or more weren't created by errors.
If some batches
were entirely erroneous, by mis-programming or bad scanners, a 45-batch sample
has a 90% chance of detecting error levels of 5%, so smaller errors could sneak
through. A sample of 184 batches has a 90% chance of detecting errors, if 1.25%
of all ballots were counted erroneously, and if error levels averaged 100% in the
worst batches.
Having batches 100% wrong is rare. It could
happen, for example, if ballots from a drop box in a 100% Democratic area
go through an erroneous election scanner whose programming is off
by a line and tallies Democratic votes as Republican, in one or more contests. No
one labels which drop box a batch comes from, so this batch would look as if it
came from an all-Republican area and would not necessarily be investigated.
Even looking for batches which are 25% erroneous
requires re-tallying 45 batches to check winning margins of 2.5% or more, and
91 batches for margins of 1.25% or more.
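The batch sample sizes above can be reproduced with a short calculation. This sketch assumes simple random sampling of batches and assumes that all erroneous ballots sit in batches which are each wrong by the same fraction; the function name `batch_sample_size` and its parameters are illustrative.

```python
import math

def batch_sample_size(ballot_error_fraction, worst_batch_error, risk=0.10):
    """Batches to re-tally so the chance of missing every erroneous batch
    is at most `risk`, assuming all erroneous ballots sit in batches that
    are each `worst_batch_error` wrong."""
    # Fraction of batches that contain errors (capped at all batches).
    batch_error_fraction = min(1.0, ballot_error_fraction / worst_batch_error)
    if batch_error_fraction >= 1.0:
        return 1  # every batch is erroneous, so any one batch shows it
    return math.ceil(math.log(risk) / math.log(1.0 - batch_error_fraction))

margins = [0.0125, 0.025, 0.05, 0.10, 0.20, 0.40]
for worst in (1.0, 0.5, 0.25):
    # Erroneous ballots needed to change the outcome: half the margin.
    sizes = [batch_sample_size(m / 2, worst) for m in margins]
    print(f"worst batches {worst:.0%} wrong:", sizes)
```

The less concentrated the errors are assumed to be, the more batches contain some error, so fewer batches need re-tallying to hit at least one of them.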
C. Formulas
There are formulas connecting sample size, error level, and risk
limit or confidence level in detecting those errors. The following formulas
apply to simple random samples.
e = erroneous items (ballots or batches) as fraction of all items
e = probability that one random item in the sample contains error
(1-e) = probability that one random item in the sample is accurate. Multiply n of these together for a sample of n:
(1-e)^n = r = risk that n random items will all be accurate, i.e. all n will miss the erroneous items
That final formula can be rearranged to calculate error rate and sample size:
1 - r^(1/n) = e = error rate which a sample of n can detect, with only r chance of missing it (the fractional exponent means the nth root).
e multiplied by total number of items in the jurisdiction (ballots or batches) = number of erroneous items which a sample of n can detect, with only r chance of missing it
log(r) / log(1-e) = n = sample size needed, rounded up to a whole number, so the chance of missing error level e is only r (logarithms can be natural or any base, as long as they have the same base as each other).
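The rearranged formulas can be tried out numerically. The following is a minimal sketch, assuming simple random sampling; the function names and the 200,000-ballot jurisdiction are hypothetical and chosen only for illustration.

```python
import math

def detectable_error_rate(sample_size, risk):
    """Error rate e that a sample of n detects with only `risk` chance
    of missing it: e = 1 - risk^(1/n)."""
    return 1.0 - risk ** (1.0 / sample_size)

def required_sample_size(error_rate, risk):
    """Sample size n = log(risk) / log(1 - e), rounded up."""
    return math.ceil(math.log(risk) / math.log(1.0 - error_rate))

# Hypothetical jurisdiction with 200,000 ballots and a sample of 184.
total_items = 200_000
e = detectable_error_rate(184, risk=0.10)
print(f"detectable error rate: {e:.4f}")
print(f"erroneous items detectable: {e * total_items:.0f}")
# Check: plugging e back in should give a risk of about 0.10.
print((1.0 - e) ** 184)
# Going the other way: sample needed for a 1.25% error rate at 10% risk.
print(required_sample_size(0.0125, 0.10))
```

Using natural logarithms here is a free choice; any base gives the same ratio.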
Errors in batches depend on how concentrated erroneous ballots are:
f = erroneous ballots as fraction of all ballots
w = erroneous ballots as fraction of the ballots in the worst batches
f / w = erroneous batches as fraction of all batches. This can be used as e in
the formulas above.
For example, this is the table of batch sample sizes used above, with w and
f/w explicitly shown.
Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% | |
With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% | f |
If worst batches are 100% wrong (w=1), this many batches would be erroneous | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% | f/w |
If worst batches are 50% wrong (w=0.5), this many batches would be erroneous | 1.25% | 2.50% | 5% | 10% | 20% | 40% | f/w |
If worst batches are 25% wrong (w=0.25), this many batches would be erroneous | 2.50% | 5% | 10% | 20% | 40% | 80% | f/w |
If worst batches are 100% wrong, this random sample of batches gives 90% chance of detecting error | 368 | 184 | 91 | 45 | 22 | 11 | |
If worst batches are 50% wrong, this random sample of batches gives 90% chance of detecting error | 184 | 91 | 45 | 22 | 11 | 5 | |
If worst batches are 25% wrong, this random sample of batches gives 90% chance of detecting error | 91 | 45 | 22 | 11 | 5 | 2 | |
The table shows smaller samples needed if we assume the worst batches are 25%
erroneous, not 100% erroneous.