DON’T GET TESTY — PART 2. Unthoughtful use of SARS-CoV-2 testing can increase risk for COVID-19 (and cause other avoidable harms)

Sean C. Lucan, MD, MPH, MS
Nov 16, 2020


[Also appearing on LinkedIn]

DON’T GET TESTY - PART 1 presented a summary including:

  • Section A: bottom line — weekly testing is ill-advised (discussion and reasoned conclusions)

What follows below is supporting detail:

  • Section B: testing basics — an illustrative example for non-experts (ok to skip if you are experienced in epidemiology)
  • Section C: calculation inputs — actual numbers for the current SARS-CoV-2 situation (information to justify the numbers)
  • Section D: categorical harms — enumerating the ways testing goes badly (this is the crux)
  • Section E: gray areas — dealing with uncertainty (additional nuance in the weeds)

B.) FOR ANY TEST — A PLAIN LANGUAGE PRIMER ON TESTING CONSIDERATIONS

Most people think test results are truth. They are not.

In the simplest example, any condition (SARS-CoV-2 or otherwise) is either present or it isn’t. A test should correctly tell us which is which. But tests are imperfect.

To demonstrate, let’s imagine a completely ridiculous made-up screening “test” requiring no prerequisite medical knowledge (please see APPENDIX for the example; those with a strong foundation in epidemiology might instead skip to the next section).

— — — — — — — — — — — — — — — — — — — — — — —

C.) NOT JUST AN ACADEMIC EXERCISE — THE NUMBERS FROM THE MADE-UP SCREENING TEST ARE THOSE ACTUALLY EXPECTED FOR SARS-COV-2

The example from the APPENDIX is ridiculous. But the numbers are real. The values are those we would expect if a real school conducts weekly SARS-CoV-2 testing (campus-wide, in asymptomatic students). To refocus from Appendix tables, we can simply substitute “virus” for our condition and “PCR” for our test:

Table 1. Expected school results for SARS-CoV-2 testing

In this particular case, the “test” is saliva-based polymerase chain reaction (PCR). However, other SARS-CoV-2 screening tests could just as easily apply. Instead of saliva, a school might sample using nasal, nasopharyngeal, or oropharyngeal swabs. Instead of PCR, a school might choose antigen testing or another form of nucleic acid amplification. The specifics are less important. The following argument might apply to any screening test. Relatively minor differences in test characteristics would not meaningfully alter conclusions.

For saliva-based PCR though, we should be clear about where the numbers come from. While no SARS-CoV-2 tests have been designed for, tested in, or explicitly approved for asymptomatic screening (especially in children), use in asymptomatic individuals might result in higher specificity than use in symptomatic disease. Then again, if other beta coronaviruses are present (even if not causing so much as mild cold symptoms), higher specificity might not be the case. To my knowledge, only two studies have assessed specificity for salivary PCR. As reported in a meta-analysis, both studies showed specificity of ~98%. Thus, the specificity used in the tables (Table 1 and Appendix Tables A2-A5) is 98%.

For sensitivity, the values reported from studies like the two in the previous paragraph are not really relevant. In such studies, the reference standard for “truth” is just another PCR test. To make an analogy to the “test for girl” from the APPENDIX: validating saliva PCR against nasopharyngeal PCR would be like validating hair length against something like wearing earrings (yes/no), having a high voice (yes/no), or having a small shoe size (yes/no). None of these is a terribly good marker for “girl.” In validating a test, you don’t want to compare the candidate test to an equally bad (or perhaps worse) test. You want to compare the candidate test to a gold standard. For “girl,” the gold standard might be some biological confirmation (e.g., perhaps a combination of chromosomes, internal organs, external genitalia, hormones, secondary sex characteristics, etc.). For “virus,” the gold standard might be isolates from laboratory-confirmed COVID-19. In another meta-analysis, studies compared saliva PCR to just such a biological gold standard. Based on these studies, sensitivity for saliva PCR was ~62%. The sensitivity in Table 1 (and Appendix Tables A2-A5) is thus 62%. However, in some of the tables, the sensitivity appears lower due to rounding (the sample is small and we need to look at whole students as opposed to fractions of students). Nonetheless, a lower value may be appropriate since sensitivity will almost certainly be lower at most times, even in people who ultimately develop symptoms.

For pre-test probability, or prevalence in our sample, because SARS-CoV-2 testing in the U.S. has been a complete fiasco, we actually have no idea how much disease is out there (or, at least, we have very little idea). Reported SARS-CoV-2 positivity rates in geographic areas around the school of interest have recently been as high as ~5%. But that number is almost entirely meaningless. Besides problems with sensitivity and specificity (above), community testing is not done at random. Testing is not population-based. It is a mess of medical referral, public-health referral, and self-referral (all more likely to include those who actually have the virus). The true prevalence may be lower. Then again, given the high number of asymptomatic (or minimally symptomatic) people who never get tested, prevalence might alternatively be higher. We just don’t know for sure. Another metric is the number of cases per 100K population. In the county around where the school of interest is located, this number of cases per 100K has recently been ~100. In other words, 100/100,000, or about 0.1%. That number is just as fictitious as the 5% value above (for similar reasons). To help refine the figure, an earlier report suggested actual cases may exceed reported cases roughly 10-fold. So perhaps 10 x 0.1%, or 1%, is more reasonable. Although there are prevalence estimators that might also be used, their calculations mostly depend on a lot of iffy assumptions. In the end, we might do just as well with our 1% estimate. This prevalence is the value reflected in Table 1 (and Appendix Table A5).

— — — — — — — — — — — — — — — — — — — — — — —

D.) A STANDARD 2X2 TABLE SUGGESTS THE HARMS — ONE CELL AT A TIME

What follows is an analysis of the harms to expect from weekly, school-wide, SARS-CoV-2 screening. Please note: harms change only by relative degree if prevalence, sensitivity, or specificity (Section C) increase. You can adjust values and see for yourself with this helpful calculator (a minimal version of the same arithmetic is sketched below). In all scenarios, harms likely exceed benefits.

1. HARMS FROM FALSE POSITIVES

Table 2. False positives (wrong about having the virus)

“False positives” are probably the test results of greatest consequence. False positives represent students labeled incorrectly as having the virus when, in fact, they do not. As Table 2 shows, every week, our hypothetical school can expect four times more false positives (n=8) than true positives (n=2). Stated differently, among students told that they have the virus, the verdict will be a lie four times more often than it will be the truth.

Over the course of a month, by rough math, the school might expect false positive results in a total of 32 students: 8 false positive results per week x 4 weeks. Thirty-two students is 8% of the entire school population. And that’s in just one month!

Admittedly, the actual expected number of false positives per month might be a bit lower. For one, as students are informed they are positive, they will be removed from subsequent rounds of testing. With each successive testing round, the total number of students tested will decrease: from 400 to 392 (400–8) after the first week; from 392 to 384 after the second week; and so on. For subsequent false positives, the same percentage of a smaller number will be a smaller number.

Regardless, the total number of individuals excluded from school due to false positives is likely to be much MUCH higher. Why? Because: (a) if any of the students incorrectly labeled as “positive” have siblings who are also students, those siblings will have to be sent home too; (b) if a parent of a falsely positive student is a teacher, the teacher will need to be sent home as well. If there is then no one else to teach the class, the entire class will need to go home; (c) if more than some guideline-suggested arbitrary number of “positives” are found in a class (or cohort, or school), a whole grade (or whole facility) may need to shut down. Needlessly!

What is the toll?

  • Educational compromise (at best, minimal disruption in shifting to remote learning; at worst, an entirely inferior academic experience)
  • Psychological distress (not limited to depression, anxiety, fear, confusion, self-blame, disengagement, thoughts of self-harm)
  • Social detriment (including but not limited to stigma for “positive” individuals who brought “negative” consequences)
  • Justified resentment of families (particularly families of “positive” children who know the results are very likely to be untrue)
  • Disparities within families (the standard isolation period for a “positive” individual is 10 days; but for any exposed sibling or parent, the standard quarantine period would be 14 days; there is a scientific basis for these differences, but that is cold comfort for impacted families, especially with the logistical challenges of possible different return dates)
  • Administrative burden for the school (endless, massive, needless work for school administrators, nurses, and staff, distracting individuals from their usual critical duties)
  • Ripple effects for society (when children get sent home and there is no one to watch them, will one or more parents need to stay home too? What if parents work in education? Or healthcare? Or food systems? Or law enforcement?)
  • Increased risk for COVID-19: (1) People removed from campus are more likely to engage in behaviors exposing them to SARS-CoV-2, increasing the chance of the virus accompanying them when they return to school (or spreading from them while they are remote); (2) if we exclude those who falsely test positive from future testing (for at least 90 days as the CDC suggests), these individuals will still be susceptible to infection and can still subsequently bring the virus to campus; but these individuals (believing they have “already had it”) might be less vigilant in their behaviors and less concerned about possible new symptoms (e.g., “it’s probably just allergies”).

Note: even in the setting of rising community cases, the actual prevalence would need to be more than three times higher before the number of true positives would exceed the number of false positives. Even then, the absolute number of false positives would be high. Unacceptably high. Under any realistic scenario, a school can reasonably expect to unnecessarily, and inappropriately, exclude literally dozens of individuals every month, perhaps even the entire student body!

2. HARMS FROM FALSE NEGATIVES

Table 3. False negatives (wrong about NOT having the virus)

With weekly screening, falsehoods will not be limited to the majority of those testing positive. The school will also necessarily lie to students who test negative. On a weekly basis, testing would be expected to yield two “false negative” results. In other words, two students per week (8 students per month) who actually have the virus will be told: “All clear, come on in.”

If the presumed goal of weekly testing is to identify asymptomatic cases, false negatives represent testing failure. The virus will not be detected. Instead, it will be welcomed into the school. No one will know infected individuals are infected. Moreover, the impression will be that everyone is virus-free.

What is the toll?

  • Misrepresenting the facts (falsely communicating to the entire school community; implying the virus is not in their midst when, inevitably, it will be)
  • Increased risk for COVID-19: Being falsely reassured, members of the school community (including families at home) may become lax in behaviors; there may be lesser attentiveness to protective measures (e.g., masking, distancing, hand hygiene, etc.)

3. HARMS FROM TRUE POSITIVES:

Table 4. True positives (right about having the virus)

“True positives” are the whole reason for testing in the first place. Presumably (although it doesn’t make much sense to me), the main goal of weekly screening is to identify asymptomatic cases. The idea is that doing so will make the whole community safer. Will it? Someone having the virus at the time of testing could have had it, and could have been transmitting it, for a full week before detection. In fact, given the problem of false negative results just discussed, if the person had a false negative during the previous week’s testing, that person could have been spreading the virus for more than a week! Moreover, the virus detected at the time of testing might be dead, inactive, and non-transmittable. It might pose zero threat to anyone. Deterioration to harmless viral debris generally happens within 10 days of symptom onset when people have symptoms; for asymptomatic individuals, the time period is less clear.

What is the toll?

  • Potentially prolonged absences (true-positive individuals may continue to shed inconsequential viral fragments for months. Excluding such individuals from school for months would be baseless. But testing them again is likely to result in repeat “positivity”: a “true positive” from the standpoint of virus being present, a “false positive” from the standpoint of infectious implication. Even though the CDC recommends not testing “positive” individuals again for 90 days, I have personally cared for patients having two positive results more than 100 days apart, a span matched in published literature)
  • Increased risk for COVID-19: Somewhat controversially/theoretically, as long as risk-mitigation measures are in place (masking, distancing, barriers, etc.), having a low level of virus in the school might actually be a good thing. Very low viral doses (as facilitated by masking and other measures) could promote asymptomatic cases. Low-level exposure might foster “variolation” until there is a vaccine. Conversely, removing kids from school might support a return to higher-risk out-of-school activities; the result could be higher viral doses, more severe disease, and then greater risk of spread to families and the wider community.

4. HARMS FROM TRUE NEGATIVES

Table 5. True negatives (right about NOT having the virus)

“True negatives” are the students correctly identified as not having the virus. Such students could represent 97% of all those tested. On the surface, this group seems unhurt. But they too face harm.

What is the toll?

  • Physical discomfort (admittedly, while spitting into a collection container is not that bad, alternative forms of sampling, particularly nasopharyngeal swabs, are very uncomfortable; swabs may be particularly traumatizing in children, especially if repeated week after week)
  • Psychological burden (regardless of sampling procedure, there may be anticipatory anxiety about testing, not only about the process of giving the sample, but also about potential outcomes-including false results)
  • Increased risk for COVID-19: Testing necessarily consumes resources (from time and effort to dollars and cents). If resources go to a comparatively unhelpful activity when they otherwise could be devoted to well-established protections, this is opportunity cost. This is harm. As a consequence, COVID-19 might be comparatively more likely than it would have been had resources been applied differently (e.g., to high-grade filtering, to enhanced ventilation, to physical barriers, to disinfectants, to more rigorous cleaning protocols, etc.)

— — — — — — — — — — — — — — — — — — — — — — —

E.) THE FIFTH ELEMENT — UNCERTAINTY

So far, surprisingly, the school for which I have been advising has realized fewer harms than the above math predicts. Part of the reason likely relates to the math itself, or rather the inputs for the calculations. In Section C above, an argument was made for a 1% prevalence. In fact, “functional prevalence” may be lower. By “functional prevalence,” I mean the virus is not evenly distributed across the population but occurs only in those at risk. “At risk” may describe only a minority of people. Indeed, the area around the school was hit hard by COVID-19 in the spring. As a consequence, many people were already exposed, many have already recovered, and many may now have some degree of protective immunity. Although early data suggested antibodies to SARS-CoV-2 may wane with time, more recent evidence supports hope for durability. A new tracker estimates SARS-CoV-2 antibodies (seroprevalence) by county. The county around the school has an estimated seroprevalence of ~33%. In other words, about 1/3 of the population may already have some level of protection. Additionally, perhaps as much as another 50% of the remaining individuals may have a degree of pre-existing immunity from T cells. The implication may be that a full 2/3 of the school’s surrounding population (and, by extension, perhaps 2/3 of the student body) could already be relatively invulnerable to SARS-CoV-2; they may have partial protections in place to prevent virus from replicating to a detectable (or transmittable) level. So maybe our 1% prevalence estimate could be revised down by 2/3, perhaps to 0.33% (Table 6).

A lower community prevalence would better explain the outcomes seen so far at the school. At least, lower prevalence would provide a better explanation once another factor is considered: indeterminants. Throughout the arguments to this point, we have been operating in a fictional world of clean dichotomies: positive, negative; virus, no virus; symptomatic, asymptomatic. The real world is not quite so neat. There are gray areas. In the area between PCR positive and PCR negative, there are indeterminants.

5. HARMS FROM THE GRAY AREAS

Table 6. Indeterminant test results

Indeterminants are ambiguous test results that can happen for a variety of reasons: insufficient sample quantity; poor sample quality; partial reaction in laboratory assays (e.g., perhaps due to other circulating beta coronaviruses); a mishandled sample; or even just processing delay (i.e., ambiguity through “no result yet”). Adding to this complexity is the reality that some students (some families) choose to test outside of the school program. Outside testing providers may use different samples, different labs, and different technology. Remember, SARS-CoV-2 testing in the U.S. was never coordinated or standardized. It is the Wild West!

Add to this chaos the variability that chance introduces, and you are likely to get results like those in Table 6. These results more closely align with the school’s actual experience. Additional variability, week-to-week, likely relates to small numbers and rounding errors (e.g., we can’t have ½ a student testing positive).

Fortunately for the real school, all students with indeterminant results to date have retested to negative. But given that indeterminants were most likely just “false positives that didn’t quite make the cut,” what does that mean for other “positive” results which are not definitively “true”?

Unfortunately, there is no way (absent the unrealistic prospect of direct lung sampling, followed by viral culture) to know if a PCR positive is actually “true.” There is also no way to know if a PCR positive is “false.” Of course, you could repeat the test. But two wrongs don’t make a right. Garbage x garbage = multiplicative garbage (plus exponential headaches and expense). The students who tested “not quite positive” (indeterminant) and then tested negative are lucky, at least in relative terms, at least so far. But what about students who test fully positive on the initial screen?

One such student had no exposures, no elevated risk, and, as a consequence, no faith in the “positive” result. That student got follow-up testing with a different PCR test. The result was negative. So which result to believe? Seems like a tie. So the student got a second follow-up PCR test: negative again. Two to one. All clear?

In fact, there is no way to know if the first test result was true and the second two false. Or if the first result was false and the second two true. Or if all results were true (possible depending on timing). Or if all results were false (less likely, but still possible). Regardless, lots and lots of costs (financial, emotional, operational, etc.). In the end, the student was kept home “out of an abundance of caution.”

What is the toll?

  • Psychological distress (not limited to depression, anxiety, fear, confusion, uncertainty, self-doubt)
  • Administrative headaches (what is the protocol?)
  • Continued ambiguity (what happens when a student testing “indeterminant” comes back “indeterminant” again on repeat testing?)
  • Increased risk for COVID-19: Risks are the same as those above (Section D), depending on which of the four cells of a 2x2 table “indeterminants” ultimately land in (footnote to Table 6). Regardless, the opportunity costs are enormous. Resources squandered on unraveling ambiguity cannot go towards measures of actual benefit.

— — — — — — — — — — — — — — — — — — — — — — —

APPENDIX

Imagine a test to distinguish, at a glance, girls from boys. Let’s say the “test for girl” is based on hair length: long hair (a positive test) = girl; short hair (a negative test) = boy. [We will ignore that hair can be other lengths and that people, of course, can be intersex.]

To visualize the testing possibilities, we can use a standard 2x2 table:

Table A1. hair-length screening test to at-a-glance identify girls

We want to see how well the screening test performs. Specifically, we want to determine how often girls have long hair and, by exclusion, how often “not girls” have “not long hair” (how often boys have short hair). So we observe all students arriving at a given school one morning. When we do, we find the following:

Table A2. Values obtained in testing the test

Based on the observed data (Table A2), it turns out girls usually do have long hair. And boys usually have short hair. But not always.

Of 200 girls, 124 (62%) of them had long hair. This is sensitivity (Table A3): how often the “test” (having long hair) identifies the condition (being a girl) when the student is, in fact, a girl.

Of 200 boys, 196 (98%) of them had short hair. This is specificity (Table A3): how often a negative test (the absence of long hair) will identify the negative condition (absence of being a girl) when the student is, in fact, not a girl. More plainly, specificity in this case is how often short hair will identify a boy when the student is in fact a boy.

Table A3. Test characteristics for the hair-length screening test

Since the boys in our sample rarely had long hair but the girls fairly often had short hair, the hair-length screening test appears better at correctly identifying when a student is “not a girl” than at identifying when a student is a girl. In other words, the test has high specificity but only modest sensitivity.

Great. But that’s not the whole story. Sensitivity and specificity are test characteristics only. They tell you how good a test is, but not how useful it will be in a given population. Usefulness depends critically on another factor: pre-test probability. In other words, before even conducting the test, how likely is it that the condition (in this case, of being a girl) is present in the first place? How common are female students? What is the prevalence of girls?

In the case of the school above, half the students are girls. Thus, the prevalence of being a girl is 50%. In such a scenario, when we get a positive test result (when a student has long hair), there is a high probability that student is, in fact, a girl. This concept, of positive-test accuracy, is known as positive predictive value (PPV). In the case of the school in Table A4, the PPV is 96.9%. What the value means is that a positive test (having long hair) is almost always correct, at least in the given school’s population! In only 4 out of 128 results (3.1%) will a positive test at the school incorrectly identify a “not girl” (a boy) as a girl:

Table A4. Interpreting the test results in a school with 50% girls

For a negative test, the probability of being correct is somewhat less. In other words, a result of “not long” hair (short hair) will only correctly identify a student as a “not girl” (a boy) 72.1% of the time (Table A4). The concept of negative-test accuracy is called negative predictive value (NPV). In 76 out of 272 results (27.9%), having short hair at the school will incorrectly identify a girl student as being a boy student. Being wrong more than 1/4 of the time is obviously not great.

But again, how useful a test is depends critically on pre-test probability-in this case, on prevalence. What if girls are much less prevalent? What if being a girl is rare?

Imagine now we shift attention from a mixed-gender school to an “all-boys” school. Actually, “all-boys,” for this example, will not be quite accurate. Let’s say the imagined school recently faced legal challenges; as a result, it now has a handful of female students. Currently, girls represent 1% of the student body. How would our hair-length screening test perform at this school?

Table A5. Interpreting test results in a school with 1% girls

Well, test characteristics, sensitivity and specificity, would remain unchanged. These values do not depend on prevalence. Nonetheless, the calculated sensitivity in Table A5 appears lower only because we have to round to the nearest whole student (we can’t have fractions of a student, after all). Assuming similar population characteristics (similar hair-style distributions in this case), test characteristics are unrelated to the population being tested.

PPV and NPV, however, would be much different (Table A5 vs. Table A4). This reality makes perfect sense. A test suggesting “not girl” is very likely to be correct in a setting where there are few female students. Thus, NPV will be near perfect (99.5%). Conversely, a test suggesting “girl” is very likely to be wrong when there are few female students. Thus, the PPV will be only 20%. Put another way, a positive test is four times more likely to be wrong than it is to be right (8 results vs. 2 results).

Originally published at https://www.linkedin.com.
