Abbreviations: ELCAP = Early Lung Cancer Action Project; LDCT = low-dose CT; RCT = randomized controlled trial

Lung cancer continues to carry a poor prognosis among the solid tumors. It accounts for 6% of deaths in the United States each year and is the leading cause of cancer deaths in both men and women. In contrast to the improvement in survival seen with breast cancer, testicular cancer, and colorectal cancer, little progress has been made over the years in survival with lung cancer. It has been thought that the lack of effective screening for lung cancer (and, therefore, the lack of detection and treatment of early-stage disease) is behind the absence of improvement in lung cancer mortality. At the present time, without screening, only 20 to 25% of patients have resectable disease at the time of diagnosis.

Screening for lung cancer was assessed in several large randomized studies1,2 in the 1960s and 1970s in the hope that early detection would result in surgical treatment of earlier disease as well as improved survival. Although the studies were of slightly different designs, all found that screening with chest radiographs, with or without sputum cytology, increased the detection rate of earlier carcinomas and was associated with longer survival, but no decrease in mortality occurred. (Survival and mortality are not the same. See "Terms and Concepts.") The findings of these studies led to the initially disputed, but finally accepted, recommendation not to screen for lung cancer.

The advent of chest CT has caused the screening issue to be raised again. CT clearly provides more information about the lung parenchyma and mediastinum than conventional chest radiographs, raising the possibility that detection of even smaller lesions may decrease mortality.
Preliminary cohort studies (not randomized) using the relatively new technique of low-dose CT (LDCT) have shown greatly increased detection of much smaller lung nodules than previously seen on chest radiographs, generating a controversy about renewal of screening for lung cancer.3,4 The controversy centers on the need for randomized controlled trials (RCTs). Many feel such studies will take too long, denying many patients potentially curative surgery.5 Others feel that such trials are needed to truly establish the efficacy of screening in reducing mortality before investing long-term resources and possibly causing harm by evaluation of false-positive results.6

Controversies generally arise for several reasons. One may be inadequate data on which to draw conclusions. Another may be differences of opinion about the interpretation or analysis of the data. A third reason may be that the conclusions derived from the data (the evidence) conflict with entrenched practices or beliefs. Sometimes what is intuitively obvious turns out not to be true when tested with elimination of bias and confounding variables, which is the goal of the RCT.

The medical literature is replete with RCTs that have disproved established beliefs. For example, extracranial-intracranial bypass was at one point one of the most commonly performed neurosurgical procedures, until an RCT published in 1985 demonstrated no effect of the procedure in preventing cerebral ischemia.7 Extracranial-intracranial bypass is now performed infrequently, in only a select subset of patients. A second, more recent example is the landmark study of arthroscopy for osteoarthritis of the knee employing sham surgery in one of the study arms.8 This study was able to establish that arthroscopic intervention was not better than sham surgery. Both studies challenged established beliefs and both yielded important information for future patient care.
There are many more such examples that have changed our treatment of diseases such as congestive heart failure and coronary artery disease.

In order to "reassess" lung cancer screening, this Update will define some terms and concepts important to the controversy, review past screening efforts, review the results to date of cohort studies, and offer a view of the current status of screening with LDCT.

Terms and Concepts

A number of terms are important in understanding and interpreting screening data. The prevalence screen is the initial baseline screening examination. It will identify existing disease. The incidence screen is the subsequent screening examination that will detect new disease.

Lead-time bias refers to a bias of diagnostic intervention. If a tumor is detected earlier by screening than it would have been had it become symptomatic, but the patient would have died of it at the same fixed time regardless (ie, after a given number of cell divisions from its inception), the patient will appear to have survived longer after diagnosis by screening. In this case, earlier diagnosis does not affect mortality but will alter survival.

Length bias, present mostly in prevalence screens, refers to the tendency of slow-growing tumors to show up on initial examinations. Although lethal, they have a longer preclinical phase than aggressive tumors presenting at interval examinations with symptoms.

Overdiagnosis is a much-disputed concept in lung cancer. It refers to the possibility that nonlethal cancers exist that will be detected and treated as a result of screening. This situation has been observed in prostate carcinoma, which is a frequent incidental finding at autopsy. Perhaps because lung cancer is seen as such a lethal tumor when it presents symptomatically, the concept of overdiagnosis has been difficult to accept. Data do exist, however, that support the concept.
Two autopsy studies have found evidence of lung cancers that were not suspected during life and were not the cause of death.9,10 These cancers occurred in greater numbers than seen in the general population. This finding suggests the presence of clinically insignificant disease, or pseudodisease. In a screening study in Japan,11 lung cancer was found equally in men and women, although the women were largely nonsmokers, in contrast to the men. The finding of carcinomas in large numbers of nonsmokers raises the question of pseudodisease, but does not establish it without evidence of mortality differences. In the study from Japan, there are no data about secondhand smoke that might offer an alternative explanation for at least some of the cancers in nonsmoking women.

Trial designs also need to be understood for both their strengths and their limitations. The randomized controlled trial (RCT) has been viewed as the gold standard for many years and has helped establish important information that has altered patient care. In terms of screening, participants are subjected to different levels of screening and the end point is disease-specific mortality. The point of such a design is to eliminate confounding variables and selection bias. RCTs established the value of fecal occult blood screening for colon cancer and the lack of value of chest radiographs and sputum cytology in screening for lung cancer. These studies are not without problems, however. They may be complicated by insufficient power to detect the expected difference in mortality and they may have excessive crossover.

Population-based studies are broad screening programs used to assess disease-specific mortality. This design was used to establish the efficacy of cervical cancer screening.
This sort of study may, however, be complicated by artifactual increases in incidence and mortality from screening, as was seen in prostate screening with the prostate-specific antigen test.12

Finally, observational (cohort) studies use screening in a selected group (the cohort) and assess efficacy by earlier detection of disease. The study assumes that earlier detection equates with lower mortality. Historical data on early-stage survival or survival data from follow-up are used. This design may be complicated by lead-time bias, length bias, and bias in recruitment or selection of volunteers. In addition, survival may not equate with mortality for a number of reasons.

Survival, or the standard 5-year survival, is the percentage of people with the disease surviving 5 years after diagnosis. The mortality rate is the age-adjusted number of people dying from the disease per 100,000 per year. This figure is disease-specific mortality. All-cause mortality or overall mortality is the age-adjusted number dying from any cause per 100,000 per year. Interestingly, and perhaps not intuitively, survival and mortality do not necessarily correlate with each other.13 Both incidence and survival may increase if earlier disease or pseudodisease is detected. However, unless there is an effective treatment, mortality will not improve.

An additional concept of importance is that of stage shift. This concept is related to survival figures and lead-time bias as well. For a screening program to be efficacious, it must demonstrate not only detection of earlier-stage disease, but also a drop in late-stage disease. This is called stage shift and provides necessary, but not sufficient, evidence of screening efficacy. Mortality must still decrease.

History of Lung Cancer Screening

As stated before, five major RCTs of lung cancer screening were completed in the 1960s and 1970s in the United States, the United Kingdom, and Czechoslovakia, and collectively screened more than 90,000 people.
Because these studies were very similar and had similar results, the Mayo Lung Project1 will be reviewed as an example. This study screened 10,933 male smokers over the age of 45 years with lung function good enough to tolerate a lobectomy. After initial screening with chest radiographs and sputum cytology, prevalence cases were identified in 0.83%. The remainder were randomized to undergo chest radiographs and sputum cytology every 4 months for 6 years. The control group was advised to have a yearly chest radiograph and sputum cytology, which was the standard advice at the Mayo Clinic at that time, but no attempt was made to encourage this approach beyond the suggestion. Compliance was 75% by the end of the study, and more than half of the control group had chest radiographs during the study.

More lung cancers were detected in the screened group (206 vs 160) and more were resectable (48% vs 32%). However, there was no difference in lung cancer mortality at the end of the study or at 20-year follow-up.14 There was also no difference in all-cause mortality. In fact, mortality was slightly, but not statistically significantly, higher in the screened group. Also, no stage shift was seen. An increased number of early cancers were detected, but the same number of late-stage cancers occurred as well.

It has been noted that the study was insufficiently powered, having been designed to detect only a 50% mortality reduction. Significant crossover also occurred, with many in the unscreened group following the advice to obtain an annual chest radiograph. Nevertheless, the results were the same as for the other four large screening trials at the time.

The results have been extensively analyzed and debated over the years. Most recently, Marcus et al14 analyzed the extended follow-up (20.5 years) of 6,523 participants from the initial study. As previously noted, survival was longer in the screened group (16 vs 5 years), but mortality was not different.
These data suggest either a lead-time bias or the presence of pseudodisease. A reevaluation of the pathology specimens by three pathologists blinded to the origin of the specimens found good agreement between observers in 85.5% of the cases.15 There was disagreement about the invasiveness of 8 lesions, 7 of these from the screened group. There was an increased number of carcinomas in situ in the screened group, which resulted in more squamous cell carcinomas. This finding was believed to possibly account for some cases of overdiagnosis. These interpretations have been disputed by Strauss,16 who reanalyzed the trial as a closed cohort design and believed that an increased incidence of cancer in the screened group accounted for the poorer mortality. He concluded that the study was not well randomized and that survival rather than mortality was a better marker of successful screening. This view has also been disputed.

LDCT Screening

Regardless of the interpretation of the original Mayo Lung Project, the ability to detect much smaller nodules, and so presumably much earlier disease, with low-dose radiation by LDCT has sparked a new era of controversy. The studies published to date are all observational cohort studies and survival data are as yet unavailable. The original studies were done in Japan,17 and there are several trials ongoing in the United States and Europe.18-21 As expected, LDCT is capable of detecting many more smaller nodules than chest radiography. In fact, so many small nodules can be detected that evaluation of false-positive findings becomes a significant issue. Both the Mayo Clinic experience and the Early Lung Cancer Action Project (ELCAP) will be reviewed.

The Mayo project is a prospective cohort study. Its most recent published report describes 1,520 enrolled people 50 years of age or older with a smoking history of 20 pack-years or more.18 Approximately equal numbers of men and women have been enrolled.
The subjects have undergone three annual LDCT examinations of the chest and upper abdomen. Annual sputum cytologic examinations were also done. Two years after baseline LDCT, 2,832 uncalcified nodules were identified in 69% of those screened. Forty cases of lung cancer were found: 26 were prevalence cancers, 10 were incidence cancers, 2 were diagnosed by sputum only, and 2 were interval cancers. Four of the incidence cancers were seen in retrospect on an annual LDCT.

One criticism of this study is that only one of four radiologists read each CT. Interobserver and intraobserver variability was not assessed. On the other hand, reading by a single radiologist more closely corresponds to actual practice situations. This study used an advised algorithm to deal with the very large number of nodules, most of which were false positives. The investigators recommended CT in 6 months for nodules <4 mm; CT in 3 months for nodules <8 mm but ≥4 mm; CT or positron emission tomography for nodules 8 to 20 mm; and biopsy for nodules larger than 20 mm. Ninety-six percent of participants returned for their second annual examination, which is an extraordinary follow-up rate.

Four of the 40 cancers detected were small cell carcinoma, but were limited stage. Of the non-small cell lung cancers, 1 was stage IV, 5 were stage IIIA, and the remainder were stage IA, IB, or IIA. Potentially curative resection was performed in 31 patients. Eight underwent resection of benign lesions. A large number of abnormalities of clinical significance (n=696) were also found that required evaluation. Nineteen participants have died: 3 in year 1, 6 in year 2, and 10 in year 3. Five were lung cancer deaths. Long-term survival data are not available.

The ELCAP study, based out of Cornell and New York University Medical Centers, has enrolled 1,000 people aged 60 years or older who have at least a 10-pack-year smoking history and are able to undergo a thoracotomy.22 The design calls for baseline and annual LDCT examinations.
Twenty-three percent of the prevalence screenings were positive vs only 2.5% in the subsequent annual incidence screenings. New nodules confirmed by high-resolution CT were considered positive. Five of 7 malignancies were stage IA. Apparently no benign nodules were resected, and the authors believe that a 1-year interval for LDCT follow-up of nodules is sufficient. No long-term survival data are available yet.

An Assessment of LDCT to Date

LDCT unequivocally detects more nodules than chest radiography. Many of the detected carcinomas are small and appear to be stage IA. As with any imaging study, lesions will probably still be missed and found in retrospect, as was seen in the Mayo Clinic study and has been found consistently in the past for chest radiography. The imaging problem presented by this new technology is not so much missed lesions as an abundance of lesions. One could say that we are seeing more nodules than we know what to do with, particularly because most of them are benign. The Mayo Clinic and the ELCAP studies have different algorithms to deal with the plethora of new, mostly very small nodules, ranging from 3-month to 1-year follow-up schedules. In addition, evaluations of incidental, but potentially clinically significant, lesions found on scanning the chest and abdomen may become problematic because most of these turn out to be benign. If screening turns out to be efficacious, approaches to these problems will have to be developed to evaluate real coincidental disease while minimizing invasive and costly tests for those with benign disease.

The most serious problem, however, is the assumption that improved survival will equate with reduced mortality. We do not yet have incidence survival data from these cohort studies, but with the increased detection of very small nodules, lead-time bias alone can be expected to account for improved survival.
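The way lead-time bias can raise survival while leaving mortality untouched can be illustrated with a small calculation. The following is only a sketch: the five patients, their ages, and the 3-year lead time are hypothetical numbers chosen for illustration, not data from any of the studies reviewed here. It assumes that screening moves the date of diagnosis earlier but does not change the age at death.

```python
def five_year_survival(diagnosis_ages, death_ages, window=5.0):
    """Fraction of patients still alive `window` years after diagnosis."""
    alive = sum(
        1 for dx, death in zip(diagnosis_ages, death_ages)
        if death - dx >= window
    )
    return alive / len(diagnosis_ages)


def mean_survival(diagnosis_ages, death_ages):
    """Mean time in years from diagnosis to death."""
    gaps = [death - dx for dx, death in zip(diagnosis_ages, death_ages)]
    return sum(gaps) / len(gaps)


# Hypothetical cohort (illustrative assumption, not study data).
# Age at death is held fixed: treatment is assumed to have no effect.
death_ages = [66.0, 67.0, 68.0, 70.0, 72.0]

# Age at diagnosis when the tumor is found because of symptoms.
symptomatic_dx = [64.0, 65.0, 66.0, 67.0, 66.0]

# Assumed lead time: screening finds each tumor 3 years earlier.
LEAD_TIME_YEARS = 3.0
screened_dx = [age - LEAD_TIME_YEARS for age in symptomatic_dx]

survival_unscreened = five_year_survival(symptomatic_dx, death_ages)
survival_screened = five_year_survival(screened_dx, death_ages)
mean_unscreened = mean_survival(symptomatic_dx, death_ages)
mean_screened = mean_survival(screened_dx, death_ages)

print(f"5-year survival, symptomatic diagnosis: {survival_unscreened:.0%}")
print(f"5-year survival, screen-detected:       {survival_screened:.0%}")
print(f"Mean survival after diagnosis: {mean_unscreened} vs {mean_screened} years")
print(f"Deaths in either scenario: {len(death_ages)} (same patients, same ages)")
```

In this toy cohort, 5-year survival rises from 20% to 100% and mean survival after diagnosis doubles, yet every patient dies at exactly the same age. Only a comparison of deaths in the whole screened population against an unscreened one, as in an RCT, would show that nothing changed.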
Is there any reason to expect that this time, with LDCT nodule identification, survival will track with mortality when it did not in the earlier RCTs of chest radiography? We do not know, but some data suggest it may not. Although surgical series find improved survival with resection of earlier-stage lesions, there are recent studies questioning this finding. In 510 patients with pathologic stage IA (T1N0M0) lesions, there was no relation between tumor size and survival.23 Another study found no difference in stage distribution based on lesion size for lesions ≤3 cm.24 Why might this be? It seems very possible that we do not have a good marker of the biological aggressiveness of tumors, particularly in the less advanced anatomic stages. A powerful example of this is the finding of micrometastases in the bone marrow of more than half of 139 patients undergoing curative resection for non-small cell lung cancer.25 This finding of unsuspected distant spread calls into question the basis of our current anatomic staging system, at least within stage I; disturbing as that is, such questions need to be raised.

If the incidence of early tumors can be increased by screening and survival can increase without an effect on mortality, one strong implication is that our therapy is not effective. As previously described, survival will not track mortality without effective therapy. Surgery alone for stage I disease may not be the answer.

For now, with our current treatment options, to truly demonstrate efficacy of screening, mortality must be shown to decrease. This demonstration can only come from an RCT. One such trial (funded by the National Cancer Institute) is underway, with results expected in 2009. Disease-specific mortality is the end point and the study has a 90% power to detect a 20% reduction in mortality. Some feel that 2009 is too long to wait.
However, the potential harm caused by overtreatment of possible pseudodisease, or the cost of screening without benefit, seems to more than justify the wait. The examples of needless surgery mentioned at the beginning of this article may serve as reminders of the potential harm of action based on inadequate data.

If a mortality decrease is established by an RCT, more information will need to be collected about the best algorithm for following tiny nodules, the effectiveness of screening programs in community settings, the cost-benefit ratio, and who should be screened, because the current programs have different entry criteria. Additionally, it will be helpful if both disease-specific and all-cause mortality can be examined.26 All-cause mortality may deal with the so-called sticking phenomenon: the tendency to attribute a death to lung cancer because the patient was in a screening program, even if the death was not from lung cancer. All-cause mortality may also be a way to take into account the effects of diagnostic and therapeutic procedures as well as the effects of comorbidities seen in smokers (eg, coronary artery disease and emphysema) that may limit their life expectancy even with effective treatment of lung cancer.

If no mortality difference is seen with detection of tiny nodules, perhaps we need to question our approach to treatment and assessment of distant disease. Biological and genetic markers may play crucial roles in the future.
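The reason all-cause mortality is robust to the sticking phenomenon can be shown with simple arithmetic. The sketch below uses entirely hypothetical counts (person-years, deaths, and the number of misattributed deaths are assumptions made for illustration). It also computes crude rates rather than the age-adjusted rates defined earlier, to keep the example short.

```python
def rate_per_100k(deaths, person_years):
    """Crude mortality rate per 100,000 person-years (not age-adjusted)."""
    return deaths * 100_000 / person_years


# Hypothetical screened arm (illustrative assumptions, not study data).
person_years = 50_000
true_lung_cancer_deaths = 100
true_other_deaths = 400

# Assumed "sticking": deaths from other causes that are recorded as
# lung cancer deaths because the patient carried a screen-detected diagnosis.
misattributed = 10

recorded_lung_cancer_deaths = true_lung_cancer_deaths + misattributed
recorded_other_deaths = true_other_deaths - misattributed

disease_specific_true = rate_per_100k(true_lung_cancer_deaths, person_years)
disease_specific_recorded = rate_per_100k(recorded_lung_cancer_deaths, person_years)

all_cause_true = rate_per_100k(
    true_lung_cancer_deaths + true_other_deaths, person_years
)
all_cause_recorded = rate_per_100k(
    recorded_lung_cancer_deaths + recorded_other_deaths, person_years
)

print(f"Disease-specific rate, true:     {disease_specific_true}")
print(f"Disease-specific rate, recorded: {disease_specific_recorded}")
print(f"All-cause rate, true:            {all_cause_true}")
print(f"All-cause rate, recorded:        {all_cause_recorded}")
```

Misattribution moves deaths from one category to another, so the recorded disease-specific rate is inflated (220 vs 200 per 100,000 in this example) while the all-cause rate is unchanged (1,000 per 100,000). The trade-off is that all-cause mortality is dominated by deaths unrelated to lung cancer, so a real screening effect is harder to detect with it.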