Estimating Annual Charges for Ambulatory Care from Limited Utilization Data

Barry G. Saver and Edward H. Wagner

Objective. This study explores the types of utilization information needed to produce a reasonable estimate of annual charges for ambulatory care that could be used in the absence of charge or cost data as an aggregate utilization measure.

Data Source. Charge and utilization data from the RAND Health Insurance Experiment were used.

Study Design. Services provided to enrollees in the Health Insurance Experiment at each of the six sites for a one-year period were grouped into categories according to California Relative Value Studies (CRVS) codes. Using annual charges as the dependent variable, we evaluated linear regression models for their predictive accuracy, as indicated by adjusted R2-values. Categories of services were combined on the basis of clinical meaningfulness (e.g., all provider visits into one group), and the predictive accuracy of models with these groupings of services was examined. We examined model validity by applying the derived models to each of the 30 remaining site-years of data from the Health Insurance Experiment.

Principal Findings. We were able to explain 84 percent of the variance in charges with a model containing counts of provider visits exclusive of mental health visits, mental health provider visits, days drugs were prescribed, days radiologic procedures were performed, procedural visits subdivided according to whether they were performed by a surgical or medical provider, days laboratory and/or pathology tests were performed, days a grouping of miscellaneous tests were performed, and days supplies were purchased. When applied to the validation data, this model predicted a mean of 77 percent of the variance and mean charges 102 ± 9 percent of actual mean charges. A model with only the first four of the listed categories explained 77 percent of the variance in charges.

Conclusions. Models using only counts of several broad categories of services perform rather well in predicting annual charges for ambulatory care.

Keywords. Ambulatory care; costs and cost analysis; models-theoretical
HSR: Health Services Research 29:6 (February 1995)

Ambulatory care accounts for an important part of total medical costs, and its share has been increasing with changes in Medicare reimbursement for inpatient care (Schwartz and Mendelson 1991). Assessing the cost of ambulatory care is, however, frequently problematic, as true costs are not commonly known and are labor-intensive to estimate. Fee-for-service insurers can use charges as a surrogate for cost, despite the differences between them (Altman and Socholitzky 1981; Finkler 1982; McMenamin 1990), since charges frequently are their costs. Non-fee-for-service systems, such as many health maintenance organizations (HMOs) and the Department of Veterans Affairs (VA), usually do not generate charges or the detailed service-based records that would allow for the direct imputation of charges, and would tend to be more interested in true costs than in charges. Many practices now must deal with a variety of payers, including traditional fee-for-service insurance, preferred provider organizations, Medicare, Medicaid, and a variety of HMOs, so that even if charges are available, patients receiving identical services may generate quite different charges.

Much of the research on costs of ambulatory care has focused on case-mix assessment and payment mechanisms (Altman and Socholitzky 1981; Epstein and Cumella 1988; Horn, Buckle, and Carver 1988; Lion, Vertrees, Malbon, et al. 1990; Newhouse et al. 1989; Rogerson et al. 1985; Schneeweiss and Hart 1988; Schneider, Lichtenstein, Freeman, et al. 1988; Tenan, Fillmore, Caress, et al. 1988; Weiner et al. 1991; Wouters 1991; Young, Joyce, Bivens, et al. 1988). However, we were not interested in prospective estimation of costs, but in measurement of costs associated with services already provided. Kukull, Koepsell, Conrad, et al. (1986) have developed a model for estimating hospitalization charges from limited, readily obtainable information, but we were not able to find any analogous models for estimating costs of or charges for ambulatory care.
Part of this work was carried out while Barry G. Saver was supported by a National Research Service Award (Bureau of Health Professions, HRSA, Grant #5 T32 PE10001) to the Departments of Medicine and Family Medicine, University of Washington. An earlier version of this work was presented at the Agency for Health Care Policy and Research's Second Annual Primary Care Research Conference, San Diego, CA, January 1991. Address correspondence and requests for reprints to Barry G. Saver, M.D., M.P.H., Acting Assistant Professor, Department of Family Medicine, HQ-30, University of Washington, Seattle, WA 98195. Edward H. Wagner, M.D., M.P.H., is Director, Center for Health Studies, Group Health Cooperative in Seattle. This article, submitted to Health Services Research on March 21, 1994, was revised and accepted for publication on August 8, 1994.

Chapko, Ehreth, and Hedrick (1991) address the issue of estimating costs of care in the VA system, and discuss four methods: measuring input costs, use of the VA cost accounting system, the VA reimbursement system, and use of charges from a surrogate health care facility. They qualitatively compare
these methods, but do not have measures of accuracy and cannot easily assess the incremental benefit of obtaining any particular piece of additional information. One of the most common reasons for examining costs of ambulatory care is to study practice patterns among primary care providers. There has been much less study of variations in primary care patterns than of variations in specialty care and hospitalizations, but the studies that have been carried out often use either counts of one or more specific services, charges for one or more services, or both (e.g., Eisenberg and Nicklin 1981; Greenfield, Nelson, Zubkoff, et al. 1992; Hartley et al. 1987). In one study, charges for care were estimated using counts of outpatient and emergency room visits along with charge data from one system for average provider fees, but the authors could not estimate what other costs might have been incurred, such as laboratory or radiology fees (Baldwin, Inui, and Stenkamp 1993). In another study, the authors classified services into three categories ("diagnostic, therapeutic, and disposition services") and counted the number of services provided in each category; they then summed these counts to create a "total activity level index" (Davis and Yee 1989, 192). This provides only a crude measure of total resource utilization and practice style. The need to look at use of multiple types of resources when studying practice style and resource utilization was explicitly recognized by one set of authors, who stated that "attention to the use of only one type of resource may result in a distorted picture of how physicians care for their patients and the costs that such care incurs" (Hartley et al. 1987, 565). Thus, there is clear need for a measure of overall ambulatory resource utilization for such studies; charges, the simplest scheme to aggregate utilization measures, are often unavailable. Actual cost measures are almost never available. 
Our goal was to create a parsimonious model to estimate charges for ambulatory care, as a cost surrogate, analogous to the Kukull model for inpatient charges. We have employed utilization records to construct and validate models using information that is likely to be available from administrative data systems in non-charge-based settings, and also likely to be obtainable from rapid chart review or even brief patient interviews. We have explored the types of services needed in such a model (i.e., are some services so highly correlated with others that, knowing some, it is unnecessary to know the others?) and evaluated whether exhaustive detail about rendered services was needed to obtain a reasonable estimate, or whether less-detailed information would suffice. We have sought to test the generalizability of our results by evaluating data from multiple geographic locations spanning a number of years.
METHODS

Data Source

Data used for this study were obtained from the RAND Health Insurance Experiment (HIE) (Newhouse, Manning, Morris, et al. 1981). This experiment spanned more than seven years (1974-1982), and the enrollees received services in six geographic locations (Dayton, Ohio; Seattle, Washington; Fitchburg and Franklin County, Massachusetts; and Charleston and Georgetown County, South Carolina) in a variety of settings. Enrollees in the HIE could not be Medicare-eligible or be expected to become so during the course of the experiment, so persons 62 years of age and older at the time of enrollment and those with end-stage renal disease were excluded. Military personnel, veterans with service-connected disabilities, institutionalized persons, and those with family incomes over $25,000 in 1973 dollars were also excluded. Other than these exclusions, persons offered enrollment in the HIE were a random sample from the population at each site. While participants were enrolled in a number of plans with varying costs, copayments, and deductibles, covered benefits were uniform and quite comprehensive for all participants. The Seattle HMO data were not used since no drug or supply charges existed for these participants, and other charges were created with the use of an outside fee schedule.

Participants enrolled in the HIE at one of six sites for up to a maximum of five years. Enrollment commenced at different times at different sites, ranging from November 1974 for Dayton, Ohio to November 1976 for the two South Carolina sites. Each participant's utilization was aggregated on an annual basis, with the date of enrollment serving as the dividing line between one year and the next. The data for all enrollees at a site during one year of the study are referred to as a site-year.
The two South Carolina sites had new groups of enrollees start participation in the third year of the study (starting November 1978 instead of November 1976), for a maximum of three years, and annual information about these enrollees was kept separate from information about enrollees whose participation started in November 1976. This resulted in a total of 36 site-years of data being available for use. Several HIE data files were used. The fee-for-service annual visit file contains annual expenditures for outpatient medical services exclusive of mental health services, outpatient mental health services, outpatient drug costs, and outpatient supply costs, totaled from individual bills submitted to the HIE (Peterson, Nelsen, and Bloomfield 1986a). Visit counts for outpatient mental health and all other health provider visits were constructed from the fee-for-service outpatient visits file, which contains information aggregated from a single visit with a single provider on a single day (Peterson et al. 1986b). A record representing a visit in the outpatient visits file could contain
up to four California Relative Value Studies (CRVS) codes (California Medical Association 1975), and we used these CRVS codes to classify services. The provider identifier was linked to the provider file to classify physicians as primary care, medical subspecialist, or surgical subspecialist. The fee-for-service claims line-item files contain information about every billed service, drug, or supply (Peterson et al. 1986c). These files were used to remove drugs and supplies prescribed by dentists and to retrieve additional CRVS codes not found in the outpatient visits file when more than four were present for a visit. In addition, outpatient service charges that could not be associated with a specific visit (referred to in HIE documentation as "file 11-only visits") were deleted. All data with coding errors that could be systematically detected (e.g., mismatch in identification information between files) were excluded; isolated, nonsystematic coding errors detected incidentally during examination of the data were left unchanged. All analyses were performed with SAS (SAS Institute Inc., Cary, North Carolina).

Inflation adjustment of charges from different years was necessary to make them comparable. The documentation provided by RAND includes medical inflation-adjustment factors relative to 1967 from the U.S. Bureau of Labor Statistics for all months during which the HIE operated (D'arc Taylor et al. 1987). For each participant, each year's charges were approximately converted to constant 1967 dollars by dividing the annual charges by the factor corresponding to the month and year in which the contract year for that person started.
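The inflation adjustment described above can be sketched in a few lines of Python. This is only an illustration: the factor values and the names `INFLATION_FACTOR` and `to_1967_dollars` are hypothetical, and the real factors would come from the RAND documentation cited in the text.

```python
from datetime import date

# Hypothetical medical price factors relative to 1967 (1967 = 1.00),
# keyed by (year, month). These numbers are invented for illustration.
INFLATION_FACTOR = {
    (1974, 11): 1.60,
    (1975, 11): 1.80,
    (1976, 11): 2.00,
}


def to_1967_dollars(annual_charges, contract_year_start):
    """Divide one person's annual charges by the factor for the month
    and year in which that person's contract year started."""
    key = (contract_year_start.year, contract_year_start.month)
    return annual_charges / INFLATION_FACTOR[key]


# A person whose contract year began in November 1976 and who
# generated $300 in nominal charges during that contract year.
print(to_1967_dollars(300.0, date(1976, 11, 1)))  # 150.0
```

Because a single factor is applied to a whole contract year, the conversion is approximate, as the text notes.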
Conceptual Approach

Our goal was to determine a minimum dataset needed for reasonable prediction of annual charges for ambulatory care. We used per person annual charges for outpatient services as the dependent variable in linear regression models and counts of resources used as predictors. CRVS codes were used to classify all services into categories. Categories of resources ("predictor categories" in Table 1) were chosen that were clinically meaningful and, in the absence of itemized lists of all provided services, could be obtained from chart review. We did not transform either our dependent or independent variables to make their distributions more normal, as any such transformations would have distorted the inherently additive dependence of cost on the numbers of services provided. We sought to maximize the adjusted proportion of variance explained (R2) with as few predictors as possible. The adjusted R2 was used to correct for the number of predictors in the models but, given the sample sizes, was little different from the unadjusted R2. Some categories were divided into the groups of services listed in the second column of Table 1 to determine whether these finer groupings would yield appreciably better estimates. In addition, physician office visits other
than with mental health providers were further classified based on type of provider (primary care physician, medical subspecialist, or surgeon), new versus established patient, and short versus long visit (<8 versus >8 CRVS units, so that anything more than a limited visit with a new patient or an intermediate visit with an established patient would be a long visit). Some categories, such as laboratory tests, were not subdivided. For laboratory tests, this was for several reasons. First, many different laboratory tests exist, and the only easy way to classify them would be with a price list, which would be extremely time-consuming in the absence of a computer database of all tests; the CRVS codes for laboratory tests are not ordered in any fashion that is helpful for estimating relative values of the tests. Second, whether and which tests are bundled together is idiosyncratic and varies widely between labs, so that one "test" at one lab might be three, or five, or even more tests at another lab, further complicating attempts at classification. Analogous situations apply to other categories.

In addition to testing models with counts of services in each category, we examined models in which all counts of services other than face-to-face provider encounters were replaced by counts of the number of days on which at least one of these services was obtained, to evaluate whether knowing the actual number of such services provided was important for estimating charges, or whether knowing only the number of days any such services were provided would suffice. This would make data acquisition far less labor-intensive if detailed computerized records were not available, since, rather than having to tally the number of services provided on a given day, only an indication of any versus no services would be needed.

Table 1: Predictor Category Composition, with Service Types and CRVS Codes

Predictor Category
    Services                                                                    CRVS* Codes

Provider visits other than mental health visits
    Office visits                                                               90000-90088
    Dietary                                                                     90092-90098
    Home visit, any provider                                                    90100-90194
    Skilled nursing facility/Nursing home                                       90305-90470
    Emergency room visit                                                        90500-90570
    Consultation                                                                90600-90645
    Well-child; developmental                                                   90751-90778
    Dialysis, misc. gastrointestinal procedures                                 90900-91299
    Optometry/Ophthalmology visit                                               92001-92019
    Occupational therapy/Physical therapy                                       97000-97799
    Radiation therapy                                                           77000-77999
Mental health visits
    Psychotherapy, etc.                                                         90803-90899
Procedural visits
    Major and minor therapeutic and diagnostic procedures                       10000-69660
Radiologic procedures
    Plain films, computerized tomography, contrast studies, ultrasounds, etc.   70002-76999
    Nuclear medicine studies/therapy                                            78000-79499
Pathology and laboratory tests
    Laboratory tests                                                            80104-87211
    Laboratory tests                                                            89000-89999
    Pathology tests/procedures                                                  88104-88399
Miscellaneous tests
    Misc. eye, ear, nose, and throat tests/procedures                           92020-92799
    Misc. cardiac tests                                                         93000-93279
    Multiple misc. tests and procedures                                         93700-96901
Total prescriptions filled
    Count of all filled prescriptions
Total supplies
    Count of all supplies purchased

CRVS codes not used in models: Immunizations and injections; Cardiac catheterization/angiography (none in data set); Miscellaneous charges

*CRVS = California Relative Value Studies.
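The difference between counting services and counting days on which any service in a category was obtained can be made concrete with a small Python sketch. The record layout (category, service date) and the function name `count_services_and_days` are assumptions made for illustration; they do not reflect the HIE file formats.

```python
from collections import defaultdict


def count_services_and_days(records):
    """records: iterable of (category, service_date) pairs for one
    person-year. Returns (services per category, distinct days per category)."""
    n_services = defaultdict(int)
    days_seen = defaultdict(set)
    for category, service_date in records:
        n_services[category] += 1
        days_seen[category].add(service_date)
    n_days = {cat: len(days) for cat, days in days_seen.items()}
    return dict(n_services), n_days


# Three lab tests billed on one day, one lab test on another day,
# and one radiologic study.
records = [
    ("lab", "1975-03-02"), ("lab", "1975-03-02"), ("lab", "1975-03-02"),
    ("lab", "1975-06-10"),
    ("radiology", "1975-03-02"),
]
services, service_days = count_services_and_days(records)
print(services)      # {'lab': 4, 'radiology': 1}
print(service_days)  # {'lab': 2, 'radiology': 1}
```

A day count needs only a yes/no indication per category per day, which is why it is cheaper to collect from charts than a full tally of services.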
Model Development

We merged the year 1 data from the six sites to create a file for model derivation. This still left 30 site-years of data for validation, but yielded a sample size sufficient (N = 4,518) to ensure that results should not be overly sensitive to pricing quirks at a particular site or to any particular outlier. This also allowed using data from all sites and multiple years for validation, providing a more robust test of validity than, for example, a split-half design would allow. We chose to include persons with minimal or no charges for two reasons: first, such persons represent valid data and the regression line should, in fact, pass through the origin; and second, removal of such persons had little effect in models containing more than a few predictors.

Stepwise and forward linear regressions were performed with all of the services shown in the second column of Table 1 and the further classification of office visits described earlier as predictors to evaluate their relative importance and allow comparison of the sizes of the regression coefficients. Following this, natural groupings of these services were examined to see whether important explanatory power, as measured by R2, was lost by aggregation. This was necessarily a subjective decision, since most variables representing delivered services were statistically significant. We were able to group services together into the small number of categories shown in column 1 of Table 1 without appreciable loss of predictive
accuracy in models containing services from more than one or two of the categories. For example, classifying provider visits on the basis of length, provider type, and new versus established patient was useful in models containing only provider visits, but contributed no appreciable predictive accuracy when models including several other predictors were tested. Following initial model derivation, outliers and influential observations, as judged by jackknife residuals and Cook's distance (Kleinbaum, Kupper, and Muller 1988), were examined to guide refinement of the models. The primary goals in this process were to evaluate whether these observations could be fit better by altering the classification scheme and if their presence appreciably altered the results obtained. Not surprisingly, we found that models performed better when nearly all services were represented in some fashion. For example, we created a category consisting of a miscellaneous grouping of tests, even though none of the smaller groupings of these tests occurred frequently enough to be important. Without this category, the rare person with a large number of one or more of these tests became an outlier. Adding this category improved the estimates for these outliers by allowing some charge, even if imprecise, to be attributed to those who received one or more of these services, and it slightly improved the overall fit of the model.
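As a rough illustration of the kind of model described in this section, the following Python sketch fits an untransformed linear regression of charges on utilization counts and computes R2 and adjusted R2. It is not the authors' SAS code; the function names, the two predictors, and the toy data are all hypothetical, and the normal equations are solved directly only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R2 for p predictors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Toy person-years: (provider visits, days prescriptions filled) -> charges.
X = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 3), (6, 2)]
y = [4.0, 17.0, 28.0, 41.0, 55.0, 74.0]
coef = fit_ols(X, y)
r2 = r_squared(y, predict(coef, X))
print(coef, r2, adjusted_r_squared(r2, n=len(y), p=2))

# With N = 4,518 and nine predictors, the adjustment is tiny.
print(adjusted_r_squared(0.84, n=4518, p=9))  # about 0.8397
```

The last line shows why, at the sample size reported here, adjusted and unadjusted R2 are nearly identical.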
Model Validation

We report data on evaluations of the most complete of the models we present, as this indicates the best that one is likely to do with this technique. We will refer to this as the test model (later referenced as Model 9). Only one year of data from each of the six sites had been used to develop this model, leaving 30 unused site-years of data for validation. We evaluated potential overfitting of variable selection by performing regressions on each of these 30 site-years of data, using the variables from the test model as predictors. This yielded 30 models, each derived from one site-year of data, all using the same nine predictors, but each having different coefficients based on the "best fit" model yielded by the regressions. The individual R2-values for each of these models were examined. We next assessed the potential overfitting of the coefficients of the test model (given the nine predictors) by computing R2-values for each of the 30 site-years using both the variables and the coefficients from the test model, and comparing these R2-values to the R2-values obtained from the 30 previous site-year-specific regressions. We also compared the mean annual charges estimated using the test model's coefficients to the actual mean charges for each of the 30 site-years of data.
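The second validation step, applying fixed coefficients to a held-out site-year, might look like the following Python sketch. The coefficient values, category names, and data are hypothetical, and computing R2 as one minus the ratio of residual to total sum of squares is an assumption about how the fixed-coefficient R2 was obtained.

```python
def predict_charges(coef, counts):
    """coef: dict with an 'intercept' and one weight per category.
    counts: dict of utilization counts for one person-year."""
    return coef["intercept"] + sum(coef[k] * v for k, v in counts.items())


def validate(coef, people):
    """people: list of (counts, actual_charge) pairs from one site-year.
    Returns (R2 using the fixed coefficients, predicted mean / actual mean)."""
    actual = [charge for _, charge in people]
    predicted = [predict_charges(coef, counts) for counts, _ in people]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    ratio = (sum(predicted) / len(predicted)) / mean_actual
    return r2, ratio


# Hypothetical coefficients from the derivation sample.
coef = {"intercept": 2.0, "provider_visits": 10.0, "rx_days": 4.0}

# Hypothetical held-out site-year.
site_year = [
    ({"provider_visits": 1, "rx_days": 0}, 14.0),
    ({"provider_visits": 3, "rx_days": 2}, 38.0),
    ({"provider_visits": 0, "rx_days": 0}, 2.0),
    ({"provider_visits": 5, "rx_days": 1}, 58.0),
]
r2, ratio = validate(coef, site_year)
print(round(r2, 4), round(ratio, 4))  # 0.9936 0.9821
```

Comparing this fixed-coefficient R2 with the R2 from refitting the same predictors on the held-out site-year separates overfitting of the coefficients from overfitting of the variable selection.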
RESULTS

We classified services into the eight broad categories shown in Table 1. These categories are: all provider visits1 other than mental health visits; all mental health provider visits; the number of visits in which one or more procedures were performed; all radiologic procedures; all pathology and laboratory tests/procedures; a grouping of cardiac, pulmonary, ophthalmologic, otolaryngologic, and neurologic tests ("miscellaneous tests"); prescriptions filled; and supplies purchased. Table 2 shows the regression coefficients and R2-values for models utilizing counts of the number of days on which at least one service in a category was provided for non-face-to-face visits. There was little difference in performance between models employing counts of services and models employing counts of days any services were provided, with the most complete model using counts of days (model 9) explaining 84 percent of the total variance and the equivalent model using counts of services explaining 86 percent of the variance. Outlier and influence diagnostics also suggest similar performance of the two types of models (data not shown). Therefore, we have chosen to show data only for the less information-intensive models based on counts of days services were provided. Figure 1 shows the fit of model 9 to the six site-years of data used in deriving the models.

Classification of provider visits based on length, new versus established patient status, and primary care versus specialist provider increased the R2 of a model containing only physician visits exclusive of those for mental health purposes from .40 to .43, but in the most complete models, the R2 was increased by only .002. Similarly, little accuracy was lost in more complete models when nonphysician visits were combined with physician visits.
We subdivided procedural visits into two groups based on whether the procedure was done by a medical or surgical practitioner, having found that in the more complete models, a significant number of outliers resulted from the inaccurate estimation of procedure values. The models were somewhat improved by this classification, which uses information likely to be available in the absence of charges; subclassification of medical providers as primary care or specialty providers was not helpful. Another model, containing counts of billed CRVS procedure codes from medical and surgical providers rather than counts of procedural visits, slightly reduced the number of extreme outliers but did not improve the overall R2 of the model (data not shown). To evaluate the most that could be achieved using these techniques, we evaluated a model using counts of all services shown in Table 1, before collapsing them into broader categories and using the visit and procedural classifications just described. It yielded an R2 of .88, confirming that the
process of collapsing into the categories we employed resulted in little loss of predictive accuracy.

We also evaluated the utility of adding demographic variables to the models. Patient age and sex alone yielded an R2 of .08, but their explanatory power decreased dramatically when utilization variables were added: the R2 for model 1 was only .03 higher when age and sex were added, and it increased by only .003 for model 9. We therefore chose not to include these variables in our models.

Collinearity did not appear to be an important issue in these models. For the independent variables in model 9, the largest bivariate correlation was .52 (between provider visits and prescription drug use), and the largest variance inflation factor (Kleinbaum, Kupper, and Muller 1988) in model 9 was 1.57.

To help evaluate the importance of each of the categories of model 9, R2-values for regressions with each predictor removed from the model are shown in Table 3, along with R2-values for each category as a single predictor of charges. It can be seen in Table 3 that only mental health visits, and, to a lesser extent, provider visits other than mental health visits, appreciably decreased the R2 when removed as single predictors. The reason for the importance of mental health visits, despite their having less power as a single predictor, is that there is little correlation between them and other services, and mental health visits have a more skewed distribution with a much longer tail than other services. Examination of Tables 2 and 3 also suggests that, as long as minimization of a few outliers is not a priority, relatively simple models such as models 4-6 perform nearly as well, and little precision is lost by dropping the additional categories.

When we evaluated variable selection in model 9 with regressions on the remaining 30 site-years of data, containing from 175 to 1,223 observations, the mean R2 (±1 standard deviation) was .84 ± .05 (range .76-.94). We then tested the constancy of the coefficients of model 9. Using these coefficients with the 30 site-years of data, the mean R2 was .77 ± .08 (range .57-.90). The three site-years with the lowest R2-values in this test all had only about 200 observations. Finally, we compared the estimated mean annual charges predicted by model 9 for each of the 30 site-years to the true means. Figure 2 shows a histogram of the ratios of the estimated means to the observed means. The mean estimated/observed ratio was 1.02 ± .09 and the range was .87-1.21.

Table 3: R2-Values for Models with Each Category Deleted and for Single Predictor Models

Predictor                                          Model 9 with Category Deleted   Single Predictor Model
Provider visits other than mental health visits    .79                             .41
Mental health visits                               .67                             .25
Procedural visits                                  .82                             .13
Number of days x-rays performed                    .81                             .24
Number of days prescriptions filled                .81                             .39
Number of days path/lab tests performed            .83                             .31
Number of days supplies purchased                  .83                             .13
Number of days miscellaneous tests performed       .83                             .06

Figure 2: Ratio of Estimated to Actual Means for Each Contract Year (histogram; horizontal axis: ratio of predicted to actual mean cost)
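The collinearity check reported in the Results can be illustrated with a short Python sketch. It covers only the two-predictor special case, where the variance inflation factor reduces to 1 / (1 - r^2); the general VIF for model 9 would instead regress each predictor on all of the others. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Bivariate (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors
    whose correlation is r."""
    return 1.0 / (1.0 - r * r)


# Toy counts of provider visits and prescription days for six people.
visits = [0, 1, 2, 3, 4, 6]
rx_days = [0, 0, 1, 1, 3, 2]
r = pearson_r(visits, rx_days)
print(r, vif_two_predictors(r))

# For the largest bivariate correlation reported above (.52), the
# two-predictor VIF would be about 1.37.
print(vif_two_predictors(0.52))
```

Values this close to 1 indicate that the standard errors of the coefficients are only modestly inflated by correlation among predictors.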
DISCUSSION

We have demonstrated the feasibility of using rather limited utilization data to create models with quite good predictive accuracy for annual charges for ambulatory care. Our evaluation indicates that one can make good estimates of charges even with simple counts of the number of days on which categories of services are provided, and that knowing the number of each type of service provided on a day is relatively unimportant. In the absence of any other charge or cost information, use of the models we have developed provides a rational and consistent method of combining different utilization measures into a unitary measure, and gives an idea of the relative influence of information about different types of service utilization on the accuracy of the measure.

While models based on cost rather than charge information might be preferable in many applications (in that charge biases resulting from the idiosyncrasies of fee-for-service medicine in the United States would be eliminated), costs even for the same services vary from place to place and time to time. This is due to the wide regional variations in labor costs, facility costs, and many other costs. It may be, therefore, that charge estimates produced by these or similar models will prove useful as cost surrogates for many comparisons across systems, locations, and time.

Assignment of a value to procedural visits remains a source of inaccuracy. The subdivision of procedural visits on the basis of provider type yielded only a modest increase in the R2 of .01 in model 9, and examination of outliers revealed that a sizable number still resulted from imprecise estimation of procedure charges. We evaluated how much the model would be improved with even a very crude measure of procedure value by classifying procedural visits into three categories on the basis of charge: <$20, $20-$100, and >$100. Substituting this classification of procedures into model 9 resulted in a noticeably better fit, with an R2 of .88.
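The three-way classification of procedural visits by charge can be written as a small Python sketch. The thresholds come from the text; the class labels and function names are hypothetical, and treating exactly $20 and exactly $100 as belonging to the middle class is an assumption.

```python
from collections import Counter


def procedure_charge_class(charge):
    """Assign a procedural visit to one of three classes: <$20, $20-$100, >$100."""
    if charge < 20:
        return "under_20"
    if charge <= 100:
        return "20_to_100"
    return "over_100"


def count_procedural_visits_by_class(charges):
    """Turn one person's procedural-visit charges into three predictor counts."""
    counts = Counter(procedure_charge_class(c) for c in charges)
    return {k: counts.get(k, 0) for k in ("under_20", "20_to_100", "over_100")}


# One person-year with four procedural visits.
print(count_procedural_visits_by_class([5.0, 50.0, 250.0, 15.0]))
# {'under_20': 2, '20_to_100': 1, 'over_100': 1}
```

The three counts would replace the two provider-type counts of procedural visits as predictors in the regression.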
Thus, even this simple classification scheme results in a significant improvement in estimation of charges and suggests that, if possible, procedures should be classified by a means more accurate than provider specialty. For systems in which CRVS codes, CPT codes (for Current Procedural Terminology) or the like are available, use of a table of procedure-value codes would probably be the best approach. In the absence of such information, one could presumably classify a priori most if not all procedures into categories of "small," "medium," or "large" without knowing actual charges, as long as the type of procedure performed is known. Otherwise, persons undergoing costly procedures will have their expenses significantly underestimated by models such as these. As procedures that were formerly performed on an inpatient basis are increasingly moved to
the outpatient sector, reasonably accurate classification of procedures will become even more important. The valuation of medical supplies purchased is also a source of inaccuracy. In the 30 site-years of data used for validation, the best fit coefficients for supply counts were noted to be far more variable from one site-year to another than coefficients for the other variables; in several site-years with the worst fits, a sizable proportion of extreme outliers resulted from poor estimation of supply costs and these observations were responsible for the poor fit. We found that models with supply charges and counts removed performed as well as or better than models containing them (data not shown). Given that many plans do not cover the cost of supplies, in many cases it may be appropriate to use models that do not contain supply information. If supplies are included and are not somehow classified according to value, then the occasional person who obtains very expensive supplies will have his or her charges seriously underestimated. We found that models using counts of days in which categories of services were provided were almost as good as models using counts of services provided. Given that unbundling of services is sometimes used to increase payment without any real increase in services provided, models using counts of days rather than counts of services may produce a more valid cost measure, as they will be insensitive to such unbundling. Examining why such simple models work as well as they do is useful for understanding some of their strengths and weaknesses. For example, why is it of no use to distinguish between a minimal visit by an established patient and an initial, comprehensive history of a new patient, even though the latter has a value (based on CRVS units) over seven times that of the former? The answer is that the ability to group services into categories and the values of coefficients in the models depend on what is common in the data. 
In this data set, approximately 90 percent of office visits were classified as "short"; the higher value of longer visits was outweighed by their relative infrequency. Similarly, despite the much higher price of CT scans, counting them separately from plain films did not improve the models, since only one CT scan was in the data used to create the models. Given a different mix of services, the pooling of short and long visits, physician and nonphysician visits, or other classifications merged in these models might be inadvisable. In the most complete models, immunizations and injections were the only reasonably common type of service not included in any category, as they did not appreciably improve the fit when included and their exclusion did not result in the occurrence of any extreme outliers. This might not hold true in more recent data, given the marked increase in the cost of immunizations over the past few years.
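The distinction drawn above between counting services and counting days on which a category of service was provided can be illustrated with a minimal sketch. The claim layout (a date paired with a category label), the function name, and the example records are assumptions made for illustration only; they are not the HIE file format or the procedure actually used in the study.

```python
from collections import defaultdict
from datetime import date


def count_services_and_days(claims):
    """Return (service_counts, day_counts) per category for one person-year.

    claims: iterable of (service_date, category) pairs, one per billed line item.
    """
    service_counts = defaultdict(int)
    days_seen = defaultdict(set)
    for service_date, category in claims:
        service_counts[category] += 1            # every line item counts
        days_seen[category].add(service_date)    # a day counts once per category
    day_counts = {cat: len(days) for cat, days in days_seen.items()}
    return dict(service_counts), day_counts


# Hypothetical records: the same laboratory work billed as one line item,
# or unbundled into four line items on the same day.
bundled = [(date(1995, 2, 1), "lab")]
unbundled = [(date(1995, 2, 1), "lab")] * 4

services_b, days_b = count_services_and_days(bundled)
services_u, days_u = count_services_and_days(unbundled)

print("services:", services_b["lab"], "vs", services_u["lab"])
print("days:", days_b["lab"], "vs", days_u["lab"])
```

In this sketch the service count rises when the work is unbundled, while the day count is unchanged, which is the property that makes day counts insensitive to unbundling.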
HSR: Health Services Research 29:6 (February 1995)
These or similar models are likely to perform well when applied to patients similar to those enrolled in the HIE: noninstitutionalized civilians under age 65 who are not chronically disabled. Effects of secular trends since the time of the HIE, such as increases in major outpatient procedures and CT and MRI scans, should be evaluated and new models derived taking these factors into account, if possible. If cost or charge data are available for any patients in a study, we would recommend using these data to derive new models, starting with the categories in our models but adding categories for CT and MRI scans to see if this significantly improves results, and if possible classifying outpatient procedures more accurately. Similar models may have even greater predictive accuracy for patients in a specialty clinic or with specific conditions, given the greater homogeneity within such populations. However, these populations would likely receive a rather different mix of services from the HIE population, and new models would need to be derived for them. Models such as these should be useful for picking up broad differences in physician practice styles, such as propensity to order lab tests, prescribe drugs, or refer patients to specialists. However, they will not be sensitive to use of more versus less expensive services within a category, such as prescription of proprietary versus generic drugs. The nature of the HIE data, in which insured persons could see any practitioner they chose, did not allow for examining the performance of the models in comparing provider practice patterns. Having acknowledged these potential limitations, what are the strengths of such models? Models were developed using over 4,000 observations and then validated on a much larger sample spanning seven years from six different sites, and the models were found to be robust across location and time (see Figure 2).
Their performance was not affected by the more than twofold variation in mean per-person annual charges among the sites. The sample population used is fairly representative of the population at large, with the exception of the absence of the Medicare-eligible population. Thus, although they were derived from charges, these or similar models appear likely to be useful for producing surrogates for cost of care when charges are not available; these estimates will be independent of many pricing factors that may not reflect true differences in resource utilization, although they do incorporate the reimbursement biases present in the United States during the time of the HIE. Potential applications of such models include studies of costs of care in non-charge-based systems, comparison of costs of care between different medical care systems with differing or absent charges (e.g., international comparisons and comparisons between different insurers), and comparisons of costs of care within a practice for patients with different payers (e.g.,
independent practice association HMO patients versus preferred provider organization patients versus Medicaid patients versus traditional fee-for-service patients).
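The recommendation above, to derive new models when charge data are available and to test whether an added category such as CT scans improves the fit, can be sketched as a comparison of adjusted R2-values between a base and an extended linear model. Everything in this sketch is an assumption made for illustration: the eight person-years, the per-unit prices (40, 50, and 400), and the residual terms are invented, and the ordinary least squares fit is written out by hand through the normal equations rather than taken from the software used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(predictors, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n = len(y)
    rows = [[1.0] + list(p) for p in predictors]
    p = len(rows[0]) - 1  # number of predictors, excluding the intercept
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)]
           for i in range(p + 1)]
    xty = [sum(r[i] * y_k for r, y_k in zip(rows, y)) for i in range(p + 1)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    rss = sum((y_k - f) ** 2 for y_k, f in zip(y, fitted))
    tss = sum((y_k - mean_y) ** 2 for y_k in y)
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Invented person-year counts (not HIE data).
visits = [2, 5, 1, 8, 3, 6, 4, 10]
plain_film_days = [0, 1, 0, 2, 1, 0, 1, 3]
ct_days = [0, 0, 0, 1, 0, 0, 1, 0]
residual = [3, -2, 1, -4, 2, -1, 4, -3]  # invented unexplained charges

# Invented prices: 40 per visit, 50 per plain-film day, 400 per CT day.
charges = [40 * v + 50 * f + 400 * c + e
           for v, f, c, e in zip(visits, plain_film_days, ct_days, residual)]

# Base model pools all radiology days; extended model counts CT separately.
base = [(v, f + c) for v, f, c in zip(visits, plain_film_days, ct_days)]
extended = list(zip(visits, plain_film_days, ct_days))

adj_base = adjusted_r2(base, charges)
adj_ext = adjusted_r2(extended, charges)
print(f"adjusted R2, pooled radiology: {adj_base:.3f}")
print(f"adjusted R2, CT separated:     {adj_ext:.3f}")
```

Because the invented data give CT days a much higher price and make them reasonably common, separating them raises the adjusted R2 here. With only one CT scan in the derivation data, as in the HIE sample, the same comparison would be expected to show little or no gain.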
NOTE 1. This includes "procedural visits" given a CRVS modifier code of 1 by the RAND researchers, indicating a low- or no-charge visit that was part of a procedural "package."
ACKNOWLEDGMENT We would like to thank Willard G. Manning for his very helpful comments on an earlier version of this work.