The following is technical and possibly of interest only (and just barely) to those conducting applied economics research. You have been warned.
I have been thinking about the common support problem. Specifically, I’ve been thinking that economists conducting applied research on self-chosen behaviors or objectives do not pay enough attention to it. I have always suspected this, but two recent seminars I attended have caused me to be a bit more certain that my suspicions are correct. One seminar involved estimation of hospitals’ decisions to simultaneously acquire both diagnostic and therapeutic radiology technology. The other (which I have also written about here) involved estimation of individuals’ decisions about calorie consumption and optimal body mass index (BMI).
What is the common support problem? It arises in observational data where assignment to or participation in an intervention or program is self-chosen (D=1 for those who choose assignment; D=0 for those who do not choose assignment). [1, 2, 3] When modeling self-chosen assignment or participation based on observed covariates (p(Di=1|Xi)), observations located in the central portion of the probability of assignment distribution are more likely to be “close” to each other in covariate space, regardless of their assignment status, i.e., to be similar to each other on observables. Those at the extremes of the assignment choice probability distribution (i.e., those observations with very high or very low probabilities of choosing assignment) are often observed not to have counterparts with opposite assignment but with similar covariates. In extreme cases, there may be no overlap in covariate space between those who choose assignment and those who do not, i.e., they are very different on observables and therefore have no similar counterpart with the opposite assignment choice.
Notice that the graph on the left indicates that for every observation with D=1 (on the X axis), there is likely to be a matching observation with similar covariates and similar propensity for assignment among those with D=0. In the graph on the right, there are no matching observations for D=0 or for D=1, i.e., no common support, no similarities on covariates, no positive probability of assignment for persons with similar covariates.
In empirical investigations using observed data from surveys, one usually observes some overlap in propensities for assignment (and covariates). The less overlap, the more reason for concern. This is because comparing outcomes when there is little or no common support in assignment is an important source of bias in observational studies [1].
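The overlap requirement described above can be stated compactly. The following is a sketch in standard potential-outcomes notation, using the D and X defined earlier; the wording of the two conditions is mine and not taken from either seminar.

```latex
\begin{align*}
% Common support for comparisons over the whole population:
% every covariate value must be observed with both assignment choices.
0 < \Pr(D_i = 1 \mid X_i = x) < 1
  &\quad \text{for all } x \text{ in the support of } X_i , \\
% Weaker version, sufficient when only the effect on the treated is of interest:
% every covariate value seen among those with D = 1 must also occur with D = 0.
\Pr(D_i = 1 \mid X_i = x) < 1
  &\quad \text{for all } x \text{ in the support of } X_i \text{ given } D_i = 1 .
\end{align*}
```

In practice the first condition usually fails near p(D=1|X) = 0 or 1, which is exactly where the observations at the extremes of the distribution sit.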
In the two seminars I recently attended, both presenters seemed to have a sense that the extremes of the technology adoption distribution and the obesity distribution might be somehow qualitatively or behaviorally different from observations in the center of those distributions. However, neither appeared to have investigated this empirically (other than some simple comparisons of selected covariates, like caloric intake or BMI preferences, between those with D=1 and D=0).
One advantage of the propensity score as a diagnostic tool is that it provides a mapping of all covariates to a single score. It may, therefore, pick up important differences on observables between those who choose assignment and those who do not that could not be detected by simply stratifying on or comparing means of variables believed to correlate with tech adoption or obesity and decisions about food consumption.
In the case of tech adoption, which was being modeled as a simultaneous decision that depended on the tech adoption decisions of other players in the market, one can easily imagine that the range of the propensity to adopt includes what can be thought of as “always takers” (high p(D=1|X) ~ 1) and “never takers” (low p(D=1|X) ~ 0) [12, p. 158]. Always takers might be Stackelberg leaders, or they may simply have objective functions that are very different from those of other hospitals in the area. For example, always takers could be large, high bed-count, tertiary care medical centers with affiliated medical schools, where the teaching objective and the need to attract and retain high quality faculty impose minimum constraints on available technology. Never takers may be small hospitals, which for historical or other reasons have tended not to acquire cutting-edge technology that serves relatively small portions of total market patient mix.
A simple diagnostic that might shed light on this would be to estimate the probability of joint adoption of diagnostic and therapeutic technology, then compare box plots of the resulting propensity scores between those who adopt and those who don’t. If I’m right, the extremes will have few matches (overlaps in covariate space or propensity to adopt). The issue for the researcher would then be to investigate whether the presence in a market of what might be an always taker produces Stackelberg-like behavior among the other market players, or whether the always takers are so different that they don’t really enter into the competitive decisions made by the remaining market players. Similarly, effort could be devoted to empirically investigating the lower extreme of the distribution to determine whether those hospitals respond to the behavior of other firms in the market. If the data support the conjectures that the two groups’ behaviors are relatively independent of market characteristics and other market players, then eliminating observations with no common support should yield a sample that allows investigation of behavior among the hospitals most likely to be making, at the margin, the hypothesized trade-offs about investment in new technologies given other market players’ adoption decisions.
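To make the diagnostic concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the data are simulated, the single covariate stands in for something like a standardized bed count, and the hand-rolled logit (fit_logit) is a stand-in for whatever estimation routine one would actually use. It estimates the propensity to adopt and then prints, for adopters and non-adopters separately, the five numbers a box plot would display.

```python
import math
import random
from statistics import quantiles


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, d, steps=300, lr=1.0):
    """Fit a logit of d on X (intercept added) by plain gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            err = di - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(beta, X):
    """Predicted p(D=1|X) for every observation."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the ingredients of a box plot."""
    q1, median, q3 = quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))


# Simulated (hypothetical) hospitals: one covariate, adoption more likely as it rises.
random.seed(0)
X = [[random.gauss(0.0, 1.0)] for _ in range(400)]
d = [1 if random.random() < sigmoid(2.0 * xi[0]) else 0 for xi in X]

beta = fit_logit(X, d)
scores = propensity_scores(beta, X)
adopters = [s for s, di in zip(scores, d) if di == 1]
non_adopters = [s for s, di in zip(scores, d) if di == 0]

print("adopters    ", [round(v, 3) for v in five_number_summary(adopters)])
print("non-adopters", [round(v, 3) for v in five_number_summary(non_adopters)])
```

If the upper whisker for adopters extends well beyond the maximum for non-adopters (or the lower whisker for non-adopters extends well below the minimum for adopters), those are the candidate always takers and never takers with no counterparts.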
In the case of the obese, I would expect that a similar diagnostic exercise might provide information that would allow the investigator to determine whether or not the very obese and the very overweight are very different in their covariates from individuals in the center of the obesity propensity distribution. As above, if they are, then the observations of policy interest may well be those who are more alike on covariates even though they differ in weight. Restriction of the sample to those who may be amenable to policies aimed at reducing caloric intake or increasing energy output may yield results with greater policy relevance.
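Restricting the sample in this way can be sketched as follows. This is an illustration under stated assumptions, not a recommendation of a particular rule: it uses a simple min-max criterion (keep observations whose estimated propensity lies between the larger of the two group minima and the smaller of the two group maxima), and the scores and function names are made up for the example.

```python
def common_support_bounds(scores, d):
    """Return (lo, hi): the range of scores observed in BOTH groups (min-max rule)."""
    treated = [s for s, di in zip(scores, d) if di == 1]
    control = [s for s, di in zip(scores, d) if di == 0]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    return lo, hi


def on_support(scores, d):
    """Indices of observations whose score falls inside the common support."""
    lo, hi = common_support_bounds(scores, d)
    return [i for i, s in enumerate(scores) if lo <= s <= hi]


# Hypothetical propensity scores: five with D=0 followed by five with D=1.
scores = [0.05, 0.10, 0.30, 0.45, 0.60, 0.35, 0.55, 0.70, 0.90, 0.97]
d = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = on_support(scores, d)
print(common_support_bounds(scores, d))  # (0.35, 0.6)
print(kept)                              # [3, 4, 5, 6]
```

Here six of the ten observations fall outside the overlap region and would be set aside. Note that a min-max rule only trims the tails; it will not reveal gaps in the interior of the score distribution, so it complements rather than replaces a look at the full distributions.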
What little evidence there is seems to suggest that estimation of average treatment effects on the treated without correction for failure of common support is an important source of bias [1], or that it doesn’t matter very much in some cases and matters somewhat in others [12, pp. 86-91, 13], or that it matters but in more nuanced ways [14]. I'm suggesting that even in the absence of interest in a treatment effect per se, determination of common support is an easy diagnostic that may help clarify the conceptualization of the underlying self-chosen behavior that is being modeled by economists. At a minimum, it may point to previously undetected behavioral heterogeneity when assignment is self-chosen, which could have implications for the hypothesized underlying structure and for the policy question of interest. Behavioral heterogeneity could also have implications for that darling of applied econometrics, instrumental variables regression [15].
IMHO, all of these would be good things for applied economists to think about more rigorously.
1. Heckman, J. J., Ichimura, H., & Todd, P. E. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Review of Economic Studies, 64: 605-654.
2. Heckman, J. J., Ichimura, H., Smith, J., & Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66: 1017-1098.
3. Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9(2): 85-110.
4. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1): 41-55.
5. Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387): 516-524.
6. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(3): 688-701.
7. Chamberlain, G. (1980). Analysis of covariance with qualitative data. Review of Economic Studies, 47: 225-238.
8. Dehejia, R. H., & Wahba, S. (1999). Causal effects in non-experimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448): 1053-1062.
9. Smith, J., & Todd, P. (2001). Reconciling conflicting evidence on the performance of propensity-score matching methods. American Economic Review, 91: 112-118.
10. Dehejia, R. (2005). Practical propensity score matching: A reply to Smith and Todd. Journal of Econometrics, 125: 355-364.
11. Diaz, J. J., & Handa, S. (2006). An assessment of propensity score matching as a nonexperimental impact estimator. Journal of Human Resources, 41: 319-345.
12. Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press, pp. 69-94.
13. Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in the estimation of average treatment effects. Biometrika. http://biomet.oxfordjournals.org/cgi/content/abstract/asn055v1
14. Lechner, M. (November 2000). A note on the common support problem in applied evaluation studies. Univ. of St. Gallen Economics, Disc. Paper 2001-01. Available at SSRN: http://ssrn.com/abstract=259239 or doi:10.2139/ssrn.259239
15. Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions underlying one widely used estimator for program evaluations. Journal of Human Resources, 32: 441-461.