
Friday, August 10, 2012

Do Men and Women React to Information on HIV Risk Differently?

This is the meta-subject of a new paper in PLoS ONE by me and my buddy/co-author Brendan Maughan-Brown. Here is the abstract, which mostly explains everything:

Objectives
We examined whether knowledge of the HIV-protective benefits of male circumcision (MC) led to risk compensating behavior in a traditionally circumcising population in South Africa. We extend the current literature by examining risk compensation among women, which has hitherto been unexplored.

Methods
We used data on Xhosa men and women from the 2009 Cape Area Panel Study. Respondents were asked if they had heard that MC reduces a man’s risk of contracting HIV, about their perceived risk of contracting HIV, and condom use. For each gender group we assessed whether risk perception and condom use differed by knowledge of the protective benefits of MC using bivariate and then multivariate models controlling for demographic characteristics, HIV knowledge/beliefs, and previous sexual behaviors. In a further check for confounding, we used data from the 2005 wave to assess whether individuals who would eventually become informed about the protective benefits of circumcision were already different in terms of HIV risk perception and condom use.

Results
34% of men (n = 453) and 27% of women (n = 690) had heard that circumcision reduces a man’s risk of HIV infection. Informed men perceived slightly higher risk of contracting HIV and were more likely to use condoms at last sex (p<0.10). Informed women perceived lower HIV risk (p<0.05), were less likely to use condoms both at last sex (p<0.10) and more generally (p<0.01), and more likely to forego condoms with partners of positive or unknown serostatus (p<0.01). The results were robust to covariate adjustment, excluding people living with HIV, and accounting for risk perceptions and condom use in 2005.

Basically, our results show that women react to information on circumcision's protective benefits in a manner consistent with risk compensation, the phenomenon whereby individuals undertake riskier behaviors when they feel somehow protected from the consequences. Why do women respond differently from men? We aren't sure, but we theorize that it could be related to (some combination of):

-Misinformation among women about the protective benefits of circumcision with respect to male-to-female HIV transmission (circumcision has only been shown to reduce female-to-male transmission).
-Lack of opportunities for women to discuss circumcision, sex, and HIV in public places.
-Higher prior probabilities of contracting HIV among women (which means risk information will be more likely to shift beliefs about HIV on the margin - good ole' Bayes's Theorem; see the sketch after this list)
-A sense of reduced need among women to negotiate condom use, which can be tricky in a world of gender power imbalances.
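
To see the Bayesian point concretely, here is a minimal sketch with invented numbers: the same piece of protective news (expressed as a likelihood ratio below one) moves a higher prior by more in absolute terms.

    # Invented numbers: update beliefs about HIV risk with the same piece of
    # protective news (likelihood ratio 0.4) from a low and a high prior.
    # Bayes's rule in odds form: posterior odds = prior odds x likelihood ratio.
    LR = 0.4
    for prior in (0.05, 0.20):  # low vs. high prior probability of infection
        post_odds = (prior / (1 - prior)) * LR
        post = post_odds / (1 + post_odds)
        print(f"prior {prior:.2f} -> posterior {post:.3f} (shift {prior - post:.3f})")

The shift is roughly four times larger from the 0.20 prior than from the 0.05 prior, which is the sense in which the same information bites harder "on the margin" for the higher-risk group.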

Some new research by other groups (more on this in a later post) provides evidence that the first of these could be at play. Watch this space for more, as we attempt to find some explanations!

Thursday, May 17, 2012

Global Health Focus in This Week's JAMA

Some interesting new op-ed and research pieces on global health in the latest issue of JAMA. Perhaps the most interesting is this piece by Eran Bendavid and co-authors, who examine PEPFAR (The United States President's Emergency Plan for AIDS Relief) and its impact on all-cause mortality in Africa. This is important because some have argued that intense spending on HIV in this manner has crowded out spending on other important health problems. The authors use a differences-in-differences strategy (comparing mortality rates before and after PEPFAR in countries that received substantial funding and in those that received less, holding fixed contextual factors across countries) and find that increased PEPFAR funding was associated with lower all-cause mortality rates. (HT: Paula Chatterjee)
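
For intuition, the difference-in-differences logic boils down to an interaction term in a regression. Here is a minimal sketch with made-up numbers and variable names (not the authors' actual data or specification):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical country-period panel. "focus" marks heavy PEPFAR funding,
    # "post" marks the period after rollout. The coefficient on focus:post is
    # the DiD estimate: the mortality change in focus countries relative to
    # the change in non-focus countries over the same period.
    df = pd.DataFrame({
        "mortality": [10.1, 9.9, 8.6, 8.4, 9.6, 9.4, 9.4, 9.2],
        "focus":     [1, 1, 1, 1, 0, 0, 0, 0],
        "post":      [0, 0, 1, 1, 0, 0, 1, 1],
    })
    did = smf.ols("mortality ~ focus * post", data=df).fit()
    print(did.params["focus:post"])  # -1.3 here: focus countries improved more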

Monday, February 6, 2012

Does Discrimination Make You Sick? And How?

Interesting new paper, forthcoming in the Journal of Health Economics, that uses 9/11 as a quasi-experimental source of variation to try and get at the causal effect of discrimination (here, against Muslims in the UK) on health outcomes. It also goes a bit further than this and tries to get at some of the mechanisms. The findings are, sadly, along the lines of what I expected:

The attitudes of the general British population towards Muslims changed post 2001, and this change led to a significant increase in Anti-Muslim discrimination. We use this exogenous attitude change to estimate the causal impact of increased discrimination on a range of objective and subjective health outcomes. The difference-in-differences estimates indicate that discrimination worsens blood pressure, cholesterol, BMI and self-assessed general health. Thus, discrimination is a potentially important determinant of the large racial and ethnic health gaps observed in many countries. We also investigate the pathways through which discrimination impacts upon health, and find that discrimination has a negative effect on employment, perceived social support, and health-producing behaviours. Crucially, our results hold for different control groups and model specifications.

So in addition to the deadweight loss of underutilizing potentially talented men and women, and the social unrest and potential political costs that discrimination can generate, we can now add health to the slew of its negative impacts.

In a later post, I'll go over a paper that Sonia Bhalotra and I are working on that looks at how discrimination can prevent children who have better childhoods from tapping into that wellspring as adults.

Sunday, November 27, 2011

Do Financial Incentives Induce Physicians to Provide More (Unnecessary) Care?

About two years ago, I posted something on my now non-existent Facebook account about how medical tests and treatments, especially those that are elective, are more likely to be offered if doctors are reimbursed well for them. My point was that there is a strong financial incentive to test and treat, even in cases where doing so would confer, at best, little benefit to the patient's health. A bunch of people (mainly physicians) responded on my wall pointing out how misguided I was. It was actually a bit more vociferous than that, but I digress.

Anyway, it turns out that I was right (in this case, NOT shocking). I just came across a great study by Joshua Gottlieb, an economics job market candidate from Harvard. His study uses a natural experiment in physician incentives to examine whether payment drives the care offered. Specifically, he takes advantage of a large-scale policy change by Medicare in 1997. Previously, Medicare set different fee schedules for each of around 300 small geographic areas, because production costs and other realities of providing a given service obviously vary across space. In 1997, Medicare coalesced these regions into 80 larger areas. Some smaller areas had generous payouts for certain services which fell after 1997 because the average payout for their new, larger group was lower; for others, it went the other way. Comparing pre- and post-1997 thus gives you a nice experiment in what happens to health service provision when payouts change for reasons other than local health outcomes or demand for care.
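
The consolidation logic can be sketched in a few lines (stylized, with invented numbers - not Medicare's actual fee formula): each small area's payment shock is the gap between its new regional rate and its old local rate, and that shock is unrelated to local demand for care.

    # Stylized version of the 1997 consolidation: two small areas are merged
    # into one region whose rate is an average, so one area's fees fall and
    # the other's rise for reasons unrelated to local health needs.
    old_local_rate = {"area_A": 120.0, "area_B": 90.0}  # pre-1997 fees ($)
    new_region_rate = 100.0                             # post-1997 merged rate

    for area, old in old_local_rate.items():
        shock = (new_region_rate - old) / old
        print(f"{area}: payment changes by {shock:+.1%}")
    # area_A: -16.7%, area_B: +11.1% - the variation that identifies the effect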

Whether you hold my priors or shared those of my misguided Facebook friends, the results are astounding. Across all health services, Gottlieb finds that "on average, a 2 percent increase in payment rates leads to a 5 percent increase in care provision per patient." Predictably, the price response is concentrated in services with an elective component (such as cataract surgery, colonoscopy and cardiac procedures - don't huff and puff, I said elective COMPONENT!) but not so much in things like dialysis or cancer care, where it is easy to identify who needs treatment and you have to provide it no matter what. Furthermore, in addition to disproportionately adjusting the provision of relatively intensive and elective treatments as reimbursements rise, physicians also invest in new technology to do so; this is beautifully illustrated by the examination of reimbursement rates and MRI purchases.

So what's the upshot of all this? Is this a good thing? Probably not. Despite the scaled-up technology, Gottlieb is unable to find any impacts on health outcomes or mortality among cardiac patients (for whom he explored the relationship between payouts and treatment more deeply). Furthermore, he asserts that "changes in physician profit margins can explain up to one third of the growth in health spending over recent decades."

Ultimately, there are some good lessons here. First, if we are interested in bringing down costs and increasing health care efficiency, we need to pay for things that actually help maintain and improve health. Second, we can't rely on physicians to be the gatekeepers of rising costs, as it is clear that, given incentives, they may not always behave in a way that actually improves health outcomes (thankfully, for cases like fractures, cancer or end-stage renal disease treatment, docs aren't sensitive to prices and do the right thing clinically). Finally, we need to stop universally and blindly lauding the US health care system as a bastion of health care technology if that technology does little to improve outcomes.

Saturday, November 5, 2011

Infections and IQ

A well-known fact about our world is that there are great disparities in average IQ scores across countries. In the past, some have tried to argue that this pattern can be explained by innate differences in cognition across populations - some people are just innately smarter than others. Others have tried to attribute the differences to cultural factors. However, genetics and culture are likely not driving these differences in any meaningful sense. After all, another stylized fact is that average IQ scores have gone up markedly, within one or two generations, within any given country. These changes, known as the Flynn Effect after the researcher who painstakingly documented them, speak against the genes story because they occurred far more quickly than one would expect from population-wide changes in the distribution of cognition-determining genes. They have also occurred too quickly to be explained by paradigm-shifting social changes.

So what gives? Enter Chris Eppig, a researcher at the University of New Mexico. In a recent piece in Scientific American, he proposes that cross-country differences in IQ, as well as changes in IQ within a country over time, can be explained by exposure to infectious diseases early in life. The story goes something like this: fighting off infections early in life requires energy, and energy at that age is primarily used for brain development (in infancy, it is thought that over 80% of calories are allocated to neurologic development). So if energy is diverted to fend off infections, it can't be used to build cognitive endowments, and afflicted infants and children end up becoming adults who do poorly on IQ tests.

In the piece, Eppig cites some of his work linking infectious disease death rates in countries to average IQ scores. His models control for country income and a few other important macroeconomic variables. His evidence, while not proof of a causal relationship, is certainly provocative. So provocative, in fact, that I ended up trying to build a stronger causal story between early childhood infections and later life cognitive outcomes. In a recent paper (cited in the above Scientific American article), I examine the impact of early life exposure to malaria on later life performance on a visual IQ test, using a large-scale malaria eradication program in Mexico (1957) as a quasi-experiment to establish causality. Basically, I find that individuals born in states with high rates of malaria prior to eradication - the areas that gained most from eradication - experienced large gains in IQ test scores after eradication relative to those born in states with low pre-intervention malaria rates, areas that did not benefit as much (see this Marginal Revolution piece for a slightly differently worded explanation).

My paper also looks at the mechanisms linking infections and cognition. One possibility is the biological model described above - infections divert nutritional energy away from brain development. However, I also find evidence of a second possibility: parents respond to initial differences in cognition or health due to early life infections and invest in their children accordingly. In the Mexican data, children who were less afflicted by malaria thanks to the eradication program started school earlier than those who were more afflicted. Because a child's time is the domain of parental choice, this suggests that parents reinforce initial differences across their children (perhaps they feel that smarter children will become smarter adults, so investments in their schooling will yield a higher rate of return), and that this parental response can modulate the relationship between early life experiences and adulthood outcomes.

Wednesday, August 10, 2011

Great Paper on the Impact of Cancer Screening

A good number of health care professionals, though perhaps not enough, are obsessed with screening to catch various diseases early - "when we can do more about it." Cancer screening, in particular, is an area of great interest, one that the public has really latched on to over the last 20 years or so. As the row over breast cancer screening illustrates, there is a huge debate over when to screen people. Should we begin yearly mammograms for breast cancer at, say, age 40, or wait until age 50?

It turns out that the evidence to motivate one set of screening guidelines over another isn't all that great. That is, while we know that certain kinds of screening (breast and colon, for example) work, randomized clinical trials of screening tests have too few people in each age group to definitively assess the best age cutoff for these modalities.

Enter this neat paper by Srikanth Kadiyala and Erin Strumpf. They utilize our existing screening guidelines as a "natural experiment" to study the effectiveness of screening at the population level. Specifically, while a person aged 39 and a person aged 41 may be similar in terms of their cancer risk, our national guidelines lead to one being screened and the other not. Is the 41 year old better off as a result?

U.S. cancer screening guidelines recommend that cancer screening begin for breast cancer at age 40 and for colorectal cancer and prostate cancers at age 50. What are the marginal returns to physician and individual compliance with these cancer screening guidelines? We estimate the marginal benefits by comparing cancer test and cancer detection rates on either side of recommended initiation ages (age 40 for breast cancer, age 50 for colorectal and prostate cancers). Using a regression discontinuity design and self-reported test data from national health surveys, we find test rates for breast, colorectal, and prostate cancer increase at the guideline age thresholds by 78%, 65% and 4%, respectively. Data from cancer registries in twelve U.S. states indicate that cancer detection rates increase at the same thresholds by 25%, 27% and 17%, respectively. We estimate statistically significant effects of screening on breast cancer detection (1.3 cases/1000 screened) at age 40 and colorectal cancer detection (1.8 cases/1000 individuals screened) at age 50. We do not find a statistically significant effect of prostate cancer screening on prostate cancer detection. Fifty and 65 percent of the increases in breast and colorectal case detection, respectively, occur among middle-stage cancers (localized and regional) with the remainder among early-stage (in-situ). Our analysis suggests that the cost of detecting an asymptomatic case of breast cancer at age 40 is approximately $100,000-125,000 and that the cost of detecting an asymptomatic case of colorectal cancer at age 50 is approximately $306,000-313,000. We also find suggestive evidence of mortality benefits due to the increase in U.S. breast cancer screening at age 40 and colorectal cancer screening at age 50.

This is a neat, well-crafted study. The methodology the authors use, called regression discontinuity, exploits sharp cutoffs in decision rules, policies, and the like: right around the cutoff, the variable that determines treatment changes only trivially, so people just above and just below the threshold are otherwise comparable. It is a useful way to get quasi-experimental evidence where there is no other way to get it. Indeed, regression discontinuity is now considered second only to the gold-standard randomized clinical trial in the pantheon of statistical approaches.
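
Here is a minimal sketch of the regression discontinuity idea with an invented data-generating process (the jump loosely echoes the paper's 1.3 cases/1000 figure, but nothing below is the authors' data or code): fit lines on either side of the age-40 threshold and read the jump off the threshold dummy.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented data: detection per 1000 women trends smoothly in age but
    # jumps at the age-40 mammography guideline.
    rng = np.random.default_rng(0)
    age = rng.uniform(35, 45, 5000)
    y = 2.0 + 0.3 * (age - 40) + 1.3 * (age >= 40) + rng.normal(0, 0.5, 5000)
    df = pd.DataFrame({"y": y, "a": age - 40, "d": (age >= 40).astype(int)})
    # Separate slopes on each side of the cutoff; the coefficient on d is the
    # estimated discontinuity in detection at age 40.
    rd = smf.ols("y ~ d + a + d:a", data=df).fit()
    print(rd.params["d"])  # recovers something close to the true jump of 1.3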

Of course, the one problem with this approach is that, while we can learn whether our existing cutoffs are useful, we can't learn whether some other cutoff would be better (one of the motivating questions of the paper). This is the weakness of the research design: unless there is a policy change to a different age cutoff, regression discontinuity only allows us to evaluate our current guidelines. We will either have to turn to evidence from other countries or other eras - both of which are problematic insofar as the epidemiology and treatment of cancer vary across time and space - or turn to larger randomized controlled trials designed to have large numbers of individuals around the cutoff ages we want to test. Alternatively, we could perhaps exploit the current confusion over breast cancer screening guidelines, taking advantage of the fact that some providers may choose age 40 and others age 50 to commence yearly mammograms, to assess whether one or the other is better.

Saturday, August 6, 2011

Can Research on Measurement Provide Insights into the Poverty Experience?

Great paper, forthcoming in the Journal of Development Economics, on how the length of recall periods in surveys leads to different measurements of health, wellness, and health care seeking behavior. Also interesting is how the recall-period effect differs by income status. The authors use their findings to suggest that, disturbingly, the experience of illness has become the norm among the poor vis-a-vis the rich:

Between 2000 and 2002, we followed 1621 individuals in Delhi, India using a combination of weekly and monthly-recall health questionnaires. In 2008, we augmented these data with another 8 weeks of surveys during which households were experimentally allocated to surveys with different recall periods in the second half of the survey. We show that the length of the recall period had a large impact on reported morbidity; doctor visits; time spent sick; whether at least one day of work/school was lost due to sickness; and the reported use of self-medication. The effects are more pronounced among the poor than the rich. In one example, differential recall effects across income groups reverse the sign of the gradient between doctor visits and per-capita expenditures such that the poor use health care providers more than the rich in the weekly recall surveys but less in monthly recall surveys. We hypothesize that illnesses--especially among the poor--are no longer perceived as "extraordinary events" but have become part of "normal" life. We discuss the implications of these results for health survey methodology, and the economic interpretation of sickness in poor populations.

Sunday, July 10, 2011

Global Health Data Exchange [!]

For your viewing and researching pleasure. The data exchange is courtesy of the University of Washington's Institute for Health Metrics and Evaluation. The goal is to collect all the random and not-so-random datasets floating around out there, thereby creating a "one-stop shopping" space for those interested in both tabulated and raw (census, survey, macro-health) data.

I found out about this just today while reading Sanjay Basu's latest blog post (a good one on global health data sources), and spent the better part of the day browsing the site. At a first pass, the data exchange seems really comprehensive. As a grad student, I prided myself on knowing about every random dataset out there, something that took a lot of effort and time. Now there is a nice, comprehensive external brain for such an endeavor. I hope this project continues along its current trajectory because it has a ton of promise. Even in its current state, it will prove quite useful for interested lay people, policymakers, and hard-core researchers alike.

Thursday, July 7, 2011

Bad Epidemiology

While I was in South Africa a few months ago, an irritating yet clever radio announcer made the following comment during a joke-based interlude between songs:

"Research has shown that insomnia leads to depression. Other research has shown that depression leads to insomnia. Still other research has shown that research leads to more research."

Seems like a great indictment of some of the less-than-careful, data-mining-y studies that often find their way into decent journals and onto the evening news. (Note: I'm not anti-epidemiology.)

Sunday, June 26, 2011

Comparative Effectiveness Research - What is it Good For?

One oft-floated solution to rising health care costs is the use of comparative effectiveness research (CER) to guide the use of more efficient/efficacious therapies from the outset, reducing the need for costly readmissions, diagnostic tests, and trials of different therapies. CER involves a set of tools for comparing two or more treatment strategies with each other, often in the context of a randomized clinical trial. An added wrinkle to all this is the (in)famous cost-effectiveness study (CEX), where the outcome returns to different treatments are scaled/compared by their cost.
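
The scaling in a CEX is usually an incremental cost-effectiveness ratio: extra dollars spent per extra unit of health gained. A schematic example with invented numbers:

    # Schematic incremental cost-effectiveness ratio (ICER), invented numbers:
    # dollars per quality-adjusted life year (QALY) gained moving from X to Y.
    cost_X, cost_Y = 10_000.0, 18_000.0  # expected cost per patient ($)
    qaly_X, qaly_Y = 4.0, 4.5            # expected QALYs per patient
    icer = (cost_Y - cost_X) / (qaly_Y - qaly_X)
    print(f"ICER = ${icer:,.0f} per QALY gained")  # $16,000 per QALY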

While proponents of CER are gung-ho about its clinical and policy utility, there are potential downsides to such research. In general, most of our clinical trials recover average effects for a population of interest: we compare drug X against drug Y in randomized groups of 15-75 year olds with certain manifestations of disease Z. This is great for getting an average effect estimate for that particular population - if we randomly draw a 15-75 year old with certain manifestations of disease Z, we can expect, on average, drugs X and Y to work a certain way.

However, there is an increasing realization that drugs work differently for different people. Individuals may vary in how they metabolize certain drugs, or their underlying illnesses, while appearing equivalent to the average clinician, may differ in responsiveness to treatment (see here for a great discussion of this). If this is the case, widespread use of CER and CEX may not make people better off; in some cases, it might make some people worse off. For example, if some people are better off with drug X, but the average person benefits more from drug Y, universal use of the latter will make the former group worse off.
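
A toy example of how an average comparison can mask losers (all numbers invented):

    # Drug Y beats drug X on average, but a minority subgroup does strictly
    # worse under Y. Health gains are in arbitrary units.
    share = {"typical": 0.8, "atypical": 0.2}
    gain_X = {"typical": 2.0, "atypical": 5.0}
    gain_Y = {"typical": 4.0, "atypical": 1.0}

    avg_X = sum(share[g] * gain_X[g] for g in share)
    avg_Y = sum(share[g] * gain_Y[g] for g in share)
    print(avg_X, avg_Y)  # 2.6 vs 3.4: a CER would crown drug Y...
    # ...yet mandating Y leaves the atypical 20% worse off (1.0 < 5.0).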

In a very interesting paper (see here for a non-gated, older version), Anirban Basu, Anupam Jena, and Tomas Philipson provide a real clinical example of this latter point from psychiatry. They build a model where CER and CEX information is used by insurers/payers to guide clinical care; that is, when a study comes out showing that drug Y > X, these parties are only willing to pay for drug Y. They then show that, in the case of schizophrenia, overall health may have been reduced because people who were formerly doing well on drug X were forced to take drug Y, which was actually worse for their health and well-being. The authors go on to call for a more nuanced understanding of how CER and CEX research can be used to guide treatment, especially in an era where individualized treatments are becoming more popular (Basu has a great essay on this point here; see here for a technical paper on how CER can be individualized). Certainly, a regime where CER/CEX can be maximally useful will involve directed clinical trials that take heterogeneous treatment effects into account in the a priori design.

(PS: A great summary essay on CER/CEX, which covers many of the above points, can be found in a recent issue of the Journal of Economic Perspectives. Also, hat tip to AKN for bringing several of these papers to my attention.)

Monday, February 15, 2010

Non-Technical Introduction to Causal Inference/Methods

Hi everyone, the blog is back in business.

I recently came across a great working paper that introduces econometric methods for causal inference to a non-expert audience. I'm particularly excited about this because I think many of these methods could be really useful for medical care/clinical questions where it is either unethical or technically difficult to randomize patients (and yes, there are still plenty of those!). For whatever reason, these methods are, in my estimation, rather underutilized in medicine.

While the linked piece is geared towards education policy, the methods can obviously be used in any context. Here is the abstract:

Education policy-makers and practitioners want to know which policies and practices can best achieve their goals. But research that can inform evidence-based policy often requires complex methods to distinguish causation from accidental association. Avoiding econometric jargon and technical detail, this paper explains the main idea and intuition of leading empirical strategies devised to identify causal impacts and illustrates their use with real-world examples. It covers six evaluation methods: controlled experiments, lotteries of oversubscribed programs, instrumental variables, regression discontinuities, differences-in-differences, and panel-data techniques. Illustrating applications include evaluations of early-childhood interventions, voucher lotteries, funding programs for disadvantaged, and compulsory-school and tracking reforms.

Enjoy!

Saturday, May 9, 2009

Antidepressants and Suicide

Whether antidepressant use increases suicide risk in the short term is an ongoing debate in the clinical medicine and health policy worlds. A few years back, based on some evidence that antidepressant use was correlated with a higher risk of suicide, the FDA issued a "black box" warning, forcing manufacturers to acknowledge the increased risk on packaging and materials related to the drugs. The public responded predictably: antidepressant use dropped notably after the warning. (See this 2006 article for more on the issue.)

The biomedical model that links antidepressant use to suicide is the following. Depressive symptoms involve both low mood and reduced activity. Antidepressants, it is thought, start working by increasing activation before improving mood. As a result, the hypothesis is that, in the short term, people who have suicidal thoughts may actually act on them because they are now "activated."

But is there another story that could explain the link between anti-depressants and suicide? An important possibility is selection: anti-depressants are taken by people with depressive symptoms, who are more likely to commit suicide in the first place. The fact that the association between anti-depressant use and suicide exists only in the short run is consistent with this selection model as well: those who would commit suicide do so, and those who remain may have been unlikely to do so in the first place or were prevented from doing so by the medication.

The overall literature on anti-depressants and suicide gives some support to the selection hypothesis. First off, the relationship between use and suicide seems to vary from study to study and across countries. We would not expect this if the biological model were correct. Second, the "black box" warning provides an interesting time series test. In several countries, the use of anti-depressants dropped after the public was informed about the potential risks, and the incidence of suicides actually increased. This runs counter to what we would expect from the biological mechanism model.

A recent paper (forthcoming in the Journal of Health Economics) provides what I think is the most careful analysis of the causal relationship between anti-depressant use and suicide, taking explicitly into account the potential selection bias issue. The authors, Jens Ludwig, Dave Marcotte and Karen Norberg, utilize an instrumental variables (IV) approach:

In this paper we present what we believe to be the first estimates for the effects of SSRIs on suicide using both a plausibly exogenous source of identifying variation and adequate statistical power to detect effects on mortality that are much smaller than anything that could be detected from randomized trials. We construct a panel dataset with suicide rates and SSRI sales per capita for 26 countries for up to 25 years. Since SSRI sales may be endogenous, we exploit institutional differences across countries that affect how they regulate, price, distribute and use prescription drugs in general (Berndt et al., 2007). Since we do not have direct measures for these institutional characteristics for all countries, we use data on drug diffusion rates as a proxy. We show that sales growth for SSRIs is strongly related to the rate of sales growth of the other major new drugs that were introduced in the 1980s for the treatment of non-psychiatric health conditions. This source of variation in SSRI sales helps overcome the problem of reverse causation and many of the most obvious omitted-variables concerns with past studies. Our research design may also have broader applications for the study of how other drug classes affect different health outcomes.

Using this strategy, they find that a 12% increase in anti-depressant sales is associated with a 5% decrease in suicides. Interesting stuff.

While the main innovation in the paper is the use of instrumental variables, this may also be its main weakness. First, as discussed in previous posts, for the IV approach to work, the instruments should affect the outcome only through the exposure of interest. The authors go through some trouble to establish the validity of their IVs. It's all carefully done and compelling but, depending on your priors about institutional differences in pricing strategies, you may still have qualms about the IV.

The other issue with IVs is that the effect they compute applies to those people (or here, groups of people) that are most affected by or responsive to the instrument (see this earlier post for more on this). Thus, it is very important to note that the finding in this paper does not rule out the possibility that anti-depressant use might have adverse impacts on some populations. I think this is of particular interest to clinicians, and there are new methods in econometrics that can help uncover heterogeneity in treatment effects (see this paper on the heterogeneous impacts of treatment on breast cancer, utilizing methods developed by Heckman and co-authors).
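
For readers who haven't seen IV mechanics before, here is a hand-rolled two-stage least squares sketch on simulated data - an illustration of the method only, not the paper's specification. Think of z as the drug-diffusion instrument, x as SSRI sales, y as the suicide rate, and u as an unobserved confounder.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000
    z = rng.normal(size=n)                   # instrument: shifts x, not y
    u = rng.normal(size=n)                   # confounder: shifts both x and y
    x = 0.8 * z + u + rng.normal(size=n)
    y = -0.5 * x + u + rng.normal(size=n)    # true causal effect is -0.5

    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the endogenous x on the instrument.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress y on the stage-1 fitted values.
    beta = np.linalg.lstsq(np.column_stack([np.ones(n), x_hat]), y, rcond=None)[0]
    print(beta[1])  # close to -0.5; naive OLS of y on x is biased by u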

Friday, April 3, 2009

Correlation Does Not Imply Causation. And...?

Since I started grad school four years ago, I've noticed that the general public is much more attuned to the idea that correlation does not always imply causation. Of course, the indoctrination is not complete just yet, and there are plenty of instances where an association is mistaken for something more, but the fact that people are becoming better consumers of statistics is gratifying. I attribute this to the spate of popular-press economics and statistics books/blogs in the last few years (though I might be in danger of confusing correlation and causation myself by saying this!)

The standards in empirical research reflect how seriously people are taking this motto: finding a clever instrumental variable or even experimental variation is no longer good enough. Papers without extensive "robustness" checks and falsification tests have less credibility now than they would have had even five years ago. This, like the trend in the general public, is a good development.

However, with these positives come some more troubling tendencies. Specifically, I have a beef with the overuse of the causation-correlation dictum. Nowadays, anybody can bring down a paper simply by saying "correlation does not imply causation" without having to provide a reason why that might be the case. For example, I am working on a paper looking at the long-run causal effects of birth-year exposure to clean water and sanitation efforts (I'll post a link to this paper in a month or so when a good draft is ready). I have a plausible identification strategy, and I also include all sorts of controls, trends, and falsification checks in my analysis to further establish causality. My results check out.

However, someone recently told me that I should be concerned about omitted variables. When I pressed her on what these might be, she wasn't sure, but commented that "there are always omitted factors."

Clearly, this isn't helpful. It's really easy to look/sound clever by pointing out that correlation does not imply causation: it is technically a true statement! But I think people who make this claim should explain how it applies to the analysis at hand (i.e., have some kind of model or story that makes explicit the nature of the potential biases and where they come from). Otherwise, the statement by itself is pretty uninformative and does little to advance our knowledge.

Thursday, February 5, 2009

Workshops at Yale's StatLab

Here is the schedule for statistical package workshops offered through Yale's StatLab for Spring 2009. On the bill are workshops for HTML, Stata, SPSS and R, as well as tutorials on data management and survey data. I highly recommend these workshops for anyone interested in doing empirical work. I took the beginner Stata and R workshops during my first year of graduate school and both served as a great platform upon which to play around and add new knowledge.

There are some excellent web-based tutorials for statistical packages as well. My favorite is the UCLA Academic Technology Services Statistical Computing page, which offers a plethora of links (including pages for Stata, SAS, SPSS, and R).

Monday, June 16, 2008

The P-Value Contest

Many of you have likely come across one of the following sentences while perusing empirical work (doesn't matter what discipline or field):

"...the effect was statistically significant (p = 0.0401)..."
"...as per convention, we define statistical significance as a p-value of 0.05..."
"...income was significant at standard levels of confidence (p = 0.049)..."

0.05, or 5%, is the magic p-value (loosely, the probability of obtaining a result at least as extreme as the one observed if chance alone were at work) that denotes the threshold separating statistical significance from insignificance. You don't need to be a Bayesian to realize that this designation is completely arbitrary: why not 0.04 (4%) or 0.06 (6%)?
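
If the definition feels abstract, a p-value is easy to compute by brute force. A minimal simulation sketch (the observed value is invented):

    import numpy as np

    # Two-sided p-value by simulation: how often would a sample mean at least
    # this far from zero arise if the null (true mean 0, sd 1, n = 100) held?
    rng = np.random.default_rng(2)
    observed = 0.21  # hypothetical observed sample mean
    null_means = rng.normal(0, 1, size=(100_000, 100)).mean(axis=1)
    p = (np.abs(null_means) >= observed).mean()
    print(p)  # about 0.036: "significant" at 0.05, not at a 0.03 convention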

I spent some time today (read: procrastinating) trying to track down the origin of this convention. The most I could find was that R.A. Fisher, a founding father of modern statistics, decreed it an "acceptable value" and everyone started to copy him.

I'm not satisfied: I'm guessing there is a fascinating story here. Why did Fisher choose 0.05 in the first place? How did the convention spread? Was it ever challenged? How do conventions develop in general?

This brings us to this summer's contest: whoever finds the best story (link/paper/book) behind the 5% convention will receive a copy of a recent popular press economics book of their choice. The prize is a small price for me to pay to have someone else do the work and bring the story to me - everyone wins.

Tuesday, April 8, 2008

Experiments and Corruption

A recurrent theme of this blog has been how clever empirical analysis can be used to recover causal effects from observational data. Such statistical tools are important because, in many cases, individuals or regions cannot be randomly assigned to the various states we are interested in. At the same time, however, the use of field experiments, both to assess causal effects and to test various theories, is becoming more prevalent in economics, political science and public health. In an earlier post, I talked about the Jameel Poverty Action Lab, a group set up by economists primarily based at MIT and Harvard which uses experiments to test out different local development interventions.

Another benefit of the experimental method is that it allows one to quantify things that are very hard to measure in even the most detailed surveys. This doesn't just include behavioral concepts such as individual preferences, risk aversion, or discount rates, but also phenomena like corruption, where survey respondents may be loath to reveal their true history of actions for fear of legal ramifications.

Experiments looking at corruption are slowly growing in number. Just recently, I read a paper (no links available) by two Yale political science graduate students looking at how the Indian Right to Information Act, corrupt practices, and the by-the-book apply-and-wait method compare in helping slum dwellers in New Delhi obtain government ration cards, which serve as a form of ID as well as a means of accessing fair-price food shops. In order to get a ration card, individuals have to fill out an application and prove that they meet a particular means test (i.e., their income must be below some threshold). The authors found that nothing greased the wheels quite like corruption, but the Right to Information request did allow individuals to get their ration cards reasonably quickly. Not a single member of the control group, who merely filled out an application for the card, received a ration card even some 7 months after the experiment began.

This experiment is similar in spirit to one recently published in the Quarterly Journal of Economics, which considers the influence of corruption in obtaining a driver's license in India. From the abstract:

We study the allocation of driver's licenses in India by randomly assigning applicants to one of three groups: bonus (offered a bonus for obtaining a license quickly), lesson (offered free driving lessons), or comparison. Both the bonus and lesson groups are more likely to obtain licenses. However, bonus group members are more likely to make extralegal payments and to obtain licenses without knowing how to drive. All extralegal payments happen through private intermediaries ("agents"). An audit study of agents reveals that they can circumvent procedures such as the driving test. Overall, our results support the view that corruption does not merely reflect transfers from citizens to bureaucrats but distorts allocation.

Some self-promotion: Paul Lagunes, Brian Fried (both of Yale political science) and yours truly recently conducted an experiment in a large Latin American city. The idea was to explore the intersection of corruption and inequality: do public officials behave more corruptly with upper-class or lower-class individuals? Our experiment involved traffic police and driving infractions, and we found that, conditional on being stopped, lower-class individuals were much more likely to be asked to pay a bribe than upper-class drivers, who were typically merely warned not to drive in that fashion in the future. Interestingly, not a single traffic ticket was given out during the entire experiment!

To help interpret the differential treatment by class, we interviewed police officers in the city to get a sense of what might be driving it. Based on these interviews, our hypothesis is that upper-class drivers were treated differently because the cops feared that those drivers' bureaucratic influence could get them in trouble with their supervisors.

We hope to have a draft of our paper out to a political science or criminology journal soon. I'll keep you posted.

Sunday, February 10, 2008

Clemens-ometrics

Are you sick of the Roger Clemens-Brian McNamee flap? I am, but the infusion of some new statistical analysis has renewed my interest in the issue. A group of statisticians and an economist at the University of Pennsylvania have used some pretty simple analysis to address (and refute) the claim by Clemens' lawyers that the statistical record exonerates Roger from allegations of taking performance-enhancing drugs.

The take-home point of the whole exercise is simple. The statisticians allege that the analysis in the "Clemens Report" is highly prone to selection bias: the "control group" for Roger Clemens is comprised of freaks of nature like Nolan Ryan. The new analysis uses a different control group (all pitchers who made a given number of appearances over a reasonably long career) and finds that Clemens is certainly an outlier. Obviously, this does not suggest that Clemens used performance enhancers any more than it implicates Nolan Ryan, but it does suggest that Clemens' statistical record does not hold him above suspicion in these investigations, which is what the pitcher's lawyers are arguing.

For more information, see here. The Freakonomics blog has a guest post by economist Justin Wolfers, one of the analysts on the study, talking about the whole exercise. I have also heard that statistical methods have been used to analyze steroid use among hitters, but I have not been able to find any reliable links on that. If you have one, please post it.

Wednesday, February 6, 2008

Resources for Causal Inference

In the last 15-20 years or so, statisticians and econometricians have made great strides in thinking about how to establish causality in the absence of a clean randomized experiment. In the economics literature, the bar has been raised quite high as far as backing up causal statements goes, and the same sort of rigor is now developing in other disciplines such as medicine and public health, political science, and sociology. The increasing availability of technically accessible resources for those interested in applying cutting-edge statistical techniques to causal questions, while making minimal assumptions about things that cannot be observed, should hasten the convergence in methodological rigor across these disciplines.

A good example of a rigorous, but still applied and down-to-earth, treatment of causal inference is Counterfactuals and Causal Inference: Methods and Principles for Social Research by Morgan and Winship. I am currently working through this book and have found it fantastic. The book begins by tackling the problem of causal inference using both the algebraic potential outcomes model and more intuitive graph-theoretic approaches. After building this foundation, it goes through different approaches to establishing causality, offering clear expositions on the utility of these estimators given certain data structures and, most importantly, the kinds of assumptions needed to justify making causal statements with each of the methods. Methods covered in the book include matching, IV, panel data approaches such as fixed effects and difference-in-differences, and partial identification/bounding. I think any serious applied researcher, regardless of primary discipline, should have these methods in his/her toolkit. This book offers excellent introductions to each and will put said researcher well on his/her way.

In a monograph explicitly designed to have broad appeal, it is to be expected that some important points are underdeveloped or glossed over entirely. For example, the section on weak instruments in IV estimation left a lot to be desired. However, the authors cite all of the important theoretical and empirical literature up to 2006: if you want to learn more, it's not hard to figure out where to go.

Some other (more advanced) resources that I have found interesting and useful:

1) Identification for Prediction and Decision, by Charles Manski: Manski has a different take on causal inference than most. Rather than trying to find point estimates of a causal parameter, Manski begins with very weak assumptions about the underlying social process and studies the extent to which one can bound treatment effects under those assumptions. As he notes in the introduction, some people find this unnecessarily conservative. However, in many situations, perhaps it is best to recognize that explicit causal inference may not be possible and/or may require very strong assumptions about things we cannot observe. It is in these situations that the utility of bounding is most apparent (the classic worst-case bounds are sketched after this list). Manski makes a concerted effort to tie his methodology to efforts to recover policy-relevant parameters that, as the title suggests, help motivate prediction and decision. Here is a health care application (looking at Swan-Ganz catheterization) of Manski's approach - good stuff.

2) NBER Summer Course on Econometrics: This site has some 18 hours' worth of high-quality lecture videos and slides on topics ranging from program evaluation and causal inference, to different estimators for causal effects, to panel data, to GMM, etc. The talks are given by Guido Imbens and Jeff Wooldridge, both excellent econometricians. (The latter has written some phenomenal introductory and advanced econometrics textbooks.) I'm about halfway through the lecture series now and the talks are quite good. Most require prior knowledge of econometrics.

3) Handbook of Econometrics: James Heckman will have some articles about program evaluation in the forthcoming volume of the Handbook of Econometrics that he is editing. As you'll see from the Morgan and Winship book, Heckman's name comes up a lot in the causal inference and identification literature. He's a superstar who has done some amazing work. I'll post a link once these resources become available.
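
To give a flavor of the bounding approach in item 1, here are the classic worst-case bounds in my own notation (a summary of the standard result, not a quotation from Manski): with a binary treatment D and an outcome Y known to lie in [0, 1],

    E[Y \mid D=1]\Pr(D=1) \;\le\; E[Y(1)] \;\le\; E[Y \mid D=1]\Pr(D=1) + \Pr(D=0)
    E[Y \mid D=0]\Pr(D=0) \;\le\; E[Y(0)] \;\le\; E[Y \mid D=0]\Pr(D=0) + \Pr(D=1)

since the unobserved counterfactual means can range anywhere over [0, 1]. Subtracting gives bounds on the average treatment effect E[Y(1)] - E[Y(0)] of width exactly one, which always contain zero; the whole game is then to add weak, credible assumptions that tighten the bounds.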

Friday, December 14, 2007

When Bad Inference Happens to Good People

I turned on ESPN last night for some pre-game analysis of the Houston Texans-Denver Broncos contest. I wasn't paying too much attention until I heard the following statement: "One of Houston's keys to victory tonight is to give Ron Dayne 20-25 carries."

I did a double take. Ron Dayne? An NFL journeyman with decent size, but limited speed, who is best utilized in a platoon of running backs? It turns out that the analyst was basing his comment on the following fact: the Texans are 4-1 in the past two seasons when Ron Dayne gets more than 20 carries.

This is another example of people confusing correlation with causation. There are two (or maybe more) explanations for the Ron Dayne tidbit:

1) Ron Dayne is a game changing talent who, if given the ball, will more often than not win the contest for you.

2) Ron Dayne getting 20 or so carries is a symptom of things working right offensively for the Texans. When the Texans are firing on all cylinders, Dayne's rushing opportunities and totals may reflect the fact that linebackers and safeties are playing off the line of scrimmage and dropping back into coverage, allowing Dayne to get his 5-10 yard runs, or that offensive line play is so dominant that Dayne is able to run clear through the woods.

I think the second explanation is probably the more likely one. After all, can you imagine a defensive coordinator thinking before a game, "wow, we need to get 8 in the box to stop Ron Dayne"? Ron Dayne is a good player, but he's not LaDainian Tomlinson or even Frank Gore - the 2007 version.

In any case, this innocuous episode reflects the danger of attaching causal stories to what are only correlations. Unfortunately, the fates of entire policies have hinged on bad inference, and in arenas far less trivial than professional sports.

Friday, November 30, 2007

The Deep Structure of the Universe? (Some Friday Fun)

I went to an interesting talk last week on why individuals don't choose to buy long-term care insurance. While the speaker did not have an answer to that question just yet (it's a work in progress), she did have a lot to say on the use of qualitative methods as a set-up for and complement to statistical methods. In the past I've made fun of qualitative work as a "be all, end all" strategy for attacking a given research question ("it's not representative," "it takes forever to do," "I don't believe things if there is no math or hard data," etc.). But I do agree that qualitative work can help give direction to statistical analysis and aid in the interpretation of empirical results.

Anyway, in explaining the mechanics of the qualitative methodology, the speaker pointed out that sampling in qualitative work essentially obeys the law of diminishing returns (called the saturation principle here): you keep adding to the sample until you stop learning anything new - when the marginal returns to interviewing approach zero. Apparently the majority of qualitative studies saturate somewhere between 25 and 30 subjects/interviewees.

I had a sudden flash of insight when I heard these numbers. The Central Limit Theorem (the big one) posits that the standardized sum of n independent random variables with finite variance approaches a normal distribution as n gets large. Imagine a really screwed-up probability distribution. Draw n numbers from it and take the sum. Do it over and over and look at the distribution of the sums. Boom! It looks normal! It's a beautiful result, and the proof is quite elegant, too.
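
You can run the experiment described above in a few lines. A minimal sketch using an exponential distribution as the "screwed-up" starting point (any distribution with finite variance would do):

    import numpy as np
    import matplotlib.pyplot as plt

    # Draw n = 30 values from a skewed distribution, sum them, repeat 50,000
    # times, and look at the histogram of the sums: it is already quite
    # bell-shaped even though the underlying draws are far from normal.
    rng = np.random.default_rng(3)
    sums = rng.exponential(scale=1.0, size=(50_000, 30)).sum(axis=1)
    plt.hist(sums, bins=100)
    plt.title("Sums of 30 exponential draws look approximately normal")
    plt.show()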

The Central Limit Theorem appears to kick in around an n of 30 or so. That's why I got so excited about the saturation numbers: it's just interesting that two apparently distinct phenomena kick in right around the same sample size. Obviously, I could be exhibiting a classic behavioral economics bias and attributing a pattern to what are basically unrelated phenomena that just so happen to agree with one another. On the other hand, maybe there is more to it - something fundamental and deep.

Some more: If you like these kinds of oddities and puzzles, you should read Fermat's Last Theorem by Simon Singh. It's one of the best books I've read in a long time. Basically it covers the 300+ year history of an annoying and outstanding math problem that kids can understand but that adults (including Hall of Famers like Euler) could not solve for several centuries. The beast was finally laid to rest in the mid-1990s by a Princeton mathematician who spent something like a decade of his life working on this problem and this problem alone. While for a long time the problem looked like a trivial curiosity, the Last Theorem ultimately speaks to deep connections between branches of mathematics thought to be distinct.

The book is chock-full of short biographies of all the mathematicians who contributed to solving the problem, interesting nuggets about number theory and the fundamental importance of prime numbers, irrational numbers, etc., and insights into how seemingly trivial mathematics can have mighty big things to say about the natural and physical world. It's a great read, and it allows us normal folk to catch a glimpse of the beauty of mathematics.