
Big Data Approach to Reducing Poverty

Principal Investigator
Geoffrey Cohen, Professor in Organizational Studies in Education and Business, Professor of Psychology and, by courtesy, of Organizational Behavior at the Graduate School of Business

Co-Principal Investigators
Jure Leskovec, Associate Professor of Computer Science
Emma Brunskill, Assistant Professor of Computer Science
David Grusky, Professor in the School of Humanities and Sciences and Senior Fellow at the Stanford Institute for Economic Policy Research
David Rehkopf, Assistant Professor of Medicine (Primary Care and Population Health) and, by courtesy, of Health Research and Policy (Epidemiology)

Abstract
The U.S. has had extremely high rates of poverty for decades. The government spends almost $1 trillion per year, nearly one-fourth of the total federal budget, on means-tested programs to reduce economic hardship and improve social welfare. It is altogether unclear that we are securing anything approaching an optimal return on this investment. To the contrary, the U.S. compares unfavorably to other well-off countries on such key outcomes as poverty, economic mobility, and educational outcomes.

What accounts for this state of affairs? The premise of this proposal is that substantial headway can be made once we have built credible predictive models of poverty that tell us who is at risk and who will benefit from interventions. In conventional social science models of poverty, a shockingly small amount of the variability is explained, yet no one believes that poverty is a truly random affair. The field lurches from one flavor-of-the-day account to another, and fails to deliver the comprehensive model we need. Meanwhile, scholars study the problem within their independent disciplinary silos and seldom build synergistic collaborations. We thus propose to harness the power of machine learning to analyze new troves of big data at multiple disciplinary levels of analysis and develop the first powerful predictive models of poverty.

We will do so by building a new data set, the National Poverty Study (NPS), that brings together evidence on the main sources of poverty. The NPS, which will represent 100 U.S. sites, is an innovative combination of qualitative transcript and audio data (measuring linguistic skills, soft skills, anxiety, stress), survey data (e.g., demographics), genetic and biological data (both genetic expression and genotyping), network data (e.g., survey rosters of networks, social media), neighborhood data (e.g., census data, Google Street View analyses of neighborhood characteristics), psychological data (e.g., depression, grit, stress, well-being), administrative data (e.g., earnings, program participation, employment), and key social psychological interventions. The interventions take the form of psychologically leveraged experiences designed to integrate the three key states that catalyze adaptive coping: a sense of potential, belonging, and adequacy. The centerpiece of this project thus entails deploying the tools of machine learning to combine many sources of big data that have not been adequately analyzed in and of themselves, let alone in combination.

The project team includes two computer scientists with expertise in machine learning (Brunskill, Leskovec), three social scientists with expertise in poverty and mobility (Grusky, Reardon, Edin), three experts in genetics and epigenetics (Cole, Freese, Rehkopf), and two psychologists with expertise in psychological processes tied to educational and economic success (Cohen, Schwalbe). We will first use transfer learning and deep learning to identify the combination of predictors most associated with positive outcomes and with treatment heterogeneity. We will also use a bottom-up approach, in which we input samples of data for each participant and train algorithms that distinguish between low- and high-mobility participants. These predictive models, once developed, will allow us to spend our safety net dollars more effectively, discern key causal events that prevent mobility, and at long last bring the tools of Silicon Valley to bear on the country's most costly problem. The estimated budget for this interdisciplinary project is $2,971,080 distributed over 3 years.
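To make the bottom-up modeling approach concrete, the following is a minimal sketch of the general idea: stack feature blocks from several disciplinary levels of analysis (survey, network, neighborhood) into one design matrix and fit a classifier that separates low- from high-mobility participants. All variable names and data are illustrative stand-ins generated synthetically; the proposal's actual models (deep learning, transfer learning) would be far richer, and this simple logistic regression is only a placeholder for them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of participants

# Illustrative feature blocks, one row per participant (all synthetic):
survey = rng.normal(size=(n, 3))        # e.g., demographic indicators
network = rng.normal(size=(n, 2))       # e.g., roster-derived network measures
neighborhood = rng.normal(size=(n, 2))  # e.g., census-tract characteristics

# Combine the disciplinary levels of analysis into a single design matrix
X = np.hstack([survey, network, neighborhood])

# Synthetic "high mobility" label driven by a few features, for illustration only
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Logistic regression fit by gradient descent, standing in for the
# proposal's deep/transfer-learning models
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(high mobility)
    w -= 0.5 * (X.T @ (p - y) / n)          # gradient step on weights
    b -= 0.5 * np.mean(p - y)               # gradient step on intercept

accuracy = np.mean((p > 0.5) == y)  # in-sample fit on the synthetic data
```

In the full study, the fitted weights (or their deep-learning analogues) would indicate which combination of predictors carries signal about mobility, and held-out evaluation rather than in-sample accuracy would be used to judge the model.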