Blog post based on an article published in the Journal of Demographic Economics.

The dataset (Goujon et al. 2016) that we present in the article aims to fill one major gap: providing long time series of harmonized data on education stocks (the educational attainment of the adult population) from 1970 to 2060 for 171 countries. The key word here is “harmonized”. Most people, researchers included, are unaware that data on education are often of poor quality, and the deficiencies become even more visible when we look at time series.

Why is that? There are two major causes. The first pertains to data collection within a country: categories are not always collected consistently over time, sometimes because of changes in the education system. The second concerns the aggregation of the different available categories into standardized levels, for instance the levels currently used by the International Standard Classification of Education (ISCED). Such aggregation is not always done consistently by the institutions in charge, whether the national statistical offices that very often collected the data in the first place, or the international organizations, such as UNESCO, that present the data. The problem gets worse the further back we look.

Hence the idea of using a reconstruction methodology based on back-projections of the most recent harmonized population by age, sex and level of education. It takes advantage of the hierarchical structure of education (people cannot lose levels of education) and of the fact that education is largely acquired at young ages. If we know the proportion of 50-year-olds with post-secondary education in 2015, that share, once mortality differentials by education are taken into account, is a valid estimate of the proportion of 40-year-olds with post-secondary education in 2005. For younger age groups, where people are still moving up the education ladder, education transition matrices are also needed. The reconstructed data are then validated against historical data, but are not contaminated by those data where they lack consistency. This is the main difference from other existing datasets, most notably the one developed by Barro and Lee (2013, latest edition).
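To make the back-projection idea concrete, here is a minimal sketch of the adult case only, assuming 10-year survival probabilities that differ by education level, no migration, and no further education transitions between ages 40 and 50 (so the transition matrices needed for younger ages are left out). The function names and all numbers are hypothetical and invented for illustration; they are not taken from the dataset or its methodology.

```python
# Hypothetical education levels, ordered from lowest to highest.
LEVELS = ["no education", "primary", "secondary", "post-secondary"]


def back_project_adults(pop_t, survival_10y):
    """Reverse-survive one adult cohort by 10 years (simplified sketch).

    pop_t: cohort population at time t by education level (e.g. 50-year-olds in 2015).
    survival_10y: assumed probability of surviving the preceding 10 years, by level.
    Returns the implied population of the same cohort 10 years earlier
    (e.g. 40-year-olds in 2005), assuming schooling was already complete
    and ignoring migration.
    """
    return {lvl: pop_t[lvl] / survival_10y[lvl] for lvl in LEVELS}


def shares(pop):
    """Convert counts by education level into proportions."""
    total = sum(pop.values())
    return {lvl: pop[lvl] / total for lvl in LEVELS}


# Illustrative numbers only (thousands of people aged 50 in 2015).
pop_2015 = {"no education": 100, "primary": 300, "secondary": 400, "post-secondary": 200}
# Illustrative assumption: better-educated people have higher survival.
survival = {"no education": 0.90, "primary": 0.92, "secondary": 0.95, "post-secondary": 0.97}

pop_2005 = back_project_adults(pop_2015, survival)
print(round(shares(pop_2015)["post-secondary"], 4))  # 0.2
print(round(shares(pop_2005)["post-secondary"], 4))  # 0.1937
```

Because the better-educated survive longer in this example, simply reusing the 2015 share of 0.2 would slightly overstate post-secondary attainment among 40-year-olds in 2005; reverse-surviving each education group separately corrects for that.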

The base-year dataset used to reconstruct the past is also used to project future levels of educational attainment to 2060 under several scenarios of demographic and educational development. The Wittgenstein Centre (WIC) dataset has already been used in a number of scientific and policy-oriented papers, for instance by the modeling communities working for the Intergovernmental Panel on Climate Change (IPCC) to assess the relationships between socioeconomic development and climate change (KC and Lutz 2014). The back-projections have been used to show the importance of education for economic growth (Becker 2012) and, in an analysis of the demographic dividend, the importance of education over purely demographic factors (Crespo Cuaresma et al. 2014).
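The forward direction can be sketched in the same spirit. The following is a minimal illustration of one 10-year projection step, assuming 10-year age groups, education-specific survival that does not vary with age, no migration, and that adults keep the education level they already have. The scenario names, the function names and all numbers are hypothetical and do not correspond to the actual scenarios or parameters behind the projections; the point is only that the same base-year population is carried forward while the scenario sets the attainment of the youngest incoming cohort.

```python
# Hypothetical education levels, ordered from lowest to highest.
LEVELS = ["no education", "primary", "secondary", "post-secondary"]


def project_10_years(cohorts, survival_10y, entrant_shares, entrant_size):
    """One simplified 10-year projection step for an adult population.

    cohorts: list of dicts (level -> population), youngest adult cohort first.
    survival_10y: assumed 10-year survival probability by education level
        (the same for every age group in this sketch).
    entrant_shares: scenario-specific attainment distribution of the new
        youngest cohort (sums to 1).
    entrant_size: size of the new youngest cohort.
    Existing cohorts age by one group and keep their education level;
    the oldest cohort leaves the modeled age range.
    """
    aged = [{lvl: c[lvl] * survival_10y[lvl] for lvl in LEVELS} for c in cohorts[:-1]]
    entrants = {lvl: entrant_size * entrant_shares[lvl] for lvl in LEVELS}
    return [entrants] + aged


def attainment_shares(cohorts):
    """Share of the whole modeled adult population at each education level."""
    totals = {lvl: sum(c[lvl] for c in cohorts) for lvl in LEVELS}
    grand_total = sum(totals.values())
    return {lvl: totals[lvl] / grand_total for lvl in LEVELS}


# Illustrative base-year cohorts (thousands), youngest first.
cohorts_2015 = [
    {"no education": 50, "primary": 200, "secondary": 500, "post-secondary": 250},
    {"no education": 100, "primary": 300, "secondary": 400, "post-secondary": 200},
    {"no education": 200, "primary": 400, "secondary": 300, "post-secondary": 100},
]
survival = {"no education": 0.90, "primary": 0.92, "secondary": 0.95, "post-secondary": 0.97}

# Two hypothetical educational scenarios for the incoming cohort.
trend = {"no education": 0.03, "primary": 0.17, "secondary": 0.50, "post-secondary": 0.30}
fast = {"no education": 0.01, "primary": 0.09, "secondary": 0.45, "post-secondary": 0.45}

trend_2025 = project_10_years(cohorts_2015, survival, trend, 1000)
fast_2025 = project_10_years(cohorts_2015, survival, fast, 1000)
print(attainment_shares(trend_2025)["post-secondary"])
print(attainment_shares(fast_2025)["post-secondary"])
```

Only the incoming cohort differs between the two scenarios here, so the faster-expansion scenario yields a higher overall post-secondary share; a fuller model would also vary fertility, mortality and migration assumptions across scenarios.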



Barro, Robert J. and Jong Wha Lee (2013) A New Data Set of Educational Attainment in the World, 1950–2010. Journal of Development Economics 104: 184–198. doi:10.1016/j.jdeveco.2012.10.001.

Becker, Gary (2012) Growing Human Capital Investment in China Compared to Falling Investment in the United States. Journal of Policy Modeling 34(4): 517–524.

Crespo Cuaresma, Jesús, Wolfgang Lutz and Warren Sanderson (2014) Is the Demographic Dividend an Education Dividend? Demography 51(1): 299–315.

Goujon, A., S. K.C., M. Speringer, B. Barakat, M. Potancokova, J. Eder, E. Striessnig, R. Bauer and W. Lutz (2016) A Harmonized Dataset on Global Educational Attainment between 1970 and 2060 – An Analytical Window into Recent Trends and Future Prospects in Human Capital Development. Journal of Demographic Economics 82(3): 315–363.

K.C., Samir and Wolfgang Lutz (2014) The Human Core of the Shared Socioeconomic Pathways: Population Scenarios by Age, Sex and Level of Education for All Countries to 2100. Global Environmental Change.
