Can we monitor, and possibly predict, the socio-economic development and well-being of our societies just by observing human behavior, such as human movements and social relationships, through the lens of Big Data? This question has been raised by the United Nations in recent reports and has attracted the interest of researchers from several disciplines. In this story we describe an example of how Big Data sources, such as the Call Detail Records (CDRs) of users' calling and texting activity, can be used to help answer it.
Call Detail Records (CDRs) are gathered by every mobile phone operator for billing and operational purposes. We used a dataset provided by Orange that records the geographic location of 87,000 phone towers and 5.7 billion calls made over 45 days by 20 million anonymized mobile phone users, for a total of 900 GB of data. CDRs capture geographical, temporal and interaction information on mobile phone use, and they show enormous potential for empirically investigating human dynamics on a society-wide scale. Each time an individual makes a call, the operator registers the connection between the caller and the callee, the duration of the call and the coordinates of the phone tower serving the phone. This makes it possible to reconstruct the user's time-resolved trajectory.
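To make the structure of such records concrete, here is a minimal sketch in Python of how a trajectory could be rebuilt from them. The `CallRecord` fields and the `trajectories` helper are hypothetical names chosen for illustration, not the operator's actual schema, and the sample records are toy values.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical field names; a real CDR schema will differ.
    caller: str        # anonymized id of the user placing the call
    callee: str        # anonymized id of the user receiving the call
    start: datetime    # time the call started
    duration_s: int    # call duration in seconds
    tower_id: str      # tower serving the caller
    lat: float         # tower latitude
    lon: float         # tower longitude


def trajectories(records):
    """Group records by caller and return each user's time-ordered tower sequence."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec.caller].append(rec)
    return {
        user: [(r.start, r.tower_id) for r in sorted(recs, key=lambda r: r.start)]
        for user, recs in by_user.items()
    }


# Toy records, deliberately out of time order.
records = [
    CallRecord("u1", "u2", datetime(2020, 1, 1, 18, 30), 60, "T2", 48.86, 2.35),
    CallRecord("u1", "u3", datetime(2020, 1, 1, 9, 0), 120, "T1", 48.85, 2.29),
    CallRecord("u2", "u1", datetime(2020, 1, 1, 12, 0), 30, "T3", 45.76, 4.84),
]
traj = trajectories(records)
```

Here `traj["u1"]` lists the towers that served user `u1` in chronological order, which is the time-resolved trajectory the text refers to.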
CDR data have been used extensively to study human mobility because of the following advantages: they provide a means of sampling user locations at large population scales; they can be retrieved for different countries and geographic scales; and they provide an objective concept of location, i.e., the phone tower. Their availability in every country follows from the worldwide diffusion of mobile phones: there are 6.8 billion mobile phone subscriptions today against a world population of 7 billion, with a penetration of 128% in the developed world and 90% in developing countries. Moreover, CDR data have proven to be a high-fidelity proxy for individuals' movements and social interactions.
We use CDRs to define four individual measures, each describing a different aspect of individual human behavior: the volume of mobility, the diversity of mobility, the volume of sociality and the diversity of sociality. Although they are just a subset of the many behavioral aspects that can be extracted from mobile phone data, the four measures we consider are widely accepted by the scientific community and have been shown to capture important aspects of both human mobility and social relationships.
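The text above does not spell out how the four measures are defined. As a sketch only, one plausible instantiation is to use the radius of gyration for the volume of mobility, the Shannon entropy of visited locations for the diversity of mobility, the number of distinct contacts for the volume of sociality, and the Shannon entropy of calls per contact for the diversity of sociality. These choices, the function names, and the use of planar coordinates in kilometres are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centre of mass.

    Assumes planar (x_km, y_km) coordinates; real tower positions would
    first need projecting from latitude/longitude.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def individual_measures(visits, contacts):
    """visits: one (x_km, y_km) per call; contacts: one callee id per call."""
    return {
        "mobility_volume": radius_of_gyration(visits),
        "mobility_diversity": shannon_entropy(Counter(visits).values()),
        "social_volume": len(set(contacts)),
        "social_diversity": shannon_entropy(Counter(contacts).values()),
    }


# Toy user: calls split evenly between two towers 2 km apart and two contacts.
measures = individual_measures(
    visits=[(0.0, 0.0), (0.0, 0.0), (2.0, 0.0), (2.0, 0.0)],
    contacts=["a", "b", "a", "b"],
)
```

A user who splits activity evenly across many places or contacts scores high on diversity, while one who concentrates on a single place or contact scores close to zero.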
Each individual measure is computed for every user in our dataset from the locations and calls recorded in the mobile phone data. In a second stage, we aggregate the four individual measures at the level of French municipalities and explore their correlations with two external indicators of socio-economic development: per capita income and a deprivation index. We find that the average mobility diversity of the individuals residing in a municipality is more strongly correlated with the two socio-economic indicators than the other three measures are.
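The aggregation and correlation step can be sketched as follows. Averaging over residents matches the "average mobility diversity" mentioned above; the choice of the Pearson coefficient, the function names and the toy values are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_by_municipality(user_measure, user_home):
    """Average an individual measure over the users resident in each municipality."""
    groups = defaultdict(list)
    for user, value in user_measure.items():
        if user in user_home:  # skip users with no known residence
            groups[user_home[user]].append(value)
    return {municipality: mean(values) for municipality, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy values: an individual measure per user and each user's home municipality.
user_measure = {"u1": 1.0, "u2": 3.0, "u3": 4.0, "u4": 5.0, "u5": 6.0}
user_home = {"u1": "A", "u2": "A", "u3": "B", "u4": "C", "u5": "C"}
aggregated = aggregate_by_municipality(user_measure, user_home)

# Toy external indicator per municipality (illustrative numbers only).
indicator = {"A": 10.0, "B": 20.0, "C": 30.0}
names = sorted(aggregated)
r = pearson([aggregated[m] for m in names], [indicator[m] for m in names])
```

Repeating the last step for each of the four measures and each indicator gives the table of correlations that the comparison above relies on.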
Next, we build regression and classification models to predict the external socio-economic indicators from the population density and from the social and mobility measures aggregated at the municipality scale. The diversity of human mobility adds significant predictive power to both the regression and the classification models, substantially more than the diversity of social contacts or demographic measures such as population density, a factor known to be correlated with the intensity of human activities.
The analysis presented above is based on a six-step analytical framework. Starting from the collected mobile phone data (a), a set of measures is computed that captures the salient aspects of individuals' mobility and social behavior (b). As policy makers generally require, official statistics about socio-economic development are available at the level of geographic units, e.g., regions, provinces, municipalities, districts or census cells. The individuals in the dataset therefore have to be mapped to their territory of residence, so that the individual measures can be aggregated into a territorial measure (c, d). The phone tower from which a user makes the highest number of calls during nighttime is usually taken to be that user's home tower. Standard Geographic Information System techniques can then be used to associate the phone tower with its territory.
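The home-detection rule described above can be sketched in a few lines. The night window (22:00 to 07:00) is an assumption, since the text does not define "nighttime", and the `tower_to_municipality` lookup is a hypothetical stand-in for the spatial join that a Geographic Information System would perform.

```python
from collections import Counter
from datetime import datetime

# Assumed night window; the text does not specify the hours.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 7


def is_night(ts):
    """True if the timestamp falls inside the assumed night window."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def home_tower(calls):
    """Return the tower with the most nighttime calls, or None if there are none.

    calls: iterable of (datetime, tower_id) pairs for a single user.
    """
    counts = Counter(tower for ts, tower in calls if is_night(ts))
    return counts.most_common(1)[0][0] if counts else None


# Toy calls: T2 dominates during the day, T1 dominates at night.
calls = [
    (datetime(2020, 1, 1, 23, 0), "T1"),
    (datetime(2020, 1, 2, 2, 0), "T1"),
    (datetime(2020, 1, 2, 12, 0), "T2"),
    (datetime(2020, 1, 2, 13, 0), "T2"),
    (datetime(2020, 1, 2, 14, 0), "T2"),
    (datetime(2020, 1, 2, 23, 30), "T2"),
]

# Hypothetical lookup standing in for a GIS point-in-polygon join.
tower_to_municipality = {"T1": "A", "T2": "B"}

home = home_tower(calls)
residence = tower_to_municipality.get(home)
```

Restricting the count to night hours is what separates the place of residence from the place of work: in the toy data the user calls most often from `T2` overall, yet is assigned to `T1`.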
The resulting aggregated measures are compared with the external socio-economic indicators to perform correlation analysis and to learn predictive models (e). The models can aim either to predict the actual value of socio-economic development of a territory, as regression models do, or to predict its class of socio-economic development, i.e., the level of development of a given geographic unit, as classification models do. Finally, the predictions produced by the models are the output of the analytical framework (f).
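The difference between the two prediction targets can be illustrated with a deliberately simple sketch. It fits a one-predictor least-squares line for the regression case and uses a median split to turn the same indicator into two classes for the classification case. Both are assumptions made for illustration: the text does not say which models or how many classes were used, and the numbers below are toy values.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    my = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def to_classes(values):
    """Median split: 1 for values above the median, 0 otherwise."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


# Toy municipality-level data: an aggregated measure and an indicator.
measure = [1.0, 2.0, 3.0, 4.0]
indicator = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(measure, indicator)      # regression target: the value
fit_quality = r_squared(measure, indicator, slope, intercept)
labels = to_classes(indicator)                       # classification target: the class
```

A classifier would then be trained to predict `labels` from the aggregated measures, while the regression is judged by how much of the indicator's variance it explains.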
Luca Pappalardo, Maarten Vanhoof, Lorenzo Gabrielli, Zbigniew Smoreda, Dino Pedreschi, Fosca Giannotti, An analytical framework to nowcast well-being using mobile phone data, International Journal of Data Science and Analytics, 2 (1), pp. 75–92, 2016. doi:10.1007/s41060-016-0013-2