The 2019 Insight for Wellbeing Task: Multimodal personal health lifelog data analysis

Task description
Participants receive a set of weather and air pollution data, lifelog images, and tags recorded by people who wear sensors, use smartphones, and walk along pre-defined routes inside a city. They develop approaches that process the data to obtain insights about personal wellbeing.

Participants in this task tackle two challenging subtasks:

Segment Replacement: Task participants develop a hypothesis about the associations within the heterogeneous data and build a system that is able to correctly replace segments of data that have been removed.
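One simple baseline for this subtask can be sketched as follows: if two data types are assumed to be linearly associated along a route, a removed segment in one of them can be estimated from the other. This is only an illustrative sketch, not the organizers' method. The function name `fill_missing_segment`, the use of `None` to mark removed values, and the choice of ordinary least squares are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def fill_missing_segment(
    reference: Sequence[float],
    target: Sequence[Optional[float]],
) -> List[float]:
    """Fill None entries in `target` using a least-squares line fitted on `reference`.

    Hypothetical helper: both sequences are assumed to be aligned in time,
    and removed values in `target` are assumed to be marked with None.
    """
    # Keep only the time steps where the target value was not removed.
    pairs = [(r, t) for r, t in zip(reference, target) if t is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two observed pairs to fit a line")

    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    if var_r == 0:
        raise ValueError("reference series is constant on the observed pairs")

    # Ordinary least squares: target ~ slope * reference + intercept.
    cov = sum((r - mean_r) * (t - mean_t) for r, t in pairs)
    slope = cov / var_r
    intercept = mean_t - slope * mean_r

    # Keep observed values, replace removed ones with the fitted estimate.
    return [
        t if t is not None else slope * r + intercept
        for r, t in zip(reference, target)
    ]


# Toy data (made up for illustration): target = 2 * reference + 1.
filled = fill_missing_segment([1, 2, 3, 4, 5], [3, 5, None, None, 11])
```

A real submission would likely replace the linear fit with a model built on the participant's own hypothesis about the associations among several data types.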

Personal Air Quality: Task participants develop approaches to automatically predict personal AQI (Air Quality Index) at specific positions and time durations using either the underspecified data or the full data from a subset of data sources. The aim of Personal AQI is to measure the wellbeing of individual people with respect to the quality of the air that they are breathing.
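To make the notion of an AQI value concrete, the sketch below converts a PM2.5 concentration into an index by piecewise linear interpolation between breakpoints. The task description does not state which AQI definition is used, so the breakpoint table here is an assumption: it follows the US EPA PM2.5 table that was in use before 2024. The function name `pm25_to_aqi` is hypothetical.

```python
import math

# Assumed breakpoints: US EPA PM2.5 (24-hour) table in use before 2024.
# Each row is (C_low, C_high, I_low, I_high); concentrations in µg/m³.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration: float) -> int:
    """Convert a PM2.5 concentration (µg/m³) to an AQI value.

    Uses I = (I_high - I_low) / (C_high - C_low) * (C - C_low) + I_low
    within the bracket that contains the concentration.
    """
    if concentration < 0:
        raise ValueError("concentration must be non-negative")

    # Truncate to one decimal place; the small epsilon guards against
    # floating-point representation error (e.g. 35.4 * 10 -> 353.999...).
    c = math.floor(concentration * 10 + 1e-9) / 10

    for c_low, c_high, i_low, i_high in PM25_BREAKPOINTS:
        if c_low <= c <= c_high:
            index = (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
            return round(index)

    raise ValueError("concentration is beyond the range of the breakpoint table")
```

A personal AQI for a time segment could then be estimated, for example, by averaging the wearable sensor readings over that segment and passing the mean to `pm25_to_aqi`; the averaging window is again an assumption.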

Task background and motivation
The association between people’s wellbeing and properties of the surrounding environment is an important area of investigation. Although these investigations have a long and rich history, they have focused on the general population. There is a surprising lack of research that investigates the impact of the environment at the scale of individual people. At the personal scale, local information about air pollution (e.g. PM2.5, NO2, O3), weather (e.g. temperature, humidity), urban nature (e.g. greenness, liveliness, quietness), and personal behavior (e.g. psychophysiological data) plays an important role. It is not always possible to gather plentiful amounts of such data. As a result, a key research question remains open: Can sparse or incomplete data be used to gain insight into wellbeing? In other words, is there a hypothesis about the associations within the data so that wellbeing can be understood by using a limited amount of data? Developing hypotheses about the associations within the heterogeneous data contributes towards building good multimodal models that make it possible to understand the impact of the environment on wellbeing at the local and individual scale. Such models are necessary since not all cities are fully covered by standard air pollution and weather stations, and not all people experience the same reaction to the same environmental situation. Moreover, images captured from a first-person view could give important cues to help understand that environmental situation in cases in which precise data from air pollution stations is lacking.

Target group
This task targets (but is not limited to) researchers in the areas of multimedia information retrieval, machine learning, AI, data science, event-based processing and analysis, multimodal multimedia content analysis, lifelog data analysis, urban computing, environmental science, and atmospheric science.

Data
The Insight for Wellbeing task introduces a novel dataset, SEPHLA, created through the DATATHON data collection campaign organized in Fukuoka City, Japan (datathon.jp) in 2018 and 2019. SEPHLA is a dataset at the individual scale. It contains walking routes (e.g. street names, GPS, time), psychophysiological data (e.g. footsteps, heart rate), pollutant concentrations (e.g. PM2.5, NO2, O3), weather variables (e.g. temperature, humidity), first-person view images, urban perception tags (e.g. lively, greenness), and emotional tags (e.g. excited, depressed), all collected via wearable sensors, lifelog cameras, and smartphones attached to each data collector. The data come as a series of csv and jpg files indexed with the IDs of the data collectors. All personal information, especially in images, is blurred for privacy purposes. The copyright of SEPHLA belongs to the National Institute of Information and Communications Technology, Japan (NICT), and the dataset will be released to participants for research purposes only.

Ground truth and evaluation
The ground truth for the dataset of the two subtasks is collected as follows:
  • For the Segment Replacement subtask: The correlation among data types collected along a route during a specific time duration is manually calculated. All data segments with high correlation are extracted and labeled. Some of the data types in these segments will be hidden, and the rest are released to participants. For image data, concepts, categories, and scenes are automatically detected using Google Visual API.
  • For the Personal Air Quality subtask: A set of specific time segments along the routes is labeled with information based on the global AQI provided by Fukuoka City plus the local AQI calculated from individual sensing data, as well as with tags contributed by the datathon participants that reflect their perceptions of the urban environment and their experienced emotions. Images are also semi-automatically annotated with labels relating to the impact of air pollution and weather on vision, such as cloudy, foggy, windy, and sunny.
The main evaluation metric will be a conventional metric used to evaluate prediction tasks.
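The task description does not name the exact metric. As an illustration only, two metrics commonly used for numeric prediction, mean absolute error (MAE) and root mean squared error (RMSE), can be sketched as below. The function names and the toy values are assumptions made for the example and do not represent the official scoring procedure.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between ground truth and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the average squared difference; penalizes large errors more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy AQI values (made up for illustration).
truth = [50, 100, 150]
predicted = [60, 90, 150]
mae = mean_absolute_error(truth, predicted)
rmse = root_mean_squared_error(truth, predicted)
```

MAE treats all errors equally, while RMSE weights large deviations more heavily, which matters if occasional large mispredictions of AQI are considered especially costly.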

Recommended reading
[1] Sato, T., Dao, M.S., Kuribayashi, K., and Zettsu, K.: SEPHLA: Challenges and Opportunities within Environment – Personal Health Archives, MMM 2018.
[2] Zhao, P. and Zettsu, K.: Convolution Recurrent Neural Networks for Short-Term Prediction of Atmospheric Sensing Data, The 4th IEEE International Conference on Smart Data (SmartData 2018), pp. 815-821.
[3] Dao, M. S. and Zettsu, K.: Complex Event Analysis of Urban Environmental Data based on Deep CNN of Spatiotemporal Raster Images, 2018 IEEE International Conference on Big Data (BigData 2018).
[4] datathon.jp
[5] Song, H.J. et al.: Association between Urban Greenness and Depressive Symptoms: Evaluation of Greenness Using Various Indicators, Int. Journal of Environmental Research and Public Health, 16(173).
[6] Santani, D., Ruiz-Correa, S., and Gatica-Perez, D.: Looking South: Learning Urban Perception in Developing Cities, ACM Transactions on Social Computing, 1(3), Article 13, Dec. 2018.
[7] Dang-Nguyen, D.T., Piras, L., Riegler, M., Zhou, L., Lux, M., and Gurrin, C.: Overview of ImageCLEFlifelog 2018: Daily Living Understanding and Lifelog Moment Retrieval, CLEF2018 Working Notes, Avignon, France, 2018.

Task organizers
Minh-Son Dao, National Institute of Information and Communications Technology, Japan (NICT)
dao (at) nict (dot) go (dot) jp
Tomohiro Sato, National Institute of Information and Communications Technology, Japan (NICT)
Duc-Tien Dang-Nguyen, University of Bergen, Norway (UiB)
Cathal Gurrin, Dublin City University, Ireland (DCU)
Thanh Nguyen, University of Information Technology, Vietnam (UIT)

Task schedule
Development data release: 31 May
Test data release: 1 July
Runs due: 20 September
Results returned: 23 September
Working Notes paper due: 30 September
MediaEval 2019 Workshop (in France, near Nice): 27-29 October 2019