Data collection is the systematic gathering of information from various sources to answer research questions, test hypotheses or make informed decisions. It is a crucial step in the research process, as the quality and relevance of the data collected directly affect the validity of the findings.

Effective data collection requires careful planning, appropriate methods and adherence to ethical guidelines. By following a structured approach, researchers can ensure that the data collected is reliable, representative and suitable for their analysis.

 

The SOSfood project aims to leverage data and AI to promote sustainability and inclusion across the food system. The project seeks to create a robust data ecosystem to support decision-making and innovation in agricultural production, food processing and distribution.

 

To achieve this challenging goal, an effective Data Collection Strategy is necessary to provide guidance on defining the scope, selecting tools and methods, developing a data collection plan and managing the collected data.

 

The SOSfood project takes a multi-factor approach to the food system, which means seeking and exploring a wide range of data. Some of these data are already available within the food system, but the multi-factor point of view also requires searching through new, previously unexplored types of data.

 

Concerning the food chain, data encompasses agricultural production, food processing, transformation, distribution, sales and consumer behaviour. Beyond these areas, it is necessary to find and collect data related to the environment, health, sustainability, and social and economic aspects.

 

Here comes a challenge: where and how can relevant data be found, gathered and used?

 

The first possibility is to search for open data: digital information that can be freely used, reused and redistributed, with the only condition of attributing the source or acknowledging its authorship and, in some cases, sharing the result under the same conditions. Open data are designed to be accessible to all, promoting transparency, innovation and the efficient use of information.

Open data are accessible through online platforms, which in some cases offer APIs. An API (Application Programming Interface) is like a bridge that connects different programs or systems so that they can talk to each other, making it easier to gather and use the available data.

 

Examples of food related open data platforms are:

  • European Data Portal (EDP): offers over 79,000 datasets related to food.
  • FAOSTAT: provides free access to food and agriculture data for over 245 countries and territories and covers all FAO regional groupings from 1961 to the most recent year available. 
  • OpenFoodFacts: a food products database made by everyone, for everyone.
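
As a rough illustration of how an API makes open data easier to gather, the sketch below builds a request URL for a single product on Open Food Facts and extracts a few fields from the JSON response. The endpoint path, the field names (`product_name`, `nutriscore_grade`), the `User-Agent` value and the helper function names are assumptions made for this example and should be checked against the platform's current API documentation.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL of the Open Food Facts read API (verify in the official docs).
BASE_URL = "https://world.openfoodfacts.org/api/v2/product"


def product_url(barcode: str) -> str:
    """Build the URL for one product, identified by its barcode."""
    return f"{BASE_URL}/{barcode}.json"


def summarise_product(payload: dict) -> dict:
    """Keep only a few fields of interest from a product response.

    The field names are assumptions about the response layout; missing
    fields are returned as None instead of raising an error.
    """
    product = payload.get("product", {})
    return {
        "code": payload.get("code"),
        "name": product.get("product_name"),
        "nutriscore": product.get("nutriscore_grade"),
    }


def fetch_product(barcode: str, timeout: float = 10.0) -> dict:
    """Download one product record and return its summary (needs network access)."""
    # Hypothetical User-Agent string identifying the calling application.
    request = Request(product_url(barcode), headers={"User-Agent": "sosfood-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return summarise_product(json.load(response))
```

Calling `fetch_product` with a product barcode would return a small dictionary that can be stored alongside data from other sources. Keeping the parsing step (`summarise_product`) separate from the download step makes it possible to test the logic without network access.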

 

Another opportunity is to leverage the network of EU projects by making connections with “twin” projects, such as FoodDataQuest. This kind of collaboration makes it possible to share resources and cooperate on common objectives. FoodDataQuest will develop ground-breaking data-driven solutions based on an integrated methodological framework that explores new types of private and public data sources, data from “unconventional players” and non-competitive data, and leverages data sharing mechanisms in order to provide the EU food chain stakeholders with increased insights and enhance the transition towards sustainable healthy diets.

 

Last, but not least, the European data strategy sets out the vision of creating a single market for data where it can flow freely within the EU and across sectors. The creation of EU-wide, common, interoperable data spaces in strategic sectors is a pillar of the data strategy and will help overcome existing technical and legal barriers to data sharing and unleash the potential of data. These data spaces will bring together relevant data infrastructures and governance frameworks to facilitate data pooling and data sharing.

 

From a technical point of view, data spaces are a concept of data management: they put technology systems and rules in place to integrate and exchange data. What emerges is a federated data ecosystem based on shared policies and rules. Data is distributed across storage points and integrated on the basis of what is needed. Tools are provided to discover, access, and analyse data that is distributed across industries, companies and entities.

 

Open data and data spaces represent an extremely valuable resource for developing a more sustainable society. For this reason, we warmly encourage all governments and public administrations to foster participation in data spaces and to incentivise the creation of APIs for data sharing.

 

Once the data is available, the strategy focuses on Data Quality, which refers to the state of qualitative or quantitative pieces of information and measures how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose. The criteria for meeting the required Data Quality have been defined in the strategy and are ensured through the Data Characterisation process, which aims to create a useful and reliable dataset for analysis and effective decision-making.
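
To make some of these criteria more concrete, here is a minimal sketch of how completeness, uniqueness and validity could be measured on a small set of records. It is not the project's actual Data Characterisation process: the function names, the sample records and the validity rule (a yield cannot be negative) are all hypothetical and chosen only for illustration.

```python
from typing import Callable


def completeness(records: list[dict], field: str) -> float:
    """Share of records in which `field` is present and not empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: list[dict], key: str) -> float:
    """Share of distinct values of `key` among the records that have one."""
    values = [r[key] for r in records if r.get(key) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(records: list[dict], field: str, is_valid: Callable[[object], bool]) -> float:
    """Share of non-empty values of `field` that pass the rule `is_valid`."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample records, invented for illustration only.
sample = [
    {"id": "a1", "product": "wheat", "yield_t_ha": 5.2},
    {"id": "a2", "product": "maize", "yield_t_ha": None},
    {"id": "a2", "product": "maize", "yield_t_ha": -1.0},  # duplicate id, invalid yield
    {"id": "a3", "product": "", "yield_t_ha": 7.9},        # missing product name
]

report = {
    "completeness_product": completeness(sample, "product"),
    "uniqueness_id": uniqueness(sample, "id"),
    "validity_yield": validity(sample, "yield_t_ha", lambda v: v >= 0),
}
print(report)
```

Each score is a value between 0 and 1, so a threshold per criterion (also an assumption here) could be used to decide whether a dataset is fit for a given analysis.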

 

A Data Characterisation Template has been established for the Data Collection process:

 

At this stage of the project, data collection is ongoing and covers the scope of the pilot user stories, which represent relevant scenarios for sustainability in the food system.

 

In the next stage, the first prototype of a predictive Dashboard will be developed, which will require a large amount of data to train the model.

Everyone can help to overcome this challenge. If you want to contribute, we would be very grateful if you could contact us to report sources of open data related to the food system, or facilitate the use of the available data by creating APIs. A more sustainable food system will be the best reward!