MCP2 - Distributed Information system

MCP2 - Consistent distributed information system

MCP2 leaders: Cyril Pommier (INRA - URGI) and Pascal Neveu (INRA - MISTEA).

Phenomic experiments are expensive and cannot be repeated, because the environmental conditions in which an experiment was carried out will never be observed again. Hence, it is essential to analyze several experiments jointly, and to reanalyze them with novel hypotheses. To do so, researchers who did not perform the experiment need to be able to retrieve all the necessary information. The Phenome Hybrid Information System (PHIS) was developed for that purpose.

Developing an information system (PHIS)

PHIS organizes and stores datasets originating from phenomic experiments in field and controlled conditions. It uses ontologies and semantic graphs to unambiguously identify and relate all objects, events and traits in an experiment. This original architecture is a powerful tool for integrating and managing data from multiple experiments and platforms, for creating relationships between objects, and for enriching datasets with knowledge and metadata. It interoperates with external resources via Web services (e.g. breeding API), thereby allowing data integration into other systems, e.g. modelling platforms or genetic databases. It is being progressively deployed in Phenome-Emphasis local infrastructures, and also elsewhere in Europe and in non-European countries.

  • Identification. Our open data strategy requires that all objects in platforms (plants, devices or sensors) have an unambiguous and unique identifier. A URI (Uniform Resource Identifier) system is used, combined with QR codes.

An example of the use of Uniform Resource Identifiers (URIs) for identifying all objects present in single images taken in (a) greenhouse and (b) field experiments

  • Ontologies and semantic graphs relate objects, events (cultivation practices, errors, etc.) and phenotypic traits. This is done in interaction with international partners such as PODD (Australia), the Crop Ontology, the Agroportal and Planteome. The figure below presents a semantic graph relating leaf samples, plants, events, experiment and users.
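As a rough illustration of the two points above, the sketch below mints URIs for a few objects and links them as subject-predicate-object triples, which is the basic building block of a semantic graph. It is not the PHIS implementation: the `http://example.org/phenome/` namespace, the URI layout, the `mint_uri` helper, the `TripleStore` class and the predicate names are all hypothetical, chosen only for illustration.

```python
# Hypothetical namespace and URI layout (assumptions, not the PHIS scheme).
BASE = "http://example.org/phenome/"


def mint_uri(kind: str, year: int, serial: int) -> str:
    """Build a unique, readable URI from an object kind, a year and a serial number."""
    return f"{BASE}{year}/{kind}/{serial:05d}"


class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> list[str]:
        """All objects linked from `subject` through `predicate`."""
        return sorted(o for s, p, o in self._triples if s == subject and p == predicate)

    def subjects(self, predicate: str, obj: str) -> list[str]:
        """All subjects linked to `obj` through `predicate`."""
        return sorted(s for s, p, o in self._triples if p == predicate and o == obj)


# One URI per object; the same string could be encoded in a QR code label.
experiment = mint_uri("experiment", 2017, 1)
plant = mint_uri("plant", 2017, 42)
leaf_sample = mint_uri("leaf-sample", 2017, 7)
event = mint_uri("event", 2017, 3)
user = mint_uri("user", 2017, 5)

# Hypothetical predicates relating leaf samples, plants, events, experiment and users.
graph = TripleStore()
graph.add(plant, "participatesIn", experiment)
graph.add(leaf_sample, "isSampledFrom", plant)
graph.add(event, "concerns", plant)
graph.add(event, "reportedBy", user)

# Which events concern this plant, and which plant does the sample come from?
events_on_plant = graph.subjects("concerns", plant)
sample_origin = graph.objects(leaf_sample, "isSampledFrom")
```

Because every object is referred to by its URI only, datasets produced on different platforms can be merged into the same graph without identifier clashes, provided each platform mints URIs in its own namespace.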

Establishing the architectures of the information systems in each local infrastructure

  • PHIS is being implemented in all installations, in field or controlled conditions. In order to achieve scalability, abstraction models manage various objects (plants, images or sensor outputs) usable in controlled and field installations.
  • The system includes the technical requirements for open data strategies, e.g. access rights for members of the consortium, the academic community and private companies. It also includes requirements for sharing published experiments for reuse and reanalysis of data.

A global archiving system

We ensure long-term conservation of data volumes of 300 TB/year. This requires determining the life cycle of the different data categories, developing an archiving strategy, improving data integrity, and ensuring data completeness and authenticity. We use France Grille, related to the European Grid Initiative, and IFB (ELIXIR). The storage and archiving system relies on iRODS.
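Integrity and completeness checks of this kind are commonly based on checksum manifests. The following is a minimal sketch under that assumption, using only the Python standard library; it does not use or reproduce the iRODS interface, and the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical. Authenticity would additionally require signing the manifest, which is not shown here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under `root` (relative path) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing (completeness) or altered (integrity)."""
    missing, corrupted = [], []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            missing.append(relative_path)
        elif sha256_of(target) != expected:
            corrupted.append(relative_path)
    return {"missing": sorted(missing), "corrupted": sorted(corrupted)}
```

A manifest would typically be built when a dataset enters the archive and checked again at each stage of its life cycle, for instance after a migration to new storage.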

In the coming years, priority will be given to developing a "big data strategy":

  • Enable multiscale integrative analysis using linked data, identification and ontologies. Currently, the Phenome information system involves data at different scales (organ, plant, canopy). Trans-scale analyses rely on ontologies able to model integrated traits (e.g. 'leaf area' at canopy level) and their relation to the underlying elemental traits (e.g. 'individual leaf area', 'leaf number per plant', 'responses to environmental cues').
  • Statistical and process-based models to extend phenomic results for simulating genotype behavior in current and future climatic scenarios (2022). We aim to foster exchange between phenomic datasets and plant/crop models. To that end, we will formalize and trace the calculation of genotype-specific parameters from plant traits, so that they can be used by plant/crop models to predict genotype behavior in diverse climatic conditions.
  • We will develop and organize a user community for PHIS, in order to share the workload of support and computer developments, including a sustainable support service.
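The relation between integrated and elemental traits mentioned in the first point above can be sketched as follows. This is only an illustration under simple assumptions: plant leaf area is taken as the sum of individual leaf areas, and canopy leaf area is expressed per unit of ground area. The `Plant` class, the function names and the example URIs are hypothetical and are not part of PHIS or of any ontology cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    """Organ-scale measurements attached to one plant (hypothetical structure)."""
    uri: str
    leaf_areas_m2: tuple[float, ...]  # elemental trait: individual leaf area


def leaf_number(plant: Plant) -> int:
    """Elemental trait at plant scale: leaf number per plant."""
    return len(plant.leaf_areas_m2)


def plant_leaf_area(plant: Plant) -> float:
    """Integrated trait at plant scale: sum of individual leaf areas."""
    return sum(plant.leaf_areas_m2)


def canopy_leaf_area_index(plants: list[Plant], ground_area_m2: float) -> float:
    """Integrated trait at canopy scale: total leaf area per unit of ground area."""
    if ground_area_m2 <= 0:
        raise ValueError("ground_area_m2 must be positive")
    return sum(plant_leaf_area(p) for p in plants) / ground_area_m2


# Two plants measured at organ scale on a 0.5 m2 plot (made-up values).
plot = [
    Plant("http://example.org/phenome/2017/plant/00001", (0.01, 0.02)),
    Plant("http://example.org/phenome/2017/plant/00002", (0.03,)),
]
lai = canopy_leaf_area_index(plot, ground_area_m2=0.5)
```

Keeping each step as a named function makes the calculation of an integrated trait traceable back to the elemental measurements it was derived from.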

Modification date : 31 August 2023 | Publication date : 26 July 2013 | Redactor : Pamela LUCAS