
Representing Knowledge in Dataspaces #8

Open
rohitadeshmukh13 opened this issue Feb 7, 2025 · 1 comment

@rohitadeshmukh13

Challenge Description

  • Dataspaces adopt a data co-existence paradigm, allowing heterogeneous, distributed, and loosely integrated data sources to be part of the same ecosystem.
  • That said, the ultimate goal of dataspaces is to enable seamless discovery and sharing of data among their participants while ensuring sovereignty and trust.
  • Therefore, at this stage in the evolution of dataspaces, lifecycle management and effective reuse of knowledge become immensely important.
  • Many dataspace initiatives, including IDSA, Gaia-X, and DSSC, advocate for the use of Semantic Web technologies in dataspaces to effectively achieve these goals.
  • However, the concept of Knowledge in the context of dataspaces remains poorly defined. For example, should "Semantics" themselves be considered "Knowledge", or what are the requirements for Semantics to be considered Knowledge?
  • In addition, explicit requirements for knowledge representation in dataspaces have not yet been defined.

Impact and Importance

  • The definition, representation, and management of knowledge play a key role in dataspaces.
  • Without well-defined knowledge structures, interoperability within and across dataspaces remains a challenge, leading to increased data silos rather than seamless integration.
  • Various types of knowledge are required to provide high-quality data, model comprehensive metadata, and enable advanced data discovery and querying.
  • Without these prerequisites, the vision of a digital single market within the EU cannot be fully realized.

Desired Solution

Proposed approach

  • Definition of the concept of Knowledge in the context of dataspaces.
  • Systematic identification of comprehensive requirements (what, why, how, when, where, who) for representing knowledge in dataspaces.
    • Building on existing work such as the DSSC Design Principles & Building Blocks [3], the IDS RAM Functional Layer [4], and/or the IDSA Dataspace Protocol [5].
    • For each functional requirement of dataspaces, define the corresponding knowledge representation requirements.
  • Identification of existing artifacts, tools and technologies that can be reused to fulfill these requirements.
  • Identification of gaps in research, artifacts, methods, availability/functionality of tools and technologies, etc.
  • Long-term goal: Creation of standardized mechanisms and artifacts (frameworks, methodologies, ontologies) for representing knowledge across dataspaces.
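To make the per-requirement mapping concrete, one could record each functional requirement together with its knowledge representation requirements, using the what/why/how/when/where/who dimensions from the preliminary example. The following is a minimal Python sketch under that assumption; the class names, the `missing_knowledge` helper, and the "Contract negotiation" entry are hypothetical illustrations, not part of DSSC, IDS RAM, or the Dataspace Protocol.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class KnowledgeRequirement:
    """Hypothetical record of the knowledge needed to fulfil one functional requirement."""
    what: str
    why: str
    how: str
    when: list[str]
    where: list[str]
    who: list[str]


@dataclass
class FunctionalRequirement:
    """A dataspace functional requirement, refined from broad to specific."""
    path: list[str]
    knowledge: KnowledgeRequirement | None = None

    def label(self) -> str:
        # Render the broad-to-specific path in the ">>" notation used in the issue.
        return " >> ".join(self.path)


def missing_knowledge(reqs: list[FunctionalRequirement]) -> list[str]:
    """List functional requirements whose knowledge requirements are still undefined (a gap)."""
    return [r.label() for r in reqs if r.knowledge is None]


discovery = FunctionalRequirement(
    path=["Data sharing", "Dataset discovery",
          "Specifying metadata while registering a dataset"],
    knowledge=KnowledgeRequirement(
        what="Semantic metadata about the dataset",
        why="Enables advanced querying, inference, and reasoning",
        how="Create and instantiate ontologies, vocabularies, application profiles",
        when=["design time: create models", "run time: instantiate at registration"],
        where=["IDE", "Vocabulary Hub", "Connector", "Metadata Broker"],
        who=["governance body: application profiles", "provider: metadata instances"],
    ),
)
# Hypothetical second requirement whose knowledge requirements are not yet analysed.
contract = FunctionalRequirement(path=["Data sharing", "Contract negotiation"])

print(missing_knowledge([discovery, contract]))  # ['Data sharing >> Contract negotiation']
```

Keeping the functional requirement and its knowledge requirements in one record makes it straightforward to list which functional requirements still lack a knowledge analysis.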

Preliminary example

  • Starting with an already identified functional requirement of dataspaces (from broad to specific):
    Data sharing >> Dataset discovery >> Specifying metadata while registering a dataset as an Asset through a Connector to a Metadata Broker
  • Corresponding requirements for knowledge representation in dataspaces:
    • What (knowledge is required in this scenario): Semantic metadata about the dataset:
      • domain-agnostic (API endpoint, HTTP method, data format/media type, pricing, etc.) as well as
      • domain-specific (e.g., for a paintings dataset provided by a museum: provider museum name, museum ISIL, artist name, artist ULAN ID, painting genre, dating of the work, etc.)
    • Why (is knowledge relevant in this scenario): High-quality semantic metadata enables advanced querying, inference, and reasoning capabilities.
    • How (should knowledge be represented for this scenario): By creating and instantiating semantic models (ontologies, vocabularies, and application profiles)
    • When (or at which stage should knowledge be represented in this scenario):
      • At design time: creation of semantic models;
      • At run time: instantiation of the semantic model while registering the dataset
    • Where (should knowledge be represented in this scenario):
      • Semantic models can be created using an IDE;
      • Semantic models can be stored in a Vocabulary Hub;
      • Semantic models can be instantiated for the given dataset and registered through a Connector to a Metadata Broker
    • Who (should create/manage knowledge in this scenario): For example,
      • The dataspace governance body is responsible for providing semantic metadata templates (application profiles).
      • The dataset provider is responsible for instantiating the relevant model, creating metadata, and registering it for their dataset.
  • Identification of reusable artifacts, tools and technologies:
    • Semantic Web standards: RDF, RDFS, OWL, SHACL
    • Semantic Web IDEs: Protégé, TopBraid Composer
    • Vocabulary Hubs: IDS Vocabulary Provider, Semantic Treehouse
    • Connectors: EDC, TNO Security Gateway
    • Metadata Brokers: XFSC, IDS Metadata Broker
  • Identification of gaps (where known/documented):
    • Usability challenges: Semantic Web technologies are perceived as complex by developers and domain experts [1, 2]. There is a need for user-friendly open-source tools, such as Semantic Web IDEs, in dataspaces [2].
    • Technology gaps:
      • The EDC Connector implementation does not properly support the representation of semantic models.
      • The XFSC Metadata Broker implementation currently uses Postgres rather than an RDF database, and does not support SPARQL.
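The "what", "how", and "when" rows of the example can be illustrated with a toy, standard-library-only Python sketch: the paintings dataset described as RDF-style (subject, predicate, object) triples, a design-time application profile expressed as minimum-count constraints (loosely mimicking SHACL `sh:minCount`), a run-time check of the provider's metadata against that profile, and a single triple-pattern lookup of the kind a SPARQL-capable broker would answer. The DCAT and DCMI Terms IRIs are real, but the `example.org` museum vocabulary, the profile contents, the functions `validate` and `match`, and all metadata values are hypothetical placeholders. A real deployment would more likely use an RDF library, SHACL shapes, and SPARQL.

```python
from __future__ import annotations

# Real vocabularies: W3C DCAT and DCMI Metadata Terms.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical domain-specific vocabulary for the museum example.
MUS = "https://example.org/museum#"

Triple = tuple[str, str, str]

# Design time (hypothetical application profile): predicate -> minimum number of values.
PAINTINGS_PROFILE = {
    DCT + "title": 1,
    DCT + "publisher": 1,
    DCAT + "distribution": 1,
    MUS + "providerIsil": 1,
    MUS + "artistName": 1,
    MUS + "artistUlanId": 1,
    MUS + "genre": 1,
}


def validate(subject: str, triples: list[Triple], profile: dict[str, int]) -> list[str]:
    """Return one message per profile constraint that `subject` violates."""
    violations = []
    for predicate, min_count in profile.items():
        count = sum(1 for s, p, _ in triples if s == subject and p == predicate)
        if count < min_count:
            violations.append(f"{predicate}: expected >= {min_count}, found {count}")
    return violations


def match(triples: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Match one triple pattern; None acts as a wildcard, like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Run time: the provider instantiates the profile for its dataset (placeholder values).
ds = "https://example.org/datasets/paintings"
dist = ds + "/csv"
metadata: list[Triple] = [
    (ds, RDF_TYPE, DCAT + "Dataset"),
    (ds, DCT + "title", "Paintings collection"),
    (ds, DCT + "publisher", "https://example.org/museums/example-museum"),
    (ds, MUS + "providerIsil", "XX-EXAMPLE"),
    (ds, MUS + "artistName", "Example Artist"),
    (ds, MUS + "genre", "Landscape"),
    (ds, DCAT + "distribution", dist),
    # In DCAT, the media type belongs to the distribution, not the dataset.
    (dist, DCAT + "mediaType", "https://www.iana.org/assignments/media-types/text/csv"),
]

print(validate(ds, metadata, PAINTINGS_PROFILE))  # one violation: artistUlanId is missing
print(match(metadata, p=MUS + "genre", o="Landscape"))
```

Separating the profile (owned by the governance body) from the instance data (owned by the provider) mirrors the "who" split in the example. The `match` lookup shows the kind of question that is natural over triples and that a broker without SPARQL support has to answer some other way.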

Acceptance Criteria

Short-term: A peer-reviewed publication that

  • Defines knowledge in the context of dataspaces.
  • Identifies requirements for representing knowledge in dataspaces, based on the functional requirements for dataspaces already documented by existing initiatives (DSSC, IDSA, Gaia-X).
  • Identifies existing artifacts, tools, and technologies that can be reused, as well as remaining gaps.

Long-term: Development of standardized mechanisms and artifacts (frameworks, methodologies, ontologies) for representing knowledge in the next generation of cross-domain dataspaces.

References and Resources

[1] Bader, Sebastian, et al. "The international data spaces information model–an ontology for sovereign exchange of digital content." International Semantic Web Conference. Cham: Springer International Publishing, 2020.
[2] Deshmukh, Rohit A., et al. "Challenges and Opportunities for Enabling the Next Generation of Cross-Domain Dataspaces." The Second International Workshop on Semantics in Dataspaces, co-located with the Extended Semantic Web Conference. 2024.
[3] DSSC Design Principles & Building Blocks. https://dssc.eu/page/knowledge-base
[4] IDS RAM Functional Layer. https://docs.internationaldataspaces.org/ids-knowledgebase/ids-ram-4/layers-of-the-reference-architecture-model/3-layers-of-the-reference-architecture-model/3_2_functionallayer
[5] IDSA Dataspace Protocol. https://docs.internationaldataspaces.org/ids-knowledgebase/dataspace-protocol

@TallTed

TallTed commented Feb 7, 2025

> However, the concept of Knowledge in the context of dataspaces remains poorly defined. For example, should "Semantics" themselves be considered "Knowledge", or what are the requirements for Semantics to be considered Knowledge?

The answer will depend on the question being addressed; that is, these concepts should remain loosely (which differs from poorly) defined.
