The Pitch

What we want to do

Develop a generic keyword search and question answering infrastructure for distributed and structured enterprise data. Our framework will exploit the distribution of the data to improve both the interpretation and the federated execution of user queries while abiding by the users’ access restrictions. We will deliver an open-source version of the framework that implements keyword search functionality as well as prototypical extensions of the partners’ product suites and use case studies.

Why we want to do it

The two main barriers to harnessing the full power of the data available in companies are data accessibility and data integration. DIESEL exploits the distributed nature of enterprise data to address these barriers. To this end, we will develop a novel federated and scalable approach to enable policy-aware search through distributed enterprise data sources. Moreover, we will fuel our approach by deploying and adapting scalable solutions for generating Linked Data in a non-invasive way.

Project Goal

Data accessibility and integration are among the main barriers to harnessing the full power of data in companies. In large companies, business-relevant data can be distributed across thousands of data silos in different formats. Most existing semantic search solutions rely on extensions of text search (e.g., with domain-specific thesauri) and fail to make use of the semantics exposed in the data while interpreting queries.

DIESEL addresses these two drawbacks. 1) DIESEL will develop a novel scalable approach to enable search through distributed Linked Data (LD). Instead of relying on enhanced text search, our approach will use the semantics of the user input to generate formal queries (i.e., SPARQL) out of keywords. Users will be able to read natural-language renditions of their query, see how the system understood it, and choose among the different possible interpretations. 2) To facilitate the deployment of the search solution over all enterprise data, DIESEL will deploy and adapt non-invasive scalable solutions for generating LD out of all types of data sources (i.e., unstructured, structured and semi-structured). Our approach will enable companies to quickly integrate the LD paradigm into their information landscape without having to alter existing sources of information.
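To make the keyword-to-query idea concrete, here is a minimal sketch in Python. It is not DIESEL's implementation: the lexicon, the example.org vocabulary, and the function name interpret are all hypothetical and chosen only for illustration. It assumes each keyword can map to one or more vocabulary terms, and it produces one candidate SPARQL query plus a natural-language rendition per combination of keyword senses, so that a user could choose among the interpretations.

```python
from itertools import product

# Hypothetical lexicon (assumption, not part of DIESEL):
# keyword -> list of (IRI, kind, phrase) candidates.
LEXICON = {
    "engine": [("http://example.org/Engine", "class", "are engines")],
    "supplier": [
        ("http://example.org/suppliedBy", "property", "have a supplier"),
        ("http://example.org/Supplier", "class", "are suppliers"),
    ],
}


def interpret(keywords):
    """Return (sparql, rendition) pairs, one per combination of keyword senses."""
    # Keywords missing from the lexicon are simply ignored in this sketch.
    candidates = [LEXICON[k] for k in keywords if k in LEXICON]
    if not candidates:
        return []
    results = []
    for combo in product(*candidates):
        patterns, phrases = [], []
        for i, (iri, kind, phrase) in enumerate(combo):
            if kind == "class":
                # A class sense becomes a type constraint on ?x.
                patterns.append(f"?x a <{iri}> .")
            else:
                # A property sense requires ?x to have some value for it.
                patterns.append(f"?x <{iri}> ?v{i} .")
            phrases.append(phrase)
        query = "SELECT DISTINCT ?x WHERE { " + " ".join(patterns) + " }"
        rendition = "Things that " + " and ".join(phrases)
        results.append((query, rendition))
    return results


if __name__ == "__main__":
    for query, rendition in interpret(["engine", "supplier"]):
        print(rendition)
        print("  " + query)
```

In this toy setting the keywords "engine supplier" yield two interpretations, because "supplier" is ambiguous between a class and a property. A real system would additionally have to rank the interpretations and execute them across federated sources under the user's access restrictions, none of which is shown here.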

The final output of DIESEL will be a novel scalable search paradigm for searching through Linked Enterprise Data. Using DIESEL in data-driven companies thus promises to increase the reuse of company-internal knowledge and the productivity of employees, reduce parallel development, and make better use of company-internal resources. Several companies (including BBC, New York Times, Best Buy and Renault) and governmental organizations are already using LD on a daily basis. With the current uptake of LD, the number of such adopters is expected to grow rapidly over the next years. Our aim is to address this niche in the current market with the DIESEL technologies.