In TheyBuyForYou we have been working on a layered architecture of data services, ontologies, core APIs and tools that allows different levels of access to, and use of, our procurement knowledge graph. As shown in the figure, five main layers have been defined: data, schemas, tools, core APIs and added-value services. Each layer is explained below.

Figure 1. TheyBuyForYou tools scheme

Data

This bottom layer holds the data that feeds both the knowledge graph and the document database. The knowledge graph data is obtained from the OpenOpps and OpenCorporates datasets and is transformed into RDF by the data ingestion tool. The platform has the following data:

  • TBFY knowledge graph: It is a database containing information about tenders, contracts, awards, organisations and contracting processes, and it is used by the API Gateway.
  • Document repository: It is a repository that contains all the legal documents, both the tender descriptions and the JRC-Acquis documents.

Schemas

This layer contains the vocabularies of our domain. These vocabularies act as intermediaries: they define the terms that tools such as the SPARQL GUI or R4R use to interpret and query the knowledge graph. The platform has the following schemas:

  • TBFY ontology: It imports the OCDS ontology (for procurement data) and the euBusinessGraph ontology (for company data). In addition, it contains a few extensions in order to represent additional meta information needed for the TBFY KG.
  • euBusinessGraph ontology: It is an ontology for company data, originally developed in the euBusinessGraph project.
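To illustrate how a shared vocabulary lets a tool address the knowledge graph, here is a minimal sketch that builds a SPARQL query and encodes it as a GET request in the style of the SPARQL 1.1 Protocol. The endpoint URL, the `ocds` namespace, and the class and property names (`ocds:Tender`, `dct:title`) are assumptions made for illustration; check them against the published ontology before use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumptions for illustration only: the endpoint URL, the ocds namespace
# and the class/property names are not stated in the text above.
ENDPOINT = "http://example.org/sparql"           # hypothetical endpoint
OCDS_NS = "http://data.tbfy.eu/ontology/ocds#"   # assumed namespace


def build_tender_query(limit=10):
    """Return a SPARQL SELECT query listing tenders and their titles."""
    return (
        f"PREFIX ocds: <{OCDS_NS}>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?tender ?title WHERE {\n"
        "  ?tender a ocds:Tender ;\n"
        "          dct:title ?title .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_query_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, build_tender_query(5))
print(url)
```

The sketch only constructs the request; sending it to a real endpoint and choosing a result format (for example via an `Accept` header) is left to the caller.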

Tools

This layer contains the tools built or used to create the knowledge graph and provide access to it. They range from tools that feed the databases to tools that query the TheyBuyForYou SPARQL endpoint. The platform has the following tools:

  • Harvester: It downloads articles and legal documents from public procurement sources (OpenOpps, JRC-Acquis or TED) and indexes them into SOLR, so that complex queries can be run and the results visualised through Banana.
  • R4R: It allows building and deploying RESTful services from SPARQL queries. The core API uses it to browse the TBFY knowledge graph.
  • KG data ingestion pipeline: It downloads OCDS releases and reconciled supplier-company records, both in JSON format, enriches the data and transforms it to RDF (using RML), and publishes the result to the TBFY KG database.
  • SPARQL GUI for TBFY KG: It uses YASGUI (Yet Another SPARQL GUI), a web application for querying any SPARQL endpoint.
  • OptiqueVQS: It enables end users with no technical background to turn their information needs into SPARQL queries visually.
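The JSON-to-RDF step of the ingestion pipeline can be pictured with a small hand-written mapping. This is not RML and not the project's actual mapping: it is a plain Python sketch that turns a simplified OCDS-like release into N-Triples. The namespaces, IRI patterns and the `hasTender` property are hypothetical names chosen for the example.

```python
from urllib.parse import quote

# Hypothetical namespaces, IRI patterns and property names (illustration only).
OCDS = "http://data.tbfy.eu/ontology/ocds#"
BASE = "http://data.tbfy.eu/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def release_to_ntriples(release):
    """Map one simplified OCDS release (a dict) to a list of N-Triples lines."""
    ocid = quote(release["ocid"], safe="")
    process = f"<{BASE}contractingProcess/{ocid}>"
    lines = [f"{process} <{RDF_TYPE}> <{OCDS}ContractingProcess> ."]
    tender = release.get("tender")
    if tender:
        tender_id = quote(str(tender["id"]), safe="")
        tender_iri = f"<{BASE}tender/{ocid}_{tender_id}>"
        lines.append(f"{tender_iri} <{RDF_TYPE}> <{OCDS}Tender> .")
        lines.append(f"{process} <{OCDS}hasTender> {tender_iri} .")
        if "title" in tender:
            title = escape_literal(tender["title"])
            lines.append(f'{tender_iri} <{DCT_TITLE}> "{title}" .')
    return lines


sample = {"ocid": "ocds-abc-1", "tender": {"id": "t1", "title": "Office furniture"}}
for line in release_to_ntriples(sample):
    print(line)
```

A declarative mapping language such as RML expresses the same kind of rules as data rather than code, which makes them easier to maintain as the source schema evolves.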

Core APIs

This layer contains the set of core APIs built or used in the project. They provide the basic resources for extracting information from the knowledge graph, from the document repository or even from external data sources. The platform has the following core APIs:

  • Knowledge graph API: It allows obtaining information about tenders, organisations, awards, contracts and contracting processes from the RDF triple store.
  • Public procurement OCDS API: It allows obtaining information about public procurement based on the OCDS standard, currently applied to the data from the Zaragoza City Council.
  • OpenCorporates companies API: It provides access to data about 135 million companies from primary public sources.
  • OpenCorporates reconciliation API: It allows OpenRefine users to match company names to legal corporate entities and so obtain more information about companies.
  • OpenOpps API: It provides access to tender and contract data from a range of European government bodies, formatted according to OCDS.
  • librAIry API: It creates topic-based representations of documents (e.g., tenders) to relate them semantically.
  • Wikifier Web service: It takes a text document as input and annotates it with links to relevant Wikipedia concepts.

Added-value services and tools

This top layer contains the services and tools that go beyond the core functions, extending them with additional features and add-ons. The platform has the following added-value services and tools:

  • API Gateway: It provides a flexible abstraction layer and a single entry point to manage the communication between TBFY clients and online tools.
  • Search API: It explores collections of multilingual public procurement data through a RESTful API.
  • Storytelling: It is a client-side JavaScript framework designed for the purpose of supporting authors of data stories.
  • Streamstory: It is a tool intended to help with the analysis and interpretation of time-varying data.
  • Anomaly detection: It is an online toolkit for exploring public spending and tender data and detecting anomalies in them.
  • Average payment period to suppliers: It is an indicator that measures the delay in the payment of commercial debts in economic terms for entities associated with the Zaragoza City Council.
  • Compra Pública Inclusiva (COPIN): COPIN aims to provide a better understanding of how public administrations specify and evaluate public tenders.

A simple usage example

María works for a Spanish company that sells office furniture and consumables. She wants access to public procurement data in Europe, since her company is planning to bid for a tender in the United Kingdom about “office furniture and consumables”. She wants to show the business managers at her company a report that includes all foreign companies that have been awarded similar tenders.

María has several ways to solve this problem through the different services offered by the TheyBuyForYou platform. One way would be to use the Search API to find documents that contain “office furniture and consumables”.

curl -X POST "https://tbfy.librairy.linkeddata.es/search-api/items" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"lang\": \"en\", \"size\": 100, \"source\": \"tender\", \"terms\": \"office furniture and consumables\", \"text\": \"office furniture and consumables\"}"
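The same search request can be sketched in Python using only the standard library. The URL, headers and body fields are taken from the curl command; the function names are made up for this example, and the shape of the JSON response is not described here, so the sketch returns it undecoded beyond JSON parsing.

```python
import json
from urllib.request import Request, urlopen

SEARCH_URL = "https://tbfy.librairy.linkeddata.es/search-api/items"


def build_search_request(text, lang="en", size=100, source="tender"):
    """Build the POST request with the same fields as the curl command."""
    payload = {
        "lang": lang,
        "size": size,
        "source": source,
        "terms": text,
        "text": text,
    }
    return Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def search(text, **kwargs):
    """Send the request and parse the JSON response (needs network access)."""
    with urlopen(build_search_request(text, **kwargs), timeout=30) as resp:
        return json.load(resp)


request = build_search_request("office furniture and consumables")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and test without contacting the service.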

Once María has the best-scoring documents, she needs the related tenders and contracting processes. To get them, she can use the knowledge graph API to obtain the tender and contracting process associated with each document. For example, for the highlighted document (identifier ocds-0c46vo-0001-04f6012f-76e8-4095-b605-5a446ad0cad5_ocds-b5fd17-1273acb5-4a24-4274-b1e4-ba2ec108f5a8-escc—023858—award):

Now, she has obtained the identifier of the related contracting process “ocds-0c46vo-0001-04f6012f-76e8-4095-b605-5a446ad0cad5”. Using this identifier in the knowledge graph API, the list of awards can be obtained:

In this step, she obtains the identifier of the related award “ocds-0c46vo-0001-04f6012f-76e8-4095-b605-5a446ad0cad5_5bae0886-0c7d-4c11-8fb4-fb839aaf23b6”. Using this identifier in the knowledge graph API, the list of suppliers can be obtained:

Finally, if she wants more information about the awarded organisation (identifier gb-04082871), she can use the knowledge graph API again:

This process can be repeated for every organisation that needs to be included in the report. Of course, the process is meant to be automated, by using or creating a customised added-value service that executes all the operations described. For this example, the platform provides notebooks (e.g., the one described in [1]) that show how to use all these resources.
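As a rough idea of what such automation could look like, the sketch below walks from contracting processes to awards, from awards to suppliers, and from suppliers to organisation details, fetching each organisation only once. The base URL, the route templates and the assumption that each response is a JSON list of objects with an `id` field are all hypothetical; the text names the knowledge graph API but not its routes or response format.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL and route templates: adjust to the real API.
KG_API = "http://example.org/kg-api"
PATHS = {
    "process_awards": "/contractingProcess/{id}/award",
    "award_suppliers": "/award/{id}/supplier",
    "organisation": "/organisation/{id}",
}


def http_get_json(url):
    """Default fetcher: GET a URL and parse the JSON body (needs network)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def kg_url(kind, identifier):
    """Build the URL for one step, percent-encoding the identifier."""
    return KG_API + PATHS[kind].format(id=quote(identifier, safe=""))


def suppliers_for_processes(process_ids, fetch=http_get_json):
    """Return {organisation id: details} for all suppliers of the processes.

    Assumes list responses whose items carry an "id" field (an assumption).
    """
    organisations = {}
    for process_id in process_ids:
        for award in fetch(kg_url("process_awards", process_id)):
            for supplier in fetch(kg_url("award_suppliers", award["id"])):
                org_id = supplier["id"]
                if org_id not in organisations:
                    organisations[org_id] = fetch(kg_url("organisation", org_id))
    return organisations
```

Passing the fetcher as a parameter makes it possible to try the traversal against canned responses before pointing it at the live service.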

More information about the catalogue of tools and links can be found at: https://tbfy.github.io/platform/

[1] https://colab.research.google.com/github/TBFY/api-gateway/blob/master/notebooks/similarity_organisations_ex.ipynb