The increasing amount of open procurement data enables us to analyse public spending to deliver better quality and more economical public services, prevent fraud and corruption, and build healthy and sustainable economies. In this respect, the TheyBuyForYou (TBFY) project aims at building a technology platform consisting of a set of modular web-based services and APIs, to publish, curate, integrate, analyse, and visualise an open, comprehensive, cross-border and cross-lingual procurement Knowledge Graph (KG), including public spending and corporate data from multiple sources across the European Union.
Figure 1. Main concepts of the OCDS ontology.
We have integrated two high-quality datasets, procurement data (e.g., tenders and contracts) and company data (i.e., legal entities), according to an ontology network to form an interconnected knowledge graph for public procurement. The ontology network includes an ontology for representing procurement data, based on the Open Contracting Data Standard (OCDS), and another ontology for representing company data, namely the euBusinessGraph ontology. Figure 1 depicts the main concepts of the OCDS ontology, including entities such as contracting process, tender, and award.
Data is ingested from two main providers: OpenCorporates for supplier (i.e., company) data and OpenOpps for procurement data. OpenOpps has gathered over 3,000,000 tender documents from more than 685 publishers worldwide through web scraping and open APIs, while OpenCorporates currently holds 140,000,000 entities collected from national registers. The data collected from OpenOpps and OpenCorporates is openly available under the Open Database License (ODbL) on GitHub.
Figure 2. TheyBuyForYou architecture.
The TheyBuyForYou architecture is depicted in Figure 2, including the data ingestion process for the KG. The data ingestion process comprises several steps that use the data APIs of both providers: data curation (e.g., handling missing values and duplicates), reconciliation (i.e., matching suppliers appearing in tender data against canonical company records obtained from the OpenCorporates dataset), and translation of the datasets into the underlying graph data representation (i.e., RDF) with respect to the ontology network. The current release of the Knowledge Graph (KG) covers data from January 2019 onwards. New data is onboarded every night. As of June 2020, the Knowledge Graph consists of more than 112 million triples (i.e., subject-predicate-object statements) and contains information about 1.18 million tenders, 1.32 million awards, and more than 90 thousand companies that have been matched to suppliers in awards.
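To make the three ingestion steps more concrete, the following is a minimal sketch in Python, not the actual TBFY pipeline. The record fields (`id`, `supplier`, `name`), the exact-match reconciliation on normalised names, the `example.org` namespaces, and the `hasSupplier` property are all assumptions made for illustration; the real pipeline relies on the providers' APIs and on the project's own ontology network.

```python
import re

# Hypothetical namespaces, used only for illustration (not the TBFY ones).
OCDS = "http://example.org/ocds#"
BASE = "http://example.org/tbfy/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

LEGAL_SUFFIXES = {"ltd", "limited", "gmbh"}


def normalise(name):
    """Lower-case a company name, drop punctuation and common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def curate(awards):
    """Curation: drop awards with a missing supplier and duplicate award ids."""
    seen, result = set(), []
    for award in awards:
        if not award.get("supplier") or award["id"] in seen:
            continue
        seen.add(award["id"])
        result.append(award)
    return result


def reconcile(awards, companies):
    """Reconciliation: attach a company id when the normalised names match exactly."""
    index = {normalise(c["name"]): c["id"] for c in companies}
    return [dict(a, company=index.get(normalise(a["supplier"]))) for a in awards]


def to_ntriples(awards):
    """Translation: serialise awards (and matched suppliers) as N-Triples lines."""
    lines = []
    for a in awards:
        subject = f"<{BASE}award/{a['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{OCDS}Award> .")
        if a["company"]:
            lines.append(f"{subject} <{OCDS}hasSupplier> <{BASE}company/{a['company']}> .")
    return lines


# Made-up sample records: one duplicate, one missing supplier, one unmatched supplier.
awards = [
    {"id": "a1", "supplier": "Acme Ltd."},
    {"id": "a1", "supplier": "Acme Ltd."},
    {"id": "a2", "supplier": None},
    {"id": "a3", "supplier": "Unknown Co"},
]
companies = [{"id": "gb-123", "name": "ACME LIMITED"}]

curated = curate(awards)
matched = reconcile(curated, companies)
triples = to_ntriples(matched)
```

Exact matching on normalised names is the simplest possible reconciliation strategy; a production setting would typically need fuzzy matching and additional attributes such as jurisdiction or address to disambiguate companies.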
The KG is available as open data. The data is available online and can be explored through several services. We also provide monthly data dumps of the KG. See http://data.tbfy.eu for more information.
| Resource | URL | Description |
|---|---|---|
| KG SPARQL endpoint | http://data.tbfy.eu/sparql | SPARQL endpoint that allows users to query the KG using SPARQL. We are using Apache Jena Fuseki as the SPARQL server and Apache Jena TDB as the underlying database. |
| KG API | http://data.tbfy.eu/kg-api | REST API that allows you to query the KG. See https://github.com/TBFY/knowledge-graph-API/wiki for more info. |
| KG data dump | http://dump.tbfy.eu | Monthly data dump of the Knowledge Graph in RDF format containing procurement data and company data (for matching suppliers) for the period January 2019 onwards. In addition to the KG data dump in RDF, we have also made available the source data input for the KG, i.e., procurement release data from OpenOpps and company data (for matching suppliers) from OpenCorporates in JSON format. |
| YASGUI for KG | http://yasgui.tbfy.eu | YASGUI (Yet Another SPARQL GUI) is a 3rd party web application that we have set up that allows users to query the KG using SPARQL. |
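As an illustration of how the SPARQL endpoint could be queried programmatically, here is a minimal sketch using only the Python standard library. It assumes that the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and returns SPARQL JSON results; the helper names `build_request` and `run_query` are hypothetical, and the query is deliberately generic so that it does not depend on any specific ontology term.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed in the table above.
ENDPOINT = "http://data.tbfy.eu/sparql"

# Generic query: fetch any five triples, without assuming ontology terms.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Building the request does not touch the network.
request = build_request(ENDPOINT, QUERY)
```

Calling `run_query(ENDPOINT, QUERY)` would then return one dictionary per result row, keyed by variable name (`s`, `p`, `o`), provided the endpoint is reachable.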
The KG is available as open source software. All software components and ontologies developed in the TBFY project to create the Knowledge Graph have been released as open source on GitHub.
| Component | Repository | Description |
|---|---|---|
| Knowledge Graph (KG) | https://github.com/TBFY/knowledge-graph | Repository where all source code, information and documentation for creating, deploying and using the KG are found. |
| OCDS Ontology | https://github.com/TBFY/ocds-ontology | Repository for the OCDS ontology. |
| KG API | https://github.com/TBFY/knowledge-graph-API | Repository for the core API. It also contains the API documentation and SPARQL queries of the TheyBuyForYou project. |
| KG API Gateway | https://github.com/TBFY/api-gateway | Repository for the API Gateway. It provides a flexible abstraction layer and a single entry point that securely manages communication between TBFY clients and online tools via API. |
In addition to the KG and data ingestion components, we are developing a set of online toolkits including analytics components, such as a Cross-Lingual Search API; anomaly detection components; a comprehensive set of guidelines for data visualisation and interaction design; and a design for a story-telling tool (see an overview here). A series of real-life business cases is being implemented on top of the knowledge graph and the online tools offered by the platform. Do you want to explore the TheyBuyForYou KG and platform? You can:
- follow the guidelines here to create a local test bed: https://github.com/TBFY/platform
- have an overview of all the platform components: https://tbfy.github.io/platform/