Patterns in complex data such as public procurement are notoriously difficult to spot. As TBFY project partners, we have developed an approach that helps users identify and investigate recurrent behaviour in public spending data, in order to uncover its dynamics and to spot regularities and anomalies. To do this, we have built the StreamStory tool, parts of which have also been developed by JSI in other FP7 and H2020 research projects.

Public procurement and public spending usually evolve over time with some discernible pattern. This means that public spending data can be understood as event sequence data. StreamStory is designed to help users search for recurrent patterns by representing the data as a diagram of states and transitions, where recurrent patterns stand out visually.

To use the tool, we first have to prepare the data by extracting it from the Knowledge Graph and turning it into appropriate feature vectors. This can be done through our web platform developed for the TheyBuyForYou project. The data is then imported into the StreamStory tool, which can detect structures and regularities within it.
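The preparation step described above can be illustrated with a minimal sketch. It is not the export code of our web platform: the record fields, the one-hot encoding and the CSV layout are all assumptions chosen for illustration, and the two tenders are made-up sample data.

```python
import csv
import io

# Hypothetical tender records; field names are illustrative, not the TBFY schema.
tenders = [
    {"published": "2019-03-14", "criteria": "price", "eu": True, "bids": 2, "value": 57443.0},
    {"published": "2019-04-02", "criteria": "quality", "eu": False, "bids": 1, "value": 12000.0},
]

def to_sparse_vector(tender):
    """One-hot encode the categorical fields and keep only non-zero entries."""
    dense = {
        "criteria=" + tender["criteria"]: 1.0,
        "published_in_eu": 1.0 if tender["eu"] else 0.0,
        "bids": float(tender["bids"]),
        "value": float(tender["value"]),
    }
    return {name: v for name, v in dense.items() if v != 0.0}

def write_csv(tenders, out):
    """Write one row per tender; features absent from a vector become empty cells."""
    vectors = [to_sparse_vector(t) for t in tenders]
    columns = sorted({name for vec in vectors for name in vec})
    writer = csv.writer(out)
    writer.writerow(["published"] + columns)
    for tender, vec in zip(tenders, vectors):
        writer.writerow([tender["published"]] + [vec.get(c, "") for c in columns])
    return columns

buffer = io.StringIO()
columns = write_csv(tenders, buffer)
print(buffer.getvalue())
```

Keeping the publication date as its own column preserves the time ordering that an event sequence model needs.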

Two examples: hospitals in Slovenia

In the following examples we demonstrate how StreamStory, together with other tools developed for the TheyBuyForYou project, can be used to explore the complexity of public procurement data in Slovenian hospitals.

We selected all public procurement data from all Slovenian hospitals and exported it into a CSV file with sparse feature vectors using our web analytics platform. The data was then imported into the StreamStory tool, where we built our model. For this model, we used the following attributes: date of public procurement publication, criteria for selection, publication of tender in the EU, number of received bids, final value of the contract. The time unit is months, and time attributes were also included in the model construction. The number of states was set to 18.
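To give a feel for what a model of states and transitions contains, here is a minimal sketch that estimates transition probabilities from a sequence of monthly state labels. It assumes each month has already been assigned to a state, and it is an illustration of the general idea rather than StreamStory's actual model construction. The function name and the sample sequence are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(state_sequence):
    """Count consecutive state pairs and normalise the counts per source state."""
    counts = defaultdict(Counter)
    for current, nxt in zip(state_sequence, state_sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for state, following in counts.items()
    }

# Made-up sequence: one state label per month, in time order.
monthly_states = ["A", "A", "B", "A", "C", "A", "A", "B"]
probs = transition_probabilities(monthly_states)

for state, following in probs.items():
    print(state, following)
```

In a diagram, each key would be drawn as a state and each probability as a weighted arrow between two states.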

Figure 1: The initial visualisation of the model.

In the tool, users can highlight different parameters. First, we’ve highlighted the final value of the contract. StreamStory recolours states in the central panel by the mean value of the highlighted attribute in each state – in our case the final value of the contract. This allows us to correlate the structure of the dataset with the final value of the contract. States with high average values are coloured orange while those with low values are coloured blue.
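The recolouring idea can be sketched in a few lines. This is only an approximation under stated assumptions: the records carry a hypothetical `state` label, and a state is shown as orange when its mean lies above the overall mean and blue otherwise. The colour scale StreamStory actually uses is not described here and is likely more graded. The sample values are invented.

```python
from collections import defaultdict
from statistics import mean

def colour_states(records, attribute):
    """Return {state: (mean value, colour)} for the highlighted attribute."""
    by_state = defaultdict(list)
    for record in records:
        by_state[record["state"]].append(record[attribute])
    overall = mean(r[attribute] for r in records)
    # Assumed rule: two colours split at the overall mean.
    return {
        state: (mean(values), "orange" if mean(values) > overall else "blue")
        for state, values in by_state.items()
    }

# Made-up contracts already assigned to states 0 and 1.
records = [
    {"state": 0, "value": 50000.0},
    {"state": 0, "value": 60000.0},
    {"state": 1, "value": 80000000.0},
]
colours = colour_states(records, "value")
print(colours)
```

The same function works for any numeric attribute, for example the number of received bids.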

Figure 2: Revealed states with high final values of contracts.

The visualisation in Figure 2 shows how the structure of the dataset relates to the final value of the contract.

It clearly reveals one state (coloured orange) with the following average properties: the selection criterion was price, tenders were published in the EU, the average number of received bids was 1 and the final value of the contract was 84.8 million EUR.

Figure 3: The most prevalent state in the “Slovenian hospitals” model.

Compared with the most prevalent state shown in Figure 3, labelled “Criteria: price” (in Slovenian: “Merila cena”), this high-value state deviates strongly from the other states. The most prevalent state has the following average properties: the selection criterion was price, tenders were not published in the EU, the average number of received bids was 2 and the final value of the contract was around 57,443 EUR.

Another highly prevalent state, labelled “Published in EU: yes”, also has a low number of bids. In that state, the selection criterion was price in 96% of cases, all tenders were published in the EU, the average number of received bids was 1.7 and the final value of the contract was around 618,881 EUR.

Recolouring by other selected attributes reveals another state (Figure 4) whose final contract values are above average. This state has the following average properties: the selection criterion was price in 92% of cases, all tenders were published in the EU, the average number of received bids was 9.4 and the final value of the contract was around 25.1 million EUR.

Figure 4: State with the second highest average value of contracts.

The distribution of the number of received bids shows that, in general, hospitals in Slovenia receive only a few bids. This could be an indicator of low competition. However, this is not always the case. By colouring the states by this attribute, the number of received bids, we can quickly spot one state with an average of 57.18 received bids (Figure 5). This state has the following average properties: the selection criterion was always price, 82% of the tenders were published in the EU and the final value of the contract was around 3.6 million EUR.

In addition, we can spot two more states with above-average numbers of received bids. In the first, the selection criterion was price in all cases, 85% of the tenders were published in the EU, the average number of received bids was 32.2 and the final value of the contract was around 1.2 million EUR. In the second, the selection criterion was price in all cases, 79% of the tenders were published in the EU, the average number of received bids was 16 and the final value of the contract was around 1.6 million EUR.

Figure 5: State with the highest average number of received bids.

The complexity and the vast amount of public procurement data mean that patterns in that data are hard to spot, even for people with a solid background in data analytics. But searching for recurrent patterns and anomalies is much easier if these patterns stand out visually. The two examples above show how the StreamStory tool can help users identify and investigate recurrent behaviour in this data: it visually exposes interesting patterns and also provides automatically generated suggestions of possible interpretations and descriptions.

StreamStory has been designed to be easy to use. Admittedly, it does require appropriate preparation of input data (i.e. a transformation of the dynamic network into sparse feature vectors). Fortunately, the export function on our web analytics platform takes care of this step. Together with our web analytics platform, StreamStory is a very useful tool for domain experts such as public procurement officials and decision makers, investigative journalists and other interested members of the public. Used together, the two can significantly raise levels of transparency and help detect unwanted behaviour.