Snapshots of data were taken from BRAD, HR and HESA and exported in CSV and spreadsheet format. I have table-level read access to SITS, which made it extremely easy to get all the data required for the CERIF schema. (SITS was used to extract data on research students to enable reporting on supervision.)
We used a sample report to determine which tables to use in the CERIF schema, as time constraints meant we wouldn’t be able to do a full mapping. The diagram below shows the tables used in our project.
I then modelled this data by creating a spreadsheet model of the CERIF schema. Given the time restrictions on the project, this was the quickest way to get the data into CERIF format. If this were put into production, automatic extraction from the data sources would be explored.
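As a rough illustration of what automatic extraction might involve, the sketch below maps the columns of a source CSV export onto CERIF-style column names. It is a sketch under assumptions: the source headers (`StaffID`, `Surname`, `Forename`), the mapping and the sample row are hypothetical, and the target names follow CERIF's `cfPersName` naming but should be checked against the CERIF release actually used.

```python
import csv
import io

# Hypothetical mapping from HR export headers to CERIF-style column names
# (modelled on cfPersName); check against the CERIF release actually in use.
HR_TO_CERIF = {
    "StaffID": "cfPersId",
    "Surname": "cfFamilyNames",
    "Forename": "cfFirstNames",
}


def map_rows(source_csv, mapping):
    """Yield one dict per source row, keyed by the target column names.

    Source columns that are not in the mapping (e.g. "Grade") are dropped,
    and surrounding whitespace is stripped from every mapped value.
    """
    for row in csv.DictReader(source_csv):
        yield {target: row[source].strip() for source, target in mapping.items()}


if __name__ == "__main__":
    # Made-up sample standing in for an HR snapshot file.
    sample = io.StringIO("StaffID,Surname,Forename,Grade\nS001,Smith ,Jane,7\n")
    for record in map_rows(sample, HR_TO_CERIF):
        print(record)  # {'cfPersId': 'S001', 'cfFamilyNames': 'Smith', 'cfFirstNames': 'Jane'}
```

Keeping the mapping as data rather than code means each source system (HR, SITS, BRAD) could have its own mapping table without changing the extraction logic.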
There were several issues with the Publications data when translating it into the CERIF schema:
– Different field sizes
– Date formats needed tweaking
– Null values from the sample data needed to have dummy defaults set
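The three fixes above could be applied row by row with something like the following. This is a sketch under stated assumptions: the field sizes, the accepted source date formats and the dummy defaults are all hypothetical, and the CERIF-style field names are flattened into a single row for illustration rather than spread across CERIF's separate publication and title tables.

```python
from datetime import datetime

# Hypothetical constraints; the real values would come from the CERIF mapping.
FIELD_SIZES = {"cfResPublId": 128, "cfTitle": 255}                      # max characters
SOURCE_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y", "%Y")        # assumed inputs
DEFAULTS = {"cfTitle": "UNKNOWN TITLE", "cfResPublDate": "1900-01-01"}  # dummy values


def to_iso_date(value):
    """Convert a date in any of the assumed source formats to YYYY-MM-DD.

    Note: %b expects English month abbreviations (default C locale).
    """
    for fmt in SOURCE_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def clean_publication(row):
    """Return a cleaned copy of a (flattened, CERIF-style) publication row."""
    out = dict(row)
    # 1. Nulls and empty strings get dummy defaults.
    for field, default in DEFAULTS.items():
        if out.get(field) in (None, ""):
            out[field] = default
    # 2. Dates are normalised to ISO format.
    out["cfResPublDate"] = to_iso_date(out["cfResPublDate"])
    # 3. Text is truncated to the target field sizes.
    for field, size in FIELD_SIZES.items():
        if isinstance(out.get(field), str):
            out[field] = out[field][:size]
    return out
```

For example, a row with a missing title and the date `05/03/2009` would come back with the title `UNKNOWN TITLE` and the date `2009-03-05`. Raising an error on an unrecognised date, rather than silently defaulting it, keeps genuinely bad source data visible.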
The data was then imported into a MySQL database installed on the BRUCE server.
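The import step could be scripted along these lines. This is a sketch only: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, and the table, columns and values are hypothetical. With a MySQL driver such as mysql-connector-python or PyMySQL the same `executemany` approach applies, but the placeholder is `%s` rather than `?`.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, columns, csv_file):
    """Bulk-insert the named columns of a CSV file into an existing table.

    Table and column names are interpolated into the SQL string, so they must
    come from trusted configuration; the values themselves are bound as
    parameters. Returns the number of rows inserted.
    """
    placeholders = ", ".join("?" for _ in columns)  # MySQL drivers use %s instead
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [tuple(row[c] for c in columns) for row in csv.DictReader(csv_file)]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute("CREATE TABLE cfOrgUnit (cfOrgUnitId TEXT PRIMARY KEY, cfAcro TEXT)")
    sheet = io.StringIO("cfOrgUnitId,cfAcro\nOU1,ABC\nOU2,XYZ\n")  # made-up rows
    print(load_csv(conn, "cfOrgUnit", ["cfOrgUnitId", "cfAcro"], sheet))  # 2
```

Each spreadsheet, saved as one CSV file per sheet, would be loaded with one call per table.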
SolrEyes was installed from Bitbucket (instructions detailed here).
Apache Solr 3.1 was downloaded, installed and started on the server.
SolrEyes was then started on the server.
It took about 10-15 minutes to install the software and get it up and running on the server.
BRUCE Data Model: modelled on the CERIF schema. Project time limits restricted how much of CERIF we could sensibly prototype.
The interface (displaying live data from Brunel): facets on the left-hand side are used to refine the data displayed, and sorting and paging are also implemented.
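For reference, faceting, filtering, sorting and paging of this kind correspond to standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `sort`, `start`, `rows`). The sketch below builds such a request URL. It is not SolrEyes's own code, and the base URL, field names (`organisation`, `type`, `date`) and filter value are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text="*:*", facet_fields=(), filters=(),
                     sort=None, page=1, rows=10):
    """Build a Solr /select URL with facets, filters, sorting and paging.

    page is 1-based; Solr's start parameter is the 0-based offset of the
    first result, so start = (page - 1) * rows.
    """
    params = [("q", text), ("wt", "json"),
              ("start", (page - 1) * rows), ("rows", rows)]
    if facet_fields:
        params.append(("facet", "true"))
        # Ask Solr to count values for each facet shown in the left-hand panel.
        params.extend(("facet.field", f) for f in facet_fields)
    # Each facet value the user has clicked becomes a filter query (fq).
    params.extend(("fq", f) for f in filters)
    if sort:
        params.append(("sort", sort))  # e.g. "date desc" (hypothetical field)
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical field names and filter value, for illustration only.
    print(build_solr_query("http://localhost:8983/solr",
                           facet_fields=["organisation", "type"],
                           filters=['type:"journal article"'],
                           sort="date desc", page=2, rows=20))
```

Using `fq` rather than adding clicked facets to `q` keeps the user's search text separate from the refinements, and Solr can cache filter queries independently.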
Solr indexer: the middle layer that pulls data from MySQL. This is where the work is done: the entities are defined, the data is modelled, and so on.
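SolrEyes's indexer configuration is not reproduced here. As a hedged sketch of the kind of work the middle layer does, the code below flattens rows from a person table, a person-to-organisation link table and an organisation-name table into one document per person, with the organisation as a multi-valued field that could be faceted on. It uses `sqlite3` in place of MySQL, sending the documents to Solr is left out, and the simplified CERIF-style tables, document field names (`id`, `name`, `organisation`) and sample rows are all assumptions.

```python
import sqlite3


def make_sample_db():
    """Build a tiny in-memory database with simplified CERIF-style tables
    (a stand-in for the MySQL data; all rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cfPers (cfPersId TEXT PRIMARY KEY);
        CREATE TABLE cfPersName (cfPersId TEXT, cfFamilyNames TEXT, cfFirstNames TEXT);
        CREATE TABLE cfOrgUnitName (cfOrgUnitId TEXT, cfName TEXT);
        CREATE TABLE cfPers_OrgUnit (cfPersId TEXT, cfOrgUnitId TEXT);
        INSERT INTO cfPers VALUES ('P1'), ('P2');
        INSERT INTO cfPersName VALUES ('P1', 'Smith', 'Jane'), ('P2', 'Jones', 'Sam');
        INSERT INTO cfOrgUnitName VALUES ('OU1', 'Dept A'), ('OU2', 'Dept B');
        INSERT INTO cfPers_OrgUnit VALUES ('P1', 'OU1'), ('P1', 'OU2');
    """)
    return conn


def build_person_documents(conn):
    """Flatten person, link and organisation rows into one document per person."""
    sql = """
        SELECT p.cfPersId, n.cfFamilyNames, n.cfFirstNames, o.cfName
        FROM cfPers p
        JOIN cfPersName n ON n.cfPersId = p.cfPersId
        LEFT JOIN cfPers_OrgUnit l ON l.cfPersId = p.cfPersId
        LEFT JOIN cfOrgUnitName o ON o.cfOrgUnitId = l.cfOrgUnitId
        ORDER BY p.cfPersId, o.cfName
    """
    docs = {}
    for pers_id, family, first, org_name in conn.execute(sql):
        # The same person appears once per linked organisation; merge those rows.
        doc = docs.setdefault(pers_id, {"id": pers_id,
                                        "name": f"{first} {family}",
                                        "organisation": []})
        if org_name is not None:
            doc["organisation"].append(org_name)  # multi-valued, facetable field
    return list(docs.values())


if __name__ == "__main__":
    for doc in build_person_documents(make_sample_db()):
        print(doc)
```

The `LEFT JOIN`s keep people with no organisation link, who simply get an empty `organisation` list.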
Report generated from BRUCE data (modelled on the CERIF schema) using DataVision (an open-source report generator); reports can be exported in CSV, XML, Excel, PDF and Word formats.
BRUCE CERIF data model, modelled in Excel as spreadsheets (one sheet per table):
Spreadsheet One – Staff
Spreadsheet Two – Research Students
Spreadsheet Three – Organisational Levels
Spreadsheet Four – Funding
Spreadsheet Five – Publications
How I found working with CERIF
CERIF is predominantly relational, but not purely so, and the semantics took a while to understand. The link tables and class tables felt somewhat object-oriented, and to me they didn’t quite make sense. However, they build flexibility into the schema, and there is scope to use it in many ways. Because our time was restricted, we didn’t spend much time analysing how to use it: Richard and I came up with the BRUCE model in a couple of hours and went with it. Once the entities were defined in Solr, we found that we had to populate some of the link/class tables with unnecessary duplication of data just to get the interface to work. Although this goes against my understanding of how relational data should work, the indexer was really fast, especially with the ‘test data’. That said, the Brunel data snapshot is quite small, so it is not a reliable measure of performance.
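To make the link/class idea concrete: in CERIF, a relationship such as "person belongs to organisational unit" lives in its own link table, and each row carries a classification (a class from a class scheme) plus start and end dates saying what the link means and when it holds. The sketch below is a simplified, hypothetical version of a `cfPers_OrgUnit` table in `sqlite3`. The class and scheme identifiers are readable placeholders (real CERIF class identifiers are typically UUIDs), and `9999-12-31` is an assumed placeholder for an open-ended link.

```python
import sqlite3

# Simplified, CERIF-style link table: the relationship itself carries its
# meaning (a class from a class scheme) and a validity period.
SCHEMA = """
CREATE TABLE cfPers_OrgUnit (
    cfPersId        TEXT NOT NULL,
    cfOrgUnitId     TEXT NOT NULL,
    cfClassId       TEXT NOT NULL,  -- what the link means, e.g. a role
    cfClassSchemeId TEXT NOT NULL,  -- which vocabulary the class comes from
    cfStartDate     TEXT NOT NULL,  -- ISO dates compare correctly as text
    cfEndDate       TEXT NOT NULL,
    PRIMARY KEY (cfPersId, cfOrgUnitId, cfClassId, cfClassSchemeId,
                 cfStartDate, cfEndDate)
);
"""


def make_sample_db():
    """Made-up links; 'Employee', 'Supervisor', 'RoleScheme' are placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cfPers_OrgUnit VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("P1", "OU1", "Employee", "RoleScheme", "2005-09-01", "9999-12-31"),
            ("P1", "OU1", "Supervisor", "RoleScheme", "2008-01-01", "2010-12-31"),
            ("P2", "OU1", "Employee", "RoleScheme", "2001-01-01", "2007-06-30"),
        ],
    )
    return conn


def people_in_role(conn, org_id, class_id, scheme_id, on_date):
    """Return ids of people linked to org_id with the given class on on_date."""
    rows = conn.execute(
        """SELECT cfPersId FROM cfPers_OrgUnit
           WHERE cfOrgUnitId = ? AND cfClassId = ? AND cfClassSchemeId = ?
             AND cfStartDate <= ? AND cfEndDate >= ?
           ORDER BY cfPersId""",
        (org_id, class_id, scheme_id, on_date, on_date),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_sample_db()
    print(people_in_role(conn, "OU1", "Employee", "RoleScheme", "2011-03-01"))  # ['P1']
```

Because the role is data rather than a column, the same link table can hold employment, supervision and other relationships without schema changes; the cost is that every query has to filter on the class, scheme and dates.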
If this prototype were to be developed into a production system, we would need to analyse the data structures and mappings in more depth. All the data is held online, so automated extraction from the various systems could be developed to make combining the data seamless. Time constraints meant that reporting was only touched upon: I used an open-source tool, ‘DataVision’, to connect directly to the database and produce a very simple report.
Reporting would be a separate component, using data extracted from the queries in the interface and pushed into a temporary table; this would need more development and thought to achieve.
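A minimal sketch of the temporary-table approach, assuming the interface can hand over the ids of the records a user has selected (for example from a Solr result page): the ids are inserted into a temporary table, joined back to the source data and written out as CSV. `sqlite3` stands in for MySQL (where the statement would be `CREATE TEMPORARY TABLE`), and the table name `report_ids`, the columns and the id values are hypothetical.

```python
import csv
import io
import sqlite3


def make_sample_db():
    """Made-up person names in a simplified CERIF-style table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cfPersName (cfPersId TEXT, cfFamilyNames TEXT, cfFirstNames TEXT);
        INSERT INTO cfPersName VALUES
            ('P1', 'Smith', 'Jane'), ('P2', 'Jones', 'Sam'), ('P3', 'Brown', 'Alex');
    """)
    return conn


def report_from_ids(conn, ids, out):
    """Write a CSV report for the records whose ids the interface returned."""
    # Temporary table: visible only to this connection, dropped when it closes.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS report_ids (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM report_ids")  # clear any previous selection
    # INSERT OR IGNORE is SQLite syntax; it skips duplicate ids.
    conn.executemany("INSERT OR IGNORE INTO report_ids VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """SELECT n.cfPersId, n.cfFamilyNames, n.cfFirstNames
           FROM cfPersName n JOIN report_ids r ON r.id = n.cfPersId
           ORDER BY n.cfFamilyNames"""
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["id", "family_names", "first_names"])
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    report_from_ids(make_sample_db(), ["P3", "P1"], buf)
    print(buf.getvalue())
```

Joining back to the database, rather than reporting straight from the Solr documents, means the report reads the authoritative data even where the index holds duplicated or flattened values.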