BRUCE to CERIF Data Mapping

Data Sources

Snapshots of data were taken from BRAD, HR, and HESA and exported to CSV and spreadsheet formats.  I have read access to SITS data at table level, which made it extremely easy to get all the data required for the CERIF schema.  (SITS was used to extract data on research students to enable reporting on supervision.)

We used a sample report to determine which tables to use in the CERIF schema; time constraints meant that we wouldn’t be able to do a full mapping.  The diagram below shows the tables used in our project.

I then modelled this data by creating a spreadsheet model of the CERIF schema.  Given the time restrictions on the project, this was the quickest way to get the data into the CERIF schema format.  If this were to be put into production, automatic data extraction from the data sources would be explored.

There were lots of issues with Publications when translating into the CERIF schema:

          Different field sizes
          Date formats needed tweaking
          Null values from the sample data needed to have dummy defaults set
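The three kinds of clean-up listed above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet process we actually used: the field limit, the dummy defaults, the source date formats and the input keys (`id`, `title`, `date`) are all assumptions, and the output column names are chosen to look like CERIF rather than checked against it.

```python
from datetime import datetime

# Hypothetical limits and defaults; real values depend on the target columns.
MAX_TITLE_LEN = 255
DEFAULT_DATE = "1900-01-01"
DEFAULT_TEXT = "unknown"
# Assumed source formats: day/month/year and ISO year-month-day.
SOURCE_DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def normalise_date(value):
    """Return an ISO date string, or the dummy default if empty or unparseable."""
    if value is None or not value.strip():
        return DEFAULT_DATE
    for fmt in SOURCE_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return DEFAULT_DATE


def normalise_publication(row):
    """Apply the field-size, date-format and null-default fixes to one row."""
    title = (row.get("title") or "").strip() or DEFAULT_TEXT
    return {
        "cfResPublId": row["id"],
        "cfTitle": title[:MAX_TITLE_LEN],  # truncate to the assumed column width
        "cfResPublDate": normalise_date(row.get("date")),
    }
```

Truncating silently loses data, so a production version would probably log any row that was shortened or given a default.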

The data was then imported into a MySQL database (installed on the BRUCE server).

SolrEyes was installed from BitBucket (instructions detailed here)

Apache Solr 3.1 was downloaded, installed and started on the server.

SolrEyes was then started on the server.

It took about 10-15 minutes to install the software and get it up and running on the server.

BRUCE Data Model – modelled from CERIF schema – project time limits restricted how much of CERIF we could sensibly prototype

 

 

CERIF model

 

The interface (displaying live data from Brunel): facets on the left-hand side are used to refine the data displayed; sorting and paging are also implemented.

SolrEyes

The SolrEyes interface

Solr Indexer: the middle layer, which pulls data from MySQL.  This is where the work is done: the entities are defined and the data is modelled.
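To give a feel for what a middle layer like this does, here is a minimal Python sketch that flattens rows from relational tables into one search document per person. It is not the SolrEyes indexer itself: the function name, the document fields (`id`, `name`, `org_unit`) and the exact column names are assumptions made for illustration, and the rows are passed in as plain dictionaries rather than read from MySQL.

```python
from collections import defaultdict


def build_person_documents(names, person_orgunits, orgunit_names):
    """Flatten name rows and link-table rows into one document per person."""
    # Look up each organisational unit's display name by its id.
    unit_name_by_id = {row["cfOrgUnitId"]: row["cfName"] for row in orgunit_names}

    # Resolve the person-to-unit link table into unit names per person.
    units_by_person = defaultdict(list)
    for link in person_orgunits:
        unit_name = unit_name_by_id.get(link["cfOrgUnitId"])
        if unit_name is not None:
            units_by_person[link["cfPersId"]].append(unit_name)

    documents = []
    for name in names:
        person_id = name["cfPersId"]
        documents.append({
            "id": person_id,
            "name": f'{name["cfFirstNames"]} {name["cfFamilyNames"]}',
            # The unit name is copied into every matching document, so a
            # facet on org_unit needs no join at query time.
            "org_unit": sorted(units_by_person.get(person_id, [])),
        })
    return documents
```

Copying values into each document like this is deliberate in a search index, even though a normalised relational schema would avoid it.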

Report generated from BRUCE data (modelled on the CERIF schema) using DataVision (an open-source report generator).  Reports can be exported in CSV, XML, Excel, PDF and Word formats.

BRUCE CERIF data model, modelled in Excel as spreadsheets (a sheet per table)

Spreadsheet One – Staff

cfPers

cfPersName

cfPers_Pers

cfPers_Class

cfPers_OrgUnit

cfPers_Fund

cfPers_ResPubl

Spreadsheet Two – Research Students

cfPers

cfPersName

cfPers_Pers

cfPers_Class

cfPers_OrgUnit

Spreadsheet Three – Organisational Levels

cfOrgUnit

cfOrgUnitName

cfOrgUnit_OrgUnit

Spreadsheet Four – Funding

cfFund

cfFundDesc

Spreadsheet Five – Publications

cfResPubl

cfResPublTitle

cfResPubl_Class
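The spreadsheet layout above can also be written down as a small data structure, which makes one property easy to see: several person tables appear in both the Staff and the Research Students spreadsheets. The sketch below assumes that a table listed in two spreadsheets receives rows from both when loaded into the database; the names `WORKBOOKS` and `tables_with_multiple_sources` are made up for this example.

```python
# One entry per spreadsheet, listing its sheets (one sheet per table).
WORKBOOKS = {
    "Staff": [
        "cfPers", "cfPersName", "cfPers_Pers", "cfPers_Class",
        "cfPers_OrgUnit", "cfPers_Fund", "cfPers_ResPubl",
    ],
    "Research Students": [
        "cfPers", "cfPersName", "cfPers_Pers", "cfPers_Class", "cfPers_OrgUnit",
    ],
    "Organisational Levels": ["cfOrgUnit", "cfOrgUnitName", "cfOrgUnit_OrgUnit"],
    "Funding": ["cfFund", "cfFundDesc"],
    "Publications": ["cfResPubl", "cfResPublTitle", "cfResPubl_Class"],
}


def tables_with_multiple_sources(workbooks):
    """Return the tables that appear in more than one spreadsheet."""
    sources = {}
    for workbook, tables in workbooks.items():
        for table in tables:
            sources.setdefault(table, []).append(workbook)
    return {table: found for table, found in sources.items() if len(found) > 1}
```

Calling `tables_with_multiple_sources(WORKBOOKS)` lists the tables where identifiers from the two spreadsheets would need to be kept from colliding.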

How I found working with CERIF

CERIF is predominantly relational, but not purely so, and the semantics took a while to understand.  The link tables and class tables were a bit object-oriented, and for me that didn’t quite make sense.  It does build flexibility into the schema, though, and there is scope to use it in many ways.  Because our time was restricted we didn’t spend too much time analysing how to use it: Richard and I came up with the BRUCE model in a couple of hours and went with it.  Once the entities were defined in Solr, we found that we had to populate some of the link/class tables with unnecessary duplication of data just to get the interface to work.  This goes against my understanding of how relational data works, but the indexer was really fast, especially when we used the ‘test data’.  The Brunel data snapshot is quite small, so it can’t tell us much about speed at scale.

If this prototype were to be developed into production, we would need to analyse the data structures and mappings in more depth.  We have all the data online, and automated extraction from the various systems could be developed so that the data amalgamation would be seamless.  Because of time, reporting was only touched upon: I used an open-source tool, ‘DataVision’, to connect directly to the database and produced a very simple report.

Reporting would be a separate component, using data extracted from the queries in the interface and pushed into a temp table.  This would need more development and thought to achieve.
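As a rough idea of how the temp-table step might look, here is a small Python sketch. It is an assumption about one possible design, not something we built: it uses SQLite in memory so that it runs without a server (the project database was MySQL), the filter is a single organisational unit id, and the table `report_staging`, the helper names and the demo rows are all made up.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cfPersName (cfPersId TEXT, cfFamilyNames TEXT);
        CREATE TABLE cfPers_OrgUnit (cfPersId TEXT, cfOrgUnitId TEXT);
        INSERT INTO cfPersName VALUES ('p1', 'Example-A'), ('p2', 'Example-B');
        INSERT INTO cfPers_OrgUnit VALUES ('p1', 'u1'), ('p2', 'u2');
    """)
    return conn


def stage_report_rows(conn, org_unit_id):
    """Copy the rows matching an interface-style filter into a temp table."""
    conn.execute("DROP TABLE IF EXISTS report_staging")
    conn.execute(
        "CREATE TEMP TABLE report_staging (cfPersId TEXT, cfFamilyNames TEXT)"
    )
    conn.execute(
        """
        INSERT INTO report_staging (cfPersId, cfFamilyNames)
        SELECT n.cfPersId, n.cfFamilyNames
        FROM cfPersName AS n
        JOIN cfPers_OrgUnit AS po ON po.cfPersId = n.cfPersId
        WHERE po.cfOrgUnitId = ?
        """,
        (org_unit_id,),
    )
    # Return how many rows were staged for the report.
    return conn.execute("SELECT COUNT(*) FROM report_staging").fetchone()[0]
```

A report generator could then read from `report_staging` alone, without needing to know how the interface query was built.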
