SolrEyes

August 19, 2011

SolrEyes is a basic but effective wrapper around Apache Solr which has been developed by the BRUCE project as a replacement for Blacklight.

As per our previous post, we had significant problems stabilising Blacklight, so we carried out a brief exploratory exercise to replicate the functionality that the project needed (not all of Blacklight’s existing functionality was necessary for our needs). Having been successful, we went on to introduce support for ranged facets, allowing us to limit by date or any other rangeable field.

Technology

SolrEyes uses the SolrPy Python library to communicate with Apache Solr and the Mako templating language to provide the User Interface. It presents the results of search requests to the user with facet counts and current search parameters alongside the search results themselves. Facets can be added and removed, results can be sorted and sub-sorted, and flexible paging over result sets is fully supported.
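As a rough illustration of what sits behind those features, the sketch below builds the standard Solr request parameters (q, fq, facet.field, sort, start, rows) for a faceted, sorted and paged search. It is not taken from the SolrEyes code: the function names, the name sort field and the example values are assumptions made for this example, and SolrEyes itself sends its requests through SolrPy rather than building them by hand.

```python
from urllib.parse import urlencode


def term_filter(field, value):
    """Filter query restricting a field to one exact value."""
    return '%s:"%s"' % (field, value.replace('"', '\\"'))


def range_filter(field, lower, upper):
    """Filter query restricting a field to an inclusive range."""
    return "%s:[%s TO %s]" % (field, lower, upper)


def build_solr_params(query, facet_fields, filters=None, sort=None,
                      page=1, page_size=10):
    """Build Solr select parameters as a list of (name, value) pairs."""
    params = [
        ("q", query),
        ("facet", "true"),
        # Paging: start is a zero-based offset into the result set.
        ("start", str((page - 1) * page_size)),
        ("rows", str(page_size)),
    ]
    # One facet.field parameter per field we want counts for.
    for field in facet_fields:
        params.append(("facet.field", field))
    # Each selected facet is its own fq, so removing a facet
    # only means dropping that one entry.
    for fq in (filters or []):
        params.append(("fq", fq))
    # Sort and sub-sort: comma-separated "field direction" pairs.
    if sort:
        params.append(("sort", ",".join("%s %s" % (f, d) for f, d in sort)))
    return params


params = build_solr_params(
    "*:*",
    facet_fields=["gender", "publication_date"],
    filters=[
        term_filter("gender", "M"),
        range_filter("publication_date",
                     "2008-01-01T00:00:00Z", "2008-12-31T23:59:59Z"),
    ],
    sort=[("name", "asc"), ("publication_date", "desc")],  # "name" is hypothetical
    page=3,
    page_size=20,
)
print(urlencode(params))
```

Keeping each constraint as a separate fq entry is one simple way to let the interface add and remove facets independently of the main query.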


Note that the data presented in these screen shots is artificial, and should not be considered indicative of anything.

SolrEyes is strongly inspired by Blacklight, and provides all of the basic search and facet functionality from that system. In addition, the configuration is held in a JSON document, which makes it easier to separate from the application and to modify and extend.

Reporting features

A key difference between SolrEyes and Blacklight is that SolrEyes comes pre-prepared for some reporting features. Every search constraint and every search result field is passed through a processing pipeline which converts it from the value in the Solr index into the display value, and that pipeline is created in configuration by the user.

In the simplest case, this allows us to switch the indexed value M in the gender field for the word Male before it is displayed. This is done by specifying that the facet values for the gender field should be passed to a function called value_map which maps M to Male and F to Female:

"facet_value_functions" : {
    "gender" : {"value_map" : {"M" : "Male", "F" : "Female"}}
}


This shows a configuration option which exchanges facet values for display values in the gender field.

This approach could also be used to replace date ranges with descriptions, such as “RAE 2008”, or other useful terms.
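To make the idea of a configured pipeline concrete, here is a minimal sketch of how a value_map step might be applied to facet values. Only the configuration keys come from the example above. The function registry, the display_value helper and the choice to pass unmapped values through unchanged are assumptions made for illustration, not a description of the SolrEyes internals.

```python
import json

# The configuration fragment from above, as strict JSON.
CONFIG = json.loads("""
{
    "facet_value_functions": {
        "gender": {"value_map": {"M": "Male", "F": "Female"}}
    }
}
""")


def value_map(value, mapping):
    """Return the display value for an indexed value.

    Assumption: values with no entry in the mapping are shown unchanged.
    """
    return mapping.get(value, value)


# Hypothetical registry of the functions that configuration may name.
FUNCTIONS = {"value_map": value_map}


def display_value(field, value, config=CONFIG):
    """Pass an indexed facet value through the functions configured for its field."""
    pipeline = config.get("facet_value_functions", {}).get(field, {})
    for function_name, argument in pipeline.items():
        value = FUNCTIONS[function_name](value, argument)
    return value


for indexed in ["M", "F", "U"]:
    print(indexed, "->", display_value("gender", indexed))
```

Because the functions are looked up by name, supporting a new transformation in a sketch like this only means registering one more function and naming it in the JSON.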

At the more powerful end of the spectrum, though, this feature can be used to process result values themselves to present information as functions of those values. An example of the way this is used in BRUCE is as follows:

We wish to present counts of the number of publications that researchers have published in the reporting period. The reporting period can be set by choosing the appropriate date range from the navigation (this constrains the publication_date field to contain values from that range). This means that we cannot index this data in advance, as it depends on the exact date range that the user selects, which could be anything. Instead, we pipe the selected date range, and a result field containing the dates on which the author published, to a function which compares those publication dates with the constraint range and returns a count of the publications which fall within it. To achieve this, the documents in our index contain a list of the dates upon which the author published.

"dynamic_fields" : {
    "period_publications_count" : {
        "date_range_count" : {
            "bounding_field" : "publication_date",
            "results_field" : "publication_date"
        }
    }
}


This shows a configuration of a “dynamic field” which presents the count of values in the index field publication_date which fall within the constraining facet publication_date.
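The counting step itself is small. The sketch below shows one way such a function could work, under stated assumptions: the range bounds are treated as inclusive, a record with no active constraint has all of its dates counted, and the record and dates are invented for the example. The real date_range_count in SolrEyes may differ on all of these points.

```python
from datetime import date


def date_range_count(record, constraints, bounding_field, results_field):
    """Count the values in results_field that fall within the range
    currently constraining bounding_field.

    Assumptions: bounds are inclusive, and when no range has been
    selected every date is counted.
    """
    dates = record.get(results_field, [])
    if bounding_field not in constraints:
        return len(dates)
    lower, upper = constraints[bounding_field]
    return sum(1 for d in dates if lower <= d <= upper)


# Invented record: one author with a list of publication dates.
record = {
    "publication_date": [
        date(1879, 5, 1),
        date(1880, 3, 14),
        date(1880, 11, 2),
        date(1881, 1, 20),
    ],
}

# The user has constrained publication_date to a single year.
constraints = {"publication_date": (date(1880, 1, 1), date(1880, 12, 31))}

count = date_range_count(record, constraints,
                         bounding_field="publication_date",
                         results_field="publication_date")
print("Publications in period:", count)
```

Because the count is computed at display time from the stored list of dates, nothing needs to be re-indexed when the user picks a different range.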


This screenshot shows a single record which has been constrained to all publications from a single year (see the green box which displays the constraint). The final 6 result columns contain values which are dynamically generated by comparing the publication dates of the different publication types with that constraint. So, here, S W Burnham is seen to have published 2 items in 1880: 1 Book and 1 Conference Paper.

External Take Up

SolrEyes has proved sufficiently simple to operate and configure while providing useful functionality that it has also had some take-up outside of the project.

The functionality was designed deliberately to be flexible to other use cases (although the reporting use cases were the ones focussed upon by the project team), and as such it has also found use as a front-end for a bibliographic data index.

The Open Bibliography project (which provided the MedLine data that the BRUCE project built the CERIF test data from), the OKF and Cottage Labs are also involved in the development of the BibJSON standard and related BibServer software which powers the under-development BibSoup service. This service is using SolrEyes to operate the search and faceted browse features, and so the software is already getting feedback and enhancements from external developers.

We hope that SolrEyes fills a niche for a simple but powerful interface to Apache Solr. Its advantages over Blacklight and VuFind are in the simplicity of the environment and a generic approach to presenting the contents of a search index (both Blacklight and VuFind are more geared towards providing catalogue interfaces).

Using SolrEyes

The SolrEyes software can be downloaded here.

SolrEyes requires the most recent development version of Solrpy (we found a bug in 0.9.4 and submitted a patch which was accepted, but which has not yet been packaged in a formal release). You can install the latest version with (all on one line):

sudo pip install -e hg+https://solrpy.googlecode.com/hg/#egg=solrpy

You will also need to install web.py and mako which you can do with easy_install:

sudo easy_install web.py
sudo easy_install mako

Next go into the directory where you downloaded SolrEyes and modify the config.json file with your custom configuration (documentation is inline).

Finally, you can start SolrEyes by executing:

python solreyesui.py 8080

This will start SolrEyes on port 8080 on localhost.

We are very interested in taking the development of SolrEyes forward so please contact us if you have any questions, feedback or suggestions.


Switching off the Blacklight

August 10, 2011

At the outset of the project we had planned to use Blacklight as the user interface to Apache Solr through which we would present our reporting interface. This post describes the reasons that we subsequently abandoned this approach and developed an alternative which met our requirements.

The principal issue that we had with using Blacklight was the instability of the install process. Although Blacklight is a Ruby on Rails application, and should therefore be highly portable, the technical team had significant problems getting it to work across all the relevant platforms. Much of the development work for the project took place on Linux (Ubuntu) and Mac OS X, but the primary deployment environment was to be Windows; as such, portability was very important.

Installation on Ubuntu was difficult, although not impossible, and we blogged a How-To guide which patched some holes in existing online guides. Results on Windows were variable, with dependency version resolution being the primary difficulty (although this was not the only issue, and was not limited to the Windows install). Installation on Mac OS X proved too error-prone to complete at all.

While we anticipate that these installation problems would ultimately be resolvable, they reduced our confidence in Ruby on Rails as a workable environment and also held up progress on the interesting parts of the project!

Another limitation for Blacklight was that ranged faceting was not supported in the default install. Instead there was an experimental add-on available which would have offered this feature. Ranged faceting is a key component for the project as the reporting needs to be limited by date (for example, per academic year or RAE/REF period). Ultimately we decided that – given the difficulties getting started with Blacklight – adopting an experimental add-on would raise the risk of project failure to an unacceptable level (given only 6 months for the whole project).

For these reasons we embarked on a short experiment to explore the difficulty of providing a basic reporting UI from scratch which would meet the project requirements. We found that it took a very small amount of time to develop the basic facet viewing features, and so we continued to introduce ranged searching and a more appropriate report generating interface. Having found that we could provide a more stable application (written in Python) which would provide us with the desired functionality, the project therefore decided to abandon Blacklight and dedicate some development time to our own interface.

It is worth noting that the important features of the reporting approach actually lie in Apache Solr – this does all the hard work in indexing, searching and faceting the content. The User Interface exists purely as a presentation layer, so we do not lose anything by switching from Blacklight to a custom development.

A future post will provide more details about the custom development.


Technical Team Update

June 16, 2011

Objective

At this stage in the project our main objective is to implement a “vertical” slice of the research reporting process, by taking some source data, mapping it into CERIF, storing it in a CERIF compliant database and then indexing that data with Apache Solr for display and interaction via Blacklight, which will ultimately be used to generate reports on the research information. There are a number of challenges involved in this process:

  • How to map the data sources such as HESA, SITS, HR and Publications data into CERIF. In some cases there will be clear mappings, in others some creativity may be required, and in yet others it may not be possible.
  • How to turn the complex relational schema that is CERIF into a flat, indexable, set of key/value pairs which can be used by Solr and make sense to the user of the reporting software
  • How to configure Solr
  • How to configure Blacklight

Status Update

At the moment we have the following technical outputs from the project:

  • A test CERIF dataset created using the Open Biblio project’s Medline dataset as the seed data
  • A MySQL CERIF schema which was acquired from euroCRIS
  • A theoretical mapping from the datasources to CERIF (not yet implemented)
  • A set of Solr configuration files and data importers which relate the MySQL CERIF database to a set of flat key/value pairs which meet the requirements of the project’s exemplar report. No general configuration has been produced for CERIF yet, as we are focussed on this specific vertical.
  • Some installation and configuration experience with Blacklight. We have done a number of demonstrations of Blacklight to investigate what the final interface will look like, but as yet no realistic data has been presented through it.
  • A high-spec dedicated project server, with the capacity for storing and processing the large quantities of data that will be generated throughout, has been installed and is ready to start working with the data.

Experiences with CERIF

Overall, mapping data to and from CERIF has not been too troublesome. It is a relational standard, which means that flattening it for Solr has been a bit tricky (more on that later). In addition, it does not always have clear ways of representing the data we want to represent, and it appears that the Semantic Layer is where most of the complexity will ultimately reside.

Experiences with Solr

Solr has been reliable (if complex to configure) throughout the process, and the project team is now comfortable and confident that it meets most if not all of the requirements that will be placed on it.

Experiences with Blacklight

Blacklight has so far been the weak link in the project. It is extremely difficult to install and configure, and no two installations go the same way, so a large amount of time has been sunk into trying to make it work at all. It is partly for this reason that the project is not yet displaying the data from Solr in Blacklight.

Flattening CERIF for Solr

As CERIF is a relational format, flattening it for indexing by Solr has been a careful task for the project. We cannot represent all of the data in the CERIF database exactly as it appears in MySQL, since Solr does not strictly have the relational qualities of a database.

Instead we have begun to construct Solr documents (effectively these are Object Classes) which are designed to meet the reporting requirements. That is, for our exemplar report (see linked presentation), which is focussed on individuals, we create Solr documents which have the person as the key entity, and we add to each document extensive information about the organisational units that the person is part of, their publications, and so on.
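The flattening idea can be sketched in a few lines. In the example below, rows from three related tables are folded into one person-centred document, with multi-valued fields standing in for the relational joins. All table layouts, field names and data here are hypothetical and chosen for readability. They are not the CERIF schema or the project's actual Solr schema.

```python
def build_person_document(person, memberships, publications):
    """Flatten one person and their related rows into a single
    key/value document suitable for indexing."""
    doc = {
        "id": "person-%s" % person["id"],
        "name": person["name"],
        # Multi-valued fields replace the joins to other tables.
        "org_unit": [
            m["org_unit_name"]
            for m in memberships
            if m["person_id"] == person["id"]
        ],
        "publication_title": [],
        "publication_date": [],
    }
    for pub in publications:
        if pub["person_id"] == person["id"]:
            doc["publication_title"].append(pub["title"])
            doc["publication_date"].append(pub["date"])
    return doc


# Invented rows, as they might come back from relational queries.
person = {"id": 7, "name": "A. Researcher"}
memberships = [
    {"person_id": 7, "org_unit_name": "Department of Physics"},
    {"person_id": 7, "org_unit_name": "Faculty of Science"},
    {"person_id": 8, "org_unit_name": "Department of History"},
]
publications = [
    {"person_id": 7, "title": "First paper", "date": "2009-05-01T00:00:00Z"},
    {"person_id": 8, "title": "Unrelated book", "date": "2010-02-11T00:00:00Z"},
    {"person_id": 7, "title": "Second paper", "date": "2010-09-30T00:00:00Z"},
]

doc = build_person_document(person, memberships, publications)
print(doc)
```

The cost of this approach is duplication: the same organisational unit or publication details are repeated in every person document that refers to them, which is the usual trade-off when denormalising for search.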

Later we will construct documents which are designed to meet other reporting requirements, and may therefore be organisation or publication oriented. With a well designed Solr schema, all these different documents will co-exist comfortably side-by-side in the index, and we’ll be able to generate a variety of different kinds of report based on that data.

Next Steps

  • Finalise the datasource mappings to CERIF
  • Harden the CERIF to Solr indexing process based on the final datasource mappings
  • Get Blacklight to behave
  • Generate reports from search results. The project is looking at Prawn, a Ruby library which can generate PDFs of the results.

Blacklight, Solr, Ruby and Rails on Ubuntu

April 27, 2011

The BRUCE project is using Apache Solr and Project Blacklight as its core technologies, and at least some of the development is taking place on Ubuntu. The Blacklight Ubuntu install has a couple of gotchas which aren’t covered by the standard Blacklight documentation and aren’t well served by the Rails on Ubuntu blog posts that we read. So here is a very short write-up of an approach to Blacklight, Solr, Ruby and Rails on Ubuntu:

First, Some Useful Links

The Blacklight README

The Blacklight Pre-Requisites

How-To for Ruby 1.8 on Ubuntu

Install Ruby

The following commands install everything you need to run ruby (it’s more than is required just for Blacklight):

sudo aptitude install build-essential zlib1g zlib1g-dev libxml2 libxml2-dev libxslt-dev

sudo aptitude install sqlite3 libsqlite3-dev locate git-core libmagick9-dev

sudo aptitude install curl wget

sudo aptitude install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8

sudo aptitude install libruby1.8 libopenssl-ruby rubygems1.8

Ruby does not put everything on the system PATH, so set the following symlinks:

sudo ln -s /usr/bin/rdoc1.8 /usr/bin/rdoc

sudo ln -s /usr/bin/irb1.8 /usr/bin/irb

sudo ln -s /usr/bin/ri1.8 /usr/bin/ri

sudo ln -s /usr/bin/gem1.8 /usr/local/bin/gem

Install Java

Solr depends on Java, so if you don’t have a JDK installed already, install one:

sudo aptitude install openjdk-6-jdk

Add some new Ruby Gems sources

We need to add a couple of sites where gem will get the ruby gems from. These are the most useful ones, which will cover us for this install.

gem sources -a http://gems.github.com

gem sources -a http://gems.rubyonrails.org

gem sources -a http://gemcutter.org

Install the required Gems (including Rails)

Now we use gem to install the rest of the ruby dependencies for Blacklight. NOTE that we DO NOT install Rails using aptitude. This would be a disaster and nightmare rolled into one.

sudo gem install rubygems-update

sudo gem install rake nokogiri hpricot builder cheat daemons

sudo gem install json uuid rmagick sqlite3 fastthread rack

Finally install the specific version of rails recommended by the Blacklight documentation. NOTE that this command can take a while to initialise, but it will get there in the end, just be patient.

sudo gem install -v=2.3.11 rails

Put relevant Gems on the PATH

On Ubuntu, the gem installer does not place your executables on the PATH. This is the key step that all Rails on Ubuntu install documents miss out. The executables are all located in:

/var/lib/gems/1.8/bin/

You can execute them directly from there by using the full path to the binary, but this is no good for automated tasks such as the Blacklight installer, which expect the executables to be available. Also, you can’t just add this directory to your user’s PATH variable, as this will not propagate into the root user’s PATH when sudo is used. Instead, you must symlink the binaries you want to use into an existing PATH location:

sudo ln -s /var/lib/gems/1.8/bin/rails /usr/bin/rails

sudo ln -s /var/lib/gems/1.8/bin/rake /usr/bin/rake
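Before moving on, it can be worth confirming that the symlinked commands really do resolve. The following is a small, optional sketch in Python that reports any of the named executables which cannot be found on the PATH. The helper name is our own, and the script only checks the PATH of whichever user runs it, so to see what sudo sees it would need to be run under sudo.

```python
import shutil


def missing_executables(names, path=None):
    """Return the names that cannot be found on PATH.

    If path is given, it is searched instead of the PATH
    environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


missing = missing_executables(["rails", "rake"])
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("rails and rake are both on PATH")
```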

Install Blacklight

Now we can follow the install parts of the Blacklight documentation. This basically comes down to

rails ./blacklight-app -m http://projectblacklight.org/templates/latest.rb

Although it depends on which version of Blacklight you are trying to install. See their README for more detail.

Use the following answers to the questions posed by the installer (or don’t – this is just what we did):

  1. yes
  2. no
  3. yes
  4. yes
  5. no
  6. yes

Some install options may fail. Since Blacklight installs Solr for you but doesn’t start it, it will fail to load the default MARC data. If your PATH isn’t set up properly (as described above), it will fail to run the rake commands; this can be fixed by sorting out the PATH and running:

sudo rake gems:install

rake db:migrate RAILS_ENV=development

Initialising Solr

Start the solr web service in the jetty directory

./blacklight-app/jetty$ java -jar start.jar

To confirm that it is up, visit the URL:

http://localhost:8983/solr/

If you want to install the demo marc data, in the blacklight directory run:

rake solr:marc:index_test_data

Starting Blacklight

Start the ruby server:

./blacklight-app$ ./script/server

To confirm that it is up, visit the URL:

http://localhost:3000/

And that’s it. Blacklight and Solr should now be up and running. Have fun with it!