• Title of Primary Project Output: EP2DC
  • Screenshots or diagram of prototype: Here are some screenshots that show the EP2DC data deposit and retrieval screens
Picture
Users of EP2DC access the data service via an EPrints repository. We have extended the standard EPrints document workflow to include a data deposit step. This approach also means that our plugin can simply be dropped into any customised EPrints deployment. The aim is to keep a workflow that is familiar to users and add the extra functionality seamlessly.

Picture
The screenshot above shows the EPrints data deposit page that we have added. Here users can upload an XML data document, which is then sent via Web Services to a data centre. In this prototype we are using the Materials Data Centre (www.materialsdatacentre.com) as our data repository. The federated architecture and REST services mean that we can support many-to-many relationships between EPrints and data repositories.

Picture
The EP2DC stage for uploading experimental data includes an option (collapsed in the previous figure) that allows metadata associated with the test data to be entered. The fields marked with a red star are mandatory. As shown in the adjacent figure, one of these mandatory fields defines the access control. This field affects the data retrieval process, as follows:
  • Open—allows data retrieved through the EP2DC EPrints repository to be downloaded by anyone.
  • Restricted to registered users—allows data retrieved from the EP2DC EPrints repository to be downloaded by registered users.
  • On demand—data is supplied on request direct from the owner.
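The three access levels above can be sketched as a simple download check. This is a minimal illustration only: the `Access` enum and `can_download` function are hypothetical names, not part of the EP2DC plugin, and we assume that "on demand" data is never served directly by the repository.

```python
from enum import Enum


class Access(Enum):
    """Access control levels chosen by the depositor (names are illustrative)."""
    OPEN = "open"
    REGISTERED = "restricted to registered users"
    ON_DEMAND = "on demand"


def can_download(access, is_registered):
    """Return True if the repository may serve the data file directly."""
    if access is Access.OPEN:
        return True
    if access is Access.REGISTERED:
        return is_registered
    # On demand: assumed never served directly; the owner supplies it on request.
    return False


print(can_download(Access.REGISTERED, is_registered=False))
```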

Picture
The last stage in the EPrints default workflow is to deposit the unit of work (meaning all of the documentation, figures, etc. together with the accompanying data). The data will be deposited in the remote data centre, which is responsible for validating the data against the corresponding XML Schema Definition. If the data sets are validated and deposited successfully, a page similar to that shown in the above figure is displayed. Note that this page shows all debug information, which will normally be hidden from the user but is included here to illustrate the service calls.

Picture
The screenshot above shows the Materials Data Centre (MDC) back-end that we are using. It is a Microsoft SharePoint site that includes document libraries to hold the XML schemas that we validate against (mdcschemas folder), and the data files uploaded through the EPrints interface (mdcdata folder). Users will be able to upload data into MDC directly through a separate web interface. EPrints talks to MDC directly via a REST interface, so the data centre can be made transparent to end users who just want to use an EPrints front-end. This federated approach means that a many-to-many architecture is supported, i.e. multiple EPrints repositories can talk to multiple data centres.
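To make the many-to-many relationship concrete, here is a small sketch of a registry that records which repositories are linked to which data centres. The `Federation` class and the host names are hypothetical and purely illustrative; the prototype itself does not necessarily keep such a registry.

```python
from collections import defaultdict


class Federation:
    """Tracks links between EPrints repositories and data centres."""

    def __init__(self):
        # Two indexes so lookups are cheap in both directions.
        self._centres_by_repo = defaultdict(set)
        self._repos_by_centre = defaultdict(set)

    def link(self, repo, centre):
        """Record that a repository deposits data in a data centre."""
        self._centres_by_repo[repo].add(centre)
        self._repos_by_centre[centre].add(repo)

    def centres_for(self, repo):
        """Return the data centres a repository is linked to, sorted."""
        return sorted(self._centres_by_repo.get(repo, set()))

    def repos_for(self, centre):
        """Return the repositories linked to a data centre, sorted."""
        return sorted(self._repos_by_centre.get(centre, set()))


fed = Federation()
fed.link("eprints-a.example.org", "centre-1.example.org")
fed.link("eprints-a.example.org", "centre-2.example.org")
fed.link("eprints-b.example.org", "centre-1.example.org")
print(fed.centres_for("eprints-a.example.org"))
print(fed.repos_for("centre-1.example.org"))
```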

Picture
The screenshot above shows the data retrieval screen from EPrints. When a user selects an article they can immediately see which datasets have been uploaded that relate to that document. This link to the underlying data gives researchers much readier access to it, which promotes data re-use and should also encourage more citations, subject to the access control defined by the author at upload time.


  • Description of Prototype: The objective is to develop a prototype module to enable the EPrints repository to support the submission of XML-formatted experimental data together with the manuscript to which they correspond. The basic motivations for the work are to promote the conservation of experimental data and to link data to publications. As Semantic technologies become established, the ever-increasing body of experimental data will provide new opportunities for knowledge discovery within and across disciplines.
  • End User of Prototype: EPrints is used worldwide as a repository for scholarly publications. This prototype allows users, probably for the first time, to deposit datasets in a remote data repository that relate to the paper content and are linked to it. This is a new publishing model for data, as it provides seamless upload and retrieval of data using a federated architecture. This helps to elevate data to a first-class citizen in the open access world, accelerating research and also increasing citation rates - an important incentive for academics given current/future research audit processes. By supporting XML schemas, we hope to improve data quality and consistency and further develop semantic linking capabilities. The federated service-oriented architecture (SOA) means that the system can support many-to-many repository/data centre relationships in an internet-scalable way.
  • A typical upload user scenario is:
  1. Scientist writes paper and uploads it to EPrints repository.
  2. Scientist includes dataset (from 1) in EPrints upload, using EP2DC service developed here
  3. Scientist finalises deposit and is shown similar datasets and papers related to her uploaded paper/data. This disproportionate feedback is an incentive for the scientist to continue to use the service.
  • A typical user search scenario is:
  1. Scientist searches EPrints repository for paper on topic of interest
  2. User finds paper
  3. User sees dataset is available and downloads it
  4. User uses downloaded dataset in their own research, citing paper found in EPrints

  • Link to working prototype: http://ep2dc-s1.soton.ac.uk - to try the demo you can browse papers and datasets by year or type. To try the upload functionality you need to create an account (simply click 'Create Account' at the top of the screen) and follow the instructions.
  • Date prototype was launched: 11 December 2009
  • Project Team Names, Emails and Organisations:
          o Product Owner: Philippa Reed <p.a.reed@soton.ac.uk>
          o Project Concept and Design: Tim Austin <T.Austin@soton.ac.uk>
          o Project Advisors: Kenji Takeda <ktakeda@soton.ac.uk>, Leslie Carr <lac@ecs.soton.ac.uk>
          o Developers: Mark Scott <C.M.Scott@soton.ac.uk>, Tim Austin <T.Austin@soton.ac.uk>, Dr Steven Johnston <sjj698@zepler.org>
          o All team members are from the University of Southampton
  • Table of Content for Project Posts: listed on the right here as categories