EP2DC Final Progress Post 12/22/2009
![]()

Users of EP2DC access the data service via an EPrints repository. We have extended the standard EPrints document workflow to include a data deposit step. Building on the standard workflow also means that our plugin can just be dropped into any customised EPrints deployment. The aim is to use a workflow that is familiar to users, and simply to add extra functionality in a seamless way.

![]()

The screenshot above shows the EPrints data deposit page that we have added. Here users can upload an XML data document. This is then sent, via Web Services, to a data centre. In this prototype we are using the Materials Data Centre (www.materialsdatacentre.com) as our data repository. The federated architecture and REST services mean that we can support many-to-many relationships between EPrints and data repositories.

![]()

The EP2DC stage for uploading experimental data includes an option (collapsed in the previous figure) that allows metadata associated with the test data to be entered. The fields marked with a red star are mandatory. As shown in the adjacent figure, one of these mandatory fields defines the access control. This field affects the data retrieval process, as follows:
![]()

The last stage in the EPrints default workflow is to deposit the unit of work (meaning all of the documentation, figures, etc. together with the accompanying data). The data will be deposited in the remote data centre, which is responsible for validating the data against the corresponding XML Schema Definition. If the data sets are validated and deposited successfully, a page similar to that shown in the above figure is displayed. Note that this shows all the debug information, which will normally be hidden from the user but is included here to illustrate the service calls.

![]()

The screenshot above shows the Materials Data Centre (MDC) back-end that we are using. It is a Microsoft SharePoint site that includes document libraries to hold the XML schemas that we validate against (mdcschemas folder), and the data files uploaded through the EPrints interface (mdcdata folder). Users will be able to upload data into MDC directly through a separate web interface. EPrints talks to MDC directly via a REST interface, so the data centre can be made transparent to end users who just want to use an EPrints front-end. This federated approach means that a many-to-many architecture is supported, i.e. multiple EPrints repositories can talk to multiple data centres.

![]()

The screenshot above shows the data retrieval screen from EPrints. When a user selects an article they can immediately see which datasets have been uploaded that relate to that document. This link to the underlying data lets researchers access it much more readily, subject to the access control defined by the author at upload time. That promotes data re-use and should also encourage more citations.
o Project Concept and Design: Tim Austin <T.Austin@soton.ac.uk>
o Project Advisors: Kenji Takeda <ktakeda@soton.ac.uk>, Leslie Carr <lac@ecs.soton.ac.uk>
o Developers: Mark Scott <C.M.Scott@soton.ac.uk>, Tim Austin <T.Austin@soton.ac.uk>, Dr Steven Johnston <sjj698@zepler.org>
o All team members are from the University of Southampton
Software release on Codeplex 12/11/2009
We are delighted to announce that EP2DC is now available for download from Codeplex @ http://ep2dc.codeplex.com/

SharePoint upgrade 11/25/2009
We're using Microsoft SharePoint for our MDC back-end. This already has great out-of-the-box functionality. The team went to the latest SharePoint conference in October and were delighted to see that SharePoint 2010 has a built-in taxonomy engine. Ontologies here we come.

Great Tech 11/25/2009
As with any new project, we love to use the latest and greatest tech to get the job done.
o EPrints. The core of our document repository work. Written in Perl, it's a great platform for development.
o REST and SOAP/XML. Interoperable distributed systems architecture, SOA all the way.
o C# and Windows Communication Foundation (WCF). The MDC back-end is written using .NET. WCF means we are able to simultaneously publish REST and SOAP/XML interfaces from the same code base.
o SWORD. We didn't use this, but are planning to include it for the MDC middleware in the very near future.

Inside our toolbag 11/25/2009
We've been using a wide variety of tools for teamworking and software development. These are:
o www.huddle.net - a great Web 2.0 collaborative teamworking site. Very useful given that the team is spread across 4 buildings at Southampton, with Tim Austin in Amsterdam. Huddle was started by a Southampton graduate too!!!
o EPrints. Great repository software - need we say more!
o Microsoft SharePoint. Here we've used it for the Materials Data Centre back-end. Provides lots of out-of-the-box functionality, and the new version is even better!
o Visual Studio 2008 and 2010 Beta. The MDC software has been built on .NET, coded in C#, so Visual Studio is the natural IDE of choice. We love it for its productivity benefits.
o SVN. The University of Southampton centrally hosts our SVN server, meaning we don't have to worry about it. We're using the AnkhSVN plug-in for Visual Studio, so our version control is truly seamless.
o Skype. Great for video-conferencing, particularly between the UK and Holland. Means we don't have to cancel/postpone meetings when people are out of the office. Works a treat :)
o XML validator - www.validome.org - used to check out our XML schema validation. MDC uses the built-in XML validator in SharePoint, but we used this tool to double-check the validity of our MatDB schema.

SWOT analysis 11/25/2009
Now that we're getting to the end of the EP2DC project, it's a good time to reflect. So here's a brief SWOT analysis....

STRENGTHS
o Users, users, users... That's what we're all about! We've been completely focussed on what is useful for our research users, materials scientists in this case. We've consulted with them online, in interviews and in meetings. This has been particularly easy as the principal investigator, Philippa Reed, is a materials professor and not an IT person. So she's the one we have to convince with our software!
o Teamwork. We've had an awesome team on the project. Tim Austin came up with the idea and has been making sure that we've stayed focussed, as well as doing plenty of EPrints development. Steven Johnston and Mark Scott have been putting together the back-end data centre and web services, using their years of experience. Seb and Tim M-B in the EPrints Services group have delivered in spades; we expect nothing less from these gurus. Philippa Reed and Kenji Takeda have been managing the project and keeping things moving forwards at a real pace. Great people working together, that's what this project has been all about.
o Cashing in on JISC investments. We've been lucky enough to be able to build on other JISC projects, notably EPrints and the Materials Data Centre (MDC). We've also benefited from the extensive experience of the Microsoft Institute for HPC.
o Interoperability. We love mixing it up! We're using EPrints on Linux and MySQL, talking to SharePoint 2007 on top of SQL Server. It all talks seamlessly over REST and Web Services. SOA really does work.

WEAKNESSES
o Short timescales. With only a few months to complete this project it has been challenging. Once everyone was up to speed and working together, several members of the team were off at JISC and other conferences. All good stuff, and certainly worthwhile, particularly the Microsoft SharePoint and Tech Ed conferences. While this saved development time, it proved tricky to balance everyone's workloads.
o Distributed teamwork. The project was carefully split to minimise dependencies, so that work could be carried out by each team member independently. That said, Tim Austin was in Amsterdam, and the rest of the team was scattered across the Southampton campus. Projects are always easier when everyone is in the same room!
o Dependencies. With so many interacting components we had to be careful about dependencies. The SOA approach helped with this, but we did rely on the Materials Data Centre being up. This project started a bit late, so the MDC wasn't ready when we wanted it to be. However, with lots of hours (and late nights by Mark!) we managed to catch up.

OPPORTUNITIES
o Data deluge. This project is really just the tip of the iceberg for managing the data deluge starting to swamp scientists and engineers. By providing linked data, particularly data linked directly to publications, we can help users manage this brave new world.
o New models for research. This project hopes to help develop this new mode of research - linked data and publications. It is part of the Fourth Paradigm vision of the late, great Jim Gray. We really are entering a new era in scientific discovery.
o Extensibility. The generic nature of the implementation means that it can be extended to other disciplines, and not just science and engineering.
o Protecting research investments. By creating a usable framework for researchers to store and link their data we can help preserve the knowledge gained during years of research. This increases the longevity and usefulness of research, giving better value for money in the long term, and reducing the danger of duplication by making existing research openly available.

THREATS
o Driving uptake. As with any approach, getting users to take it on board is a challenge. Making it usable is a key enabler, and this is why we are so focussed on what users want. Still, it is a challenge to drive uptake, but one which we are relishing.
o Overhead for users. Adding the feature to include data with EPrints uploads is great, but adds additional steps to the process. This may be off-putting for users, as they are busy and just want to get on with their research. We've therefore focussed on making it as painless as possible, and on minimising the overhead to users.
o Federated security. To make the system usable we need to have seamless single sign-on. More difficult is the federated security we need to introduce so that the back-end services can talk to each other securely. This is a classic internet federated security problem. While we have not implemented a full, elegant solution, we now know how to do this :)
o Policy barriers. Technology is one thing, but policies are another. We believe we have implemented a very usable system. However, organisational policies must be defined before wider uptake, particularly regarding data access and ownership. As time goes on and the idea of sharing data becomes more accepted, we are optimistic that policies will modernise to allow this.

MatDB schema 11/10/2009
A review of materials schemas has identified 3 potential candidates for the MDC back-end: MatML @ www.oasis-open.org/committees/materials, EC MatDB @ odin.jrc.ec.europa.eu, and NMC MatDB @ www.nims.go.jp/vamas_twa10. These schemas are at various stages of development, each with its own benefits and limitations. The most promising appears to be the EC MatDB schema, about which you can read more @ www.jstage.jst.go.jp/article/dsj/7/0/7_179/_article.

Shibboleth for EPrints - BIG win! 11/10/2009
In order to allow users from different organisations to access our EP2DC system, we have implemented Shibboleth for EPrints. This is a new feature, so thousands of EPrints users worldwide will benefit from this development :):):)

Disproportionate feedback 11/05/2009
After the great meeting with David Flanders, we're now implementing our disproportionate feedback to the user. When the user makes a deposit in the repository, instead of a simple 'OK' message, we return a cascade of extra information that they might be interested in. Similar to Amazon's "Customers who bought this item also bought" feature, we return other datasets that are related to the one deposited. Hopefully this will delight the user and make them come back for more ;)

Project review and burndown 11/02/2009
Our project review meeting today was great in pinning down the final tasks to complete the project. We're finishing off the MDC back-end and refining the front-end with end-user reviews. On the final stretch with the finish line in sight!!!
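As a footnote to the disproportionate feedback post above, here is a minimal Python sketch of what a "related datasets" reply to a deposit could look like. It is a guess at the shape of the idea rather than the EP2DC implementation. In particular, ranking by shared keywords is an assumption (the post does not say how relatedness is worked out), and the function names, field names and dataset ids are made up for illustration.

```python
def related_datasets(deposited, catalogue, limit=3):
    """Return ids of the datasets sharing the most keywords with the new deposit.

    `catalogue` maps dataset id -> list of keywords (a hypothetical structure).
    """
    scored = []
    for dataset_id, keywords in catalogue.items():
        if dataset_id == deposited["id"]:
            continue  # never recommend the deposit to itself
        overlap = len(set(deposited["keywords"]) & set(keywords))
        if overlap:
            scored.append((overlap, dataset_id))
    # Most shared keywords first; ties broken by id so the order is stable
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dataset_id for _, dataset_id in scored[:limit]]


def deposit_response(deposited, catalogue):
    """Build the reply to a deposit: the plain 'OK' plus related datasets."""
    return {
        "status": "OK",
        "deposited": deposited["id"],
        "you_might_also_like": related_datasets(deposited, catalogue),
    }


if __name__ == "__main__":
    # Made-up catalogue for illustration only
    catalogue = {
        "d1": ["fatigue", "titanium"],
        "d2": ["creep", "steel"],
        "d3": ["fatigue", "titanium", "crack"],
    }
    new = {"id": "d4", "keywords": ["fatigue", "titanium", "crack"]}
    print(deposit_response(new, catalogue))
```

The point is simply that the deposit reply carries more than a status flag: the user gets back a short, ranked list of things they might want to look at next.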





