Steve Cassidy

The Days Run Away…

A RESTful interface to Annotations on the Web

Annotation data is stored and manipulated in various formats and there have been a number of efforts to build generalised models of annotation to support sharing of data between tools. This work has shown that it is possible to store annotations from many different tools in a single canonical format and allow transformation into other formats as needed. However, moving data between formats is often a matter of importing or exporting from one tool to another. This paper describes a web-based interface to annotation data that makes use of an abstract model of annotation in its internal store but is able to deliver a variety of annotation formats to clients over the web.
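
The abstract doesn't say how a client chooses which format it receives. One common approach for a RESTful interface is HTTP content negotiation on the `Accept` header. The minimal Python sketch below assumes that approach. Its media types and format names (`rdfxml`, `turtle`, `annotation-xml`) are hypothetical placeholders, not the formats the service actually offers, and it ignores partial wildcards such as `text/*`.

```python
# Hypothetical mapping from media types to serialiser names; the real
# service's formats and names are not given in the abstract.
FORMATS = {
    "application/rdf+xml": "rdfxml",
    "text/turtle": "turtle",
    "application/xml": "annotation-xml",
}
DEFAULT_TYPE = "application/rdf+xml"


def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_format(accept_header):
    """Pick (media_type, format_name) for a request, or None if none fits."""
    if not accept_header:
        return DEFAULT_TYPE, FORMATS[DEFAULT_TYPE]
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return DEFAULT_TYPE, FORMATS[DEFAULT_TYPE]
        if media_type in FORMATS:
            return media_type, FORMATS[media_type]
    return None
```

A `None` result would map naturally to a `406 Not Acceptable` response. The internal store stays in one canonical model, and only the final serialisation step varies with the request.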

Presented at The 2nd Linguistic Annotation Workshop (The LAW II) at LREC 2008, Marrakech.
Download PDF

Written by Steve Cassidy

September 14th, 2008 at 10:22 pm

Posted in Annotation, RDF

Sparql Endpoint for Python WSGI

As part of DADA (and yes, that page is a bit out of date) I wanted to provide a Sparql endpoint to allow experimentation with querying the raw RDF annotation data. So far, we've built everything using Redland in Python, but there seems to be no existing Sparql endpoint implementation for this combination. The Sparql protocol document is long, but as far as I can tell the core of the protocol is a simple GET request with an encoded Sparql query. Results are returned as raw XML in the special Sparql result format, or as RDF/XML if the return type is a graph. This turns out to be very easy to implement on top of Redland, since its query operator returns exactly those result types.
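
On the client side, the "simple GET request with an encoded Sparql query" can be sketched in a few lines of Python using only the standard library. The endpoint URL `http://localhost:8000/sparql` is a hypothetical example. The keyword scan that predicts the result type is a rough heuristic based on the two result kinds described above: variable bindings or a boolean in the Sparql XML results format, or a graph as RDF/XML.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Media types for the two kinds of answer described above.
SPARQL_RESULTS_XML = "application/sparql-results+xml"  # SELECT / ASK
RDF_XML = "application/rdf+xml"                         # CONSTRUCT / DESCRIBE


def build_query_url(endpoint, query):
    """Encode a Sparql query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def expected_result_type(query):
    """Guess the response media type from the first query-form keyword.

    A crude scan: it ignores comments and could be fooled by odd tokens,
    so treat it as a heuristic only.
    """
    for word in query.upper().split():
        if word in ("SELECT", "ASK"):
            return SPARQL_RESULTS_XML
        if word in ("CONSTRUCT", "DESCRIBE"):
            return RDF_XML
    raise ValueError("no Sparql query form found")


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = build_query_url("http://localhost:8000/sparql", query)  # hypothetical endpoint
# The server side recovers the original text by decoding the parameter.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

To actually send the request, the URL could be passed to `urllib.request.Request(url, headers={"Accept": expected_result_type(query)})` and opened with `urllib.request.urlopen`.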

So, I present SparqlEndpoint-0.1, a Python module that provides a WSGI-conformant implementation of a Sparql endpoint for Redland. It almost certainly doesn't implement all of the protocol standard, and it could be improved no end, for example by making it independent of the RDF backend it queries (eg. using RDFLib).
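
The following is a minimal sketch of how a WSGI endpoint of this kind can be structured. It is not the SparqlEndpoint-0.1 code. The backend is abstracted behind a hypothetical `run_query` callable that stands in for a Redland (or RDFLib) query and returns a content type plus the serialised body. The sketch also assumes that `run_query` raises `ValueError` for a malformed query. Keeping the backend behind a plain function is one way to get the independence from Redland mentioned above.

```python
from urllib.parse import parse_qs


def make_sparql_app(run_query):
    """Wrap a query function in a minimal WSGI Sparql endpoint.

    run_query(query_text) -> (content_type, body_bytes) is a hypothetical
    hook for the RDF backend; it is assumed to raise ValueError when the
    query cannot be parsed.
    """
    def app(environ, start_response):
        # Only the GET binding of the protocol is handled in this sketch.
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            start_response("405 Method Not Allowed",
                           [("Allow", "GET"), ("Content-Type", "text/plain")])
            return [b"Only GET is supported\n"]

        # The query arrives URL-encoded in the 'query' parameter.
        params = parse_qs(environ.get("QUERY_STRING", ""))
        queries = params.get("query")
        if not queries:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"Missing 'query' parameter\n"]

        try:
            content_type, body = run_query(queries[0])
        except ValueError as exc:
            # Report a malformed query back to the client as a 400.
            start_response("400 Bad Request",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [str(exc).encode("utf-8")]

        # The backend decides between Sparql XML results and RDF/XML.
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

Any WSGI server can host the resulting app, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library. Errors other than `ValueError` propagate, so the server reports them as a 500.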

I'm not putting up a demo endpoint just yet, as I'm having severe performance issues with my development server in combination with Redland. The triple store is growing rapidly into the millions of triples, and the result is huge latency (tens of minutes) on some queries. Given some recent discussion on the Redland list, I'm wondering whether a jump to one of the RDF-specific stores is the thing to do. This would probably mean rewriting my code in Java, but based on the Berlin Sparql Benchmark numbers, Sesame and Jena have the kind of performance I need (sub-second query response times on 100M triples).

Well, enough of that. If you are interested in SparqlEndpoint please download and take a look. If there is interest I’m happy to share it and host development somewhere accessible.

Written by Steve Cassidy

August 21st, 2008 at 10:07 pm

Posted in Annotation, RDF

An Evaluation of Portfolio Assessment in an Undergraduate Web Technology Unit

One of the perennial issues that is raised in student surveys is that of effective feedback. As part of our ongoing review of teaching, we identified feedback on assessment as a target area for 2007; this paper describes the evaluation of one strategy for improving this feedback that was implemented as part of an undergraduate unit.

Paper to be presented at the National UniServe Conference 2007, Sydney, Australia. Download PDF.

Written by Steve Cassidy

September 3rd, 2007 at 10:04 am

Version Control for RDF Triple Stores

RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the need for some kind of version management becomes apparent. While there are many version control systems available for program source code and even for XML data, the use of version control for RDF data is not a widely explored area. This paper examines an existing version control system for program source code, Darcs, which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions of an RDF triple store.
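
The abstract doesn't spell out the proposed adaptation. The following is only a minimal Python sketch of the basic Darcs ideas carried over to triples, under stated assumptions. A patch is a set of added and a set of removed triples, and it has an inverse that undoes it. Two patches are treated as commuting only when they touch disjoint triples. That last rule is a deliberate simplification: Darcs' own commutation rules can rewrite patches as they are reordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A primitive change to a triple store: sets of triples added and removed.

    Triples are plain (subject, predicate, object) tuples in this sketch.
    """
    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def apply(self, store):
        """Return the patched store, refusing patches that don't fit it."""
        if not self.removed <= store:
            raise ValueError("patch removes triples not in the store")
        if self.added & (store - self.removed):
            raise ValueError("patch adds triples already in the store")
        return frozenset((store - self.removed) | self.added)

    def inverse(self):
        """The patch that undoes this one: swap additions and removals."""
        return Patch(added=self.removed, removed=self.added)


def commutes(p, q):
    """Simplified test: patches touching disjoint triples can be reordered."""
    return not ((p.added | p.removed) & (q.added | q.removed))
```

The strict preconditions in `apply` guarantee that `p.inverse().apply(p.apply(s))` gives back `s`. This is the kind of property a patch theory relies on when undoing or reordering changes.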

Paper presented at ICSOFT 2007, Barcelona, Spain, July 2007. Download PDF

Written by Steve Cassidy

September 3rd, 2007 at 9:43 am

Screencasts in Teaching Web Technology

I’ve been using screencasts again this year in COMP249 (Web Technology) and have settled on a fairly stable way of producing them using Camtasia on Windows. This post is here as a container for the videos that I’ve produced this year so they can have a life outside of COMP249 as that website is updated.

Written by Steve Cassidy

June 8th, 2007 at 11:58 pm

Posted in screencast, teaching

PhD Scholarship in Semantic Web Technologies for Annotation

I have a PhD scholarship available for a project in applying Semantic Web technologies (RDF, Sparql, Annotea) to the Linguistic Annotation problem. Here’s an outline:

Shared collaborative distributed annotation using semantic web technologies.

The Semantic Web augments the current Web with machine-processable information enabling humans and machines to work in cooperation; in our context, we are using it as the basis of a linguistic annotation system that is used by language researchers to annotate language resources. This project will look at the issues raised when we allow many people to collaborate on authoring these annotations and making shared annotations available to a community of researchers. This crosses a number of existing areas of research including the semantic web and social computing, and extends the range of interactions available to researchers over the web.

Of course, as usual there is scope for variation on this theme: if you're interested in this problem space and want to pursue a PhD in Australia, please get in touch. The scholarship is open to Australian and international students.

Written by Steve Cassidy

March 30th, 2007 at 8:44 am

Posted in Annotation, ProjectIdea

Welcome COMP249

This is just to welcome any COMP249 (Web Technology) students who might visit following my link from the lecture notes. You’re all welcome to look around at my truly random thoughts.

Written by Steve Cassidy

March 1st, 2007 at 4:08 pm

Posted in teaching

The Machine is Us/ing Us

Here's an excellent video about text and hypertext that touches on the internals of HTML and XML and on how Web 2.0 has changed the role of the reader. The web is using us to tag, classify and label the stuff we write so that we can find it.

Written by Steve Cassidy

February 7th, 2007 at 9:03 pm

Posted in Uncategorized

SCOPE

So today I make my TV debut! A few weeks ago a film crew from Channel 10 came to shoot a segment for the CSIRO/Channel 10 kids science show SCOPE. The episode, on sound, airs today at 4pm.

I had great fun making the segment. I've never done anything like this before, and it's amazing how much work goes into producing such a short piece. I can't wait to see how it turns out.

If you watched the show and are interested in having a look at speech, you might want to download one of the programs I used in the show, the WaveSurfer tool (you want the binary release for Windows from this page). WaveSurfer will let you record your voice and see spectrogram patterns like the ones I was looking at on the show (you'll need a microphone for your computer; a cheap headset will do). To get a good-looking display, select "New" from the File menu and choose "Demonstration" when asked what configuration to use. Then press the red record button and speak into the microphone.

Here's an experiment to try: record yourself saying "hid", "hod", "head", "had". Look at the spectrogram of each word and see if you can tell the difference. Look particularly for the brighter bands in the display: these are called formants, and they're different for every vowel sound.

Another experiment: record two children and an adult saying the same word, for example “SCOPE”. Can you tell the difference between them? Which looks more similar, the children’s voices or one of the children and the adult?

Please leave a comment if you’ve seen the show!

Written by Steve Cassidy

August 28th, 2006 at 7:23 am

Posted in Speech

Transcribed Podcasts and Audio Books

Jon Udell is tagging some of his del.icio.us links to podcasts with transcriptavailable; the transcripts have been generated manually. This could be a nice source of data for experiments with information retrieval from podcasts.

Sort of relatedly, I just discovered LibriVox, which hosts volunteer recordings of out-of-copyright literary works (eg. Project Gutenberg books). I sampled War of the Worlds and the quality seems great. Worth a browse.

Written by Steve Cassidy

August 23rd, 2006 at 8:28 pm

Posted in Blogging, Speech