Steve Cassidy

The Days Run Away…

Semantic Web Talk

I gave a talk on the semantic web last night at the MQ Technology Trends seminar series; my slides are here for those who want them.

Putting the talk together helped me understand my own view of the SW a little better: the exciting thing is ubiquitous access to lots of useful data. The question of semantics keeps creeping in, and while it’s important to be able to make inferences and use shared vocabularies, I’m not entirely convinced that there is any need for ‘real semantics’ to make interesting applications work. We need shared terminology and schemas for sure, but there’s no more semantics there than when I use an XML schema to interoperate with others.

The big vision of autonomous agents crawling the web exchanging proofs and doing serious stuff for us will obviously face bigger problems than the vertical applications that can use agreed vocabularies. This is more like traditional AI applications and will obviously need some more knowledge representation which can be called semantics. However, I feel that all sorts of interesting things can be done without going this deep. Just look at what has become possible with RSS in the past few years; it’s the ubiquity of the data that makes things possible and without the ubiquitous data it’s hard to imagine the applications.

So, let’s continue the grass roots effort to populate the world with snippets of RDF metadata about anything that seems interesting. Get yourself a FOAF file, publish some RSS, add metadata to your photographs, publish your calendar. Now ponder what to do with all this glorious data!
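For anyone wondering what a FOAF file amounts to, here is a minimal sketch that writes one as RDF/XML using only the Python standard library. The name and homepage are placeholders, and `build_foaf` is a hypothetical helper made up for this illustration; only the `foaf:Person`, `foaf:name` and `foaf:homepage` terms come from the FOAF vocabulary itself.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def build_foaf(name: str, homepage: str) -> str:
    """Return a tiny FOAF description of one person as an RDF/XML string.

    Hypothetical helper for illustration; a real FOAF file usually says more.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = name
    # The homepage is a resource (a URI), not a literal string.
    ET.SubElement(
        person,
        f"{{{FOAF_NS}}}homepage",
        {f"{{{RDF_NS}}}resource": homepage},
    )
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not a real person or site.
foaf_xml = build_foaf("Jane Example", "http://example.org/~jane/")
print(foaf_xml)
```

Save the output somewhere public and link to it from your home page, and you have contributed one more snippet of metadata.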

Written by Steve Cassidy

September 13th, 2004 at 2:00 pm

Posted in RDF

Bloglines Thinks I’m Italian

So I’ve just started using Bloglines as a blog aggregator, and it’s working well thanks to the Firefox plugin that makes subscribing and seeing updates easy. If you care, you can see my subscriptions. A nice feature is that the site will recommend new feeds to read based on your subscriptions, but for some reason it has decided that I’d like to read all kinds of Italian sites! So what is it in my subscriptions that enables this inference? Answers on a postcard please…

Written by Steve Cassidy

September 13th, 2004 at 2:00 pm

Posted in Blogging

Putting Page Numbers in PDF

One of the annoying things that needs doing when organising a conference is producing the proceedings. These days that means generating a CD-ROM filled with PDF files, and the main problem with that is adding proper identifiers to the PDFs. In the old days of print, every paper had page numbers, and consequently we expect to be able to cite page numbers in conference proceedings. In Australia, the government expects us to submit page numbers for each conference paper as part of the annual audit of research activity. Hence the need to add page numbers to soft-copy PDF versions of papers.

Of course one can probably do this manually with various Adobe products but that doesn’t help lower stress levels close to the conference when you have to insert a new paper and recalculate the page numbers. Hence the search for tools to do this automatically. This note is really just a placeholder for things I find on the road to building the SST2004 proceedings. Perhaps it will be useful to others too.

The general problem is known as PDF merge or PDF stamping; Google will tell you who the big players are.

acl02stamp promises to “Stamp bibliographic citations (including automatically-determined page numbers) onto a sequence of academic papers in PDF format.” I’ve asked the author for a copy.

The iText Java PDF library provides an API for generating PDF and can do stamping by virtue of being able to read existing PDF files and add them to its output.

Update: Thanks to Jason Eisner, author of acl02stamp, I now know about the Perl Text::PDF module, which provides a PDF generation API for Perl and includes a simple script, pdfstamp.plx, to achieve exactly what is needed. Jason’s acl02 script wraps this to integrate it with the ACL conference tools for generating tables of contents etc. For reference, the Debian package libtext-pdf-perl is all that’s needed.
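The bookkeeping half of the problem, working out which page numbers each paper gets, is simple enough to sketch separately from the stamping itself. The following is a rough Python illustration, not how any of the tools above work; `assign_page_ranges`, the paper titles and the page counts are all made up for the example, and it assumes you already know how many pages each PDF has.

```python
from typing import List, Tuple


def assign_page_ranges(
    papers: List[Tuple[str, int]], first_page: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (title, start_page, end_page) for each paper, in order.

    `papers` is a list of (title, page_count) pairs in proceedings order.
    """
    ranges = []
    next_page = first_page
    for title, page_count in papers:
        if page_count < 1:
            raise ValueError(f"{title!r} must have at least one page")
        start = next_page
        end = start + page_count - 1
        ranges.append((title, start, end))
        next_page = end + 1
    return ranges


# Hypothetical papers and page counts, for illustration only.
papers = [("Paper A", 6), ("Paper B", 4), ("Paper C", 6)]
ranges = assign_page_ranges(papers)

# Inserting a late paper just means re-running the calculation.
papers_with_late = papers[:1] + [("Late paper", 5)] + papers[1:]
ranges_with_late = assign_page_ranges(papers_with_late)

for title, start, end in ranges_with_late:
    print(f"{title}: pp. {start}-{end}")
```

The resulting start pages are what you would then hand to whichever stamping tool you settle on.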

Written by Steve Cassidy

June 29th, 2004 at 2:00 pm

Posted in Uncategorized

SpeechBot

SpeechBot is a search engine for audio & video content that is hosted and played from other websites. Recordings are indexed via speech recognition on the audio. A very interesting experiment which seems to work for some queries; the transcripts are obviously noisy, but it’s clear that if a particular term is repeated often in a piece then it will get indexed well. My search for ancient history troy returned a few relevant results but many false hits, since troy is an easily inserted token.

Written by Steve Cassidy

May 24th, 2004 at 2:00 pm

Posted in Speech

Another Giggle User

So now there are two giggle users (to my knowledge) since I’ve encouraged James to keep a blog of how his research project on Topic segmentation in meetings is going. Tomorrow the world…

Written by Steve Cassidy

February 25th, 2004 at 1:00 pm

Posted in Blogging

More RDF Query/Path Stuff

The RDF query/manipulation proposals are coming out of the woodwork on www-rdf-rules:

  • XR does RDF extraction from XML
  • Rx4RDF is “a specification and reference implementation for querying, transforming and updating W3C’s RDF by specifying a deterministic mapping of the RDF model to the XML data model defined by XPath.”

Written by Steve Cassidy

November 10th, 2003 at 1:00 pm

Posted in RDF

Political Persuasion

Here’s an interesting test to while away a few minutes. According to The Political Compass I’m a Leftist Libertarian, just like Gandhi, Mandela and the Dalai Lama, and diametrically opposite George W.: surprise surprise!

Written by Steve Cassidy

November 5th, 2003 at 1:00 pm

Posted in Uncategorized

RDF Path Languages

The world is moving quickly towards defining a path language for RDF and maybe for other more general directed graphs. Here are a few references:

  • Pondering RDFPath which reviews a few proposals
  • This thread on www-rdf-rules which is where the above reference comes from, contains a bunch of other references and some encouragement.
  • A reference to my Extreme paper noting that IsaViz (an RDF visualisation/authoring tool) is an interesting use case for a graph path language.
  • Simon St Laurent talks about the need to work directly with graphs instead of forcing data into trees.
  • Treehugger is a Saxon extension to allow XPath expressions to be evaluated over RDF data. Another example of pretending the RDF is a tree, as in Norman Walsh’s RDFTwig.
  • RDFT, an RDF Templating proposal that includes a NodePath expression language, e.g.:
    resource()/resource('http://ex.example.com/name')/literal()
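To make the idea of a path over a graph concrete, here is a minimal Python sketch that follows a sequence of predicates through a set of triples. It does not implement NodePath or any of the proposals listed above; `follow_path` and the sample data are invented for illustration, and literals are represented as plain strings purely to keep the example short.

```python
from typing import Iterable, List, Set, Tuple

# A triple is (subject, predicate, object). Assumption for this sketch:
# URIs and literals are both plain strings.
Triple = Tuple[str, str, str]


def follow_path(
    triples: Iterable[Triple], start: str, predicates: List[str]
) -> Set[str]:
    """Follow each predicate in turn from `start`; return the nodes reached.

    Each step replaces the current node set with the objects of every
    matching arc, so a node with several arcs fans out naturally, which is
    where a graph differs from a tree.
    """
    triples = list(triples)
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in triples if s in current and p == predicate}
    return current


EX = "http://ex.example.com/"

# Hypothetical data: who knows whom, and what they are called.
triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "knows", EX + "carol"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "carol", EX + "name", "Carol"),
]

# "The names of everyone Alice knows": two steps along the graph.
names = follow_path(triples, EX + "alice", [EX + "knows", EX + "name"])
print(sorted(names))
```

A real path language would add node tests, filters and reverse steps, but each step still amounts to moving from one set of nodes to the next.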

Written by Steve Cassidy

October 16th, 2003 at 2:00 pm

Posted in Uncategorized

The problem with being popular…

While it’s nice being the number two ‘Steve Cassidy’ on Google (above the porn star but below the voiceover artist!), being well indexed can have its downside. I recently noticed an odd Google search bringing traffic to CANTCL: the phrase “problem opening zip file” turns up CANTCL as the number one hit! If you check the cached page (valid as of 25 Sept 03) it shows that Google came crawling while CANTCL was having trouble with one of the zip files in the archive. The shame: my bug (now fixed) is preserved forever.

PS. let’s see what kind of traffic the above mention of the ‘other’ Steve Cassidy brings to this site…stay tuned :-)

Written by Steve Cassidy

September 24th, 2003 at 2:00 pm

Posted in Uncategorized

XTMPath — XPath for Topic Maps

Robert Barta at Bond Uni has a paper on XTMPath, Manipulating Topic Map Data Structures, which I should look at a little further. I enjoyed talking with Robert at the AusWeb conference this year; he seems to be one of the few people working on Topic Maps in Australia, unlike in Europe and the US where it seems that they’re taking over the world (at least that’s the impression I got at Extreme Markup).

Then there’s also TMPath which is aiming at a similar place. Better get my skates on if my proposal is to be a contender.

Written by Steve Cassidy

August 26th, 2003 at 2:00 pm

Posted in Uncategorized