Archive for the ‘Uncategorized’ Category
NIST Speaker Recognition Evaluations
Looking at the 2003 NIST SR evaluations: while we’re too late to enter this year, there is some useful data available. For example, the automatically generated word transcripts for some of their training data might be interesting to look at.
In their Rich Transcription track there is an interesting task which might be nice to attempt: metadata extraction. In the last evaluation it was just “Who Spoke When” annotation, but they intend to add more target metadata in future rounds. This is quite close to some of our goals for the Meeting Room Project.
Quote from the NIST TREC-9 SDR page:
The results of the TREC-9 2000 SDR evaluation presented at TREC on November 14, 2000 showed that retrieval performance for sites on their own recognizer transcripts was virtually the same as their performance on the human reference transcripts. Therefore, retrieval of excerpts from broadcast news using automatic speech recognition for transcription was deemed to be a solved problem - even with word error rates of 30%.
Gosh!
And there’s more… Transcripts of meeting room data are available, and they seem to be manual. The meeting content doesn’t seem too exciting :-) but it gives some idea of dialogue structure etc. in this kind of data.
OSCOM
An article on Advogato describes the efforts of OSCOM to unify Open Source content management systems. It mentions Twingle:
Twingle, an OSCOM project to build a common CMS authoring tool … it helped incent servers to move WebDAV support up on the priority list. Since Twingle uses Mozilla’s RDF engine, it brought CMS attention to the Semantic Web.
which looks like it might be worth finding out about…
Tim Bray on Good Web Citizenship
Tim Bray talks eloquently about what Apple could do to make their iTMS (iTunes Music Store) service a good web citizen, including:
- Don’t invent new URI schemes (Apple uses itms:) since the plumbing of the web doesn’t understand them
- Don’t use text/xml as the MIME type for XML. This is something I wasn’t aware of. It seems that text/* is a license for a proxy to transcode the content, say from UTF-8 to US-ASCII. So the answer is to use application/xml instead, avoiding the whole mess.
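To make the text/xml point concrete, here is a minimal sketch of how a receiver might pick the character encoding under the RFC 3023 defaulting rules (text/xml without a charset parameter is treated as US-ASCII, while application/xml falls back to the encoding declared in the document itself). The function name `effective_charset` is made up for illustration, and the sketch ignores UTF-16 detection.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration, if present.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type: str, body: bytes) -> str:
    """Hypothetical helper: which charset would an RFC 3023 receiver assume?

    Only meaningful for text/xml and application/xml; UTF-16 is not handled.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")

    # An explicit charset parameter is authoritative for both media types.
    if isinstance(charset, str) and charset:
        return charset.lower()

    # text/xml with no charset: US-ASCII, whatever the document declares.
    if media_type == "text/xml":
        return "us-ascii"

    # application/xml with no charset: look inside the document.
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]  # skip a UTF-8 byte order mark
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


doc = '<?xml version="1.0" encoding="utf-8"?><caf\u00e9/>'.encode("utf-8")
print(effective_charset("text/xml", doc))         # declaration is ignored
print(effective_charset("application/xml", doc))  # declaration is honoured
```

The same UTF-8 bytes are read as US-ASCII in the first case, which is exactly the mess the bullet describes.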
Postscript: Apple also invented a new URI scheme (webcal:) for its web calendar service, e.g. you can get the Australian holidays in iCal format at webcal://ical.mac.com/ical/Australian32Holidays.ics; here again, plain http also works. So Apple is using URI schemes as MIME-type indicators…
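Since plain http works for the same resource, a client that doesn’t know webcal: could simply swap the scheme. A small sketch using Python’s standard `urllib.parse`; the helper name `as_http` and the one-entry mapping table are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed mapping: only webcal is listed, as the post says plain http works for it.
SCHEME_ALIASES = {"webcal": "http"}


def as_http(uri: str) -> str:
    """Rewrite a vendor-specific scheme to the plain scheme the web plumbing knows."""
    parts = urlsplit(uri)
    plain = SCHEME_ALIASES.get(parts.scheme)
    if plain is not None:
        parts = parts._replace(scheme=plain)
    return urlunsplit(parts)


print(as_http("webcal://ical.mac.com/ical/Australian32Holidays.ics"))
```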
JXPath
JXPath is a Java API for traversing object graphs using an XPath-like syntax. They collapse the notion of axis down to only ‘child’, which really becomes ‘anything I can traverse to from here’.
Our generalisations of XPath go much further though, and could achieve what JXPath does relatively easily with the appropriate axis definitions.
It makes an interesting point, though: a path-like language is a very useful tool for accessing parts of data structures in programs and scripts. This is how they’ve been used in relational databases in the past (i.e. making pointer chasing less verbose), so why not use them to locate data in more general data sources?
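As a rough illustration of the idea (this is not JXPath’s API, just a Python sketch of a single ‘child’ axis over an object graph), the hypothetical `select` function below treats dictionary keys and object attributes as child steps and looks through lists transparently:

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


@dataclass
class Author:
    name: str
    books: List[dict] = field(default_factory=list)


def children(node: Any, name: str) -> Iterator[Any]:
    """Yield whatever can be reached from node by one 'child' step called name."""
    if isinstance(node, dict):
        if name in node:
            yield node[name]
    elif isinstance(node, (list, tuple)):
        # Collections are transparent: apply the step to each element.
        for item in node:
            yield from children(item, name)
    elif hasattr(node, name):
        # Note: this also exposes methods (e.g. str.title), a known rough edge.
        yield getattr(node, name)


def select(root: Any, path: str) -> list:
    """Evaluate a slash-separated path of child steps against an object graph."""
    nodes = [root]
    for step in (s for s in path.split("/") if s):
        nodes = [value for node in nodes for value in children(node, step)]
    return nodes


library = {
    "authors": [
        Author("Ann", [{"title": "X"}, {"title": "Y"}]),
        Author("Bob", [{"title": "Z"}]),
    ]
}

print(select(library, "authors/books/title"))
```

The equivalent hand-written code needs two nested loops and a mix of key and attribute access, which is the pointer chasing a path expression hides.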
RDF model vs. Syntax
Don Box’s Spoutlet:
My love affair with RDF began in 1999 when I had to prepare a tutorial on XML metadata formats for XTech. My RDF love affair was with the Model of RDF, mind you.
This touches on something I’ve been thinking about lately: the anti-RDF arguments by Dave Winer and others in the RSS world. To me RDF is simply the triple-based metadata model, however serialised. The arguments with the RDF/XML syntax aren’t really arguments with RDF itself. On the other hand, I’ve not yet done anything large scale with the RDF model, so we’ll see…
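To show what “the model, however serialised” might mean in practice, here is a small sketch that keeps a graph as a plain set of (subject, predicate, object) triples and treats serialisation as a separate step. The `Literal` wrapper, the `match` helper and the example.org subject are all made up for illustration; the output follows the N-Triples line format with only basic escaping.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Marks an object as a literal string; plain str terms are treated as IRIs."""
    value: str


def match(triples, s: Optional[str] = None, p: Optional[str] = None, o=None) -> list:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_ntriples(triples) -> str:
    """One possible serialisation of the model: one triple per line."""
    def term(t) -> str:
        if isinstance(t, Literal):
            escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        return f"<{t}>"

    lines = [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(sorted(lines))


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
POST = "http://example.org/post/1"  # hypothetical subject IRI

graph = {(POST, DC_TITLE, Literal("RDF model vs. Syntax"))}

print(to_ntriples(graph))
print(match(graph, p=DC_TITLE))
```

Queries such as `match` work on the triples alone, so swapping `to_ntriples` for an RDF/XML writer would not touch them.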
Children’s Books
The International Children’s Digital Library has 200 children’s books scanned for public access. I can’t see it because of the Java requirement though :-(
REST
There is a deep symmetry in being able to GET the same stuff you POST that should be exploited when possible. [Paul Prescod]
PIMs
Open Source Applications Foundation - Vista prototype is another Outlook killer, perhaps interesting this time as it’s based on an RDF database underneath and is written in Python/Tkinter. Some nice ideas.
I’m getting annoyed with Evolution after only a few weeks of using it in anger; it’s too unstable and has quirks I can’t get used to. I mostly like the calendar and Palm integration, and the mail handler seems good but isn’t as polished as I’d like. Perhaps the next release will settle things a little.
Zero Install
Don Park’s Blog says the net needs zero-install extensible client platforms, which .NET and Java Web Start aren’t. Perhaps CANTCL can be something like that: especially along with starpacks, we can have a zero-install client application which can be extended and updated via web downloads, each of which is relatively small. Why does the Java machinery have to be so damn big?
Overlapping trees in XML
xmlhack: One tree isn’t enough talks about a couple of proposals for encoding overlapping trees.
LMNL is a non-XML markup language which allows for overlapping ranges. JITT (Just In Time Trees) is an XML-based system which can derive different trees by parsing the file differently. They have a page on Overlapping Hierarchies/Concurrent Markup.
This looks like it deserves some more attention.
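To see why one tree isn’t enough, here is a minimal sketch of the underlying problem using standoff ranges over a string. It is not LMNL or JITT syntax; the `Range` type and helper names are invented for the example. Two ranges “cross” when they share text but neither contains the other, and such a pair can’t both be elements of the same XML tree.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def crosses(a: Range, b: Range) -> bool:
    """True if a and b overlap without one being nested inside the other."""
    overlap = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


def crossing_pairs(ranges: List[Range]) -> List[Tuple[str, str]]:
    """Name every pair of ranges that could not live in a single tree."""
    return [(a.name, b.name) for a, b in combinations(ranges, 2) if crosses(a, b)]


text = "one two three four"
ranges = [
    Range("line", 0, 7),       # "one two"
    Range("sentence", 4, 13),  # "two three"
    Range("word", 4, 7),       # "two"
]

for r in ranges:
    print(r.name, repr(text[r.start:r.end]))
print(crossing_pairs(ranges))
```

Here “word” nests happily inside both of the others, but “line” and “sentence” cross, so any single tree has to drop or split one of them.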
