Author Archive
All in the Family
My brothers are way more productive than me when it comes to generating cool websites. By various routes we’ve all ended up working on the web: Patrick on web design and, more recently, selling baby gifts; Mike on online music stores and other bits of new media goodness. Me, I just teach it and build sites and tools for relatively small groups of people. Between the three of us we could take over the world.
Speaker Tracking In Meetings
This is a potential project idea for an Honours or Masters student. It might also form the core of a PhD project.
I have an ongoing project looking at processing speech recorded in meeting rooms. There are a number of student projects which could be built around this data. Here are some possible projects suitable for Honours:
- Tracking speakers through a series of meetings. Given a series of meetings with a stable but changing group of people, we would like to model the speakers in the meetings and for any meeting decide who is present and mark their speech turns. This would involve working with audio signals and building speaker models as well as working with the results of an existing speech segmentation system.
- Integrating multiple microphone signals to improve speaker segmentation. This will involve quite a bit of low-level audio processing, so it would suit someone with an interest in numerical algorithms. There is some existing code to build upon so you wouldn’t be starting from scratch.
Annotation - Spoken Word Services
Annotation - Spoken Word Services is another project providing web-based annotation of audio recordings, this time in a learning environment.
BBC Annotatable Audio
Tom Coates describes a currently internal BBC initiative to have everyone annotate audio content Flickr-style. This is a very cool application, and it is a real-world (as in non-academic) example of the kind of shared annotation we want to enable in our eResearch project. It’s probably what the people at Dart would like to enable, or at least that was my understanding of one of their goals.
What’s he building in there?
What's he building in there? What the hell is he building In there? He has subscriptions to those Magazines... He never waves when he goes by He's hiding something from the rest of us... He's all to himself... I think I know why...
The Real Macintosh Agenda…
Did you know…
Take for example Apple Computers, makers of the popular Macintosh line of computers. The real operating system hiding under the newest version of the Macintosh operating system (MacOS X) is called… Darwin! That’s right, new Macs are based on Darwinism! While they currently don’t advertise this fact to consumers, it is well known among the computer elite, who are mostly Atheists and Pagans. Furthermore, the Darwin OS is released under an “Open Source” license, which is just another name for Communism. They try to hide all of this under a facade of shiny, “lickable” buttons, but the truth has finally come out: Apple Computers promote Godless Darwinism and Communism.
Source: Evolutionism Propaganda (website has since been hijacked).
Accept all happiness from me
Listening to Bjork’s Medulla, one track turns out to be a poem by e. e. cummings:
it may not always be so; and i say
that if your lips, which i have loved, should touch
another's, and your dear strong fingers clutch
his heart, as mine in time not far away;
if on another's face your sweet hair lay
in such a silence as i know, or such
great writhing words as, uttering overmuch,
stand helplessly before the spirit at bay;
if this should be, i say if this should be-
you of my heart, send me a little word;
that i may go unto him, and take his hands,
saying, Accept all happiness from me.
Then shall i turn my face, and hear one bird
sing terribly afar in the lost lands.
-- e.e. cummings
Tcl Matrix Type
I’ve just implemented a matrix object type for Tcl; the sources are available here (matrix0.1-src.zip). The package implements a new object type for Tcl which looks like a nested list of doubles but is stored internally as a 2D matrix. Conversion to and from string values is only done as needed, so matrices can be passed by value efficiently in Tcl scripts. Here’s what you can do in Tcl with this package:
package require cmatrix 0.1
# make a big square matrix as a list of lists
set n 1000
for {set i 0} {$i < $n} {incr i} {
    set row {}
    for {set j 0} {$j < $n} {incr j} {
        lappend row [expr {$j*$i}]
    }
    lappend matrix $row
}
# add the matrix to itself, triggers conversion to matrix only once
puts sum:[time {set sum [cmatrix add $matrix $matrix]}]
# second time, no conversion needed
puts sum2:[time {set sum [cmatrix add $matrix $matrix]}]
# transpose this matrix
puts trans:[time {set trans [cmatrix transpose $sum]}]
# now get the first row of trans, triggers conversion from matrix
# back to list
puts lindex:[time {set row [lindex $trans 0]}]
Output from the above on my system is:
sum:1184447 microseconds per iteration
sum2:109284 microseconds per iteration
trans:191660 microseconds per iteration
lindex:3736227 microseconds per iteration
Of course there’s probably more to do here. Currently I don’t expose many matrix operations at the Tcl level, since this was originally meant as a C-callable library. More can be added later. Comments are welcome.
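For comparison, here is a minimal pure-Tcl sketch of an element-wise add on plain nested lists, the kind of script-level baseline a package like this is meant to improve on. The proc name matrix_add is a hypothetical helper chosen for illustration; it is not part of the cmatrix package and says nothing about how the C code works internally.

```tcl
# Pure-Tcl reference: element-wise sum of two nested lists of numbers.
# matrix_add is a hypothetical helper, not part of the cmatrix package.
proc matrix_add {a b} {
    if {[llength $a] != [llength $b]} {
        error "matrices have different numbers of rows"
    }
    set result {}
    foreach rowA $a rowB $b {
        if {[llength $rowA] != [llength $rowB]} {
            error "rows have different lengths"
        }
        set row {}
        foreach x $rowA y $rowB {
            lappend row [expr {$x + $y}]
        }
        lappend result $row
    }
    return $result
}

set m {{1 2} {3 4}}
puts [matrix_add $m $m]   ;# prints: {2 4} {6 8}
```

Every call walks both nested lists element by element, which is the per-call cost that a native matrix representation would avoid.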
Gnowsis and RDF Desktop Systems
Gnowsis is a Semantic Web desktop system, which means it aggregates various bits of personal data into an RDF store and provides a browser for the store. It can look at MP3 ID3 tags, email (in Outlook or Thunderbird), bookmarks in Firebird and the file system. It does full-text indexing and provides a web server interface for browsing the store.
Which is weird because that’s just how Giggle is evolving at the moment. I’ve begun to build in an RDF store which in the first instance will just contain the individual weblog posts and their metadata (date, title, etc.). The web pages and RSS feeds will then be generated from this store rather than from the tangle of variables in the current implementation. The page generation will still be template based, but now the templates include references to the RDF store rather than simple variable references. One outcome of this is less of a need to precompute lots of variables for the templates (e.g. the time of the post in minutes); the templates can compute things from the store if they want to. Currently my new templates are coded as procedures which generate text (e.g. HTML) and I’m using xmlgen to do the work internally.
Once I have an RDF store I can begin to put other bits of data in there. CANTCL already works on an RDF store populated from the package metadata. I could scan Tcl packages that I’ve written to generate something like my Tcl page. Drop some RDF-annotated JPEG images into a directory and I could have a photoblog; scan my MP3 library, read my bookmarks, grok my calendar. The key to making this easy to use, which derives from the original blosxom idea, is to leave the data in its original format (or invent a simple text file format) and leverage the file system to build structure into the weblog.
Augmenting Conversations Using Dual Purpose Speech
Augmenting Conversations Using Dual Purpose Speech. Kent Lyons, Christopher Skeels, Thad Starner, Cornelis M. Snoeck, Benjamin A. Wong, Daniel Ashbrook. College of Computing and GVU Center, Georgia Institute of Technology.
In this paper, we explore the concept of dual purpose speech: speech that is socially appropriate in the context of a human to human conversation which also provides meaningful input to a computer. We motivate the use of dual purpose speech and explore issues of privacy and technological challenges related to mobile speech recognition. We present three applications that utilize dual purpose speech to assist a user in conversational tasks: the Calendar Navigator Agent, DialogTabs, and Speech Courier. The Calendar Navigator Agent navigates a user's calendar based on socially appropriate speech used while scheduling appointments. DialogTabs allows a user to postpone cognitive processing of conversational material by providing short-term capture of transient information. Finally, Speech Courier allows asynchronous delivery of relevant conversational information to a third party.
