Thu, 06 Nov 2008
Computational linguistics blogs
I write here very seldom, but I am at least trying to expand my web reading in CL. Blogs I am reading include those of Peter Turney, Hal Daumé III and Jason M. Adams. Here's a note of interest from Daumé, "Using machine learning to answer scientific questions":
There are several issues with trying to answer such questions [of whether or not a particular feature is useful for a task]. One issue is that typically what you're actually looking at is the question: can I figure out a way to turn WordNet into a feature vector that is useful for the task of coreference resolution? This is probably a partial explanation for why our community seems not to like negative results to such questions: maybe you just weren't sufficiently clever in encoding WordNet as features. Or maybe WordNet features are useful but there is some other set of features that's more useful that's just swamping the benefits of WordNet.
My first question is whether we're even going about this the right way. The usual approach is to take some baseline system, add WordNet features, and see if predictive performance goes up (i.e., performance on test data). This seems like a bit of a round-about way to attack the problem. After all, this problem of "does some feature have an influence on the target concept" is a classical and very well studied area of statistics: most people have probably seen ANOVA (analysis of variance), but there are many many more ways to try to address this question. And, importantly, they don't hinge on the notions of predictive performance. (Which almost immediately ties us in to "my system is better than yours.")
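For anyone who, like me, has only a hazy memory of what ANOVA computes, here is a minimal sketch of the one-way F statistic in plain Python. It is my own illustration, not anything from Daumé's post: the function name and the toy numbers are made up, and a real analysis would also need the p-value from the F distribution and a check of the usual assumptions (roughly normal, equal-variance groups).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups.

    Hypothetical helper for illustration: each inner list holds the
    observations made under one level of the feature being tested.
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation explained by group membership.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up scores, split by whether some feature was absent or present.
without_feature = [1, 2, 3]
with_feature = [2, 3, 4]
f_stat = one_way_anova_f([without_feature, with_feature])
print(f_stat)
```

A large F suggests the group means differ by more than the within-group noise would explain, which is a question about the feature's influence rather than about whether one system beats another on test data.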
posted at: 20:37 | path: /reading | permanent link to this entry
Wed, 14 Jun 2006
Statistics in language studies
At various points in my research I am going to be conducting experiments on
both human and computer competency at various linguistic tasks. Now, I have an
undergraduate mathematics major of indifferent competence. The main thing I can
say about it is that it gives some assurance that I can pick up mathematical
techniques to an undergraduate standard with the help of a sufficiently good
textbook; it doesn't say much about those techniques being stored ready-made in
my mind. And even given that, I don't have a statistics background beyond very
elementary high school level probability (the probability of picking a
particular k objects from n total objects given an equal chance
of any particular object being chosen, either ordered or unordered), thanks to
foolishly deciding that pure mathematics subjects were more worthy.
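To show just how elementary that is, the whole of it fits in a few lines of Python. This is a minimal sketch that assumes the objects are distinct and drawn without replacement; the function name is my own invention, and math.comb and math.perm need Python 3.8 or later.

```python
from math import comb, perm


def prob_particular_selection(n, k, ordered=False):
    """Probability of drawing one particular selection of k objects from n.

    Assumes n distinct objects, drawn without replacement, with every
    possible selection equally likely. Hypothetical helper for illustration.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # Ordered: count sequences, n! / (n - k)!.
    # Unordered: count sets, n! / (k! * (n - k)!).
    outcomes = perm(n, k) if ordered else comb(n, k)
    return 1 / outcomes


print(prob_particular_selection(5, 2))                # one set out of 10
print(prob_particular_selection(5, 2, ordered=True))  # one sequence out of 20
```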
So, in order to have a reasonable grasp of the basics of experimental design I
decided to have a look around for books. This is a difficult search:
statistical textbooks range widely in readability, audience, quality and
difficulty. At the moment I've settled on:
Woods, Anthony, Paul Fletcher & Arthur Hughes. 1986, Statistics in Language Studies, Cambridge University Press, Cambridge, United Kingdom.
Aside from dating from a time prior (probably just prior) to the assumption that computers knew everything and that one would never need tables of the normal distribution ever again, I've found it quite useful in elucidating the basic terms of statistics and sampling from an experimenter's point of view. I suspect I will want to read at least one follow-up about choosing samples from human populations, as the authors tend to recommend leaving this to a survey statistician and I'm curious. But it's gotten me the first step up the ladder.
posted at: 15:29 | path: /reading | permanent link to this entry
Wed, 07 Jun 2006
Abstract of the week: Turney 2002
I'm in the early stages of my literature review at the moment: the stage where
the mountain of papers left to read seems to be growing exponentially, and they
all blur into one another a bit. (Or at least, at the moment I'm hoping that
this is a common stage, and not just me.) In this moment, a light in the
darkness:
Turney, Peter D. 2002, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, Pennsylvania, USA. July 8-10, 2002. pp. 417–424.
Specifically, go and look at the abstract of this paper? Isn't that delightful? It's a good, clear summary of the paper. After a week in the darkness, I'm happy to see that today.
posted at: 11:26 | path: /reading | permanent link to this entry
