
My Research in Germany

Posted in Uncategorized on 2011 October 7 by Asad Sayeed

In my previous post I talked about the act of moving to Germany, and how it has been so far.  But I suppose some of you will want to know about my work life at the University of the Saarland.

Well, I’m appointed in the Department of Computational Linguistics and Phonetics (COLI), but as is the way of academia, a series of bodies are involved in my employment, and I don’t entirely understand their relationships yet.  I am apparently part of the Cluster of Excellence on Multimodal Computing and Interaction (MMCI), which is a joint project of a bunch of groups, including the Max Planck Institute and the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, the German Research Center for Artificial Intelligence).  My supervisor is Vera Demberg, a Junior Research Group Leader appointed as part of the MMCI package.

I chose to accept Vera’s offer back in the summer and come here because she is very active in bridging the gap between three things: the formal sort of linguistics that is near and dear to my heart (as some of you know), the psycholinguistics I’ve always wanted to get my fingers into, and the more practical-minded statistical efforts to build systems that represent the world robustly, which is where I focused my PhD dissertation work.  So our interests were a close match.

Right now, we’ve been defining our actual research project and goals, which has been a lot of fun.  There are pluses and minuses to starting a postdoc without an active project already in place (Vera’s other projects are less technologically oriented).  During my graduate career, I had multiple opportunities to define projects, both for my thesis and for grant- and internship-driven work, and for the most part it worked out well, so on balance I’m pretty happy to be here at the beginning.  Another advantage is that I don’t have to reverse-engineer someone else’s peculiar code.  Well, for now, at least.

I’ve had some catching up to do on recent developments in psycholinguistic and representational frameworks, and we’ve been focusing particularly on surprisal-based measures of cognitive load.  Surprisal has developed a burgeoning literature in computational psycholinguistics over the past several years, stemming particularly from foundational work by people like John Hale, now at Cornell.  As an information-theoretic measure, surprisal is one way to bridge the gap between formal representation and statistical robustness, something other measures of cognitive load (and statistical modeling generally) do not do as well.  The surprisal of a word is the logarithm of the reciprocal of its conditional probability given the preceding words, so as long as we have that conditional probability in the denominator, we can find the surprisal at any point in a string (assuming incremental parsing).  Then it becomes a matter of testing the predictions experimentally.
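To make the definition concrete, here is a minimal sketch, not our actual setup: the surprisal of word w_i is −log₂ P(w_i | w_1 … w_{i−1}), measured in bits. In this toy version the conditional probability comes from a bigram model with add-one smoothing, which I'm assuming purely as a stand-in for the incremental probabilistic parsers used in Hale-style work. The corpus and the function names (`train_bigram`, `surprisals`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; any list of whitespace-tokenized sentences works.
CORPUS = ["the dog barks", "the dog sleeps", "the cat sleeps"]


def train_bigram(corpus):
    """Count contexts and bigrams; "<s>" marks the start of each sentence."""
    context_counts = Counter()  # c(prev): how often prev is followed by a word
    bigram_counts = Counter()   # c(prev, word)
    vocab = set()               # possible next words (excludes "<s>")
    for sentence in corpus:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def surprisals(sentence, context_counts, bigram_counts, vocab):
    """Return (word, surprisal in bits) for each word, left to right.

    Assumes add-one smoothing: P(w | prev) = (c(prev, w) + 1) / (c(prev) + |V|).
    """
    tokens = ["<s>"] + sentence.split()
    v = len(vocab)
    result = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + v)
        result.append((word, -math.log2(p)))  # surprisal = log2(1 / p)
    return result


model = train_bigram(CORPUS)
for sentence in ["the dog barks", "the cat barks"]:
    scores = surprisals(sentence, *model)
    print(sentence, [f"{word}={s:.2f}" for word, s in scores])
```

Run on the toy corpus, "barks" gets about 1.81 bits after "dog" but about 2.58 bits after "cat", because "cat barks" never occurs in training. Higher surprisal at a word is what predicts higher processing load there, and swapping the bigram model for a richer incremental model changes only where the conditional probability comes from, not the surprisal computation itself.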

Beyond experimental testing, we’ve been looking at opportunities to apply these measures to various kinds of real-time information retrieval and user-interface tasks, particularly with transcribed or automatic speech recognition (ASR) output.  But here’s the catch: we want to augment the typically syntax-based surprisal measures with some kind of additional formal semantics, which will then allow for domain dependence in our applications.

We’re hoping not to build an entire experimental and processing pipeline from scratch, so we’ve been casting about for resources and collaborators, as well as looking for students from within the Saarland fold.  I suppose I’m biased, but psycholinguistics, statistical modeling, and syntactic/semantic formalism are finally converging on an interesting point where we can start to model natural natural language processing activities, so to speak, so I already see lots of exciting opportunities.  And I’ve only been here a month.