
Sentiment analysis seminar

Posted on 2009 September 9 by Asad Sayeed

I will attempt to blog some of the things I attend during the semester.  One of them is a weekly seminar on sentiment analysis taught by Philip Resnik.  I am in it right now.  This is therefore a liveblog and hence not guaranteed to make sense or be complete.  Especially not complete, far from it.

Tim Hawes’ thesis and conversational analysis

The first thing we’re talking about is Tim Hawes’ work for his Master’s degree, which he defended just yesterday.  I attended the defense, before I had even started this blog.  It was about predicting the outcome of Supreme Court (US) cases from transcripts of oral arguments.  This is particularly interesting today, as Philip just mentioned, since Sonia Sotomayor showed up for work at SCOTUS for the first time today.  I proposed on the mailing list that one further means of predicting how an individual justice would vote, even one who rarely says anything on the bench (true of some justices), would be their body of writing and argument prior to confirmation.  Philip proposed the use of a mixture model using prior argument, updated as the justice moves through his/her career.
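To make the "prior argument, updated over a career" idea a bit more concrete, here is a minimal sketch of one way it could work. This is my own guess at the shape of such a model, not anything Philip or Tim specified: the function name, the `prior_strength` parameter, and the idea of a single "votes to reverse" probability are all assumptions for illustration. It simply interpolates between an estimate from pre-confirmation writing and the rate observed on the bench, shifting weight toward the bench as votes accumulate.

```python
def mixed_vote_probability(prior_prob, bench_votes, prior_strength=10.0):
    """Blend a pre-confirmation estimate with votes observed on the bench.

    prior_prob     -- probability (0..1) of a given vote, estimated from the
                      justice's earlier writing (hypothetical input)
    bench_votes    -- list of 0/1 outcomes observed since confirmation
    prior_strength -- how many bench votes the prior is "worth" (assumed knob)
    """
    n = len(bench_votes)
    # Weight on the prior shrinks as the justice accumulates votes.
    prior_weight = prior_strength / (prior_strength + n)
    # With no votes yet, fall back entirely on the prior.
    bench_rate = sum(bench_votes) / n if n else prior_prob
    return prior_weight * prior_prob + (1.0 - prior_weight) * bench_rate


# A new justice: only the prior is available.
new_justice = mixed_vote_probability(0.8, [])
# Ten votes later, all going the other way: prior and record weigh equally.
after_ten = mixed_vote_probability(0.8, [0] * 10)
```

With these weights the result is the same as the posterior mean of a Beta-Bernoulli model whose prior has mean `prior_prob` and pseudo-count `prior_strength`, which is one reason this particular interpolation is a natural starting point.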

In the case of legal arguments, we have to make some assumptions.  Hawes’ thesis mentioned a couple of textual assumptions: cohesion and coherence.  That is, we assume that there are topical and other elements that evolve through the text in a consistent way.  There are techniques we can use to measure and segment a document based on these kinds of assumptions, such as TextTiling and lexical chaining.
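For the curious, the cohesion-based segmentation idea can be sketched roughly as follows. This is a heavily simplified illustration in the spirit of TextTiling, not the published algorithm and not what Hawes' thesis implemented: the tiny stopword list, the block size, and the boundary cutoff (mean plus half a standard deviation of the depth scores) are all arbitrary choices of mine for a toy example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "in", "on", "of", "and"}


def tokenize(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block_size=2):
    """Lexical similarity of the blocks on either side of each sentence gap."""
    sims = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - block_size):gap]
                       for w in tokenize(s))
        right = Counter(w for s in sentences[gap:gap + block_size]
                        for w in tokenize(s))
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """How deep each similarity 'valley' is relative to the peaks around it."""
    depths = []
    for i, s in enumerate(sims):
        left, j = s, i
        while j > 0 and sims[j - 1] >= left:      # climb to the left peak
            left, j = sims[j - 1], j - 1
        right, j = s, i
        while j < len(sims) - 1 and sims[j + 1] >= right:  # climb right
            right, j = sims[j + 1], j + 1
        depths.append((left - s) + (right - s))
    return depths


def segment(sentences, block_size=2):
    """Split sentences where cohesion drops sharply (toy cutoff, see lead-in)."""
    sims = gap_similarities(sentences, block_size)
    if not sims:
        return [list(sentences)]
    depths = depth_scores(sims)
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean + std / 2
    boundaries = [i + 1 for i, d in enumerate(depths) if d > cutoff]
    segments, start = [], 0
    for b in boundaries:
        segments.append(sentences[start:b])
        start = b
    segments.append(sentences[start:])
    return segments


sentences = [
    "The court heard the oral argument",
    "The justice questioned the argument in court",
    "The court argument ended",
    "Rain fell on the garden",
    "The garden rain watered flowers",
    "Flowers in the garden like rain",
]
segments = segment(sentences)
```

On this toy input the two blocks around the third gap share no content words, so the only boundary falls between the courtroom sentences and the garden sentences. Real transcripts would need proper tokenization, smoothing of the similarity curve, and a more principled cutoff.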

The distinction between cohesion and coherence: cohesion is about the surface links within a text (repeated words, pronouns, connectives), while coherence is a semantic value that really must be judged by a person, since it’s about interpretation.  It is possible to have cohesion without coherence. (We mentioned the word “zeugma”, look it up.)

The right place to start for discourse analysis from an NLP point of view is Rhetorical Structure Theory (RST).  Daniel Marcu has written on this topic.

Hawes’ thesis had two kinds of results.  One of these was encoded in “rose diagrams.”  These are modified pie charts for which each slice varies in radius as though it were a petal of a rose, and each petal is coloured differently in a gradient of shades.  In this representation, we can visualize a large number of things at once.  In the case of SCOTUS, each justice can be represented as a petal, whose colour represents political leaning, whose radius represents agreement, and whose width represents the number of follow-up turns at questioning.  It’s a bit of a complicated representation and hard to describe without a diagram, which I’m not about to draw on the fly.

While this form of visualization is quite complicated by itself, it can be used to make contrasts between types of cases and judicial situations, in which case it often produces very strong and visible contrasts.  Contrasts we can examine are between “liberal” vs. “conservative”, affirm vs. overturn, plaintiff win vs. lose, and so on.  It turns out that by this method, you can predict the vote of Clarence Thomas (who rarely speaks) to a high degree of accuracy.

We didn’t seem to get to the other technique Hawes used, but we had to change rooms.

Sentiment analysis

We didn’t end up changing rooms.

In this part of the seminar, we briefly touch on the basics of sentiment analysis, particularly with reference to Bing Liu’s recent review article.  So we begin with a discussion of the general dimensions and challenges of sentiment analysis, such as:

  • What do people think of _____?
  • What features/facets/topics matter?
  • Mixed and neutral sentiment.
  • The effect of comparatives.
  • How opinions change, and what influences this.
  • Covert opinion/spin vs. overt expressions.
  • The holders of opinion.
  • Multilingual and cross-language issues.

We then had a rather wide-ranging discussion of the different kinds of issues, including a detailed discussion of the syntactic relationships within sentences that might relate opinion-holders to sentiments and to opinion targets/objects.  This discussion was so wide-ranging and yet so compressed that it is hard to represent in a liveblog, but it covered issues that we will revisit in later sessions.
