
BRENDAN T. O'CONNOR



Assistant Professor, College of Information and Computer Sciences
University of Massachusetts Amherst
Email: brenocon@cs.umass.edu
Twitter: @brendan642
Room 238, Computer Science Building, 140 Governors Drive, Amherst, MA 01003

-------------------------

I am an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, and affiliated with the Computational Social Science Institute, the Initiative in Cognitive Science, and the Centers for Data Science and Intelligent Information Retrieval.

LINKS: SLANG Lab, Teaching, CV, Bio, Talks, Notes, Misc.

CURRENT:

* Teaching in Spring 2020: COMPSCI 685: Advanced Topics in Natural Language Processing.

* Spring 2020 office hours: Wed 9:30-11am. If you're not in CS 685, you're welcome to come to my office hours, but please ask ahead of time in case I have had to change them.

I AM LOOKING FOR NEW GRADUATE STUDENTS FOR FALL 2020. Projects include statistical measurement from text, extracting subjective knowledge bases from social media, and inferring international politics from news.

TALKS ON CURRENT RESEARCH:

* Identifying police killings from text with distant supervision

* Demographic bias in social media language analysis: a case study of African-American English

PRESS COVERAGE:

* This AI reads the news to keep tabs on US police shootings, New Scientist, September 2017.

* AI Programs Are Learning to Exclude Some African-American Voices, MIT Technology Review, August 2017.

OTHER NEWS: I was recently awarded an NSF CAREER award for Social Aggregate Measurement from Text.

RESEARCH:

What can STATISTICAL TEXT ANALYSIS tell us about SOCIETY? I develop text analysis methods that can help answer social science questions. I'm interested in statistical machine learning and natural language processing, especially when informed by or applied to areas like political science or sociolinguistics. My work often uses text data from news and social media. See also my earlier research statement or publications below.

There is a rich set of other faculty at UMass interested in areas from computational social science to natural language processing. See the Computational Social Science Institute (CSSI) website, and this list of computation+language researchers and courses.

Some of my CURRENT PROJECTS include:

* Subjective Knowledge Bases

* Class prevalence analysis for inferring social aggregates from text

BACKGROUND:

I joined UMass after receiving my PhD from Carnegie Mellon University's Machine Learning Department. I have also been a Visiting Fellow at Harvard IQSS, and interned with the Facebook Data Science team. Before grad school, I worked on crowdsourced annotations at CrowdFlower / Dolores Labs, and natural language search at Powerset. I started studying the intersection of AI and social science as an undergrad and master's student in Stanford Symbolic Systems (cognitive science).

Link: Full bio.

PUBLICATIONS

(For others, see Google Scholar or my CV.)

* Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts. Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O'Connor, and Mohit Iyyer. Proceedings of EMNLP 2019.
  * Data
  * Press coverage: The Undefeated website, Twitter thread

* Query-focused Sentence Compression in Linear Time. Abram Handler and Brendan O'Connor. Proceedings of EMNLP 2019.

* Uncertainty-aware generative models for inferring document class prevalence. Katherine Keith and Brendan O'Connor. Proceedings of EMNLP 2018.
  * Website

* Twitter Universal Dependency Parsing for African-American and Mainstream American English. Su Lin Blodgett, Johnny Tian-Zheng Wei, and Brendan O'Connor. Proceedings of ACL 2018.

* Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses. Katherine Keith, Su Lin Blodgett, and Brendan O'Connor. Proceedings of NAACL 2018.

* Relational Summarization for Corpus Analysis. Abram Handler and Brendan O'Connor. Proceedings of NAACL 2018.

* Understanding the Representational Power of Neural Retrieval Models Using NLP Tasks. Daniel Cohen, Brendan O'Connor, and W. Bruce Croft. Proceedings of ICTIR 2018.

* Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation. Johnny Tian-Zheng Wei, Khiem Pham, Brian Dillon, and Brendan O'Connor. BlackboxNLP workshop at EMNLP 2018 (Analyzing and interpreting neural networks for NLP).

* A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election. Tao Sun, Daniel Sheldon, and Brendan O'Connor. Proceedings of ICDM 2017.

* Identifying civilians killed by police with distantly supervised entity-event extraction. Katherine Keith, Abram Handler, Michael Pinkham, Cara Magliozzi, Joshua McDuffie, and Brendan O'Connor. Proceedings of EMNLP 2017.
  * Website

* Rookie: A unique approach for exploring news archives. Abram Handler and Brendan O'Connor. Workshop on Data Science + Journalism at KDD 2017.
  * Website

* Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English. Su Lin Blodgett and Brendan O'Connor. Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) workshop at KDD 2017.

* A Dataset and Classifier for Recognizing Social Media English. Su Lin Blodgett, Johnny Tian-Zheng Wei, and Brendan O'Connor. 3rd Workshop on Noisy User-generated Text (WNUT) at EMNLP 2017. BEST PAPER AWARD.

* Learning to Extract Events from Knowledge Base Revisions. Alexander Konovalov, Benjamin Strauss, Alan Ritter, and Brendan O'Connor. Proceedings of WWW 2017.

* Bag of What? Simple Noun Phrase Extraction for Text Analysis. Abram Handler, Matthew J. Denny, Hanna Wallach, and Brendan O'Connor. NLP+CSS Workshop at EMNLP 2016.
  * _phrasemachine_ software
  * Slides (presentation at Text as Data conference, October 2016)

* Demographic Dialectal Variation in Social Media: A Case Study of African-American English. Su Lin Blodgett, Lisa Green, and Brendan O'Connor. Proceedings of EMNLP 2016.

* Improving Entity Ranking for Keyword Queries. John Foley, Brendan O'Connor, and James Allan. Proceedings of CIKM 2016.
  * Data

* Visualizing textual models with in-text and word-as-pixel highlighting. Abram Handler, Su Lin Blodgett, and Brendan O'Connor. At WHI 2016 - Workshop on Human Interpretability in Machine Learning (workshop at ICML 2016).

* Challenges of Visualizing Differentially Private Data. Dan Zhang, Michael Hay, Gerome Miklau, and Brendan O'Connor. At TPDP 2016 - Theory and Practice of Differential Privacy (workshop at ICML 2016).

* Posterior calibration and exploratory analysis for natural language processing models. Khanh Nguyen and Brendan O'Connor. Proceedings of EMNLP 2015.

* Diffusion of lexical variation in online social media. Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. PLOS-ONE, November 2014.
  * Also arXiv:1210.5268; an earlier version was from Oct. 2012, with a poster at the NIPS 2012 Workshop on Social Network and Social Media Analysis.

* Thesis: Statistical Text Analysis for Social Science. Brendan O'Connor. PhD Thesis, Carnegie Mellon University, 2014.

* MiTextExplorer: Linked brushing and mutual information for exploratory text data analysis. Brendan O'Connor. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, June 2014. (Proceedings of ACL 2014.)
  * Software website

* CMU: Arc-Factored, Discriminative Semantic Dependency Parsing. Sam Thomson, Brendan O'Connor, Jeffrey Flanigan, David Bamman, Jesse Dodge, Swabha Swayamdipta, Nathan Schneider, Chris Dyer, and Noah A. Smith. In SemEval-2014 (Proceedings of the International (COLING) Workshop on Semantic Evaluations, Dublin, Ireland, August 2014).

* Learning to Extract International Relations from Political Context. Brendan O'Connor, Brandon M. Stewart, and Noah A. Smith. Proceedings of ACL 2013.
  * Poster, slides, appendix, software

* Learning Latent Personae of Film Characters. David Bamman, Brendan O'Connor, and Noah A. Smith. Proceedings of ACL 2013.
  * Data

* Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters. Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith. Proceedings of NAACL 2013.
  * Software and data

* ARKref: a rule-based coreference resolution system. Brendan O'Connor and Michael Heilman. arXiv:1310.1975, Oct 2013.

* Learning Frames from Text with an Unsupervised Latent Variable Model. Brendan O'Connor. arXiv:1307.7382, Data Analysis Project report, Machine Learning Department, CMU. July 2013.

* A framework for (under)specifying dependency syntax without overloading annotators. Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, and Jason Baldridge. In Linguistic Annotation Workshop, 2013.

* Censorship and Deletion Practices in Chinese Social Media. David Bamman, Brendan O'Connor, and Noah A. Smith. In _First Monday_ 17.3, March 2012.
  * Press coverage: BBC, New Scientist, etc.

* Computational Text Analysis for Social Science: Model Assumptions and Complexity. Brendan O'Connor, David Bamman, and Noah A. Smith. In _NIPS Workshop on Computational Social Science and the Wisdom of Crowds_, Sierra Nevada, Spain, December 2011.

* Predicting a Scientific Community's Response to an Article. Dani Yogatama, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith. In Proceedings of EMNLP 2011.

* Part-of-speech tagging for Twitter: Annotation, Features, and Experiments. Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. In ACL-2011 (short paper).
  * Data and software

* A Mixture Model of Demographic Lexical Variation. Brendan O'Connor, Jacob Eisenstein, Eric P. Xing, and Noah A. Smith. In NIPS-2010 Workshop on Machine Learning and Social Computing.

* A Latent Variable Model for Geographic Lexical Variation. Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. In Proceedings of EMNLP 2010 (presentation).
  * Appendix
  * Data
  * Press coverage: New York Times, All Things Considered, BBC, Washington Post, Wall Street Journal, Associated Press, New Scientist, San Francisco Chronicle, Ars Technica, LA Weekly, MSNBC, etc.

* From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. In ICWSM-2010 (presentation).
  * Video, Slides
  * Press coverage: Pittsburgh Tribune-Review, Mashable, Ars Technica, New Scientist, CNN Tech, Fast Company, Science Now, Economic Times, BBC Radio 5 (at 13:00) and others.

* TweetMotif: Exploratory Search and Topic Summarization for Twitter. Brendan O'Connor, Michel Krieger, and David Ahn. In ICWSM-2010 (demo track).
  * Demo

* Superficial Data Analysis: Exploring Millions of Social Stereotypes. Brendan O'Connor and Lukas Biewald. In Beautiful Data, ed. Toby Segaran and Jeff Hammerbacher. O'Reilly Media. 2009.
  * Blog post, Data, Draft (with color)

* Cheap and Fast — But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng
