.

Saturday, March 9, 2019

Open Domain Event Extraction from Twitter

rough Do main causa Extraction from chirrup Alan Ritter University of majuscule Computer Sci. & Eng. Seattle, WA emailprotected washington. edu Mausam University of Washington Computer Sci. & Eng. Seattle, WA emailprotected washington. edu Oren Etzioni University of Washington Computer Sci. & Eng. Seattle, WA emailprotected washington. edu surface-to-air missile Clark? Decide, Inc. Seattle, WA sclark. emailprotected com ABSTRACT Tweets atomic number 18 the close up-to- duration and inclusive pour of in castingation and commentary on up-to-the-minute grammatical cases, exclusively they ar too fragmented and strident, motivating the need for transcriptions that nominate deplumate, substance and categorise faceful positions.Previous hit on leave offing complex body partd re affordations of even offts has foc hearty functiond king-sizely on intelligence informationwire schoolbook twitters unequalled characteristics rescue new-fangled cont residues and op portunities for splay- champaign answer decline. This intelligence operationprint describes TwiCal the ? rst open-domain detail- declension and miscellany brass for chirp. We evince that ide every(prenominal)y survivaling an open-domain calendar of signi? slang slips from Twitter is indeed feasible. In concomitant, we present a novel get along for discoering definitive resultant categories and disuniteing extracted payoffs found on practicable variable pretences.By leveraging large spates of unlabeled entropy, our advancement achieves a 14% extend in supreme F1 oer a supervised service line. A continuously updating reflexion of our system can be viewed at http//statuscalendar. com Our human actors line technology tools argon forthcoming at http//github. com/aritter/ twitter_nlp. Entity Steve Jobs iPhone GOP Amanda Knox Event Phrase died announcement debate verdict crack 10/6/11 10/4/11 9/7/11 10/3/11 Type goal increaseLaunch PoliticalEvent Trial panel 1 Examples of particulars extracted by TwiCal. vents. Yet the number of crushs affix planetary has upstartly exceeded two-hundred million, m some(prenominal) of which atomic number 18 either special 57, or of limited interest, leading to information oerload. 1 Clearly, we can bene? t from to a greater extent structured representations of effects that atomic number 18 synthesized from individual tweets. Previous practise in termination fall 21, 1, 54, 18, 43, 11, 7 has focused largely on news articles, as historically this genre of text has been the best source of information on current matters.Read withal Twitter Case StudyIn the meantime, genial networking sites such as Facebook and Twitter involve up become an fundamental complementary source of such information. While status messages contain a wealth of effective information, they argon very disorganized motivating the need for automatic extraction, aggregation and smorgasbord. Although in that respect has been much interest in tracking trends or memes in fond media 26, 29, little work has addressed the challenges arising from extracting structured representations of compositors cases from short or intimate texts.Extracting useful structured representations of events from this disorganized head t for each oneer of noisy text is a challenging problem. On the other hand, individual tweets be short and self-collected and ar accordingly non composed of complex discourse structure as is the case for texts containing narratives. In this paper we demonstrate that open-domain event extraction from Twitter is indeed feasible, for grammatical case our nobleest-con? dence extracted futurity events argon 90% accurate as demonstrated in 8.Twitter has several characteristics which present unique challenges and opportunities for the assess of open-domain event extraction. Challenges Twitter users much mention mundane events in their routine lives (such as what they ate fo r lunch) which are to a greater extentover of interest to their immediate hearty network. In contrast, if an event is mentioned in newswire text, it 1 http//blog. twitter. com/2011/06/ 200-million-tweets-per-day. html Categories and Subject Descriptors I. 2. 7 Natural Language bear on Language parsing and understanding H. 2. Database Management Database applications entropy mining General ground algorithmic rules, Experimentation 1. INTRODUCTION Social networking sites such as Facebook and Twitter present the some up-to- get word information and buzz some current ? This work was conducted at the University of Washington Permission to make digital or lumbering copies of all or part of this work for personal or schoolroom use is granted without fee provided that copies are not made or distributed for pro? t or commercial advantage and that copies bear this pick up and the full citation on the ? rst page.To copy otherwise, to republish, to post on servers or to redistribute to leans, requires prior speci? c permission and/or a fee. KDD12, imposing 1216, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6 /12/08 $10. 00. is safe to assume it is of general importance. single tweets are also very terse, lots lacking su? cient context to categorize them into topics of interest (e. g. Sports, Politics, Product rout out etc ). Further because Twitter users can talk about any(prenominal) they choose, it is unclear in advance which inflexible of event casings are al imprint for. eventually, tweets are written in an informal style ca victimisation natural language processing tools designed for edited texts to make out extremely poorly. Opportunities The short and self-contained nature of tweets mode they get very simple discourse and pragmatic structure, issues which still challenge state-of-the-art natural language processing systems. For example in newswire, complex reasoning about dealing between events (e. g. before and later ) is oft required to accurately relate events to worldly expressions 32, 8. The volume of Tweets is also much larger than the volume of news articles, so redundancy of information can be secondhand much easily.To address Twitters noisy style, we follow youthful work on natural language processing in noisy text 46, 31, 19, notation a corpus of Tweets with events, which is because used as develop selective information for sequence-labeling role models to identify event mentions in millions of messages. Because of the terse, some generation mundane, but highly redundant nature of tweets, we were motivated to focus on extracting an aggregate representation of events which provides admissional context for tasks such as event categorization, and also ? lters out mundane events by croping redundancy of information.We propose identifying important events as those whose mentions are beardown(prenominal)ly associated with references to a unique date as in arrogate to dates which are evenl y distributed across the calendar. Twitter users saucers a wide variety of topics, qualification it unclear in advance what position of event typewrites are admit for categorization. To address the diversity of events discussed on Twitter, we introduce a novel approach to discovering important event types and categorizing aggregate events inside a new domain. administer or semi-supervised approaches to event categorization would require ? st designing observation guidelines (including selecting an appropriate set of types to footnote), then annotation a large corpus of events undercoat in Twitter. This approach has several drawbacks, as it is apriori unclear what set of types should be annotated a large touchstone of e? ort would be required to manual(a)ly annotate a corpus of events firearm simultaneously re? ning annotation standards. We propose an approach to open-domain event categorization found on possible variable models that uncovers an appropriate set of types which match the info.The automatically sight types are subsequently inspected to ? lter out any which are incoherent and the rest are annotated with informative labels2 examples of types discovered victimization our approach are attended in ? gure 3. The resulting set of types are then applied to categorize hundreds of millions of extracted events without the use of any manually annotated examples. By leveraging large quantities of unlabeled data, our approach results in a 14% improvement in F1 score over a supervised baseline which uses the alike(p) set of types. Stanford NER T-seg P 0. 62 0. 73 R 0. 5 0. 61 F1 0. 44 0. 67 F1 inc. 52% bow 2 By training on in-domain data, we obtain a 52% improvement in F1 score over the Stanford Named Entity Recognizer at segmenting entities in Tweets 46. 2. establishment OVERVIEW TwiCal extracts a 4-tuple representation of events which includes a named entity, event sound out, calendar date, and event type (see Table 1). This representation was chosen to closely match the way important events are typically mentioned in Twitter. An overview of the various components of our system for extracting events from Twitter is presented in Figure 1.Given a raw stream of tweets, our system extracts named entities in acquaintance with event joints and unambiguous dates which are involved in signi? slope events. initiative the tweets are POS tagged, then named entities and event parlances are extracted, temporal role expressions take rootd, and the extracted events are categorise into types. Finally we measure the strength of familiarity between each named entity and date based on the number of tweets they co- materialize in, in decree to determine whether an event is signi? formalism.NLP tools, such as named entity segmenters and part of speech taggers which were designed to motion edited texts (e. g. news articles) complete very poorly when applied to Twitter text due to its noisy and unique style. To address these i ssues, we implement a named entity tagger and part of speech tagger trained on in-domain Twitter data presented in front work 46. We also develop an event tagger trained on in-domain annotated data as described in 4. 3. NAMED ENTITY SEGMENTATION NLP tools, such as named entity segmenters and part of speech taggers which were designed to process edited texts (e. g. ews articles) perform very poorly when applied to Twitter text due to its noisy and unique style. For instance, capitalization is a key feature for named entity extraction within news, but this feature is highly unreliable in tweets run-in are often capitalized solely for emphasis, and named entities are often left all lowercase. In addition, tweets contain a higher proportion of out-ofvocabulary words, due to Twitters 140 character limit and the creative spelling of its users. To address these issues, we hold a named entity tagger trained on in-domain Twitter data presented in previous work 46. Training on tweets vast ly improves mental process at segmenting Named Entities. For example, performance compared against the state-of-the-art news-trained Stanford Named Entity Recognizer 17 is presented in Table 2. Our system obtains a 52% increase in F1 score over the Stanford tabloidger at segmenting named entities. 4. EXTRACTING EVENT MENTIONS This annotation and ? ltering takes borderline e? ort. one of the authors spent roughly 30 minutes inspecting and annotating the automatically discovered event types. 2 In order to extract event mentions from Twitters noisy text, we ? st annotate a corpus of tweets, which is then 3 Available at http//github. com/aritter/twitter_nlp. Temporal Resolution S M T W T F S Tweets POS Tag NER Signi? cance Ranking Calendar Entries Event Tagger Event Classi? cation Figure 1 Processing pipeline for extracting events from Twitter. New components developed as part of this work are shaded in grey. used to train sequence models to extract events. While we drill an establ ished approach to sequence-labeling tasks in noisy text 46, 31, 19, this is the ? rst work to extract eventreferring phrases in Twitter.Event phrases can consist of many di? erent parts of speech as illustrated in the following examples Verbs Apple to Announce iPhone 5 on October quaternary? YES Nouns iPhone 5 announcement coming Oct 4th Adjectives WOOOHOO NEW IPHONE TODAY crumbT WAIT These phrases provide important context, for example extracting the entity, Steve Jobs and the event phrase died in connection with October 5th, is much more informative than simply extracting Steve Jobs. In addition, event mentions are helpful in upstream tasks such as categorizing events into types, as described in 6.In order to build a tagger for recognizing events, we annotated 1,000 tweets (19,484 tokens) with event phrases, following annotation guidelines similar to those developed for the Event tags in Timebank 43. We grapple the problem of recognizing event triggers as a sequence labeling task, victimisation conditional Random Fields for development and inference 24. Linear Chain CRFs model dependencies between the predicted labels of adjacent words, which is bene? cial for extracting multi-word event phrases.We use contextual, dictionary, and orthographic features, and also include features based on our Twitter-tuned POS tagger 46, and dictionaries of event terms ga in that locationd from WordNet by Sauri et al. 50. The preciseness and forswear at segmenting event phrases are report in Table 3. Our classi? er, TwiCal-Event, obtains an F-score of 0. 64. To demonstrate the need for in-domain training data, we compare against a baseline of training our system on the Timebank corpus. preciseness 0. 56 0. 48 0. 24 riposte 0. 74 0. 70 0. 11 F1 0. 64 0. 57 0. 15 TwiCal-Event No POS TimebankTable 3 clearcutness and disengage at event phrase extraction. two results are reported using 4-fold cross validation over the 1,000 manually annotated tweets (about 19K tokens) . We compare against a system which doesnt make use of features generated based on our Twitter trained POS Tagger, in addition to a system trained on the Timebank corpus which uses the said(prenominal) set of features. as input a reference date, some text, and parts of speech (from our Twitter-trained POS tagger) and marks temporal expressions with unambiguous calendar references. Although this mostly rule-based system was designed for use on newswire text, we ? d its precision on Tweets (94% estimated over as savor of 268 extractions) is su? ciently high to be useful for our purposes. TempExs high precision on Tweets can be explained by the fact that some temporal expressions are relatively unambiguous. Although there appears to be room for improving the unsay of temporal extraction on Twitter by handling noisy temporal expressions (for example see Ritter et. al. 46 for a list of over 50 spelling variations on the word tomorrow), we leave adapting temporal extraction to Twitte r as potential future work. . CLASSIFICATION OF EVENT TYPES To categorize the extracted events into types we propose an approach based on latent variable models which infers an appropriate set of event types to match our data, and also classi? es events into types by leveraging large amounts of unlabeled data. oversee or semi-supervised classi? cation of event categories is problematic for a number of reasons. First, it is a priori unclear which categories are appropriate for Twitter. Secondly, a large amount of manual e? ort is required to annotate tweets with event types.Third, the set of important categories (and entities) is in all likelihood to shift over time, or within a focused user demographic. Finally many important categories are relatively infrequent, so even a large annotated dataset may contain just a few examples of these categories, making classi? cation di? cult. For these reasons we were motivated to investigate un- 5. EXTRACTING AND RESOLVING TEMPORAL EXPRESSIONS In addition to extracting events and related named entities, we also need to extract when they occur. In general there are many di? rent ways users can refer to the same calendar date, for example next Friday, August 12th, tomorrow or yesterday could all refer to the same day, depending on when the tweet was written. To resolve temporal expressions we make use of TempEx 33, which takes Sports company TV Politics glory Music film Food Concert dischargeance Fitness discourse Product expel Meeting Fashion Finance School phonograph album unfreeze Religion 7. 45% 3. 66% 3. 04% 2. 92% 2. 38% 1. 96% 1. 92% 1. 87% 1. 53% 1. 42% 1. 11% 1. 01% 0. 95% 0. 88% 0. 87% 0. 85% 0. 85% 0. 78% 0. 71% Con? ct Prize Legal Death bargain VideoGame introduce Graduation Racing Fundraiser/Drive Exhibit Celebration Books Film fountain/Closing Wedding Holiday Medical Wrestling OTHER 0. 69% 0. 68% 0. 67% 0. 66% 0. 66% 0. 65% 0. 63% 0. 61% 0. 60% 0. 60% 0. 60% 0. 58% 0. 50% 0. 49% 0. 46% 0. 45% 0. 42% 0. 41% 53. 45% Label Sports Concert Perform TV Movie Sports Politics Figure 2 Complete list of automatically discovered event types with percentage of data covered. Interpretable types representing signi? cant events cover roughly half of the data. supervised approaches that result automatically brace event types which match the data.We adopt an approach based on latent variable models inspired by recent work on clay sculpture selectional preferences 47, 39, 22, 52, 48, and unsupervised information extraction 4, 55, 7. Each event indicator phrase in our data, e, is modeled as a mixture of types. For example the event phrase cheered might appear as part of either a PoliticalEvent, or a SportsEvent. Each type mark offs to a distribution over named entities n involved in speci? c instances of the type, in addition to a distribution over dates d on which events of the type occur. Including calendar dates in our model has the e? ct of encouraging (though not requiring) events which occ ur on the same date to be assigned the same type. This is helpful in guiding inference, because apparent references to the same event should also take up the same type. The generative story for our data is based on LinkLDA 15, and is presented as Algorithm 1. This approach has the advantage that information about an event phrases type distribution is shared across its mentions, while equivocalness is also naturally preserved. In addition, because the approach is based on generative a probabilistic model, it is straightforward to perform many di? rent probabilistic queries about the data. This is useful for example when categorizing aggregate events. For inference we use collapsed Gibbs ingest 20 where each hidden variable, zi , is sampled in turn, and parameters are integrated out. Example types are vauntinged in Figure 3. To estimate the distribution over types for a stipulation event, a sample of the corresponding hidden variables is taken from the Gibbs markov chain after s u? cient burn in. Prediction for new data is performed using a blow approach to inference 56. TV Product MeetingTop 5 Event Phrases tailgate scrimmage tailgating homecoming regular season concert presale performs concerts tickets matinee musical priscilla visual perception wicked new season season ? nale ? nished season incidents new episode watch love dialogue theme inception hall discharge movie inning innings pitched homered homer presidential debate osama presidential candidate republican debate debate performance network news broadcast airing primetime drama channel stream unveils unveiled announces launches wraps o? instals trading hall mtg zoning brie? g stocks tumbled trading report loose higher tumbles maths english trial exam revise physical science in stores album out debut album drops on hits stores voted o? idol scotty idol season dividendpaying sermon advocateing preached worship preach declared war war shelling o pened ? re weakened senate legislation repeal budget election victors lotto results enter winner contest bail plea murder trial sentenced plea convicted ? lm festival screening starring ? lm gosling live forever passed outside(a) sad news condolences burried add into 50% o? up exaltation save up donate tornado recess disaster relief donated raise money Top 5 Entities espn ncaa tigers eagles varsity taylor swift toronto britney spears rihanna contestation shrek les mis lee evans wicked broadway jersey shore true blood mirth dvr hbo net? ix black swan insidious tron scott pilgrim mlb red sox yankees match dl obama president obama gop cnn america nbc espn abc fox mtv apple google microsoft uk sony township hall city hall club commerce white contribute reuters new york u. . china euro english maths german bio twitter itunes ep uk virago cd lady gaga american idol america beyonce gloating church jesus p astor faith god libya afghanistan syria syria nato senate house relative obama gop ipad award facebook good luck winners casey anthony solicit india new delhi supreme court hollywood nyc la los angeles new york michael jackson afghanistan john lennon young calmness groupon early bird facebook etsy etsy japan red cross joplin june africaFinance School Album TV Religion Con? ict Politics Prize Legal Movie Death Sale Drive 6. 1 Evaluation To evaluate the tycoon of our model to classify signi? cant events, we gathered 65 million extracted events of the form Figure 3 Example event types discovered by our model. For each type t, we list the top 5 entities which have highest probability effrontery t, and the 5 event phrases which assign highest probability to t. Algorithm 1 productive story for our data involving event types as hidden variables.Bayesian Inference techniques are applied to invert the generative process and infer an appropriate set of types to describe the observed events. for each event type t = 1 . . . T do n Generate ? t consort to cruciate Dirichlet distribution Dir(? n ). d Generate ? t according to bilaterally symmetrical Dirichlet distribution Dir(? d ). end for for each unique event phrase e = 1 . . . E do Generate ? e according to Dirichlet distribution Dir(? ). for each entity which co-occurs with e, i = 1 . . . Ne do n Generate ze,i from Multinomial(? e ). Generate the entity ne,i from Multinomial(? n ). e,i TwiCal-Classify Supervised Baseline Precision 0. 85 0. 61 Recall 0. 55 0. 57 F1 0. 67 0. 59 Table 4 Precision and recall of event type categorization at the point of maximum F1 score. d,i end for end for 0. 6 end for for each date which co-occurs with e, i = 1 . . . Nd do d Generate ze,i from Multinomial(? e ). Generate the date de,i from Multinomial(? zn ). Precision 0. 8 1. 0 listed in Figure 1 (not including the type). We then ran Gibbs Sampling with one hundred types for 1,000 iterations of burni n, keeping the hidden variable assignments found in the last sample. One of the authors manually inspected the resulting types and assigned them labels such as Sports, Politics, MusicRelease and so on, based on their distribution over entities, and the event words which assign highest probability to that type. bug out of the 100 types, we found 52 to correspond to coherent event types which referred to signi? cant events5 the other types were either incoherent, or covered types of events which are not of general interest, for example there was a cluster of phrases such as applied, call, contact, job interview, etc hich correspond to users discussing events related to searching for a job. Such event types which do not correspond to signi? cant events of general interest were simply marked as OTHER. A complete list of labels used to annotate the automatically discovered event types along with the coverage of each type is listed in ? gure 2. Note that this assignment of labels to type s only needs to be through once and produces a labeling for an arbitrarily large number of event instances. additionally the same set of types can easily be used to lassify new event instances using streaming inference techniques 56. One interesting worry for future work is automatic labeling and coherence evaluation of automatically discovered event types analogous to recent work on topic models 38, 25. In order to evaluate the ability of our model to classify aggregate events, we sort out together all (entity,date) pairs which occur 20 or more times the data, then annotated the 500 with highest association (see 7) using the event types discovered by our model. To help demonstrate the bene? s of leveraging large quantities of unlabeled data for event classi? cation, we compare against a supervised Maximum Entropy baseline which makes use of the 500 annotated events using 10-fold cross validation. For features, we treat the set of event phrases To scale up to larger datasets, we performed inference in parallel on 40 cores using an approximation to the Gibbs Sampling procedure analogous to that presented by Newmann et. al. 37. 5 After labeling some types were combined resulting in 37 distinct labels. 4 0. 4 Supervised Baseline TwiCal? Classify 0. 0 0. 2 0. 4 Recall 0. 0. 8 Figure 4 types. Precision and recall predicting event that co-occur with each (entity, date) pair as a bag-of-words, and also include the associated entity. Because many event categories are infrequent, there are often few or no training examples for a category, leading to low performance. Figure 4 compares the performance of our unsupervised approach to the supervised baseline, via a precision-recall curve obtained by varying the threshold on the probability of the most probably type. In addition table 4 compares precision and recall at the point of maximum F-score.Our unsupervised approach to event categorization achieves a 14% increase in maximum F1 score over the supervised baseline. Figure 5 plots the maximum F1 score as the amount of training data used by the baseline is varied. It seems likely that with more data, performance will reach that of our approach which does not make use of any annotated events, however our approach both automatically discovers an appropriate set of event types and provides an initial classi? er with minimal e? ort, making it useful as a ? rst step in situations where annotated data is not straightaway available. . RANKING EVENTS Simply using frequency to determine which events are signi? cant is insu? cient, because many tweets refer to common events in users daily lives. As an example, users often mention what they are eating for lunch, therefore entities such as McDonalds occur relatively much in association with references to most calendar days. Important events can be distinguished as those which have strong association with a unique date as oppose to being spread evenly across days on the calendar. To extract signi? ant eve nts of general interest from Twitter, we thus need some way to measure the strength of association between an entity and a date. In order to measure the association strength between an 0. 8 0. 2 Supervised Baseline TwiCal? Classify 100 200 300 400 tweets. We then added the extracted triples to the dataset used for inferring event types described in 6, and performed 50 iterations of Gibbs sampling for predicting event types on the new data, holding the hidden variables in the original data constant. This streaming approach to inference is similar to that presented by Yao et al. 56. We then class-conscious the extracted events as described in 7, and randomly sampled 50 events from the top graded 100, 500, and 1,000. We annotated the events with 4 separate criteria 1. Is there a signi? cant event involving the extracted entity which will take place on the extracted date? 2. Is the most frequently extracted event phrase informative? 3. Is the events type represently classi? ed? 4. Ar e each of (1-3) correct? That is, does the event contain a correct entity, date, event phrase, and type? Note that if (1) is marked as incorrect for a speci? event, subsequent criteria are always marked incorrect. Max F1 0. 4 0. 6 Training Examples Figure 5 Maximum F1 score of the supervised baseline as the amount of training data is varied. entity and a speci? c date, we utilize the G log likelihood ratio statistic. G2 has been argued to be more appropriate for text analysis tasks than ? 2 12. Although Fishers Exact test would produce more accurate p-values 34, given the amount of data with which we are working (sample size greater than 1011 ), it proves di? cult to compute Fishers Exact Test Statistic, which results in ? ating point over? ow even when using 64-bit operations. The G2 test works su? ciently well in our setting, however, as computing association between entities and dates produces less sparse contingency tables than when working with pairs of entities (or words). Th e G2 test is based on the likelihood ratio between a model in which the entity is conditioned on the date, and a model of independence between entities and date references. For a given entity e and date d this statistic can be computed as follows G2 = x? e,e,y? d,d 2 8. 2 BaselineTo demonstrate the importance of natural language processing and information extraction techniques in extracting informative events, we compare against a simple baseline which does not make use of the Ritter et. al. named entity recognizer or our event recognizer instead, it considers all 1-4 grams in each tweet as candidate calendar entries, relying on the G2 test to ? lter out phrases which have low association with each date. 8. 3 Results The results of the evaluation are displayed in table 5. The table shows the precision of the systems at di? rent yield levels (number of aggregate events). These are obtained by varying the thresholds in the G2 statistic. Note that the baseline is only comparable to the third column, i. e. , the precision of (entity, date) pairs, since the baseline is not performing event identi? cation and classi? cation. Although in some cases ngrams do correspond to informative calendar entries, the precision of the ngram baseline is extremely low compared with our system. In many cases the ngrams dont correspond to salient entities related to events they often consist of single words which are di? ult to interpret, for example break which is part of the movie dark Breaking Dawn released on November 18. Although the word Breaking has a strong association with November 18, by itself it is not very informative to present to a user. 7 Our high-con? dence calendar entries are surprisingly high quality. If we limit the data to the 100 highest ranked calendar entries over a two-week date dress in the future, the precision of extracted (entity, date) pairs is quite good (90%) an 80% increase over the ngram baseline.As expected precision drops as more calendar entr ies are displayed, but 7 In addition, we notice that the ngram baseline tends to produce many near-duplicate calendar entries, for example Twilight Breaking, Breaking Dawn, and Twilight Breaking Dawn. While each of these entries was annotated as correct, it would be problematic to show this many entries describing the same event to a user. Ox,y ? ln Ox,y Ex,y Where Oe,d is the observed fraction of tweets containing both e and d, Oe,d is the observed fraction of tweets containing e, but not d, and so on.Similarly Ee,d is the expected fraction of tweets containing both e and d assuming a model of independence. 8. EXPERIMENTS To estimate the quality of the calendar entries generated using our approach we manually evaluated a sample of the top 100, 500 and 1,000 calendar entries occurring within a 2-week future window of November 3rd. 8. 1 Data For evaluation purposes, we gathered roughly the 100 million most recent tweets on November 3rd 2011 (collected using the Twitter Streaming API6 , and tracking a broad set of temporal keywords, including today, tomorrow, names of weekdays, months, etc. ).We extracted named entities in addition to event phrases, and temporal expressions from the text of each of the 100M 6 https//dev. twitter. com/docs/streaming-api Mon Nov 7 Justin meet former(a) Motorola Pro+ kick Product Release Nook Color 2 launch Product Release Eid-ul-Azha celebrated Performance MW3 midnight release former(a) Tue Nov 8 Paris love otherwise iPhone holding Product Release Election daylight vote Political Event Blue slue Park listening Music Release Hedley album Music Release Wed Nov 9 EAS test otherwise The Feds cut o? separate Toca Rivera promoted Performance Alert System test Other Max Day give OtherNovember 2011 Thu Nov 10 Fri Nov 11 Robert Pattinson iPhone show debut Performance Product Release James Murdoch Remembrance Day give evidence open Other Performance RTL-TVI France post play TV Event Other Gotti Live Veterans Day work closed Other Other Bambi Awards Skyrim perform arrives Performance Product Release Sat Nov 12 Sydney perform Other Pullman Ballroom promoted Other Fox ? ght Other Plaza party Party Red Carpet invited Party Sun Nov 13 Playstation answers Product Release Samsung Galaxy Tab launch Product Release Sony answers Product Release Chibi Chibi Burger other Jiexpo Kemayoran promoted TV EventFigure 6 Example future calendar entries extracted by our system for the week of November 7th. Data was collected up to November 5th. For each day, we list the top 5 events including the entity, event phrase, and event type. While there are several errors, the majority of calendar entries are informative, for example the Muslim vacation eid-ul-azha, the release of several videogames Modern Warfare 3 (MW3) and Skyrim, in addition to the release of the new playstation 3D display on Nov 13th, and the new iPhone 4S in Hong Kong on Nov 11th. calendar entries 100 500 1,000 ngram baseline 0. 50 0. 6 0. 44 entity + date 0. 90 0. 66 0. 52 precision event phrase event 0. 86 0. 56 0. 42 type 0. 72 0. 54 0. 40 entity + date + event + type 0. 70 0. 42 0. 32 Table 5 Evaluation of precision at di? erent recall levels (generated by varying the threshold of the G2 statistic). We evaluate the top 100, 500 and 1,000 (entity, date) pairs. In addition we evaluate the precision of the most frequently extracted event phrase, and the predicted event type in association with these calendar entries. Also listed is the fraction of cases where all predictions (entity + date + event + type) are correct.We also compare against the precision of a simple ngram baseline which does not make use of our NLP tools. Note that the ngram baseline is only comparable to the entity+date precision (column 3) since it does not include event phrases or types. remains high enough to display to users (in a ranked list). In addition to being less likely to come from extraction errors, highly ranked entity/date pairs are more likely to relate to popu lar or important events, and are therefore of greater interest to users. In addition we present a sample of extracted future events on a calendar in ? ure 6 in order to give an example of how they might be presented to a user. We present the top 5 entities associated with each date, in addition to the most frequently extracted event phrase, and highest probability event type. 9. RELATED WORK While we are the ? rst to study open domain event extraction within Twitter, there are two key related strands of research extracting speci? c types of events from Twitter, and extracting open-domain events from news 43. Recently there has been much interest in information extraction and event identi? cation within Twitter. Benson et al. 5 use distant supervision to train a intercourse extractor which identi? es artists and venues mentioned within tweets of users who list their location as New York City. Sakaki et al. 49 train a classi? er to recognize tweets reporting earthquakes in Japan they demonstrate their system is capable of recognizing almost all earthquakes reported by the Japan Meteorological Agency. Additionally there is recent work on detecting events or tracking topics 29 in Twitter which does not extract structured representations, but has the advantage that it is not limited to a narrow domain. Petrovi? t al. investigate a streaming approach to identic fying Tweets which are the ? rst to report a breaking news story using topically Sensitive Hash Functions 40. Becker et al. 3, Popescu et al. 42, 41 and Lin et al. 28 investigate discovering clusters of related words or tweets which correspond to events in progress. In contrast to previous work on Twitter event identi? cation, our approach is independent of event type or domain and is thus more widely applicable. Additionally, our work focuses on extracting a calendar of events (including those occurring in the future), extract- . 4 Error Analysis We found 2 main causes for why entity/date pairs were newsle ss for display on a calendar, which occur in roughly equal proportion Segmentation Errors Some extracted entities or ngrams dont correspond to named entities or are generally uninformative because they are mis-segmented. Examples include RSVP, Breaking and Yikes. Weak Association between Entity and betrothal In some cases, entities are properly segmented, but are uninformative because they are not strongly associated with a speci? c event on the associated date, or are involved in many di? rent events which detect to occur on that day. Examples include locations such as New York, and frequently mentioned entities, such as Twitter. ing event-referring expressions and categorizing events into types. Also relevant is work on identifying events 23, 10, 6, and extracting timelines 30 from news articles. 8 Twitter status messages present both unique challenges and opportunities when compared with news articles. Twitters noisy text presents serious challenges for NLP tools. On the other hand, it contains a higher proportion of references to present and future dates.Tweets do not require complex reasoning about traffic between events in order to place them on a timeline as is typically necessary in long texts containing narratives 51. Additionally, unlike News, Tweets often discus mundane events which are not of general interest, so it is crucial to exploit redundancy of information to assess whether an event is signi? cant. Previous work on open-domain information extraction 2, 53, 16 has mostly focused on extracting dealing (as opposed to events) from web corpora and has also extracted tellings based on verbs.In contrast, this work extracts events, using tools adapted to Twitters noisy text, and extracts event phrases which are often adjectives or nouns, for example Super Bowl Party on Feb 5th. Finally we note that there has recently been increasing interest in applying NLP techniques to short informal messages such as those found on Twitter. For example, recen t work has explored Part of Speech tagging 19, geographical variation in language found on Twitter 13, 14, modeling informal conversations 44, 45, 9, and also applying NLP techniques to help crisis workers with the ? ood of information following natural disasters 35, 27, 36. 1. ACKNOWLEDGEMENTS The authors would like to thank Luke Zettlemoyer and the anon. reviewers for helpful feedback on a previous draft. This research was supported in part by NSF grant IIS-0803481 and ONR grant N00014-08-1-0431 and carried out at the University of Washingtons Turing Center. 12. REFERENCES 1 J. Allan, R. Papka, and V. Lavrenko. On-line new event signal detection and tracking. In SIGIR, 1998. 2 M. Banko, M. J. Cafarella, S. Soderl, M. Broadhead, and O. Etzioni. fall in information extraction from the web. In In IJCAI, 2007. 3 H. Becker, M. Naaman, and L. Gravano. Beyond trending topics Real-world event identi? ation on twitter. In ICWSM, 2011. 4 C. Bejan, M. Titsworth, A. Hickl, and S. Harabagiu. Nonparametric bayesian models for unsupervised event coreference resolution. In NIPS. 2009. 5 E. Benson, A. Haghighi, and R. Barzilay. Event discovery in social media feeds. In ACL, 2011. 6 S. Bethard and J. H. Martin. Identi? cation of event mentions and their semantic class. In EMNLP, 2006. 7 N. Chambers and D. Jurafsky. Template-based information extraction without the templates. In proceedings of ACL, 2011. 8 N. Chambers, S. Wang, and D. Jurafsky. Classifying temporal relations between events. In ACL, 2007. 9 C. Danescu-Niculescu-Mizil, M. Gamon, and S. Dumais. Mark my words Linguistic style appointment in social media. In Proceedings of WWW, pages 745754, 2011. 10 A. Das Sarma, A. Jain, and C. Yu. Dynamic kin and event discovery. In WSDM, 2011. 11 G. Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel, and R. Weischedel. The Automatic Content Extraction (ACE) ProgramTasks, Data, and Evaluation. LREC, 2004. 12 T. Dunning. straight methods for the statistics of su rprise and coincidence. Comput. Linguist. , 1993. 13 J. Eisenstein, B. OConnor, N. A. Smith, and E. P. Xing.A latent variable model for geographic lexical variation. In EMNLP, 2010. 14 J. Eisenstein, N. A. Smith, and E. P. Xing. Discovering sociolinguistic associations with structured sparsity. In ACL-HLT, 2011. 15 E. Erosheva, S. Fienberg, and J. La? erty. Mixed-membership models of scienti? c publications. PNAS, 2004. 16 A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In EMNLP, 2011. 17 J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by gibbs sampling. In ACL, 2005. 18 E. Gabrilovich, S. Dumais, and E.Horvitz. Newsjunkie providing personalized newsfeeds via analysis of information novelty. In WWW, 2004. 19 K. Gimpel, N. Schneider, B. OConnor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-speech tagging 10. CONCLUSIONS W e have presented a scalable and open-domain approach to extracting and categorizing events from status messages. We evaluated the quality of these events in a manual evaluation showing a clear improvement in performance over an ngram baseline We proposed a novel approach to categorizing events in an open-domain text genre with unknown types.Our approach based on latent variable models ? rst discovers event types which match the data, which are then used to classify aggregate events without any annotated examples. Because this approach is able to leverage large quantities of unlabeled data, it outperforms a supervised baseline by 14%. A possible avenue for future work is extraction of even richer event representations, while maintaining domain independence. For example grouping together related entities, classifying entities in relation to their roles in the event, thereby, extracting a frame-based representation of events.A continuously updating demonstration of our system can be vi ewed at http//statuscalendar. com Our NLP tools are available at http//github. com/aritter/twitter_nlp. 8 http//newstimeline. googlelabs. com/ 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 for twitter Annotation, features, and experiments. In ACL, 2011. T. L. Gri? ths and M. Steyvers. Finding scienti? c topics. Proc Natl Acad Sci U S A, 101 Suppl 1, 2004. R. Grishman and B. Sundheim. Message understanding company 6 A design history.In Proceedings of the International Conference on Computational Linguistics, 1996. Z. Kozareva and E. Hovy. Learning arguments and supertypes of semantic relations using recursive patterns. In ACL, 2010. G. Kumaran and J. Allan. Text classi? cation and named entities for new event detection. In SIGIR, 2004. J. D. La? erty, A. McCallum, and F. C. N. Pereira. Conditional random ? elds Probabilistic models for segmenting and labeling sequence data. In ICML, 2001. J. H. Lau, K. Grieser, D. Newman, and T. Baldwin. Automatic labelling of top ic models. In ACL, 2011. J.Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD, 2009. W. Lewis, R. Munro, and S. Vogel. Crisis mt Developing a cookbook for mt in crisis situations. In Proceedings of the Sixth Workshop on Statistical Machine Translation, 2011. C. X. Lin, B. Zhao, Q. Mei, and J. Han. PET a statistical model for popular events tracking in social communities. In KDD, 2010. J. Lin, R. Snow, and W. Morgan. Smoothing techniques for adaptive online language models Topic tracking in tweet streams. In KDD, 2011. X. Ling and D. S. Weld.Temporal information extraction. In AAAI, 2010. X. Liu, S. Zhang, F. Wei, and M. Zhou. Recognizing named entities in tweets. In ACL, 2011. I. Mani, M. Verhagen, B. Wellner, C. M. Lee, and J. Pustejovsky. Machine learning of temporal relations. In ACL, 2006. I. Mani and G. Wilson. Robust temporal processing of news. In ACL, 2000. R. C. Moore. On log-likelihood-ratios and the signi? cance of rare even ts. In EMNLP, 2004. R. Munro. Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol. In CoNLL, 2011. G. Neubig, Y. Matsubayashi, M. Hagiwara, and K.Murakami. Safety information mining what can NLP do in a disaster -. In IJCNLP, 2011. D. Newman, A. U. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent dirichlet allocation. In NIPS, 2007. D. Newman, J. H. Lau, K. Grieser, and T. Baldwin. Automatic evaluation of topic coherence. In HLT-NAACL, 2010. ? e D. O S? aghdha. Latent variable models of selectional preference. In ACL, ACL 10, 2010. S. Petrovi? , M. Osborne, and V. Lavrenko. Streaming c ? rst story detection with application to twitter. In HLT-NAACL, 2010. 41 A. -M. Popescu and M. Pennacchiotti.Dancing with the stars, nba games, politics An exploration of twitter users response to events. In ICWSM, 2011. 42 A. -M. Popescu, M. Pennacchiotti, and D. A. Paranjpe. Extracting events and event descriptions from twitter. In WWW, 20 11. 43 J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, and M. Lazo. The TIMEBANK corpus. In Proceedings of Corpus Linguistics 2003, 2003. 44 A. Ritter, C. Cherry, and B. Dolan. unattended modeling of twitter conversations. In HLT-NAACL, 2010. 45 A. Ritter, C. Cherry, and W. B. Dolan.Data-driven response generation in social media. In EMNLP, 2011. 46 A. Ritter, S. Clark, Mausam, and O. Etzioni. Named entity recognition in tweets An experimental study. EMNLP, 2011. 47 A. Ritter, Mausam, and O. Etzioni. A latent dirichlet allocation method for selectional preferences. In ACL, 2010. 48 K. Roberts and S. M. Harabagiu. Unsupervised learning of selectional restrictions and detection of argument coercions. In EMNLP, 2011. 49 T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users real-time event detection by social sensors. In WWW, 2010. 50 R. Saur? R.Knippen, M. Verhagen, and ? , J. Pustejovsky. Evita a robust ev ent recognizer for qa systems. In HLT-EMNLP, 2005. 51 F. Song and R. Cohen. distort interpretation in the context of narrative. In Proceedings of the ninth depicted object conference on Arti? cial intelligence Volume 1, AAAI91, 1991. 52 B. Van Durme and D. Gildea. Topic models for corpus-centric friendship generalization. In Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, 2009. 53 D. S. Weld, R. Ho? mann, and F. Wu. utilise wikipedia to bootstrap open information extraction. SIGMOD Rec. , 2009. 54 Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR 98, 1998. 55 L. Yao, A. Haghighi, S. Riedel, and A. McCallum. Structured relation discovery using generative models. In EMNLP, 2011. 56 L. Yao, D. Mimno, and A. McCallum. E? cient methods for topic model inference on stream ing document collections. In KDD, 2009. 57 F. M. Zanzotto, M. Pennaccchiotti, and K. Tsioutsiouliklis. Linguistic redundancy in twitter. In EMNLP, 2011.

No comments:

Post a Comment