Some of the corpora I use in my work include:
HERMES - A 100 million word randomised corpus of tweets, originally collected in 2009. I compiled a new version of this corpus in 2013. This corpus is used in Discourse of Twitter and Social Media.
Obama Win Corpus (OWC) - A corpus of 45,000 tweets containing the lexical item 'Obama', collected over the 24 hours after the announcement of Barack Obama’s victory in the 2008 US presidential election. This corpus is used in Ambient affiliation: A linguistic perspective on Twitter.
MORPHEUS - A 100 million word corpus of tweets about sleep!
LUCIA - The entire Twitter stream of a single user who writes about her experiences of motherhood.
Other corpora
Tweets2011 corpus (distributed as tweet IDs only - you need to reconstruct the corpus yourself once you have the IDs)
FSD corpus of tweets - This page includes code that enables you to download the FSD corpus of tweets.
Twitter Stratified Random Sample (SRS) - A time-stratified, random sample of tweets. The compilers sample at 10-minute intervals to build "a set of month-based corpora, each containing at least one million English tweets".
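To illustrate what "time-stratified" sampling means, here is a minimal sketch in Python. It is not the SRS compilers' actual procedure, which this page does not describe beyond the 10-minute interval. The function name `stratified_sample`, the `(timestamp, text)` tuple layout and the `per_bucket` parameter are all assumptions made for the example. The idea is simply to group tweets into 10-minute buckets and draw the same number at random from each bucket, so that busy periods do not dominate the sample.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def stratified_sample(tweets, per_bucket=1, bucket_seconds=600, seed=0):
    """Draw up to `per_bucket` random tweets from each time bucket.

    `tweets` is an iterable of (timezone-aware datetime, text) pairs.
    This layout is hypothetical, chosen only for this illustration.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for created_at, text in tweets:
        # Tweets in the same 10-minute window share a bucket key.
        key = int(created_at.timestamp()) // bucket_seconds
        buckets[key].append((created_at, text))

    sample = []
    for key in sorted(buckets):
        items = buckets[key]
        # A bucket may hold fewer tweets than requested.
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    return sample


# Toy data: six tweets spread over three 10-minute windows.
base = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
tweets = [(base + timedelta(minutes=m), f"tweet {m}") for m in (0, 3, 9, 10, 15, 25)]
sample = stratified_sample(tweets, per_bucket=1)
```

With the toy data above, `sample` holds one tweet from each of the three windows, whichever tweets the random draw happens to pick.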
Hi, may I ask if HERMES is publicly available? Thank you :)
Somehow I missed this comment - sorry. Unfortunately Twitter's Terms of Service doesn't allow me to share the corpus :(