The academic community I’m most heavily involved in – Digital Humanities – is fairly invested in Twitter. At all times of the day there are major figures, students, and newbies in the field on there, just hanging out, debating topics, and forwarding links to events, job postings, interesting research and cool things they have stumbled upon. People have studied this – graphing and charting the discussions, especially around the DH conference – and heck, even I have co-authored a paper on the subject.
I’m currently working on a book/project called Defining Digital Humanities and I thought, wouldn’t it be fun to get all – and I mean all – the tweets that contain the hashtag #digitalhumanities? What fun could be had charting the growth of the discipline, the geolocation of tweets, the networks that exist, the sentiments surrounding it, etc. etc. Now, hindsight is a grand thing – I should have thought to start scraping these back in 2006 – but surely it must be possible to get access to this for research? So I asked.
The first approach was to Gnip – who have “full historical access to the twitter firehose available exclusively”. They were really very helpful, and we got into a conversation about my needs, their licensing, and – of course – costs. The upshot is that if you want a hashtag, you can get it for a price, with the text delivered in JSON format. I was quoted between $15,000 and $25,000 for the full historical set. The final figure depends on the exact volume of the data, which they are now looking into: neither they nor I yet know how many tweets contain this hashtag.
The second place I asked was Datasift – “the leading platform for building applications with insights derived from the most popular social networks and news sources”. They do have access to the historical Twitter firehose, but they don’t do one-off searches, and licensing starts at $3,000 per month (on a yearly contract) to get access to it. They will be launching a pay-as-you-go service at some point, they tell me. By the way, you can get $10 worth of free credit for processing if you sign up and play around with some current searches: I set one up for #digitalhumanities and had run out of credit within a few hours. (I find the user interface very obfuscating – I’m still wrangling with it to see what that data actually is!)
Now, these costs are very little compared to the costs to access the full firehose, and let’s face it – a free service like Twitter has to make its money somewhere. These were not vexatious enquiries: I’d really like to do this study. But now I have to find $25k down the back of the sofa to get access to this data (and incidentally, if I do, I won’t be allowed to quote it, only to show the stats that emerge from the analysis). $25k is a fair whack of money in academia-land. It will also take around 6 months (at least) to write it into a grant proposal to raise the money – and how to persuade academic funders that buying this dataset is a good use of their money? Frankly, I’m not sure that will fly in the arts and humanities, where complete grant costings can come in under £100k for a one-year project.
Thinking caps are now on to see how we can get funding put together to get access to the data of the community I – goddammit – helped (in some small way) to create. I love Twitter with a passion and it continues to inform and aid my teaching and research. But when we invest so much in a free service, we are selling ourselves. It’s interesting to see how much #digitalhumanities is “worth” to others. Anyone got a free $25k?