Tweak the Tweet - Social and Technical Considerations

Tweak the Tweet

Tweak the Tweet is an idea for utilizing the Twitter platform as a two-way communication channel for information during emergencies, crises, and disasters. Researchers in the area of crisis informatics have recognized that social media sites are places that people turn to during major events to both inform others and to get information from others. Tweak the Tweet seeks to formalize some of these communications to make the information shared more easily processed and redistributed back to the public.

The idea takes advantage of the public nature of Twitter as well as the availability of tools to filter and collect tweets. It also seeks to allow users to inform the public of disaster-related information within (or in a very similar way to) their normal Twitter communication patterns.

Crisis Reporting Hashtag Syntax

A Tweak the Tweet campaign asks users to format their tweets with specific hashtags that allow computers to do a first round of processing on the information. This processing includes extracting location information, creating incident reports from tweets, and sorting these reports into categories. The processed tweets can then be displayed on public web pages in a variety of formats that allow users to view aggregate information. Examples include spreadsheets that can be sorted by report type and interactive maps that show where different types of information have been reported.

TWEET-BEFORE: roads from PAP to les Cayes are open migration from PAP to rural areas has begun

TWEET-AFTER: #haiti #open roads from #loc PAP to les Cayes are open #info migration from PAP to rural areas has begun

This tells the computer:

what = road
what about it = open
where = PAP to les Cayes
what else = “migration from PAP to rural areas has begun”

TWEET-BEFORE: Altagrace Pierre needs help at Delmas 14 House no. 14.

TWEET-AFTER: #haiti #name Altagrace Pierre #need help #loc Delmas 14 House no. 14.

This tells the computer:

what = need help
who = Altagrace Pierre
where = Delmas 14 House no. 14.
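As a rough illustration of how a computer can read the second example, here is a minimal Python sketch that treats each hashtag as a key and the text up to the next hashtag as its value. The function name parse_tweak and the single regular expression are assumptions made for this example; they are not the code we run in production, which needs extra handling for messier tweets such as the first example.

```python
import re

# "#tag" followed by free text up to the next hashtag (or the end of the tweet).
TAG_PATTERN = re.compile(r"#(\w+)\s*([^#]*)")


def parse_tweak(text):
    """Return a dict mapping each hashtag to the text that follows it."""
    fields = {}
    for tag, value in TAG_PATTERN.findall(text):
        fields[tag.lower()] = value.strip()
    return fields


record = parse_tweak(
    "#haiti #name Altagrace Pierre #need help #loc Delmas 14 House no. 14."
)
print(record)
# {'haiti': '', 'name': 'Altagrace Pierre', 'need': 'help', 'loc': 'Delmas 14 House no. 14.'}
```

A tag with nothing after it, like the event tag #haiti here, simply ends up with an empty value.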

To implement a Tweak the Tweet instance during an event, there are two crucial elements that must be successfully implemented: a social media campaign that instructs and motivates people to use the format, and the technical tools to collect, process, and redistribute the information created.

Section 1: Social Media Campaign

Perhaps not surprisingly, our initial construction of this idea under-estimated the importance and complexity of this phase. To leverage Twitter as a semi-formal communication channel, you have to shape the behavior of a large number of users during what is likely to be a very stressful time for them. After launching Tweak the Tweet for four events, we realize that this is the most critical piece of the system, and we are still brainstorming new and better ideas for how to support behavior shaping - getting users to effectively and efficiently use the TtT format, without hampering their ability to communicate in a way similar to their normal Twitter patterns.

Teaching the format

There are several approaches to teaching the format. Our strategy has been to use as many different methods as possible.

1) Prescriptive tweets. Send out prescriptive tweets at regular intervals. (Like the ones in our example above).

2) Translated tweets. Translate other tweets into the TtT format and retweet.

3) Automatic retweets. Retweet any tweet that correctly uses the format. We often use software - a python script - to do this.

4) Webpage. Maintain a webpage with an explanation of TtT, prescriptive tweets, and examples.

5) Syntax editor. Maintain a web application that helps users use the correct hashtags and format their tweets.

6) Mobile client. Create a Twitter client (web and mobile) that helps users format their tweets and then allows them to send a Twitter update from the application. 

Promoting the format

One advantage of Tweak the Tweet over anonymous applications that use private channels is that the use of the format is automatically broadcast on the public information stream. When one user begins to use TtT, other users can see the reports. This can work both to promote and to teach. A possible way of further leveraging this effect is to find influential Twitterers among the affected community and encourage them to use the format. During the aftermath of the Chilean earthquake, we were able to encourage an influential Chilean Twitterer, one who was listed by CNN in their Twitter list for that event, to tweet our prescriptive tweets and examples. This led to a rapid uptake among Chilean Twitterers and a large number of reports tweeted in the TtT format.

Things that work

Being part of the community is extremely important to implementing a successful TtT campaign. Community can take on different meanings here: there is a geographical community (those that live along the Gulf Coast during this Oil Disaster, for example); there is a cultural community (the Haitian diaspora was extremely important in helping to communicate, educate and direct relief activities through Twitter in the wake of the 2010 earthquake); and there is a virtual community of supporters and volunteers that develops after each and every event. Coordinating with the affected community and at the very least seeking their input and feedback into the development of the format is extremely important. What hashtags are being used? Are any others needed? What kind of things are being reported? What kind of things may need to be reported later?

As we mentioned above, it can also be very helpful to locate other Twitterers who are part of the community, especially potential influencers, and encourage them to help with the TtT campaign by using the format and retweeting examples and prescriptive tweets.

Another component that has emerged as being important during these events is interaction. Twitter is not merely a broadcast medium for one-way communication. The Tweak the Tweet idea relies on the platform's ability to enable and encourage two-way communication. This may mean more than simply promoting and moving tweaked tweets. It often means interacting with the people reporting to you. Many tweet for multiple reasons. Obviously, they want to inform others about what they see, hear, smell, etc. But many are also looking for support as they deal with catastrophic events that are hitting too close to home. They want to know that someone is listening and they appreciate words of support, encouragement, and consolation. We are beginning to think that this is a key reason that people turn to profile-linked social media like Twitter for reporting - and may prefer it over anonymous reporting through web and mobile applications.

Section 2: Software to collect, process, re-distribute tweet info

On to the easy stuff. The technical part of Tweak the Tweet is outlined below. There are four components: collecting tweets, processing tweets into records, storing records, and displaying the collected information / records. The solution described represents just that - a solution. There are definitely others, probably more efficient ones.

A. Collect the tweets

The first thing you have to do is find and grab all of the TtT tweets related to the event you are monitoring. 

Twitter provides two APIs - tool sets - that allow you to search all public tweets. Our solution uses the Twitter Streaming API, which tracks tweets from the public timeline in real-time. This means that we must maintain a connection to avoid losing data. We tend to have back-up scripts running from multiple computers and use this redundancy to minimize down time and data loss. 

Another approach is to use the Twitter Search API, which looks back in time to find tweets using certain hashtags. While we have found its results to be poorer than those from the Streaming API, this is a way of avoiding data loss when connections go down. You may want to build even more redundancy by collecting from both the Streaming API and the Search API, then removing duplicates.
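Removing duplicates after collecting from two sources can be as simple as keeping the first copy of each tweet ID. The sketch below assumes each collected tweet has already been loaded as a Python dict with a unique "id" key; the function name merge_tweets is made up for this example and no Twitter API calls are shown.

```python
def merge_tweets(streamed, searched):
    """Combine two tweet lists, keeping the first copy of each tweet ID."""
    seen = set()
    merged = []
    for tweet in list(streamed) + list(searched):
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            merged.append(tweet)
    return merged


from_stream = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
from_search = [{"id": 2, "text": "second"}, {"id": 3, "text": "third"}]
print([t["id"] for t in merge_tweets(from_stream, from_search)])
# [1, 2, 3]
```

Putting the streamed tweets first means that, when both sources have a tweet, the streamed copy is the one that is kept.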

For our collection methods, we first filter across hashtags that we identify people using for the event. We have had more success tapping into existing event hashtags than trying to prescribe a new event hashtag. As an event progresses, we may have to add new tags to our first filter group. For instance, for the Oil Spill, we began with #oilspill, then added our own #oilreport tag, then added #oildisaster and #gulfcoast after other Twitterers recommended them to us.

To home in on specifically tweaked tweets, we then filter across what we are calling a "primary" Tweak the Tweet hashtag. This might be something like #need for the Haiti event, or #wildlife for the Oil Spill event. So we cut the data set down to tweets that have both an event tag and a primary TtT hashtag. Some non-tweaked tweets still sneak through, but this does a good general cut.
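The two cuts described above (an event tag plus a primary TtT tag) might be sketched like this in Python. The event tags are the Oil Spill ones mentioned above and #wildlife is the one primary tag named there; a real instance would list more primary tags. The names hashtags and is_candidate are assumptions for this example.

```python
import re

# Event hashtags we filtered on for the Oil Spill (these grow as the event progresses).
EVENT_TAGS = {"oilspill", "oilreport", "oildisaster", "gulfcoast"}
# Primary TtT hashtags; only #wildlife is taken from the text, add others per event.
PRIMARY_TAGS = {"wildlife"}


def hashtags(text):
    """Return the set of lower-cased hashtags found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def is_candidate(text, event_tags=EVENT_TAGS, primary_tags=PRIMARY_TAGS):
    """True if the tweet has at least one event tag AND one primary TtT tag."""
    tags = hashtags(text)
    return bool(tags & event_tags) and bool(tags & primary_tags)


print(is_candidate("#oilspill #wildlife oiled pelican #loc Grand Isle"))  # True
print(is_candidate("#oilspill cleanup crews arriving"))                   # False
```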

We cut again after the record phase and make sure that the records match typical tweaked tweet structure. We'll talk about that later.

B. Process the tweets into records

The next part of the process takes the tweets and parses them into records. These records may differ slightly across events, but they include tweet text, author, timestamp, primary tag type, location info, contact info, name of missing person, other source of info, etc. This is a multi-layered process as well. Here is how I am currently doing it.

Parse and Filter 

1) Every 20 minutes, a separate (Ruby) script goes through and processes all new tweets. This script parses the tweets into the fields and records.

2) Turn each instance of "#<hashtag> <text>" into a key-value pair and populate a database with those fields/values. This is done using a series of reg-exes and some hacky code for special situations that seem to arise for each event.

3) Retrieve any location info in the metadata and put that into the GPS, place, and bounding box fields of the database.

4) Parse (if possible) textual location into GPS lat/long. These will be stored in another GPS lat/long field.

     Order of use:

1st - #lat/#long tags from the tweet text will be used when available

2nd - lat/long extrapolated from the textual location will be used (there are several free tools that claim to help with this; Google and Yahoo both have one)

3rd - if neither of the above is available, the GPS location from the tweet meta-data will be used

HOWEVER, you may want to privilege automatic GPS tags from a mobile device - in which case the 3rd entry here would be first accepted. It depends on whether you're taking into account translated tweets where other users add geolocation info and TtT tags.

5) Filter - If a parsed record has a minimum number of fields with information, we keep it. If not, we discard it. How we privilege fields and set the minimum varies across events.
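The filter in step 5 could look something like the Python sketch below. The list of counted fields and the minimum of two are placeholders, not our actual settings, since both change from event to event; keep_record is a name invented for this example, and records are assumed to be dicts of strings.

```python
def keep_record(record, min_fields=2, counted_fields=("need", "loc", "name", "info")):
    """Keep a record only if enough of the counted fields hold some text."""
    filled = sum(1 for field in counted_fields if (record.get(field) or "").strip())
    return filled >= min_fields


print(keep_record({"need": "help", "loc": "Delmas 14 House no. 14."}))  # True
print(keep_record({"need": "help", "loc": ""}))                         # False
```

Privileging a field could be handled the same way, for example by refusing any record whose location field is empty regardless of the count.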

Alternative filters

Remove Retweets - in most cases, we want to remove all retweets. However, if people are translating and adding tweak tags to these retweets, we might want to keep them. I use some really hacky code to determine this. Whether or not you include this step depends on your orientation towards crowd-sourcing the TtT process.

Remove duplicate records - if a record has a match - same type of report, similar place, similar time - you may want to remove it. This is important when you are accepting translated tweets - and multiple Twitterers are translating the same reports.

Identify spam - we've been doing this manually. If your instance works too well, you'll need to do more of this. See SpillMap for how spam can infiltrate an anonymous reporting system. I assume it can do the same to a Twitter reporting system. There may be computational solutions for some of this - but I can't point you to them. Sorry.

Identify possible bad information using Twitter profile info - same as above.
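For the duplicate-record filter above (same type of report, similar place, similar time), one possible check is sketched here in Python. It assumes each record already has a "type", numeric "lat" and "lon", and a datetime "time"; those key names, the 1 km radius, and the one-hour window are all placeholders to tune per event. Distance is computed with the standard haversine formula.

```python
import math
from datetime import datetime, timedelta


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_duplicate(a, b, max_km=1.0, max_gap=timedelta(hours=1)):
    """True if two records share a type and are close in both space and time."""
    return (
        a["type"] == b["type"]
        and distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
        and abs(a["time"] - b["time"]) <= max_gap
    )


# Illustrative records only; the coordinates are made up.
first = {"type": "wildlife", "lat": 29.2, "lon": -90.0, "time": datetime(2010, 6, 1, 12, 0)}
second = {"type": "wildlife", "lat": 29.2, "lon": -90.0, "time": datetime(2010, 6, 1, 12, 10)}
print(is_duplicate(first, second))  # True
```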

C. Store records

We use a MySQL database on the back-end for the records. We also maintain a public Google Spreadsheet with these records. Every 20 minutes, a script (the same one as above) updates the Google Spreadsheet with any new records. It may also update existing records if more information has arrived - for instance if someone has retweeted a report and added #lat/#long info.
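To show the "insert new records, fill in existing ones" behavior, here is a small Python sketch. It uses the standard-library sqlite3 module purely as a stand-in so the example runs anywhere; it is not our MySQL setup, and the table layout, the record_id key, and the function names are all assumptions. It also assumes the new information has already been matched to the record it belongs to, which is a separate problem when the extra info arrives in a retweet.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a tiny record store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "record_id INTEGER PRIMARY KEY, type TEXT, loc TEXT, lat REAL, lon REAL)"
    )
    return conn


def save_record(conn, rec):
    """Insert a new record, or fill in fields that are still empty on an old one."""
    values = (rec.get("type"), rec.get("loc"), rec.get("lat"), rec.get("lon"))
    exists = conn.execute(
        "SELECT 1 FROM records WHERE record_id = ?", (rec["record_id"],)
    ).fetchone()
    if exists is None:
        conn.execute(
            "INSERT INTO records (record_id, type, loc, lat, lon) VALUES (?, ?, ?, ?, ?)",
            (rec["record_id"],) + values,
        )
    else:
        # COALESCE keeps the stored value and only fills in columns that are NULL.
        conn.execute(
            "UPDATE records SET type = COALESCE(type, ?), loc = COALESCE(loc, ?), "
            "lat = COALESCE(lat, ?), lon = COALESCE(lon, ?) WHERE record_id = ?",
            values + (rec["record_id"],),
        )
    conn.commit()


conn = open_store()
save_record(conn, {"record_id": 1, "type": "wildlife", "loc": "Grand Isle"})
save_record(conn, {"record_id": 1, "lat": 29.2, "lon": -90.0})  # made-up coordinates
print(conn.execute("SELECT * FROM records").fetchall())
```

This version never overwrites a value that is already stored; whether later information should replace earlier information is a policy choice for each instance.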

D. Map tweets

We have a webpage that uses HTML/JavaScript, takes info from our public Google Spreadsheet, and maps it. We offer several different views, and most use different color markers for different types of records. When a user clicks on a marker, they are able to see the record as well as the original tweet, timestamp, and tweet author.
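If you would rather feed a map from a file than from a spreadsheet, one alternative (not what our page does) is to export the records as GeoJSON, which many web mapping libraries can read. The Python sketch below is written under that assumption; the record keys, the color table, and the function name to_geojson are hypothetical. Note that GeoJSON puts longitude before latitude.

```python
import json

# Hypothetical marker colors per record type.
MARKER_COLORS = {"wildlife": "green", "need": "red"}


def to_geojson(records, default_color="gray"):
    """Turn records with lat/lon into a GeoJSON FeatureCollection of points."""
    features = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # records without coordinates cannot be placed on the map
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {
                "report_type": rec.get("type"),
                "text": rec.get("text"),
                "author": rec.get("author"),
                "timestamp": rec.get("timestamp"),
                "color": MARKER_COLORS.get(rec.get("type"), default_color),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up record for illustration.
sample = [{"type": "wildlife", "lat": 29.2, "lon": -90.0,
           "text": "#oilspill #wildlife oiled pelican", "author": "someuser",
           "timestamp": "2010-06-01T12:00:00"}]
print(json.dumps(to_geojson(sample), indent=2))
```

The properties carry everything the marker pop-up needs: the original tweet, its timestamp, and its author.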

Other mapping possibilities

E. Extra steps

For the @OilReport account, we had trouble getting users to add lat/long info and/or to turn on automatic geo-tagging. We have wizard-of-ozzed our data collection by doing a good deal of manual geolocation adding. If a record has no geolocation information, we:

1st - check the tweet stream to determine an approx. location for the tweet. If we can find some clues, we look up the lat/long coordinates and enter those directly into the database record.

2nd - if that doesn't work, we use the formal Twitter account for our research group or the event to tweet the author of the tweet record and ask whether he/she will give a location in a tweet reply or DM

Coding help

We currently have coded solutions for each of these pieces in operation for Oil Spill/Oil Disaster reporting through our OilReport account and webpages. However, as you can tell from the above, there are contingencies for every event and places where human intervention is often needed to maintain the system. Additionally, each event will require slightly different software solutions depending upon the structure of the TtT format used. Email us and we will work with you to share our code and insight and get your instance up and running.


First person reporting vs. crowd-sourcing/translating

accept RTs w/ value added or exclude all RTs

privilege geo-tags in meta-data, or location info in text

(possibly) use geo-tags in meta-data as a filter - only include GPS-tagged tweets

(Only about 3-5% of tweets we've seen for #oilspill have GPS data in the meta-data. In this scenario, a huge part of your TtT campaign may be to get Twitterers to enable geo-tagging - from both their Twitter profile and their mobile client).
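Whichever way you decide to privilege location sources, it helps to keep the order in one place so it can be switched per event. This Python sketch assumes each record carries up to three candidate locations under the hypothetical keys "tag" (#lat/#long in the text), "geocoded" (lat/long derived from textual location), and "metadata" (GPS from the tweet meta-data); choose_location is a name made up for the example.

```python
# The order of use described earlier: tags in the text, then geocoded text, then meta-data.
TEXT_FIRST = ("tag", "geocoded", "metadata")
# Alternative: trust automatic GPS tags from the device before anything typed by a user.
DEVICE_FIRST = ("metadata", "tag", "geocoded")


def choose_location(candidates, order=TEXT_FIRST):
    """Return (source, (lat, lon)) for the first available source, or None."""
    for source in order:
        coords = candidates.get(source)
        if coords is not None:
            return source, coords
    return None


# Made-up coordinates for illustration.
candidates = {"tag": None, "geocoded": (18.5, -72.3), "metadata": (18.6, -72.4)}
print(choose_location(candidates))                # ('geocoded', (18.5, -72.3))
print(choose_location(candidates, DEVICE_FIRST))  # ('metadata', (18.6, -72.4))
```

Using GPS meta-data as a hard filter would amount to an order containing only "metadata" and discarding any record for which the function returns None.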

Links to Examples

OilReport map

Spreadsheet for tweet records from Chile after Earthquake

Project EPIC website