Showcasing the possibilities of using hashtags for investigations.
Twitter is a helpful platform to share and receive news, promote events and relay information in real time. From conferences and concerts to demonstrations, Twitter comes in handy for journalists, activists, NGOs, businesses, celebrities, news portals and the public in general. While Twitter does not follow the controversial 'real name' policy of Facebook, and has better privacy ratings than other social media platforms, surveillance concerns are nevertheless substantial. Hashtags are a very efficient feature to file, pool, and find information, but there is another side to this feature. Hashtags and those who use them are vulnerable to surveillance and social graphing.
In this article we aim to showcase the possibilities of using hashtags for investigations, and also raise awareness around possible threats so users can make informed choices on how to use Twitter. We focus on two security conferences, Black Hat and DEF CON, in the period between 30 July and 7 August 2016. By extracting a sample of tweets posted to the hashtags #Bhusa and #Defcon, we analyse the accounts of users who attended the conferences and map their locations and networks using Twitter's API, the mapping tool Carto and the visualisation platform Gephi.
Monitoring conferences through hashtags
Clicking on an event’s hashtag shows all the user accounts that tweeted about it using that particular hashtag, whether they were attending or not. If a user shares photos or videos from the event that aren't official conference material, they are most probably physically present. The same applies if the location they share matches the conference's location, or if their name appears in the event's list of speakers or participants.
Once you have identified a conference participant, it is also possible to understand the user’s background and network through analysing the people they follow and their followers if their Twitter profile is public.
Using open source tools, it is possible to understand the background and the network of participants attending a specific conference through monitoring a hashtag, analysing their tweets, their followers and those they are following. This is not only important to show the investigative possibilities of a hashtag, but also to demonstrate how easy it is to monitor attendees and expose their networks on Twitter.
Attending a conference can seem harmless to many people, but that is not the case worldwide. Restrictions apply in many countries for a variety of reasons, and in some cases serious repercussions have been documented. Examples range from visa denials by the UK government to the 2008 case of a Tunisian human rights lawyer who was stopped from boarding a flight on the way to attend the Arab Free Press Forum. In 2003 a group of 41 people were arrested by the Government of Sudan on their way to attend a conference. More recently, in 2016, a Palestinian journalist was arrested by Israeli forces on his way to attend the European Federation of Journalists in Sarajevo.
For the sake of this article we initially thought of using the Internet Freedom Festival (IFF), a conference that takes place annually, as an example in order to also raise awareness of how relatively simple it can be to map out activists and their networks when they tweet publicly at an event they are attending. However, we did not want to expose individual participants' networks by using them as a case study. We then decided to focus on an event attended by companies involved in the surveillance industry. Black Hat and DEF CON were the two most relevant and time-sensitive events we could identify in the period in which we were researching this case study.
On a conference hashtag, tweets can share the following information: the date of the event, its location, content and resources, images and videos from the event, opinions about the event, networks (revealed when users include others in their tweets by @-mentioning them) and much more.
Twitter's API (Application Programming Interface) provides a way to access, read, write and search Twitter’s data. The data includes tweets, hashtags, user accounts, favourites, locations and other similar data. The API also allows users to monitor tweets in real-time through hashtags, keywords, user accounts and specific locations through Twitter's streaming API. The following 550 tweets were collected via Twitter's API and imported into a CSV file. See the spreadsheet in full here via Google Docs (the full data set of both conferences can be found on the fourth sheet under "total tweets").
On the spreadsheet the following information can be seen:
Date of the tweet
Location timezone. This can be found in the profile settings. It’s usually set based on the computer’s timezone, but it can be changed manually.
Location profile, which is the location that the user has set on their profile.
Latitude and Longitude, which we have set according to the location of the user.
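As a rough illustration of how such a spreadsheet could be assembled, the sketch below flattens tweet records into those columns and writes them out as CSV. It assumes each tweet is a Python dictionary shaped like the JSON that Twitter's API returned at the time (`created_at`, `text`, and a nested `user` with `screen_name`, `time_zone` and `location`). The column names and the helpers `tweet_to_row` and `tweets_to_csv` are our own illustrative choices and are not taken from the original data set. Latitude and longitude are left blank, since in this workflow they are filled in afterwards according to the user's location.

```python
import csv
import io

# Illustrative column names; the article's spreadsheet may label them differently.
FIELDS = [
    "date", "username", "location_timezone",
    "location_profile", "latitude", "longitude", "text",
]


def tweet_to_row(tweet):
    """Flatten one tweet dictionary into the columns listed above."""
    user = tweet.get("user", {})
    return {
        "date": tweet.get("created_at", ""),
        "username": user.get("screen_name", ""),
        "location_timezone": user.get("time_zone") or "",
        "location_profile": user.get("location") or "",
        "latitude": "",   # filled in later, once the location is checked
        "longitude": "",  # filled in later, once the location is checked
        "text": tweet.get("text", ""),
    }


def tweets_to_csv(tweets):
    """Return the tweets as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet))
    return buffer.getvalue()


# A made-up tweet used only to show the expected input shape.
sample_tweets = [{
    "created_at": "Thu Aug 04 17:00:00 +0000 2016",
    "text": "At the booth #BHUSA",
    "user": {
        "screen_name": "example_user",
        "time_zone": "Pacific Time (US & Canada)",
        "location": "Las Vegas, NV",
    },
}]
print(tweets_to_csv(sample_tweets))
```

Keeping the blank coordinate columns in the file from the start means the same CSV can later be completed and passed to a mapping tool without restructuring it.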
Advanced Search on Twitter
The tweets can also be searched and collected manually by going to the advanced search options of Twitter.
You can then enter the hashtags, keywords or usernames that you want to monitor, as well as a specific phrase or date range. In our case we searched within the dates of the conferences.
We then cross-referenced the location timezone and location profile information to verify the actual location of the user.
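This cross-referencing can be partly automated. The sketch below is one possible approach and not the method used for the article: it compares the timezone label with the free-text profile location using a small lookup table. The table `TIMEZONE_HINTS` and the function `locations_agree` are hypothetical, and the two timezone labels are only examples of how such labels could look. Anything the table does not cover is returned as undecided so that it can be checked by hand.

```python
# Hypothetical lookup table: example timezone labels mapped to place
# names we would expect to see in a profile location for that zone.
TIMEZONE_HINTS = {
    "Pacific Time (US & Canada)": (
        "las vegas", "nevada", "california", "san francisco", "seattle",
    ),
    "London": ("london", "england", "united kingdom"),
}


def locations_agree(timezone, profile_location):
    """Compare the two location fields.

    Returns True if they are consistent, False if they conflict, and
    None when there is not enough information to decide.
    """
    if not timezone or not profile_location:
        return None  # one of the fields is empty
    hints = TIMEZONE_HINTS.get(timezone)
    if hints is None:
        return None  # timezone not covered by our small table
    place = profile_location.lower()
    return any(hint in place for hint in hints)


print(locations_agree("Pacific Time (US & Canada)", "Las Vegas, NV"))
print(locations_agree("London", "Las Vegas, NV"))
print(locations_agree("", "Las Vegas, NV"))
```

A result of False does not prove anything by itself, since both fields can be set freely by the user; it only flags accounts that deserve a closer look.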
Geolocating tweets through Carto
Using Carto (previously known as CartoDB), a free web-based tool to create interactive maps, we geolocated the tweets we had extracted. This will be very useful when trying to understand where the participants of the event or the people sharing its hashtag are located.
To do this we imported a CSV file containing the latitude and longitude fields into Carto, which automatically generated the map shown below:
The map can be seen here in more detail.
After geolocating the tweets it is possible to analyse them to find specific accounts of people and/or companies who were attending the particular conference. This can easily be done with a quick text search through the data extracted into the CSV file.
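A text search of this kind can be done in any spreadsheet program. For larger files, a short script is an alternative. The sketch below is a minimal example that assumes the CSV has `username` and `text` columns; these column names, the function `find_rows` and the sample rows are invented for illustration.

```python
import csv
import io


def find_rows(csv_text, keywords, columns=("username", "text")):
    """Return rows where any keyword appears in any of the given columns.

    Matching is case-insensitive and works on substrings.
    """
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        haystack = " ".join((row.get(col) or "") for col in columns).lower()
        if any(keyword in haystack for keyword in wanted):
            matches.append(row)
    return matches


# Made-up rows standing in for the extracted tweets.
sample_csv = (
    "username,text\n"
    "example_vendor,Visit our booth at #BHUSA\n"
    "example_user,Watching the keynote stream #defcon\n"
)

for row in find_rows(sample_csv, ["booth"]):
    print(row["username"], "|", row["text"])
```

Searching for a list of company names taken from a public index, instead of a single keyword, follows the same pattern.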
For this case study, and in order to highlight how social graphing of activists through social media can happen, we will be using a security company as a laboratory rat. We looked for any security company from the list of companies that appear in the Surveillance Industry Index, the world's largest publicly available educational resource of data and documents on the surveillance industry. It is based on data collected by journalists, activists and researchers across the world.
A screenshot from the Surveillance Industry Index
The purpose of searching the Twitter accounts of these companies is to understand if they were represented at the conference at hand or not, as well as understanding their networks (following/followers). We can see from the results that there were two companies attending the #bhusa and #defcon conferences. One of these companies is Blue Coat who tweet under the name @Bluecoat as seen below:
Blue Coat Systems Inc. is based in Sunnyvale, California and according to their website, 'offers advanced network, security and cloud protection for 15,000 organisations every day'. Further details about the company and the type of equipment and infrastructure it offers can be found on the Surveillance Industry Index database.
The second company we focused on is Check Point, which tweets under @CheckPointSW, as you can see below:
Check Point, based in Tel Aviv, sells cyber security products. Have a look at the Surveillance Industry Index database for further information.
The following tweets from Blue Coat's official account confirm their attendance at the conference by stating they had a booth there:
Using publicly available databases such as the Surveillance Industry Index can be helpful for investigations. In this case we were able to identify the security companies attending the #bhusa conference because they were using the hashtag (#BHUSA) in their tweets.
There are many ways to understand the network of a specific user and the relationships between them and other Twitter users. One technique is to collect and map out all their followers and those they follow. This can be done using Twitter’s API which, it is worth noting, places no limit on the number of followers or followed accounts one can retrieve for a specific user. Other ways include understanding networks between users by monitoring the “Retweets”, “Mentions” and “Hashtags” between them.
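To make the retweet and mention approach concrete, the sketch below turns a list of tweets into a list of connections between accounts. It is an illustration under assumptions, not a description of how any particular tool works internally: tweets are assumed to be dictionaries shaped like Twitter's API JSON at the time, with `entities.user_mentions` for mentions and `retweeted_status` for retweets. The function name `interaction_edges` and the sample accounts are hypothetical.

```python
def interaction_edges(tweets):
    """Return (source, target, kind) tuples for retweets and mentions.

    `source` is the account that posted the tweet and `target` is the
    account being retweeted or mentioned. Handles are lower-cased so
    that @Bluecoat and @bluecoat count as the same account.
    """
    edges = []
    for tweet in tweets:
        author = tweet["user"]["screen_name"].lower()
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            # Simplification: a retweet usually also carries the original
            # author as a mention, so we record it once, as a retweet.
            target = retweeted["user"]["screen_name"].lower()
            edges.append((author, target, "retweet"))
            continue
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            edges.append((author, mention["screen_name"].lower(), "mention"))
    return edges


# Made-up tweets showing the two cases.
sample_tweets = [
    {
        "user": {"screen_name": "user_a"},
        "entities": {"user_mentions": [{"screen_name": "Bluecoat"}]},
    },
    {
        "user": {"screen_name": "user_b"},
        "entities": {"user_mentions": [{"screen_name": "CheckPointSW"}]},
        "retweeted_status": {"user": {"screen_name": "CheckPointSW"}},
    },
]
for edge in interaction_edges(sample_tweets):
    print(edge)
```

Each tuple corresponds to one line that a graph tool would draw between two accounts.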
In this case we want to monitor the specific users that we identified earlier, @Bluecoat and @Checkpointsw. For this we used the visualisation software Gephi to help us understand the network of both accounts. Gephi is a free and open source tool that helps data analysts explore and understand graphs and networks. Like Photoshop but for graph data, it lets the user interact with the representation and manipulate the structures, shapes and colours to reveal hidden patterns.
How to create a social graph in Gephi
Create an application on Twitter through their application management settings to get credentials, which will then be added to Gephi.
A screenshot of the process of creating a Twitter app
Then add a plugin called “Twitter Streaming Importer” to Gephi. This can be done by opening Gephi, clicking on Tools, then Plugins and then Available Plugins. The screenshots below demonstrate this process:
Next, install the “Twitter Streaming Importer” plugin, then add the credentials from the Twitter app you created previously, as shown below:
Once this is done, add the Twitter users whose networks you wish to reveal. In this case study we used @Bluecoat and @Checkpointsw. You can also add specific keywords and hashtags, as below:
The last important step is to select how Gephi reveals the network. This process includes three options:
User Network: This creates a graph based on the user’s network through Retweets and Mentions.
Full Twitter Network: This creates a full graph of a tweet's activity. This graph will include tweets, users, media, hashtags and links.
Hashtag Network: This creates a graph based only on hashtags.
For the purpose of this tutorial we will select the User Network option and then click on connect.
Gephi will start creating the graph of the Twitter user’s network, which will look like the image below. The circles are called nodes, and they represent all the Twitter users that Gephi collected. The lines, called edges, represent the connections between users that Gephi created from Mentions and Retweets.
Another view of the same network created on Gephi:
All the orange nodes are Twitter users that were collected through Gephi. As we see above, @checkpointsw and @bluecoat have a bigger text size, which in this case is because they are the key terms and are also the most connected accounts in this network. The edges in green reflect Twitter users who “Mentioned” @checkpointsw and @bluecoat. The edges in red reflect Twitter users who “Retweeted” @checkpointsw and @bluecoat.
From this graph you can start applying filtering techniques and further analysis to better understand the network. For example, one can easily find users who are connected to both accounts, as well as the users who are the most influential in their network.
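Both kinds of filtering can also be sketched outside Gephi. The example below assumes the network is available as a plain list of (source, target) pairs; the function names and the sample accounts `user_a`, `user_b` and `user_c` are invented for illustration. It uses the number of connections per account as a crude stand-in for influence, which is only one of several possible measures.

```python
from collections import Counter


def neighbours(edges, account):
    """Return every account that shares an edge with `account`."""
    account = account.lower()
    linked = set()
    for source, target in edges:
        source, target = source.lower(), target.lower()
        if source == account:
            linked.add(target)
        elif target == account:
            linked.add(source)
    return linked


def connected_to_both(edges, first, second):
    """Return accounts that share an edge with both given accounts."""
    both = neighbours(edges, first) & neighbours(edges, second)
    return both - {first.lower(), second.lower()}


def most_connected(edges, top=3):
    """Return the `top` accounts ranked by number of connections."""
    degree = Counter()
    for source, target in edges:
        degree[source.lower()] += 1
        degree[target.lower()] += 1
    return degree.most_common(top)


# Made-up edges: who mentioned or retweeted whom.
sample_edges = [
    ("user_a", "bluecoat"),
    ("user_a", "checkpointsw"),
    ("user_b", "bluecoat"),
    ("user_c", "CheckPointSW"),
]
print(connected_to_both(sample_edges, "Bluecoat", "Checkpointsw"))
print(most_connected(sample_edges))
```

Handles are compared in lower case, so the different spellings of the same account name are treated as one node.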
This article has focused on collecting data; the next step is to begin filtering and analysing the data to better understand the network. This is something that Marc Owen Jones writes about in his investigation, which uncovered at least 10,000 tweets per day coming from suspicious accounts that spread pro-Saudi propaganda on Twitter. Read about his investigation, which delved deep into analysing Twitter hashtags and where they were coming from, here.
Hadi Al Khatib is the founding member of The Syrian Archive. Hadi works on security and protection of human rights defenders. He has also produced a radio programme with Nasaem Syria Radio to raise awareness on issues related to the security of Syrian journalists and human rights activists.