This guide looks at how metadata has been used to expose, protect and verify abuses and excesses of power. We will then focus on exactly what metadata is contained within what format and introduce tools to extract, strip and add metadata.
Metadata can be understood as a modern version of traditional book cataloguing. The small cards stacked in library drawers provide the title of the book, publication date, author(s) and location on the library shelves. Similarly in the digital sphere, a digital image may contain information about the camera that took the image, the date and time of the image, and often the geographic coordinates of where it was taken. Such multimedia-related metadata is also known as EXIF data, which stands for Exchangeable Image File Format.
The Australian National Data Service provides the following definition: “Metadata can actually be applied to anything. It is possible to describe a file on a computer in exactly the same way as one would describe a piece of art on a wall, a person on a job, or a place on a map. The only differences are in the content of the metadata, and how it refers to the “thing” being described.” Metadata is structured information that describes, explains, locates or otherwise simplifies the retrieval, usage or management of an information resource. Metadata is often called data about data or information about information.
In an interview with Exposing the Invisible, Smári McCarthy, head of the technology team on the Organized Crime and Corruption Reporting Project, says that “every information source has metadata, sometimes it is very explicit, created as part of the documentation process of creating the data. PDF files, images, word documents, all have some metadata associated with them unless it has been intentionally scrubbed.”
To illustrate this point, McCarthy describes a small chip contained within all digital cameras which tracks all the metadata of that device. He explains that all of these chips, known as Charge-Coupled Device (CCD) chips, basically light-sensitive circuits, come with minor factory flaws that are unique to the individual CCD chip. This idiosyncrasy means that the data contained within all images taken with that device, data that one would usually ignore and that is invisible to the human eye, becomes a digital ‘fingerprint’ identifying all images taken with that particular CCD chip. This highlights the near omnipresence of metadata as well as the possibilities of working with it. McCarthy calls metadata “a best friend, it helps with searching, it helps with indexing and with understanding the context of the information.” But even metadata enthusiasts like McCarthy admit that metadata can also become “a worst enemy”, and thus understanding it is crucial not only for people working with metadata, but also for the wider network of individuals and groups working on sensitive information.
The possibilities of using metadata are multiple and varied. The Australian National Data Service points out that:
“metadata generally has little value on its own. Metadata is information that adds value to other information. A piece of metadata like a place or person’s name is only useful when it is applied to something like a photograph or a clinical sample. There are, though, counter-examples, like gene sequence annotations and text transcripts of audio, where the metadata does have its own value, and can be seen as useful data in its own right. It’s not always obvious when this might happen. A set of whaling records (information about whale kills in the 18th century) ended up becoming input for a project on the changing size of the Antarctic ice sheet in the 20th century.”
Michael Kreil, an open data activist, data scientist and data journalist working at OpenDataCity, a Berlin-based data-journalism agency which specialises in telling stories with open data, says:
“metadata seems to be some kind of a by-product, yet it can be used to analyse certain behaviours, of political and social nature, for example. Let’s take something simple as an example, like a phone call. Making a phone call doesn't seem very important. It's hard to analyse one million phone calls or one million photos, with the analysis being based on speech recognition or face detection, both fields still being in a state of technological development. But it's pretty easy to analyse the metadata contained within them, because metadata has a simple, standardised format for every phone call: there is the date, the timestamp, the location and numbers of the caller and the callee. This standard allows us to analyse a huge amount of metadata in one big database. For example are there instances happening in the population that are represented in the metadata, such as who has depression or who is committing adultery?”
Often, this type of metadata is more valuable than the content of the phone call. This metadata provides information about networks, their scale, frequently visited locations and far more besides. There is currently no online communication method which does not leave metadata traces throughout or at some crucial point of the communication process.
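To make the point about standardised call records concrete, here is a minimal sketch in Python. The record layout (timestamp, direction, other party, cell tower) and every value in `CALLS` are invented for illustration; real records from a telecommunication company will look different. The sketch only shows how little code is needed to turn such rows into a picture of someone's contacts, usual locations and late-night calls.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records for one subscriber, invented for illustration:
# (timestamp, direction, other party, cell tower the phone was connected to)
CALLS = [
    ("2015-03-01 08:15", "out", "B", "tower-12"),
    ("2015-03-01 23:40", "out", "C", "tower-07"),
    ("2015-03-02 08:20", "out", "B", "tower-12"),
    ("2015-03-02 23:55", "in", "C", "tower-07"),
    ("2015-03-03 08:10", "out", "B", "tower-12"),
    ("2015-03-03 12:05", "in", "D", "tower-03"),
]


def summarise(calls):
    """Count contacts, cell towers and late-night contacts (22:00 to 05:59)."""
    contacts, towers, night_contacts = Counter(), Counter(), Counter()
    for stamp, _direction, other, tower in calls:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        contacts[other] += 1
        towers[tower] += 1
        if hour >= 22 or hour < 6:
            night_contacts[other] += 1
    return contacts, towers, night_contacts
```

Calling `summarise(CALLS)` returns three counters. None of them needs the content of a single call, yet together they already hint at a routine and at a relationship that only shows up late at night.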
Activists, experts, investigative journalists and human rights defenders are increasingly taking an interest in metadata, as are governments and corporations. Metadata has proven helpful in various cases in fighting corruption, and it has also been used as a weapon to crack down on dissidents and human rights defenders.
It is important to understand how metadata works and how to use it as a tool. It is also vital to know how to protect oneself and one’s work in relation to the metadata we generate. Whether exposed, stripped, or added and verified, standalone or used in cross-reference with other data found through other sources (conventional or not), metadata is key to today’s investigative journalism and human rights advocacy, especially when it comes to documentation, to image and video activism, and to evidence collection. Understanding metadata and how to use it is crucial for self-protection and the protection of one’s work.
Metadata is a powerful tool to expose and provide evidence. In 2009, data scientist Michael Kreil created Tell-all Telephone, a project that generated a visualisation of six months of German Green Party politician Malte Spitz's telephone data. Michael Kreil told Exposing the Invisible that he had received “an Excel sheet with 36,000 lines of whatever in there, and there was no tool at all to have a look inside. You could make a simple map, using just the geolocation data, but you wouldn't see the aspect of time. You wouldn't see the movement. So, I wrote a small prototype, just a simple map with a moving dot. This was actually the basis of the application that went online a few weeks later.”
The data provided revealed much about Spitz’s behaviour, when he was walking down the street or when he was on a train, as well as his whereabouts during his private time. Some of the information was not provided by Spitz’s telecommunication company, like the phone numbers he called or texted, or those who contacted him. This would have made it easy to not only identify Spitz’s social and political circles and reveal much about him but also reveal personally identifiable information about the people with whom he is in contact. Kreil and Spitz were not granted access to this information, but the telecommunication company does have access to it and this means that the authorities can also acquire access to all of this information.
Kreil also used publicly available information, like Spitz’s online behaviour, appointments announced on the party’s website as well as his tweets to corroborate the data provided by the telecommunication company. By combining all of this data, it was possible for Kreil to pinpoint Spitz’s movements even further and the result provided a thorough analysis of Spitz’s life and political activities. Kreil hoped to demonstrate how metadata can be used to not only track an individual’s every move, but also to reveal how (meta)data retention can expose an individual’s whole social and political network.
Image from the Tell-all Telephone website
Illinois Republican congressman Aaron Schock was known as the 'most photogenic' congressman due, in part, to his Instagram account which featured him in eccentric and zany poses in exotic locations. He posted pictures of himself jumping into snow banks, on sandy beaches and in various private planes. The attention his photos attracted led to questions about where Schock’s public-office-related business trips stopped and his holidays started. The Associated Press (AP) began an investigation which extracted the geolocation data from the photos Schock had posted and tagged with his location on his Instagram account, and then compared it to the travel costs he was claiming as campaign expenses. The AP analysed his travel expenses, his flight records of airport stopovers and the data extracted from his Instagram account and found that taxpayers’ money and campaign funds had been spent on private plane flights. It wasn't only Schock's Instagram that was revealing. The account of a former Schock intern showing an image from a Katy Perry concert with the tag-line "You can't say no when your boss invites you. Danced my butt off," was connected to a $1,928 invoice paid to the ticket service StubHub.com, listed as a “fund-raising event” on Schock's expenses. The AP published its findings on February 24, 2015, and on March 17, 2015 Schock announced his resignation from Congress.
Image of Aaron Schock from his, now closed, Instagram account
In most cases, it is necessary to employ a variety of software, tools and resources to make sense of the extracted metadata and extract meaningful information. A good example of these creative investigation techniques using metadata is the case of Dmitry Peskov, Putin's spokesperson. Peskov was questioned about his income as a state official when he was spotted wearing an 18-carat gold Richard Mille watch, worth almost £400,000. The watch was visible on his wrist in a photo posted from his wedding. During the ensuing controversy, Peskov stated that the watch was a gift from his new wife, a claim which was later refuted by a photograph on his daughter’s Instagram account. There, a photo posted by his daughter months before the wedding showed Peskov wearing the same watch.
Images found of Peskov's watch at his wedding and the watch in question the Richard Mille RM 52-01.
Peskov was hit with another scandal when rumours emerged about his spending during his honeymoon, which he spent with friends and family aboard the Maltese Falcon, one of the 25 most expensive yachts in the world. The weekly rent of the Maltese Falcon far exceeds the politician’s declared economic means. Bellingcat reports that anti-corruption activist Aleksey Navalny, who broke the news about the watch with the help of other activists and supporters, took up the investigation regarding the yacht. By using the yacht's website, yacht-spotting websites and Instagram photos from Peskov's daughter and one of Peskov's friends, they were able to cast reasonable doubt on Peskov's denial of personally renting the Maltese Falcon. Peskov's friend had posted photos of two yachts on his Instagram profile, and by using VesselFinder, Navalny and co. managed to place the two yachts in the same area as the Maltese Falcon at the same time. Navalny's team also matched a small portion of a door that appeared in a photo Peskov's daughter posted of herself on Instagram to a video of the Maltese Falcon showing the same door with two distinctive marks.
A lot of attention is focused on the metadata that can be extracted from images or from communications. However, text files can be equally useful for an investigation, or pose as great a threat as images. In 2005, the former prime minister of Lebanon, Rafik Hariri, was killed along with 21 others. Though the United Nations investigators examined communication metadata they had received from telecommunications companies to investigate the assassination, they did not pay attention to the metadata they themselves left behind. When their long-awaited report on Syria's suspected involvement in the assassination, known as the Mehlis Report, was published, it caused a stir not only for its findings but for what a deeper look into its metadata revealed. The metadata exposed the editing changes along with the exact times they were made. The key changes included the deletion of names of officials allegedly involved in the assassination, including Bashar al-Assad’s brother and brother-in-law. This jeopardised not only the individuals whose names had been deleted, and the various governments and international bodies involved in a gravely destabilised region, but also the United Nations and the Mehlis team. The incident was considered extremely serious and led to the UN issuing a response to the concerns regarding the deletion.
There are many tools available that can be used to reveal the metadata in files and images, though as can be seen in the case studies, in most cases a wider investigation is required to make sense of the metadata. See the section on tools for information and description of the tools.
Metadata is a double-edged sword: it can be extremely useful for investigating social justice and corruption cases but it is also being used to troll and doxx. Human rights defenders, women, female journalists & LGBTIQ individuals vocal on social media are all prime targets. The increased usage of smartphones in protests and mobilisations worldwide has increased and expanded the risks of sharing one’s location or whereabouts at a certain time, and one’s identity can be determined through mobile phone tracking using the images posted. The geolocation data available in images can be used to track anyone and anything, including endangered species. In a South African reserve, visitors were advised not to disclose the whereabouts of the animals spotted and to switch off the geotag function on their phones and social media platforms as poachers and hunters were using this information posted online to locate animals.
Image taken by Eleni de Wet in South Africa and posted on her twitter on 4 May 2012.
In 2012, millionaire and controversial computer programmer and developer John McAfee, founder of McAfee Virus Protection, was arrested based on metadata found on an image posted by the media company, Vice. Vice journalists gained exclusive access to McAfee and accompanied him on his escape from an investigation in Belize regarding the murder of one of his neighbours. Vice not only posted the image, but bragged about their scoop by reporting on the time they spent with McAfee. When the image was posted with its metadata revealing where it was taken in addition to Vice’s publishing information on when they had seen him, it was simple to determine McAfee’s whereabouts. Though the image was most probably sent from the person who took it in Belize to Vice offices to be later uploaded on their website, it still retained the metadata of where McAfee was. The Vice journalists in question should arguably have known how to better protect their sources, as well as the subject of their reports, leading Vice to issue an official statement about the event.
Image by Robert King taken from the article "We are with John McAfee Right Now Suckers", published on Vice on December 3 2012.
One might assume that persons operating in high-risk areas and industries and taking part in high-risk activities would be more careful about revealing their whereabouts, but this was not the case for Michelle Obama or US soldiers in Iraq. In 2007, insurgents in Iraq used geotags from images shared online by US soldiers to attack and destroy several US AH-64 Apache helicopters. Michelle Obama's Instagram photos were geotagged revealing either her whereabouts when taking the images, or the whereabouts of the person managing her account. In both cases, this could and did pose a serious security threat not foreseen by those posting the images.
Image taken from Michelle Obama's Instagram published on Fusion
Metadata can be and has been used to curtail freedom of speech and intimidate people online. For example, it has been used for doxxing, the practice of targeting individuals for their political views or personal lives by publishing identifying information about them. It has been used to target women activists online, women game developers, human rights activists and journalists, among others. Managing metadata correctly is crucial for an individual with a high profile, especially on social media, and those who engage in political activities or lead their lives in ways that counter the mainstream or the status quo. The manual “Zen and the Art of Making Tech Work For You” discusses this particular aspect of metadata with recommendations and resources on the topic written from a gender and tech perspective.
A project by OpenDataCity also highlights how metadata can be used to put people in danger, often unwittingly.
“Years later, Balthasar Glättli (a Swiss politician) also wanted an analysis of his data. In the end, he didn't just give me his telephone data, he gave me everything else that is collected by the data retention in Switzerland. Additionally, Balthasar had a few problems, because he's also in the Defence Committee of the National Council. In his metadata was the location of a secret hideout that he visited. It was secret, but his phone provider collected Balthasar’s locations and, by publishing this data, some journalists found the hideout and published it. It was too late to remove it. It’s an interesting thing that when cellphones are tracked all the time, you should, actually, constantly think about when to switch off your cellphone in your pocket.”
Metadata also takes centre stage in the discussion around intellectual property, especially for artists. Some websites, like Facebook for example, strip out the metadata to minimise the size of the files (metadata occupies file space), and to protect the privacy of the users. This was a point of contention for people retaining intellectual property of their work. Many photographers, for example, needed to keep the metadata in their photos, especially in this age of mass sharing online without crediting. Here, the metadata provides a guarantee that the artist is assigned the credit they are entitled to for their work. Flickr, on the other hand, retains and shares the metadata, and though users can deactivate this feature, many are not aware it exists. On Flickr, a simple click on ‘show EXIF’ under the image reveals a lot of details which the user themselves might not be aware that they are sharing publicly.
Various tools can be used to remove the metadata from files and images, and there is always the option of tweaking the settings of the device or platform used to stop the registry of certain metadata. But to minimise the risks, it is recommended that one always double check what metadata is being shared (using the tools recommended in the Expose section), and then strip away any data left there and not intended for sharing. See the section on tools for information and description of the tools.
Metadata can also be used to verify information and evidence by 'proving' that a certain event took place at the time and place it was said to have taken place. In recent years, and with the viral spread of social media videos and images, verification has proven key to political participation, not just as a tool to prove something has happened at that time and place, but also to refute the spread of false videos and images that can discredit movements for social justice. In the Verification Handbook for Investigative Reporting, Christoph Koettl from Amnesty International explains how metadata helped verify the participation of the Nigerian army in extrajudicial killings.
We explored this in more detail in our interview with Harlo Holmes, the former technical lead on CameraV, and with a tool review of CameraV, a mobile App that enables users to verify photographs and videos so that they can be used as additional evidence in a court of law.
“CameraV which begun its life as a mobile App named InformaCam, was created by The Guardian Project and WITNESS. It's a way of adding a whole lot of extra metadata to a photograph or video in order to verify its authenticity. It's a piece of software that does two things. Firstly it describes the who, what, when, where, why and how of images and video and secondly it establishes a chain of custody that could be pointed to in a court of law. The App captures a lot of metadata at the time the image is shot including not only geo-location information (which has always been standard), but corroborating data such as visible WiFi networks, cell tower IDs and bluetooth signals from others in the area. It has additional information such as light meter values, that can go towards corroborating a story where you might want to tell what time of the day it is.
All of that data is then cryptographically signed by a key that only your device is capable of generating, encrypted to a trusted destination of your choice and sent off over proxy to a secure repository hosted by a number of places such as Global Leaks, or even Google Drive. Once received, the data contained within the image can be verified via a number of fingerprinting techniques so the submitter, maintaining their anonymity if they want to, is still uniquely identifiable to the receiver. Once ingested by a receiver, all this information can then be indexable and searchable.”
This raises a question regarding the forging and insertion of metadata. Looking at CameraV for instance, Harlo Holmes talks about this issue and raises an important point about the trustworthiness of the device used:
“Technically speaking, it’s very difficult for those things to be manually forged. If someone took the metadata bundle and changed a couple of parameters or data-points - what they ultimately send to us in order to trick us would not verify with PGP, and each instance of the App has its own signing key. That said, I do realise that devices need to be trustworthy. This is an issue beyond CameraV: any App that uses digital metadata and embeds it into a photograph or video is going to have to be a trustworthy device.”
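The tamper-detection idea Holmes describes can be sketched in a few lines of Python. This is not CameraV's code, and it swaps in a different technique: CameraV signs with PGP (asymmetric keys), whereas this sketch uses an HMAC-SHA256 tag from the Python standard library, which relies on a shared secret. The bundle fields and key below are hypothetical. What carries over is the principle: change one data point after signing and verification fails.

```python
import hashlib
import hmac
import json


def sign_bundle(bundle: dict, key: bytes) -> str:
    """Return a hex tag computed over a canonical JSON encoding of the bundle."""
    payload = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, key: bytes, tag: str) -> bool:
    """True only if the bundle matches what was signed with this key."""
    return hmac.compare_digest(sign_bundle(bundle, key), tag)


# Hypothetical metadata bundle and device key, for illustration only.
device_key = b"per-device-secret"
original = {"taken": "2015-01-01T12:00:00Z", "lat": 52.52, "lon": 13.40}
tag = sign_bundle(original, device_key)

# Someone edits one data point after the fact.
tampered = dict(original, lat=48.85)
```

Here `verify_bundle(original, device_key, tag)` holds, while the same call with `tampered`, or with a different key, does not. As Holmes notes, this says nothing about whether the device that produced the bundle was trustworthy in the first place.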
Holmes elaborates on the importance of this trust by explaining that the
“verification in CameraV works the same way as with PGP. Key parties exist because human trust is important. CameraV easily allows you to export your public key from the App. If you give this key to someone when they're in the room with you, and compare fingerprints, then you trust that person's data more than if a random person just emailed you their public key unsolicited. If organisations want to earnestly and effectively use the App in a data-gathering campaign, some sort of human-based onboarding is necessary.”
Another useful tool for the purposes of verification is eyeWitness, a tool that allows users to capture photos or videos through their mobile camera App “with embedded metadata showing where and when the image was taken and verifying that the image has not been altered. The images and accompanying verification data are encrypted and stored in a secure gallery within the App. The user then submits this information directly from the App to a storage database maintained by the eyeWitness organisation, creating a trusted chain of custody. The eyeWitness storage database functions as a virtual evidence locker, safeguarding the original, encrypted footage for future legal proceedings.”
In addition to that, the eyeWitness team includes an expert legal team who will analyse the received images and identify the appropriate authorities, including international, regional or national courts, in order to investigate further. In some cases, eyeWitness will bring situations to the attention of the media or other advocacy organisations to prompt international action.
Multiple tools and workarounds can be used to verify metadata in files and images; experts and enthusiasts are constantly coming up with new ways to verify information. It is also important to note that verification is not always completed simply by using an App, but may in some cases require cross-referencing the data with other sources and undertaking creative investigative approaches.
To better understand how to work with and around metadata, it is important to know in practical terms what is generally meant when metadata is mentioned. Below is a list of the metadata that may be stored along with different types of data:
Depending on the program used to create the document, the data may include:
Metadata in video files can be divided into two sections:
* Recommended reading: A thorough overview on video metadata and working with it from WITNESS.
Audio metadata is similar to video metadata but is more widely used, especially to register ownership of the file. In addition, it can include:
Metadata in communication depends on the type of communication used (e.g. email, mobile phone, smartphone, etc.). But in general it can reveal the following (if no tools to hide the metadata are used):
There are various ways to extract metadata from files. The options vary according to operating systems, from tools to plug-ins, to desktop versions or in-browser tools.
Disclaimer: When using online platforms to extract metadata, it is important to keep digital privacy and security in mind. There is not enough information available to guarantee the confidentiality of the process. These platforms might track your online behaviour, store your data, or share it with third parties or the authorities.
There are various ways to reveal or look at metadata; the methods and tools are detailed later in the chapter. Some tools, such as Photoshop, read the file's built-in information and show the data in their own format. Others give a more detailed output.
Though metadata can be removed or altered after a file is created, it is sensible to consider certain elements before creating the file. For example, it may be advisable to change the settings on your phone, use a certain App, modify user details on the software used, etc. Below are two examples of using a smartphone’s camera.
Fig. A: Photo taken with an Android phone using CyanogenMod. Does not show the geolocation or the type of phone used.
Fig. B: Photo taken with an iPhone. Notice the extra details revealed including address, type of phone, type of camera and program used.
There are various tools for viewing metadata, and the choice of tool may depend on the objective. In addition to software that includes a metadata feature (like Photoshop, Adobe Acrobat, etc.), below is a list of tools to view metadata.
Disclaimer: Please note that to extract and edit metadata, some online platforms might track your online behaviour, store your data or share it with third parties or the authorities. It is important to keep digital privacy and security in mind. There is not enough information available to guarantee the confidentiality of the process.
Compatibility: Windows, Mac OS, and Linux
Proprietary status: Free and open source
This tool comes highly recommended, though it might require some effort to navigate since it is operated from the command line. However, it is quite comprehensive in the file formats it covers and the output it gives. ExifTool allows the user to read, write and edit metadata. The tool's website provides information, downloads and workarounds.
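For readers who prefer to script ExifTool rather than type commands by hand, here is a minimal sketch in Python. It assumes ExifTool is installed and reachable as `exiftool`, and it relies on two ExifTool options: `-json`, which prints the tags as a JSON list with one object per file, and `-all=`, which clears all writable tags. The function names are made up for this example.

```python
import json
import shutil
import subprocess


def read_metadata(path: str) -> dict:
    """Run `exiftool -json <path>` and return the tags of that file as a dict."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not installed or not on PATH")
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)


def parse_exiftool_json(output: str) -> dict:
    """The -json output is a list with one object per file; take the first."""
    records = json.loads(output)
    return records[0] if records else {}


def strip_command(path: str) -> list:
    """Build, but do not run, the command that clears all writable tags.

    By default ExifTool keeps a backup copy named `<file>_original`,
    which still contains the metadata and must not be shared by mistake.
    """
    return ["exiftool", "-all=", path]
```

A typical session would call `read_metadata("photo.jpg")` to see what a file reveals, pass `strip_command("photo.jpg")` to `subprocess.run`, and then read the file again to confirm what is left.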
Compatibility: Online, no compatibility issues
Type: Use online through a browser
This is an online tool based on Phil Harvey’s ExifTool, with the option of uploading an image or using the URL of an image online. It also offers a button that can be added to Mozilla or Safari as a short-cut for faster extraction of metadata.
Compatibility: Online, no compatibility issues
Type: Use online through a browser
This is an online tool based on Phil Harvey’s ExifTool. It has direct access to DropBox, Flickr and Google Drive. A user can log in from the Exifer website and edit their images directly from there. Exifer has a privacy disclaimer stating that: “pictures will be temporary downloaded just to let you edit them. The temp files will be deleted as soon as you'll refresh the home page of this site, or automatically after 15 minutes from the download time.”
Compatibility: Online, no compatibility issues
Type: Use online through a browser
Compatibility: Android mobile phones
Proprietary status: Free and open source
Type: Mobile phones
CameraV is a mobile App created by The Guardian Project and WITNESS. The V in the App's name stands for verification and it was created to add a large amount of extra metadata to a photograph or video in order to verify its authenticity. This piece of software does two things. First it describes the who, what, when, where, why and how of images and video. Secondly, it establishes a chain of custody that could be pointed to in a court of law.
Compatibility: Linux, Mac OS
Proprietary status: Free and open source
Just as the title suggests, this script allows the extraction of geolocation metadata from a batch of images. It can be a valuable time-saver when processing large numbers of images. The script, written by Exposing the Invisible team members, should be placed in a file called geobatch.rb and run in the folder with all the images in it.
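The geobatch.rb script itself is not reproduced here. As a rough illustration of the same idea, the Python sketch below takes the JSON that ExifTool prints for a whole folder and turns it into a CSV of file names and coordinates. It assumes the JSON was produced with something like `exiftool -json -n -GPSLatitude -GPSLongitude <folder>`, where `-n` asks for numeric values; whether southern and western coordinates come out negative depends on which tags ExifTool returns, so it is worth checking the result against a photo whose location is known. The function names are invented for this example.

```python
import csv
import io
import json


def gps_rows(exiftool_json: str):
    """Yield (file, latitude, longitude) for each record that has GPS tags."""
    for record in json.loads(exiftool_json):
        lat = record.get("GPSLatitude")
        lon = record.get("GPSLongitude")
        if lat is not None and lon is not None:
            yield record["SourceFile"], float(lat), float(lon)


def to_csv(rows) -> str:
    """Format the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "latitude", "longitude"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Images without GPS tags are simply skipped, so the length of the CSV also gives a quick sense of how many files in a folder carry location data at all.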
Compatibility: iOS (iPhone)
Type: Mobile Phone
TrashEXIF is an iPhone App that allows users to strip all metadata from images or to control which metadata should be removed or kept. The App also allows for presetting a protocol to be applied to all images taken.
There are various ways to remove metadata from files. Here are a few suggestions taken from the Security in-a-Box toolkit.
You can prevent a specific kind of metadata like GPS location from being captured by:
Switching off wireless and GPS location (under location services) and mobile data (this can be found under data manager -> data delivery).
When taking a photo, make sure that the location-tagging setting in the photo App is switched off too.
Using tools like Metanull (for Windows), you can ensure that all metadata is removed before you share it. This tool is discussed in detail below.
Note: Some files like DOCs and PDFs can hold image files within them. If you do not exercise the necessary caution, you can scrub the metadata on the document that is holding the image, but the metadata for the embedded image will be retained! Using Metanull before adding the image to the DOC will remove all metadata from it beforehand.
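One way to check for this is to look inside the document container. DOCX and ODT files are ZIP archives, and embedded pictures are stored in them as ordinary image files. The Python sketch below lists embedded JPEGs that still contain an Exif block. It is a minimal check under stated assumptions: it only looks at files ending in .jpg or .jpeg, only detects Exif (not XMP or IPTC data), and does not handle PDFs or the older binary .doc format. The function names are made up for this example.

```python
import zipfile


def has_exif(jpeg: bytes) -> bool:
    """Walk the JPEG header segments and report whether an Exif block exists."""
    if jpeg[:2] != b"\xff\xd8":            # no start-of-image marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:                 # start of scan: pixel data follows
            break
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        if marker == 0xE1 and jpeg[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True                    # APP1 segment carrying Exif data
        pos += 2 + length
    return False


def embedded_jpegs_with_exif(document) -> list:
    """List JPEGs inside a DOCX or ODT container that still carry Exif data."""
    found = []
    with zipfile.ZipFile(document) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".jpg", ".jpeg")):
                if has_exif(archive.read(name)):
                    found.append(name)
    return found
```

An empty list from `embedded_jpegs_with_exif("report.docx")` means no Exif block was found in the embedded JPEGs; it does not prove that the document is free of other metadata.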
Removing metadata from documents and other files
As noted above, other commonly used file types such as Portable Document Files (PDFs) or word processing documents created by applications such as Microsoft Office or LibreOffice contain metadata which may include:
the username of the person who created a document
the name of the person who most recently edited or saved a document
the date when a document was created and modified.
In some cases, your document might also contain additional personally identifiable information such as addresses, email addresses, government ID, IP addresses or unique identifiers associated with personally identifiable information in another program on your computer.
Some of this information is easily accessible by viewing the file properties (which can be accessed by right-clicking the file icon and selecting properties). Other information or hidden data requires specific software to be viewed. In any case, depending on your context, this information might put you at risk if you are working and exchanging sensitive information.
Removing metadata from PDF files
Windows or Mac OS users can use programs such as Adobe Acrobat XI Pro (for which a trial version is available) to remove or edit the hidden data from PDF files.
Opening any PDF file with Acrobat will allow you to edit the metadata by going to the File menu and then selecting properties. Here, you can modify the document author’s name, title, subject, keywords and any additional metadata. You can remove information about the creation time, modification time, type of device used for creating the file, and other hidden data you don't see by going to the Tools menu, then Protection, and selecting Remove hidden information.
For GNU/Linux users, PDF MOD is a free and open source tool to edit and remove metadata from PDF files. However, it doesn't remove the creation or modification time, nor does it remove the type of device used for creating the PDF.
Removing metadata from LibreOffice documents
In LibreOffice documents, the metadata can be viewed by selecting the File menu, then Properties. Under the General tab, you can click Reset to reset the general user data, such as total editing time and revision number. You can also make sure that the Apply user data checkbox on this screen is unchecked, so that the name of the creator is removed. When you are finished, go to the Description and the Custom Properties tabs to clear any data there that you don't want to appear. Finally, click on the Security tab and uncheck the Record changes box, if it's not unchecked by default.
Note: If you use the Versions feature, you can delete older versions of the document which may be stored there by going to the File menu and Versions. If you use the Changes feature and no longer need the information it has recorded, go to the Edit menu, then Changes, and accept or reject the changes to clear the data relating to changes made to the document.
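After following these steps, it can be reassuring to check what a saved file still contains. An OpenDocument text file (.odt) is a ZIP archive, and its document properties live in a file called meta.xml inside it. The Python sketch below reads a handful of the standard fields from that file. The list of fields is a selection made for this example, not a complete inventory, and the function name is invented.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OpenDocument's meta.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A selection of fields that can identify the author or editing history.
FIELDS = [
    "meta:initial-creator", "dc:creator", "meta:creation-date", "dc:date",
    "meta:editing-cycles", "meta:editing-duration", "meta:generator",
]


def odt_metadata(document) -> dict:
    """Return the selected meta.xml fields that are present and non-empty."""
    with zipfile.ZipFile(document) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    found = {}
    for field in FIELDS:
        element = root.find(f"office:meta/{field}", NS)
        if element is not None and element.text:
            found[field] = element.text
    return found
```

If `odt_metadata("draft.odt")` still returns a creator name or an editing duration, the corresponding setting in the Properties dialog has not taken effect for that file.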
Other strategies for scrubbing metadata
Some file types contain more metadata than others, so if you don't want to play around with software, and the formatting of a file doesn't matter, you can change files from ones that contain a lot of metadata (such as .DOCs and .JPEGs for example) to ones that don't (.TXTs and .PNGs for example).
Avoid using your real name, address, company or organisation name when registering copies of software such as Microsoft Office, OpenOffice, LibreOffice, Adobe Acrobat and others. If you must give a name or address, use a fake one.
Header image created by John Bumstead