OSINT Tools for Online Political Research: the Context of Argentina

This article was developed by the team of EdiPo (Equipo de Investigación Política / Political Investigations team) at Revista Crisis in Argentina as part of a partnership with Exposing the Invisible. It addresses the importance and role of Open Source Intelligence tools and verification techniques in the areas of political, social and policy research and investigation conducted by people and organisations interested in gathering evidence, exposing wrongdoing and tracing networks of influence affecting our present-day societies. The article focuses on the context of Argentina.

By EdiPo team, Revista Crisis - developed as part of a 2023 partnership with Exposing the Invisible (ETI)

Note: this article uses the terms “research” and “investigation” interchangeably in the context of OSINT techniques.

A context

Investigations into the political realm aim to delineate the “faces” and profiles of contemporary powers as seen through their various manifestations and roles in the present-day debates and struggles. By collaboratively mapping these powers’ composition, mechanisms and alliances, political investigations can help trace changes across time and space, allowing us to create useful knowledge that can fuel the self-defence and reaction strategies of civil society organisations in their quest to expand civil rights or to stop their regression.

In Argentina, this is part of a tradition of social politicisation of research introduced by local human rights movements after the last civil-military dictatorship (1976-1983), giving shape to a meaningful process of democratisation under the slogan that was a cry, a practice and then a method: "Memory, Truth and Justice". Democracy, in spirit and in action, was strengthened by feminism, environmentalism and a diversity of street movements. And, democracy in opposition, every day and on every corner, in the face of the reactionary, authoritarian advance that sought to restrict rights and lifestyles.

Faced with the question of “how to defend ourselves against contemporary violence?”, we create tools, information and research that enrich public debates and strengthen community responses. Whether by focusing on in-depth analyses or through deeper inquiries to map power and influence, political research and investigation seeks to give a name to an existing interdisciplinary experience, linked to the processes of struggle and their temporalities.

Although academic and journalistic investigation / research techniques are fundamental in this context, they are not enough on their own. Territorial (cross-border) interconnections together with collaborative methods of data collection and analysis contribute to a strategic and engaged view of the conflicts. In the tensions between disciplines, contaminated by various ambitions and activist practices, intersecting horizons of intervention and experimentation can emerge.

Investigation is political not only because of its object and the collective, interdisciplinary and networked nature of its dynamics, but also because it assumes the territory of information as a strategic field of debate.

It seeks to develop collective and connective research skills to identify, document and report abuses, and to defend oneself while gathering evidence. It supports the development of critical perspectives, alternative narratives and strategic interventions in a context where information, subjected to a trend of monetisation and appropriation in the hands of a few, is a source of speculation and manipulation.

How can we build autonomous ways of information access, re-appropriation and processing capacities that provide democratic alternatives to the capitalist valuation of data?

The struggle for real data democratisation, so that it becomes a common good and not a commodity, should not only be fought at the legislative or state level, but also through concrete grassroots initiatives.

Adversaries may become blurry amidst abstract webs of data and connections, but the effects of the abuses that mark people’s existence today are not. They are evident and palpable.

Unlike the secrecy typical of state (or para-state) intelligence linked to a central and hierarchical apparatus, political research makes socially relevant data openly available through collaborative approaches via horizontal networks, and supported by freely accessible tech tools. Its methods are characterised by a democratization and decentralisation of information.

Today, practically anyone with a cell phone and basic knowledge of photography, video and audio data collection can contribute to political research experiences. Technology has democratized the act of documenting and reporting, and has contributed to more people being able to investigate matters and events in-depth, anywhere in the world. There is a great amount of public interest information that can be collected and processed with accessible tools and techniques.

In this context, new forms of commitment and participation emerge. With our own commitment to collaborative research practices, we will focus here on exploring the vast ocean of information scattered on the internet through so-called OSINT (open source intelligence) techniques. But this is not the only resource that fuels political research, whose sources also extend to the territorial knowledge embodied in local communities, specialized technical or academic sources and, when available, public or private databases, documentation from court cases and materials amassed by institutional archives.

With the support of Tactical Tech, we share some useful resources for online research based on the "Exposing the Invisible" Kit, which we recommend for more in-depth browsing. In a context of information saturation and constant tension, we aim to strengthen our societies’ abilities to question information that may be false, find more information when it is scarce, and filter information when it becomes overwhelming.

OSINT: Diving into an ocean of information

Open Source Intelligence (OSINT) is a professionalization of a basic concept related to free or low-cost information, tools or media that can be accessed, reviewed and used by average people, without licenses or active permits. Intelligence, in this case, means information, data, knowledge obtained from openly available sources.

OSINT is not just information available by any means. It refers to information legally accessible by a member of the public, like a Mercado Libre review, an unprotected tweet, an unsealed court document, or a construction site’s activities seen from the street. OSINT also describes information leaked to the public, like the Panama Papers or information published by Wikileaks.org.

Legally accessible information means something different depending on where you live and who you are.

An important aspect of OSINT research is that it relies on creating overlaps and meshing knowledge from different sources. Each source deepens our understanding and places the others in a wider context. Individual pieces of information may not be accessible to everyone, but combining enough different sources can help develop seemingly useless data into meaningful results.

Many governments and corporations are required to publicly register information, but this often happens in decentralized ways, thus making the data less accessible, hard to index and difficult to trace historically and geographically, across regions and institutions (even when these institutions are connected.) The information is always most powerful when it can be brought together – combining datasets that have mutual importance.

Let’s remind ourselves that with OSINT (as with many other kinds of information), there is a tension between the seemingly obvious “good” of open, transparent and free information and the “good” of wanting to protect personal, sensitive data of individuals from exploitation. Free and open data about companies, governments and the environment may be a human right, just as many people believe that personal privacy is a human right. Perhaps this tension won’t be resolved until data protections for corporations and the assets they often hide can be properly separated from those for natural persons.

OSINT investigations often require creativity and intuition: seeing where different datasets slot together, imagining where we might find something useful, or pursuing new sources even when we’ve found a dead end, time and time again. Sometimes the information we want or need simply is not OSINT – it happens. But layering information, pulling in additional context, adding seemingly small details to our pool of data, and looking in unusual places can usually get us where we need to be. OSINT resources can lock into each other like puzzle pieces or can develop a picture like photo chemicals when used all together. Seemingly useless or irrelevant open data can be transformed into extremely powerful tools and used to further important investigations using OSINT research techniques.

We know none of this is easy. Turning information into evidence, and then into knowledge, action and justice requires time, effort, persistence and courage. But we believe this endeavour is necessary. Investigating the issues that define our here and now, as well as our future, is vital.

Dorking

Dorking essentially refers to advanced online searches – using search engines to their full potential to unearth results that are not visible with a regular search. (A “regular search” is the everyday way of asking for information, either by typing a full question, such as "how to make pancakes?", or by choosing keywords, such as "pancakes recipe".)

With a few keywords and search “operators”, it is possible to refine a query, increasing the levels of accuracy when browsing web pages, databases and documents available online. Uncovering hidden files and various websites’ flaws through the use of dorking doesn't require a great deal of technical knowledge, but rather learning a few techniques and using them across various search engines.

How does “Dorking” work?

A “dork” refines a query, by combining technical and semantic elements, in order to take full advantage of the fact that web content is being constantly scanned and indexed by machines.

All you need to carry out a “dork” is a computer, an internet connection and a basic understanding of the appropriate search syntax: keywords and symbols (sometimes called “prefix operators” or “filters”) that you can use to refine your search results. To do so effectively, however, you may also need persistence, creativity, patience and luck.

A “prefix operator” is a special text that is added before the searched text in a search bar.

In practice

Here are some examples of dorks based on “prefix operators.”

For instance, site:worldbank.org filetype:pdf will look for all the indexed .pdf files on the World Bank site (https://www.worldbank.org), so that only files of that type are returned for you to investigate further.

The following is another simple dork that relies on prefix operators. It will search https://tacticaltech.org for all indexed PDF files hosted on that domain.

  • site:tacticaltech.org filetype:pdf

Another example, which returns all web pages under the tacticaltech.org domain that have the word “invisible” in their titles, might look like this:

  • site:tacticaltech.org intitle:invisible

If you need to use a search term that contains multiple words, you can surround them with quotation marks:

  • site:tacticaltech.org intext:exposing intitle:"the invisible"

Dorks can also be paired with a general search term. For example:

  • exposing site:tacticaltech.org, or
  • exposing site:tacticaltech.org filetype:pdf

Here, ‘exposing’ is the general search term, and the filters site: and filetype: narrow down the results.

Note that each filter keyword ends with a colon (:) and is followed by the relevant search term or terms, with no space before or after the colon!
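
The syntax rules above can be sketched in a few lines of Python. This is only an illustration, not part of any search engine's tooling: the function name build_dork and its parameters are hypothetical, and the resulting string is simply meant to be pasted into a search bar.

```python
def build_dork(term="", **filters):
    """Join a general search term with prefix operators such as site: or filetype:."""
    parts = [term] if term else []
    for operator, value in filters.items():
        # Multi-word values need straight quotation marks to stay together.
        if " " in value:
            value = f'"{value}"'
        # No space before or after the colon.
        parts.append(f"{operator}:{value}")
    return " ".join(parts)


print(build_dork("exposing", site="tacticaltech.org", filetype="pdf"))
print(build_dork(site="tacticaltech.org", intitle="the invisible"))
```

Keeping the filters as keyword arguments makes it easy to reorder them, which is worth trying when a search engine returns few results.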

With great information access comes great ethical responsibility. While you can use these techniques, in a responsible manner, to extend your investigations, others can use them to obtain personal data or exploit vulnerabilities. As is often the case, intentions matter.

Below, we included the most widely used search engines and the most common and useful “dorks” that go with them.

Our preferred service is DuckDuckGo, a privacy-focused search engine, which claims not to collect personal information about its users and stores search queries in such a way that they cannot be attributed to specific users. That said, if you are doing sensitive research, it makes sense to use the Tor Browser in combination with DuckDuckGo, to further protect your privacy. And fortunately, DuckDuckGo is much less likely than Google to block Tor users or make them solve CAPTCHAs.

The order of the terms and filters in this table matters for some search engines, so it is advisable to try different combinations for more accurate or relevant results.

Note that this list might not be exhaustive, but the operators should help you get started.

Table: Dorking operators for Google, DuckDuckGo, Yahoo and Bing

Dork | Description
cache:[url] | Shows the version of the web page from the search engine’s cache.
related:[url] | Finds web pages that are similar to the specified web page.
info:[url] | Presents some information that Google has about a web page, including similar pages, the cached version of the page, and sites linking to the page.
site:[url] | Finds pages only within a particular domain and all its subdomains.
intitle:[text] or allintitle:[text] | Finds pages that include a specific keyword as part of the indexed title tag. You must include a space between the colon and the query for the operator to work in Bing.
allinurl:[text] | Finds pages that include a specific keyword as part of their indexed URLs.
meta:[text] | Finds pages that contain the specific keyword in the meta tags.
filetype:[file extension] | Searches for specific file types.
intext:[text], allintext:[text], inbody:[text] | Searches the text of a page. For Bing and Yahoo the query is inbody:[text]. For DuckDuckGo the query is intext:[text]. For Google either intext:[text] or allintext:[text] can be used.
inanchor:[text] | Searches link anchor text.
location:[iso code] or loc:[iso code], region:[region code] | Searches for a specific region. For Bing use location:[iso code] or loc:[iso code] and for DuckDuckGo use region:[iso code]. An ISO location code is a short code for a country; for example, Egypt is eg and USA is us. See https://en.wikipedia.org/wiki/ISO_3166-1
contains:[text] | Identifies sites that contain links to the file types specified (i.e. contains:pdf).
altloc:[iso code] | Searches for a location in addition to the one specified by the language of the site (i.e. pt-us or en-us).
feed:[feed type, i.e. rss] | Finds RSS feeds related to the search term.
hasfeed:[url] | Finds web pages that contain both the term or terms for which you are querying and one or more RSS or Atom feeds.
ip:[ip address] | Finds sites hosted by a specific IP address.
language:[language code] | Returns websites that match the search term in a specified language.
book:[title] | Searches for book titles related to keywords.
maps:[location] | Searches for maps related to keywords.
linkfromdomain:[url] | Shows websites whose links are mentioned in the specified URL (with errors).

For more examples, check: "Files Containing Juicy Info" from https://www.exploit-db.com/google-hacking-database

Defensive dorking

You can use dorking to protect your own data and to defend websites for which you are responsible. This is called “defensive dorking,” and it typically takes one of two forms:

  1. Checking for security vulnerabilities in an online service, such as a website or an FTP server that you administer. The Google Hacking Database (GHDB) suggests various keywords and other terms that you can use, along with the site:yoursite.org filter, to identify certain vulnerabilities. While these searches may help attackers locate vulnerable services, they also help website administrators protect their own.

  2. Searching for sensitive information about yourself - or about someone else, with their permission - that might be exposed unintentionally on a website, regardless of whether or not you administer that website. To look for sensitive information, you can start with the following simple commands, along with the site:yoursite.org filter. You can then remove the site: filter to discover which other websites might be exposing information about you or your organisation. Below are a few examples.

If you search for your name or address and then your ID number, you're giving this information to whoever runs the search engine. Even Tor can't protect you from this type of privacy leak.

  • You can search for your name in PDF documents with: <your name> filetype:pdf

  • You can repeat this search with other potentially relevant filetypes, such as xls, xlsx, doc, docx, ods or odt. You can even look for several different file types in one search: <your name> filetype:pdf OR filetype:xlsx OR filetype:docx

  • Or you can search for your name in regular website content with something like the following (see the table above for information about whether your search engine of choice uses intext: or inbody: as the text-searching filter): <your name> intext:"<personal information like a phone number or address>"

  • You can also search for information associated with the IP address of your servers: ip:[your server’s IP address] filetype:pdf
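
The searches in the list above can be generated in bulk with a short script. The following is a minimal sketch under stated assumptions: the function name defensive_dorks, the placeholder name "Jane Example" and the choice to wrap the name in quotation marks are ours, not part of any tool. The script only builds query strings; it does not send anything to a search engine, so the privacy caveat above still applies once you run the queries.

```python
def defensive_dorks(name, filetypes=("pdf", "xlsx", "docx"), site=None):
    """Return search queries that look for `name` in common document types."""
    # One combined query: "<name>" filetype:pdf OR filetype:xlsx OR filetype:docx
    combined = " OR ".join(f"filetype:{ext}" for ext in filetypes)
    queries = [f'"{name}" {combined}']
    # One query per file type, in case an engine handles the OR form poorly.
    queries += [f'"{name}" filetype:{ext}' for ext in filetypes]
    if site:
        # Restrict every query to a site you administer.
        queries = [f"site:{site} {query}" for query in queries]
    return queries


for query in defensive_dorks("Jane Example", site="yoursite.org"):
    print(query)
```

Calling the function without the site argument gives the wider search that shows which other websites might be exposing the same information.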

For more examples, have a look at Exploit Database’s list of Files Containing Juicy Info.

Not all advanced search techniques rely on prefix filters or operators like those shown above.

  • Adding quotation marks ("...") tells most search engines to match an exact phrase (e.g. "farmacias de guardia en Buenos Aires" / "pharmacies on duty in Buenos Aires")

  • Placing an all-caps OR (O in Spanish) between search terms tells the search engine to return results with either term (e.g. farmacias O droguerías en Buenos Aires / pharmacies OR drugstores in Buenos Aires).

  • Adding an asterisk * in a quoted phrase works as a wildcard giving the option to the search engine to include results that have other characters or another arbitrary term in place of the asterisk (e.g. "pharmac* on duty in Buenos Aires" / "farma* de turno en Buenos Aires".) An asterisk replaces multiple characters.

  • Adding - (minus) in front of a word excludes results with that term (e.g. "pharmacies on duty in Buenos Aires" -homeopathy). A minus (-) acts as NOT, while a plus sign (+) acts as AND.
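
These techniques combine naturally. As a small illustration, the sketch below assembles an exact phrase, a set of OR alternatives and a list of excluded words into one query; the function name build_query and its parameters are hypothetical, and support for each element varies between search engines.

```python
def build_query(phrase=None, any_of=(), exclude=()):
    """Combine an exact phrase, OR alternatives and excluded words into one query."""
    parts = []
    if phrase:
        # Quotation marks ask for an exact phrase match.
        parts.append(f'"{phrase}"')
    if any_of:
        # An all-caps OR returns results containing either term.
        parts.append(" OR ".join(any_of))
    # A leading minus excludes results that contain the word.
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


print(build_query(phrase="Buenos Aires",
                  any_of=("farmacias", "droguerías"),
                  exclude=("homeopatía",)))
```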

Resources

Verification

Don't take anything for granted and question your own assumptions.

Information can take various forms and appear on various mediums. Images, videos, sound, written testimonies and webpages are just a few. For each medium there are tools, techniques and tricks that help with the process of verification. You do not need to learn everything about every medium before you start investigating. There will be plenty to discover as you go along, and you will also be able to learn through practice.

It helps to always start with the following question.

What do you know with 100 per cent certainty?

If it’s an image:

  • Do you know who took it? When was it taken? On what camera? In what place? How do you know it hasn’t been modified? Could anyone try to mislead you by providing this image?

  • Can you truly confirm that subjects and events shown in an image are from the specific day, time, place and incident you are investigating?

  • Is the issue contentious? Are there sides that have something to gain by spreading this information? There could be groups spreading false information hoping that someone might give them attention without checking.

It is important to save and archive the information as soon as you find it. Before verifying, download the photo or video and take a screenshot of the post (if it appeared on social media, a website or a messaging app, for instance). Its author could delete it if they realize they made a mistake or notice that they have been exposed.

There are three main phases of verifying information:

  • Verifying the source - Where you got the information and where it originated.

  • Verifying the content - Whether it is exactly what it claims to be.

  • Verifying its relevance - Whether it fits into your investigation.

Resources

To expand your verification techniques, you can check the following resources:

Archiving online content

Websites get taken down, webpage contents change and valuable links break causing you to lose track of information you might have seen online in the past. Fortunately, there are some easy ways to retrieve old online content and deleted webpages so you can still use them in your research. Various methods and tools allow you to also save currently accessible pages so you can access them later, even if they get modified or deleted at some point in the future.

There are several services that automatically archive older versions of websites. In addition to content, these digital archives often contain information that can help you identify other important data, such as the owner of a website, useful names, contact details, documents, and links to other websites.

Some of these services allow you to contribute to the list of websites they archive by manually saving webpages at a time of your choice, so you and others can retrieve copies and screenshots of those webpages and websites later.

Resources

Important tools that help you retrieve historical online content, and archive online content for future reference are:

  • The Internet Archive - the world's largest digital archive, allows you to search and retrieve previously archived content.

  • Wayback Machine - an Internet Archive tool that allows you to manually save and archive web pages online and retrieve archived websites. It has some shortcomings with archiving and displaying images and graphics from webpages but it can take screenshots of such pages if you manually select the option when saving specific webpages.

  • Archive Today - An online tool for archiving and retrieving archived online content. It works similarly to Wayback Machine and it automatically takes screenshots of webpages, thus images and graphics are easier to preserve. Unlike Wayback Machine, Archive Today saves snapshots only on demand, meaning webpages that people preserve manually.

  • For an overview of how to archive and retrieve archived content on the internet, read the guide "Retrieving and Archiving Information from Websites" from Exposing the Invisible: The Kit.
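
For the Wayback Machine, saving and retrieving can also be done through predictable web addresses. The sketch below only builds those addresses and makes no network requests. The function name wayback_urls is hypothetical, and the URL patterns reflect the Internet Archive's publicly used conventions as we understand them, so treat them as assumptions and check the archive's own documentation before relying on them.

```python
from urllib.parse import urlencode


def wayback_urls(page_url, timestamp=""):
    """Build Wayback Machine URLs for a page; nothing is requested over the network."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return {
        # Opening this address asks the Wayback Machine to save the page now.
        "save": "https://web.archive.org/save/" + page_url,
        # With a timestamp (YYYYMMDDhhmmss, or a prefix such as a year) this leads
        # to the closest capture; with * it lists the available captures.
        "view": f"https://web.archive.org/web/{timestamp or '*'}/{page_url}",
        # Availability endpoint that reports, as JSON, whether a capture exists.
        "check": "https://archive.org/wayback/available?" + urlencode(params),
    }


for label, url in wayback_urls("https://tacticaltech.org", "2023").items():
    print(label, url)
```

Recording the "view" address together with the date you consulted it gives others a stable reference to the same capture.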

Geolocation

There are plenty of guides and tutorials for learning and practicing how to geolocate photos and videos, and to identify and verify images from a specific location or event. The following are a good starting point:

Geolocation involves plenty of observational capacity to identify the location of photographs and videos, patience and some research skills to identify keywords and image elements to search for in order to identify possible locations. While searching, we can resort to a variety of free online tools to geolocate events and places such as:

Further Resources

Analysing and preserving metadata

Metadata is information that describes properties of a file, be it image, document, sound recording, map etc. For example, the contents of an image are the visible elements in it, while the date the image was taken and the location and device it was taken on are called metadata.

Metadata (of photographs, videos, documents and other types of files) can be easily altered and is thus vulnerable as evidence since it’s not always easy to prove, verify and 100% confirm its reliability when it comes to the original source, or other details such as date or time.

Below are some tools that can help view, verify, edit or safely record and preserve original image metadata.

Resources

Tools to safely capture, preserve and share metadata of your images to use as evidence:

  • eyeWitness: https://www.eyewitness.global/ – recording app to capture verifiable photo and video documenting abuses.

  • Save: https://open-archive.org/save – app designed to help you store and share mobile media while ensuring your identity remains protected. Free, open-source, and available for iOS and Android.

  • Proofmode: https://proofmode.org/ - app that turns your photos and videos into secure, signed visual evidence.

Tools to view, verify or edit image metadata:

  • Phil Harvey’s Exiftool: https://exiftool.org/ – metadata viewer and editor available for download and use on your own computer. Apart from reading metadata, this tool allows you to write and edit the metadata of photos and videos. Because it runs locally, it is safer to use with sensitive material than uploading your images online for metadata viewing.

  • Image Verification Assistant: https://mever.iti.gr/forensics/index.html – a tool for image verification on the web.

  • Fotoforensics: https://fotoforensics.com/ – online image analysis tool, for metadata checks and information on whether an image has been altered (watch out when uploading images for checks – do not do so if using sensitive material or trying to stay digitally undetected.)
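
Because Exiftool runs locally, it can be scripted without uploading anything. The following is a minimal sketch, assuming Exiftool is installed and available on your system path; the helper names are hypothetical, and the sample output is invented purely for illustration (real tag names and values depend on the file). It relies on Exiftool's -json option, which prints metadata as a JSON list with one object per file.

```python
import json
import subprocess


def exiftool_command(path):
    """Command line that asks Exiftool for a file's metadata as JSON."""
    return ["exiftool", "-json", path]


def parse_exiftool_json(output):
    """Exiftool's JSON output is a list with one object per file; return the first."""
    records = json.loads(output)
    return records[0] if records else {}


def read_metadata(path):
    """Run Exiftool locally (it must be installed) and return the tags as a dict."""
    result = subprocess.run(exiftool_command(path),
                            capture_output=True, text=True, check=True)
    return parse_exiftool_json(result.stdout)


# Invented sample output, for illustration only.
sample = '[{"SourceFile": "photo.jpg", "Make": "ExampleCam", "CreateDate": "2023:06:15 10:30:00"}]'
tags = parse_exiftool_json(sample)
print(tags["CreateDate"])
```

Since metadata can be altered, values read this way are leads to corroborate with other sources rather than proof on their own.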

Reverse image search tools to try and identify the source of an image:

Be careful when uploading images to the web for verification. Avoid uploading compromising images, and avoid uploading at all if you intend to maintain digital anonymity.


Credits

Author: EdiPo / Revista Crisis

  • Content prepared by EdiPo (Political Research Team) at Revista Crisis (Argentina) with the support of Tactical Tech, June-July 2023.

  • Translated from Spanish, and adapted by Exposing the Invisible, Tactical Tech.

Published under Creative Commons (CC) license Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/.
