Smari McCarthy: Making data speak

Smari McCarthy talks about how to work with data to create knowledge from information, how when it comes to information there is never a “one size fits all” approach, and why we all need to learn to code.

Smari has many hats. His default one is as Director of the International Modern Media Institute, but sometimes he is a member of the Icelandic Pirate Party and sometimes a member of the Peer to Peer Foundation. Sometimes he is the Icelandic Digital Freedom Society, and sometimes he is an information activist, or a hacker...

Here he talks about how to work with data to create knowledge from information, how when it comes to information there is never a “one size fits all” approach, and why we all need to learn to code.

So which one is your favourite hat?

All of the hats are really hats. They're interfaces between me and some parts of reality. I have all of this stuff going on in my head and it makes perfect sense to me. But when I'm exposing myself to reality, I need to sometimes have a particular kind of outward facing interface that helps people understand me. There's this notion of legibility that James C. Scott has written quite a bit about that I really like. And others have expanded on it, talking about illegible people and especially how people who do this kind of information activism work are often completely illegible to the world around them. There's not much to enable people to understand or get an easy grasp on what we do.

And if you were to introduce yourself to your grandmother's friends?

If I had a grandmother and she were to have friends, I'm not sure. I would probably just say that I work in information, or computers or maybe politics. Simple one-word things. It's the same way as people always expect; you know... 'carpenter', or 'plumber', or 'lawyer', or 'policeman'. Simple, single words.

Do you often work with data?

In a way, everything I do is working with data. But there are different levels of directness. Sometimes I'm literally taking databases or datasets and trying to extract some kind of information from them. Other times I'm just taking knowledge which has already been acquired and pushing it through some kind of process, either computational or political or social, or whatever. But there's always a purpose.

What exactly do you do with data?

My big goal, the thing I am trying to accomplish with all of the things that I do, is to make society better. "Better", in this working model that I'm operating on, is a society where people have more ability, more rights, more freedom to make up their own minds about how they want to live their lives, and more freedom to make decisions about things that affect them.

In order to do that, you need two things. First you need the information required to make enlightened decisions, and then you need the processes and the abilities to actually make the decisions. So, first you need transparency, and then you need democracy. And these mash pretty well together. You can't say that one comes before the other because there's always this interlock. But moving those two forward is a big goal.

What's the process for you of using data to enable those things?

It depends on the data. Sometimes it's something simple. When somebody tells me something is happening, the reaction might be to try to seek more information to capture and inform others about it. It might be to work through it in various ways. It completely depends on the context and what the data is.

When you go through data, what are the decisions you need to make?

In order to work with data, you need to understand what the data is trying to represent. One of the first things you do when you get any sort of dataset is try and understand the conditions under which the dataset came into existence. There's always a story behind the data. There's always some kind of narrative. And I think that humans understand the world best through stories and narratives. That's the way our brains have evolved to think about things.

So getting the context, understanding who is generating the data, why, what the errors can be, what kind of faults and flaws there might be, what's being omitted and what isn't, what's very explicitly being kept accurate, and so on...  Once you have that information you can start to build out and decide whether you need to go into more information collection or try to do some . Sometimes you can directly act on what you have - write a report, write an e-mail, make a poster, somehow interact with the world based on the knowledge you've extracted. But there's never a 'one size fits all' approach to dealing with data.

Is there a difference for you between interpreting data and curating the data?

Yes. Interpretation of data is when you try to take data and use it for some specific goal. Curation of data is rather the maintenance of a set of data. Or it could be the collection of data. Or just managing a growing volume.

It's like the distinction between the archivist and the historian. The historian goes through the archives searching for some piece of evidence, some piece of understanding that helps explain history. Whereas the archivist isn't really that concerned with the history, or exactly what happened, but merely with the accurate cataloguing of what exists and the protective storage of that information.

What are the unexpected results of opening the data?

A lot of problems come from the fact that we have never seen as much data as is coming into existence right now. Every hour now humans are creating more data than they created during the first million years of humanity... or hundreds of thousands at least. We haven't figured out a lot of things about how to manage this amount of data and there are a lot of surprising statistical anomalies that come up when you have very large datasets, such as the tendency towards false positives and false negatives. These kinds of freakish anomalies can do a lot of harm because we don't know how to treat them yet. And even though within certain statistical circles, or in academic circles, there are people who understand these weirdnesses where they exist, it hasn't become common knowledge yet. People just haven't gotten to the level of managing to fathom what it means to live in a world where petabytes of data are a commonplace occurrence.

That means that has completely changed. There are examples of anonymised datasets that have been scrubbed clean of any identifying information that, once put into some kind of statistical correlation and some filters, could pinpoint back to the correct individuals. That means we need to come up with a practical definition of differential anonymity - essentially some way of measuring what the likelihood is of a particular dataset that has been anonymised being able to identify a particular individual. And also some common rules about how, as a society, we deal with information that can be harmful to individuals.

On the flip side, we can do amazing things with all this information. The risk of violations of privacy and threats to various values that we might hold should not be a deterrent from investigating and exploring. We should be mindful of privacy and we should do everything we can to protect it. But at the end of the day, we should be using all this newfound information to better our society, and partially just to make better decisions. And we can't do that unless we're informed.
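
[Editor's illustration] The idea of measuring how likely an anonymised dataset is to point back to an individual can be sketched in a few lines of Python. This is not the "differential anonymity" measure Smari calls for, which he describes as still undefined. It is a simpler, well-known stand-in in the style of k-anonymity: count how many records share the same combination of seemingly harmless fields. The function name, the field names and the four records are all made up for illustration.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (k, unique_fraction) for a list of dict records.

    k is the size of the smallest group of records that share the same
    values for the chosen fields; unique_fraction is the share of records
    that are the only member of their group and so stand out.
    """
    if not records:
        raise ValueError("need at least one record")
    # Reduce every record to the tuple of fields an outsider could match on.
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    group_sizes = Counter(keys)
    k = min(group_sizes.values())
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    return k, unique / len(records)

# Hypothetical records with names already removed (invented data).
records = [
    {"postcode": "101", "birth_year": 1984, "sex": "M"},
    {"postcode": "101", "birth_year": 1984, "sex": "M"},
    {"postcode": "105", "birth_year": 1990, "sex": "F"},
    {"postcode": "107", "birth_year": 1975, "sex": "F"},
]

k, unique_fraction = reidentification_risk(records, ["postcode", "birth_year", "sex"])
print(k, unique_fraction)
```

In this toy dataset two of the four records are unique on postcode, birth year and sex, so removing the names alone would not protect those two people from anyone holding another dataset with the same three fields.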

What role does context play for both the archivist and the curator?

The archivist uses context to do better cataloguing. And that's where plays a big role in understanding the tags, the folders or organisational structure; how you might structure a database or how you might structure a file set. Context for the historian, on the other hand, is more an approach of -  when did this happen? Why did it happen? Who was involved? What events occurred during the process of this data coming into existence that informs the decisions that were then taken?

And is there a difference when you archive something that has already happened versus something that is happening right now?

Well, mostly in terms of sensitivity. When you're archiving something that is an ongoing process (for instance video documentation of a conflict, or even just collecting statistics on people injured, people killed), then your key obligation is towards accuracy. But there might be a political element that's altering the way in which the data can be collected. And of course depending on who's collecting the data, the politics might influence the actual outcome of the data. But when you are delving into historical data, a lot of it will exist in multiple sources, and there you might focus on comparing and trying to cull the errors that were brought in through the various political elements over the course of the collection.

What about the negative side of data, where there are missing pieces, or only some elements of the data are used as fact?

There's a common reference to Donald Rumsfeld that's interesting - where he talked about “known knowns”, “known unknowns”, “unknown knowns”, and “unknown unknowns”. When we look at the data set, especially if it's structured data, it's a lot harder to detect than less structured data. It sometimes jumps out at you, what omissions have been made. So very explicitly, people have just not collected certain information. And sometimes that is a very important thing. For instance if you are dealing with a data set relating to medical issues that people have, the names of those people being omitted is a part of an attempt to protect their privacy. But then you can do all sorts of differential analysis on data, and maybe the anonymity that is being provided can be eradicated through the useful use of the “known knowns”. But sometimes we just appreciate that the “known unknowns”, such as a name, are not there.

But sometimes we gain a certain amount of information from the fact that they're not there, that people have either intentionally dropped them for political reasons or for privacy reasons. Or simply, they made some kind of enlightened decision based on difficulty, based on access, based on the likelihood of errors to just not include that in the data. And there's a lot of meta-contextual reasoning that we never really got along with the dataset, that really the person who has created the data should be documenting as a part of the creation.

What's metadata?

Metadata is information that explains information. So metadata on a photo is, for instance, information about how big the photo is and its dimensions, what kind of shutter settings were being used on the camera, sometimes the make of the camera, sometimes the serial number of the camera, or the geographical location where the picture was taken. Virtually every information source has metadata. Sometimes it's very explicit and created as part of the documentation process of creating the data. So PDF files, images and Word documents all have some metadata associated with them unless it's been intentionally struck.

Then there's the implicit metadata, generally referred to as 'noise' or 'artefacts', which wasn't added intentionally as a way of documenting the information, but comes as part of the side effect of the process by which information was captured. For example when you take a picture with a digital camera, the CCD chip that captures the image always has slight manufacturing flaws, and these flaws are unique per CCD chip so no two CCD chips are identical. That means that if you take multiple pictures with the same camera they always have the same kind of noise pattern. It's invisible to the human eye, but to artificial intelligence that's skimming through lots and lots of pictures, it would stand out as something that might be able to identify multiple pictures coming from the same camera.

That's such a concrete example of metadata that nobody really notices, except those who are trying to learn things about the data they have that aren't really supposed to be learn-able. So metadata is in some cases our best friend - it helps with searching, it helps with indexing, it helps with understanding the context of the information. But sometimes it's our worst enemy.

In some scientific circles there's an opinion that the role of the narrator should just be to put the pieces together, because the data should speak for itself. Do you agree with that?

Both in journalistic and scientific investigations, one of the main outputs is an interpretation of the data. And that's really kind of key to creating value. That's really the step that takes data and turns it into information. Staring at a spreadsheet full of numbers is very unenlightening to most people. But if those numbers are taken and put into some kind of context, maybe visualised in a useful way or a story told about them, then suddenly we can become informed. And that's fine. That's why we have journalists. That's why we have scientists. That's how knowledge is created.

The thing that's missing very frequently (and this is something that has been talked about both in scientific and journalistic circles over the last couple of years) is that very frequently we get to see the interpretation, but we don't get to see the data that was used to create the interpretation. And that means that we can't independently verify whether the interpretation is sensible or meaningful or correct. We can really only hope that they've got it right and move on. From a scientific perspective, that's completely against the entire idea of scientific method. We need things that can be independently verified. For journalism, that means that suddenly the Fourth Estate has a power of interpretation that they're not granting to the readers, the people who are further down the chain.

Publishing the datasets and making sure that everybody can independently create the same story, or alternatively come up with a different story around the same data, that's a really important thing. Hopefully we'll see more of that.

When we talk about data it's mostly numerical, something you can count. But when we think about evidence we think of other formats - a video, an interview, a photograph, testimonial, recording etc. Can you elaborate on the difference between data and evidence? Which is which and what are the confusions?

I would say there are really three things here: there's data, there's information, and then there's evidence.

Data, when granted interpretation, becomes information. And information that has a proof value assigned to it becomes evidence. So when you're trying to prove something, information can become evidence of the truth or the falsity of some statement. That's how we use it in a legal context, and how we use it in journalistic contexts and forensics and so on. The creation of evidence, and I don't mean in a malicious way, is essentially the same as the creation of data, except we're applying a test to it. We're saying - "here's some information we have gained from understanding this data. Now we want to test some hypothesis." Is this evidence of the truth or falsity of this hypothesis? If it is, then it's evidence.

Can you create data in that context? Data is out there on a universal level...

It's relatively easy to generate data. Then we get into some interesting philosophical and metaphysical discussions about the nature of the universe and whether the universe itself is just information, or event potential, and we're just capturing stuff that was in some way inevitable. But the reality is, we generate data all the time.

When I travel somewhere, the path I took is data. Whether that data is captured and stored is another question. And whether we then use that data to create some information is a third question. So we can reliably create data in multiple fields. The scientific method is, to a very large degree, based on the idea of generating data to create information to support or reject a hypothesis. And in a sense, that's what journalism is about, as well - collecting and documenting data that has been generated by others, and sometimes generating new data through interviews, investigations, and so on.

What's the importance of data and evidence in activism?

There's a somewhat tedious tendency amongst some activists to have an opinion, and push for that opinion regardless of any evidence to the contrary, any better explanation. Thankfully I think it's becoming more and more common in activist scenes to want to get cold hard facts: clear evidence, clear data, clear information that supports their idea, the thing that they are fighting for. And that can occasionally mean that people might find out that their cause is just not well founded. But so be it, right?

When we start using data in activism, what we're doing is providing a set rigour to it. We're applying rigour to the action of activism. That means it's much harder for authorities or higher powers to contradict us on the basis of propaganda. We eliminate a lot of lies and truths and propaganda just by making sure that everything we say is backed up by evidence. That's a growing trend and, even over the last decade, it's noticeable how activists have become more data-centric. They are doing things like data meetups, data dumps, sharing large files among themselves, with the intent of being well informed and having the best data that's available to them. Whereas ten years ago, the rumour mill had a lot more weight in the discussion. And often rumour mills generate the false positives and the false negatives.

So, how do you build a narrative around evidence and data?

That comes back to context. When you're building a narrative around information that you have, you need to decide whether you're going to be focusing on the actors, the circumstances, or the timeline. And then you just put down the data elements in some kind of structured way that is easy for humans to look at and understand. Sometimes people come to you with amazing stories about all sorts of crazy things happening. And if it's put to you the wrong way, it might sound crazy, or just confusing, and you might lose interest. But if you manage to organise it in such a way that even a layman can come in and see a logical progression of events through time, or logical connections between people, or logical connections between scientific measurements - any kind of pattern can start to enlighten us and help us understand what's going on and why.

How would you shape a narrative for different kinds of audiences?

Coming back to the history of man, the way information was transmitted between people for the greatest part of the existence of humanity has been through the telling of tales - the sharing of stories around the camp fire, and so on. Because of that, one of the best methods for exchanging information is still just building a story to tell. You might be doing it for educational purposes, you might be doing it because you're trying to get a message out. But the narrative cannot be divorced from the humans that are telling it, and the humans who are listening to it. There's always this human element when we have narratives and I don't know if there's any other way to invoke narratives than in the contextualisation of data.

When we look at the journalistic world or the investigative world, the data is collected from a certain group of people but then the story is told to a completely different group of people. Does that matter?

When we see stories about wars, or stories about the heroes of the olden days, or whatever it is, it's always stories of somebody being told to somebody. And there are lots of reasons why you might want to tell that story. Normally, there is some kind of information. There might be some kind of entertainment value. And that's entirely fine, that's how we learn. I don't think the fact that the data is generated by different people than those who are the recipients of the story is a problem. I think it might actually be the highest goal of good data.

So if you try to influence somebody, inform their decisions and turn them toward your world view, what are the key elements of narrative you would use?

I'm not as good a storyteller as I would like to be. But there are lots of rhetorical mechanisms that people use, lots of traditional plot lines that we like to put things into. There's a thing called the Aarne-Thompson Catalogue of story types, which helps to put a structure around what exactly is happening in each particular story. Once you realise that every single story has more or less the same elements (there's often a protagonist, there's an antagonist, a start and a finish, and all sorts of events along the way), then building a narrative is merely the act of taking the information that you've created and putting it into some kind of prepared way of telling a particular story.

Depending on which story you choose, and how you've aligned the information with the different roles that are being played in the story, you might be telling two very different stories. That's where manipulation comes in. What is the intent of the person that's telling you the story, or transmitting the narrative to you? Is it to enlighten you about facts? Is it to change your opinion or alter the way you think about a particular subject? Is it to misinform you, or mislead you? That comes back to why we need the raw data. We need to be able to look at the same information, and ask - is this narrative sound? Does it make sense? And if it doesn't, then we can build our own story.

There's a generation of people growing up around data who are less and less in favour of specialised story tellers, who think they can make their own story out of information. How important is it to curate and present information in a way that enables the listener and viewer to explore it for themselves, instead of being mere passive recipients?

We always say that history is written by the victors, the people who won. That's a somewhat cynical understanding of humanity, although it seems to hold true for much of history. When you look at lots of the common narratives around us, we must conclude that they were written by the people that won in that particular argument. If we want to build a future which is better for humans, then we need to make sure that everybody wins. One of the ways of doing that is making sure everybody has the ability to write history. So, gather the information together, make it available to everybody, and allow them to figure out their own story line. Most of the time, they'll come out with the same story, if the facts are accurate enough.

One of the really wonderful things in looking at social media like Twitter is that you see these very coherent narratives coming up from a million voices having concurrent communication with each other. It's pretty amazing how fast the truth emerges on top of the pile. Misleading facts get weeded out. That's where we need to go.

You observed that there's a whole generation of people on social media, on the Internet, so they can update instantly. For them a certain digital space enables them to occupy a certain territory that didn't exist before. And every moment when these people move outside of the virtual space into the real space they fail, principally because they're being stopped from replicating their free model of self-organising. What's your take on that?

A lot of the methods of self-organisation and governance that appear on the Internet are possible because of a couple of physical properties. One is that humans have not yet fully understood the economics of post-scarcity. Post-scarcity economics functions completely differently when suddenly everybody can have a voice in a conversation. There is no blocking, for instance. When we're having a conversation in real life, a lot of blocking occurs, in the sense that we can't all talk at the same time.

As a result Occupy, for instance, comes up with a lot of hand signals to try and mediate non-blocking communication of some form without losing the benefits of the voice channel. But at the same time all of these things are really new, and the people who are doing them don't fully grasp what the governance model of the future is going to look like. We know that we want it to be much more democratic, in the spirit of the Internet.

We try to learn some lessons from the Internet, but there's no one-to-one mapping. One of the things that is a challenge for the next couple of years is understanding what exactly doesn't work about the Occupy model, about the various direct democracy ideas that have been popping up all over the place, and getting a clear grip on governance as a concept. When we've understood how governance should work in a networked society, outside of the network itself and in a real world where we have real conflict, and real violence, and real scarcity, then we're certain to win. But just taking governance models from the Internet and placing them wholesale on society is probably not going to work.

What are the key properties of the Internet, or virtual space, that are important for activists? I'm thinking of things like protocols and network neutrality, the layer of services on top of the internet...

When the Internet started to exist, we started with a free-for-all, fairly anarchistic model of governance. Everybody was welcome to participate if they could dial into a friend or connect to somebody else on the net. And that worked very well for a very long time. But governments have always existed alongside the Internet. At the beginning they were more worried about the side effects of the Internet spilling over into real life - computer fraud, bank fraud, disruptions of telephony, and so on. But as time has passed, and more and more people are using the Internet, now that it's an 11-trillion-dollar-a-year economy, governments have realised that controlling the Internet is equivalent to controlling real life. The way you maintain control over a society that is demanding more and more democracy is to make sure that their ability to communicate is regulated by some kind of government entity, or at least something that is under the control of governments directly or indirectly. So when we talk about things like net neutrality, and self-hosting data, and demanding that internet service providers have liability limitations and that kind of thing, what we're doing is we're saying we need to limit the influence of government over the Internet itself.

As an example, about a week ago I received an email from Google containing court orders issued two years ago, demanding that Google hand over all the metadata associated with my account to an American court. This could not have happened without me knowing it if I were in control of all of my data myself. But because I am using Google services (although I don't use them for a lot of things, and they probably found nothing), the fact that I'm not in control of that data source, that I do not have sovereignty over it, means that the ability of government is vastly strengthened. So thinking of the value of the Internet in terms of activism, we like it because it is still a mostly unregulated playground. We can do things on the internet that we would be killed for if we tried to do in real life.

That's pretty awesome. That means that we have a lever, a crow bar, a mechanism by which we can start to pry governance over our own lives out of the cold dead hands of government. And hopefully at the end of the day we will win and they will not. It's not necessarily that governments are anti-people or there must be a disconnect between people and their governments. But historically it has developed that way and we haven't managed to take it back. So we might come up with new governments, and we might come up with new methods of governance, and in fact we're probably going to have to. But while we have these old modes of governance which are highly corrupt, highly controlled by all sorts of vested interests, our ability to take control of our own lives is limited to where we can find wriggle space. The Internet is the biggest wriggle space in the world.

What would be your advice to activists on how to use this space?

My advice to activists would be - encrypt everything. Publish carefully but publish a lot. And try to understand the power dynamics and governance models of the services that you're using, who controls who controls what controls who... And make sure that the thing that's being controlled is not you.

What do you think is over-hyped about information and data?

Some people attribute all sorts of mythological properties to information and to computing, that don't actually exist. It is just a theology of the Internet slowly emerging and a theology of information. People hoping that it will solve problems that information alone cannot solve. Sometimes people come with these strange requests about whether something is possible, and you have to say that actually that's not how the universe works. But mostly it's fairly okay still. Most of the places where people are attributing these mythical properties have to do with the fact that they don't necessarily understand the technology behind it. And if we just have these open discussions about how the technology works, and pull people up to a higher technological level, then I think much as with religion, these kinds of mythologies of information will slowly erode.

Can you give me an example of this kind of mythological property?

One has to do with language. I'm a big command line user. I use the command line for most of the things I do on computers. Most people don't like command lines, they like 'clicky' windows. And often people say - "I saw that you did this amazing thing, would you teach me how to do it?", and they may hand over a computer which doesn't have a command line interface. I say - "well, I can't do it because nobody created the menu items that you need to click on in order to perform the sequence of events and get this output, and you need to sit and learn how to code. You need to understand how the computer works."

This, when put to older people,  would be more along the lines of "oh, the computer decided to do this" - the anthropomorphisation of the device, and not understanding that the computer does exactly what it's told to. So when we're using these technologies and interfacing with information systems, there are linguistic properties that come in. One of them is simply that the graphical user interfaces are not an expressive language. There are things that just cannot be said by clicking on different parts of the screen in sequence. You need a more complicated grammar in order to do certain things, and no amount of development in user interfaces is going to make up for the grammar that's lost by switching away from telling the computer very directly what to do. So when people say "I'm not gonna learn that, because I can do it the other way," very frequently they're just plain wrong.

What are the implications of not understanding how the technology works, and relying on clickable menus, as many activists do?

An example I sometimes give people is this. Imagine I gave you ten thousand files and in each of those files are ten thousand lines, and I ask you to tell me how many of those lines in those ten thousand files begin with the letter M? If you're using a graphical user interface, the thing you do is to open each of these ten thousand files individually and then perform a search function through the user interface. And this means you're opening ten thousand files and doing the exact same thing. I can type in a roughly 30-character command and get the result within two or three seconds. That's not just because I know my way around the command line, it's because the command line has no bounds, it's a more expressive language. But the more interesting thing here is not that I can express that, but rather that it would never actually occur to the person who's never used the command line that counting the number of lines that begin with M in ten thousand files was a thing that you could practically do.
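
[Editor's illustration] The interview does not record the roughly 30-character command Smari has in mind; a shell pipeline built around grep would be the closest match to what he describes. As a hedged sketch of the same task in a general-purpose language, the Python below counts lines beginning with a given prefix across every file in one directory. The function name and the assumption that all files sit in a single flat directory are ours, not his.

```python
from pathlib import Path

def count_lines_starting_with(directory, prefix="M"):
    """Count lines that begin with `prefix` across all files in `directory`.

    Assumes a flat directory of text files; subdirectories are skipped.
    Undecodable bytes are replaced rather than raising an error.
    """
    total = 0
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        # Stream each file line by line so large files are never fully loaded.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if line.startswith(prefix))
    return total
```

Calling `count_lines_starting_with("some_directory")` returns a single number for the whole directory, whether it holds ten files or ten thousand, which is the point of the example: the question becomes practical to ask at all.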

Learning the more complicated language, even if it does mean cracking out an old UNIX terminal and learning how to write software, is a very important way of expanding your understanding of how the technology works, how the information works, and how you can use these to your advantage. Learn to code.

You've done a lot of work with found or leaked data. What are the difficulties working with first lost and then found data?

When dealing with leaks, one of the things you need to be very careful of is the context of the leaks - who is leaking them and what their motives are. You need to understand why they're leaking. You need to understand what circumstances the leakers are in. You need to be mindful of any information that might expose the identity of the source, if indeed the source is unknown. You need to understand the geopolitical ramifications of the information, and understand that harm minimization must be a factor. You must understand who will be harmed by the data, in what way they will be harmed, and what mechanisms they might have to recover. You also need to make sure that the narrative is very solid, and that the data itself won't be misinterpreted in any way.

On the occasions that I've been involved with the large data dumps, one of the really big problems that people keep running into is that we don't necessarily know which bits to draw the focus towards, nor how to have differential focus. What's going to be the most interesting bit for which individuals? How do you target those individuals? How do you maximise the impact of the information? That's an unsolved problem. It might simply have to do with going for smaller dockets, smaller packets, smaller sets of data each time. Or it might be just making the analysis of data much easier, so that people can discover it on their own. But there's maybe another aspect, which is the media spin. Depending on the media and the politics, everybody who gets wind of this kind of information will try to alter it, change the story, make it somehow fit their world view, meet their agenda or achieve some goal that they have.

That can be really bad, in that the narratives that are being created from the data get pushed aside and replaced by other narratives, which might be nefarious, or false, or just silly, not based in reality. In the biggest leaks of our era, one of the things that happens very quickly is that as soon as an individual enters into the story, instead of focusing on the data (which takes investigation, research, and thought) you switch to talking about that individual's relative merits and flaws. And you forget the importance of the information that's been provided. It almost leads me to think that sources of information should not actually come forward at all, and should protect their identity at any cost. In part, that also means that the old anarchist maxim of "No gods, no masters, no heroes" should be held in high regard. We don't need heroes, we need data.

What's the value of visual programs and data visualisation?

A lot of people think very visually. Some people think very aurally, some think by moving their body or through talking. But people who have a visual understanding of the world can greatly benefit from narratives being put forward in some kind of visual form - whether that's photographs or infographics or data visualisations or whatever. It simplifies the process of going from a set of random data to understanding.

One of the sad things with modern data is that while there are a few wizards who are doing absolutely amazing visualisations and coming up with really great illustrations of what's happening, there aren't any good, easy-to-use tools that produce beautiful, easy-to-understand, correctly made infographics. The ones that do exist are young, not quite good enough, and very often difficult to use. So, that's something where those graphics wizards and producers of beautiful things need to either learn to code, or team up with people who do know how to code, and solve the problem once and for all.

What do you think are the biggest problems now with data for activism?

The biggest threats we're facing right now come from the degradation of privacy rights and freedom of expression. They're coming in the form of state-sponsored monopolies, or radical monopolies, or the methods of states to clamp down on certain types of expression by hounding or badgering or making life difficult for people who are expressing themselves in certain ways. A lot of that is happening online, and in spaces where we previously had lots of ability to express ourselves. It's creeping very slowly and it's coming through all sorts of mechanisms, like agreements between governments and corporations, or through international trade agreements, or through little laws. Or in the case of the United States, press releases that change the entire way laws are interpreted.

We're losing ground very, very fast. We aren't going to be able to maintain the type of activism that we know is effective, and that we want to be doing, if we lose the ability to communicate with each other on our own terms about whatever we want. If we can avoid having to arm everybody and do bloody combat, then that's a good goal to attain. But at the end of the day, if it comes to the question of whether we have free speech and other human rights or whether we fight, I think it's worth fighting for those rights.

How will an absence of these rights change the ability of activists, citizens or hackers to investigate?

Access to all of the world's information is basically giving everybody the ability to investigate anything they want. That means that we can do everything – we can learn more about our interests or hobbies, our arts, our history, our politics. We can solve social problems, and we can do amazing things with all this information. You could say that the Internet is the greatest wonder of the world at the moment. But it's there currently by the will of certain people who have decided not to shut it down quite yet. We need to make sure that those people get stripped of their ability to cause harm to the most important infrastructure that we have. It's the way we communicate now. We are the Internet. Being able to communicate, it's invaluable, it's priceless. It changes everything about human interaction.

First published on July 10, 2015