ADRD at IFF: Learning Who Will Use the Datasets, and How

A persona created to represent a user of the Arab Digital Rights Datasets at IFF in March, 2017.

In early March, the Datasets’ legal adviser, Nani Jansen, and I led a session on the datasets at the Internet Freedom Festival (IFF), a weeklong gathering in Valencia, Spain, of digital rights advocates from all over the world. We used the session, “Data Exploration Hackathon: Visualizing the Relationship between Rule of Law and Digital Rights in MENA and Beyond,” to introduce phase 3 of the project and explore who might use this data and exactly how.

Among the very dedicated attendees of the session—who spent three hours with us in Taller 6, the smallest, stuffiest room at the otherwise muy cómodo Las Naves—were lawyers, journalists, advocacy directors and activists, human rights researchers and academics, as well as program officers from international agencies and donors.

This was the first public presentation of the Datasets since the previous year at IFF. We began by introducing the history of the project and how the methodology and categorization of laws have developed over the past several months, leading up to the current data collection phase, during which 13 legal and human rights researchers are identifying laws, regulations, draft laws, caselaw, and specific articles of interest related to digital rights in the legal frameworks of the 22 countries of the Arab League. We’ll be posting more about these processes here soon.

We spent the rest of the session in breakout groups gathering input on who our stakeholders are and what they want from such a dataset, by developing user personas and user stories. These outputs, commonly used by software and website developers to get a sense of who their users are, will ultimately help us develop the technical specifications for the technological interpretation of the dataset, which we expect to include a simple website where users can run basic queries and perhaps a plan for an API.

For instance, one user persona/story went like this:

Basma, an independent activist and blogger, is in her late twenties. Her first experiences in activism started in college. Basma writes about social and political issues on her blog and she has a dedicated following. She changes jobs frequently and has a small income from ads on her blog. Her political activities are a financial burden for her and she cannot afford unexpected expenses, such as fines or legal costs. Basma visits the Datasets frequently so she can stay up to date on laws that apply to her blog. She also finds data that she can use in her blogposts.

Another imagined Samya:

A freelance outreach coordinator on Internet freedom issues for an international audience. She lives in France and once, when trying to communicate with her parents in Morocco, she realized that she couldn’t speak to them over VOIP. This prompted her to do background research on the issue, which she also does for outreach initiatives and campaigns she advises. She also needs to assess legal threats posed by her work and to her clients and their partners. She often needs to write situation assessments and other reports quickly, but must be sure that the information she’s citing is accurate, so as not to compromise her credibility or that of her clients. Also, if she can’t find the laws she needs, she must be able to explain why, so it’s crucial that she be able to assess how complete the Datasets are and how frequently they are updated.

Several journalist personas were also created, as follows:

I’m a professional, female, Arabic journalist working in the region and am a member of my national journalists’ syndicate. I need to know what the current provisions of the laws are so I can provide expert input into a government consultation/public hearing.

I’m a foreign freelance journalist (female, mid-20s) on a tight budget. I’m covering a story in Tunisia and I need to know the laws on defamation, freedom of expression, social media, etc., so that I can keep myself and my fixer safe.

I’m an experienced English-speaking journalist based in New York. I need to know which countries criminalize posting “false news” online, so that I can write an article about the dangers. If I can’t verify my information beyond a shadow of a doubt, my editor won’t run the story. Plus, I need examples of individuals who have been prosecuted under these laws. Oh, and I’m on a very tight deadline.

Another group developed a persona for a researcher, approaching the Datasets from an academic’s perspective:

Leila, a researcher investigating the state of digital rights across the MENA region, wants to conduct comparative research and longitudinal research, and to be able to correlate her findings with external themes. Specifically, she wants to know how the political changes of 2011 changed government attitudes towards the right to privacy in MENA countries, looking at the period from 2006 to 2016.

Finally, the last group imagined a policy analyst at a foreign ministry, an advocate/funder at an international media development organization, and a technologist/digital security expert. Here are their stories:

James, a technologist/digital security trainer, needs an up-to-date reference source of locally verified information to pass on to his co-trainers in the field, so that they can do a pre-training assessment of the legality/risks/usefulness of various tools and practices, which will help them prioritize which topics to cover in the limited time they have.

Hannan, who develops partnerships with local organizations, wants an interactive, customizable index or map or database that will help her detect trends and even upload locally collected data to model/manipulate programmatic interventions. The data should be splice-able at national, regional, and global levels.

Giselle is a policy analyst at a foreign ministry that invests millions of dollars each year into internet freedom initiatives. She needs a queryable database of cyber-related laws so that she can look at trends and comparative data that can inform her critiques of flawed legislation and draft model language.
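To make the idea of a “queryable database” a little more concrete, here is a minimal Python sketch of the kind of basic query several of these personas describe, such as asking which countries have a law tagged with a given topic. The `Law` record, its fields, the topic tags, and the placeholder rows are all assumptions made for illustration; they are not the Datasets’ actual schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Law:
    # Hypothetical record shape, not the Datasets' real schema.
    country: str
    title: str
    topics: frozenset


# Placeholder records; countries, titles, and topics are invented.
LAWS = [
    Law("Country A", "Example penal code article", frozenset({"false news", "defamation"})),
    Law("Country B", "Example telecom law", frozenset({"surveillance"})),
    Law("Country C", "Example cybercrime law", frozenset({"false news"})),
]


def countries_with_topic(laws, topic):
    """Return the sorted, de-duplicated countries that have a law tagged `topic`."""
    return sorted({law.country for law in laws if topic in law.topics})


print(countries_with_topic(LAWS, "false news"))
```

A real interface would also need to expose how complete and how current each country’s records are, since several of the personas above depend on being able to judge that.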

Many of these personas actually represented people in the room. This deviated somewhat from the typical aim of the user persona and user story exercise, which is meant to get entrepreneurs, developers, and technologists away from building for what they think people need. But it’s hard to argue that these ideas don’t reflect the spectrum of needs we hope the Datasets will serve.

At the same time, there are some perspectives that weren’t represented, such as that of lawyers—particularly human rights defenders—and activists, who we also think might find the Datasets useful for developing arguments in court or identifying problem areas for targeted policy reform. It was also suggested that we host a similar workshop (or series) back in Lebanon with only participants from the region, or only one kind of stakeholder, researchers, for example. This, it was suggested, would help us drill down even more into how this data can better benefit the primary communities it’s meant to serve.

Another hack that was suggested was to develop personas not as a subset of the stakeholder groups (journalists, activists, lawyers, etc.) but according to how they would use the data and/or their specific decision-making processes and workflows.

We’re planning to do that. But first, we’ll run a similar exercise at RightsCon next week, on Friday at noon, in the Demo Room. If you’re in Brussels, we’d love to see you there.



Datasets Featured in MEDMEDIA Projects Database

MedMedia is a cross-Mediterranean program, implemented by multiple stakeholders and designed to “complement ongoing campaigns to promote media freedoms and overcome the barriers to sectorial change.” As part of the initiative, a database of media development projects across the region has been created to help minimize overlap of efforts, among other aims. The Arab Digital Rights Datasets are included in this mapping and will be featured on the MedMedia website in an upcoming blogpost.

EFF “Crime of Speech” Report References Datasets

The Electronic Frontier Foundation (EFF) has published “The Crime of Speech: How Arab Governments Use the Law to Silence Expression Online,” a new report by Wafa Ben Hassine that looks at legal frameworks for online expression in the MENA region generally, and examines which kinds of laws are being used in four Arab-region countries to crack down on online expression in particular. Ben Hassine completed the report during a six-month period as an Information Controls Fellow through the Open Technology Fund.

Among Ben Hassine’s key findings are

that law enforcement only applies them after it’s identified the journalist or protestor that it wants to arrest. The pattern is that authorities will find the offending speech and then choose the law that can be interpreted to most closely address it. The system results in a rule by law rather than rule of law: the goal is to arrest, try, and punish the individual—the law is merely a tool used to reach an already predetermined conviction.

The report relies heavily on the Arab Digital Rights Datasets and cross-references that data with “specific cases of arrest, detention, and imprisonment due to online activity, and where law enforcement targeted the individual under the guise of going after cybercrime or countering terrorism online.”

Like the legislative data, Ben Hassine’s data on arrests and detentions is also openly accessible in CSV format.
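As a rough sketch of what this kind of cross-referencing could look like in practice, the Python below groups case records under the law each one cites, using a shared identifier. The column names (`law_id`, `country`, `title`, `case_id`, `outcome`) and the placeholder rows are assumptions made for illustration only; they are not the actual schema or contents of either CSV.

```python
import csv
import io
from collections import defaultdict

# Placeholder CSV text standing in for the two open datasets.
# Column names and rows are hypothetical, not the real schema or data.
LAWS_CSV = """law_id,country,title
L1,Country A,Example cybercrime law
L2,Country B,Example press law
"""

CASES_CSV = """case_id,law_id,outcome
C1,L1,detained
C2,L1,imprisoned
C3,L2,arrested
"""


def cases_by_law(laws_text, cases_text):
    """Group case IDs under the law each case cites."""
    laws = {row["law_id"]: row for row in csv.DictReader(io.StringIO(laws_text))}
    grouped = defaultdict(list)
    for case in csv.DictReader(io.StringIO(cases_text)):
        # Skip cases that cite a law missing from the legislative table.
        if case["law_id"] in laws:
            grouped[case["law_id"]].append(case["case_id"])
    return {
        law_id: {"title": laws[law_id]["title"], "cases": case_ids}
        for law_id, case_ids in grouped.items()
    }


result = cases_by_law(LAWS_CSV, CASES_CSV)
for law_id, info in result.items():
    print(law_id, info["title"], info["cases"])
```

In real use the two strings would be replaced by the downloaded files, and the join key would depend on how each dataset actually identifies a law.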


A Webinar with Silk on the Arab Digital Rights Datasets

Update August 10, 2016: Silk was purchased by Palantir and is no longer being maintained. We will be updating the dataset and porting over the data to a new online outpost in mid to late 2017.

In this webinar, I spoke with Jurian Bass and Sarah Aoun from Silk, a platform where people can easily upload and visualize their datasets. I talked about how SMEX created the open Arab Digital Rights Dataset and then used Silk as a publishing platform to create a multimedia research portal that “illuminates trends in how Arab governments are limiting digital rights, such as free expression and privacy online.”

Datasets Kick Off MENA Internet Policy Observatory Workshop in Istanbul

Earlier this month, the Annenberg School for Communication’s Internet Policy Observatory teamed up with Citizen Lab, ASL19, Social Media Exchange, 7iber, and Kadir Has University’s New Media Department to host an Internet Policy Research Methods Workshop focused on policy development in the MENA region. The program brought together young scholars and activists working in digital rights and the internet policy space in an intensive four-day practicum that provided a survey of both qualitative and quantitative, online and offline research methods with the goal of enhancing and advancing their advocacy efforts.

Advancing Policy Advocacy for Digital Rights in the MENA Region


SMEX had the privilege of framing the workshop by outlining the current state of digital rights in the MENA region in our session “Advancing Policy Advocacy for Digital Rights in the MENA Region” (embedded above). Through recent research and the Arab Digital Rights Datasets, which we had recently visualized using the Silk.co platform, we focused on emerging legal and social trends and how civil society and citizens-at-large are responding to them.

We also highlighted several recent advocacy initiatives, their successes and failures, and explored how the availability—or lack thereof—of Internet policy data enhances (or prevents) advocacy efforts to protect free expression and privacy online.

In advance of the workshop, we shared the following briefing materials with participants:

Experimenting with the Datasets at 2014



With a greatly expanded dataset (including laws from 20 countries, nearly twice as many countries as the original dataset that covered six countries and Iran), SMEX participated in the inaugural workshop hosted by Small Media in London last month.

The two-day workshop covered principles of data visualization and then gave teams comprising civil society organizations with datasets, designers, and coders a chance to play with their data and explore how to make it relevant to change processes.

This was the first time SMEX was able to see the ADRD in action. Our design and coding sprint culminated in the presentation for a prototype (above). Clicking through the presentation will give you an idea of the kinds of questions we wanted to ask of the data, including:

  • Whether laws were passed more quickly in the wake of the Arab Spring;
  • Ideas for how to cross-reference the legislation with other types of data, such as individual cases of detention and prosecution for alleged online speech crimes; and
  • An already well-established sense that, since this is largely a user-generated dataset, more work would need to be done on the methodology to make it a reliable source for research, reporting, and legal proceedings.
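The first of these questions, whether laws were passed more quickly after 2011, lends itself to a simple calculation. The Python sketch below compares the average number of laws passed per year in two periods. The law identifiers, the years, and the choice of five-year windows are all assumptions made for illustration; they are not drawn from the dataset or from the prototype.

```python
# Hypothetical (law identifier, year passed) pairs; not real data.
LAWS = [
    ("law-a", 2007),
    ("law-b", 2009),
    ("law-c", 2012),
    ("law-d", 2013),
    ("law-e", 2014),
]


def laws_per_year(laws, start, end):
    """Average number of laws passed per year from `start` to `end`, inclusive."""
    years = end - start + 1
    count = sum(1 for _, year in laws if start <= year <= end)
    return count / years


before = laws_per_year(LAWS, 2006, 2010)  # the five years before 2011
after = laws_per_year(LAWS, 2011, 2015)   # the five years from 2011 on
print(f"before: {before:.1f} per year, after: {after:.1f} per year")
```

A count like this only becomes meaningful once the methodology concern in the last bullet is addressed, since gaps in a user-generated dataset could look like changes in legislative pace.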

The Emerging Legal Framework for Free Expression Online in MENA

This paper surveys the emerging legal framework for online expression in the Arab region and is the foundation for a series of blogposts on the topic on the SMEX website. The research conducted for it, along with the initial data collection, originally spurred the idea of the datasets.