The Evolution of the Global Terrorism Database: Measuring Threats to Peace

For the past three years, the Global Terrorism Database (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), has supplied the data on terrorism used to create the Global Terrorism Index (GTI). When we started the GTD data collection in 2001, our goal was to do our best to collect objective information on all terrorist attacks occurring around the world. My colleagues and I argue in a recent book that all science begins with counting things, whether stars in the universe, earthquakes, or terrorist violence. In fact, it seems axiomatic that we cannot do much to reduce or stop terrorism if we cannot first define and count it.

Collecting worldwide data on terrorism has been a challenging undertaking. Indeed, collecting comprehensive data on any type of human activity is complicated. Consider that the first comprehensive collection of the English language, what we now call a dictionary, did not appear until 1755. This means that when William Shakespeare wrote Romeo and Juliet he had no dictionary to rely on, which may be why he found it relatively easy to make up a few words! And cataloguing a list of words is far less controversial than cataloguing terrorist attacks. In this article we review how the GTD is produced, how it is used by policymakers, scholars, and the public, and how understanding terrorism can ultimately promote peace.

How has the GTD developed over time?

The collection of worldwide terrorism data became much more feasible in the late 1960s, when the introduction of portable cameras and satellite technology made it possible for the first time in human history to send pictures and stories almost instantaneously from point to point anywhere in the world. By 1970, there were a handful of individuals and companies in several countries collecting data on terrorist attacks from unclassified media sources.
Many of those collecting data in these early days had armed forces backgrounds and had worked for military intelligence before starting new careers collecting data on political violence for private companies. These early terrorism event databases were originally handwritten or typed. Most of the information came from wire services, especially Reuters, and from major newspapers with a global focus, namely The New York Times and the British Financial Times.

In 2001, I led a team that gained access to the most comprehensive of these early event databases, which included information on terrorist attacks from 1970 to 1997. We called the digitized version of the original data the Global Terrorism Database. As soon as we had restored the original data, our team began making plans to update it. We developed a coding scheme for collecting about 120 unique pieces of information on each terrorist attack we uncovered. As in past efforts, we relied on unclassified information from newspapers and the media to identify cases. Over time, data collection increasingly depended on news media published on the Internet.

The collection of the GTD has changed a great deal since we began. Today the data collection process begins with a universe of about 2 million articles published daily worldwide, from which we must identify the relatively small subset that describe terrorist attacks. To convert this huge pipeline of information into the GTD, a team led by veteran GTD managers Erin Miller, Michael Jensen, and Brian Wingenroth goes through four steps.

First, we use Boolean filters that identify attack-specific keywords to help us decide which articles to include. Over time we have developed a set of filters that experience has shown are especially likely to identify articles about terrorist attacks. It turns out that three of the best words for correctly identifying terrorist attacks from the print and electronic media are attack, bomb, and blast.
However, using search terms like these alone produces numerous irrelevant articles. Too often, they refer to something like “New York Islanders blast the Ottawa Senators.” And there are always surprises. For example, we have found that the word “terror” is not a very reliable predictor of a real terrorism event! In fact, the word “terror” is more often used in articles to clarify that a particular violent event was not an act of terrorism.

Second, we use natural language processing to remove duplicate stories. Experience has shown that removing articles that are more than 80 percent similar produces the most robust results. Duplicative publication of articles is a growing challenge for generating the GTD. For example, the coordinated terrorist attacks in Paris on November 13, 2015 generated tens of thousands of separate stories around the world. We cannot humanly process this volume, so we need ways of removing duplicates. We have also had to devise scales for ranking the quality of sources so that we treat a story in The New York Times as more authoritative than a source with clear bias like the Voice of Jihad.

Third, we use machine-learning models to classify the remaining articles as relevant or not relevant. This works much as Amazon.com refines its product recommendations: as our human coders continue to record new cases, the information is used to refine the results. Based on these algorithms we reduce the stream of data to about 15,000 to 20,000 articles per month, small enough that we can finish the process manually.

Finally, after passing through these three filters, the potential cases are turned over to a staff of about 25 researchers for data entry. Data entry is organized around a series of domain specialties defined by the 120 variables being collected.
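
For readers who want a concrete picture of the first two steps, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the GTD team's actual software. The three attack keywords and the 80 percent cutoff come from the description above. Everything else is assumed for the example: the exclusion terms, the function names, and the choice of Jaccard similarity over three-word shingles as the measure of how similar two articles are (the article does not say which similarity measure is used).

```python
import re

# Keywords the article reports as strong signals of attack coverage.
ATTACK_TERMS = {"attack", "bomb", "blast"}
# Hypothetical exclusion terms (an assumption, not GTD filters) that
# screen out sports-style false positives.
EXCLUDE_TERMS = {"islanders", "senators", "playoff", "goalie"}
# The article's "more than 80 percent similar" cutoff.
SIMILARITY_THRESHOLD = 0.8


def tokens(text):
    """Return the lowercase word tokens in a piece of text."""
    return re.findall(r"[a-z']+", text.lower())


def passes_boolean_filter(text):
    """True if any attack term appears AND no exclusion term appears.

    Matching is on exact words, so "bombing" would not match "bomb";
    a real filter would need stemming or a longer keyword list.
    """
    words = set(tokens(text))
    return bool(words & ATTACK_TERMS) and not (words & EXCLUDE_TERMS)


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the text."""
    words = tokens(text)
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_and_deduplicate(articles):
    """Apply the keyword filter, then drop near-duplicates of kept articles."""
    kept, kept_shingles = [], []
    for text in articles:
        if not passes_boolean_filter(text):
            continue
        candidate = shingles(text)
        if any(jaccard(candidate, seen) > SIMILARITY_THRESHOLD
               for seen in kept_shingles):
            continue
        kept.append(text)
        kept_shingles.append(candidate)
    return kept


if __name__ == "__main__":
    sample = [
        "A bomb blast struck a crowded market on Friday morning, officials said.",
        "A bomb blast struck a crowded market on Friday morning, officials said.",
        "New York Islanders blast the Ottawa Senators in overtime.",
        "Gunmen attack a police checkpoint outside the capital.",
    ]
    for article in filter_and_deduplicate(sample):
        print(article)
```

Comparing every new article against every article already kept is fine for a small sample but would be far too slow for millions of articles a day; a production system would need an indexing technique to find candidate duplicates quickly. The sketch also leaves out the source-quality ranking and the machine-learning classifier described above.
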
Our data entry staff is divided into separate teams that specialize in systematically recording the location of attacks, perpetrators, targets, casualties and consequences, and weapons and tactics. Based on these methods, the current GTD includes about 142,000 attacks from 1970 to 2014.

How is the GTD used by policymakers, scholars, and the public?

Use of the GTD has increased dramatically over time. Users include government offices and departments in the United States and abroad, scholars and researchers around the world, NGOs and think tanks, and agencies trying to prevent terrorist attacks. A flagship component of START’s website, the GTD portal had more than 219 million page views in 2015, increasing dramatically from the 17 million page views in 2012 and 11 million in 2009. Half of the web traffic is from the United States, while more than 20 percent of visitors are from the European Union. The remaining visitors, roughly 30 percent, come from all over the world, from Japan and China to Pakistan and Israel to Mexico and Brazil. Government use of START’s website and the GTD’s online portal has increased 56% since 2012 (961% since 2009).

Since the GTD was made available online in 2007, more than 22,200 people have downloaded the full dataset. In 2015 alone, more than 6,600 individuals in 150 countries and territories around the world downloaded the GTD nearly 9,000 times. Outside of the United States, it was most frequently downloaded in the United Kingdom, Germany, India, China, and Pakistan. The downloads included users in all 50 U.S. states, members of all five branches of the U.S. military serving in the Americas, Africa, and Europe, and employees of the Intelligence Community, including INTERPOL, the FBI, TSA, the State Department, the Department of Homeland Security, the Department of Defense, and many others. The number of full downloads by government officials has more than doubled in the last four years.
Starting in 2012, the U.S. State Department began using the GTD for the official statistics on terrorism in its annual Country Reports on Terrorism to Congress.

In the wake of the November 13, 2015 terrorist attacks in Paris, use of the GTD website and dataset surged, illustrating the worldwide craving for objective data to provide context for these horrible events. In November and December, START’s website logged a record number of page views (more than 50,000), more than doubling the views of the previous two months. The GTD dataset saw a similar trend: it was downloaded 1,663 times in November 2015 and 887 times in December, both above the 2015 average of 748 downloads a month. It was not just the public who turned to the GTD after the Paris attacks; government usage also spiked as officials sought objective information on the attacks. More than 160 government officials downloaded the dataset in November and December alone.

The GTD functions as a common good for policymakers at all levels of government around the world, effectively a force multiplier for those units that do not necessarily have the time or resources to gather and analyze comprehensive, objective data on patterns of terrorist attacks. This is also true of scholarly users, who similarly lack the resources to collect data independently. In fact, scholars from hundreds of universities, including all eight Ivy League institutions, most major state universities and private universities, as well as community colleges, minority serving institutions, and universities around the world, have downloaded the GTD. Students and researchers routinely draw on the database for classroom exercises, theses and dissertations, and scholarly presentations and publications.

Outside of academia, analysts from a variety of non-governmental organizations, intergovernmental organizations, and think tanks have used the data to support their efforts.
For example, in 2014 a flight co-manager for Médecins Sans Frontières contacted START to let us know that he uses the GTD to help map safe routes for the organization’s medical teams in Afghanistan. Journalists often reach out to the GTD team to inform their stories and their audiences worldwide. In just the week following the Paris attacks, START researchers responded to more than 65 journalists from a dozen countries seeking data and analysis from the GTD. That information appeared in hundreds, if not thousands, of news stories throughout the world.

How can important messages about terrorism and political violence be communicated?

As recent reports on the Global Terrorism Index (GTI) produced by the Institute for Economics and Peace (IEP) have convincingly demonstrated, examining the characteristics of terrorism can provide a powerful argument for peace. In its 2015 report, the IEP provides a detailed analysis of the changing trends in terrorism since 2000 for 162 countries. It investigates the changing patterns of terrorism by geographic activity, methods of attack, organizations involved, and the national economic and political context. The GTI has also been compared to a range of socioeconomic indicators to determine the key underlying factors that have the closest statistical relationship to terrorism.

In 2014 the total number of deaths from terrorism increased by 80% compared to the prior year. This is the largest yearly increase in the last 15 years. Since the beginning of the 21st century, there has been more than a nine-fold increase in the number of deaths from terrorism, rising from 3,329 in 2000 to 32,658 in 2014. Yet terrorism remains highly concentrated, with most of the activity occurring in just five countries: Iraq, Nigeria, Afghanistan, Pakistan, and Syria. These countries accounted for 78% of the lives lost in 2014.

Why are data on terrorism important?

To fashion public policies that will promote world peace, we need to measure and understand threats to peace.
We continue to work to improve the GTD as an objective source of information for identifying new global hot spots where terrorist threats are increasing, as well as success stories where terrorism-related violence is declining. By allowing policymakers, scholars, and the public to link terrorism data to information on broader political, economic, and social processes, we hope to provide objective information and facilitate rigorous scientific research that helps shift the world’s focus toward peace.

GTD data collection for 2014 is complete and the data are available to the public via START’s website at http://www.start.umd.edu/gtd/. Data collection for 2015 is ongoing, and START will publish an annual update in summer 2016.

About the author: Dr. Gary LaFree is Director of START at the University of Maryland and a professor in the Department of Criminology and Criminal Justice. Much of LaFree’s research is related to understanding criminal violence, and he is the senior member of the team that created and now maintains the Global Terrorism Database.