Why Natural Language Processing Is Crucial for Open-Source Intelligence Analysts
The need for NLP
Each day, people create about 2.5 quintillion bytes of data online. To put that into perspective, that’s 2.5 billion billion bytes, or 25 followed by 17 zeros.
That quantity is hard to comprehend. Even the publicly available portion created each day, which is likely significantly smaller, is still an enormous amount to work with for practical applications like open-source intelligence (OSINT).
Despite this challenge, security and intelligence professionals have increasingly prioritized open-source data, using publicly available content across the surface web, deep web, and dark web for investigations. The U.S. Defense Intelligence Agency revealed that about 80% of its reports are based on open sources. The OSINT Foundation has also emerged to support operationalizing OSINT practices for public sector use, and the value of social media data for security purposes has been demonstrated in cases like Russia’s invasion of Ukraine.
As the volume of data continues to grow, it’s important to have tools that can help derive meaningful insights from this data. Natural language processing (NLP) is one such tool that’s becoming increasingly valuable for intelligence tasks. In this blog, we’ll explore how NLP can help security and intelligence professionals process and analyze large volumes of data with greater accuracy and speed.
How data abundance impacts the intelligence cycle
Intelligence analysts, whether they validate cybersecurity threats or analyze social media for information environment assessments, are drowning in data. In addition to sources like social media, analysts often ingest data from field sensors, IoT devices, and other infrastructure. Across the board, intelligence analysts aren’t short on data—they’re short on tools to make it actionable quickly enough.
The intelligence cycle requires analysts to collect, process, and analyze data to create an intelligence report for dissemination. Many of these steps are performed manually, taking a significant amount of time and resources—especially for understaffed intelligence teams.
Analysts often battle alert fatigue and overlook insights that could be crucial for decisions higher up the chain of command. According to Forrester research, 70% of security decision-makers are emotionally impacted by managing threat alerts and almost 30% of alerts are dismissed due to time constraints. For intelligence cycles supporting crisis management or battlefield decisions, for example, this could give adversaries an advantage or jeopardize people and assets of interest.
How NLP is helping
NLP is a branch of artificial intelligence (AI) that enables computers to interpret text and speech in ways similar to humans. In other words, NLP-powered applications can interpret the meaning of language, including its sentiment and context. This allows intelligence professionals to interpret open-source web data, much of which is text-based, at scale.
By deriving meaning from text, NLP helps turn data into intelligence quickly and at scale. It also makes it easier to separate relevant data based on its content, reducing false positives and noise. For global investigations, NLP can also translate and interpret multilingual content more rapidly and accurately. While the nuance of human analysis can’t be replaced, these capabilities save analysts a significant amount of time and resources. They can also be used to help build more intuitive intelligence tools that simplify analyst workflows.
NLP helps intelligence analysts:
- Leverage a wider breadth of data sources without exhausting time and resources.
- Improve speed-to-information to drive more timely intelligence cycles and decisions.
- Utilize more user-friendly software, in turn improving the intelligence cycle’s speed and accuracy.
Applying NLP with Flashpoint
What does NLP look like in practice when gathering and analyzing open-source data? This depends on the software and on which tasks its NLP has been trained for. As an example, Flashpoint’s open-source intelligence solution, Echosec, uses NLP to support three core goals: usability, access to diverse data sources (with a focus on social media), and speed-to-information. Its NLP is trained for:
- Detecting threats. Echosec tags social media posts with one or more threat categories, such as “identity hate” or “data disclosure,” if detected. Posts are also scored based on the system’s confidence in their categorization. This makes it easier for users to analyze and triage open-source content at a glance, quickly separating real risks from innocuous chatter. Threat detection applies to dozens of different languages, supporting investigations across global data sources and online communities.
- Extracting entities. Entities like people, organizations, phone numbers, emails, crypto wallets, and other identifiers often drive online investigations. Echosec uses NLP to automatically highlight and separate these entities so analysts don’t have to read through every result to find what they’re looking for.
- Detecting languages. Echosec shows analysts the language detected in each search result and uses NLP to automatically translate data into the analyst’s native language. This makes it easier to broaden data coverage beyond an analyst’s native language and saves time otherwise spent in external translation tools.
- Expanding geospatial data. Based on our estimates, only 1-3% of social media posts are precisely geotagged. This makes it hard for analysts to pinpoint where most content originates. Flashpoint’s Echosec Platform uses NLP to tag posts with likely coordinates based on landmarks, street names, and other identifiers mentioned. This returns significantly more results when users perform location-based searches, which are valuable for crisis response, conflict monitoring, and other geo-sensitive use cases.
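To make the entity extraction and geospatial tagging ideas above more concrete, here is a minimal Python sketch. It is not how Echosec works internally, which this post does not describe. It assumes simple regular expressions in place of a trained NLP model, and every name in it (ENTITY_PATTERNS, GAZETTEER, extract_entities, infer_coordinates) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative patterns only (an assumption of this sketch). Production tools
# typically rely on trained named-entity recognition models instead.
ENTITY_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "btc_wallet": re.compile(
        r"\b(?:bc1[a-z0-9]{25,39}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
    ),
}

# Hypothetical gazetteer: place mentions mapped to approximate (lat, lon).
GAZETTEER = {
    "eiffel tower": (48.8584, 2.2945),
    "times square": (40.7580, -73.9855),
}


@dataclass(frozen=True)
class Entity:
    kind: str  # which pattern matched, e.g. "email"
    text: str  # the matched substring from the post


def extract_entities(post: str) -> List[Entity]:
    """Return every substring of the post that matches a known pattern."""
    found = []
    for kind, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(post):
            found.append(Entity(kind, match.group()))
    return found


def infer_coordinates(post: str) -> Optional[Tuple[float, float]]:
    """Return coordinates for the first gazetteer place named in the post."""
    lowered = post.lower()
    for place, coords in GAZETTEER.items():
        if place in lowered:
            return coords
    return None  # no known landmark mentioned


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or +1 (555) 010-9999 near the Eiffel Tower"
    for entity in extract_entities(sample):
        print(entity.kind, "->", entity.text)
    print("likely location:", infer_coordinates(sample))
```

A real system would need to handle ambiguity that this sketch ignores, such as place names shared by several cities or digit sequences that only look like phone numbers. That is where trained models and confidence scores become useful.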
These NLP-powered features were developed to address in-demand requirements for intelligence professionals engaged in public sector security and defense. However, NLP could have several other social data applications, including summarizing high-level trends, detecting other types of content categories, or generating reports.
Helping intelligence teams combat data overload with NLP won’t happen overnight. However, tools like Echosec leverage NLP in support of one key goal: to make the analyst’s job easier. By prioritizing these technologies in their toolkits, intelligence leaders can drive more timely, informed intelligence, stay ahead of adversaries, and keep people and assets safer.
Use Flashpoint to hone your intelligence
Flashpoint is the leader in delivering clear, actionable intelligence that helps your security professionals and teams stay on top of risks and defend your organization. Start a free trial to experience how better intelligence equals better outcomes for your assets, infrastructure, and personnel.