Nile Intel

Methodology

How Nile Intel collects, structures, and scores events across the Horn of Africa

Overview

Nile Intel is an automated open-source intelligence (OSINT) platform that monitors 16+ news sources covering Sudan, South Sudan, and the broader Horn of Africa. It ingests articles via RSS feeds, clusters related reporting, and uses large language models to extract structured event data — including event type, severity, actors, regions, and verification status.

The goal is to provide timely, structured situational awareness for organizations operating in or monitoring the region — at a fraction of the cost and latency of traditional intelligence services.

Transparency note: Nile Intel is an automated system. All event extractions are produced by AI models applied to publicly available news reporting. They are not editorial judgments and should be cross-referenced with primary sources for operational decisions.

Pipeline

Every article passes through a six-stage pipeline from raw RSS feed to structured, queryable event data.

1. Source Ingestion

RSS feeds from 16+ news sources are polled every 15 minutes. Articles are filtered for relevance to Sudan, South Sudan, and adjacent regions using keyword matching on titles and descriptions.
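The keyword relevance filter can be sketched as follows. The keyword list and article field names here are illustrative assumptions, not the platform's actual watchlist:

```python
# Hypothetical sketch of the relevance filter described above.
# The keyword set and dict keys ("title", "description") are assumptions.
RELEVANCE_KEYWORDS = {
    "sudan", "south sudan", "darfur", "khartoum", "juba", "horn of africa",
}

def is_relevant(article: dict) -> bool:
    """Return True if the article's title or description mentions a watched keyword."""
    text = f"{article.get('title', '')} {article.get('description', '')}".lower()
    return any(kw in text for kw in RELEVANCE_KEYWORDS)
```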

2. Article Clustering

Related articles about the same event are grouped using cosine similarity on title and description text. This reduces 50+ daily articles into 10-15 distinct story clusters, preventing duplicate coverage from inflating event counts.
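A minimal sketch of cosine-similarity clustering on title and description text, using plain bag-of-words vectors. The greedy single-pass strategy and the 0.5 threshold are assumptions for illustration; the production system may use different vectorization and thresholds:

```python
import math
from collections import Counter

def _vector(text: str) -> Counter:
    """Bag-of-words term counts (illustrative; real systems may use TF-IDF)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_articles(articles: list[dict], threshold: float = 0.5) -> list[list[dict]]:
    """Greedy single-pass clustering: join an article to the first cluster
    whose seed article is similar enough, else start a new cluster."""
    clusters: list[list[dict]] = []
    for art in articles:
        vec = _vector(art["title"] + " " + art.get("description", ""))
        for cl in clusters:
            seed = _vector(cl[0]["title"] + " " + cl[0].get("description", ""))
            if cosine(vec, seed) >= threshold:
                cl.append(art)
                break
        else:
            clusters.append([art])
    return clusters
```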

3. Extractive Summary

Each cluster receives an initial summary: the longest, most detailed article description among its members. This serves as a fast fallback when AI summarization is unavailable.
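The fallback selection reduces to a one-liner (the "description" key is an assumed field name):

```python
def extractive_summary(cluster: list[dict]) -> str:
    """Fallback summary: the longest article description in the cluster."""
    return max((a.get("description", "") for a in cluster), key=len)
```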

4. AI Event Extraction

A large language model (Llama 3.3 70B via Groq) reads all articles in each cluster and extracts structured fields: event type, subtype, severity (1-5), scope, country, regions, actors, verification status, and confidence score. The model also provides a rationale explaining its severity and verification decisions.
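The extracted fields can be pictured as a record like the following. Field names mirror the list above; the exact types, and the sample values in the usage note, are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ExtractedEvent:
    """One structured record per article cluster.
    Field names follow the methodology text; types are assumptions."""
    event_type: str          # one of the six primary types
    subtype: str
    severity: int            # 1-5, per the Severity Scale
    scope: str
    country: str
    regions: list
    actors: list
    verification_status: str
    confidence: float        # 0.0-1.0 internal scale
    rationale: str           # model's explanation of severity/verification
```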

5. Validation &amp; Quality Control

Each extraction is validated against a strict schema. Events that fail validation (missing required fields, invalid values) or carry very low confidence (<0.3) are quarantined for review rather than published. This prevents hallucinated or poorly supported events from entering the database.
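The publish/quarantine decision might look like this sketch. The required-field set and the severity range check are illustrative; only the 0.3 confidence floor is stated in the methodology:

```python
# Illustrative triage logic; REQUIRED_FIELDS is an assumed subset of the schema.
REQUIRED_FIELDS = {"event_type", "severity", "country", "confidence"}
CONFIDENCE_FLOOR = 0.3  # below this, quarantine rather than publish

def triage(extraction: dict) -> str:
    """Return 'publish' or 'quarantine' for a raw extraction dict."""
    if not REQUIRED_FIELDS <= extraction.keys():
        return "quarantine"          # missing required fields
    sev = extraction["severity"]
    if not (isinstance(sev, int) and 1 <= sev <= 5):
        return "quarantine"          # invalid severity value
    if extraction["confidence"] < CONFIDENCE_FLOOR:
        return "quarantine"          # too poorly supported to publish
    return "publish"
```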

6. Actor Normalization

Actor names are mapped to canonical forms using a dictionary of 80+ aliases. For example: "Govt of South Sudan", "GoSS", and "South Sudan government" all resolve to "Government of South Sudan". This enables consistent querying and trend analysis.
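In code, the normalization is a dictionary lookup over lowercased names. The entries below are a tiny illustrative subset of the 80+ alias dictionary, using the example from the text:

```python
# Illustrative subset of the alias dictionary described above.
ACTOR_ALIASES = {
    "govt of south sudan": "Government of South Sudan",
    "goss": "Government of South Sudan",
    "south sudan government": "Government of South Sudan",
}

def normalize_actor(name: str) -> str:
    """Map a raw actor mention to its canonical form; pass unknowns through."""
    return ACTOR_ALIASES.get(name.strip().lower(), name.strip())
```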

Severity Scale

Every event is assigned a severity score from 1 (Routine) to 5 (Critical) based on the scope, impact, and urgency of the reported situation.

Level 1 (Routine): Scheduled events, routine statements, standard reporting. Examples: government press briefings, scheduled UN meetings, routine humanitarian updates.

Level 2 (Notable): Localized incidents, policy changes, organizational shifts. Examples: minor clashes with no casualties, new policy announcements, staff rotations.

Level 3 (Significant): Regional displacement, major political shifts, economic disruptions. Examples: multi-day protests, significant troop movements, trade route disruptions.

Level 4 (Major): Large-scale violence, state-level crisis, major international intervention. Examples: multi-faction clashes with casualties, large-scale displacement (10K+), state of emergency.

Level 5 (Critical): War escalation, mass atrocity, national emergency. Examples: full-scale military offensive, reported mass atrocities, capital under siege.

Source Tiering

Sources are classified into three reliability tiers. This classification is deterministic (not AI-assigned) and influences the verification status of extracted events.

Tier 1: International wire services, major broadcasters, and UN agencies with editorial standards and fact-checking processes. Sources: BBC Africa, Reuters, Al Jazeera, The Guardian Africa, France24, UN News, VOA.

Tier 2: Regional outlets with established track records and local knowledge, but potentially less editorial oversight. Sources: Radio Tamazuj, Eye Radio, Sudan Tribune, Dabanga Radio, Africanews.

Tier 3: Aggregators, diaspora media, or outlets with limited editorial processes. Sources: Google News aggregates, Nyamilepedia.
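Because the classification is deterministic, it amounts to a fixed lookup table. The abbreviated membership below and the conservative default for unknown sources are assumptions for illustration:

```python
# Abbreviated, illustrative tier table; the full mapping covers 16+ sources.
SOURCE_TIERS = {
    "BBC Africa": 1, "Reuters": 1, "UN News": 1,
    "Radio Tamazuj": 2, "Sudan Tribune": 2,
    "Nyamilepedia": 3,
}

def source_tier(source: str, default: int = 3) -> int:
    """Deterministic tier lookup; unknown sources fall back to the
    most conservative tier (an assumed policy)."""
    return SOURCE_TIERS.get(source, default)
```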

Verification Status

Each event receives one of three verification labels:

Event Classification

Events are classified into six primary types:

Quality Control

Quarantine System

Extractions that fail validation are quarantined rather than discarded. This serves two purposes:

  1. Safety: Low-quality extractions never reach the public database
  2. Learning: Quarantined records are reviewed to improve the extraction prompt and identify systematic failure modes

Deduplication

Each article cluster is hashed based on its constituent article titles. If a cluster has already been extracted (or quarantined), it is skipped. This prevents the same event from being counted multiple times across feed refresh cycles.
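A sketch of the cluster hashing described above. The choice of SHA-256 and the order-insensitive canonicalization are assumptions:

```python
import hashlib

def cluster_hash(titles: list[str]) -> str:
    """Stable fingerprint of a cluster's article titles.
    Sorting makes the hash order-insensitive (an assumed design choice)."""
    canonical = "\n".join(sorted(t.strip().lower() for t in titles))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def should_process(titles: list[str], seen: set) -> bool:
    """Skip clusters already extracted or quarantined in a prior cycle."""
    h = cluster_hash(titles)
    if h in seen:
        return False
    seen.add(h)
    return True
```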

Event counting: Events are deduplicated across sources via clustering. All counts in the Event Archive represent unique events, not individual articles. When multiple outlets report the same event, it appears as a single event record with multi-source attribution.

Confidence Assessment

Each extraction is assessed as High, Medium, or Low confidence based on source agreement, extraction consistency, and verification status. Low-confidence extractions (below 0.3 on the internal scale) are automatically quarantined for review rather than published. Confidence levels are tracked over time to monitor extraction reliability.
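One way to picture the mapping from the internal scale to labels and the publish decision. Only the 0.3 quarantine floor is stated in the methodology; the Medium/High boundary here is an illustrative assumption:

```python
def assess_confidence(score: float) -> tuple[str, bool]:
    """Map the internal 0-1 score to a (label, publishable) pair.
    The 0.3 floor is documented; the 0.7 cut-off is an assumption."""
    if score < 0.3:
        return "Low", False      # automatically quarantined for review
    if score < 0.7:
        return "Medium", True
    return "High", True
```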

Provenance

Every event record in the database includes full provenance metadata:

Limitations

Users should be aware of the following limitations:

Contact: For questions about methodology, data access, or partnership inquiries, reach out via the Weekly Brief subscription or the event archive alert signup.