Methodology

How Nile Intel collects, structures, and scores events across the Horn of Africa

Overview

Nile Intel is an automated open-source intelligence (OSINT) platform that monitors 16+ news sources covering Sudan, South Sudan, and the broader Horn of Africa. It ingests articles via RSS feeds, clusters related reporting, and uses large language models to extract structured event data — including event type, severity, actors, regions, and verification status.

The goal is to provide timely, structured situational awareness for organizations operating in or monitoring the region — at a fraction of the cost and latency of traditional intelligence services.

Transparency note: Nile Intel is an automated system. All event extractions are produced by AI models applied to publicly available news reporting. They are not editorial judgments and should be cross-referenced with primary sources for operational decisions.

Pipeline

Every article passes through a six-stage pipeline from raw RSS feed to structured, queryable event data.

Source Ingestion

RSS feeds from 16+ news sources are polled every 15 minutes. Articles are filtered for relevance to Sudan, South Sudan, and adjacent regions using keyword matching on titles and descriptions.
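The relevance filter described above can be sketched as a whole-word keyword match. This is a minimal illustration, not the production filter: the keyword list and the function name `is_relevant` are assumptions, since the actual keywords are not published here.

```python
import re

# Hypothetical keyword list; the real list is not documented on this page.
RELEVANCE_KEYWORDS = ["south sudan", "sudan", "juba", "khartoum", "darfur"]

# One case-insensitive pattern that matches any keyword as a whole word or phrase.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in RELEVANCE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def is_relevant(title: str, description: str) -> bool:
    """Return True if any keyword appears in the title or description."""
    return bool(_PATTERN.search(f"{title} {description}"))
```

Whole-word matching keeps the filter simple but misses inflected forms such as "Sudanese" unless they are listed explicitly.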

Article Clustering

Related articles about the same event are grouped using cosine similarity on title and description text. This typically reduces 50+ daily articles to 10-15 distinct story clusters, preventing duplicate coverage from inflating event counts.
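As a rough sketch of the clustering step, the example below computes cosine similarity over simple word-count vectors and groups articles greedily. The bag-of-words vectorization, the greedy single-pass strategy, and the 0.5 threshold are all assumptions for illustration; the page does not specify how the text is vectorized or what cutoff is used.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy pass: each article joins the first cluster whose seed article
    is at least `threshold` similar, otherwise it starts a new cluster."""
    vectors = [vectorize(t) for t in texts]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Each inner list holds the indices of the articles in one story cluster, so the number of inner lists is the number of distinct stories.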

Extractive Summary

Each cluster gets an initial summary by selecting the longest article description in the cluster, with length used as a proxy for detail. This serves as a fast fallback when AI summarization is unavailable.
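The fallback summary can be illustrated in a few lines. This sketch assumes descriptions arrive as plain strings and that empty or whitespace-only descriptions should be ignored; the function name `extractive_summary` is hypothetical.

```python
def extractive_summary(descriptions: list[str]) -> str:
    """Return the longest non-empty description, or "" if there is none."""
    cleaned = [d.strip() for d in descriptions if d and d.strip()]
    return max(cleaned, key=len, default="")
```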

AI Event Extraction

A large language model (Llama 3.3 70B via Groq) reads all articles in each cluster and extracts structured fields: event type, subtype, severity (1-5), scope, country, regions, actors, verification status, and a confidence score. The model also provides a rationale explaining its severity and verification decisions.

Validation & Quality Control

Each extraction is validated against a strict schema. Events that fail validation (missing required fields, invalid values) or that have a confidence score below 0.3 are quarantined for review rather than published. This reduces the chance that hallucinated or poorly supported events enter the database.
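A minimal sketch of the validate-then-quarantine logic follows. The field names, the subset of required fields, and the assumption that confidence lies between 0 and 1 are illustrative; only the 1-5 severity range, the six event types, the three verification labels, and the 0.3 cutoff come from this page.

```python
EVENT_TYPES = {"Security", "Political", "Humanitarian", "Economic", "Infrastructure", "Legal"}
VERIFICATION_LABELS = {"Confirmed", "Reported", "Unverified"}
# Hypothetical field names; the real schema is not published here.
REQUIRED_FIELDS = ("event_type", "severity", "country", "verification_status", "confidence")
MIN_CONFIDENCE = 0.3

def validate(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event can be published."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    if problems:
        return problems
    if event["event_type"] not in EVENT_TYPES:
        problems.append("invalid event_type")
    if type(event["severity"]) is not int or not 1 <= event["severity"] <= 5:
        problems.append("severity must be an integer from 1 to 5")
    if event["verification_status"] not in VERIFICATION_LABELS:
        problems.append("invalid verification_status")
    confidence = event["confidence"]
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number between 0 and 1")
    elif confidence < MIN_CONFIDENCE:
        problems.append("confidence below 0.3")
    return problems

def route(event: dict) -> str:
    """Send an extraction to 'publish' or 'quarantine'."""
    return "quarantine" if validate(event) else "publish"
```

Returning the list of problems, rather than a bare pass or fail, keeps the reason for quarantine available to whoever reviews the record.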

Actor Normalization

Actor names are mapped to canonical forms using a dictionary of 80+ aliases. For example: "Govt of South Sudan", "GoSS", and "South Sudan government" all resolve to "Government of South Sudan". This enables consistent querying and trend analysis.
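The alias lookup can be sketched as a dictionary keyed on a normalized form of the name. The three aliases below are the ones quoted above; the lowercasing and whitespace handling, and the choice to pass unknown names through unchanged, are assumptions.

```python
# Only the aliases quoted in the text; the full dictionary has 80+ entries.
ACTOR_ALIASES = {
    "govt of south sudan": "Government of South Sudan",
    "goss": "Government of South Sudan",
    "south sudan government": "Government of South Sudan",
}

def normalize_actor(name: str) -> str:
    """Map an actor name to its canonical form; unknown names pass through."""
    key = " ".join(name.lower().split())  # case-fold and collapse whitespace
    return ACTOR_ALIASES.get(key, name.strip())
```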

Severity Scale

Every event is assigned a severity score from 1 to 5 based on the scope, impact, and urgency of the reported situation.

Level	Label	Definition	Examples
1	Routine	Scheduled events, routine statements, standard reporting	Government press briefings, scheduled UN meetings, routine humanitarian updates
2	Notable	Localized incidents, policy changes, organizational shifts	Minor clashes with no casualties, new policy announcements, staff rotations
3	Significant	Regional displacement, major political shifts, economic disruptions	Multi-day protests, significant troop movements, trade route disruptions
4	Major	Large-scale violence, state-level crisis, major international intervention	Multi-faction clashes with casualties, large-scale displacement (10K+), state of emergency
5	Critical	War escalation, mass atrocity, national emergency	Full-scale military offensive, reported mass atrocities, capital under siege
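For consumers of the data, the scale maps naturally onto an ordered enumeration. The sketch below mirrors the levels and labels in the table; the class name `Severity` is hypothetical and not part of any published Nile Intel API.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Severity levels, ordered so that comparisons follow the scale."""
    ROUTINE = 1
    NOTABLE = 2
    SIGNIFICANT = 3
    MAJOR = 4
    CRITICAL = 5

def is_high_severity(score: int) -> bool:
    """True for Major and Critical events (levels 4 and 5)."""
    return Severity(score) >= Severity.MAJOR
```

Because `IntEnum` members compare as integers, a filter such as "Major or above" stays readable without hard-coding the number 4.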

Source Tiering

Sources are classified into three reliability tiers. This classification is deterministic (not AI-assigned) and influences the verification status of extracted events.

Tier	Criteria	Sources
Tier 1	International wire services, major broadcasters, UN agencies with editorial standards and fact-checking processes	BBC Africa, Reuters, Al Jazeera, The Guardian Africa, France24, UN News, VOA
Tier 2	Regional outlets with established track records, local knowledge, but potentially less editorial oversight	Radio Tamazuj, Eye Radio, Sudan Tribune, Dabanga Radio, Africanews
Tier 3	Aggregators, diaspora media, or outlets with limited editorial processes	Google News aggregates, Nyamilepedia

Verification Status

Each event receives one of three verification labels:

Confirmed: Reported by 2+ Tier 1 sources, or corroborated by official statements/UN reports. High confidence in factual accuracy.
Reported: Reported by at least one credible source but not independently verified. The default for most events.
Unverified: Single-source reporting from a Tier 3 source, or where the AI extraction had low confidence. Should be treated as preliminary.
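One way to read the three labels as a rule is sketched below, using the tier table above as a lookup. This is an interpretation, not the system's actual logic: treating unknown sources as Tier 3, treating multiple Tier 3 sources as "Reported", and omitting the low-confidence path are all assumptions.

```python
SOURCE_TIERS = {
    "BBC Africa": 1, "Reuters": 1, "Al Jazeera": 1, "The Guardian Africa": 1,
    "France24": 1, "UN News": 1, "VOA": 1,
    "Radio Tamazuj": 2, "Eye Radio": 2, "Sudan Tribune": 2,
    "Dabanga Radio": 2, "Africanews": 2,
    "Google News aggregates": 3, "Nyamilepedia": 3,
}

def verification_status(sources: list[str], officially_corroborated: bool = False) -> str:
    """Derive a verification label from the distinct sources in a cluster."""
    # Assumption: sources missing from the table are treated as Tier 3.
    tiers = [SOURCE_TIERS.get(name, 3) for name in set(sources)]
    if officially_corroborated or tiers.count(1) >= 2:
        return "Confirmed"
    if not tiers or (len(tiers) == 1 and tiers[0] == 3):
        return "Unverified"
    return "Reported"
```

Counting distinct sources matters here: two articles from the same Tier 1 outlet do not amount to two Tier 1 sources.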

Event Classification

Events are classified into six primary types:

Security: Armed clashes, ceasefire violations, military operations, intercommunal violence
Political: Elections, peace talks, government formation, political protests, diplomatic developments
Humanitarian: Displacement, food insecurity, disease outbreaks, aid delivery, protection concerns
Economic: Trade disruptions, currency crises, oil production changes, sanctions
Infrastructure: Road/bridge destruction, power outages, telecommunications, construction projects
Legal: Court rulings, ICC proceedings, human rights investigations, legislative changes

Quality Control

Quarantine System

Extractions that fail validation or fall below the confidence threshold are quarantined rather than discarded. This serves two purposes:

Safety: Low-quality extractions never reach the public database
Learning: Quarantined records are reviewed to improve the extraction prompt and identify systematic failure modes

Deduplication

Each article cluster is hashed based on its constituent article titles. If a cluster has already been extracted (or quarantined), it is skipped. This prevents the same event from being counted multiple times across feed refresh cycles.
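The skip-if-seen check can be sketched with a stable hash over the cluster's titles. SHA-256, the title normalization, and sorting titles so that feed order does not change the key are assumptions; the page states only that the hash is based on the constituent article titles.

```python
import hashlib

def cluster_key(titles: list[str]) -> str:
    """Stable fingerprint of a cluster, independent of title order and casing."""
    normalized = sorted(" ".join(t.lower().split()) for t in titles)
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()

def should_process(titles: list[str], seen: set[str]) -> bool:
    """Return False if this cluster was already extracted or quarantined."""
    key = cluster_key(titles)
    if key in seen:
        return False
    seen.add(key)
    return True
```

In this sketch a cluster that gains a new article on a later refresh gets a new key and would be processed again; how the real system treats growing clusters is not described here.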

Event counting: Events are deduplicated across sources via clustering. All counts in the Event Archive represent unique events, not individual articles. When multiple outlets report the same event, it appears as a single event record with multi-source attribution.

Confidence Assessment

Each extraction is assessed as High, Medium, or Low confidence based on source agreement, extraction consistency, and verification status. Low-confidence extractions (below 0.3 on the internal scale) are automatically quarantined for review rather than published. Confidence levels are tracked over time to monitor extraction reliability.

Provenance

Every event record in the database includes full provenance metadata:

Model version: Which AI model produced the extraction (e.g., llama-3.3-70b-versatile)
Prompt version: Which extraction prompt was used (versioned for auditability)
Source articles: URLs of all articles in the cluster that informed the extraction
Extraction timestamp: When the extraction was performed
Rationale: The model's explanation for its severity and verification decisions
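As an illustration of what such a record might look like, the sketch below models the five provenance fields as an immutable dataclass. The class and attribute names and the ISO 8601 UTC timestamp format are assumptions, not the database's actual schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Provenance:
    """Provenance metadata attached to one event record (hypothetical shape)."""
    model_version: str            # e.g. "llama-3.3-70b-versatile"
    prompt_version: str           # versioned identifier of the extraction prompt
    source_urls: tuple[str, ...]  # every article in the cluster
    rationale: str                # model's explanation of severity and verification
    extracted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = Provenance(
    model_version="llama-3.3-70b-versatile",
    prompt_version="v1",  # hypothetical version label
    source_urls=("https://example.com/article-1",),
    rationale="Two outlets report casualties; severity set to 4.",
)
```

Calling `asdict(record)` yields a plain dictionary that can be serialized alongside the event.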

Limitations

Users should be aware of the following limitations:

Source dependency: Nile Intel can only report on events covered by its monitored sources. Events in remote areas with no media access may not appear.
AI extraction errors: Although extractions are validated, the AI can still misclassify event types, misjudge severity, or miss nuances that a human analyst would catch.
Latency: Feeds are polled every 15 minutes, so events typically appear in the database within about 15 minutes of RSS publication, plus processing time. This is faster than weekly reports but slower than social media monitoring.
Language coverage: Currently limited to English-language sources. Arabic and local-language reporting is not directly ingested.
No primary reporting: Nile Intel does not have correspondents or conduct original investigations. All data derives from published news articles.

Contact: For questions about methodology or data access, or for partnership inquiries, reach out via the Weekly Brief subscription or the Event Archive alert signup.