Netscore

Netscore is an ML-powered influence scoring tool that assigns each network interaction a score from 1 to 50, indicating the relative influence of that communication edge. It runs locally as a Python CLI tool packaged in a Docker image.

How It Works

Netscore engineers 26 features from your network data and passes them to a pre-trained machine learning model (HistGradientBoostingRegressor). The features fall into five groups:

Graph centrality — PageRank and HITS scores for senders and recipients
Message volume — Communication frequency and volume percentiles
Turnaround speed — How quickly participants respond
Relationship metrics — Breadth, depth, and reciprocity of connections
Temporal patterns — Activity spans and recency of interactions

Requirements

Docker installed and running
A CSV edge file with sender, recipient, and datetime columns

Installation

Pull the Netscore Docker image from Docker Hub:

docker pull atodev/onadashboard:netscore

The image includes Python, all ML dependencies, and the pre-trained model — no local Python setup needed.

Usage

Place your edge CSV in your working directory, then run:

docker run --rm -v "$(pwd)":/data atodev/onadashboard:netscore predict /data/your_edges.csv

This mounts your current directory at /data inside the container. The scored output file (your_edges_scored.csv) is written to that same directory.

Input: A CSV with sender, recipient, and datetime columns. Column names are auto-detected (supports common variations like SenderAddress, From, Source, Recipient, To, Target, etc.).
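To try the command above, a very small edge file is enough. The following is a minimal sketch that builds one with Python's standard library. The addresses and timestamps are made-up sample values, and the function name write_edges is only illustrative.

```python
import csv
import io

# Made-up sample edges: (sender, recipient, datetime). Illustrative only.
SAMPLE_EDGES = [
    ("alice@example.com", "bob@example.com", "2024-01-15T09:30:00Z"),
    ("bob@example.com", "alice@example.com", "2024-01-15T09:42:00Z"),
    ("alice@example.com", "carol@example.com", "2024-01-16T14:05:00Z"),
]


def write_edges(fileobj, edges):
    """Write a header row followed by one row per edge."""
    writer = csv.writer(fileobj)
    writer.writerow(["sender", "recipient", "datetime"])
    writer.writerows(edges)


# Build the CSV text in memory. To create a real file, pass
# open("your_edges.csv", "w", newline="") instead of the StringIO buffer.
buffer = io.StringIO()
write_edges(buffer, SAMPLE_EDGES)
csv_text = buffer.getvalue()
print(csv_text)
```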

Output: A file named your_edges_scored.csv containing exactly four columns:

sender
recipient
datetime
netscore (1–50 integer)
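Because the output is a plain four-column CSV, it can be inspected outside the dashboard as well. The sketch below filters scored edges by a score range, roughly what the dashboard's range slider does. The rows are made-up sample data in the documented layout, and edges_in_score_range is a hypothetical helper, not part of Netscore.

```python
import csv
import io

# Made-up rows in the documented four-column layout. Illustrative only.
SCORED_CSV = """sender,recipient,datetime,netscore
alice@example.com,bob@example.com,2024-01-15T09:30:00Z,41
bob@example.com,alice@example.com,2024-01-15T09:42:00Z,37
alice@example.com,carol@example.com,2024-01-16T14:05:00Z,12
"""


def edges_in_score_range(fileobj, low, high):
    """Return rows whose netscore lies in [low, high], highest score first."""
    rows = []
    for row in csv.DictReader(fileobj):
        score = int(row["netscore"])
        if low <= score <= high:
            rows.append({**row, "netscore": score})
    return sorted(rows, key=lambda r: r["netscore"], reverse=True)


# For a real file, pass open("your_edges_scored.csv", newline="") instead.
top = edges_in_score_range(io.StringIO(SCORED_CSV), 30, 50)
for row in top:
    print(row["sender"], "->", row["recipient"], row["netscore"])
```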

Importing Scored Data into the Dashboard

Generate the scored CSV by running docker run --rm -v "$(pwd)":/data atodev/onadashboard:netscore predict /data/your_edges.csv
In the ONA Dashboard, go to Load Data and import the scored CSV
The dashboard automatically detects the netscore column:
- A Netscore Range slider appears in the sidebar (below Time Range)
- Use it to filter the network by influence score
- The filter applies across all chart types (Network Graph, Sankey, Venn, Time Series)
In Graph Settings, select Netscore as the edge width mode to visually size edges by their influence score

Performance

Dataset Size	Approximate Time
10,000 rows	~1 second
100,000 rows	~5 seconds
500,000 rows	~25 seconds

Column Auto-Detection

The tool automatically detects columns by name pattern:

Column	Detected Names
Sender	sender, senderaddress, from, source, mail_from, organizer
Recipient	recipient, recipientaddress, to, target, mail_to
Date	date, datetime, timestamp, created, sent, received, time

If no names match, the tool falls back to positional detection: column 1 as sender, column 2 as recipient, and the first column that parses as a date.
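The detection rules above can be sketched in a few lines. This is an illustration, not Netscore's actual code: it assumes exact, case-insensitive name matching, and it simplifies the positional fallback by assuming the date sits in the third column rather than probing for the first parseable one. The function name detect_columns is hypothetical.

```python
SENDER_NAMES = {"sender", "senderaddress", "from", "source", "mail_from", "organizer"}
RECIPIENT_NAMES = {"recipient", "recipientaddress", "to", "target", "mail_to"}
DATE_NAMES = {"date", "datetime", "timestamp", "created", "sent", "received", "time"}


def detect_columns(header):
    """Map the roles 'sender', 'recipient', and 'date' to column indexes."""
    normalized = [name.strip().lower() for name in header]

    def first_match(candidates):
        for index, name in enumerate(normalized):
            if name in candidates:
                return index
        return None

    found = {
        "sender": first_match(SENDER_NAMES),
        "recipient": first_match(RECIPIENT_NAMES),
        "date": first_match(DATE_NAMES),
    }
    if None in found.values():
        # Positional fallback. Simplification: the date is assumed to be in
        # the third column instead of being located by trial parsing.
        found = {"sender": 0, "recipient": 1, "date": 2}
    return found


print(detect_columns(["To", "From", "Sent"]))
```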

Timestamp Formats

Supported formats include:

ISO 8601: 2024-01-15T09:30:00Z
Date only: 2024-01-15, 15/01/2024
Unix epoch (seconds): 1705312200
With timezone: 2024-01-15T09:30:00+05:00
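To make the listed formats concrete, here is a minimal standard-library sketch that normalizes each of them to UTC. It is not the tool's parser. Two choices are assumptions made for the example: timestamps without a timezone are treated as UTC, and slash-separated dates are read day-first, as in 15/01/2024.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse one of the listed formats into a timezone-aware UTC datetime."""
    text = value.strip()
    if text.isdigit():
        # Unix epoch in seconds.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    if text.endswith("Z"):
        # fromisoformat in older Python versions rejects the "Z" suffix.
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        # Assumption: slash-separated dates are day-first.
        parsed = datetime.strptime(text, "%d/%m/%Y")
    if parsed.tzinfo is None:
        # Assumption: timestamps without a timezone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for sample in ("2024-01-15T09:30:00Z", "15/01/2024", "2024-01-15T09:30:00+05:00"):
    print(sample, "->", parse_timestamp(sample).isoformat())
```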

ML Architecture

Why HistGradientBoostingRegressor

Gradient boosting was chosen over alternatives for this task:

Handles mixed feature types (log-scaled continuous, binary, percentile) without normalization
Robust to outliers in communication data (e.g., automated mass-senders)
Fast training and inference: 500,000 rows are scored in roughly 25 seconds (see Performance)
Natively supports missing values (common in sparse network data)

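HistGradientBoosting specifically uses histogram-based splitting: each feature is binned into a small number of integer buckets (at most 255 by default in scikit-learn) before splits are searched. This makes it significantly faster than the standard GradientBoostingRegressor on large datasets while maintaining comparable accuracy.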

Hyperparameters: max_iter=50, max_depth=4, learning_rate=0.2, max_leaf_nodes=31, min_samples_leaf=50

Feature Engineering (26 Features)

Features are grouped into five categories, each capturing a different dimension of network influence:

Category	Features	Why
Graph centrality (5)	PageRank, HITS hub/authority scores, geometric mean	Captures structural importance — who sits at critical junctions in the network
Message volume (5)	Send/receive counts, percentile ranks, pair volume	Raw activity level — high-volume communicators have more influence
Turnaround speed (3)	Median reply latency (sender/recipient), pair average	Responsiveness signals engagement — fast repliers are more influential
Relationship breadth/depth (7)	Unique contacts, reciprocity ratio, avg messages per relationship	Distinguishes broad connectors from deep relationships
Temporal (3)	Activity span, pair recency	Recent and sustained activity indicates active influence
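A few of the features in the table can be computed directly from an edge list, as the sketch below shows. It is illustrative only: the edges are made up, the feature names sender_sent_count and sender_unique_contacts are hypothetical, and the reciprocity formula is an assumed definition, since the exact one used by Netscore is not documented here. Values are raw, before any log scaling.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up edges: (sender, recipient, timestamp). Illustrative only.
EDGES = [
    ("alice", "bob", datetime(2024, 1, 10, 9, 0)),
    ("bob", "alice", datetime(2024, 1, 10, 9, 20)),
    ("alice", "bob", datetime(2024, 1, 12, 15, 0)),
    ("alice", "carol", datetime(2024, 1, 14, 11, 0)),
]


def edge_features(edges, now):
    """Compute a few raw per-edge features from (sender, recipient, time) tuples."""
    pair_volume = Counter((s, r) for s, r, _ in edges)
    sent = Counter(s for s, _, _ in edges)
    contacts = defaultdict(set)
    last_seen = {}
    for s, r, t in edges:
        contacts[s].add(r)
        last_seen[(s, r)] = max(t, last_seen.get((s, r), t))

    features = {}
    for (s, r), volume in pair_volume.items():
        reverse = pair_volume.get((r, s), 0)
        features[(s, r)] = {
            "pair_volume": volume,
            "sender_sent_count": sent[s],
            "sender_unique_contacts": len(contacts[s]),
            # Assumed definition: share of the pair's traffic flowing back.
            "reciprocity": reverse / (volume + reverse),
            # Whole days since the last message on this directed edge.
            "pair_recency_d": (now - last_seen[(s, r)]).days,
        }
    return features


features = edge_features(EDGES, now=datetime(2024, 1, 15))
for edge, values in features.items():
    print(edge, values)
```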

All volume and count features use log1p scaling to compress heavy-tailed distributions. Centrality scores are multiplied by 1e6 before log-scaling to preserve precision.
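The effect of both transforms is easy to see on a handful of numbers. In the sketch below the helper names and the sample centrality values are made up for illustration. The comment on the 1e6 multiplier reflects one plausible reading of "preserve precision": for very small x, log1p(x) is almost equal to x, so unscaled centrality scores would all sit next to zero.

```python
import math


def scale_count(count):
    """log1p maps 0 to 0 and compresses large counts."""
    return math.log1p(count)


def scale_centrality(score):
    """Multiply by 1e6 first so tiny scores spread out instead of staying near 0."""
    return math.log1p(score * 1e6)


# Counts spanning four orders of magnitude land in a narrow range.
for count in (0, 9, 99, 9999):
    print(count, round(scale_count(count), 3))

# Two PageRank-like scores (made-up values), with and without the multiplier.
for score in (2e-6, 4e-5):
    print(score, math.log1p(score), scale_centrality(score))
```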

Output Scaling

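Raw model predictions are mapped to a 1–50 integer range using MinMaxScaler, then rounded and clipped. Because the scaler is fitted once and stored with the model, a given raw prediction always maps to the same score, which keeps scores interpretable and comparable across datasets. Clipping handles predictions that fall outside the range the scaler was fitted on.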
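The mapping can be written out in plain Python to show the order of operations: scale, round, clip. This sketch mirrors what a MinMaxScaler with feature_range=(1, 50) computes, but it is not Netscore's code, and the reference predictions are made-up numbers.

```python
def fit_min_max(reference_predictions, low=1, high=50):
    """Return a function mapping raw predictions onto the integers low..high."""
    lo, hi = min(reference_predictions), max(reference_predictions)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input

    def to_netscore(raw):
        scaled = low + (raw - lo) / span * (high - low)
        # Round first, then clip: new data can fall outside the fitted range.
        return int(min(high, max(low, round(scaled))))

    return to_netscore


# Fit once on reference predictions (made-up values), then reuse the mapping.
to_netscore = fit_min_max([0.0, 2.5, 10.0])
scores = [to_netscore(raw) for raw in (0.0, 2.5, 10.0, 12.0, -3.0)]
print(scores)
```

The last two inputs lie outside the fitted range of 0.0 to 10.0, so they are clipped to 50 and 1 respectively.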

Model Artifact

The netscore_model.pkl file (126 KB) contains:

Trained HistGradientBoostingRegressor model
Fitted MinMaxScaler for output normalization
feature_cols list ensuring correct feature ordering

Top Features by Importance

Feature	Importance	Description
`pair_recency_d`	0.149	Days since last interaction on this edge
`s_volume`	0.093	Sender's total message count (log-scaled)
`r_volume`	0.090	Recipient's total message count (log-scaled)
`pair_volume`	0.072	Messages on this specific edge (log-scaled)
`s_pagerank`	0.061	Sender's PageRank centrality (log-scaled)

Recency ranks highest by a clear margin, which is consistent with recent communication being the strongest signal of active influence in organizational networks.
