Netscore
Netscore is an ML-powered influence scoring tool that assigns each network interaction a score from 1 to 50, indicating the relative influence of that communication edge. It runs locally as a Python CLI tool packaged in a Docker image.
How It Works
Netscore uses a pre-trained machine learning model (HistGradientBoostingRegressor) over 26 features engineered from your network data:
- Graph centrality — PageRank and HITS scores for senders and recipients
- Message volume — Communication frequency and volume percentiles
- Turnaround speed — How quickly participants respond
- Relationship metrics — Breadth, depth, and reciprocity of connections
- Temporal patterns — Activity spans and recency of interactions
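For readers unfamiliar with the centrality features, the sketch below computes PageRank by power iteration over sender→recipient pairs using only the Python standard library. It is illustrative, not Netscore's code. The damping factor of 0.85, weighting each directed edge by its message count, and spreading the rank of nodes with no outgoing edges evenly across all nodes are all assumptions.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over directed (sender, recipient) pairs.

    Assumption: repeated pairs increase the edge weight (weight = message count).
    """
    weight = defaultdict(float)
    nodes = set()
    for sender, recipient in edges:
        weight[(sender, recipient)] += 1.0
        nodes.update((sender, recipient))
    nodes = sorted(nodes)
    n = len(nodes)

    # Total outgoing weight per node, used to split its rank across its edges.
    out_total = defaultdict(float)
    for (sender, _), w in weight.items():
        out_total[sender] += w

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (sender, recipient), w in weight.items():
            new[recipient] += damping * rank[sender] * w / out_total[sender]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

In a tiny example where b and c both message a, `pagerank([("b", "a"), ("c", "a")])` gives a the largest score, and the scores always sum to 1.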
Requirements
- Docker installed and running
- A CSV edge file with sender, recipient, and datetime columns
Installation
Pull the Netscore Docker image from Docker Hub:
```bash
docker pull atodev/onadashboard:netscore
```
The image includes Python, all ML dependencies, and the pre-trained model — no local Python setup needed.
Usage
Place your edge CSV in your working directory, then run:
```bash
docker run --rm -v $(pwd):/data atodev/onadashboard:netscore predict /data/your_edges.csv
```
This mounts your current directory into the container. The scored output file (your_edges_scored.csv) will appear in the same directory.
Input: A CSV with sender, recipient, and datetime columns. Column names are auto-detected (supports common variations like SenderAddress, From, Source, Recipient, To, Target, etc.).
Output: A your_edges_scored.csv file containing only four columns:
- sender
- recipient
- datetime
- netscore (integer, 1–50)
Importing Scored Data into the Dashboard
- Run `docker run --rm -v $(pwd):/data atodev/onadashboard:netscore predict /data/your_edges.csv` to generate the scored CSV
- In the ONA Dashboard, go to Load Data and import the scored CSV
- The dashboard automatically detects the netscore column:
  - A Netscore Range slider appears in the sidebar (below Time Range)
  - Use it to filter the network by influence score
  - The filter applies across all chart types (Network Graph, Sankey, Venn, Time Series)
- In Graph Settings, select Netscore as the edge width mode to visually size edges by their influence score
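A similar range filter can be applied to the scored file outside the dashboard. This is a minimal sketch using Python's csv module. It assumes the file has a header row with the four output columns described under Usage, and it treats both ends of the range as inclusive, which may differ from the slider's exact behaviour. The function name is hypothetical.

```python
import csv


def filter_by_netscore(lines, low, high):
    """Keep rows of a scored CSV whose netscore lies in [low, high], inclusive.

    lines: an open file or any iterable of CSV text lines, header row first.
    """
    return [row for row in csv.DictReader(lines) if low <= int(row["netscore"]) <= high]


# Usage:
# with open("your_edges_scored.csv", newline="") as f:
#     strong_edges = filter_by_netscore(f, 40, 50)
```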
Performance
| Dataset Size | Approximate Time |
|---|---|
| 10,000 rows | ~1 second |
| 100,000 rows | ~5 seconds |
| 500,000 rows | ~25 seconds |
Column Auto-Detection
The tool automatically detects columns by name pattern:
| Column | Detected Names |
|---|---|
| Sender | sender, senderaddress, from, source, mail_from, organizer |
| Recipient | recipient, recipientaddress, to, target, mail_to |
| Date | date, datetime, timestamp, created, sent, received, time |
If the names don't match, the tool falls back to positional detection: the first column as sender, the second as recipient, and the first column whose values parse as dates.
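The matching rules above can be sketched as follows. This is an illustration built on assumptions, not Netscore's detector. It counts a header as a match only when the trimmed, lower-cased header equals one of the listed names. It falls back per role, so a file with a recognised date header but unrecognised sender and recipient headers still uses columns 1 and 2. It also relies on a caller-supplied parses_as_date function, which is a hypothetical hook for finding the first date column.

```python
SENDER_NAMES = {"sender", "senderaddress", "from", "source", "mail_from", "organizer"}
RECIPIENT_NAMES = {"recipient", "recipientaddress", "to", "target", "mail_to"}
DATE_NAMES = {"date", "datetime", "timestamp", "created", "sent", "received", "time"}


def detect_columns(header, sample_row, parses_as_date):
    """Return 0-based column indices for the sender, recipient and date roles."""
    normalized = [h.strip().lower() for h in header]

    def find(names):
        # First header whose normalized name is in the accepted set, else None.
        return next((i for i, name in enumerate(normalized) if name in names), None)

    sender = find(SENDER_NAMES)
    recipient = find(RECIPIENT_NAMES)
    date = find(DATE_NAMES)

    # Positional fallback (assumed to apply per role).
    if sender is None:
        sender = 0
    if recipient is None:
        recipient = 1
    if date is None:
        date = next((i for i, value in enumerate(sample_row) if parses_as_date(value)), None)
    return {"sender": sender, "recipient": recipient, "date": date}
```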
Timestamp Formats
Supported formats include:
- ISO 8601: 2024-01-15T09:30:00Z
- Date only: 2024-01-15, 15/01/2024
- Unix epoch (seconds): 1705312200
- With timezone: 2024-01-15T09:30:00+05:00
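The following sketch shows one way to parse these styles. Two of its behaviours are assumptions that the list above does not state. Values without a timezone are treated as UTC. Slash-separated dates are read as day/month/year, since the 15/01/2024 example is unambiguous only because 15 cannot be a month. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Assumed date-only layouts: ISO year-month-day and day/month/year.
DATE_ONLY_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_timestamp(value):
    """Parse one of the documented timestamp styles into an aware UTC datetime."""
    value = value.strip()
    if value.isdigit():
        # Unix epoch in seconds.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    for fmt in DATE_ONLY_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    # ISO 8601. A trailing "Z" is rewritten because fromisoformat only accepts it on Python 3.11+.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)
```

Normalising everything to UTC keeps reply latencies and recency gaps comparable when a single file mixes offsets.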
ML Architecture
Why HistGradientBoostingRegressor
Gradient boosting was chosen over alternatives for this task:
- Handles mixed feature types (log-scaled continuous, binary, percentile) without normalization
- Robust to outliers in communication data (e.g., automated mass-senders)
- Fast training and inference — handles 500K+ rows in seconds
- Natively supports missing values (common in sparse network data)
HistGradientBoosting specifically uses histogram-based splitting, making it significantly faster than standard GradientBoosting on large datasets while maintaining comparable accuracy.
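The idea behind histogram-based splitting fits in a few lines. Each feature is bucketed into a bounded number of bins once. Candidate splits are then evaluated by scanning per-bin sums rather than re-sorting raw values. This is a deliberately simplified sketch, not scikit-learn's implementation. It uses quantile thresholds over distinct values and a squared-error gain with a unit hessian per sample, and it borrows scikit-learn's default cap of 255 bins.

```python
import bisect


def bin_feature(values, max_bins=255):
    """Return (bin index per value, number of bins) using quantile thresholds."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bins:
        # Few distinct values: each value gets its own bin.
        thresholds = distinct[:-1]
    else:
        # Many distinct values: pick max_bins - 1 evenly spaced cut points.
        step = len(distinct) / max_bins
        thresholds = [distinct[int(i * step)] for i in range(1, max_bins)]
    return [bisect.bisect_left(thresholds, v) for v in values], len(thresholds) + 1


def best_split(bins, gradients, n_bins):
    """Scan bin boundaries; return (gain, last bin index of the left child)."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bins, gradients):
        grad_sum[b] += g
        count[b] += 1
    total_g, total_n = sum(grad_sum), len(bins)
    best_gain, best_bin = 0.0, None
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_g = total_g - left_g
        # Squared-error gain (hessian = 1 per sample), up to a constant factor.
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin
```

The split scan costs O(number of bins) per feature instead of O(number of rows), which is where the speed advantage on large datasets comes from.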
Hyperparameters: max_iter=50, max_depth=4, learning_rate=0.2, max_leaf_nodes=31, min_samples_leaf=50
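For orientation, the following shows what these settings imply under scikit-learn's documented parameter meanings. This arithmetic is derived here and is not stated by Netscore. The prediction is a baseline plus the sum of at most 50 shrunken trees. The depth limit, rather than max_leaf_nodes, is what caps tree size:

```latex
\begin{aligned}
\hat{y}(x) &= F_0 + \eta \sum_{m=1}^{M} h_m(x), \qquad \eta = 0.2,\quad M \le 50 \\
\text{leaves}(h_m) &\le \min\bigl(2^{\text{max\_depth}},\ \text{max\_leaf\_nodes}\bigr) = \min(2^{4},\ 31) = 16
\end{aligned}
```

Here F_0 is the constant baseline prediction, h_m is the m-th regression tree, and η is learning_rate. M can fall below 50 because scikit-learn's default early_stopping='auto' enables early stopping when the training set has more than 10,000 samples; whether Netscore's training kept that default is not stated. min_samples_leaf=50 further requires each leaf to cover at least 50 training rows.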
Feature Engineering (26 Features)
Features are grouped into five categories, each capturing a different dimension of network influence:
| Category | Features | Why |
|---|---|---|
| Graph centrality (5) | PageRank, HITS hub/authority scores, geometric mean | Captures structural importance — who sits at critical junctions in the network |
| Message volume (5) | Send/receive counts, percentile ranks, pair volume | Raw activity level — high-volume communicators have more influence |
| Turnaround speed (3) | Median reply latency (sender/recipient), pair average | Responsiveness signals engagement — fast repliers are more influential |
| Relationship breadth/depth (7) | Unique contacts, reciprocity ratio, avg messages per relationship | Distinguishes broad connectors from deep relationships |
| Temporal (3) | Activity span, pair recency | Recent and sustained activity indicates active influence |
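The table does not give exact formulas, so the following standard-library sketch offers only one plausible reading of three of the relationship features. All of its definitions are assumptions: unique contacts as distinct recipients, reciprocity as the share of those recipients who wrote back, and average messages per relationship as messages sent divided by distinct recipients. The function and key names are assumptions too.

```python
from collections import Counter, defaultdict


def relationship_features(edges):
    """Per-sender breadth/depth metrics from (sender, recipient) pairs.

    Assumed definitions (illustrative only):
      unique_contacts   distinct recipients the sender has messaged
      reciprocity       share of those recipients who also messaged the sender
      msgs_per_contact  sender's message count divided by unique_contacts
    """
    pair_counts = Counter(edges)
    contacts = defaultdict(set)
    for sender, recipient in pair_counts:
        contacts[sender].add(recipient)

    features = {}
    for sender, recipients in contacts.items():
        sent = sum(pair_counts[(sender, r)] for r in recipients)
        replied = sum(1 for r in recipients if (r, sender) in pair_counts)
        features[sender] = {
            "unique_contacts": len(recipients),
            "reciprocity": replied / len(recipients),
            "msgs_per_contact": sent / len(recipients),
        }
    return features
```

Under these definitions, a sender who messages many people once scores high on breadth and low on depth. A sender who exchanges many messages with a few reciprocating contacts shows the opposite pattern.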
All volume and count features use log1p scaling to compress heavy-tailed distributions. Centrality scores are multiplied by 1e6 before log-scaling to preserve precision.
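Here is a quick illustration of why the centrality multiplier matters. log1p(x) is close to x when x is tiny, so applying it directly to scores such as PageRank values (which are typically far below 1 because they sum to 1 across all nodes) would leave them nearly unchanged. Scaling by 1e6 first moves them into a range where the logarithm actually compresses the spread. The helper names are hypothetical.

```python
import math


def scale_count(count):
    """log1p scaling for message counts and other heavy-tailed volumes."""
    return math.log1p(count)


def scale_centrality(score, factor=1e6):
    """Multiply a small centrality score by factor before log1p scaling."""
    return math.log1p(score * factor)


# Two PageRank-sized scores, one twice the other:
# without the multiplier, their gap after log1p is only about 2e-6;
# with it, the gap is log(5) - log(3), about 0.51.
```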
Output Scaling
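Raw model predictions are mapped to a 1–50 range using the MinMaxScaler stored in the model artifact, then rounded to integers and clipped to 1–50. Because the scaler is fitted once and shipped with the model rather than refitted on each input file, the same raw prediction always maps to the same score, which keeps scores interpretable and comparable across datasets.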
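The mapping can be written out explicitly. This sketch assumes the scaler was fitted with a target range of 1 to 50 (MinMaxScaler's feature_range), so lo and hi stand for the smallest and largest raw predictions seen at fit time. The function name is hypothetical. Python's round() rounds halves to the nearest even integer, as NumPy does, and this may or may not match Netscore's rounding.

```python
def to_netscore(raw, lo, hi, low_score=1, high_score=50):
    """Min-max map a raw prediction onto [low_score, high_score], round, then clip.

    lo and hi are assumed to be the raw minimum and maximum seen when the scaler was fitted.
    """
    scaled = low_score + (raw - lo) * (high_score - low_score) / (hi - lo)
    return min(max(round(scaled), low_score), high_score)
```

The clipping step catches raw predictions that fall above the fitted maximum or below the fitted minimum, which can happen on data unlike the training set.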
Model Artifact
The netscore_model.pkl file (126 KB) contains:
- Trained HistGradientBoostingRegressor model
- Fitted MinMaxScaler for output normalization
- feature_cols list ensuring correct feature ordering
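If you want to use the artifact directly from Python, a sketch might look like the following. The dictionary keys model, scaler and feature_cols are assumptions about how the three objects are stored, and unpickling them requires a compatible scikit-learn installation. Only unpickle files from a source you trust, because pickle can execute arbitrary code on load. All function names are hypothetical.

```python
import pickle


def load_artifact(path):
    """Load netscore_model.pkl; the dict keys used here are assumptions."""
    with open(path, "rb") as f:
        artifact = pickle.load(f)
    return artifact["model"], artifact["scaler"], artifact["feature_cols"]


def build_feature_matrix(rows, feature_cols):
    """Order each row's features exactly as the model saw them during training."""
    return [[row[col] for col in feature_cols] for row in rows]


def score_rows(model, scaler, feature_cols, rows):
    """Predict raw scores, rescale them, then round and clip to 1-50."""
    raw = model.predict(build_feature_matrix(rows, feature_cols))
    scaled = scaler.transform([[value] for value in raw])
    return [int(min(max(round(float(s[0])), 1), 50)) for s in scaled]
```

Storing feature_cols alongside the model guards against a silent failure mode: tree models accept any column order, so shuffled features would produce plausible-looking but wrong scores.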
Top Features by Importance
| Feature | Importance | Description |
|---|---|---|
| pair_recency_d | 0.149 | Days since last interaction on this edge |
| s_volume | 0.093 | Sender's total message count (log-scaled) |
| r_volume | 0.090 | Recipient's total message count (log-scaled) |
| pair_volume | 0.072 | Messages on this specific edge (log-scaled) |
| s_pagerank | 0.061 | Sender's PageRank centrality (log-scaled) |
Recency dominates because recent communication is the strongest signal of active influence in organizational networks.
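pair_recency_d could be computed along these lines. The reference point is an assumption. This sketch measures days back from the newest timestamp in the input rather than from the current date, which keeps scores reproducible for historical data. It treats edges as directed (sender, recipient) pairs. The function name is hypothetical.

```python
from datetime import datetime


def pair_recency_days(events):
    """Days between each directed edge's latest message and the newest message overall.

    events: iterable of (sender, recipient, datetime) tuples.
    """
    last_seen = {}
    for sender, recipient, ts in events:
        key = (sender, recipient)
        if key not in last_seen or ts > last_seen[key]:
            last_seen[key] = ts
    reference = max(last_seen.values())  # assumption: newest timestamp in the data
    return {key: (reference - ts).total_seconds() / 86400 for key, ts in last_seen.items()}


if __name__ == "__main__":
    demo = [
        ("a", "b", datetime(2024, 1, 1)),
        ("a", "b", datetime(2024, 1, 10)),
        ("b", "c", datetime(2024, 1, 15)),
    ]
    print(pair_recency_days(demo))  # {('a', 'b'): 5.0, ('b', 'c'): 0.0}
```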