Netscore

Netscore is an ML-powered influence scoring tool that assigns each network interaction a score from 1 to 50, indicating the relative influence of that communication edge. It runs locally as a Python CLI tool packaged in a Docker image.

How It Works

Netscore engineers 26 features from your network data and passes them to a pre-trained machine learning model (HistGradientBoostingRegressor). The features cover:

  • Graph centrality — PageRank and HITS scores for senders and recipients
  • Message volume — Communication frequency and volume percentiles
  • Turnaround speed — How quickly participants respond
  • Relationship metrics — Breadth, depth, and reciprocity of connections
  • Temporal patterns — Activity spans and recency of interactions

Requirements

  • Docker installed and running
  • A CSV edge file with sender, recipient, and datetime columns

Installation

Pull the Netscore Docker image from Docker Hub:

docker pull atodev/onadashboard:netscore

The image includes Python, all ML dependencies, and the pre-trained model — no local Python setup needed.

Usage

Place your edge CSV in your working directory, then run:

docker run --rm -v "$(pwd)":/data atodev/onadashboard:netscore predict /data/your_edges.csv

This mounts your current directory into the container. The scored output file (your_edges_scored.csv) will appear in the same directory.

Input: A CSV with sender, recipient, and datetime columns. Column names are auto-detected (supports common variations like SenderAddress, From, Source, Recipient, To, Target, etc.).

Output: A your_edges_scored.csv file containing only 4 columns:

  • sender
  • recipient
  • datetime
  • netscore (1–50 integer)
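
As a minimal sketch of working with the scored file outside the dashboard, the snippet below reads the four documented columns and keeps only edges within a score range. The sample rows and the helper name filter_by_netscore are made up for illustration; in practice you would open your_edges_scored.csv instead of the in-memory sample.

```python
import csv
import io

# Made-up sample that mimics the four documented output columns.
SAMPLE_SCORED_CSV = """sender,recipient,datetime,netscore
alice@example.com,bob@example.com,2024-01-15T09:30:00Z,42
bob@example.com,carol@example.com,2024-01-16T11:00:00Z,7
carol@example.com,alice@example.com,2024-01-17T14:45:00Z,25
"""


def filter_by_netscore(csv_file, low=1, high=50):
    """Return rows whose netscore lies in [low, high], inclusive.

    Hypothetical helper, not part of the Netscore CLI.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if low <= int(row["netscore"]) <= high]


# Replace io.StringIO(...) with open("your_edges_scored.csv", newline="") for a real file.
strong_edges = filter_by_netscore(io.StringIO(SAMPLE_SCORED_CSV), low=25)
for row in strong_edges:
    print(row["sender"], "->", row["recipient"], row["netscore"])
```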

Importing Scored Data into the Dashboard

  1. Generate the scored CSV by running docker run --rm -v "$(pwd)":/data atodev/onadashboard:netscore predict /data/your_edges.csv
  2. In the ONA Dashboard, go to Load Data and import the scored CSV
  3. The dashboard automatically detects the netscore column:
    • A Netscore Range slider appears in the sidebar (below Time Range)
    • Use it to filter the network by influence score
    • The filter applies across all chart types (Network Graph, Sankey, Venn, Time Series)
  4. In Graph Settings, select Netscore as the edge width mode to visually size edges by their influence score

Performance

Dataset Size | Approximate Time
10,000 rows | ~1 second
100,000 rows | ~5 seconds
500,000 rows | ~25 seconds

Column Auto-Detection

The tool automatically detects columns by name pattern:

Column | Detected Names
Sender | sender, senderaddress, from, source, mail_from, organizer
Recipient | recipient, recipientaddress, to, target, mail_to
Date | date, datetime, timestamp, created, sent, received, time

If names don't match, the tool falls back to positional detection (columns 1, 2, and the first parseable date column).
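
The detection logic described above might look roughly like the following sketch. It is not the tool's actual code: the function names are hypothetical, matching is simplified to a case-insensitive exact comparison, and it assumes the positional fallback replaces all three roles whenever any one of them is unmatched.

```python
from datetime import datetime

# Documented names per role, compared case-insensitively.
ROLE_NAMES = {
    "sender": {"sender", "senderaddress", "from", "source", "mail_from", "organizer"},
    "recipient": {"recipient", "recipientaddress", "to", "target", "mail_to"},
    "datetime": {"date", "datetime", "timestamp", "created", "sent", "received", "time"},
}


def _looks_like_date(value):
    """Rough check: does the value parse as an ISO 8601 date or datetime?"""
    try:
        datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def detect_columns(header, first_row):
    """Return a dict mapping each role to a zero-based column index.

    The datetime index is None if the fallback finds no parseable column.
    """
    normalized = [name.strip().lower() for name in header]
    found = {
        role: next((i for i, n in enumerate(normalized) if n in names), None)
        for role, names in ROLE_NAMES.items()
    }
    if None in found.values():
        # Assumed fallback: columns 1 and 2, plus the first parseable date column.
        date_idx = next(
            (i for i, v in enumerate(first_row) if _looks_like_date(v)), None
        )
        found = {"sender": 0, "recipient": 1, "datetime": date_idx}
    return found


print(detect_columns(["SenderAddress", "To", "Timestamp"], ["a", "b", "2024-01-15"]))
```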

Timestamp Formats

Supported formats include:

  • ISO 8601: 2024-01-15T09:30:00Z
  • Date only: 2024-01-15, 15/01/2024
  • Unix epoch (seconds): 1705312200
  • With timezone: 2024-01-15T09:30:00+05:00
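
To check ahead of time whether your timestamps fall into one of these formats, a small parser along the following lines can help. This is only a sketch using the standard library, not Netscore's own parser; parse_timestamp is a hypothetical name, and slash-separated dates are assumed to be day-first, as in the 15/01/2024 example.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse one of the documented timestamp formats into a datetime."""
    text = value.strip()
    # Unix epoch in seconds: a string of digits only.
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Slash-separated date, assumed day-first (e.g. 15/01/2024).
    if "/" in text:
        return datetime.strptime(text, "%d/%m/%Y")
    # ISO 8601. fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


for sample in ["2024-01-15T09:30:00Z", "2024-01-15", "15/01/2024",
               "1705312200", "2024-01-15T09:30:00+05:00"]:
    print(sample, "->", parse_timestamp(sample).isoformat())
```

Note that date-only values come back without timezone information while the other formats are timezone-aware, so mixed inputs need to be normalized before they can be compared.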

ML Architecture

Why HistGradientBoostingRegressor

Gradient boosting was chosen over alternatives for this task:

  • Handles mixed feature types (log-scaled continuous, binary, percentile) without normalization
  • Robust to outliers in communication data (e.g., automated mass-senders)
  • Fast training and inference — handles 500K+ rows in seconds
  • Natively supports missing values (common in sparse network data)

HistGradientBoostingRegressor bins each feature into a histogram before searching for splits, which makes it significantly faster than the standard GradientBoostingRegressor on large datasets while maintaining comparable accuracy.

Hyperparameters: max_iter=50, max_depth=4, learning_rate=0.2, max_leaf_nodes=31, min_samples_leaf=50

Feature Engineering (26 Features)

Features are grouped into five categories, each capturing a different dimension of network influence:

Category | Features | Why
Graph centrality (5) | PageRank, HITS hub/authority scores, geometric mean | Captures structural importance: who sits at critical junctions in the network
Message volume (5) | Send/receive counts, percentile ranks, pair volume | Raw activity level: high-volume communicators have more influence
Turnaround speed (3) | Median reply latency (sender/recipient), pair average | Responsiveness signals engagement: fast repliers are more influential
Relationship breadth/depth (7) | Unique contacts, reciprocity ratio, avg messages per relationship | Distinguishes broad connectors from deep relationships
Temporal (3) | Activity span, pair recency | Recent and sustained activity indicates active influence

All volume and count features use log1p scaling to compress heavy-tailed distributions. Centrality scores are multiplied by 1e6 before log-scaling to preserve precision.
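
The effect of both transforms can be seen in a short sketch. The function names are hypothetical and the input values are made up; only the log1p scaling and the 1e6 multiplier come from the description above.

```python
import math


def log_scale_count(count):
    """Compress a heavy-tailed count: log1p(x) = ln(1 + x)."""
    return math.log1p(count)


def log_scale_centrality(score):
    """Multiply a small centrality score by 1e6, then apply log1p."""
    return math.log1p(score * 1e6)


# Counts spanning four orders of magnitude end up within about one order.
for count in [1, 100, 10_000]:
    print(count, round(log_scale_count(count), 3))

# Without the multiplier, log1p of a tiny score is almost the score itself,
# so the log transform would have practically no effect.
tiny = 1e-5
print(math.log1p(tiny), log_scale_centrality(tiny))
```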

Output Scaling

Raw model predictions are mapped to a 1–50 integer range using MinMaxScaler, then rounded and clipped. This ensures scores are interpretable and comparable across datasets.
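
The mapping can be illustrated in plain Python. This is a sketch of the arithmetic only, not the shipped code: to_netscore is a hypothetical helper, and raw_min and raw_max stand in for the range the fitted MinMaxScaler learned. Presumably clipping is needed because a prediction on new data can fall outside that range.

```python
def to_netscore(raw, raw_min, raw_max, low=1, high=50):
    """Min-max scale a raw prediction into [low, high], then round and clip."""
    if raw_max == raw_min:
        # Degenerate range: every prediction maps to the lowest score.
        return low
    scaled = low + (raw - raw_min) * (high - low) / (raw_max - raw_min)
    return int(min(high, max(low, round(scaled))))


# Made-up raw predictions against an assumed fitted range of [0.0, 1.0].
for raw in [-0.2, 0.0, 0.25, 1.0, 1.3]:
    print(raw, "->", to_netscore(raw, raw_min=0.0, raw_max=1.0))
```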

Model Artifact

The netscore_model.pkl file (126 KB) contains:

  • Trained HistGradientBoostingRegressor model
  • Fitted MinMaxScaler for output normalization
  • feature_cols list ensuring correct feature ordering

Top Features by Importance

Feature | Importance | Description
pair_recency_d | 0.149 | Days since last interaction on this edge
s_volume | 0.093 | Sender's total message count (log-scaled)
r_volume | 0.090 | Recipient's total message count (log-scaled)
pair_volume | 0.072 | Messages on this specific edge (log-scaled)
s_pagerank | 0.061 | Sender's PageRank centrality (log-scaled)

Recency ranks highest, well ahead of the volume features, which is consistent with recent communication being the strongest signal of active influence in organizational networks.
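
To make the top-ranked features concrete, the sketch below derives four of them from a made-up edge list. It rests on several assumptions that may differ from the actual implementation: edges are treated as directed, s_volume counts messages sent, r_volume counts messages received, and recency is measured from the latest timestamp in the data.

```python
import math
from collections import Counter
from datetime import datetime

# Made-up edges: (sender, recipient, timestamp).
edges = [
    ("alice", "bob", datetime(2024, 1, 1)),
    ("alice", "bob", datetime(2024, 1, 10)),
    ("bob", "alice", datetime(2024, 1, 12)),
    ("carol", "alice", datetime(2024, 1, 15)),
]


def pair_features(edges):
    """Compute recency and log-scaled volumes per directed (sender, recipient) pair."""
    reference = max(ts for _, _, ts in edges)  # assumed "now": latest timestamp
    sent = Counter(s for s, _, _ in edges)
    received = Counter(r for _, r, _ in edges)
    pair_counts = Counter((s, r) for s, r, _ in edges)
    last_seen = {}
    for s, r, ts in edges:
        last_seen[(s, r)] = max(ts, last_seen.get((s, r), ts))
    return {
        (s, r): {
            "pair_recency_d": (reference - last_seen[(s, r)]).days,
            "s_volume": math.log1p(sent[s]),
            "r_volume": math.log1p(received[r]),
            "pair_volume": math.log1p(count),
        }
        for (s, r), count in pair_counts.items()
    }


features = pair_features(edges)
for pair, values in features.items():
    print(pair, values)
```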