Clustering Threat Actor Infrastructure: ASN, Registrar, and TLS Fingerprints

Infrastructure attribution is one of the most valuable and most misused capabilities in threat intelligence. Analysts frequently pivot from a single IOC to a broad infrastructure cluster using a shared ASN or registrar as the primary signal. On its own, that methodology produces clusters too wide to be meaningful; it narrows attribution only when combined with stronger secondary signals. Combining ASN and registrar overlap with TLS certificate fingerprinting improves cluster quality substantially.

Why Single-Dimension Clustering Fails

Shared ASN is the most commonly used infrastructure clustering signal, and the weakest one at scale. Autonomous System Numbers identify routing domains operated by hosting providers, ISPs, and cloud platforms. AS14618 (Amazon, associated with AWS US East) hosts hundreds of thousands of IP addresses operated by thousands of different customers. Finding that two C2 IPs share the same ASN does not indicate a shared operator. It indicates a shared hosting preference, which is a trivially common characteristic.

Even at the level of specific hosting providers known to attract threat actor activity — certain bulletproof hosting providers in Eastern Europe and Southeast Asia that are documented as preferred infrastructure for specific APT groups — ASN-based clustering requires at minimum a secondary filter to produce meaningful cluster sizes. ASN-only pivots from a threat intelligence report produce so many associated IPs that the resulting block list is operationally unworkable.

Registrar overlap is marginally stronger as a signal because threat actors tend to use a small set of registrars that accept privacy-shielded registrations or have inconsistent abuse response policies. However, major privacy-forward registrars are used by millions of legitimate domains as well, and registrar-based clustering still produces wide, low-precision clusters without additional context.

TLS Certificate Fingerprinting as a Cluster Anchor

TLS certificates provide several clustering signals that are significantly stronger than ASN or registrar alone. The most useful are Subject Alternative Name (SAN) patterns, certificate organizational fields, issuer selection, and certificate validity period characteristics.

SAN wildcard and subdomain patterns: Threat actors configuring C2 infrastructure tend to use consistent naming patterns across multiple domains — either because they use scripted infrastructure deployment tools or because specific naming conventions are embedded in the C2 framework's default configuration. A certificate issued for *.api.{domain} with a specific subdomain structure across 8 different domains registered in the same 72-hour window is a strong clustering signal even when the domains themselves appear unrelated.
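The SAN pattern check can be sketched in a few lines of Python. This is a minimal illustration, not ThreatPulsar's implementation: the function names, the record layout, and the `min_domains` cutoff are assumptions made for the example, and it expects the registered domain and registration time to have been collected already (for instance from WHOIS or RDAP data).

```python
from collections import defaultdict
from datetime import datetime, timedelta

def san_template(san: str, registered_domain: str) -> str:
    """Replace the registered domain with a placeholder, keeping the subdomain shape."""
    san = san.lower().rstrip(".")
    domain = registered_domain.lower().rstrip(".")
    if san == domain:
        return "{domain}"
    if san.endswith("." + domain):
        return san[: -len(domain)] + "{domain}"
    return san  # SAN does not belong to this domain; leave it untouched

def group_by_san_pattern(records, window=timedelta(hours=72), min_domains=3):
    """records: iterable of (registered_domain, registration_time, [SAN, ...]).

    Returns (pattern, domains) pairs where at least `min_domains` domains share
    the same set of SAN templates and were registered within `window`.
    """
    by_pattern = defaultdict(list)
    for domain, registered_at, sans in records:
        pattern = frozenset(san_template(s, domain) for s in sans)
        by_pattern[pattern].append((registered_at, domain))

    clusters = []
    for pattern, members in by_pattern.items():
        members.sort()
        best, start = [], 0
        # Sliding window over registration times: keep the largest group in range.
        for end in range(len(members)):
            while members[end][0] - members[start][0] > window:
                start += 1
            if end - start + 1 > len(best):
                best = members[start:end + 1]
        if len(best) >= min_domains:
            clusters.append((sorted(pattern), [d for _, d in best]))
    return clusters
```

Normalizing each SAN to a template such as `*.api.{domain}` is what lets unrelated-looking domains compare equal; the time window then separates coordinated deployment from coincidental reuse of a common naming scheme.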

Let's Encrypt issuance rate anomalies: Let's Encrypt is widely used for legitimate websites and is also popular for C2 infrastructure because it provides valid certificates with no manual approval process and leaves minimal registration trail. Analyzing Let's Encrypt certificate issuance patterns from certificate transparency logs can identify clusters of domains receiving certificates within short time windows — characteristic of infrastructure deployment scripts that provision multiple C2 nodes simultaneously.
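As a rough sketch of burst detection, the snippet below groups certificate issuance timestamps into runs where each certificate follows the previous one within a fixed gap. It assumes the CT log entries have already been fetched and narrowed to a candidate set (Let's Encrypt's overall issuance volume is far too high for raw timing alone to mean anything); the function name, the 30-minute gap, and the minimum burst size are illustrative choices, not values from any particular tool.

```python
from datetime import datetime, timedelta

def issuance_bursts(entries, window=timedelta(minutes=30), min_size=4):
    """entries: iterable of (issued_at, domain) pairs taken from CT log data.

    Returns lists of domains whose certificates were issued in a run where
    each issuance follows the previous one by no more than `window`.
    """
    bursts, current = [], []
    for issued_at, domain in sorted(entries):
        # A gap larger than the window closes the current run.
        if current and issued_at - current[-1][0] > window:
            if len(current) >= min_size:
                bursts.append([d for _, d in current])
            current = []
        current.append((issued_at, domain))
    if len(current) >= min_size:
        bursts.append([d for _, d in current])
    return bursts
```

A burst found this way is only a lead. It becomes a clustering signal when the same domains also match on SAN pattern, registration timing, or hosting.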

Self-signed certificate structure: When threat actors use self-signed certificates (common for internal C2 management infrastructure or when they want to avoid CT log visibility), the certificate generation parameters can be distinctive. Specific field values in the Subject or Issuer distinguished name, such as the organization name "Internet Widgits Pty Ltd" from the OpenSSL default configuration or a specific organizational unit string, appear repeatedly across infrastructure clusters because the same generation script is used. These fields are read by parsing the certificate the server presents, and the certificate's hash or its combination of field values can serve as a cluster identifier across multiple IPs. JA3S fingerprinting of the TLS ServerHello is a complementary signal: it characterizes the server's TLS configuration rather than the certificate, so it can corroborate a certificate-based match but does not extract certificate contents.
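One way to turn certificate generation parameters into a cluster identifier is to hash the fields a generation script tends to hold constant. The sketch below assumes the certificate has already been parsed into a plain dictionary by some other tool; the field names (`subject_o`, `validity_days`, and so on) are hypothetical, chosen for the example rather than taken from any library.

```python
import hashlib
from collections import defaultdict

# Hypothetical field names; a real pipeline would fill these from a parsed
# X.509 certificate. Per-host values such as the common name are left out.
SIGNATURE_FIELDS = (
    "subject_o", "subject_ou", "issuer_o",
    "key_type", "key_bits", "validity_days", "sig_alg",
)

def structure_signature(cert: dict) -> str:
    """Short hash over the generation parameters, not over the certificate itself."""
    parts = [str(cert.get(field, "")).strip().lower() for field in SIGNATURE_FIELDS]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()[:16]

def group_ips_by_signature(observations, min_ips=2):
    """observations: iterable of (ip, parsed_cert_dict) pairs.

    Returns {signature: sorted IPs} for signatures seen on at least `min_ips` hosts.
    """
    groups = defaultdict(set)
    for ip, cert in observations:
        groups[structure_signature(cert)].add(ip)
    return {sig: sorted(ips) for sig, ips in groups.items() if len(ips) >= min_ips}
```

A signature built only from defaults (the stock OpenSSL organization name with common key settings) will match many unrelated hosts, so the distinctive fields, such as an unusual organizational unit or validity period, carry most of the weight.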

The Multi-Dimensional Clustering Model

Effective infrastructure clustering uses multiple signals with explicit weighting rather than sequential single-signal pivoting. The cluster model ThreatPulsar uses for infrastructure attribution assigns candidate IPs and domains a pairwise similarity score across four dimensions:

Certificate chain similarity (weight: 0.35): Shared root CA, intermediate CA, or certificate fingerprint patterns. This is the highest-weight signal because certificate infrastructure choices are more operator-specific than hosting choices.

Registration proximity (weight: 0.25): Whether the domains were registered within the same 72-hour window, using the same registrar, and with the same privacy shielding configuration. Registration clustering indicates coordinated infrastructure deployment rather than coincidental shared hosting.

ASN and hosting provider (weight: 0.20): Tied with passive DNS for the lowest weight, and the weakest signal when used alone. It contributes to the cluster score but cannot by itself produce a high-confidence attribution. Specific bulletproof hosting ASNs with documented threat actor associations contribute higher scores than generic cloud hosting ASNs.

Passive DNS co-resolution patterns (weight: 0.20): Whether the domains in the candidate cluster have been observed resolving to the same set of IP addresses, and whether they share resolution timing patterns. C2 infrastructure managed by the same operator often shows similar DNS TTL configurations and resolution timing patterns because the same DNS management tool is used.

Pairs with a combined similarity score above 0.75 are grouped into the same cluster. Clusters are then compared against the existing threat actor profile database to identify whether the characteristic patterns match any previously attributed group.
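The weighting and threshold described above can be sketched as follows. The weights and the 0.75 cutoff come from the text; everything else is an assumption for illustration: the per-dimension scores are taken to be already normalized to the range 0 to 1, and pairs above the threshold are merged transitively with a simple union-find, which is one plausible reading of "grouped into the same cluster" and not necessarily how ThreatPulsar does it.

```python
WEIGHTS = {
    "certificate": 0.35,
    "registration": 0.25,
    "asn": 0.20,
    "passive_dns": 0.20,
}
THRESHOLD = 0.75

def combined_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension similarity scores, each assumed to be in [0, 1]."""
    return sum(WEIGHTS[name] * dimension_scores.get(name, 0.0) for name in WEIGHTS)

def cluster(nodes, pair_scores):
    """pair_scores: {(a, b): {dimension: score}}. Returns a list of sorted clusters."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the root, shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), dims in pair_scores.items():
        if combined_score(dims) > THRESHOLD:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return [sorted(g) for g in groups.values()]
```

The arithmetic makes the design intent visible. A perfect ASN match alone scores 0.20, and even perfect certificate and registration matches together reach only 0.60, so no pair can cross 0.75 without strong agreement on at least three of the four dimensions.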

Practical Limitations and Attribution Confidence

Infrastructure attribution has a fundamental limitation: infrastructure is operationally reusable and sometimes sold or leased between operators. An IP address that was definitively used in a Lazarus Group campaign in 2023 may have been repurposed by a completely different threat actor in 2025. Historical attribution in threat intelligence reports does not constitute current attribution.

ThreatPulsar attributes infrastructure to threat actor clusters with explicit time-bounded confidence: a cluster match is reported with a timestamp of the most recent corroborating evidence, and the confidence score decays as that evidence ages. A cluster match with supporting evidence from 6 months ago receives a lower current confidence score than a match with evidence from last week. This decay prevents historical attribution from creating permanent false associations that mislead current incident response.
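The text does not specify the decay function, so the sketch below uses exponential decay with a 90-day half-life purely as an assumption to show the idea; the function name and parameters are illustrative.

```python
from datetime import datetime, timedelta, timezone

def decayed_confidence(base: float, last_evidence: datetime, now: datetime,
                       half_life_days: float = 90.0) -> float:
    """Reduce a cluster-match confidence as its newest corroborating evidence ages.

    Exponential decay: the score halves every `half_life_days` (an assumed value).
    """
    age_days = max((now - last_evidence).total_seconds() / 86400.0, 0.0)
    return base * 0.5 ** (age_days / half_life_days)

# Usage: the same 0.8 match, with evidence one week old versus six months old.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
recent = decayed_confidence(0.8, now - timedelta(days=7), now)
stale = decayed_confidence(0.8, now - timedelta(days=180), now)
```

Under these assumptions the six-month-old match keeps a quarter of its original score while the week-old match keeps nearly all of it, which is the ordering the decay is meant to enforce.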

The goal of infrastructure clustering is not to produce definitive attribution — that requires additional corroborating evidence from multiple intelligence sources and, in nation-state cases, government reporting. The goal is to generate actionable hypotheses for threat hunters: "this C2 IP shares infrastructure characteristics with APT-associated clusters. Hunt for related indicators using this cluster's TLS certificate pattern and registration characteristics."

Using Infrastructure Clusters for Proactive Hunting

The most operationally valuable application of infrastructure clustering is proactive hunting: given a cluster of confirmed malicious infrastructure, identify related infrastructure that has not yet been used against the organization or reported publicly. This allows SOC teams to pre-block emerging C2 infrastructure before it becomes active against them.

The hunting workflow starts from a confirmed C2 IP, expands to the full cluster via the multi-dimensional model, and then searches for cluster members that have not yet appeared on any commercial threat feed, typically because they were recently provisioned and have not yet generated enough activity to be independently identified. These are the highest-value proactive blocks: likely malicious by cluster association, not yet reported, and therefore not in any feed-based block list.
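The final step of that workflow is a set difference, sketched below. The function name and the assumption that indicators arrive as plain strings (IPs or domains) are illustrative; cluster expansion and feed ingestion are assumed to have happened upstream.

```python
def hunting_candidates(seed: str, cluster_members, feed_indicators):
    """Cluster members that no feed has reported yet, excluding the seed itself."""
    def normalize(value: str) -> str:
        return value.strip().lower()

    known = {normalize(i) for i in feed_indicators} | {normalize(seed)}
    return sorted({normalize(m) for m in cluster_members} - known)

# Usage with documentation-range addresses and placeholder domains.
candidates = hunting_candidates(
    seed="203.0.113.10",
    cluster_members=["203.0.113.10", "198.51.100.7", "Api.Example.net", "192.0.2.44"],
    feed_indicators=["192.0.2.44", "unrelated.example.org"],
)
```

The output is a hunting list rather than a verdict: each entry inherits its suspicion from the cluster, so the cluster's current confidence should travel with it into any block decision.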

This clustering approach to proactive hunting connects directly to the YARA rule generation methodology described in our article on generating YARA rules from enriched IOC clusters — both apply cluster context to extend detection coverage beyond individual indicators.

Conclusion

Infrastructure clustering is a useful tool when the cluster signal is multidimensional and the attribution confidence is explicitly bounded. Single-dimension clustering on ASN or registrar alone produces attribution that is too wide to be operationally useful. Adding TLS certificate characteristics, registration timing patterns, and passive DNS co-resolution as additional clustering dimensions narrows the cluster to a size where the attribution hypothesis is meaningful and the proactive hunting value is real.

The honest framing for infrastructure attribution is that it narrows the hypothesis space, not that it resolves it. Clusters identify plausible shared operators; they do not identify them definitively. Using infrastructure clusters as hunting leads rather than as attribution verdicts is the operationally appropriate model.
