On "Normalizing" or "Scaling" Cybersecurity Metrics and Measuring The Right Thing For The Right Entities

Introduction

One challenge of cyber security is deciding where to prioritize one’s limited cyber security resources. Who’s doing “okay?” Who desperately needs help? Who should we just quarantine until they can get their chaotic users under control? At a minimum, can we at least rank countries according to who’s worst/best? We need to have some basis for prioritizing our limited time and resources.

Focusing On Botted Hosts Sending Spam

To make this problem concrete, consider the terrific spam-related metrics shared by the Spamhaus Composite Blocking List (CBL).¹ On Wednesday, March 9th, 2016, the CBL knew about 9,121,043 IP addresses associated with spam-sending botted hosts. That’s a LOT of malware-infected systems!

As shown in Table 1, just three countries (India, Vietnam and China) account for roughly one-third of all CBL listings, and just 10% of all countries (20 out of 200) collectively account for about three-quarters of all CBL listings:

Table 1. CBL Listings By Country, Top 20 Countries, March 9th, 2016

Country	Listings	% Total Listings	% Cumulative Total Listings	Rank
Total	9,121,043	—	—	—
IN	1,182,291	12.96	12.96	1
VN	998,743	10.95	23.91	2
CN	766,659	8.41	32.32	3
RU	511,132	5.60	37.92	4
BR	437,531	4.80	42.72	5
ID	424,372	4.65	47.37	6
IR	347,215	3.81	51.18	7
US	227,315	2.49	53.67	8
TH	218,422	2.39	56.06	9
MX	201,497	2.21	58.27	10
PK	189,572	2.08	60.35	11
IT	178,898	1.96	62.31	12
AR	173,898	1.91	64.22	13
TW	163,190	1.79	66	14
JP	150,565	1.65	67.65	15
DE	147,173	1.61	69.27	16
TR	137,183	1.50	70.77	17
EG	133,232	1.46	72.23	18
AU	130,114	1.43	73.66	19
VE	118,022	1.29	74.95	20
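
The percentage columns in the table above are simple shares of the CBL-wide total, accumulated from the largest country downward. A minimal Python sketch of that arithmetic, using only the first five rows (the function name `share_table` is purely illustrative):

```python
# Listings for the five largest entries in the table, plus the CBL-wide total.
TOTAL_LISTINGS = 9_121_043
listings = {
    "IN": 1_182_291,
    "VN": 998_743,
    "CN": 766_659,
    "RU": 511_132,
    "BR": 437_531,
}


def share_table(counts, total):
    """Return (country, share %, cumulative share %) rows, largest first."""
    rows = []
    running = 0
    for country, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        share = round(100 * count / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((country, share, cumulative))
    return rows


for country, pct, cum_pct in share_table(listings, TOTAL_LISTINGS):
    print(f"{country}  {pct:5.2f}  {cum_pct:5.2f}")
```

Note that the cumulative column is computed from the running count rather than by adding up already-rounded percentages, which avoids accumulating rounding error down the table.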

Surely we can all agree that those 20 countries represent the “worst of the worst” when it comes to malware-infected systems used to send spam? Surprisingly, no.

Normalizing By Country Population

Representatives of some of those “top” countries, perhaps feeling a bit self-defensive, may be quick to point out that they’ve got huge populations, so it really isn’t “fair” to just compare “raw counts” between countries. E.g., India has 1,182,291 CBL listings, but spread over a population of over 1.25 billion Indians, that’s a rate of just (1,182,291 / 1,250,000,000) * 100 = 0.0946%.

By comparison, Italy has 178,898 CBL listings, but a population of only 60 million, which yields a rate of (178,898 / 60,000,000) * 100 = 0.2982%.

Dividing Italy’s 0.2982% by India’s 0.0946%, we can see that Italy is currently about 3.15 times “more infested” than India on a per-capita basis.
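
The per-capita comparison is easy to reproduce. The sketch below uses the listing counts quoted above and the same round population figures (1.25 billion and 60 million), which are approximations rather than census values; `listings_per_100` is just an illustrative helper name.

```python
def listings_per_100(listings, population):
    """Infection density: CBL listings per 100 residents (a percentage)."""
    return 100 * listings / population


# Listing counts from the CBL table; populations are the round figures
# used in the text, not official census numbers.
india = listings_per_100(1_182_291, 1_250_000_000)
italy = listings_per_100(178_898, 60_000_000)

print(f"India: {india:.4f}%")
print(f"Italy: {italy:.4f}%")
print(f"Italy / India: {italy / india:.2f}x")
```

Taking the ratio of the unrounded rates, rather than of the rounded percentages, keeps rounding error out of the final comparison.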

Arguably, then, Italy should “obviously” be prioritized ahead of India when it comes to any hypothetical anti-bot “clean up campaign,” right? Actually, no. Malware infections per capita represent a measure of infection density. Infection density is important if you’re thinking about efficient infection cleanup, but largely irrelevant if your goal is to reduce the impact of the bots on mail servers and their admins.

Ranking According to Actual Pain Delivered Toward A Target

Mail admins running mail servers under siege from spam really don’t care about “infection rates per capita.” They care about the spam traffic they’re seeing. Fortunately, the CBL has data about that as well. Focusing on spam that’s being delivered (vs. botted hosts potentially able to send spam) changes the picture dramatically:

Table 2. Spam Sent To One CBL Spamtrap, By Top 20 Origin Countries, Past Three Days

Country	Traffic	% Traffic	% Cumulative Traffic	Traffic Rank	Bot Rank	Spams/Bots
Total	158,082,638	100.00	—	—	—	—
US	52,440,285	33.17	33.17	1	8	231
BR	27,016,447	17.09	50.26	2	5	62
VN	15,119,324	9.56	59.83	3	2	15
RU	7,248,463	4.59	64.41	4	4	14
IN	5,384,253	3.41	67.82	5	1	4
MX	4,685,557	2.96	70.78	6	10	23
AR	3,257,637	2.06	72.84	7	13	18
PL	2,638,323	1.67	74.51	8	22	22
UA	2,557,023	1.62	76.13	9	23	22
ES	2,177,246	1.38	77.51	10	21	18
CO	1,639,560	1.04	78.54	11	34	34
IT	1,546,022	0.98	79.52	12	12	8
TR	1,360,135	0.86	80.38	13	17	9
TW	1,230,689	0.78	81.16	14	14	7
ID	1,212,763	0.77	81.93	15	6	2
IQ	1,179,500	0.75	82.67	16	36	28
CN	1,170,168	0.74	83.41	17	3	1
AT	1,034,575	0.65	84.07	18	38	25
RO	983,530	0.62	84.69	19	25	10
DE	969,088	0.61	85.30	20	16	6

Now the problem is clearly not the huge number of botted hosts in India, but the huge volume of spam coming from botted systems in the United States. That is, while there is a relatively small number of botted hosts in the United States, those hosts are particularly aggressive, sending an average of 231 spams/bot, while the botted hosts in India send a measly 4 spams/bot. Treating all botted hosts as if they were essentially equivalent is clearly unwarranted. Some of those infected systems are heavy artillery, while others are mere cap guns.

[We must also remember to take the above numbers with a “grain of salt,” since the statistics in the above table are based on a single CBL spamtrap site. Although they are among the best data currently available, they may not reflect global spam flows overall.]
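
The spams-per-bot idea can be sketched as a simple ratio of observed traffic to listed hosts. In the Python example below, traffic comes from Table 2, while the bot counts are borrowed from the March 9th country listings. That pairing is an assumption (Table 2 does not say which bot counts it used), so the results approximate the Spams/Bots column rather than reproduce it exactly.

```python
# (spam messages seen at the spamtrap, CBL listings) per country.
# Traffic is from Table 2; listings are the March 9th per-country counts,
# an assumed pairing since Table 2 does not state its bot counts.
observations = {
    "US": (52_440_285, 227_315),
    "BR": (27_016_447, 437_531),
    "VN": (15_119_324, 998_743),
    "IN": (5_384_253, 1_182_291),
    "CN": (1_170_168, 766_659),
}


def spam_per_bot(traffic, bots):
    """Average number of spam messages attributed to each listed host."""
    return traffic / bots


# Rank countries by how aggressive their bots are, not by how many they have.
ranked = sorted(
    ((cc, spam_per_bot(traffic, bots)) for cc, (traffic, bots) in observations.items()),
    key=lambda item: item[1],
    reverse=True,
)

for cc, ratio in ranked:
    print(f"{cc}  {ratio:7.1f} spam/bot")
```

Sorting on this ratio puts the United States first and China last among these five, the reverse of what the raw listing counts suggest.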

Teasing Apart US Spam Traffic Sources

Let’s now drill down on spam traffic from US ISPs. Where does the majority of US-origin spam traffic actually come from?

For the purpose of this analysis, ISPs are identified by their Autonomous System Number, or “ASN.” Many ISPs use only a single ASN, but some (such as rr.com) are associated with multiple ASNs, often because legacy networks were absorbed through mergers and acquisitions. When multiple ASNs are associated with the same ISP, we’ve aggregated them for the purposes of this analysis.

Table 3. US ASNs, Ranked By Traffic/ASN, Last Three Days, 500K+ spam/ASN

ASN	Listings	% Total	Traffic	% Traffic	Rank	Spams/Bots
Total	9,098,790	100.00	158,082,638	—	—	17.37
Total rr.com (sum of all rr.com ASNs shown)	11,944	0.13	6,093,048	3.85	—	—
AS20001 rr.com US	2,427	0.03	1,595,433	1.01	7	657
AS10796 rr.com US	2,714	0.03	1,361,222	0.86	10	501
AS11427 rr.com US	2,180	0.02	1,068,145	0.68	14	489
AS11426 rr.com US	1,808	0.02	799,539	0.51	26	442
AS11351 rr.com US	1,583	0.02	747,733	0.47	28	472
AS12271 rr.com US	1,232	0.01	520,966	0.33	41	422
AS20115 charter.net US	4,998	0.05	5,322,681	3.37	3	1,064
AS46892 Winnebago US	102	0.00	2,609,218	1.65	5	25,580
AS12083 knology.net US	945	0.01	1,308,289	0.83	11	1,384
AS11232 midco.net US	308	0.00	1,025,612	0.65	15	3,329
AS33548 unwiredbb.com US	91	0.00	996,577	0.63	16	10,951
AS7922 comcast.net US	19,793	0.22	908,030	0.57	19	45
AS11979 blue.net US	175	0.00	883,478	0.56	21	5,048
AS30036 fortrex.com US	790	0.01	827,467	0.52	22	1,047
AS33363 mybrighthouse.com US	2,123	0.02	819,728	0.52	23	386
AS10835 vcn.com US	85	0.00	816,028	0.52	25	9,600
AS19108 suddenlink.net US	1,083	0.01	711,898	0.45	30	657
AS46606 Unified Layer, US	855	0.01	656,722	0.42	31	768
AS33588 bresnan.net US	309	0.00	626,872	0.40	32	2,028
AS174 cogentco.com US	1,436	0.02	566,408	0.36	37	394
AS5056 netins.net US	262	0.00	514,315	0.33	43	1,963
AS22773 cox.net US	2,575	0.03	510,427	0.32	44	198
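
Rolling several ASNs up into one ISP, as was done for rr.com in Table 3, amounts to a group-and-sum. A rough Python sketch, using a handful of rows copied from the table; the mapping from ASN to ISP label is supplied by hand here, and `aggregate_by_isp` is an illustrative name rather than anything the CBL provides:

```python
from collections import defaultdict

# (ASN, ISP label, listings, spam traffic) rows copied from Table 3.
# The ISP label is assigned by hand; it is not derived from the ASN.
rows = [
    ("AS20001", "rr.com", 2_427, 1_595_433),
    ("AS10796", "rr.com", 2_714, 1_361_222),
    ("AS11427", "rr.com", 2_180, 1_068_145),
    ("AS11426", "rr.com", 1_808, 799_539),
    ("AS11351", "rr.com", 1_583, 747_733),
    ("AS12271", "rr.com", 1_232, 520_966),
    ("AS20115", "charter.net", 4_998, 5_322_681),
    ("AS7922", "comcast.net", 19_793, 908_030),
]


def aggregate_by_isp(rows):
    """Sum listings and traffic over every ASN that shares an ISP label."""
    totals = defaultdict(lambda: [0, 0])
    for _asn, isp, listings, traffic in rows:
        totals[isp][0] += listings
        totals[isp][1] += traffic
    return {isp: tuple(pair) for isp, pair in totals.items()}


by_isp = aggregate_by_isp(rows)
for isp, (listings, traffic) in sorted(
    by_isp.items(), key=lambda kv: kv[1][1], reverse=True
):
    print(f"{isp:12s} listings={listings:>6,}  spam/bot={traffic // listings:,}")
```

Seen this way, rr.com as a whole outranks charter.net on traffic even though no single rr.com ASN does, which is why the aggregation step matters when deciding which ISP to approach first.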

Unfiltered Pain

There is one other reality that we must remember: the spam that the CBL sees all gets blocked (at least if you’re using the CBL as part of your spam filtering, as many sites do).

Thus, ironically, if we were to prioritize working on the ISPs that are most broadly represented in the CBL, we’d (in some ways) be wasting our time: the spam from those hosts is already getting blocked, at least at sites that use the CBL.

So now you can see the problem. We need to identify the hosts that are successfully delivering spam IN SPITE of block list entries and other anti-spam heuristics.

Documenting the “false negatives” that get through filtering is a hard and largely thankless job, and one that relies on inherently error-prone mechanisms such as users pushing a “this is spam” button, or perhaps the processing of mail streams by multiple categorization engines.

Author: Joe St Sauver, Ph.D., Scientist, Farsight Security, Inc., Member of CyberGreen’s Statistics Experts Group

¹ http://www.abuseat.org/