ASNCOUNTER(1) collect hits per ASN and netblock ASNCOUNTER(1)

NAME

asncounter — collect hits per ASN and netblock

DESCRIPTION

Count the number of hits (HTTP, packets, etc) per autonomous system number (ASN) and related network blocks.

This is useful when you get a lot of traffic on a server to figure out which network is responsible for the traffic, to direct abuse complaints or block whole networks, or on core routers to figure out who your peers are and who you might want to seek particular peering agreements with.

SYNOPSIS

asncounter OPTIONS [ADDRESS ...]

OPTIONS

show this help message and exit
where to store pyasn cache files, default: ~/.cache/pyasn
disable prefix count
disable ASN count
disable ASN to name resolution in output
only show top N entries, default: 10
input file, default: stdin
input format, default: line
--scapy-filter SCAPY_FILTER
BPF filter to apply to incoming packets, default: ip and not src host 0.0.0.0 and not src net 192.168.0.0/24
open an interface instead of stdin for packets, implies -I scapy, auto-detects by default
write stats or final prometheus metrics to the given file, default: stdout
output format, choices: tsv, prometheus, null, default: tsv
start a prometheus server on the given port, default disabled, port 8999 if unspecified
download a recent RIB cache file and exit
run a REPL thread in main loop
setup a REPL socket with manhole
more debugging output
zero or more IP addresses to parse directly from the command line before the input stream is read; this disables the default stdin reading, and --input-format cannot be changed.

INPUT FORMATS

The --input-format option warrants more discussion.

line

The line input format treats each line in the stream as an IP address, counting each as one hit.

Empty lines are skipped, and comments – whatever follows the pound (#) sign – are trimmed. Whatever cannot be parsed as an IP address is logged as a warning and skipped.

This, for example, counts as a hit on two different IP addresses, for a total of two hits. It will also yield a warning:

192.0.2.1 # comment
2001:DB8::1
# comment
garbage that generates a warning
    
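The rules above can be sketched in a few lines of Python using the standard ipaddress module. This is only an illustration, not asncounter's actual implementation; the parse_line helper is made up for this example:

```python
import ipaddress

def parse_line(line):
    """Apply the "line" format rules: trim comments, skip blanks,
    warn and skip anything that is not an IP address."""
    line = line.split("#", 1)[0].strip()  # trim comments
    if not line:
        return None  # empty line or pure comment, skipped silently
    try:
        return ipaddress.ip_address(line)
    except ValueError:
        print("warning: not an IP address: %r" % line)
        return None

hits = [parse_line(l) for l in
        ["192.0.2.1 # comment", "2001:DB8::1", "# comment", "garbage"]]
print([str(h) for h in hits if h is not None])
# ['192.0.2.1', '2001:db8::1']
```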

tuple

Same as the line input format, except the count is specified in a second, whitespace-separated field.

This, for example, will count one hit for the first IP address and two for the second, and will generate a warning:

192.0.2.1 1 # comment
2001:DB8::1 2
# comment
garbage that generates a warning
    

The “count” field can represent anything: counts, but also sizes or timings; asncounter doesn’t care.

The counts are actually parsed as floats, as Python understands them.

The default output format (tsv) will round the numbers to the nearest integer, rounding exact halves to the nearest even integer. This, for example, adds up to 5, which might be surprising to some (because Python rounds 2.5 to 2, not 3):

192.0.2.2 3.4
192.0.2.2 2.5
    

This is known as “rounding half to even” (or “banker’s rounding”), the default rounding rule in the IEEE 754 standard.
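You can check this behaviour directly in Python, whose built-in round() follows the same rule:

```python
# "Round half to even": exact halves go to the nearest even integer.
print(round(2.5))  # 2, not 3
print(round(3.5))  # 4
print(round(0.5))  # 0

# The tuple example above: round(3.4) + round(2.5) = 3 + 2 = 5
print(round(3.4) + round(2.5))  # 5
```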

If the --output-format is set to prometheus, floats will be recorded as accurately as Python allows. In that context, the above correctly sums up to 5.9.

tcpdump

The tcpdump format is a bit of an oddball: it parses a tcpdump(1) line with a regular expression to extract the source IP address, and counts that.

It could be extended to count packet sizes but currently does not do so. Likewise, it only tracks the left (source) side of packets, not the destination, but could be extended to track both.
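The general idea can be sketched as follows; the regular expression here is a made-up approximation for typical tcpdump output (it assumes a port suffix is present, as tcpdump prints for TCP/UDP), not the one asncounter actually uses:

```python
import re

# Matches the source address in lines like:
#   12:34:56.789 IP 192.0.2.1.443 > 198.51.100.2.80: Flags [S], ...
#   12:34:56.789 IP6 2001:db8::1.443 > 2001:db8:1::2.80: Flags [S], ...
TCPDUMP_RE = re.compile(r"\bIP6? (\S+?)(?:\.\d+)? >")

def source_ip(line):
    match = TCPDUMP_RE.search(line)
    return match.group(1) if match else None

print(source_ip("12:34:56.789 IP 192.0.2.1.443 > 198.51.100.2.80: Flags [S]"))
# 192.0.2.1
```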

This approach likely can’t deal with a multi-gigabit per second small-packet attack (2 million packets per second or more). But in a real production environment, it could easily deal with regular 100-200 megabit per second traffic, where tcpdump and asncounter each took about 2% of one core to handle about 3-5 thousand packets per second.

scapy

The scapy input format is also special: instead of parsing text lines, it parses packets.

With the --interface flag, it will open the default interface unless one is provided (so a bare --interface is generally equivalent to --interface eth0 when eth0 is the primary interface). This requires elevated privileges.

This is much slower than the tcpdump parser (close to 100% CPU usage in a 100-200mbps scenario like the above), but could eventually be leveraged to implement byte counts, which are harder to extract from tcpdump because of the variability of its output.

This only counts packets, regardless of direction, and, like tcpdump, only keeps track of source IP addresses. Like tcpdump, it could also be improved by tracking sizes instead of counts, but does not currently do so.

OUTPUT FORMATS

The --output-format argument also warrants a little more discussion.

tsv

TSV stands for Tab-Separated Values. It’s a poorly designed output format that dumps two tables, where rows are separated by newlines and columns by tabs: one table shows per-ASN counts, the other per-prefix counts.

As mentioned in the tuple section above, counts are rounded when recorded in tsv mode. This is to simplify the display; in theory, the underlying Counter (https://docs.python.org/3/library/collections.html#collections.Counter) supports floats as well.
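For illustration, a standard-library Counter happily accepts float increments (names here are illustrative, not asncounter internals), so the tuple example above sums exactly in float mode:

```python
from collections import Counter

# a Counter keyed by prefix, incremented with float "counts"
prefix_counter = Counter()
prefix_counter["192.0.2.0/24"] += 3.4
prefix_counter["192.0.2.0/24"] += 2.5
print(prefix_counter["192.0.2.0/24"])  # 5.9
```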

If more precision, long-term storage, or alerting is needed, the prometheus output format is preferred.

This format is useful because it doesn’t require any dependency outside of the standard library (and, obviously, pyasn).

prometheus

The prometheus output format keeps track of counters inside Prometheus data structures. With the --port flag, it will open up a port (defaulting to 8999) where metrics will be exposed over HTTP, without any special security, on all interfaces.

Otherwise, upon completion, results will be written in a textfile collector-compatible format.

null

The null output format doesn’t display anything. It can be used for debugging; internally, it uses the same recorder as the tsv format.

EXAMPLES

Simple web log counter

This extracts the IP addresses from current access logs and reports ratios:

> awk '{print $2}' /var/log/apache2/*access*.log | asncounter
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count   percent ASN AS
12779   69.33   66496   SAMPLE, CA
3361    18.23   None    None
366 1.99    66497   EXAMPLE, FR
337 1.83    16276   OVH, FR
321 1.74    8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
309 1.68    14061   DIGITALOCEAN-ASN, US
128 0.69    16509   AMAZON-02, US
77  0.42    48090   DMZHOST, GB
56  0.3 136907  HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
53  0.29    17621   CNCGROUP-SH China Unicom Shanghai network, CN
total: 18433
count   percent prefix  ASN AS
12779   69.33   192.0.2.0/24    66496   SAMPLE, CA
3361    18.23   None        
298 1.62    178.128.208.0/20    14061   DIGITALOCEAN-ASN, US
289 1.57    51.222.0.0/16   16276   OVH, FR
272 1.48    2001:DB8::/48   66497   EXAMPLE, FR
235 1.27    172.160.0.0/11  8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
94  0.51    2001:DB8:1::/48 66497   EXAMPLE, FR
72  0.39    47.128.0.0/14   16509   AMAZON-02, US
69  0.37    93.123.109.0/24 48090   DMZHOST, GB
53  0.29    27.115.124.0/24 17621   CNCGROUP-SH China Unicom Shanghai network, CN
    

This can also be done in real time, of course:

tail -F /var/log/apache2/*access*.log | awk '{print $2}' | asncounter
    

The above report will be generated when the process is killed. Send SIGHUP to show a report without interrupting the parser:

pkill -HUP asncounter
    
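Under the hood, this kind of on-demand reporting is just a standard Unix signal handler; a minimal sketch (not asncounter's actual code):

```python
import os
import signal

reports = []

def on_sighup(signum, frame):
    # asncounter would dump the current counters here instead
    reports.append(signum)

signal.signal(signal.SIGHUP, on_sighup)

# simulate `pkill -HUP` by signalling ourselves
os.kill(os.getpid(), signal.SIGHUP)
print(len(reports))  # 1
```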

You can count sizes with --input-format=tuple as well. Assuming the size field is in the 10th column, this will sum sizes instead of just number of hits:

tail -F /var/log/apache2/*access*.log | awk '{print $1, $10}' |
asncounter --input-format=tuple
    

If logs hold that information, you can also add up processing times, for example.

tcpdump parser

Extract IP addresses from incoming TCP/UDP packets on eth0 and report the top 5:

> tcpdump -c 10000 -q -i eth0 -n -Q in "(udp or tcp)" | asncounter --top 5 --input-format tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz
INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...
INFO: loading /root/.cache/pyasn/asnames.json
ASN     count   AS
136907  7811    HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
8075    254     MICROSOFT-CORP-MSN-AS-BLOCK, US
62744   164     QUINTEX, US
24940   114     HETZNER-AS, DE
14618   82      AMAZON-AES, US
prefix  count
166.108.192.0/20        1294
188.239.32.0/20 1056
166.108.224.0/20        970
111.119.192.0/20        951
124.243.128.0/18        667
    

A query similar to the HTTP log parser might be:

tcpdump -q -i eth0 -n -Q in "tcp and (port 80 or port 443)" |
    grep 'Flags \[S\]' | asncounter --input-format=tcpdump --repl
    

... otherwise you will get different results from a pure packet count, as various connections yield different numbers of packets! The above counts connection attempts, which is still different from an actual HTTP hit, as the connection could be refused before it reaches the webserver or aborted before it gets logged properly.

It’s still a good estimate, and is especially useful if you do not log IP addresses, for example on high traffic caching servers.

Note that we use grep above because tcpdump’s tcp[tcpflags] & tcp-syn != 0 only works for IPv4 packets, a disappointing (but understandable) limitation.

scapy parser

Extract IP addresses directly from the network interface, bypassing tcpdump entirely:

asncounter --interface
    

REPL

With --repl, you will drop into a Python shell where you can interactively get real-time statistics:

> awk '{print $2}' /var/log/apache2/*access*.log | asncounter --repl --top 2
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: starting interactive console, use recorder.display_results() to show current results
INFO: recorder.asn_counter and .prefix_counter dictionaries have the full data
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
>>> recorder.display_results()
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count   percent ASN AS
13008   69.38   66496   SAMPLE, CA
3422    18.25   None    None
total: 18748
count   percent prefix  ASN AS
13008   69.38   192.0.2.0/24    66496   SAMPLE, CA
3422    18.25   None        
total: 18748
>>> recorder.asn_counter
Counter({66496: 13008, None: 3422, [...]})
>>> recorder.prefix_counter
Counter({'192.0.2.0/24': 13008, None: 3422, [...]})
    

So you can get the actual number of hits for an AS, even if it’s not listed in the --top entries with:

>>> recorder.asn_counter.get(66496)
13008
    
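Since recorder.asn_counter is a plain collections.Counter (as the repr above shows), all the usual Counter methods work as well. With made-up sample data:

```python
from collections import Counter

# sample data shaped like recorder.asn_counter: ASN (or None) -> hits
asn_counter = Counter({66496: 13008, None: 3422, 66497: 366})

print(asn_counter.most_common(2))  # [(66496, 13008), (None, 3422)]
print(asn_counter.get(66497))      # 366
print(sum(asn_counter.values()))   # 16796
```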

Blocking whole networks

asncounter does not block anything: it only counts. Another mechanism needs to be used to actually block attackers or act on the collected data.

If you want to block the network blocks, you can use the shown netblocks directly in (say) Linux’s netfilter firewall, or Nginx’s access or geo modules. For example, this will reject traffic from a network with iptables:

iptables -I INPUT -s 192.0.2.0/24 -j REJECT 
    

or with nftables:

nft insert rule inet filter INPUT 'ip saddr 192.0.2.0/24 reject'
    

This will likely become impractical with large numbers of networks; look into IP sets to scale that up.

With Nginx, you can block a network with the deny directive:

deny 192.0.2.0/24;
    

This will return a 403 status code. If you want to be fancier, you can return a tailored status code and build a larger list with the geo module:

geo $geo_map_deny {
    default 0;
    192.0.2.0/24 1;
}

if ($geo_map_deny) {
    return 429;
}

Many networks can be listed in the geo block relatively efficiently.

pyasn doesn’t (unfortunately) provide an easy command-line interface to extract the data you need to block an entire AS. For that, you need to resort to some Python. From inside the --repl loop:

print("\n".join(sorted(recorder.asn_all_prefixes(64496))))
    

This will give you the list of ALL prefixes associated with AS64496, which is actually empty in this case, as AS64496 is an example AS from RFC5398.

Note the list of prefixes is not aggregated by default. If netaddr is installed, you can pass aggregate=True to reduce the set.

Aggregating results

It might be worth aggregating large numbers of netblocks for performance reasons. Network block announcements can be spread over multiple contiguous blocks for various reasons, and can often be unified into smaller sets. For IPv4 only, iprange is good (and fast) enough:

> grep -v :: networks > networks-ipv4
> iprange < networks-ipv4 > networks-ipv4-filtered
> wc -l networks*

588 networks
495 networks-ipv4
181 networks-ipv4-filtered
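Alternatively, Python's standard-library ipaddress module can collapse contiguous networks without any extra dependency, one address family at a time:

```python
import ipaddress

# two adjacent /25s collapse into one /24; the /24 is left alone
networks = [ipaddress.ip_network(n) for n in
            ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]]
merged = list(ipaddress.collapse_addresses(networks))
print([str(n) for n in merged])  # ['192.0.2.0/24', '198.51.100.0/24']
```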

If you have it installed, the netaddr Python package can also do that for you, and it supports IPv6:

import netaddr
print("\n".join([str(n) for n in netaddr.cidr_merge(recorder.asndb.get_as_prefixes(64496))]))
    

Note that asncounter can aggregate those results directly now, for example:

print(recorder.asn_all_prefixes_str(66496, aggregate=True))
    

... but, as above, it requires the netaddr package to be available.

Selective blocking

A more delicate approach is to block all network blocks from a specific ASN that have been found in the result sets, instead of blocking the entire netblock.

The recorder.asn_prefixes and recorder.asn_prefixes_str functions can do this for you, merging multiple ASNs and aggregating with netaddr as well:

print(recorder.asn_prefixes_str(66496, 66497, aggregate=True))
    

Note that the asn_prefixes selectors are not implemented in Prometheus mode.

Remember you can extract the list of current ASNs and prefixes just by looking at the dictionary keys as well:

print("\n".join(str(k) for k in recorder.asn_counter.keys()))
print("\n".join(str(k) for k in recorder.prefix_counter.keys()))
    

FILES

~/.cache/pyasn/
Default storage location for pyasn cache files.
/run/$UID/asncounter-manhole-$PID or ~/.local/state/asncounter-manhole-$PID
Default location for the debugging manhole socket, if enabled.

LIMITATIONS

only counts, does not calculate bandwidth, but could be extended to do so
does not actually do any sort of mitigation or blocking, purely an analysis tool; if you want such mitigation, hook asncounter up to Prometheus and Alertmanager with webhooks: this is not a fail2ban rewrite
test coverage is relatively low, 37% as of this writing; most critical paths are covered, although not the scapy parser or the RIB file download procedures
requires downloading RIB files, could be improved by talking directly with a BGP router daemon like Bird or FRR
only a small set of tcpdump outputs have been tested
the REPL shell does not have proper readline support (keyboard arrows, control characters like “control a” do not work)

Note that this documentation and test code uses sample AS numbers from RFC5398, IPv4 addresses from RFC5737, and IPv6 addresses from RFC3849. Some more well known entities (e.g. Amazon, Facebook) have not been redacted from the output for clarity.

Performance considerations

As mentioned above, this is unlikely to tolerate multi-gigabit denial of service attacks. The tcpdump parser, however, is pretty fast and should be able to sustain a saturated gigabit link under normal conditions. The scapy parser is slower.

Memory usage seems reasonable: on startup, it uses about 250MB of memory, and a long-running process with about 40 000 blocks was using about 400MB.

By extrapolation, it is expected that data on the full routing table (currently 1.2 million entries) could be held within 12GB of memory, although that would be a rare condition, occurring only on a core router seeing traffic from literally the entire internet.

Security considerations

There’s an unknown in the form of the C implementation of a radix tree in pyasn. asncounter itself should be fairly safe: it does not trust its inputs, and the worst it can do is likely a resource exhaustion attack on high traffic.

It can run completely unprivileged as long as it has access to the input files, although in many scenarios people will not bother to drop privileges before calling it, and it will not, itself, attempt to do so.

Privileges can be dropped with systemd-run, for example:

systemd-run --pipe --property=DynamicUser=yes \
    --property=CacheDirectory=asncounter \
    --setenv=XDG_CACHE_HOME=/var/cache/asncounter \
    -- asncounter

This interacts poorly with the --repl option, as it tries to reopen the TTY for stdin. You might have better luck sharing a debug socket with --manhole:

systemd-run --pipe --property=DynamicUser=yes \
    --property=CacheDirectory=asncounter \
    --setenv=XDG_CACHE_HOME=/var/cache/asncounter \
    -- asncounter --manhole=/var/cache/asncounter/asncounter-manhole

Then you can open a Python debugging shell for further diagnostics with:

nc -U /var/cache/asncounter/asncounter-manhole
    

AUTHOR

Antoine Beaupré anarcat@debian.org

SEE ALSO

tcpdump(8), fail2ban(1)