Marian Marinov — mm@yuhu.biz
Given an IP address and a table of CIDR prefixes,
which prefix matches it?
And if several match — return the most specific one.
That's Longest-Prefix Match (LPM).
10.0.0.0/8 ā "private-rfc1918"
10.1.0.0/16 ā "office"
10.1.42.0/24 ā "dev-team"
0.0.0.0/0 ā "default"
lookup("10.1.42.7")
# ā "dev-team" (longest match)
lookup("10.1.99.1")
# ā "office" (next longest)
lookup("8.8.8.8")
# ā "default" (fallback)
Every packet needs a routing decision. Millions per second. This is the canonical LPM problem.
GeoIP, ASN lookup, ACL enforcement, rate limiting per network, spam filtering by netblock, CDN routing.
Net::Patricia is the incumbent.
Fast, but a C extension from the 90s with an aging API and
no first-class IPv6.
We want: fast, IPv6-native, arbitrary Perl values as payload, pure-Perl fallback.
# Linear scan
for my $cidr (@table) {
if (ip_in_cidr($ip, $cidr)) {
push @matches, $cidr;
}
}
my $best = longest(@matches);
Simple. Correct. O(N) per lookup.
Real BGP full table today:
~1,000,000 IPv4 prefixes
~250,000 IPv6 prefixes
At a million prefixes and 100 nanoseconds per compare, that's 100 milliseconds per packet. You get 10 packets per second per core.
We need a data structure, not a loop.
Binary trie (Patricia): 1 bit per level
Great tool. But each level costs one pointer chase. Up to 32 chases for IPv4, 128 for IPv6 ā one bit at a time.
Can we do better by reading more bits per step?
Balanced (Allotment) Routing Table
IPv4 = 4 octets ā at most 4 levels. IPv6 = 16 octets ā at most 16 levels. Fewer pointer chases per lookup.
Every prefix of length 0..7 within an 8-bit stride maps to a slot in a complete binary tree with indices 1..255.
A node has up to 256 children ā but most are empty. Store only what's there.
Most real-world tables are sparse. Path compression turns a would-be million-node trie into tens of thousands of nodes.
Same API. Pick your trade-off.
use Net::BART; # or ::XS, or ::Go
my $t = Net::BART->new;
$t->insert("10.0.0.0/8", "private");
$t->insert("10.1.0.0/16", "office");
$t->insert("10.1.42.0/24", "dev-team");
$t->insert("0.0.0.0/0", "default");
$t->insert("2001:db8::/32", "documentation"); # IPv6, same call
my ($val, $ok) = $t->lookup("10.1.42.7"); # ā ("dev-team", 1)
my ($val, $ok) = $t->lookup("10.1.99.1"); # ā ("office", 1)
my ($val, $ok) = $t->get("10.1.0.0/16"); # exact match
$t->contains("10.1.42.7"); # ā 1 (bool)
my ($old, $ok) = $t->delete("10.1.42.0/24"); # ā ("dev-team", 1)
$t->walk(sub { # iterate all
my ($prefix, $value) = @_;
say "$prefix => $value";
});
printf "IPv4=%d IPv6=%d total=%d\n",
$t->size4, $t->size6, $t->size;
One API. Three implementations. IPv4 and IPv6 in the same table.
$self->[0] is ~30% faster than
$self->{prefixes}. Nodes are
arrayrefs, not hashrefs.
The lookup loop inlines get_child
and lpm_test directly — no method dispatch
per octet.
_do_insert() is a plain sub, not a method.
Skipping $self-> matters when you recurse 4Ć per call.
index() + substr() is
~3Ć faster than a regex for the canonical
A.B.C.D form.
256-entry lookup array ā Perl has no built-in popcount, but a table gives constant-time counts on each byte.
The 256-entry @LOOKUP_TBL
turns per-node LPM into four 64-bit ANDs.
Build it once at module load, reuse forever.
Same tests, same cross-check corpus — all four Perl implementations (BART, BART::XS, BART::Go, Patricia) agree on 10,000/10,000 random lookups.
All timings include IP string parsing — realistic workload.
Net::BART::XS is 3.5Ć faster than Net::Patricia and hits 28% of the native Go ceiling from Perl land.
| Operation | Net::Patricia | Net::BART | Net::BART::XS | Net::BART::Go | Go native |
|---|---|---|---|---|---|
| Insert | 3.2 µs | 14.3 µs | 0.77 µs | 0.67 µs | 0.26 µs |
| Lookup | 1.9 µs | 9.8 µs | 0.56 µs | 0.91 µs | 0.06 µs |
| Contains | n/a | 4.4 µs | 0.34 µs | 0.32 µs | 0.008 µs |
| Get / exact | 2.5 µs | 10.4 µs | 0.50 µs | 0.91 µs | 0.07 µs |
| Delete | 5.9 µs | 31.3 µs | 1.27 µs | 1.60 µs | 0.50 µs |
XS wins on lookup/get/delete (direct C, no cgo hop). Go wins on contains/insert (no string alloc across boundary; GC amortization).
Lookup ops/sec at 100, 1K, 10K, 100K prefixes
XS scales best ā the trie depth is capped at 4 (IPv4), so 100ā100K is almost flat. Patricia bit-tries pay per address bit.
Native Go with pre-parsed addresses, no Perl boundary
| Operation (100K) | Go native (parsed) | Go native (strings) | Net::BART::XS | Net::BART::Go |
|---|---|---|---|---|
| Insert | 3,000K | 4,559K | 1,200K | 1,400K |
| Lookup | 7,443K | 7,175K | 2,000K | 1,100K |
| Contains | 57,249K | 21,688K | 2,700K | 2,900K |
| Get | 10,299K | 6,848K | 1,700K | 899K |
| Delete | 1,903K | ā | 695K | 516K |
Native Go contains at 17ns per call ā 57M ops/sec.
That's the algorithmic limit. Pure BART with no language overhead.
Perl callers pay the SV allocation, refcount, and (for Go) the cgo hop.
Getting 28% of the ceiling from Perl feels like a fair trade.
| Net::BART::XS | Net::BART::Go | Net::Patricia | Net::BART | |
|---|---|---|---|---|
| Language | C (XS) | Go (cgo + XS) | C (XS) | Pure Perl |
| Lookup / 100K | 0.56 µs | 0.91 µs | 1.9 µs | 9.8 µs |
| Deps | C compiler | C compiler + Go 1.21+ | libpatricia | none |
| IPv6 | native | native | separate trie | native |
| Values | any scalar | strings | int / closure | any scalar |
| Best for | throughput | upstream parity | legacy code | portability |
Default choice: Net::BART::XS.
Fall back to Net::BART when you can't compile. Reach for ::Go if you
want to inherit gaissmai's ongoing work upstream.
cpanm Net::BART
# or from source
git clone https://.../Net-BART
perl Makefile.PL
make && make test
use Net::BART::XS;
my $t = Net::BART::XS->new;
$t->insert($_->[0], $_->[1])
for @cidr_table;
my ($v, $ok) =
$t->lookup($ip);
metacpan.org/pod/Net::BART
github.com/gaissmai/bart ā upstream Go
Knuth, TAOCP vol. 4, fascicle 7 ā ART origin
Marian Marinov Ā· mm@yuhu.biz Ā· Summer 2026 Perl Community Conference