Net::BART — Searching for your IP
Summer 2026 Perl Community Conference

Net::BART

Searching for your IP

Marian Marinov — mm@yuhu.biz

The question we're solving

Given an IP address and a table of CIDR prefixes,
which prefix matches it?

And if several match — return the most specific one.
That's Longest-Prefix Match (LPM).

10.0.0.0/8       → "private-rfc1918"
10.1.0.0/16      → "office"
10.1.42.0/24     → "dev-team"
0.0.0.0/0        → "default"

lookup("10.1.42.7")
    # → "dev-team"     (longest match)

lookup("10.1.99.1")
    # → "office"       (next longest)

lookup("8.8.8.8")
    # → "default"      (fallback)

Where LPM matters

Routers & firewalls

Every packet needs a routing decision. Millions per second. This is the canonical LPM problem.

Web infrastructure

GeoIP, ASN lookup, ACL enforcement, rate limiting per network, spam filtering by netblock, CDN routing.

Perl land

Net::Patricia is the incumbent.
Fast, but a C extension from the 90s with an aging API and no first-class IPv6.

We want: fast, IPv6-native, arbitrary Perl values as payload, pure-Perl fallback.

The naive approach doesn't scale

# Linear scan
for my $cidr (@table) {
    if (ip_in_cidr($ip, $cidr)) {
        push @matches, $cidr;
    }
}
my $best = longest(@matches);

Simple. Correct. O(N) per lookup.

Real BGP full table today:
~1,000,000 IPv4 prefixes
~250,000 IPv6 prefixes

At a million prefixes and 100 nanoseconds per compare, that's 100 milliseconds per packet. You get 10 packets per second per core.

We need a data structure, not a loop.

The classical answer: Patricia tries

Ā· 0 1 One bit per level → up to 32 levels for IPv4, 128 for IPv6

Binary trie (Patricia): 1 bit per level

  • Net::Patricia — battle-tested C extension
  • Compressed binary trie: skip runs of shared bits
  • Lookup ā‰ˆ O(W) where W is address width
  • ~500K lookups/sec at 100K prefixes

Great tool. But each level costs one pointer chase. Up to 32 chases for IPv4, 128 for IPv6 — one bit at a time.

Can we do better by reading more bits per step?

BART

Balanced (Allotment) Routing Table

The lineage

  • Knuth, TAOCP vol. 4 fasc. 7
    introduces ART — Allotment Routing Tables
  • gaissmai/bart (Go, 2023) — a modern implementation
    with popcount-compressed nodes
  • Net::BART (Perl, 2026) — this talk

The three ideas

  • Multibit trie: read 8 bits at a time
  • ART indexing: a whole octet worth of prefixes → O(1) LPM per node
  • Popcount arrays: store only occupied slots

Idea #1 — Multibit trie, stride 8

root (depth 0) octet 1 → child 10 … … 10.* depth 1 … … 1 10.1.* depth 2 42 10.1.42.* depth 3 "dev-team" Lookup 10.1.42.7 → 4 octets, 4 pointer chases vs. 32 for a bit trie

IPv4 = 4 octets → at most 4 levels. IPv6 = 16 octets → at most 16 levels. Fewer pointer chases per lookup.

Idea #2 — ART: LPM in O(1) per node

Every prefix of length 0..7 within an 8-bit stride maps to a slot in a complete binary tree with indices 1..255.

/0 /1 /2 /3 /4..7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 indices 16..255 — deeper prefix lengths /4../7 within the stride Look up octet 0x0A (00001010b): idx = (octet >> 1) + 128 = 5 + 128 = 133 walk ancestors 133 → 66 → 33 → 16 → 8 → 4 → 2 → 1 first hit in this node's bitset wins → LPM All this happens without pointer traversal. A precomputed @LOOKUP_TBL gives the ancestor bitset directly. LPM = bitset AND ancestors → highest set bit. Constant time. Two 256-bit ANDs. Done.

Idea #3 — Popcount sparse arrays

A node has up to 256 children — but most are empty. Store only what's there.

Bitset256 4 Ɨ uint64 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 2 4 7 11 …255 Values array only the "1" slots child at 2 → Bart child at 4 → Leaf child at 7 → Fringe child at 11 → Bart Lookup child at slot 7: 1. Test bit 7 → set? yes. 2. rank = popcount(bitset & ((1 << 7) - 1)) = 2 3. return values[2] — O(1) Why it wins 32 bytes of bitset + N ptrs, not 256 ptrs. Hardware POPCNT — 1 cycle on modern x86. Cache-friendly, compact, no hashing.

Path compression: Leaf & Fringe nodes

Bart root octet 10 (multiple children below) 10.* 10.1.* 10.1.42.* Full Bart chain (needed — branching) octet 192 LeafNode 192.168.7.44/29 value: "office" Only one non-aligned prefix under this octet → skip the whole chain, store here. "I'll expand later if a sibling arrives" octet 8 → /16 exactly FringeNode 8.0.0.0/16 value: "level3" Stride-aligned prefixes (/8, /16, /24, /32) need no address data. Just the payload. Minimum memory.

Most real-world tables are sparse. Path compression turns a would-be million-node trie into tens of thousands of nodes.

Lookup = walk + backtrack

Bart @ depth 0 has 0.0.0.0/0 → "def" 10 Bart @ depth 1 has 10.0.0.0/8 → "priv" 1 Bart @ depth 2 has 10.1.0.0/16 → "office" 42 Bart @ depth 3 no child at octet 7 Backtrack: LPM per node "office" (win!) query IP → 10.1.42.7 Descend as far as we can, then LPM-test each stacked node on the way back up.

Net::BART ships in three flavors

Your Perl code use Net::BART[::XS|::Go]; Net::BART pure Perl Blessed arrays, no XS. Precomputed popcount + ancestor tables. Zero dependencies. Ships everywhere. Debuggable in-language. ~102K lookups/sec @ 100K prefixes. Perl 5.10+ with 64-bit ints Net::BART::XS C via XS Hand-rolled C in bart.h. Hardware POPCNT + CLZ. Values are Perl SV* — any scalar. 1.8M lookups/sec. 3–5Ɨ faster than Net::Patricia. Needs a C compiler. The recommended path. Net::BART::Go Go via cgo + XS Wraps upstream gaissmai/bart. Go shared library + tiny XS shim. Reference-implementation fidelity. 1.1M lookups/sec. Best on contains / insert. Needs Go 1.21+. Track upstream bugfixes for free.

Same API. Pick your trade-off.

The API

use Net::BART;                    # or ::XS, or ::Go
my $t = Net::BART->new;

$t->insert("10.0.0.0/8",    "private");
$t->insert("10.1.0.0/16",   "office");
$t->insert("10.1.42.0/24",  "dev-team");
$t->insert("0.0.0.0/0",     "default");
$t->insert("2001:db8::/32", "documentation");   # IPv6, same call

my ($val, $ok) = $t->lookup("10.1.42.7");        # → ("dev-team", 1)
my ($val, $ok) = $t->lookup("10.1.99.1");        # → ("office", 1)
my ($val, $ok) = $t->get("10.1.0.0/16");         # exact match
   $t->contains("10.1.42.7");                    # → 1 (bool)
my ($old, $ok) = $t->delete("10.1.42.0/24");     # → ("dev-team", 1)

$t->walk(sub {                                   # iterate all
    my ($prefix, $value) = @_;
    say "$prefix => $value";
});

printf "IPv4=%d IPv6=%d total=%d\n",
    $t->size4, $t->size6, $t->size;

One API. Three implementations. IPv4 and IPv6 in the same table.

Pure Perl tricks — how we hit 100K ops/sec

Array-based blessed objects

$self->[0] is ~30% faster than $self->{prefixes}. Nodes are arrayrefs, not hashrefs.

Inlined hot paths

The lookup loop inlines get_child and lpm_test directly — no method dispatch per octet.

Non-method recursion

_do_insert() is a plain sub, not a method. Skipping $self-> matters when you recurse 4Ɨ per call.

Fast IPv4 parser

index() + substr() is ~3Ɨ faster than a regex for the canonical A.B.C.D form.

Byte popcount table

256-entry lookup array — Perl has no built-in popcount, but a table gives constant-time counts on each byte.

Precomputed ancestor sets

The 256-entry @LOOKUP_TBL turns per-node LPM into four 64-bit ANDs. Build it once at module load, reuse forever.

The C and Go layers

Net::BART::XS

  • Single-header bart.h, ~1000 LOC
  • __builtin_popcountll, __builtin_clzll
  • Values stored as raw SV* — no marshalling
  • Direct C call from Perl — no cgo boundary
  • Manual refcounting via SvREFCNT_inc/dec

Net::BART::Go

  • Go module builds libbartgo.so with -buildmode=c-shared
  • Exports C-ABI functions via //export
  • Perl XS layer opens the shared lib, calls the exports
  • Pays ~200ns cgo hop per call
  • Handles ptrs opaquely; strings marshalled at boundary

Same tests, same cross-check corpus — all four Perl implementations (BART, BART::XS, BART::Go, Patricia) agree on 10,000/10,000 random lookups.

Benchmark environment

  • Linux x86_64, single core
  • Perl 5.40 with 64-bit integers
  • Go 1.24.4 + gaissmai/bart 0.26.1
  • Net::Patricia 1.24 (libpatricia)
  • Net::CIDR::Lite 0.22 (pure Perl)
  • Random IPv4 prefixes, /8..//32
  • Table sizes: 100 / 1K / 10K / 100K
  • 50,000 random lookups per run
  • Best of 3 runs reported
  • All five agree on 10,000/10,000 lookups

All timings include IP string parsing — realistic workload.

Lookup throughput @ 100K prefixes

ops/sec — 50K random IPv4 lookups, best of 3 4M 8M 12M 16M 20M Go native (pre-parsed) 15,517K  Ā·  0.06 µs Go native (from string) 7,899K  Ā·  0.13 µs Net::BART::XS (C) 1,800K  Ā·  0.56 µs Net::BART::Go (cgo) 1,100K  Ā·  0.91 µs Net::Patricia (C) 516K  Ā·  1.90 µs Net::BART (pure Perl) 102K  Ā·  9.80 µs

Net::BART::XS is 3.5Ɨ faster than Net::Patricia and hits 28% of the native Go ceiling from Perl land.

Per-operation latency @ 100K prefixes

Operation Net::Patricia Net::BART Net::BART::XS Net::BART::Go Go native
Insert 3.2 µs 14.3 µs 0.77 µs 0.67 µs 0.26 µs
Lookup 1.9 µs 9.8 µs 0.56 µs 0.91 µs 0.06 µs
Contains n/a 4.4 µs 0.34 µs 0.32 µs 0.008 µs
Get / exact2.5 µs 10.4 µs 0.50 µs 0.91 µs 0.07 µs
Delete 5.9 µs 31.3 µs 1.27 µs 1.60 µs 0.50 µs

XS wins on lookup/get/delete (direct C, no cgo hop). Go wins on contains/insert (no string alloc across boundary; GC amortization).

Scaling with table size

Lookup ops/sec at 100, 1K, 10K, 100K prefixes

0 1M 2M 3M 4M 100 1K 10K 100K BART::XS BART::Go Patricia BART (Perl) Prefixes in table Lookups per second

XS scales best — the trie depth is capped at 4 (IPv4), so 100→100K is almost flat. Patricia bit-tries pay per address bit.

Where's the ceiling?

Native Go with pre-parsed addresses, no Perl boundary

Operation (100K) Go native (parsed) Go native (strings) Net::BART::XS Net::BART::Go
Insert 3,000K 4,559K 1,200K 1,400K
Lookup 7,443K 7,175K 2,000K 1,100K
Contains 57,249K 21,688K 2,700K 2,900K
Get 10,299K 6,848K 1,700K 899K
Delete 1,903K — 695K 516K

Native Go contains at 17ns per call — 57M ops/sec.
That's the algorithmic limit. Pure BART with no language overhead.

Perl callers pay the SV allocation, refcount, and (for Go) the cgo hop.
Getting 28% of the ceiling from Perl feels like a fair trade.

Choosing an implementation

Net::BART::XS Net::BART::Go Net::Patricia Net::BART
Language C (XS) Go (cgo + XS) C (XS) Pure Perl
Lookup / 100K 0.56 µs 0.91 µs 1.9 µs 9.8 µs
Deps C compiler C compiler + Go 1.21+ libpatricia none
IPv6 native native separate trie native
Values any scalar strings int / closure any scalar
Best for throughput upstream parity legacy code portability

Default choice: Net::BART::XS.
Fall back to Net::BART when you can't compile. Reach for ::Go if you want to inherit gaissmai's ongoing work upstream.

Get it & hack on it

Install

cpanm Net::BART

# or from source
git clone https://.../Net-BART
perl Makefile.PL
make && make test

Try it

use Net::BART::XS;
my $t = Net::BART::XS->new;
$t->insert($_->[0], $_->[1])
    for @cidr_table;
my ($v, $ok) =
    $t->lookup($ip);

What's still open

  • Overlaps / supernet iteration API
  • Persistent (copy-on-write) tables — the Go original supports this
  • Serialization / mmap-friendly on-disk format
  • Benchmarks against real BGP full tables (RIPE RIS / RouteViews)
Thanks & Questions

Net::BART

Searching for your IP

metacpan.org/pod/Net::BART
github.com/gaissmai/bart  ā€” upstream Go
Knuth, TAOCP vol. 4, fascicle 7  ā€” ART origin

Marian Marinov  Ā·  mm@yuhu.biz  Ā·  Summer 2026 Perl Community Conference

Net::BART Ā· Marian Marinov Ā· Perl Conf 2026
Note: You are viewing an auto-generated directory index.