Latest release Apache-2.0 by NAVER

Move a terabyte to a hundred nodes.
Without melting your storage.

p2pcp is a peer-to-peer file replication CLI. The more peers join, the faster the transfer goes — because every peer becomes a source. One static binary, zero companion services.

$ p2pcp -dst /data -peer-list peer1,peer2,peer3 -exit-complete

How it works

One source is a bottleneck. A hundred peers aren't.

Traditional

Every node pulls from one storage. Bandwidth divides as nodes pile on.

p2pcp

Every peer shares chunks. Throughput grows as the fleet grows.
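
The contrast can be put in numbers with the standard textbook lower bounds for delivering one file to N nodes. This is an idealized model, not a description of p2pcp's scheduler. The function names and the example figures (800 Gbit of data, 100 nodes, 10 Gbit/s links everywhere) are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Idealized lower bounds on distribution time, in seconds.
# Sizes are in Gbit and rates in Gbit/s; all values are hypothetical.

# Single source: args are size, nodes, source upload rate, node download rate.
single_source_seconds() {
  awk -v f="$1" -v n="$2" -v us="$3" -v d="$4" 'BEGIN {
    t = n * f / us                  # the source must upload n full copies
    if (f / d > t) t = f / d        # no node finishes faster than its own link
    printf "%.1f\n", t
  }'
}

# Peer-to-peer: same args plus the upload rate of each node.
p2p_seconds() {
  awk -v f="$1" -v n="$2" -v us="$3" -v d="$4" -v u="$5" 'BEGIN {
    t = f / us                      # the source uploads at least one copy
    if (f / d > t) t = f / d        # no node finishes faster than its own link
    total = n * f / (us + n * u)    # n copies over the combined upload capacity
    if (total > t) t = total
    printf "%.1f\n", t
  }'
}

single_source_seconds 800 100 10 10   # prints 8000.0
p2p_seconds 800 100 10 10 10          # prints 80.0
```

In this model the single-source bound grows linearly with the node count, while the peer-to-peer bound stays close to the time needed to upload one copy.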

Built to scale

Terabytes of source. Hundreds of peers. Limited only by your network.

p2pcp is designed to fan out terabyte-class source data across hundreds of peer nodes. Because every peer that holds chunks also serves them, the practical ceiling is your network rather than the tool.

In production at NAVER

100 GiB+ source data per run
100 nodes replicated in parallel
<1 min from a single 10 GbE seed, zstd on

Reference deployment inside NAVER, with zstd compression enabled. Numbers vary with topology, chunk size, compressibility of the data, and storage characteristics.
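
A rough calculation shows why compressibility appears in that list of caveats. The sketch below assumes the seed link runs at full line rate with no protocol overhead, and that every byte of the source has to leave the seed at least once. The function name is hypothetical, and the result is a floor on the seed side, not a measurement.

```shell
#!/usr/bin/env bash
# Args: source size in GiB, seed link in Gbit/s, target time in seconds.
# Prints the time to push one uncompressed copy off the seed, and the factor
# by which on-wire bytes must shrink to meet the target.
seed_floor() {
  awk -v gib="$1" -v gbps="$2" -v target="$3" 'BEGIN {
    bits = gib * 1024 * 1024 * 1024 * 8
    raw = bits / (gbps * 1e9)
    printf "raw=%.1fs min_ratio=%.2f\n", raw, raw / target
  }'
}

seed_floor 100 10 60   # prints raw=85.9s min_ratio=1.43
```

Under these assumptions, 100 GiB cannot leave a 10 GbE seed uncompressed in under a minute, so a sub-minute run implies the data shrank by roughly 1.4x or more on the wire.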

Features

All you need is a binary. No master, no scheduler, no infrastructure to stand up.

Single static binary

One Linux binary per node. No runtime deps, no daemon to keep alive, no companion control plane — drop it in and run.

Same binary, every node

Role set by flags: -src to seed, -dst to pull. No master, no scheduler, no manager process to keep alive.

Discovery without infrastructure

A static -peer-list a,b,c works on day one. DNS SRV/A and HTTP registry are optional add-ons, never prerequisites.

Kubernetes-native patterns

Drops cleanly into an init container, sidecar, or DaemonSet. Point at a Headless Service FQDN and peers find each other via SRV/A — no operator, no CRD.

Throughput scales with peers

Chunked parallel transfer between every peer. Aggregate bandwidth grows with the fleet rather than collapsing on the source.

Verified & tunable

xxh3 digests per chunk, -verify-on-complete, optional zstd compression, tunable chunk size (8–128 MiB) and per-peer concurrency.
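
To get a feel for the chunk-size range, the sketch below counts how many chunks (and therefore how many per-chunk digests) a source of a given size produces. It is plain arithmetic with a hypothetical helper name. It says nothing about p2pcp's defaults or how it lays out chunks across files.

```shell
#!/usr/bin/env bash
# Args: source size in bytes, chunk size in MiB.
chunk_count() {
  size=$1
  chunk=$(( $2 * 1024 * 1024 ))
  echo $(( (size + chunk - 1) / chunk ))   # ceiling division
}

one_tib=$(( 1024 * 1024 * 1024 * 1024 ))
chunk_count "$one_tib" 8     # prints 131072
chunk_count "$one_tib" 128   # prints 8192
```

Smaller chunks give peers more units to trade in parallel, at the cost of more digests and more bookkeeping per transfer.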

Quick start

Three commands. Three nodes. Done.

  1. Start a seeder

    One node holds the source and serves chunks.

    seeder $ p2pcp -src /data/files
  2. Pull from peers

    Each peer downloads from every other peer, in parallel. Here seeder, peer1, and peer2 are resolvable hostnames (or host:port pairs); pass the real ones for your fleet.

    peer  $ p2pcp -dst /data/dst -peer-list seeder,peer1,peer2 -exit-complete
  3. Or use DNS discovery

    Point at a Kubernetes headless service — peers find each other.

    peer  $ p2pcp -dst /data/dst -peer-list-srv p2pcp.default.svc.cluster.local

Full CLI reference, tuning guide, and external-registry protocol in the README.
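
For fleets larger than three nodes, typing the static peer list by hand gets tedious. A minimal sketch, assuming you keep one hostname per line in a file: the helper name peer_list and the file hosts.txt are hypothetical, and only the flags already shown above are used.

```shell
#!/usr/bin/env bash
# Read hostnames from stdin, drop blank lines and # comments, join with commas.
peer_list() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | paste -sd, -
}

printf 'seeder\n# staging peers\npeer1\n\npeer2\n' | peer_list   # prints seeder,peer1,peer2

# Usage on each destination node (hosts.txt is a hypothetical file):
#   p2pcp -dst /data/dst -peer-list "$(peer_list < hosts.txt)" -exit-complete
```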

Kubernetes pattern

Init pulls. Sidecar serves. The next replica is even faster.

When a Deployment scales out, every new pod's init container pulls the dataset from the seeder and every running peer in parallel. The sidecar then keeps serving that data — so each successive replica reaches steady state faster than the last.

# deployment.yaml — excerpt
spec:
  template:
    spec:
      initContainers:
        - name: p2pcp-init
          image: p2pcp:2
          args:
            - -dst=/data
            - -peer-list=seeder.default.svc.cluster.local       # source of truth
            - -peer-list-srv=my-app.default.svc.cluster.local   # sibling pods (Headless Service)
            - -exit-complete
          volumeMounts:
            - { name: data, mountPath: /data }
      containers:
        - name: app
          image: my-app
          volumeMounts:
            - { name: data, mountPath: /data, readOnly: true }
        - name: p2pcp-sidecar
          image: p2pcp:2
          args: [-src=/data]   # serve to next replicas
          volumeMounts:
            - { name: data, mountPath: /data, readOnly: true }
      volumes:
        - name: data
          emptyDir: {}

A Headless Service selecting the pods is required so SRV records resolve to sibling replicas. The seeder can be any reachable host running p2pcp -src — a standalone pod, a StatefulSet, or a node outside the cluster.
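
A Headless Service matching the excerpt above could look like the following sketch. The Service name and namespace mirror the my-app.default.svc.cluster.local name used in the init container. The pod label app=my-app, the port name, and port 9000 are placeholders that do not come from the p2pcp documentation, so substitute the label on your pod template and the port your peers actually listen on.

```shell
#!/usr/bin/env bash
# Write the manifest to a file so it can be reviewed before applying.
cat > headless-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
spec:
  clusterIP: None        # headless: DNS returns pod IPs instead of a virtual IP
  selector:
    app: my-app          # hypothetical label; must match the pod template
  ports:
    - name: p2pcp        # Kubernetes publishes SRV records only for named ports
      port: 9000         # placeholder port, not a documented p2pcp default
      protocol: TCP
EOF

# Apply with: kubectl apply -f headless-service.yaml
```

Setting clusterIP to None is what makes the Service headless, so lookups return the individual pod addresses that sibling replicas need.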