Budgeting rebalances, answered

Rebalancing buys inbound liquidity by paying a circular route back to yourself. The cost has to stay below what the channel will earn routing that liquidity out again, or every cycle loses money. This note answers how the budget is set, why it escalates, what stops it overpaying, and what happens when a channel simply cannot be refilled profitably.

Where does the budget come from?

From one measured number — the price of the most recent successful refill into that channel — nudged up when recent attempts have failed:

escalated = (last_refill_ppm OR 500) × 1.20 ^ failures_since_last_success
budget    = min(escalated, 5000)                              # if CALIBRATING
budget    = min(escalated, earned_ppm × 1.25, 5000)           # if CALIBRATED

The escalation term does two jobs with one mechanism. A fresh channel with no history bootstraps at REBALANCE_DEFAULT_BUDGET_PPM (500) and climbs 20% per consecutive failure until a route is found; an established channel whose cheap routes have dried up re-escalates from its last known price until a new one is discovered. One success resets the counter and re-anchors the budget to the price actually paid.

500 → 600 → 720 → 864 → 1037 → 1244 → 1493 → 1792 → 2150 → 2580
  a ~2300-ppm peer bootstraps in ~9 cron cycles (≈18h at the 2h cron)
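
A minimal Python sketch of the budget rule above, assuming the escalation compounds 20% per failure as the ladder shows. REBALANCE_ESCALATION_STEP and REBALANCE_MAX_BUDGET_PPM are hypothetical names for the 0.20 and 5000 constants, and the function signature is illustrative rather than the project's own.

```python
from typing import Optional

REBALANCE_DEFAULT_BUDGET_PPM = 500   # bootstrap price for a channel with no refill history
REBALANCE_ESCALATION_STEP = 0.20     # hypothetical name: +20% per consecutive failure
REBALANCE_MAX_BUDGET_PPM = 5000      # hypothetical name for the 5000 ppm hard ceiling
REBALANCE_PROFIT_HORIZON = 1.25      # multiplier on earned_ppm for the profit cap


def rebalance_budget_ppm(last_refill_ppm: Optional[float],
                         failures_since_last_success: int,
                         earned_ppm: Optional[float]) -> int:
    """Budget in ppm. earned_ppm=None means the channel is still calibrating."""
    base = last_refill_ppm or REBALANCE_DEFAULT_BUDGET_PPM
    escalated = base * (1 + REBALANCE_ESCALATION_STEP) ** failures_since_last_success
    caps = [escalated, REBALANCE_MAX_BUDGET_PPM]
    if earned_ppm is not None:
        # Calibrated: the profitability gate applies on top of the hard ceiling.
        caps.append(earned_ppm * REBALANCE_PROFIT_HORIZON)
    return round(min(caps))
```

Called with no history and no earned_ppm, it walks the ladder above (up to rounding) until the hard ceiling takes over; passing an earned_ppm of 200 pins the result at 250 however many failures have accumulated.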

What is the profitability gate?

Escalation alone would happily pay 2500 ppm to refill a channel that only earns 200 — the classic "buy expensive, sell cheap" bleed. The gate (Layer 1) caps a calibrated channel's budget at earned_ppm × REBALANCE_PROFIT_HORIZON (1.25, ≈ break-even on the recoup price): never pay more to refill than the channel can earn back in about a horizon's worth of cycles.

earned_ppm = Σ fee_earned / Σ amount_out   # trailing, by outbound channel

What makes a channel "calibrated" vs "calibrating"?

Evidence. A channel is calibrated only if its trailing OUT-volume clears EARNED_PPM_MIN_VOLUME_SATS (2M sats over EARNED_PPM_WINDOW_DAYS = 21), so the earned-ppm ratio is trustworthy — then the cap applies. If the standard 21 days are too quiet, the window widens — doubling (21 → 42 → 84 → EARNED_PPM_MAX_LOOKBACK_DAYS = 90) until the volume suffices — so a channel that went silent because it's depleted stays calibrated on its older history instead of shedding the cap at a cliff. A channel with too little volume even over the full lookback is still calibrating (earned_ppm comes back as None) and keeps full escalation, no cap: capping a low-volume channel would kill the price discovery escalation exists for and it would never bootstrap. The gate is opt-in by evidence.
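
The widening window can be sketched as follows. This is an illustration under assumptions: the Forward tuple shape is hypothetical, both sums are taken to be in sats (hence the 1e6 factor to express the ratio in ppm), and the final step clamps 168 days to the 90-day maximum as the text describes.

```python
from typing import Iterable, Optional, Tuple

EARNED_PPM_WINDOW_DAYS = 21
EARNED_PPM_MAX_LOOKBACK_DAYS = 90
EARNED_PPM_MIN_VOLUME_SATS = 2_000_000

# Hypothetical record for one outbound forward: (age_days, amount_out_sats, fee_earned_sats)
Forward = Tuple[float, int, float]


def earned_ppm(forwards: Iterable[Forward]) -> Optional[float]:
    """Trailing earned ppm, or None while the channel is still calibrating."""
    rows = list(forwards)
    window = EARNED_PPM_WINDOW_DAYS
    while True:
        recent = [r for r in rows if r[0] <= window]
        volume = sum(r[1] for r in recent)
        if volume >= EARNED_PPM_MIN_VOLUME_SATS:
            fees = sum(r[2] for r in recent)
            return fees * 1_000_000 / volume
        if window >= EARNED_PPM_MAX_LOOKBACK_DAYS:
            return None  # too little volume even over the full lookback
        # Double the window (21 -> 42 -> 84), clamped to the maximum lookback (90).
        window = min(window * 2, EARNED_PPM_MAX_LOOKBACK_DAYS)
```

A None result here is what keeps the profit cap off: the caller simply skips the earned_ppm term when computing the budget.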

What happens to a channel that can't be refilled profitably?

It stops getting ground on. A calibrated channel whose escalation keeps exceeding the cap is flagged profit_capped; if it also fails REBALANCE_STRUCTURAL_FAIL_THRESHOLD (10) times while capped, it is marked structural. At that point plan_rebalances drops it from planning (refilling is a losing trade), recompute_signals stamps a structural_flag_ts and fires a one-time structural_liquidity alert, and the fix becomes a capital decision — open inbound, splice, or resize — rather than more paid rebalances. (With the optional inbound-fee ladder enabled, it first gets a negative inbound-fee probe to pull an organic refill before being flagged.)
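
One way to picture the flagging, as a hypothetical sketch: the Channel dataclass, the capped_failures counter, and both function names are invented for illustration, and the alerting and timestamping done by recompute_signals are left out.

```python
from dataclasses import dataclass
from typing import List

REBALANCE_STRUCTURAL_FAIL_THRESHOLD = 10


@dataclass
class Channel:
    name: str
    profit_capped: bool = False
    capped_failures: int = 0   # hypothetical counter: failures recorded while capped
    structural: bool = False


def record_failure(ch: Channel, escalated_ppm: float, profit_cap_ppm: float) -> None:
    """Record one failed cycle for a calibrated channel."""
    ch.profit_capped = escalated_ppm > profit_cap_ppm
    if ch.profit_capped:
        ch.capped_failures += 1
    if ch.capped_failures >= REBALANCE_STRUCTURAL_FAIL_THRESHOLD:
        ch.structural = True


def plannable(channels: List[Channel]) -> List[Channel]:
    """Structural channels are dropped from planning."""
    return [ch for ch in channels if not ch.structural]
```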

Can it read the price instead of grinding for it?

Yes. Both the escalation ladder and the accelerator discover the clearing price by failing over several runs. But LND will tell you the price for free: a QueryRoutes dry-run uses the exact same pathfinder as SendPaymentV2 — mission-control liquidity included — without sending a payment. So the planner probes it and acts on the real number now instead of groping toward it over days. Each depleted target gets an affordable_ceiling_ppm (the profit cap for a calibrated channel, else the 5000 hard max — the most it could ever justify paying). For each target it runs one probe per overfull source, at the smallest chunk, capped at that ceiling — and that single set of probes does both jobs:

Pricing. Price the bid off the cheapest feasible source — raise this run's bid up to its live cost (bounded by the ceiling) and try that source first, so an affordable refill lands now and via the cheapest source, instead of ~5 escalations or paying whichever source happens to be most overfull. It only ever raises the bid and never above the ceiling, so it reaches the right price faster but never overpays.
Early-out. If no source has a route within the ceiling, refilling isn't a price problem, it's a capital one. The planner skips the wasted attempt and records a synthetic failed cycle, so the channel still climbs to its structural flag and surfaces the capital decision. The early-out replaces the wasted attempts, not the stranding they'd eventually trigger.

Each source's cost includes the target peer's fee to forward the final hop into our channel. The probe routes to the target peer as its destination, so LND charges nothing for the hop into it — but the real rebalance is circular (us → source → … → target_peer → us), where the target peer is an intermediate forwarding into our channel and does charge. That fee is added back from a single channel-policy lookup (the same omission the peer-finder corrects on the first hop, here on the last). Without it a direct neighbour reads a deceptively low or zero ppm with its inbound fee invisible.
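
The add-back is plain fee arithmetic. The sketch below assumes the standard Lightning forwarding fee (base plus proportional rate) for the target peer's policy; the function and parameter names are hypothetical.

```python
def probe_cost_ppm(route_fee_msat: int, amount_msat: int,
                   last_hop_base_msat: int, last_hop_rate_ppm: int) -> float:
    """Probe fee plus the target peer's fee for forwarding the final hop, in ppm."""
    # The target peer's forwarding fee, which the probe does not see
    # because the peer is the probe's destination.
    last_hop_fee_msat = last_hop_base_msat + amount_msat * last_hop_rate_ppm // 1_000_000
    return (route_fee_msat + last_hop_fee_msat) * 1_000_000 / amount_msat
```

For a direct neighbour the probe itself reports zero fee, yet with a 1,000-msat base and a 300-ppm rate on a 100,000-sat chunk the real cost comes out at 310 ppm.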

Why probe every source rather than just the most-overfull one? Because feasibility is existential — one working source proves a channel is refillable, and a cheaper source might exist — but infeasibility is universal: only all sources failing justifies a drop. Since the drop advances stranding, a single source's no-route must never strand a channel nor set the price. And it probes at the smallest chunk for pricing too: a fixed base fee amortises over fewer sats, so the small-chunk ppm is the worst case — a safe upper bound for the cap that still lets larger routes settle under it (and one probe covers refills that can only go through in chunks). The bid is just a cap; the executor still pays the cheapest route under it.

Guards: calibrated-only (a calibrating channel keeps discovering by real attempts); a probe that's unavailable — LND down — is treated as unknown, never as no-route, so a transport blip can never strand a channel; it runs only in the planner (the budget function stays call-free, since the fee engine reads it for every channel every run); and any probe error leaves the budget untouched. The blind earn-ceiling accelerator above is now its fallback for when the probe is off or unavailable.
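
The pricing, early-out, and unavailable-probe rules above can be sketched as one decision function. Everything here is illustrative: ProbeStatus, Probe, Decision, and decide are hypothetical names, and the real planner will carry more state.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class ProbeStatus(Enum):
    ROUTE = "route"        # a route exists within the ceiling
    NO_ROUTE = "no_route"  # the pathfinder answered: nothing within the ceiling
    UNKNOWN = "unknown"    # probe unavailable (e.g. LND down) or errored


class Probe(NamedTuple):
    status: ProbeStatus
    cost_ppm: float = 0.0  # only meaningful when status is ROUTE


class Decision(NamedTuple):
    action: str                   # "bid", "early_out" or "fallback"
    bid_ppm: float
    first_source: Optional[str]


def decide(current_bid_ppm: float, ceiling_ppm: float,
           probes: Dict[str, Probe]) -> Decision:
    feasible = {src: p.cost_ppm for src, p in probes.items()
                if p.status is ProbeStatus.ROUTE and p.cost_ppm <= ceiling_ppm}
    if feasible:
        cheapest = min(feasible, key=lambda src: feasible[src])
        # Only ever raise the bid, and never above the ceiling.
        bid = min(max(current_bid_ppm, feasible[cheapest]), ceiling_ppm)
        return Decision("bid", bid, cheapest)
    if probes and all(p.status is ProbeStatus.NO_ROUTE for p in probes.values()):
        # Every source answered no-route: a capital problem, not a price problem.
        return Decision("early_out", current_bid_ppm, None)
    # At least one probe is unknown: leave the budget untouched.
    return Decision("fallback", current_bid_ppm, None)
```

Only the all-sources-no-route case returns "early_out"; a single unknown probe is enough to fall back, which is what keeps a transport blip from advancing a channel toward its structural flag.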

How is a rebalance actually executed?

As a circular self-payment over LND's Router. It runs through SendPaymentV2 (POST /v2/router/send) with a hard fee ceiling and a 120-second timeout per attempt:

max_fee_sats = amount × budget_ppm / 1e6 × 1.1   # 10% cushion for base fees

The ppm budget becomes that fee_limit_sat — a 500,000-sat rebalance at ~490 ppm allows roughly 270 sats of fee. SendPaymentV2 does its own multi-path pathfinding under the limit; if it can't complete within the ceiling or the timeout it fails cleanly (costing nothing), and the chunking below takes over. Manual circular rebalances done with lncli are detected later by sync_routing matching the self-payment, so the dashboard and the budget signal stay complete.
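
The fee ceiling is easy to check by hand. The sketch below computes it and shows roughly what a request body for POST /v2/router/send could look like; the constant and function names are hypothetical, rounding to a whole sat is an assumption, and the body lists only the fields relevant here.

```python
REBALANCE_TIMEOUT_SECONDS = 120   # per-attempt timeout from the text; name is hypothetical
FEE_CUSHION = 1.1                 # 10% cushion for base fees; name is hypothetical


def max_fee_sats(amount_sats: int, budget_ppm: float) -> float:
    """Hard fee ceiling for one attempt, in sats."""
    return amount_sats * budget_ppm / 1_000_000 * FEE_CUSHION


def send_request_body(payment_request: str, amount_sats: int, budget_ppm: float) -> dict:
    """Illustrative JSON body for a circular self-payment attempt."""
    return {
        "payment_request": payment_request,   # invoice issued by our own node
        # Assumption: rounded to the nearest whole sat.
        "fee_limit_sat": round(max_fee_sats(amount_sats, budget_ppm)),
        "timeout_seconds": REBALANCE_TIMEOUT_SECONDS,
        "allow_self_payment": True,
    }
```

For the 500,000-sat example at 490 ppm, max_fee_sats gives 269.5, which is the "roughly 270 sats" quoted above.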

One subtlety with two channels to the same peer: the payment can only pin its last hop by pubkey, and LND's non-strict forwarding pools sibling liquidity at forward time anyway — so a chunk may settle on either sibling. The books follow the sats: the executor reads the invoice's settled HTLC records (ground truth for the landing channel) and writes the refill — and therefore last_refill_ppm and the fee floor — against the channel that actually received it.

What if the full amount won't route?

The executor halves the amount and retries, down to a 100k floor. Each chunk that lands is written as its own success row at the price it actually paid, so last_refill_ppm reflects what was really spent.

want 800k → fails → 400k → ok      (recorded at its ppm)
remaining 400k → fails → 200k → ok (recorded at its ppm)
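
A sketch of that loop, assuming (as the trace suggests) that after each landed chunk the executor first retries the whole remainder before halving again. MIN_CHUNK_SATS and the try_send callback are hypothetical; try_send stands in for one SendPaymentV2 attempt and returns the ppm actually paid, or None on failure.

```python
from typing import Callable, List, Optional, Tuple

MIN_CHUNK_SATS = 100_000  # hypothetical name for the 100k floor


def refill(want_sats: int,
           try_send: Callable[[int], Optional[float]]) -> List[Tuple[int, float]]:
    """Return one (amount_sats, paid_ppm) row per chunk that landed."""
    landed: List[Tuple[int, float]] = []
    remaining = want_sats
    chunk = remaining
    while remaining >= MIN_CHUNK_SATS and chunk >= MIN_CHUNK_SATS:
        paid_ppm = try_send(chunk)
        if paid_ppm is None:
            chunk //= 2          # attempt failed: halve and retry
            continue
        landed.append((chunk, paid_ppm))   # own success row at the price paid
        remaining -= chunk
        chunk = remaining        # next, try the whole remainder
    return landed
```

The loop ends when the deficit is filled, when less than the floor remains, or when halving would take a chunk below the floor.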

How do primary and fallback routes get chosen?

The planner emits, per depleted target, one or more primary pairs (sources whose surplus covers the deficit) plus fallback pairs (every other overfull source against the same target). At execution two ledgers — target_deficits and source_remaining — decide which actually fire, with no explicit "did the primary fail" branch. That has its own note: primary and fallback rebalance routes.

Why one measured signal instead of a scoring model?

A median of past refill prices lags every move by half its window: it quotes a stale, too-low price after cheap routes dry up, and overpays after they return. Failure-escalation supplies the only smoothing that helps (it ratchets up when reality says the last price stopped working) without averaging in prices that no longer exist. The principle that fell out: prefer one number you measured to several you estimated.

How does this loop back to fees?

Each landed chunk writes a new last_refill_ppm, which is the same number that sets the channel's outbound fee floor (last_refill × REBALANCE_FEE_MARGIN). A pricier refill raises what we charge to drain that liquidity again; a cheaper one lowers it. Budget and fee floor move together off one signal — which is also why the rebalance ceiling (5000 ppm) equals the fee hard ceiling: a channel can always charge enough to recover what we were willing to pay to fill it. The fee note covers the floor side, and idle-channel pricing covers what happens when that floor would otherwise strand a channel.