angzarr_client.readiness

Readiness probes and health-status supervisor for runner servers.

Python port of client-rust/main/src/readiness.rs. Audit #68.

A runner exposes its readiness through grpc.health.v1.Health. While any probe is failing, the per-kind service name reports NOT_SERVING; once every probe is green, it flips to SERVING. Probes are evaluated on a fixed cadence (default 30s, override via ANGZARR_READINESS_PROBE_INTERVAL) with a per-probe timeout (default 2s, override via ANGZARR_READINESS_PROBE_TIMEOUT).

Aggregation is binary — all up is SERVING, anything else is NOT_SERVING. The health server itself always responds, so liveness (“the process answers”) and readiness (“it’s safe to send traffic”) share one wire surface and are distinguished by the response status.

Async architecture (audit #68 design call): the runner runs on grpc.aio.server so the supervisor lives as an asyncio task on the same event loop, mirroring Rust’s tokio-async supervisor 1:1. Probes have async check() -> bool signatures; per-probe timeouts are enforced via asyncio.wait_for.

Attributes

DEFAULT_PROBE_INTERVAL

Default cadence for re-evaluating output-domain probes (seconds).

DEFAULT_PROBE_TIMEOUT

Default per-probe timeout (seconds).

ENV_INTERVAL

ENV_TIMEOUT

ENV_BUS_ENDPOINT

optional async-bus endpoint (Kafka / RabbitMQ / SQS /

Classes

Probe

Single readiness probe — evaluated once per supervisor tick.

TransportSignal

Side of a TransportProbe used by the runner to mark

TransportProbe

One-shot transport probe — flipped True once the listener has

OutputDomainProbe

Per-output-domain coordinator probe — attempts to open a connection

BusProbe

Async-bus reachability probe (audit #74) — covers the path that

Functions

probe_config_from_env(→ tuple[float, float])

Read supervisor cadence + per-probe timeout from env, falling

run_supervisor(→ None)

Poll every probe on each tick, aggregate (all_okSERVING,

Module Contents

angzarr_client.readiness.DEFAULT_PROBE_INTERVAL = 30.0

Default cadence for re-evaluating output-domain probes (seconds).

angzarr_client.readiness.DEFAULT_PROBE_TIMEOUT = 2.0

Default per-probe timeout (seconds).

angzarr_client.readiness.ENV_INTERVAL = 'ANGZARR_READINESS_PROBE_INTERVAL'
angzarr_client.readiness.ENV_TIMEOUT = 'ANGZARR_READINESS_PROBE_TIMEOUT'
angzarr_client.readiness.ENV_BUS_ENDPOINT = 'ANGZARR_BUS_ENDPOINT'

optional async-bus endpoint (Kafka / RabbitMQ / SQS / SNS / NATS / etc.). When set, a single BusProbe covers reachability of the async path for every async-only saga / PM target. When unset, no bus probe is added — async-only targets are simply not part of readiness.

Type:

Audit #74

angzarr_client.readiness.probe_config_from_env() tuple[float, float]

Read supervisor cadence + per-probe timeout from env, falling back to DEFAULT_PROBE_INTERVAL / DEFAULT_PROBE_TIMEOUT.

Same env-var contract as Rust. Bad values (non-numeric) silently fall back to defaults — matches Rust’s .parse::<u64>().ok().

class angzarr_client.readiness.Probe

Bases: abc.ABC

Single readiness probe — evaluated once per supervisor tick.

property name: str
Abstractmethod:

Stable identifier for log lines.

abstractmethod check() bool
Async:

True when the underlying dependency is currently healthy.

class angzarr_client.readiness.TransportSignal

Side of a TransportProbe used by the runner to mark ‘bound and serving’.

mark_bound() None
is_bound() bool
class angzarr_client.readiness.TransportProbe(signal: TransportSignal)

Bases: Probe

One-shot transport probe — flipped True once the listener has bound and the server is accepting traffic. From that point its result never changes.

classmethod new() tuple[TransportProbe, TransportSignal]
property name: str

Stable identifier for log lines.

async check() bool

True when the underlying dependency is currently healthy.

class angzarr_client.readiness.OutputDomainProbe(domain: str, endpoint: _Endpoint)

Bases: Probe

Per-output-domain coordinator probe — attempts to open a connection to the downstream domain’s command-handler coordinator endpoint.

Audit #74: built only for sync output domains (declared via @saga(sync=True) or @process_manager(sync_targets=[...])). Async-only targets ride the bus; see BusProbe.

classmethod for_domain(domain: str) OutputDomainProbe
property name: str

Stable identifier for log lines.

async check() bool

True when the underlying dependency is currently healthy.

class angzarr_client.readiness.BusProbe(endpoint: _Endpoint)

Bases: Probe

Async-bus reachability probe (audit #74) — covers the path that async-only saga / PM targets ride. The endpoint is operator-supplied via ENV_BUS_ENDPOINT and points at whatever broker the deployment uses (Kafka, RabbitMQ, SQS/SNS, NATS, etc.).

Connection-only — confirms the broker is reachable, not that publishes will succeed end-to-end. Same contract as OutputDomainProbe for sync targets.

classmethod from_env() BusProbe | None
property name: str

Stable identifier for log lines.

async check() bool

True when the underlying dependency is currently healthy.

async angzarr_client.readiness.run_supervisor(probes: list[Probe], health_servicer, service_names: list[str], interval: float, timeout: float) None

Poll every probe on each tick, aggregate (all_okSERVING, else NOT_SERVING), publish to every service name. Loops until cancelled.

health_servicer is an async grpc_health.v1.health.aio.HealthServicer — its set(service, status) is awaitable.