A Pasta Scotta

“‘A pasta scotta nun se recupera — ma ‘o sistema, chillo sì.” (Overcooked pasta cannot be saved — but the system, that can be.)

Every distributed system misbehaves eventually. A node runs hot, a message gets lost in the GarlicBreadcast queue, a configuration typo cascades into timeouts across the cluster. In Napoli we say ‘a pasta scotta (the pasta is overcooked) when something has gone wrong through inattention or bad luck. This guide helps you diagnose exactly what kind of overcooking you are dealing with, and how to fix it before your users notice.

Guida Diagnostica Rapida (Quick Diagnostic Guide)

When the cluster starts behaving strangely, run the built-in diagnostic suite first. It will classify your problem and point you toward the right section of this guide:

npx pasta diagnose --verbose
# => Running 24 diagnostic checks...
# => [OK] Network reachability: all nodes responding
# => [WARN] Node napoli-03: response latency 2400ms (threshold: 1000ms)
# => [FAIL] GarlicBreadcast queue depth: 18,442 messages (threshold: 1000)
# => [OK] Consensus log: no gaps detected
# => Diagnosis: PEPERONCINO — queue congestion on napoli-03
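
If you want to act on the diagnostic output in a script, the check lines can be turned into structured records. The following is a minimal sketch, assuming the `[OK]` / `[WARN]` / `[FAIL]` line format shown in the sample above is stable. The names `parseDiagnoseOutput`, `CheckResult`, and `CHECK_LINE` are hypothetical and not part of the Pasta Protocol tooling.

```typescript
// Sketch only: the line format is inferred from the sample output above.
type CheckStatus = "OK" | "WARN" | "FAIL";

interface CheckResult {
  status: CheckStatus;
  check: string;  // text before the first colon, e.g. "Network reachability"
  detail: string; // text after the first colon
}

// The leading "# => " is optional so that raw CLI output would also match.
const CHECK_LINE = /^(?:#\s*=>\s*)?\[(OK|WARN|FAIL)\]\s*([^:]+):\s*(.*)$/;

function parseDiagnoseOutput(output: string): CheckResult[] {
  const results: CheckResult[] = [];
  for (const line of output.split("\n")) {
    const match = CHECK_LINE.exec(line.trim());
    if (match === null) continue; // banner and summary lines are skipped
    const [, status, check = "", detail = ""] = match;
    results.push({
      status: status as CheckStatus,
      check: check.trim(),
      detail: detail.trim(),
    });
  }
  return results;
}

// The sample lines from this guide, used as input.
const sampleOutput = [
  "# => Running 24 diagnostic checks...",
  "# => [OK] Network reachability: all nodes responding",
  "# => [WARN] Node napoli-03: response latency 2400ms (threshold: 1000ms)",
  "# => [FAIL] GarlicBreadcast queue depth: 18,442 messages (threshold: 1000)",
  "# => [OK] Consensus log: no gaps detected",
].join("\n");

// Keep only the checks that did not pass.
const needsAttention = parseDiagnoseOutput(sampleOutput).filter(
  (result) => result.status !== "OK",
);
for (const result of needsAttention) {
  console.log(`${result.status}: ${result.check} (${result.detail})`);
}
```

Filtering on anything other than `OK` keeps both warnings and failures in view, which suits a first triage pass.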

For a deeper inspection of a specific node, use the node:inspect command:

npx pasta node:inspect napoli-03 --metrics --tail-logs 50

Tabella Diagnostica (Diagnostic Table)

The following table maps observable symptoms to their most common causes and recommended fixes.

Symptom | Likely Cause | Recommended Fix
All writes return TIMEOUT_ERRORE | Quorum lost: majority of nodes unreachable | See Disaster Recovery
Reads stale by > 30 seconds | Follower node fell behind on WAL replay | Restart the lagging node: npx pasta node:restart <name>
GarlicBreadcast messages not delivered | Queue congestion or subscriber disconnected | Inspect queue depth; scale consumer threads
KitchenManager fails to start | Invalid .ricetta configuration | Run npx pasta config:validate and fix reported errors
CPU usage > 90% on leader node | Large consensus batch or runaway saga | Profile with npx pasta node:profile --duration 60s
Memory climbing steadily (no plateau) | Subscription leak: handler never unsubscribed | Audit bus.subscribe() calls; ensure unsubscribe() on teardown
Nodes cannot discover each other | DNS resolution failure or firewall rule change | Check kitchen.discovery.seedNodes in .ricetta
RicettaParser rejects YAML that looks valid | Tab characters used for indentation instead of spaces | YAML forbids tabs for indentation; configure your editor to insert spaces
Log output silent (no SUSSURRO lines) | Logger sink misconfigured or log level set too high | Check logger.level; it must be SUSSURRO for debug output
/sono-vivo returns 503 | Node is alive but internally degraded | Check the Termometro subsystem; a critical dependency failed its health check
Consensus rounds taking > 5 seconds | Clock skew between nodes exceeding tolerance | Verify NTP sync on all nodes; max skew tolerance is 500ms
Saga stuck in IN_PROGRESS indefinitely | Compensating transaction failed silently | Query saga state; trigger manual compensation step
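
The subscription-leak row is easiest to understand with a concrete teardown pattern. The sketch below uses a hypothetical in-memory bus, because this guide names `bus.subscribe()` and `unsubscribe()` but does not give their signatures. `InMemoryBus`, `SubscriptionScope`, and the topic name are illustrative assumptions, not the GarlicBreadcast API.

```typescript
// Hypothetical stand-in for the real bus; only the subscribe/unsubscribe
// shape matters for illustrating the leak and its fix.
type Handler = (message: string) => void;

interface Subscription {
  unsubscribe(): void;
}

class InMemoryBus {
  private readonly handlers = new Map<string, Set<Handler>>();

  subscribe(topic: string, handler: Handler): Subscription {
    let existing = this.handlers.get(topic);
    if (existing === undefined) {
      existing = new Set<Handler>();
      this.handlers.set(topic, existing);
    }
    const topicHandlers = existing;
    topicHandlers.add(handler);
    return { unsubscribe: () => { topicHandlers.delete(handler); } };
  }

  publish(topic: string, message: string): void {
    this.handlers.get(topic)?.forEach((handler) => handler(message));
  }

  handlerCount(topic: string): number {
    return this.handlers.get(topic)?.size ?? 0;
  }
}

// Collects subscriptions so that a single teardown call releases them all.
class SubscriptionScope {
  private subscriptions: Subscription[] = [];

  add(subscription: Subscription): void {
    this.subscriptions.push(subscription);
  }

  teardown(): void {
    for (const subscription of this.subscriptions) subscription.unsubscribe();
    this.subscriptions = [];
  }
}

const demoBus = new InMemoryBus();
const requestScope = new SubscriptionScope();
const receivedOrders: string[] = [];

requestScope.add(demoBus.subscribe("orders", (m) => { receivedOrders.push(m); }));
demoBus.publish("orders", "spaghetti");

// Without this call the handler (and everything it closes over) stays
// referenced by the bus, which is the steady memory climb described above.
requestScope.teardown();
demoBus.publish("orders", "linguine"); // no longer delivered
```

Tying every subscription to a scope with an explicit lifetime makes a missing `unsubscribe()` much easier to spot in review than scattered one-off calls.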

Codici d’a Disgrazia (Codes of Misfortune)

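The ErrorRegistry maps all system errors to a four-level severity hierarchy: BRUSCHETTA, PEPERONCINO, VESUVIO, and TERREMOTO, in ascending order of severity. Each error code follows the pattern <SUBSYSTEM>_<CONDITION>_<DETAIL>; some codes, such as KITCHEN_PANIC, carry no <DETAIL> segment. The tables below document the complete error code catalogue, organised by severity.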
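
When routing alerts it can help to split a code into its segments. The sketch below is one possible reading of the <SUBSYSTEM>_<CONDITION>_<DETAIL> pattern, not the ErrorRegistry's own parser. Because segments may themselves contain underscores, it assumes the first segment is the subsystem, the last is the detail, and everything in between is the condition. `parseErrorCode` and `ParsedErrorCode` are hypothetical names.

```typescript
interface ParsedErrorCode {
  subsystem: string;
  condition: string;
  detail: string; // empty when the code has only two segments
}

// Heuristic split (an assumption, see above): first segment = subsystem,
// last segment = detail, middle segments = condition.
function parseErrorCode(code: string): ParsedErrorCode {
  const parts = code.split("_");
  const wellFormed =
    parts.length >= 2 && parts.every((part) => /^[A-Z0-9]+$/.test(part));
  if (!wellFormed) {
    throw new Error(`Unrecognised error code: ${code}`);
  }
  const subsystem = parts[0] ?? "";
  if (parts.length === 2) {
    return { subsystem, condition: parts[1] ?? "", detail: "" };
  }
  return {
    subsystem,
    condition: parts.slice(1, -1).join("_"),
    detail: parts[parts.length - 1] ?? "",
  };
}

const queueDepthCode = parseErrorCode("GARLICBREADCAST_QUEUE_DEPTH_HIGH");
// subsystem "GARLICBREADCAST", condition "QUEUE_DEPTH", detail "HIGH"

const panicCode = parseErrorCode("KITCHEN_PANIC");
// subsystem "KITCHEN", condition "PANIC", detail ""
```

The subsystem prefix is the most reliable part of the split and is usually enough to pick the owning team. Treat the condition/detail boundary as a best guess.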

BRUSCHETTA — Informational Anomalies

BRUSCHETTA errors are worth logging but require no immediate action. They are the system clearing its throat.

Error Code | Description | Typical Cause | Action
RICETTA_FIELD_DEPRECATED | A configuration field is deprecated but still functional | Old .ricetta from a previous version | Update config at your next maintenance window
GARLICBREADCAST_DUPLICATE_MESSAGE | A message was delivered more than once (at-least-once semantics) | Network retry on acknowledgement timeout | Ensure consumers are idempotent
DISPENSA_CACHE_MISS | Item not found in local cache, falling back to primary store | Cold start or eviction under memory pressure | Normal during warmup; monitor miss rate trend
TERMOMETRO_CHECK_SLOW | A health-check probe took > 500ms to respond | Temporary I/O spike on the node | Log and watch; escalates to PEPERONCINO if sustained
LOGGER_SINK_FLUSH_DELAYED | Log sink buffer did not flush within expected window | High write throughput or slow sink | Reduce log verbosity or increase sink buffer size
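
The advice for GARLICBREADCAST_DUPLICATE_MESSAGE, "ensure consumers are idempotent", can be sketched as a consumer that remembers which message IDs it has already handled. This is an illustration under assumptions: the `BusMessage` shape, the `id` field, and `IdempotentConsumer` are hypothetical, and the real message envelope may differ.

```typescript
interface BusMessage {
  id: string; // assumed to be unique per logical message
  payload: string;
}

class IdempotentConsumer {
  private readonly processedIds = new Set<string>();
  private readonly handle: (message: BusMessage) => void;

  constructor(handle: (message: BusMessage) => void) {
    this.handle = handle;
  }

  // Returns true if the message was processed, false if it was a duplicate.
  receive(message: BusMessage): boolean {
    if (this.processedIds.has(message.id)) return false;
    this.handle(message);
    // Record the ID only after the handler returns, so a handler that
    // throws leaves the message eligible for redelivery.
    this.processedIds.add(message.id);
    return true;
  }
}

const appliedPayloads: string[] = [];
const orderConsumer = new IdempotentConsumer((message) => {
  appliedPayloads.push(message.payload);
});

// The same message arrives twice, as at-least-once delivery allows.
const firstDelivery = orderConsumer.receive({ id: "msg_abc123", payload: "add basil" });
const redelivery = orderConsumer.receive({ id: "msg_abc123", payload: "add basil" });
```

The in-memory set grows without bound and is lost on restart. A production consumer would more likely keep processed IDs in durable storage with an expiry.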

PEPERONCINO — Warnings

PEPERONCINO errors indicate degraded operation. The kitchen is still serving, but something is not right. An on-call engineer should investigate within 30 minutes.

Error Code | Description | Typical Cause | Action
NODE_RESPONSE_LATENCY_HIGH | A node’s P99 latency exceeded the warning threshold (1000ms) | GC pause, I/O saturation, or hot shard | Profile the node; consider redistributing load
GARLICBREADCAST_QUEUE_DEPTH_HIGH | Message queue depth > 1,000 messages | Consumer slower than producer | Scale consumer replicas or increase thread pool
PESTO_CONSENSUS_ELECTION_SLOW | Leader election took > 3 seconds | Network jitter between nodes | Check cross-node RTT; verify firewall permits election traffic
DISPENSA_REPLICATION_LAG | A follower is > 10 seconds behind the leader WAL | Follower overloaded or network throughput limited | Inspect follower resources; consider removing from rotation temporarily
RICETTA_SCHEMA_UNKNOWN_FIELD | Unknown field in .ricetta, possibly a typo | Mistyped configuration key | Run npx pasta config:validate to identify the offending field
SAGA_COMPENSATION_PARTIAL | A saga compensation rolled back only some steps | Transient failure during rollback | Inspect saga state; retry compensation manually
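
The numeric thresholds in this table can be collected into a single check, which is handy when building your own dashboards or alert rules. The sketch below hard-codes the thresholds documented above. The `NodeMetrics` shape and the `peperoncinoWarnings` function are hypothetical; how you obtain the metric values is left open.

```typescript
// Hypothetical metrics snapshot; field names are illustrative.
interface NodeMetrics {
  p99LatencyMs: number;
  queueDepth: number;
  electionSeconds: number;
  replicationLagSeconds: number;
}

// Returns the PEPERONCINO codes whose documented thresholds are exceeded.
function peperoncinoWarnings(metrics: NodeMetrics): string[] {
  const codes: string[] = [];
  if (metrics.p99LatencyMs > 1000) codes.push("NODE_RESPONSE_LATENCY_HIGH");
  if (metrics.queueDepth > 1000) codes.push("GARLICBREADCAST_QUEUE_DEPTH_HIGH");
  if (metrics.electionSeconds > 3) codes.push("PESTO_CONSENSUS_ELECTION_SLOW");
  if (metrics.replicationLagSeconds > 10) codes.push("DISPENSA_REPLICATION_LAG");
  return codes;
}

// Latency and queue depth taken from the napoli-03 sample earlier in this
// guide; the other two values are made up for the example.
const napoli03Warnings = peperoncinoWarnings({
  p99LatencyMs: 2400,
  queueDepth: 18442,
  electionSeconds: 1.2,
  replicationLagSeconds: 4,
});
// napoli03Warnings contains NODE_RESPONSE_LATENCY_HIGH and
// GARLICBREADCAST_QUEUE_DEPTH_HIGH
```

All comparisons are strict, so a value sitting exactly on a threshold does not raise a warning, matching the ">" wording in the table.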

VESUVIO — Critical Failures

VESUVIO errors mean the kitchen is materially impaired. Quorum may be at risk. Page on-call immediately; target resolution within 15 minutes.

Error Code | Description | Typical Cause | Action
CLUSTER_QUORUM_WARNING | Only a bare majority of nodes healthy (N+1 of 2N+1; one failure away from quorum loss) | Node crash, OOM kill, or network partition | Restore the failed node immediately; do not perform rolling restarts
GARLICBREADCAST_DEAD_LETTER_OVERFLOW | Dead-letter queue exceeded capacity; messages are being dropped | Persistent consumer failure | Fix consumer; drain dead-letter queue manually after fix
DISPENSA_SNAPSHOT_FAILED | Scheduled backup did not complete | Storage quota exceeded or I/O error on backup target | Clear storage; verify backup target connectivity; force manual snapshot
TERMOMETRO_DEPENDENCY_CRITICAL | A critical external dependency (DB, cache) is unreachable | Dependency outage or misconfigured connection string | Treat as dependency incident; Pasta Protocol will degrade gracefully until resolved
PESTO_CONSENSUS_LOG_GAP | Gap detected in the consensus log; some operations may be missing | Node rejoining after prolonged absence | Run npx pasta consensus:repair --node <name> to replay missing entries
NODE_MEMORY_CRITICAL | Node memory usage > 90% | Memory leak or unexpectedly large dataset | Force GC with npx pasta node:gc <name>; plan node restart in off-peak window
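
The quorum arithmetic behind CLUSTER_QUORUM_WARNING is worth making explicit. A majority of a cluster of n nodes is floor(n / 2) + 1, so the warning fires when the healthy count equals that majority exactly and quorum is lost when it drops below. The sketch below assumes simple majority quorum, which the "majority of nodes" wording in this guide suggests. `majority`, `quorumState`, and the "HEALTHY" label are hypothetical.

```typescript
type QuorumState = "HEALTHY" | "CLUSTER_QUORUM_WARNING" | "CLUSTER_QUORUM_LOST";

// Smallest number of nodes that forms a majority of the cluster.
function majority(clusterSize: number): number {
  return Math.floor(clusterSize / 2) + 1;
}

function quorumState(clusterSize: number, healthyNodes: number): QuorumState {
  if (clusterSize < 1 || healthyNodes < 0 || healthyNodes > clusterSize) {
    throw new RangeError("healthyNodes must be between 0 and clusterSize");
  }
  const needed = majority(clusterSize);
  if (healthyNodes < needed) return "CLUSTER_QUORUM_LOST";
  if (healthyNodes === needed) return "CLUSTER_QUORUM_WARNING"; // one failure away
  return "HEALTHY";
}

// A five-node cluster needs three healthy nodes to keep quorum.
const fiveNodeStates = [5, 4, 3, 2].map((healthy) => quorumState(5, healthy));
// ["HEALTHY", "HEALTHY", "CLUSTER_QUORUM_WARNING", "CLUSTER_QUORUM_LOST"]
```

This also shows why rolling restarts are discouraged during a quorum warning: taking one more node down, even briefly, moves the cluster from the third state to the fourth.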

TERREMOTO — Fatal Events

TERREMOTO errors mean the cluster has stopped. No reads. No writes. Niente. Execute the Disaster Recovery runbook immediately. Post-mortem is mandatory for every TERREMOTO.

Error Code | Description | Typical Cause | Action
CLUSTER_QUORUM_LOST | Fewer than a majority of nodes are responding; cluster halted | Multiple simultaneous node failures or network split-brain | Execute Disaster Recovery runbook from Step 1
CONSENSUS_LOG_CORRUPTED | The WAL is corrupted and cannot be replayed | Disk failure or incomplete shutdown | Restore from last clean backup; do not attempt manual log repair
DISPENSA_DATA_LOSS_DETECTED | Read-back of written data returns inconsistent results | Storage-level corruption or botched migration | Halt all writes immediately; engage data team; restore from backup
KITCHEN_PANIC | An unhandled exception crashed the KitchenManager process | Bug in application code or Pasta Protocol internals | Check crash dump at ~/.pasta/crash-<timestamp>.log; report to maintainers if the fault is internal to Pasta Protocol
TERREMOTO_SPLIT_BRAIN | Two nodes both believe they are the leader | Clock skew > 500ms combined with network partition | Manually fence the stale leader; consult consensus log to determine true leader
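
Because clock skew above 500ms is a named ingredient of TERREMOTO_SPLIT_BRAIN, it is worth checking skew before it matters. The sketch below assumes you already have each node's clock offset in milliseconds relative to a common reference, for example from your NTP tooling. How the offsets are collected is outside its scope. The function names and the napoli-01 and napoli-02 node names are hypothetical.

```typescript
// Tolerance documented in this guide.
const MAX_SKEW_TOLERANCE_MS = 500;

// Largest difference between any two nodes' clock offsets. With offsets
// measured against one reference, that is simply max minus min.
function maxPairwiseSkewMs(offsetsMs: Record<string, number>): number {
  const values = Object.keys(offsetsMs).map((node) => offsetsMs[node] ?? 0);
  if (values.length < 2) return 0;
  return Math.max(...values) - Math.min(...values);
}

function exceedsSkewTolerance(offsetsMs: Record<string, number>): boolean {
  return maxPairwiseSkewMs(offsetsMs) > MAX_SKEW_TOLERANCE_MS;
}

// Illustrative offsets: no single node is 500ms from the reference, yet
// napoli-02 and napoli-03 are 570ms apart from each other.
const measuredOffsets: Record<string, number> = {
  "napoli-01": 0,
  "napoli-02": -120,
  "napoli-03": 450,
};
const observedSkewMs = maxPairwiseSkewMs(measuredOffsets); // 570
const skewTooHigh = exceedsSkewTolerance(measuredOffsets); // true
```

Comparing nodes pairwise rather than each node against the reference catches the case shown here, where every node looks acceptable on its own but two of them disagree by more than the tolerance.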

Strumenti di Debug Avanzato (Advanced Debugging Tools)

For issues not covered by the diagnostic table, the following tools provide deeper inspection:

# Dump the full consensus log to inspect operation history
npx pasta consensus:dump --last 1000 --format json > consensus-dump.json
# Trace a specific message through the GarlicBreadcast pipeline
npx pasta bus:trace --message-id "msg_abc123" --verbose
# Replay WAL from a specific offset (dry-run — no writes)
npx pasta wal:replay --from-offset 88421 --dry-run
# Export Termometro health snapshot
npx pasta health:export --format prometheus > health-$(date +%s).prom

Ricordatevi: ‘a diagnosi sbagliata è peggio d’a malattia. (Remember: a wrong diagnosis is worse than the disease.)