# Observability

EntryTarget provides built-in observability via Prometheus metrics and Grafana dashboards.

## Architecture

```
Ledger Binary (:8080/metrics)
    |
    | scrapes every 10s
    v
Prometheus
    |
    | queries
    v
Grafana (dashboards & alerts)
```
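
The last hop in the diagram, Grafana querying Prometheus, can be wired up with a provisioned data source. This is a minimal sketch, assuming Grafana's file-based provisioning is used and Prometheus listens on its default port 9090. The data source name and the `<prometheus-host>` placeholder are illustrative, not values defined by EntryTarget.

```yaml
# provisioning/datasources/prometheus.yml (path relative to Grafana's provisioning directory)
apiVersion: 1
datasources:
  - name: Prometheus            # illustrative name
    type: prometheus
    access: proxy               # Grafana's backend proxies the queries
    url: http://<prometheus-host>:9090
    isDefault: true
```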

## Metrics Endpoint

The `/metrics` endpoint exposes all metrics in Prometheus text format:

```bash
curl http://<host>:8080/metrics
```

The endpoint requires no authentication, so restrict network access to it, for example with AWS Security Groups.

## Setting Up Prometheus

### Prometheus Configuration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'ledger'
    scrape_interval: 10s
    static_configs:
      - targets: ['<ledger-host>:8080']
```

### Verify Scraping

Check Prometheus targets at `http://<prometheus-host>:9090/targets`. The ledger target should show status **UP**.

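**Common issue:** If the target shows **DOWN**, check the following:

* The Prometheus container can reach the ledger host.
* Security Groups allow traffic from Prometheus to port 8080.
* If Prometheus runs in Docker locally, the scrape target is `host.docker.internal:8080` rather than `localhost:8080`, because `localhost` inside the container refers to the container itself.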
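
For the local Docker case in the last item, a Compose file along these lines runs Prometheus and Grafana next to a ledger binary on the host. This is a sketch under assumptions: the images, published ports, and file layout are illustrative, and the `extra_hosts` entry is only needed on Linux, where `host.docker.internal` is not defined by default.

```yaml
# docker-compose.yml (illustrative local setup, not an official EntryTarget file)
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      # The mounted prometheus.yml should target host.docker.internal:8080
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    extra_hosts:
      # Maps host.docker.internal to the host gateway on Linux
      - "host.docker.internal:host-gateway"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```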

## Key Dashboards

### Transaction Throughput

```promql
rate(ledger_transactions_total[1m])
```

Shows transactions per second, averaged over the last minute and including rejected ones.

### Success vs. Failure Rate

```promql
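# Committed transactions per second
rate(ledger_transactions_success_total[1m])
# Rejections for insufficient funds per second
rate(ledger_transactions_insufficient_funds_total[1m])
# Errors per second
rate(ledger_transactions_error_total[1m])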
```

### HTTP Latency (p99)

```promql
histogram_quantile(0.99, rate(ledger_http_duration_ms_bucket{path="/transaction"}[1m]))
```

### Batch Efficiency

```promql
# Average batch size
rate(ledger_batch_size_sum[1m]) / rate(ledger_batch_size_count[1m])

# Average batch duration
rate(ledger_batch_duration_ms_sum[1m]) / rate(ledger_batch_duration_ms_count[1m])

# Average commit time
rate(ledger_batch_commit_ms_sum[1m]) / rate(ledger_batch_commit_ms_count[1m])
```

### Idempotency Health

```promql
# Current record count
ledger_idempotency_records_current

# Cleanup throughput
rate(ledger_idempotency_cleanup_removed_total[5m])

# Cleanup errors (should be 0)
rate(ledger_idempotency_cleanup_errors_total[5m])
```
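
Dashboard queries like the ones above can also be precomputed as Prometheus recording rules, which keeps panels fast when the time range is long. This is a sketch, assuming a rules file that is listed under `rule_files` in `prometheus.yml`. The file name, group name, and `ledger:...` rule names are illustrative, not names defined by EntryTarget.

```yaml
# ledger-recording.rules.yml (illustrative file name)
groups:
  - name: ledger-recording
    rules:
      # Committed transactions per second
      - record: ledger:transactions_success:rate1m
        expr: rate(ledger_transactions_success_total[1m])
      # p99 HTTP latency in ms for /transaction
      - record: ledger:http_duration_ms:p99_1m
        expr: histogram_quantile(0.99, rate(ledger_http_duration_ms_bucket{path="/transaction"}[1m]))
      # Average batch size
      - record: ledger:batch_size:avg1m
        expr: rate(ledger_batch_size_sum[1m]) / rate(ledger_batch_size_count[1m])
```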

## Alerting Recommendations

| Alert                       | Condition                                                                                       | Severity |
| --------------------------- | ----------------------------------------------------------------------------------------------- | -------- |
| High error rate             | `rate(ledger_transactions_error_total[5m]) > 0.01`                                              | Critical |
| High latency                | `histogram_quantile(0.99, rate(ledger_http_duration_ms_bucket{path="/transaction"}[5m])) > 100` | Warning  |
| Idempotency cleanup failing | `increase(ledger_idempotency_cleanup_errors_total[5m]) > 0`                                     | Warning  |
| Batch commit time high      | p99 of `ledger_batch_commit_ms` above 50 ms                                                     | Warning  |
| No transactions processed   | `rate(ledger_transactions_total[5m]) == 0` during business hours                                | Info     |
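
The conditions in the table can be expressed as Prometheus alerting rules. The sketch below is one possible translation, not a shipped rule file. The alert names, the `for` durations, and the 09:00-17:00 UTC business-hours window are assumptions. The batch commit alert is left out because this page does not show whether `ledger_batch_commit_ms` exposes `_bucket` series.

```yaml
# ledger-alerts.rules.yml (illustrative file name; list it under rule_files in prometheus.yml)
groups:
  - name: ledger-alerts
    rules:
      - alert: LedgerHighErrorRate
        expr: rate(ledger_transactions_error_total[5m]) > 0.01
        for: 5m                      # assumed hold time, tune as needed
        labels:
          severity: critical
      - alert: LedgerHighLatency
        expr: histogram_quantile(0.99, rate(ledger_http_duration_ms_bucket{path="/transaction"}[5m])) > 100
        for: 5m                      # assumed hold time
        labels:
          severity: warning
      - alert: LedgerIdempotencyCleanupFailing
        expr: increase(ledger_idempotency_cleanup_errors_total[5m]) > 0
        labels:
          severity: warning
      - alert: LedgerNoTransactions
        # Business hours assumed to be 09:00-17:00; hour() evaluates in UTC
        expr: rate(ledger_transactions_total[5m]) == 0 and on() (hour() >= 9 < 17)
        for: 10m                     # assumed hold time
        labels:
          severity: info
```

The `severity` label values mirror the Severity column, so Alertmanager routes can match on them.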

## Important Notes

* **Metrics reset on restart:** counters are held in memory and return to zero when the server restarts. Prometheus keeps the samples it has already scraped, and `rate()` and `increase()` account for counter resets, so Grafana dashboards stay continuous across restarts.
* **TPS measurement:** `ledger_transactions_total` counts all transactions, including rejected ones (insufficient funds, errors). For committed-only TPS, use `ledger_transactions_success_total`.
* **Centralized monitoring:** in a multi-customer setup, run one Prometheus instance per region and have it scrape all customer instances in that region.
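
For the centralized setup, a single regional Prometheus can list each customer's ledger as a separate target group. This is a sketch, assuming static targets. The host placeholders and the `customer` label are illustrative and not part of the ledger's own metrics.

```yaml
# prometheus.yml for one region (hosts and label values are placeholders)
scrape_configs:
  - job_name: 'ledger'
    scrape_interval: 10s
    static_configs:
      - targets: ['<customer-a-ledger-host>:8080']
        labels:
          customer: 'customer-a'   # hypothetical label used to tell customers apart
      - targets: ['<customer-b-ledger-host>:8080']
        labels:
          customer: 'customer-b'
```

With a label like this attached at scrape time, dashboards can group per customer, for example `sum by (customer) (rate(ledger_transactions_total[1m]))`.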

