mirror of https://github.com/TwiN/gatus.git synced 2026-03-22 20:10:07 +00:00

feat(suite): Implement Suites (#1239)

* feat(suite): Implement Suites

Fixes #1230

* Update docs

* Fix variable alignment

* Prevent always-run endpoint from running if a context placeholder fails to resolve in the URL

* Return errors when a context placeholder path fails to resolve

* Add a couple of unit tests

* Add a couple of unit tests

* fix(ui): Update group count properly

Fixes #1233

* refactor: Pass down entire config instead of several sub-configs

* fix: Change default suite interval and timeout

* fix: Deprecate disable-monitoring-lock in favor of concurrency

* fix: Make sure there are no duplicate keys

* Refactor some code

* Update watchdog/watchdog.go

* Update web/app/src/components/StepDetailsModal.vue

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: Remove useless log

* fix: Set default concurrency to 3 instead of 5

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
TwiN
2025-09-05 15:39:12 -04:00
committed by GitHub
parent 10cabb9dde
commit d668a14703
74 changed files with 7513 additions and 652 deletions

136
README.md

@@ -45,6 +45,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
- [Configuration](#configuration)
- [Endpoints](#endpoints)
- [External Endpoints](#external-endpoints)
- [Suites (ALPHA)](#suites-alpha)
- [Conditions](#conditions)
- [Placeholders](#placeholders)
- [Functions](#functions)
@@ -122,7 +123,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
- [Monitoring an endpoint using STARTTLS](#monitoring-an-endpoint-using-starttls)
- [Monitoring an endpoint using TLS](#monitoring-an-endpoint-using-tls)
- [Monitoring domain expiration](#monitoring-domain-expiration)
- [disable-monitoring-lock](#disable-monitoring-lock)
- [Concurrency](#concurrency)
- [Reloading configuration on the fly](#reloading-configuration-on-the-fly)
- [Endpoint groups](#endpoint-groups)
- [How do I sort by group by default?](#how-do-i-sort-by-group-by-default)
@@ -247,7 +248,8 @@ If you want to test it locally, see [Docker](#docker).
| `endpoints` | [Endpoints configuration](#endpoints). | Required `[]` |
| `external-endpoints` | [External Endpoints configuration](#external-endpoints). | `[]` |
| `security` | [Security configuration](#security). | `{}` |
| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). | `false` |
| `concurrency` | Maximum number of endpoints/suites to monitor concurrently. Set to `0` for unlimited. See [Concurrency](#concurrency). | `3` |
| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). **Deprecated**: Use `concurrency: 0` instead. | `false` |
| `skip-invalid-config-update` | Whether to ignore invalid configuration update. <br />See [Reloading configuration on the fly](#reloading-configuration-on-the-fly). | `false` |
| `web` | Web configuration. | `{}` |
| `web.address` | Address to listen on. | `0.0.0.0` |
@@ -309,6 +311,8 @@ You can then configure alerts to be triggered when an endpoint is unhealthy once
| `endpoints[].ui.dont-resolve-failed-conditions` | Whether to resolve failed conditions for the UI. | `false` |
| `endpoints[].ui.badge.response-time` | List of response time thresholds. Each time a threshold is reached, the badge has a different color. | `[50, 200, 300, 500, 750]` |
| `endpoints[].extra-labels` | Extra labels to add to the metrics. Useful for grouping endpoints together. | `{}` |
| `endpoints[].always-run` | (SUITES ONLY) Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
| `endpoints[].store` | (SUITES ONLY) Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |
You may use the following placeholders in the body (`endpoints[].body`):
- `[ENDPOINT_NAME]` (resolved from `endpoints[].name`)
@@ -366,6 +370,99 @@ Where:
You must also pass the token as a `Bearer` token in the `Authorization` header.
### Suites (ALPHA)
Suites are collections of endpoints that are executed sequentially with a shared context.
This allows you to create complex monitoring scenarios where the result from one endpoint can be used in subsequent endpoints, enabling workflow-style monitoring.
Here are a few cases in which suites could be useful:
- Testing multi-step authentication flows (login -> access protected resource -> logout)
- API workflows where you need to chain requests (create resource -> update -> verify -> delete)
- Monitoring business processes that span multiple services
- Validating data consistency across multiple endpoints
| Parameter | Description | Default |
|:----------------------------------|:----------------------------------------------------------------------------------------------------|:--------------|
| `suites` | List of suites to monitor. | `[]` |
| `suites[].enabled` | Whether to monitor the suite. | `true` |
| `suites[].name` | Name of the suite. Must be unique. | Required `""` |
| `suites[].group` | Group name. Used to group multiple suites together on the dashboard. | `""` |
| `suites[].interval` | Duration to wait between suite executions. | `10m` |
| `suites[].timeout` | Maximum duration for the entire suite execution. | `5m` |
| `suites[].context` | Initial context values that can be referenced by endpoints. | `{}` |
| `suites[].endpoints` | List of endpoints to execute sequentially. | Required `[]` |
| `suites[].endpoints[].store` | Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |
| `suites[].endpoints[].always-run` | Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
**Note**: Suite-level alerts are not supported yet. Configure alerts on individual endpoints within the suite instead.
#### Using Context in Endpoints
Once values are stored in the context, they can be referenced in subsequent endpoints:
- In the URL: `https://api.example.com/users/[CONTEXT].userId`
- In headers: `Authorization: Bearer [CONTEXT].authToken`
- In the body: `{"user_id": "[CONTEXT].userId"}`
- In conditions: `[BODY].server_ip == [CONTEXT].serverIp`
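As a hedged sketch of the URL and header cases above, the suite below logs in, stores two values from the response, and reuses them in the next endpoint. The host `api.example.com`, the paths, the credentials, the `token` and `userId` response fields, and the context keys `authToken` and `userId` are all assumptions made for illustration:

```yaml
suites:
  - name: auth-flow  # hypothetical suite name
    interval: 10m
    endpoints:
      # Step 1: Log in and store values from the (assumed) JSON response
      - name: login
        url: https://api.example.com/login
        method: POST
        body: '{"username": "monitor", "password": "example"}'
        conditions:
          - "[STATUS] == 200"
        store:
          authToken: "[BODY].token"
          userId: "[BODY].userId"
      # Step 2: Reuse the stored values in the URL and in a header
      - name: get-profile
        url: https://api.example.com/users/[CONTEXT].userId
        headers:
          Authorization: "Bearer [CONTEXT].authToken"
        conditions:
          - "[STATUS] == 200"
```

Because endpoints in a suite run sequentially, `get-profile` only sees `authToken` and `userId` after `login` has stored them.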
#### Example Suite Configuration
```yaml
suites:
  - name: item-crud-workflow
    group: api-tests
    interval: 5m
    context:
      price: "19.99"  # Initial static value in context
    endpoints:
      # Step 1: Create an item and store the item ID
      - name: create-item
        url: https://api.example.com/items
        method: POST
        body: '{"name": "Test Item", "price": "[CONTEXT].price"}'
        conditions:
          - "[STATUS] == 201"
          - "len([BODY].id) > 0"
          - "[BODY].price == [CONTEXT].price"
        store:
          itemId: "[BODY].id"
        alerts:
          - type: slack
            description: "Failed to create item"
      # Step 2: Update the item using the stored item ID
      - name: update-item
        url: https://api.example.com/items/[CONTEXT].itemId
        method: PUT
        body: '{"price": "24.99"}'
        conditions:
          - "[STATUS] == 200"
        alerts:
          - type: slack
            description: "Failed to update item"
      # Step 3: Fetch the item and validate the price
      - name: get-item
        url: https://api.example.com/items/[CONTEXT].itemId
        method: GET
        conditions:
          - "[STATUS] == 200"
          - "[BODY].price == 24.99"
        alerts:
          - type: slack
            description: "Item price did not update correctly"
      # Step 4: Delete the item (always-run: true to ensure cleanup even if step 2 or 3 fails)
      - name: delete-item
        url: https://api.example.com/items/[CONTEXT].itemId
        method: DELETE
        always-run: true
        conditions:
          - "[STATUS] == 204"
        alerts:
          - type: slack
            description: "Failed to delete item"
```
The suite will be considered successful only if all required endpoints pass their conditions.
### Conditions
Here are some examples of conditions you can use:
@@ -2921,17 +3018,34 @@ endpoints:
> using the `[DOMAIN_EXPIRATION]` placeholder on an endpoint with an interval of less than `5m`.
### disable-monitoring-lock
Setting `disable-monitoring-lock` to `true` means that multiple endpoints could be monitored at the same time (i.e. parallel execution).
### Concurrency
By default, Gatus allows up to 3 endpoints/suites to be monitored concurrently. This provides a balance between performance and resource usage while maintaining accurate response time measurements.
While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted
by the evaluation of multiple endpoints at the same time, therefore, the default value for this parameter is `false`.
You can configure the concurrency level using the `concurrency` parameter:
There are three main reasons why you might want to disable the monitoring lock:
- You're using Gatus for load testing (each endpoint is periodically evaluated on a different goroutine, so
technically, if you create 100 endpoints with a 1-second interval, Gatus will send 100 requests per second)
- You have a _lot_ of endpoints to monitor
- You want to test multiple endpoints at very short intervals (< 5s)
```yaml
# Allow 10 endpoints/suites to be monitored concurrently
concurrency: 10

# Alternatively, allow unlimited concurrent monitoring
# concurrency: 0

# Or use the default concurrency (3)
# concurrency: 3
```
**Important considerations:**
- Higher concurrency can improve monitoring performance when you have many endpoints
- Conditions using the `[RESPONSE_TIME]` placeholder may be less accurate with very high concurrency due to system resource contention
- Set to `0` for unlimited concurrency (equivalent to the deprecated `disable-monitoring-lock: true`)
**Use cases for higher concurrency:**
- You have a large number of endpoints to monitor
- You want to monitor endpoints at very short intervals (< 5s)
- You're using Gatus for load testing scenarios
**Legacy configuration:**
The `disable-monitoring-lock` parameter is deprecated but still supported for backward compatibility. It's equivalent to setting `concurrency: 0`.
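For example, a configuration that relied on the deprecated parameter could be migrated roughly as follows. This is a minimal sketch showing only the relevant top-level key; the rest of the configuration is assumed to stay unchanged:

```yaml
# Before (deprecated)
# disable-monitoring-lock: true

# After (equivalent: unlimited concurrency)
concurrency: 0
```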
### Reloading configuration on the fly