mirror of
https://github.com/TwiN/gatus.git
synced 2026-03-22 20:10:07 +00:00
feat(suite): Implement Suites (#1239)
* feat(suite): Implement Suites Fixes #1230 * Update docs * Fix variable alignment * Prevent always-run endpoint from running if a context placeholder fails to resolve in the URL * Return errors when a context placeholder path fails to resolve * Add a couple of unit tests * Add a couple of unit tests * fix(ui): Update group count properly Fixes #1233 * refactor: Pass down entire config instead of several sub-configs * fix: Change default suite interval and timeout * fix: Deprecate disable-monitoring-lock in favor of concurrency * fix: Make sure there are no duplicate keys * Refactor some code * Update watchdog/watchdog.go * Update web/app/src/components/StepDetailsModal.vue Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * chore: Remove useless log * fix: Set default concurrency to 3 instead of 5 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
136
README.md
136
README.md
@@ -45,6 +45,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
|
||||
- [Configuration](#configuration)
|
||||
- [Endpoints](#endpoints)
|
||||
- [External Endpoints](#external-endpoints)
|
||||
- [Suites (ALPHA)](#suites-alpha)
|
||||
- [Conditions](#conditions)
|
||||
- [Placeholders](#placeholders)
|
||||
- [Functions](#functions)
|
||||
@@ -122,7 +123,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
|
||||
- [Monitoring an endpoint using STARTTLS](#monitoring-an-endpoint-using-starttls)
|
||||
- [Monitoring an endpoint using TLS](#monitoring-an-endpoint-using-tls)
|
||||
- [Monitoring domain expiration](#monitoring-domain-expiration)
|
||||
- [disable-monitoring-lock](#disable-monitoring-lock)
|
||||
- [Concurrency](#concurrency)
|
||||
- [Reloading configuration on the fly](#reloading-configuration-on-the-fly)
|
||||
- [Endpoint groups](#endpoint-groups)
|
||||
- [How do I sort by group by default?](#how-do-i-sort-by-group-by-default)
|
||||
@@ -247,7 +248,8 @@ If you want to test it locally, see [Docker](#docker).
|
||||
| `endpoints` | [Endpoints configuration](#endpoints). | Required `[]` |
|
||||
| `external-endpoints` | [External Endpoints configuration](#external-endpoints). | `[]` |
|
||||
| `security` | [Security configuration](#security). | `{}` |
|
||||
| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). | `false` |
|
||||
| `concurrency` | Maximum number of endpoints/suites to monitor concurrently. Set to `0` for unlimited. See [Concurrency](#concurrency). | `3` |
|
||||
| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). **Deprecated**: Use `concurrency: 0` instead. | `false` |
|
||||
| `skip-invalid-config-update` | Whether to ignore invalid configuration update. <br />See [Reloading configuration on the fly](#reloading-configuration-on-the-fly). | `false` |
|
||||
| `web` | Web configuration. | `{}` |
|
||||
| `web.address` | Address to listen on. | `0.0.0.0` |
|
||||
@@ -309,6 +311,8 @@ You can then configure alerts to be triggered when an endpoint is unhealthy once
|
||||
| `endpoints[].ui.dont-resolve-failed-conditions` | Whether to resolve failed conditions for the UI. | `false` |
|
||||
| `endpoints[].ui.badge.response-time` | List of response time thresholds. Each time a threshold is reached, the badge has a different color. | `[50, 200, 300, 500, 750]` |
|
||||
| `endpoints[].extra-labels` | Extra labels to add to the metrics. Useful for grouping endpoints together. | `{}` |
|
||||
| `endpoints[].always-run` | (SUITES ONLY) Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
|
||||
| `endpoints[].store` | (SUITES ONLY) Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |
|
||||
|
||||
You may use the following placeholders in the body (`endpoints[].body`):
|
||||
- `[ENDPOINT_NAME]` (resolved from `endpoints[].name`)
|
||||
@@ -366,6 +370,99 @@ Where:
|
||||
You must also pass the token as a `Bearer` token in the `Authorization` header.
|
||||
|
||||
|
||||
### Suites (ALPHA)
|
||||
Suites are collections of endpoints that are executed sequentially with a shared context.
|
||||
This allows you to create complex monitoring scenarios where the result from one endpoint can be used in subsequent endpoints, enabling workflow-style monitoring.
|
||||
|
||||
Here are a few cases in which suites could be useful:
|
||||
- Testing multi-step authentication flows (login -> access protected resource -> logout)
|
||||
- API workflows where you need to chain requests (create resource -> update -> verify -> delete)
|
||||
- Monitoring business processes that span multiple services
|
||||
- Validating data consistency across multiple endpoints
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|:----------------------------------|:----------------------------------------------------------------------------------------------------|:--------------|
|
||||
| `suites` | List of suites to monitor. | `[]` |
|
||||
| `suites[].enabled` | Whether to monitor the suite. | `true` |
|
||||
| `suites[].name` | Name of the suite. Must be unique. | Required `""` |
|
||||
| `suites[].group` | Group name. Used to group multiple suites together on the dashboard. | `""` |
|
||||
| `suites[].interval` | Duration to wait between suite executions. | `10m` |
|
||||
| `suites[].timeout` | Maximum duration for the entire suite execution. | `5m` |
|
||||
| `suites[].context` | Initial context values that can be referenced by endpoints. | `{}` |
|
||||
| `suites[].endpoints` | List of endpoints to execute sequentially. | Required `[]` |
|
||||
| `suites[].endpoints[].store` | Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |
|
||||
| `suites[].endpoints[].always-run` | Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
|
||||
|
||||
**Note**: Suite-level alerts are not supported yet. Configure alerts on individual endpoints within the suite instead.
|
||||
|
||||
#### Using Context in Endpoints
|
||||
Once values are stored in the context, they can be referenced in subsequent endpoints:
|
||||
- In the URL: `https://api.example.com/users/[CONTEXT].userId`
|
||||
- In headers: `Authorization: Bearer [CONTEXT].authToken`
|
||||
- In the body: `{"user_id": "[CONTEXT].userId"}`
|
||||
- In conditions: `[BODY].server_ip == [CONTEXT].serverIp`
|
||||
|
||||
#### Example Suite Configuration
|
||||
```yaml
|
||||
suites:
|
||||
- name: item-crud-workflow
|
||||
group: api-tests
|
||||
interval: 5m
|
||||
context:
|
||||
price: "19.99" # Initial static value in context
|
||||
endpoints:
|
||||
# Step 1: Create an item and store the item ID
|
||||
- name: create-item
|
||||
url: https://api.example.com/items
|
||||
method: POST
|
||||
body: '{"name": "Test Item", "price": "[CONTEXT].price"}'
|
||||
conditions:
|
||||
- "[STATUS] == 201"
|
||||
- "len([BODY].id) > 0"
|
||||
- "[BODY].price == [CONTEXT].price"
|
||||
store:
|
||||
itemId: "[BODY].id"
|
||||
alerts:
|
||||
- type: slack
|
||||
description: "Failed to create item"
|
||||
|
||||
# Step 2: Update the item using the stored item ID
|
||||
- name: update-item
|
||||
url: https://api.example.com/items/[CONTEXT].itemId
|
||||
method: PUT
|
||||
body: '{"price": "24.99"}'
|
||||
conditions:
|
||||
- "[STATUS] == 200"
|
||||
alerts:
|
||||
- type: slack
|
||||
description: "Failed to update item"
|
||||
|
||||
# Step 3: Fetch the item and validate the price
|
||||
- name: get-item
|
||||
url: https://api.example.com/items/[CONTEXT].itemId
|
||||
method: GET
|
||||
conditions:
|
||||
- "[STATUS] == 200"
|
||||
- "[BODY].price == 24.99"
|
||||
alerts:
|
||||
- type: slack
|
||||
description: "Item price did not update correctly"
|
||||
|
||||
# Step 4: Delete the item (always-run: true to ensure cleanup even if step 2 or 3 fails)
|
||||
- name: delete-item
|
||||
url: https://api.example.com/items/[CONTEXT].itemId
|
||||
method: DELETE
|
||||
always-run: true
|
||||
conditions:
|
||||
- "[STATUS] == 204"
|
||||
alerts:
|
||||
- type: slack
|
||||
description: "Failed to delete item"
|
||||
```
|
||||
|
||||
The suite will be considered successful only if all required endpoints pass their conditions.
|
||||
|
||||
|
||||
### Conditions
|
||||
Here are some examples of conditions you can use:
|
||||
|
||||
@@ -2921,17 +3018,34 @@ endpoints:
|
||||
> using the `[DOMAIN_EXPIRATION]` placeholder on an endpoint with an interval of less than `5m`.
|
||||
|
||||
|
||||
### disable-monitoring-lock
|
||||
Setting `disable-monitoring-lock` to `true` means that multiple endpoints could be monitored at the same time (i.e. parallel execution).
|
||||
### Concurrency
|
||||
By default, Gatus allows up to 5 endpoints/suites to be monitored concurrently. This provides a balance between performance and resource usage while maintaining accurate response time measurements.
|
||||
|
||||
While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted
|
||||
by the evaluation of multiple endpoints at the same time, therefore, the default value for this parameter is `false`.
|
||||
You can configure the concurrency level using the `concurrency` parameter:
|
||||
|
||||
There are three main reasons why you might want to disable the monitoring lock:
|
||||
- You're using Gatus for load testing (each endpoint are periodically evaluated on a different goroutine, so
|
||||
technically, if you create 100 endpoints with a 1 seconds interval, Gatus will send 100 requests per second)
|
||||
- You have a _lot_ of endpoints to monitor
|
||||
- You want to test multiple endpoints at very short intervals (< 5s)
|
||||
```yaml
|
||||
# Allow 10 endpoints/suites to be monitored concurrently
|
||||
concurrency: 10
|
||||
|
||||
# Allow unlimited concurrent monitoring
|
||||
concurrency: 0
|
||||
|
||||
# Use default concurrency (3)
|
||||
# concurrency: 3
|
||||
```
|
||||
|
||||
**Important considerations:**
|
||||
- Higher concurrency can improve monitoring performance when you have many endpoints
|
||||
- Conditions using the `[RESPONSE_TIME]` placeholder may be less accurate with very high concurrency due to system resource contention
|
||||
- Set to `0` for unlimited concurrency (equivalent to the deprecated `disable-monitoring-lock: true`)
|
||||
|
||||
**Use cases for higher concurrency:**
|
||||
- You have a large number of endpoints to monitor
|
||||
- You want to monitor endpoints at very short intervals (< 5s)
|
||||
- You're using Gatus for load testing scenarios
|
||||
|
||||
**Legacy configuration:**
|
||||
The `disable-monitoring-lock` parameter is deprecated but still supported for backward compatibility. It's equivalent to setting `concurrency: 0`.
|
||||
|
||||
|
||||
### Reloading configuration on the fly
|
||||
|
||||
Reference in New Issue
Block a user