feat(suite): Implement Suites (#1239)

* feat(suite): Implement Suites Fixes #1230 * Update docs * Fix variable alignment * Prevent always-run endpoint from running if a context placeholder fails to resolve in the URL * Return errors when a context placeholder path fails to resolve * Add a couple of unit tests * Add a couple of unit tests * fix(ui): Update group count properly Fixes #1233 * refactor: Pass down entire config instead of several sub-configs * fix: Change default suite interval and timeout * fix: Deprecate disable-monitoring-lock in favor of concurrency * fix: Make sure there are no duplicate keys * Refactor some code * Update watchdog/watchdog.go * Update web/app/src/components/StepDetailsModal.vue Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * chore: Remove useless log * fix: Set default concurrency to 3 instead of 5 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-22 20:10:07 +00:00 · 2025-09-05 15:39:12 -04:00
parent 10cabb9dde
commit d668a14703
74 changed files with 7513 additions and 652 deletions
--- a/README.md
+++ b/README.md
@@ -45,6 +45,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
 - [Configuration](#configuration)
  - [Endpoints](#endpoints)
  - [External Endpoints](#external-endpoints)
+  - [Suites (ALPHA)](#suites-alpha)
  - [Conditions](#conditions)
    - [Placeholders](#placeholders)
    - [Functions](#functions)
@@ -122,7 +123,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
  - [Monitoring an endpoint using STARTTLS](#monitoring-an-endpoint-using-starttls)
  - [Monitoring an endpoint using TLS](#monitoring-an-endpoint-using-tls)
  - [Monitoring domain expiration](#monitoring-domain-expiration)
-  - [disable-monitoring-lock](#disable-monitoring-lock)
+  - [Concurrency](#concurrency)
  - [Reloading configuration on the fly](#reloading-configuration-on-the-fly)
  - [Endpoint groups](#endpoint-groups)
  - [How do I sort by group by default?](#how-do-i-sort-by-group-by-default)
@@ -247,7 +248,8 @@ If you want to test it locally, see [Docker](#docker).
 | `endpoints`                  | [Endpoints configuration](#endpoints).                                                                                                   | Required `[]`              |
 | `external-endpoints`         | [External Endpoints configuration](#external-endpoints).                                                                                 | `[]`                       |
 | `security`                   | [Security configuration](#security).                                                                                                     | `{}`                       |
-| `disable-monitoring-lock`    | Whether to [disable the monitoring lock](#disable-monitoring-lock).                                                                      | `false`                    |
+| `concurrency`                | Maximum number of endpoints/suites to monitor concurrently. Set to `0` for unlimited. See [Concurrency](#concurrency).                   | `3`                        |
+| `disable-monitoring-lock`    | Whether to [disable the monitoring lock](#disable-monitoring-lock). **Deprecated**: Use `concurrency: 0` instead.                        | `false`                    |
 | `skip-invalid-config-update` | Whether to ignore invalid configuration update. <br />See [Reloading configuration on the fly](#reloading-configuration-on-the-fly).     | `false`                    |
 | `web`                        | Web configuration.                                                                                                                       | `{}`                       |
 | `web.address`                | Address to listen on.                                                                                                                    | `0.0.0.0`                  |
@@ -309,6 +311,8 @@ You can then configure alerts to be triggered when an endpoint is unhealthy once
 | `endpoints[].ui.dont-resolve-failed-conditions` | Whether to resolve failed conditions for the UI.                                                                                            | `false`                    |
 | `endpoints[].ui.badge.response-time`            | List of response time thresholds. Each time a threshold is reached, the badge has a different color.                                        | `[50, 200, 300, 500, 750]` |
 | `endpoints[].extra-labels`                      | Extra labels to add to the metrics. Useful for grouping endpoints together.                                                                 | `{}`                       |
+| `endpoints[].always-run`                        | (SUITES ONLY) Whether to execute this endpoint even if previous endpoints in the suite failed.                                              | `false`                    |
+| `endpoints[].store`                             | (SUITES ONLY) Map of values to extract from the response and store in the suite context (stored even on failure).                           | `{}`                       |

 You may use the following placeholders in the body (`endpoints[].body`):
 - `[ENDPOINT_NAME]` (resolved from `endpoints[].name`)
@@ -366,6 +370,99 @@ Where:
 You must also pass the token as a `Bearer` token in the `Authorization` header.


+### Suites (ALPHA)
+Suites are collections of endpoints that are executed sequentially with a shared context. 
+This allows you to create complex monitoring scenarios where the result from one endpoint can be used in subsequent endpoints, enabling workflow-style monitoring.
+
+Here are a few cases in which suites could be useful:
+- Testing multi-step authentication flows (login -> access protected resource -> logout)
+- API workflows where you need to chain requests (create resource -> update -> verify -> delete)
+- Monitoring business processes that span multiple services
+- Validating data consistency across multiple endpoints
+
+| Parameter                         | Description                                                                                         | Default       |
+|:----------------------------------|:----------------------------------------------------------------------------------------------------|:--------------|
+| `suites`                          | List of suites to monitor.                                                                          | `[]`          |
+| `suites[].enabled`                | Whether to monitor the suite.                                                                       | `true`        |
+| `suites[].name`                   | Name of the suite. Must be unique.                                                                  | Required `""` |
+| `suites[].group`                  | Group name. Used to group multiple suites together on the dashboard.                                | `""`          |
+| `suites[].interval`               | Duration to wait between suite executions.                                                          | `10m`         |
+| `suites[].timeout`                | Maximum duration for the entire suite execution.                                                    | `5m`          |
+| `suites[].context`                | Initial context values that can be referenced by endpoints.                                         | `{}`          |
+| `suites[].endpoints`              | List of endpoints to execute sequentially.                                                          | Required `[]` |
+| `suites[].endpoints[].store`      | Map of values to extract from the response and store in the suite context (stored even on failure). | `{}`          |
+| `suites[].endpoints[].always-run` | Whether to execute this endpoint even if previous endpoints in the suite failed.                    | `false`       |
+
+**Note**: Suite-level alerts are not supported yet. Configure alerts on individual endpoints within the suite instead.
+
+#### Using Context in Endpoints
+Once values are stored in the context, they can be referenced in subsequent endpoints:
+- In the URL: `https://api.example.com/users/[CONTEXT].userId`
+- In headers: `Authorization: Bearer [CONTEXT].authToken`
+- In the body: `{"user_id": "[CONTEXT].userId"}`
+- In conditions: `[BODY].server_ip == [CONTEXT].serverIp`
+
+#### Example Suite Configuration
+```yaml
+suites:
+  - name: item-crud-workflow
+    group: api-tests
+    interval: 5m
+    context:
+      price: "19.99"  # Initial static value in context
+    endpoints:
+      # Step 1: Create an item and store the item ID 
+      - name: create-item
+        url: https://api.example.com/items
+        method: POST
+        body: '{"name": "Test Item", "price": "[CONTEXT].price"}'
+        conditions:
+          - "[STATUS] == 201"
+          - "len([BODY].id) > 0"
+          - "[BODY].price == [CONTEXT].price"
+        store:
+          itemId: "[BODY].id"
+        alerts:
+          - type: slack
+            description: "Failed to create item"
+            
+      # Step 2: Update the item using the stored item ID
+      - name: update-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: PUT
+        body: '{"price": "24.99"}'
+        conditions:
+          - "[STATUS] == 200"
+        alerts:
+          - type: slack
+            description: "Failed to update item"
+        
+      # Step 3: Fetch the item and validate the price
+      - name: get-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: GET
+        conditions:
+          - "[STATUS] == 200"
+          - "[BODY].price == 24.99"
+        alerts:
+          - type: slack
+            description: "Item price did not update correctly"
+            
+      # Step 4: Delete the item (always-run: true to ensure cleanup even if step 2 or 3 fails)
+      - name: delete-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: DELETE
+        always-run: true
+        conditions:
+          - "[STATUS] == 204"
+        alerts:
+          - type: slack
+            description: "Failed to delete item"
+```
+
+The suite will be considered successful only if all required endpoints pass their conditions.
+
+
 ### Conditions
 Here are some examples of conditions you can use:

@@ -2921,17 +3018,34 @@ endpoints:
 > using the `[DOMAIN_EXPIRATION]` placeholder on an endpoint with an interval of less than `5m`.


-### disable-monitoring-lock
-Setting `disable-monitoring-lock` to `true` means that multiple endpoints could be monitored at the same time (i.e. parallel execution).
+### Concurrency
+By default, Gatus allows up to 5 endpoints/suites to be monitored concurrently. This provides a balance between performance and resource usage while maintaining accurate response time measurements.

-While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted
-by the evaluation of multiple endpoints at the same time, therefore, the default value for this parameter is `false`.
+You can configure the concurrency level using the `concurrency` parameter:

-There are three main reasons why you might want to disable the monitoring lock:
- You're using Gatus for load testing (each endpoint are periodically evaluated on a different goroutine, so
-technically, if you create 100 endpoints with a 1 seconds interval, Gatus will send 100 requests per second)
- You have a _lot_ of endpoints to monitor
- You want to test multiple endpoints at very short intervals (< 5s)
+```yaml
+# Allow 10 endpoints/suites to be monitored concurrently
+concurrency: 10
+
+# Allow unlimited concurrent monitoring
+concurrency: 0
+
+# Use default concurrency (3)
+# concurrency: 3
+```
+
+**Important considerations:**
+- Higher concurrency can improve monitoring performance when you have many endpoints
+- Conditions using the `[RESPONSE_TIME]` placeholder may be less accurate with very high concurrency due to system resource contention
+- Set to `0` for unlimited concurrency (equivalent to the deprecated `disable-monitoring-lock: true`)
+
+**Use cases for higher concurrency:**
+- You have a large number of endpoints to monitor
+- You want to monitor endpoints at very short intervals (< 5s)  
+- You're using Gatus for load testing scenarios
+
+**Legacy configuration:**
+The `disable-monitoring-lock` parameter is deprecated but still supported for backward compatibility. It's equivalent to setting `concurrency: 0`.


 ### Reloading configuration on the fly