GHSA-23j4-mw76-5v7h
MEDIUM
Scrapy allows redirect following in protocols other than HTTP
Description
Impact
Scrapy was following redirects regardless of the URL protocol, so redirects were working for data://, file://, ftp://, s3://, and any other scheme defined in the DOWNLOAD_HANDLERS setting.
However, HTTP redirects should only work between URLs that use the http:// or https:// schemes.
A malicious actor, given write access to the start requests (e.g. ability to define start_urls) of a spider and read access to the spider output, could exploit this vulnerability to:
- Redirect to any local file using the file:// scheme to read its contents.
- Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.
- Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.
For file:// and s3://, how the spider implements its parsing of input data into an output item determines what data would be vulnerable. A spider that always outputs the entire contents of a response would be completely vulnerable, while a spider that extracted only fragments from the response could significantly limit vulnerable data.
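The rule described above, that a redirect target must use http:// or https://, can be sketched with the standard library alone. This is a minimal illustration, not the code shipped in Scrapy 2.11.2; the function name `is_allowed_redirect` and the constant `ALLOWED_REDIRECT_SCHEMES` are hypothetical names chosen for this example.

```python
from urllib.parse import urljoin, urlsplit

# Assumption for this sketch: only these schemes are valid redirect targets.
ALLOWED_REDIRECT_SCHEMES = {"http", "https"}


def is_allowed_redirect(request_url, location):
    """Return True if following `location` from `request_url` stays on HTTP(S).

    `location` may be relative, so it is resolved against the request URL
    before its scheme is inspected.
    """
    target = urljoin(request_url, location)
    return urlsplit(target).scheme in ALLOWED_REDIRECT_SCHEMES


if __name__ == "__main__":
    base = "http://example.com/start"
    for location in ("/next", "file:///etc/passwd", "ftp://evil.example/x", "s3://bucket/key"):
        print(location, is_allowed_redirect(base, location))
```

Resolving the location first matters because a relative `Location` value inherits the scheme of the original request, while an absolute one such as `file:///etc/passwd` replaces it entirely.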
Patches
Upgrade to Scrapy 2.11.2.
Workarounds
Replace the built-in redirect middlewares (RedirectMiddleware and MetaRefreshMiddleware) with custom ones that implement the fix from Scrapy 2.11.2, and verify that they work as intended.
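Swapping the middlewares is done through the DOWNLOADER_MIDDLEWARES setting, where mapping a built-in class path to None disables it. The fragment below is a sketch of what that could look like in a project's settings.py. The `myproject.middlewares` module and the `SafeRedirectMiddleware` and `SafeMetaRefreshMiddleware` class names are hypothetical placeholders for your own implementations, and the priorities 600 and 580 are assumed to mirror the defaults of the middlewares they replace; check them against DOWNLOADER_MIDDLEWARES_BASE for your Scrapy version.

```python
# settings.py (fragment)
DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in redirect middlewares.
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": None,
    "scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware": None,
    # Hypothetical replacements that refuse non-HTTP(S) redirect targets.
    # Priorities are assumed to match the built-in defaults.
    "myproject.middlewares.SafeRedirectMiddleware": 600,
    "myproject.middlewares.SafeMetaRefreshMiddleware": 580,
}
```

Both built-in classes need to be replaced, since either one on its own can still follow a redirect to a non-HTTP scheme.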
References
This security issue was reported by @mvsantos at https://github.com/scrapy/scrapy/issues/457.
Affected Packages
| Ecosystem | Package | Vulnerable range | Fix |
|---|---|---|---|
| 🐍 PyPI | scrapy | all versions before 2.11.2 | 2.11.2 |
Detection & mitigation playbook
Open-source dependency
Detect
Scan your Python dependency manifests and lockfiles (requirements.txt, poetry.lock, Pipfile.lock, etc.) for scrapy. O3's reachability analysis confirms whether the vulnerable code path is actually invoked in your application, so you act on real exposure instead of every transitive match.
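As a rough first pass before running a full scanner, a requirements file can be checked for exact scrapy pins below 2.11.2. The sketch below is an assumption-laden illustration: it only understands simple `scrapy==X.Y.Z` lines with purely numeric versions, it ignores ranges, extras, pre-release tags, and transitive dependencies, and the names `find_vulnerable_pins`, `is_vulnerable`, and `FIXED` are made up for this example.

```python
import re

# First patched release, per the advisory.
FIXED = (2, 11, 2)

# Matches exact pins such as "scrapy==2.11.1" (case-insensitive).
# The lookahead rejects versions with non-numeric suffixes like "2.11.2rc1".
PIN_RE = re.compile(r"^\s*scrapy\s*==\s*(\d+(?:\.\d+)*)(?![\w.])", re.IGNORECASE)


def parse_version(text):
    """Turn '2.11.1' into (2, 11, 1)."""
    return tuple(int(part) for part in text.split("."))


def is_vulnerable(version):
    """Compare against FIXED after padding both tuples to equal length."""
    width = max(len(version), len(FIXED))
    padded = version + (0,) * (width - len(version))
    fixed = FIXED + (0,) * (width - len(FIXED))
    return padded < fixed


def find_vulnerable_pins(requirements_text):
    """Return (line_number, version) pairs for scrapy pins below FIXED."""
    hits = []
    for line_no, line in enumerate(requirements_text.splitlines(), start=1):
        match = PIN_RE.match(line)
        if match and is_vulnerable(parse_version(match.group(1))):
            hits.append((line_no, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = "requests==2.31.0\nScrapy==2.11.1\n"
    print(find_vulnerable_pins(sample))
```

A clean result from a check like this does not prove the project is unaffected, because an unpinned or range-constrained requirement can still resolve to a vulnerable release.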
Fix
Update scrapy to 2.11.2 or later, then make sure no transitive (indirect) dependency still pins the vulnerable range — O3 confirms GHSA-23j4-mw76-5v7h is resolved across your whole dependency graph.
Workarounds
If you can't upgrade right away: gate or disable the affected feature, validate untrusted input at the boundary, and avoid passing attacker-controlled data into the vulnerable path. O3's runtime protection blocks exploitation in production as an interim safeguard until the upgrade lands.
How O3 protects you
O3 pinpoints whether GHSA-23j4-mw76-5v7h is reachable in your code and exactly where to fix it, then blocks exploitation in production at runtime until the patched version is deployed.
Tailored to GHSA-23j4-mw76-5v7h. Runtime protection reduces exposure until a permanent patch is applied and verified — it complements patching, it doesn't replace it.
Frequently Asked Questions
Is GHSA-23j4-mw76-5v7h in your dependencies?
O3 detects GHSA-23j4-mw76-5v7h across PyPI dependencies and uses function-level reachability to confirm whether the vulnerable code path is actually reachable — not just present. No false positives.