🐍 PyPI

GHSA-23j4-mw76-5v7h

MEDIUM

Scrapy allows redirect following in protocols other than HTTP

Published: May 14, 2024
Updated: Nov 28, 2024
Affected: 1 pkg
Patched: 1 / 1
Exploits: None indexed

Blast Radius

1 pkg affected
🐍scrapy


Description

Impact

Scrapy was following redirects regardless of the URL protocol, so redirects were working for data://, file://, ftp://, s3://, and any other scheme defined in the DOWNLOAD_HANDLERS setting.

However, HTTP redirects should only work between URLs that use the http:// or https:// schemes.
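
The rule above can be sketched with the standard library alone. This is a minimal illustration, not the code shipped in Scrapy; the function name `is_allowed_redirect` and the constant `ALLOWED_REDIRECT_SCHEMES` are hypothetical names chosen for this example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical constant: the only schemes an HTTP redirect may lead to.
ALLOWED_REDIRECT_SCHEMES = {"http", "https"}


def is_allowed_redirect(request_url: str, location: str) -> bool:
    """Return True only if the redirect target resolves to http:// or https://.

    `location` is the raw redirect target (for example a Location header
    value), which may be relative, so it is resolved against the URL of the
    request that produced it before the scheme is inspected.
    """
    target = urljoin(request_url, location)
    return urlsplit(target).scheme.lower() in ALLOWED_REDIRECT_SCHEMES


if __name__ == "__main__":
    base = "https://example.com/start"
    for location in ("/next", "file:///etc/passwd", "ftp://ftp.example.org/x"):
        print(location, is_allowed_redirect(base, location))
```

A redirect middleware that applied a check like this would return the original response unchanged, instead of following the redirect, whenever the function returns False.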

A malicious actor, given write access to the start requests (e.g. ability to define start_urls) of a spider and read access to the spider output, could exploit this vulnerability to:

  • Redirect to any local file using the file:// scheme to read its contents.
  • Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.
  • Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.

For file:// and s3://, how the spider implements its parsing of input data into an output item determines what data would be vulnerable. A spider that always outputs the entire contents of a response would be completely vulnerable, while a spider that extracted only fragments from the response could significantly limit vulnerable data.

Patches

Upgrade to Scrapy 2.11.2.

Workarounds

Replace the built-in redirect middlewares (RedirectMiddleware and MetaRefreshMiddleware) with custom ones that implement the fix from Scrapy 2.11.2, and verify that they work as intended.
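
As a rough sketch of how such a replacement could be wired into a project, the settings fragment below disables the two built-in middlewares and registers custom ones at the same priorities. The `myproject.middlewares.Patched...` paths are hypothetical placeholders for classes you would have to write and test yourself. The `DOWNLOAD_HANDLERS` part is an optional extra, not part of the advisory, and assumes your spiders never need to fetch data://, file://, ftp://, or s3:// URLs directly.

```python
# settings.py (sketch; the "myproject" paths are hypothetical placeholders)

DOWNLOADER_MIDDLEWARES = {
    # Setting a built-in middleware to None disables it.
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": None,
    "scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware": None,
    # Custom replacements, registered at the priorities the built-ins use
    # by default (600 and 580).
    "myproject.middlewares.PatchedRedirectMiddleware": 600,
    "myproject.middlewares.PatchedMetaRefreshMiddleware": 580,
}

# Optional hardening (an assumption, not taken from the advisory):
# assigning None to a scheme disables its download handler, so a redirect
# to that scheme can no longer be downloaded at all.
DOWNLOAD_HANDLERS = {
    "data": None,
    "file": None,
    "ftp": None,
    "s3": None,
}
```

Disabling handlers is a blunt measure: it also blocks legitimate requests to those schemes, so it only fits projects that crawl over HTTP and HTTPS exclusively.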

References

This security issue was reported by @mvsantos at https://github.com/scrapy/scrapy/issues/457.

Affected Packages

1 total 1 fixed
Ecosystem | Package | Vulnerable range | Fix
🐍 PyPI | scrapy | all versions before 2.11.2 | 2.11.2

Detection & mitigation playbook

Open-source dependency
  1. Detect

    Scan your Python dependency files (requirements.txt, poetry.lock, Pipfile.lock, etc.) for scrapy. O3's reachability analysis confirms whether the vulnerable code path is actually invoked in your application, so you act on real exposure instead of every transitive match.

  2. Fix

    Update scrapy to 2.11.2 or later, then make sure no transitive (indirect) dependency still pins the vulnerable range — O3 confirms GHSA-23j4-mw76-5v7h is resolved across your whole dependency graph.

  3. Workarounds

    If you can't upgrade right away: apply the middleware replacement described under Workarounds above, do not let untrusted parties define start requests (such as start_urls), and limit who can read the spider output. O3's runtime protection blocks exploitation in production as an interim safeguard until the upgrade lands.

  4. How O3 protects you

    O3 pinpoints whether GHSA-23j4-mw76-5v7h is reachable in your code and exactly where to fix it, then blocks exploitation in production at runtime until the patched version is deployed.
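
For a quick manual check alongside the Detect and Fix steps, the sketch below flags exact scrapy pins older than 2.11.2 in requirements-style text. It is a simplified illustration with hypothetical helper names (`parse_version`, `vulnerable_pins`): it only understands `==` pins with purely numeric versions, and it ignores extras, version ranges, pre-release suffixes, and transitive dependencies, so it does not replace a real dependency scanner.

```python
import re
from typing import List, Tuple

# First patched release according to the advisory.
FIXED_VERSION = (2, 11, 2)

# Matches exact pins such as "scrapy==2.11.1" or "Scrapy == 2.8",
# but not other packages such as "scrapy-splash==0.9.0".
_PIN = re.compile(r"^\s*scrapy\s*==\s*(\d+(?:\.\d+)*)", re.IGNORECASE)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version like '2.11.1' into (2, 11, 1)."""
    return tuple(int(part) for part in text.split("."))


def vulnerable_pins(requirements_text: str) -> List[str]:
    """Return the lines that pin scrapy to a version below FIXED_VERSION."""
    hits = []
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match and parse_version(match.group(1)) < FIXED_VERSION:
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = "requests==2.31.0\nScrapy==2.11.1\nscrapy-splash==0.9.0\n"
    print(vulnerable_pins(sample))
```

Tuple comparison treats a shorter version such as 2.11 as older than 2.11.2, which matches the intended reading of 2.11 as 2.11.0.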

Tailored to GHSA-23j4-mw76-5v7h. Runtime protection reduces exposure until a permanent patch is applied and verified — it complements patching, it doesn't replace it.
