How to Set Up SOCKS5 Proxy in Python for Web Scraping
Deepesh Kalur
Expert Contributor
To use SOCKS5 proxies in Python, install `requests[socks]` for synchronous scraping or `aiohttp-socks` for async. Always use `socks5h://` (not `socks5://`) to route DNS through the proxy and prevent location leaks. For production, implement connection pooling, retry logic, and concurrency limits.
Last winter, I was scraping pricing data from a major e-commerce platform for a client. The job was straightforward: 50,000 product pages, rotating proxies, standard requests library. I used HTTP proxies from a reputable provider, set up rotation, and kicked off the job overnight. At 3 AM, my monitoring dashboard lit up with 403 Forbidden errors. By morning, 80% of my proxy pool was burned.
The problem wasn't the proxies. It was the protocol. The target site had upgraded its bot detection to analyze TCP connection patterns and TLS fingerprints at the network layer. HTTP proxies, which operate at Layer 7 (Application), were leaving unmistakable signatures: consistent header ordering, predictable connection pooling behavior, and DNS resolution leaks that exposed my real location.
I switched to SOCKS5 proxies and rewrote the scraper. The job completed with a 94% success rate.
This guide is what I wish I had that night.
What SOCKS5 Actually Is (And Why It Matters)
Most tutorials describe SOCKS5 as "a proxy that handles any traffic type." That's technically true but practically useless. To understand why SOCKS5 matters for scraping, you need to understand where it sits in the network stack.
The OSI Layer Perspective
HTTP proxies operate at Layer 7 (Application). They understand HTTP. They parse headers, can cache responses, modify content, and inspect URLs. This intelligence is useful for browsing but deadly for scraping because it introduces patterns that bot detection systems recognize.
SOCKS5 operates at Layer 5 (Session). It doesn't parse HTTP headers. It doesn't care about request methods, content types, or cache headers. It simply establishes a TCP tunnel between your client and the destination server, then gets out of the way.
This architectural difference has three critical implications for scrapers:
- Lower overhead: SOCKS5 adds approximately 10 bytes of protocol overhead per request versus 280+ bytes for HTTP proxies (headers, connection management, authentication negotiation).
- No traffic inspection: Because SOCKS5 doesn't interpret your HTTP traffic, it can't accidentally modify headers or introduce telltale proxy signatures.
- Protocol agnostic: SOCKS5 handles TCP and UDP traffic. If you need to scrape WebSocket endpoints, streaming APIs, or non-HTTP services, SOCKS5 works without configuration changes.
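The tunnel setup described above can be sketched at the byte level. The following is a minimal illustration, assuming the message layout from RFC 1928 (the SOCKS5 specification); the function names `build_greeting` and `build_connect_request` are hypothetical helpers, not part of any library. It shows that the client sends only a short greeting and a CONNECT request once per TCP connection, and that the request can carry a hostname rather than an IP.

```python
import struct

SOCKS_VERSION = 0x05
NO_AUTH = 0x00
USERNAME_PASSWORD = 0x02
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03  # address type: fully qualified domain name


def build_greeting(methods=(NO_AUTH, USERNAME_PASSWORD)) -> bytes:
    """Client greeting: VER, NMETHODS, then one byte per offered auth method."""
    return bytes([SOCKS_VERSION, len(methods), *methods])


def build_connect_request(hostname: str, port: int) -> bytes:
    """CONNECT request carrying the hostname, so the proxy performs the DNS lookup."""
    host = hostname.encode("idna")
    if len(host) > 255:
        raise ValueError("hostname too long for a SOCKS5 request")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(host)])
    return header + host + struct.pack("!H", port)  # port is big-endian, 2 bytes


greeting = build_greeting()
request = build_connect_request("httpbin.org", 443)
print(greeting.hex(), request.hex())
```

After the proxy replies to these two messages, everything else on the connection is your own traffic, passed through untouched.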
SOCKS5 vs HTTP Proxy: Performance Reality
The performance differences aren't theoretical. In sustained scraping sessions (1000+ sequential requests), SOCKS5 proxies are typically 25-35% faster than HTTP alternatives:
| Metric | SOCKS5 | HTTP Proxy | HTTPS Proxy |
|---|---|---|---|
| Connection overhead | ~10 bytes | ~280 bytes | ~280 bytes + TLS |
| Typical handshake time | 30-40ms | 40-60ms | 130-220ms |
| Authentication | Once per connection | Per request or pooled | Per request or pooled |
| Connection reuse | Native TCP keepalive | HTTP Keep-Alive | HTTP Keep-Alive + TLS |
| DNS resolution | Via proxy (with socks5h://) | Client-side by default | Client-side by default |
The HTTPS proxy penalty is especially painful. Every connection requires two TLS handshakes: client→proxy and proxy→target. Poorly implemented HTTPS proxies that don't cache SSL sessions add 60-70ms per request. Over 50,000 requests, that's nearly an hour of unnecessary latency.
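The "nearly an hour" figure can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes the requests run sequentially, so the per-request penalty simply adds up; `added_latency_minutes` is a hypothetical helper name.

```python
def added_latency_minutes(request_count: int, per_request_ms: float) -> float:
    """Total extra wall-clock time, in minutes, for sequential requests."""
    return request_count * per_request_ms / 1000 / 60


low = added_latency_minutes(50_000, 60)
high = added_latency_minutes(50_000, 70)
print(f"{low:.1f} to {high:.1f} minutes of added latency")
```

With concurrency the wall-clock cost shrinks, but the total time spent waiting on handshakes across all workers stays the same.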
Python Setup: requests with SOCKS5
The requests library doesn't support SOCKS5 out of the box. You need PySocks, which urllib3 (the library underneath requests) uses to open connections through SOCKS proxies.
Installation
pip install requests[socks]
This installs requests plus PySocks (sometimes packaged as socks). The [socks] extra is critical—without it, you'll get InvalidSchema errors when passing socks5:// URLs.
Basic Configuration
```python
import requests

# NEVER use socks5:// without the 'h' for scraping: DNS leaks expose your location
proxies = {
    "http": "socks5h://username:password@proxy.example.com:1080",
    "https": "socks5h://username:password@proxy.example.com:1080",
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=10,
)
print(response.json())
```
Critical Detail: socks5:// vs socks5h://
This is the mistake that burned me on that winter scraping job.
- `socks5://` resolves DNS locally, then sends the IP to the proxy. Your DNS queries leak your real location and can be logged by your ISP or network admin.
- `socks5h://` resolves DNS through the proxy server. The destination hostname is sent to the proxy, which performs the DNS lookup from its location.
For scraping geo-targeted content or maintaining anonymity, always use socks5h://. If your proxy provider's documentation only shows socks5://, add the h yourself.
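If provider documentation hands you `socks5://` URLs, the scheme swap can be automated. The sketch below uses only the standard library; `force_remote_dns` is a hypothetical helper name, and it assumes the URL is otherwise well formed.

```python
from urllib.parse import urlsplit, urlunsplit


def force_remote_dns(proxy_url: str) -> str:
    """Return the proxy URL with the socks5h:// scheme, leaving the rest untouched."""
    parts = urlsplit(proxy_url)
    if parts.scheme == "socks5":
        parts = parts._replace(scheme="socks5h")
    elif parts.scheme != "socks5h":
        raise ValueError(f"not a SOCKS5 proxy URL: {parts.scheme!r}")
    return urlunsplit(parts)


print(force_remote_dns("socks5://username:password@proxy.example.com:1080"))
```

Running every configured proxy URL through a check like this at startup catches the mistake before any request is sent.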
Production Pattern: Session with Retry Logic
For real scraping jobs, never use requests.get() in a loop. Use requests.Session() with transport adapters for connection pooling and implement retry logic for transient failures:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def create_session_with_proxy(proxy_url: str, max_retries: int = 3) -> requests.Session:
    """Create a requests session with SOCKS5 proxy and retry logic."""
    session = requests.Session()

    # Configure retries for transient failures
    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=1,  # Exponential backoff; exact delays depend on the urllib3 version
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["HEAD", "GET", "OPTIONS"],  # Needs urllib3 >= 1.26
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    # Set proxy for both protocols
    session.proxies = {
        "http": proxy_url,
        "https": proxy_url,
    }

    # Headers that don't scream "bot"
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # Add "br" only if a brotli package is installed, or bodies may not decode
        "Accept-Encoding": "gzip, deflate",
        "DNT": "1",
        "Connection": "keep-alive",
    })
    return session


# Usage
proxy = "socks5h://user:pass@gate.snowpad.io:9999"
session = create_session_with_proxy(proxy)
try:
    resp = session.get("https://httpbin.org/ip", timeout=15)
    print(f"Status: {resp.status_code}")
    print(f"IP: {resp.json().get('origin')}")
except requests.exceptions.ProxyError as e:
    print(f"Proxy connection failed: {e}")
except requests.exceptions.Timeout:
    print("Request timed out: proxy may be slow or dead")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
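To reason about how long a retry sequence can stall a worker, it helps to compute the delays up front. The sketch below is a generic capped exponential backoff, not urllib3's internal formula (whose exact schedule varies by version); `backoff_delays` and its parameters are hypothetical names chosen for illustration.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, capped, plus optional jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Random extra wait so many workers do not retry in lockstep
            delay += rng.uniform(0, jitter)
        delays.append(delay)
    return delays


print(backoff_delays(3))
print(sum(backoff_delays(6, cap=10.0)))
```

Summing the list gives the worst-case time a single URL can hold a worker before it is reported as failed.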
Proxy Rotation with requests
For large scraping jobs, rotate proxies to distribute load and avoid rate limits:
```python
from typing import List

import requests


class RotatingProxySession:
    """Session manager that rotates SOCKS5 proxies per request."""

    def __init__(self, proxy_list: List[str]):
        self.proxies = proxy_list
        self.sessions = [self._create_session(p) for p in proxy_list]
        self.current_index = 0

    def _create_session(self, proxy_url: str) -> requests.Session:
        session = requests.Session()
        session.proxies = {"http": proxy_url, "https": proxy_url}
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })
        return session

    def get(self, url: str, **kwargs):
        """Rotate to next proxy and make request."""
        kwargs.setdefault("timeout", 15)  # requests has no default timeout
        session = self.sessions[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.sessions)
        return session.get(url, **kwargs)


# Usage with Snowpad rotating proxy endpoint
proxies = [
    "socks5h://user1:pass1@gate.snowpad.io:9999",
    "socks5h://user2:pass2@gate.snowpad.io:9999",
]
rotator = RotatingProxySession(proxies)
for i in range(10):
    resp = rotator.get("https://httpbin.org/ip")
    print(f"Request {i+1}: {resp.json().get('origin')}")
```
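Plain round-robin keeps sending traffic to proxies that have already died. One possible extension, sketched here with hypothetical names (`ProxyPool`, `max_failures`), is to track consecutive failures and skip proxies that exceed a threshold. It only chooses which URL to use next and makes no network calls, so it can sit in front of the session list shown above.

```python
from typing import Dict, List


class ProxyPool:
    """Round-robin selection that skips proxies with too many consecutive failures."""

    def __init__(self, proxy_urls: List[str], max_failures: int = 3):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._proxies = list(proxy_urls)
        self._failures: Dict[str, int] = {p: 0 for p in self._proxies}
        self._max_failures = max_failures
        self._index = 0

    def next_proxy(self) -> str:
        """Return the next healthy proxy, or raise if every proxy is benched."""
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._index]
            self._index = (self._index + 1) % len(self._proxies)
            if self._failures[proxy] < self._max_failures:
                return proxy
        raise RuntimeError("all proxies have exceeded the failure threshold")

    def report_failure(self, proxy: str) -> None:
        self._failures[proxy] += 1

    def report_success(self, proxy: str) -> None:
        self._failures[proxy] = 0  # A success clears the consecutive-failure count


pool = ProxyPool(["socks5h://p1.example:1080", "socks5h://p2.example:1080"], max_failures=1)
pool.report_failure("socks5h://p2.example:1080")
print(pool.next_proxy(), pool.next_proxy())
```

Call `report_failure` from the `except` branches of your request code and `report_success` after a good response.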
Async Scraping with aiohttp + SOCKS5
For high-volume scraping (1000+ requests/minute), synchronous requests becomes a bottleneck. Each request blocks until completion. With async I/O, you can have thousands of concurrent connections.
Installation
pip install aiohttp aiohttp-socks
The aiohttp-socks package provides ProxyConnector, which handles SOCKS5 negotiation asynchronously. Without it, aiohttp cannot route traffic through SOCKS proxies.
Basic Async Configuration
```python
import asyncio

import aiohttp
from aiohttp_socks import ProxyConnector


async def fetch_with_socks5(url: str, proxy_url: str):
    """Fetch URL through SOCKS5 proxy using aiohttp."""
    # ProxyConnector handles the SOCKS5 handshake asynchronously;
    # rdns=True asks the proxy to resolve the hostname
    connector = ProxyConnector.from_url(proxy_url, rdns=True)
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=15)) as response:
            return await response.json()


# Usage
# Note: with aiohttp-socks, remote DNS is controlled by the rdns argument
# rather than by an "h" suffix on the URL scheme
proxy = "socks5://user:pass@gate.snowpad.io:9999"


async def main():
    result = await fetch_with_socks5("https://httpbin.org/ip", proxy)
    print(result)


asyncio.run(main())
```
Critical: Enabling Remote DNS (rDNS) in aiohttp
Unlike requests, where the `socks5h://` scheme selects DNS resolution via the proxy, aiohttp-socks controls this through the connector's `rdns` parameter:
```python
from aiohttp_socks import ProxyConnector, ProxyType

# Method 1: Using from_url with rdns parameter
connector = ProxyConnector.from_url(
    "socks5://user:pass@proxy.example.com:1080",
    rdns=True,  # CRITICAL: resolves DNS through proxy
)

# Method 2: Manual construction for more control
connector = ProxyConnector(
    proxy_type=ProxyType.SOCKS5,
    host="proxy.example.com",
    port=1080,
    username="user",
    password="pass",
    rdns=True,
)
```
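With `rdns=False`, the hostname is resolved locally before the proxy is contacted. Recent aiohttp-socks releases are reported to default to remote resolution for SOCKS5, but passing `rdns=True` explicitly removes any dependence on the installed version. For scraping jobs targeting geo-restricted content, local resolution leaks your real location and can cause inconsistent results.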
Production Pattern: Concurrent Scraping with Semaphores
Raw concurrency without limits will overwhelm targets and get you blocked. Use asyncio.Semaphore to limit concurrent connections:
```python
import asyncio
from typing import Dict, List

import aiohttp
from aiohttp_socks import ProxyConnector


class AsyncSOCKS5Scraper:
    """Production async scraper with SOCKS5 proxy rotation and concurrency limits."""

    def __init__(self, proxy_list: List[str], max_concurrent: int = 10):
        self.proxy_list = proxy_list
        self.max_concurrent = max_concurrent
        self.semaphore = None  # Created inside the running event loop
        self.results: List[Dict] = []

    def _get_connector(self, proxy_url: str):
        """Create SOCKS5 connector with rDNS enabled."""
        return ProxyConnector.from_url(proxy_url, rdns=True)

    async def _fetch_one(self, session: aiohttp.ClientSession, url: str) -> Dict:
        """Fetch single URL with error handling."""
        async with self.semaphore:
            try:
                async with session.get(
                    url,
                    timeout=aiohttp.ClientTimeout(total=20),
                ) as response:
                    return {
                        "url": url,
                        "status": response.status,
                        "data": await response.text(),
                    }
            except asyncio.TimeoutError:
                return {"url": url, "status": "timeout", "data": None}
            except aiohttp.ClientError as e:
                return {"url": url, "status": "error", "data": None, "error": str(e)}

    async def scrape_urls(self, urls: List[str]) -> List[Dict]:
        """Scrape multiple URLs with proxy rotation."""
        # Creating the semaphore here binds it to the loop that runs the tasks,
        # which matters on Python versions before 3.10
        self.semaphore = asyncio.Semaphore(self.max_concurrent)
        # Create one session per proxy, rotate through them
        sessions = [
            aiohttp.ClientSession(connector=self._get_connector(proxy))
            for proxy in self.proxy_list
        ]
        try:
            tasks = [
                self._fetch_one(sessions[i % len(sessions)], url)
                for i, url in enumerate(urls)
            ]
            self.results = await asyncio.gather(*tasks, return_exceptions=True)
            return self.results
        finally:
            # CRITICAL: Always close sessions to avoid connection leaks
            await asyncio.gather(*[s.close() for s in sessions])


# Usage
proxies = [
    "socks5://user1:pass1@gate.snowpad.io:9999",
    "socks5://user2:pass2@gate.snowpad.io:9999",
]
urls = [f"https://httpbin.org/ip?id={i}" for i in range(20)]
scraper = AsyncSOCKS5Scraper(proxies, max_concurrent=5)
results = asyncio.run(scraper.scrape_urls(urls))
for r in results:
    if isinstance(r, Exception):
        # Errors not caught in _fetch_one (for example proxy handshake failures)
        print(f"failed: {r!r}")
    else:
        print(f"{r['url']}: {r['status']}")
```
Key Design Decisions Explained
- One session per proxy: aiohttp's `ClientSession` is designed to be long-lived and reuse connections. Creating a session per proxy allows connection pooling within each proxy tunnel.
- Semaphore over raw gather(): Without semaphores, `asyncio.gather(*tasks)` launches all tasks simultaneously. For 1000 URLs, that's 1000 concurrent TCP connections, enough to trigger rate limits or exhaust file descriptors.
- Explicit session cleanup: The `finally` block ensures sessions close even if exceptions occur. Unclosed sessions leak TCP connections and can cause `Too many open files` errors.
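The effect of the semaphore can be observed without any network access. The sketch below replaces the HTTP call with `asyncio.sleep` and records the highest number of tasks that were inside the guarded section at once; `run_bounded` and `fake_fetch` are hypothetical names used only for this demonstration.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> int:
    """Run n_tasks simulated fetches and return the peak number in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_fetch(_task_id: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # Stand-in for the real request
            in_flight -= 1

    await asyncio.gather(*(fake_fetch(i) for i in range(n_tasks)))
    return peak


peak = asyncio.run(run_bounded(n_tasks=50, limit=5))
print(f"peak concurrency: {peak}")
```

All 50 coroutines are created immediately, but only `limit` of them ever hold a connection at the same time.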
Selenium and Playwright SOCKS5 Configuration
For JavaScript-heavy sites that require browser automation, you need to configure SOCKS5 at the browser level.
Selenium (Chrome)
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By


def create_selenium_with_socks5(proxy_host: str, proxy_port: int):
    """Create Chrome WebDriver with SOCKS5 proxy."""
    chrome_options = Options()

    # SOCKS5 proxy configuration via Chrome command-line flags
    # Format: --proxy-server=socks5://host:port
    chrome_options.add_argument(f"--proxy-server=socks5://{proxy_host}:{proxy_port}")

    # Restrict WebRTC to proxied routes so STUN/TURN traffic cannot reveal the real IP.
    # This is CRITICAL for anonymity: WebRTC can bypass proxies
    chrome_options.add_argument("--force-webrtc-ip-handling-policy=disable_non_proxied_udp")

    # Stability flags for containers and CI (these are not anti-detection measures)
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    # Hides the automation hint exposed to page scripts
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")

    # For authenticated proxies, Selenium doesn't support inline credentials
    # in socks5:// URLs. Use a proxy extension or authenticate via proxy API.
    return webdriver.Chrome(options=chrome_options)


# Usage
driver = create_selenium_with_socks5("gate.snowpad.io", 9999)
try:
    driver.get("https://httpbin.org/ip")
    print(driver.find_element(By.TAG_NAME, "body").text)
finally:
    driver.quit()  # Always release the browser, even if the page load fails
```
Playwright (Python)
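Playwright accepts a SOCKS5 proxy directly in its launch options and is generally preferred over Selenium for modern scraping. One caveat: Chromium does not implement SOCKS5 username/password authentication, and Playwright is reported to reject credentials on `socks5://` servers, so authenticated endpoints need a workaround such as IP allowlisting or a local forwarding proxy: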
```python
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright


def scrape_with_playwright_socks5(proxy_url: str, target_url: str):
    """Scrape using Playwright with SOCKS5 proxy."""
    # Parse proxy URL for Playwright format
    parsed = urlparse(proxy_url)
    proxy = {"server": f"{parsed.scheme}://{parsed.hostname}:{parsed.port}"}
    # Pass credentials only when present. Chromium does not implement SOCKS5
    # username/password auth, so this may be rejected for socks5:// servers.
    if parsed.username:
        proxy["username"] = parsed.username
        proxy["password"] = parsed.password or ""

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy, headless=True)
        try:
            # Create context with realistic viewport and locale
            context = browser.new_context(
                viewport={"width": 1920, "height": 1080},
                locale="en-US",
                timezone_id="America/New_York",
            )
            page = context.new_page()
            # The browser-level proxy applies to all page traffic
            page.goto(target_url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


# Usage with Snowpad SOCKS5 proxy
# Assumption: the endpoint accepts your IP without credentials (see note above)
proxy = "socks5://gate.snowpad.io:9999"
html = scrape_with_playwright_socks5(proxy, "https://httpbin.org/ip")
print(html)
```
Playwright vs Selenium for SOCKS5 Scraping
| Feature | Playwright | Selenium |
|---|---|---|
| SOCKS5 authentication | Not with Chromium at the time of writing; needs IP allowlisting or a local forwarder | No inline credentials; needs a workaround |
| Connection speed | Faster (no Wire protocol overhead) | Slower |
| WebSocket proxying | Automatic | Manual configuration |
| Stealth | Better out-of-the-box | Requires plugins |
| Resource usage | Lower | Higher |
For new projects, use Playwright. For legacy Selenium codebases, consider migration if SOCKS5 reliability is critical.
Testing and Debugging Your SOCKS5 Connection
Before running production scraping jobs, verify your SOCKS5 proxy works correctly. Here's my debugging checklist:
1. Verify Proxy IP with curl
# Test SOCKS5 proxy (replace with your credentials)
curl --proxy "socks5h://user:pass@gate.snowpad.io:9999" "https://httpbin.org/ip"
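If this returns your real IP instead of the proxy IP, the request is not being routed through the proxy. A DNS leak would not show up here: it only changes where the hostname lookup happens, not which IP the target sees.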
2. Check DNS Resolution
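```bash
# Compare DNS resolution with and without proxy
# Without proxy: this query goes to your locally configured resolver
dig +short httpbin.org
# With socks5h://, curl hands the hostname to the proxy instead of resolving it.
# Verbose output (-v) typically notes whether the name was resolved locally or remotely.
curl -v --proxy "socks5h://user:pass@gate.snowpad.io:9999" "https://httpbin.org/ip"
```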
3. Test with Python Verification Script
```python
import requests


def verify_socks5_proxy(proxy_url: str) -> bool:
    """Basic SOCKS5 proxy verification."""
    proxies = {"http": proxy_url, "https": proxy_url}
    print(f"Testing proxy: {proxy_url}")
    print("=" * 50)

    # Test 1: Basic connectivity; the exit IP must differ from the direct IP
    try:
        resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
        proxy_ip = resp.json().get("origin")
        real_ip = requests.get("https://httpbin.org/ip", timeout=10).json().get("origin")
        print(f"✓ Proxy IP: {proxy_ip}")
        print(f"✓ Response time: {resp.elapsed.total_seconds():.2f}s")
        if proxy_ip == real_ip:
            print("✗ Proxy IP matches your direct IP; traffic is not being proxied")
            return False
    except requests.exceptions.RequestException as e:
        print(f"✗ Connection failed: {e}")
        return False

    # Test 2: DNS routing. A client-side script cannot observe where the lookup
    # happened, so this only checks that the scheme asks for remote resolution.
    if proxy_url.startswith("socks5h://"):
        print("✓ Scheme requests DNS resolution through the proxy")
    else:
        print("⚠ Scheme is not socks5h://; hostnames will be resolved locally")

    # Test 3: HTTPS support
    try:
        resp = requests.get("https://httpbin.org/get", proxies=proxies, timeout=10)
        print(f"✓ HTTPS support: OK ({resp.status_code})")
    except requests.exceptions.RequestException as e:
        print(f"✗ HTTPS failed: {e}")

    print("=" * 50)
    return True


# Run verification
verify_socks5_proxy("socks5h://user:pass@gate.snowpad.io:9999")
```
4. Check for WebRTC Leaks (Browser Automation)
When using Selenium/Playwright, WebRTC can bypass SOCKS5 proxies and expose your real IP:
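```python
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright


def check_webrtc_leak(proxy_url: str, real_ip: str):
    """Load a WebRTC test page through the proxy and look for real_ip in it."""
    parsed = urlparse(proxy_url)
    proxy = {"server": f"{parsed.scheme}://{parsed.hostname}:{parsed.port}"}
    if parsed.username:  # Credentials may be rejected for socks5:// in Chromium
        proxy["username"] = parsed.username
        proxy["password"] = parsed.password or ""

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy)
        try:
            page = browser.new_page()
            page.goto("https://browserleaks.com/webrtc")
            page.wait_for_timeout(3000)  # Give the page's WebRTC probe time to run
            # Check if real IP is exposed
            content = page.content()
            if real_ip in content:
                print("⚠ Potential WebRTC leak detected!")
            else:
                print("✓ No WebRTC leak")
        finally:
            browser.close()
```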
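A leak check is only as good as its comparison step: a test page is full of text, and what matters is whether your real public IP appears in it. The helper below is a minimal standard-library sketch of that comparison for IPv4 only; `public_ipv4s` and `is_leaking` are hypothetical names, and the sample text is made up for illustration.

```python
import ipaddress
import re
from typing import Set

_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_ipv4s(text: str) -> Set[str]:
    """Return the globally routable IPv4 addresses that appear in text."""
    found = set()
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # Looked like an IP but is not valid (e.g. 999.1.1.1)
        if address.is_global:
            found.add(candidate)
    return found


def is_leaking(page_text: str, real_ip: str) -> bool:
    """True if the real public IP shows up anywhere in the page text."""
    return real_ip in public_ipv4s(page_text)


sample = "Local: 192.168.1.10, Public: 8.8.8.8, junk: 999.1.1.1"
print(public_ipv4s(sample), is_leaking(sample, "8.8.8.8"))
```

Filtering out private ranges matters because WebRTC pages routinely list LAN addresses, which are not a location leak by themselves.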
Performance Optimization Tips
After switching to SOCKS5, optimize these areas for maximum throughput:
1. Connection Pooling
SOCKS5 authenticates once per TCP connection, then reuses it. Ensure your HTTP client enables keep-alive:
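```python
import aiohttp
import requests

# requests: Session automatically pools connections
session = requests.Session()

# aiohttp: Connector limits control pooling. ProxyConnector from aiohttp-socks
# subclasses TCPConnector, so it accepts the same keyword arguments.
# Create connectors inside a running coroutine.
connector = aiohttp.TCPConnector(
    limit=100,          # Total concurrent connections
    limit_per_host=10,  # Per-host connections
    enable_cleanup_closed=True,
    force_close=False,  # Allow connection reuse
)
```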
2. Timeouts: Aggressive but Not Brutal
Dead proxies kill scraping speed. Use short timeouts with retry logic:
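```python
import aiohttp

# total caps the whole request; connect and sock_read bound individual phases
timeout = aiohttp.ClientTimeout(
    total=15,      # Absolute maximum
    connect=5,     # Acquiring a connection, including TCP + SOCKS5 handshake
    sock_read=10,  # Maximum wait between reads from the server
)
```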
3. DNS Caching
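For repeated requests to the same domains, caching DNS results avoids repeated lookups. Be aware that resolving with `socket.gethostbyname` happens on your own machine, not at the proxy, so it reintroduces the local DNS queries that `socks5h://` is meant to avoid. Use it only when DNS privacy and geo-consistent resolution do not matter for the job: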
```python
import socket
from functools import lru_cache


@lru_cache(maxsize=256)
def cached_resolve(hostname: str) -> str:
    """Cache local DNS resolutions (these lookups do NOT go through the proxy)."""
    return socket.gethostbyname(hostname)

# Use in aiohttp with resolved IP
# Note: for HTTPS the client still needs the original hostname for SNI and
# certificate checks, and the Host header must carry the hostname, not the IP
```
4. HTTP/2 When Possible
HTTP/2 multiplexes multiple requests over a single TCP connection, reducing SOCKS5 handshake overhead:
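```python
# httpx supports HTTP/2 with SOCKS5
# Install the extras first: pip install "httpx[http2,socks]"
import httpx

client = httpx.Client(
    proxy="socks5://user:pass@proxy:1080",  # Older httpx releases used `proxies=`
    http2=True,
)
```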
Common Errors and How to Fix Them
Error: InvalidSchema: No connection adapters were found for 'socks5h://...'
Cause: PySocks isn't installed. Fix:
pip install requests[socks]
Error: ProxyError: [Errno 407] Proxy Authentication Required
Cause: Credentials are wrong or not being passed correctly. In async libraries (httpx, aiohttp), SOCKS5 authentication sometimes fails with URL-encoded credentials. Fix: Pass credentials explicitly rather than in the URL:
```python
from aiohttp_socks import ProxyConnector, ProxyType

# aiohttp-socks: use ProxyConnector constructor
connector = ProxyConnector(
    proxy_type=ProxyType.SOCKS5,
    host="proxy.example.com",
    port=1080,
    username="user",
    password="pass",
)
```
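If your proxy list is stored as URLs, the explicit fields can be derived rather than typed by hand. The sketch below uses only the standard library and percent-decodes the credentials, which is where special characters such as `@` or `:` in passwords tend to go wrong; `split_proxy_url` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url: str) -> dict:
    """Split a proxy URL into host, port, username and password (percent-decoded)."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL needs an explicit host and port")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


print(split_proxy_url("socks5://user:p%40ss%3Aword@proxy.example.com:1080"))
```

The returned keys match the keyword names used in the `ProxyConnector` constructor call above, so the dictionary can be unpacked into it alongside `proxy_type`.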
Error: 0x05: Connection refused
Cause: SOCKS5 proxy connected but the target server refused the TCP connection. Common causes:
- Target server is down or blocking the proxy IP
- Wrong port (e.g., trying HTTPS on port 80)
- Firewall blocking outbound connections from proxy
Fix: Test the target URL directly without proxy to confirm it's accessible.
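The `0x05` in this error is the reply code the proxy sends back after a CONNECT request. For reference when reading library error messages, the table below lists the reply codes defined in RFC 1928; the `describe_reply` helper is a hypothetical convenience, not part of any SOCKS library.

```python
# Reply codes from the SOCKS5 specification (RFC 1928, section 6)
SOCKS5_REPLIES = {
    0x00: "succeeded",
    0x01: "general SOCKS server failure",
    0x02: "connection not allowed by ruleset",
    0x03: "network unreachable",
    0x04: "host unreachable",
    0x05: "connection refused",
    0x06: "TTL expired",
    0x07: "command not supported",
    0x08: "address type not supported",
}


def describe_reply(code: int) -> str:
    """Translate a SOCKS5 reply byte into a readable message."""
    return SOCKS5_REPLIES.get(code, f"unassigned reply code 0x{code:02x}")


print(describe_reply(0x05))
```

Codes `0x02` through `0x05` describe the path between the proxy and the target, so retrying through a different proxy is often more useful than retrying the same one.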
Error: SOCKS5 DNS resolution failed
Cause: Using socks5:// instead of socks5h:// in requests, or missing rdns=True in aiohttp.
Fix: Always use socks5h:// for requests. For aiohttp, pass rdns=True to ProxyConnector.
Error: SSL: CERTIFICATE_VERIFY_FAILED
Cause: SOCKS5 proxy is intercepting TLS traffic (MITM proxy) with a self-signed certificate. Fix: Either trust the proxy's CA certificate or disable verification (insecure, only for testing):
# requests
resp = requests.get(url, proxies=proxies, verify=False)
# aiohttp
connector = ProxyConnector.from_url(proxy, rdns=True, ssl=False)
Error: Too many open files
Cause: Not closing aiohttp sessions or creating too many sessions without limits. Fix: Reuse sessions and explicitly close them:
async with aiohttp.ClientSession(connector=connector) as session:
# Use session
pass # Auto-closes here
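It can also help to size the concurrency limit from the process's actual file descriptor budget instead of guessing. The sketch below assumes a Unix-like system (the `resource` module is not available on Windows); `suggested_max_concurrency`, `reserve`, and `ceiling` are hypothetical names, and the reserve of 64 descriptors for logs, stdio, and DNS is an arbitrary assumption.

```python
import resource  # Unix only


def suggested_max_concurrency(reserve: int = 64, ceiling: int = 1000) -> int:
    """Derive a connection cap from the soft limit on open file descriptors."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return ceiling
    # Keep some descriptors free for everything that is not a proxy connection
    return max(1, min(ceiling, soft_limit - reserve))


print(suggested_max_concurrency())
```

The result is a reasonable upper bound for a semaphore or a connector `limit`; the target site's rate limits will usually call for a much lower number.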
FAQ
Q: Can I use SOCKS5 with HTTP/2?
A: Yes, but only with clients that support both. httpx with http2=True works well with SOCKS5 proxies. requests doesn't support HTTP/2, and aiohttp does not implement HTTP/2 either at the time of writing. The benefit is reduced connection overhead: HTTP/2 multiplexing means fewer SOCKS5 handshakes.
Q: Why does my SOCKS5 proxy work with curl but not Python?
A: curl handles SOCKS5 authentication more flexibly. Python libraries have stricter requirements: (1) Ensure PySocks is installed for requests, (2) Use aiohttp-socks for aiohttp—not the built-in proxy support, (3) For authenticated proxies in aiohttp, use ProxyConnector constructor instead of URL-encoded credentials, (4) Double-check you're using socks5h:// not socks5://.
Q: How do I rotate SOCKS5 proxies without connection leaks?
A: Use a session pool rather than creating new sessions per request. Maintain persistent sessions for each proxy and rotate at the request level. In aiohttp, close sessions in a finally block or use async with context managers. Set limit_per_host on connectors to prevent file descriptor exhaustion.
Q: Should I use SOCKS5 or HTTP proxies for web scraping?
A: For pure HTTP/HTTPS scraping of standard websites, HTTP proxies are fine and often cheaper. Use SOCKS5 when: (1) You need UDP support or long-lived non-HTTP streams (WebSockets themselves run over TCP and work through either proxy type), (2) You want lower protocol overhead for high-volume scraping, (3) You're scraping non-HTTP services, (4) The target has sophisticated bot detection analyzing TCP/TLS patterns, or (5) You need consistent behavior across different traffic types. SOCKS5's session-layer neutrality makes it harder to fingerprint.