How to Set Up SOCKS5 Proxy in Python for Web Scraping

Last winter, I was scraping pricing data from a major e-commerce platform for a client. The job was straightforward: 50,000 product pages, rotating proxies, standard requests library. I used HTTP proxies from a reputable provider, set up rotation, and kicked off the job overnight. At 3 AM, my monitoring dashboard lit up with 403 Forbidden errors. By morning, 80% of my proxy pool was burned.

The problem wasn't the proxies. It was the protocol. The target site had upgraded its bot detection to analyze TCP connection patterns and TLS fingerprints at the network layer. HTTP proxies, which operate at Layer 7 (Application), were leaving unmistakable signatures: consistent header ordering, predictable connection pooling behavior, and DNS resolution leaks that exposed my real location.

I switched to SOCKS5 proxies and rewrote the scraper. The job completed with a 94% success rate.

This guide is what I wish I had that night.

What SOCKS5 Actually Is (And Why It Matters)

Most tutorials describe SOCKS5 as "a proxy that handles any traffic type." That's technically true but practically useless. To understand why SOCKS5 matters for scraping, you need to understand where it sits in the network stack.

The OSI Layer Perspective

HTTP proxies operate at Layer 7 (Application). They understand HTTP. They parse headers, can cache responses, modify content, and inspect URLs. This intelligence is useful for browsing but deadly for scraping because it introduces patterns that bot detection systems recognize.

SOCKS5 operates at Layer 5 (Session). It doesn't parse HTTP headers. It doesn't care about request methods, content types, or cache headers. It simply establishes a TCP tunnel between your client and the destination server, then gets out of the way.

This architectural difference has three critical implications for scrapers:

Lower overhead: a SOCKS5 connect request is roughly 10 bytes (for an IPv4 target) and is sent once per connection, versus 280+ bytes per request for HTTP proxies (headers, connection management, authentication negotiation).
No traffic inspection: Because SOCKS5 doesn't interpret your HTTP traffic, it can't accidentally modify headers or introduce telltale proxy signatures.
Protocol agnostic: SOCKS5 handles TCP and UDP traffic. If you need to scrape WebSocket endpoints, streaming APIs, or non-HTTP services, SOCKS5 works without configuration changes.
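To make the "roughly 10 bytes" figure concrete, here is a minimal sketch that builds the CONNECT request a SOCKS5 client sends after authentication, following the layout in RFC 1928. The function name and the sample addresses are illustrative assumptions, and IPv6 targets are left out for brevity.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03

def build_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request: VER, CMD, RSV, ATYP, address, port."""
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00])
    try:
        # Literal IPv4 address: 1 type byte + 4 address bytes
        addr = bytes([ATYP_IPV4]) + ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # Hostname: 1 type byte + 1 length byte + the name itself.
        # This is what a socks5h:// client sends so the proxy does the lookup.
        name = host.encode("idna")
        addr = bytes([ATYP_DOMAIN, len(name)]) + name
    return header + addr + struct.pack("!H", port)

ipv4_request = build_connect_request("93.184.216.34", 443)
domain_request = build_connect_request("example.com", 443)
print(len(ipv4_request))    # 10
print(len(domain_request))  # 18
```

After this exchange the proxy only relays bytes, which is why nothing in the HTTP request itself is touched.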

SOCKS5 vs HTTP Proxy: Performance Reality

The performance differences aren't theoretical. In sustained scraping sessions (1000+ sequential requests), SOCKS5 proxies are typically 25-35% faster than HTTP alternatives:

Metric	SOCKS5	HTTP Proxy	HTTPS Proxy
Connection overhead	~10 bytes	~280 bytes	~280 bytes + TLS
Typical handshake time	30-40ms	40-60ms	130-220ms
Authentication	Once per connection	Per request or pooled	Per request or pooled
Connection reuse	Native TCP keepalive	HTTP Keep-Alive	HTTP Keep-Alive + TLS
DNS resolution	Via proxy (with `socks5h://`)	Via proxy (target hostname is sent to the proxy)	Via proxy (target hostname is sent to the proxy)

The HTTPS proxy penalty is especially painful. Every connection requires two TLS handshakes: client→proxy and proxy→target. Poorly implemented HTTPS proxies that don't cache SSL sessions add 60-70ms per request. Over 50,000 requests, that's nearly an hour of unnecessary latency.
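The "nearly an hour" figure can be checked with a few lines of arithmetic. This sketch assumes the requests run one after another, so the per-request penalty simply adds up; the helper name is illustrative.

```python
def added_minutes(total_requests: int, extra_ms_per_request: float) -> float:
    """Total added latency in minutes for sequential requests."""
    return total_requests * extra_ms_per_request / 1000 / 60

for extra_ms in (60, 70):
    minutes = added_minutes(50_000, extra_ms)
    print(f"{extra_ms} ms x 50,000 requests = {minutes:.1f} minutes")
```

With concurrency the wall-clock cost shrinks, but the total time spent waiting on handshakes stays the same.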

Python Setup: requests with SOCKS5

The requests library doesn't support SOCKS5 out of the box. You need PySocks, which requests (through urllib3) uses to open connections via SOCKS proxies.

Installation

pip install "requests[socks]"

This installs requests plus PySocks (imported in Python as socks). The quotes keep shells such as zsh from interpreting the brackets. The [socks] extra is critical: without it, requests raises an InvalidSchema error when you pass socks5:// proxy URLs.
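Before starting a long job, it can help to confirm that the dependency is actually present in the active environment. A minimal sketch using only the standard library; the function name is an illustrative assumption.

```python
import importlib.util

def socks_support_available() -> bool:
    """Return True if the PySocks module (imported as `socks`) can be found."""
    return importlib.util.find_spec("socks") is not None

if not socks_support_available():
    print('PySocks is missing; run: pip install "requests[socks]"')
```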

Basic Configuration

import requests

# NEVER use socks5:// without the 'h' for scraping—DNS leaks expose your location
proxies = {
    "http": "socks5h://username:password@proxy.example.com:1080",
    "https": "socks5h://username:password@proxy.example.com:1080"
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=10
)

print(response.json())

Critical Detail: socks5:// vs socks5h://

This is the mistake that burned me on that winter scraping job.

socks5:// resolves DNS locally, then sends the IP to the proxy. Your DNS queries leak your real location and can be logged by your ISP or network admin.
socks5h:// resolves DNS through the proxy server. The destination hostname is sent to the proxy, which performs the DNS lookup from its location.

For scraping geo-targeted content or maintaining anonymity, always use socks5h://. If your proxy provider's documentation only shows socks5://, add the h yourself.
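If proxy URLs come from a provider dashboard or a config file, a small helper can apply that rule automatically. This is a sketch with a hypothetical function name; it only rewrites the scheme and leaves every other URL untouched.

```python
def ensure_remote_dns(proxy_url: str) -> str:
    """Rewrite socks5:// to socks5h:// so hostnames are resolved by the proxy."""
    prefix = "socks5://"
    if proxy_url.lower().startswith(prefix):
        return "socks5h://" + proxy_url[len(prefix):]
    return proxy_url

print(ensure_remote_dns("socks5://username:password@proxy.example.com:1080"))
# socks5h://username:password@proxy.example.com:1080
```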

Production Pattern: Session with Retry Logic

For real scraping jobs, never use requests.get() in a loop. Use requests.Session() with transport adapters for connection pooling and implement retry logic for transient failures:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_proxy(proxy_url: str, max_retries: int = 3) -> requests.Session:
    """Create a requests session with SOCKS5 proxy and retry logic."""
    session = requests.Session()
    
    # Configure retries for transient failures
    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=1,  # Wait 1s, 2s, 4s between retries
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["HEAD", "GET", "OPTIONS"]
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    
    # Set proxy for both protocols
    session.proxies = {
        "http": proxy_url,
        "https": proxy_url
    }
    
    # Headers that don't scream "bot"
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # Advertise "br" only if a brotli package is installed; otherwise
        # requests cannot decode brotli-compressed responses
        "Accept-Encoding": "gzip, deflate",
        "DNT": "1",
        "Connection": "keep-alive",
    })
    
    return session

# Usage
proxy = "socks5h://user:pass@gate.snowpad.io:9999"
session = create_session_with_proxy(proxy)

try:
    resp = session.get("https://httpbin.org/ip", timeout=15)
    print(f"Status: {resp.status_code}")
    print(f"IP: {resp.json().get('origin')}")
except requests.exceptions.ProxyError as e:
    print(f"Proxy connection failed: {e}")
except requests.exceptions.Timeout:
    print("Request timed out—proxy may be slow or dead")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Proxy Rotation with requests

For large scraping jobs, rotate proxies to distribute load and avoid rate limits:

import random
import requests
from typing import List

class RotatingProxySession:
    """Session manager that rotates SOCKS5 proxies per request."""
    
    def __init__(self, proxy_list: List[str]):
        self.proxies = proxy_list
        self.sessions = [self._create_session(p) for p in proxy_list]
        self.current_index = 0
    
    def _create_session(self, proxy_url: str) -> requests.Session:
        session = requests.Session()
        session.proxies = {"http": proxy_url, "https": proxy_url}
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })
        return session
    
    def get(self, url: str, **kwargs):
        """Rotate to next proxy and make request."""
        session = self.sessions[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.sessions)
        return session.get(url, **kwargs)

# Usage with Snowpad rotating proxy endpoint
proxies = [
    "socks5h://user1:pass1@gate.snowpad.io:9999",
    "socks5h://user2:pass2@gate.snowpad.io:9999",
]

rotator = RotatingProxySession(proxies)
for i in range(10):
    # Always set a timeout so one dead proxy cannot stall the loop
    resp = rotator.get("https://httpbin.org/ip", timeout=15)
    print(f"Request {i+1}: {resp.json().get('origin')}")
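Round-robin rotation keeps sending traffic to proxies that have already died. One possible extension, sketched here with hypothetical names (ProxyPool, mark_failed, mark_ok), tracks consecutive failures and skips proxies that exceed a limit. It contains only the selection logic, so it can wrap whichever HTTP client you use.

```python
from typing import Dict, List

class ProxyPool:
    """Round-robin over proxy URLs, skipping any that failed too often."""

    def __init__(self, proxy_urls: List[str], max_failures: int = 3):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._urls = list(proxy_urls)
        self._failures: Dict[str, int] = {url: 0 for url in self._urls}
        self._max_failures = max_failures
        self._index = 0

    def next_proxy(self) -> str:
        """Return the next healthy proxy, or raise if none are left."""
        for _ in range(len(self._urls)):
            url = self._urls[self._index]
            self._index = (self._index + 1) % len(self._urls)
            if self._failures[url] < self._max_failures:
                return url
        raise RuntimeError("all proxies have exceeded the failure limit")

    def mark_failed(self, url: str) -> None:
        """Record one more consecutive failure for this proxy."""
        self._failures[url] += 1

    def mark_ok(self, url: str) -> None:
        """Reset the failure count after a successful request."""
        self._failures[url] = 0

pool = ProxyPool(
    ["socks5h://a.example:1080", "socks5h://b.example:1080"],
    max_failures=1,
)
first = pool.next_proxy()
pool.mark_failed(first)       # pretend the request through `first` failed
print(pool.next_proxy())      # socks5h://b.example:1080
```

Call mark_failed in the except branch of your request code and mark_ok after a good response.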

Async Scraping with aiohttp + SOCKS5

For high-volume scraping (1000+ requests/minute), synchronous requests becomes a bottleneck. Each request blocks until completion. With async I/O, you can have thousands of concurrent connections.

Installation

pip install aiohttp aiohttp-socks

The aiohttp-socks package provides ProxyConnector, which handles SOCKS5 negotiation asynchronously. Without it, aiohttp cannot route traffic through SOCKS proxies.

Basic Async Configuration

import aiohttp
from aiohttp_socks import ProxyConnector
import asyncio

async def fetch_with_socks5(url: str, proxy_url: str):
    """Fetch URL through SOCKS5 proxy using aiohttp."""
    
    # ProxyConnector handles SOCKS5 handshake asynchronously
    connector = ProxyConnector.from_url(proxy_url)
    
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=15)) as response:
            return await response.json()

# Usage
proxy = "socks5://user:pass@gate.snowpad.io:9999"

# Note: with aiohttp-socks, DNS behavior is controlled by the rdns argument
# rather than the URL scheme. Recent versions may already default to remote
# DNS for SOCKS5, but passing rdns=True explicitly removes any doubt
async def main():
    result = await fetch_with_socks5("https://httpbin.org/ip", proxy)
    print(result)

asyncio.run(main())

Critical: Enabling Remote DNS (rDNS) in aiohttp

Unlike requests, where the socks5h:// scheme selects DNS resolution via the proxy, aiohttp-socks uses an rdns argument:

from aiohttp_socks import ProxyConnector, ProxyType

# Method 1: Using from_url with rdns parameter
connector = ProxyConnector.from_url(
    "socks5://user:pass@proxy.example.com:1080",
    rdns=True  # CRITICAL: resolves DNS through proxy
)

# Method 2: Manual construction for more control
connector = ProxyConnector(
    proxy_type=ProxyType.SOCKS5,
    host="proxy.example.com",
    port=1080,
    username="user",
    password="pass",
    rdns=True
)

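If rdns ends up disabled, hostnames are resolved locally before the proxy ever sees them. For scraping jobs targeting geo-restricted content, this leaks your real location and can cause inconsistent results. The default can differ between aiohttp-socks versions, so set rdns=True explicitly instead of relying on it.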

Production Pattern: Concurrent Scraping with Semaphores

Raw concurrency without limits will overwhelm targets and get you blocked. Use asyncio.Semaphore to limit concurrent connections:

import asyncio
import aiohttp
from aiohttp_socks import ProxyConnector
from typing import List, Dict
import random

class AsyncSOCKS5Scraper:
    """Production async scraper with SOCKS5 proxy rotation and concurrency limits."""
    
    def __init__(self, proxy_list: List[str], max_concurrent: int = 10):
        self.proxy_list = proxy_list
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.results: List[Dict] = []
    
    def _get_connector(self, proxy_url: str):
        """Create SOCKS5 connector with rDNS enabled."""
        return ProxyConnector.from_url(proxy_url, rdns=True)
    
    async def _fetch_one(
        self, 
        session: aiohttp.ClientSession, 
        url: str
    ) -> Dict:
        """Fetch single URL with error handling."""
        async with self.semaphore:
            try:
                async with session.get(
                    url, 
                    timeout=aiohttp.ClientTimeout(total=20)
                ) as response:
                    return {
                        "url": url,
                        "status": response.status,
                        "data": await response.text()
                    }
            except asyncio.TimeoutError:
                return {"url": url, "status": "timeout", "data": None}
            except aiohttp.ClientError as e:
                return {"url": url, "status": "error", "error": str(e)}
    
    async def scrape_urls(self, urls: List[str]) -> List[Dict]:
        """Scrape multiple URLs with proxy rotation."""
        
        # Create one session per proxy, rotate through them
        sessions = []
        for proxy in self.proxy_list:
            connector = self._get_connector(proxy)
            session = aiohttp.ClientSession(connector=connector)
            sessions.append(session)
        
        try:
            tasks = []
            for i, url in enumerate(urls):
                session = sessions[i % len(sessions)]
                task = self._fetch_one(session, url)
                tasks.append(task)
            
            self.results = await asyncio.gather(*tasks, return_exceptions=True)
            return self.results
        finally:
            # CRITICAL: Always close sessions to avoid connection leaks
            await asyncio.gather(*[s.close() for s in sessions])

# Usage
proxies = [
    "socks5://user1:pass1@gate.snowpad.io:9999",
    "socks5://user2:pass2@gate.snowpad.io:9999",
]

urls = [f"https://httpbin.org/ip?id={i}" for i in range(20)]

scraper = AsyncSOCKS5Scraper(proxies, max_concurrent=5)
results = asyncio.run(scraper.scrape_urls(urls))

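for r in results:
    # gather(return_exceptions=True) can return exception objects
    # (for example proxy handshake errors) alongside result dicts
    if isinstance(r, Exception):
        print(f"Unhandled error: {r!r}")
    else:
        print(f"{r['url']}: {r.get('status')}")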

Key Design Decisions Explained

One session per proxy: aiohttp's ClientSession is designed to be long-lived and reuse connections. Creating a session per proxy allows connection pooling within each proxy tunnel.
Semaphore over raw gather(): Without semaphores, asyncio.gather(*tasks) launches all tasks simultaneously. For 1000 URLs, that's 1000 concurrent TCP connections—enough to trigger rate limits or exhaust file descriptors.
Explicit session cleanup: The finally block ensures sessions close even if exceptions occur. Unclosed sessions leak TCP connections and can cause Too many open files errors.
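The effect of the semaphore can be seen without touching the network. In this minimal sketch, asyncio.sleep stands in for a request and the function name is an illustrative assumption. It launches 50 tasks at once and records the highest number that were ever inside the semaphore at the same time.

```python
import asyncio

async def run_limited(num_tasks: int, max_concurrent: int) -> int:
    """Run dummy tasks under a semaphore and return the peak concurrency seen."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def worker() -> None:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a network request
            active -= 1

    # All tasks are scheduled immediately; the semaphore gates the work
    await asyncio.gather(*(worker() for _ in range(num_tasks)))
    return peak

peak = asyncio.run(run_limited(num_tasks=50, max_concurrent=5))
print(f"peak concurrency: {peak}")
```

The peak never exceeds max_concurrent, even though gather() starts every task up front.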

Selenium and Playwright SOCKS5 Configuration

For JavaScript-heavy sites that require browser automation, you need to configure SOCKS5 at the browser level.

Selenium (Chrome)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def create_selenium_with_socks5(proxy_host: str, proxy_port: int):
    """Create Chrome WebDriver with SOCKS5 proxy."""
    
    chrome_options = Options()
    
    # Socks5 proxy configuration via Chrome command-line flags
    # Format: --proxy-server=socks5://host:port
    chrome_options.add_argument(f"--proxy-server=socks5://{proxy_host}:{proxy_port}")
    
    # Restrict WebRTC to proxied routes so STUN/TURN traffic cannot reveal
    # the real IP. This is CRITICAL for anonymity: WebRTC can bypass proxies
    chrome_options.add_argument("--force-webrtc-ip-handling-policy=disable_non_proxied_udp")
    
    # Stability flags for containers and CI (these are not anti-detection measures)
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    
    # Hide the automation hint that Blink exposes to page scripts
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    
    # Chrome does not implement SOCKS5 username/password authentication, so
    # credentials in a socks5:// URL will not work. Use IP allowlisting with
    # your provider, or a local forwarder that adds the credentials.
    
    driver = webdriver.Chrome(options=chrome_options)
    return driver

# Usage
driver = create_selenium_with_socks5("gate.snowpad.io", 9999)
driver.get("https://httpbin.org/ip")
print(driver.find_element("tag name", "body").text)
driver.quit()

Playwright (Python)

Playwright accepts a SOCKS5 proxy directly in its launch options and is generally preferred over Selenium for modern scraping. One caveat: Chromium does not implement SOCKS5 username/password authentication, and Playwright may reject a socks5:// proxy that carries credentials, so the usage below assumes access is granted by IP allowlisting:

from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

def scrape_with_playwright_socks5(proxy_url: str, target_url: str):
    """Scrape using Playwright with SOCKS5 proxy."""
    
    # Parse proxy URL for Playwright format
    parsed = urlparse(proxy_url)
    proxy_settings = {"server": f"{parsed.scheme}://{parsed.hostname}:{parsed.port}"}
    # Forward credentials only if the URL has them (works for HTTP proxies;
    # SOCKS5 credentials are likely to be rejected, see the caveat above)
    if parsed.username:
        proxy_settings["username"] = parsed.username
        proxy_settings["password"] = parsed.password or ""
    
    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy_settings, headless=True)
        
        # Create context with realistic viewport and locale
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
            timezone_id="America/New_York"
        )
        
        page = context.new_page()
        
        # All page traffic, including WebSockets, goes through the proxy
        page.goto(target_url, wait_until="networkidle")
        
        content = page.content()
        browser.close()
        return content

# Usage with Snowpad SOCKS5 proxy (assumes your IP is allowlisted with the provider)
proxy = "socks5://gate.snowpad.io:9999"
html = scrape_with_playwright_socks5(proxy, "https://httpbin.org/ip")
print(html)

Playwright vs Selenium for SOCKS5 Scraping

Feature	Playwright	Selenium
SOCKS5 authentication	Not supported by Chromium (use IP allowlisting)	Not supported by Chrome (use IP allowlisting)
Connection speed	Faster (no Wire protocol overhead)	Slower
WebSocket proxying	Automatic	Manual configuration
Stealth	Better out-of-the-box	Requires plugins
Resource usage	Lower	Higher

For new projects, use Playwright. For legacy Selenium codebases, consider migration if SOCKS5 reliability is critical.

Testing and Debugging Your SOCKS5 Connection

Before running production scraping jobs, verify your SOCKS5 proxy works correctly. Here's my debugging checklist:

1. Verify Proxy IP with curl

# Test SOCKS5 proxy (replace with your credentials)
curl --proxy "socks5h://user:pass@gate.snowpad.io:9999" "https://httpbin.org/ip"

If this returns your real IP instead of the proxy IP, the proxy isn't routing your traffic. Check the proxy URL, port, and credentials before going further.

2. Check DNS Resolution

# Compare DNS resolution with and without proxy
# Without proxy (answered by your local resolver, so the query is visible to your ISP)
dig +short httpbin.org

# With proxy (socks5h:// hands the hostname to the proxy, so no local lookup
# is made; -v prints the connection details for inspection)
curl -v --proxy "socks5h://user:pass@gate.snowpad.io:9999" "https://httpbin.org/ip"

3. Test with Python Verification Script

import requests

def verify_socks5_proxy(proxy_url: str):
    """Comprehensive SOCKS5 proxy verification."""
    
    print(f"Testing proxy: {proxy_url}")
    print("=" * 50)
    
    # Test 1: Basic connectivity
    try:
        resp = requests.get(
            "https://httpbin.org/ip",
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=10
        )
        proxy_ip = resp.json().get("origin")
        print(f"✓ Proxy IP: {proxy_ip}")
        print(f"✓ Response time: {resp.elapsed.total_seconds():.2f}s")
    except Exception as e:
        print(f"✗ Connection failed: {e}")
        return False
    
    # Test 2: DNS routing check. A single HTTP request cannot prove where a
    # lookup happened, so check the scheme: socks5h:// sends hostnames to the
    # proxy, while plain socks5:// resolves them locally first
    if proxy_url.startswith("socks5h://"):
        print("✓ DNS resolution routed through proxy (socks5h://)")
    else:
        print("⚠ Proxy URL is not socks5h://; hostnames are resolved locally")
    
    # Test 3: HTTPS support
    try:
        resp = requests.get(
            "https://httpbin.org/get",
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=10
        )
        print(f"✓ HTTPS support: OK ({resp.status_code})")
    except Exception as e:
        print(f"✗ HTTPS failed: {e}")
    
    print("=" * 50)
    return True

# Run verification
verify_socks5_proxy("socks5h://user:pass@gate.snowpad.io:9999")

4. Check for WebRTC Leaks (Browser Automation)

When using Selenium/Playwright, WebRTC can bypass SOCKS5 proxies and expose your real IP:

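import requests
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

def check_webrtc_leak(proxy_url: str):
    # Look up the real public IP directly (no proxy) so we know what to search for
    real_ip = requests.get("https://httpbin.org/ip", timeout=10).json()["origin"]
    
    parsed = urlparse(proxy_url)
    proxy_settings = {"server": f"{parsed.scheme}://{parsed.hostname}:{parsed.port}"}
    if parsed.username:
        proxy_settings["username"] = parsed.username
        proxy_settings["password"] = parsed.password or ""
    
    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=proxy_settings)
        
        page = browser.new_page()
        page.goto("https://browserleaks.com/webrtc")
        page.wait_for_timeout(3000)  # give the page's WebRTC probe time to finish
        
        # Check if real IP is exposed anywhere in the rendered page
        content = page.content()
        if real_ip in content:
            print("⚠ Potential WebRTC leak detected!")
        else:
            print("✓ No WebRTC leak")
        
        browser.close()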

Performance Optimization Tips

After switching to SOCKS5, optimize these areas for maximum throughput:

1. Connection Pooling

SOCKS5 authenticates once per TCP connection, then reuses it. Ensure your HTTP client enables keep-alive:

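import requests
from aiohttp_socks import ProxyConnector

# requests: Session automatically pools connections
session = requests.Session()

# aiohttp: ProxyConnector extends TCPConnector, so the usual pooling limits
# apply while traffic still goes through the SOCKS5 proxy
connector = ProxyConnector.from_url(
    "socks5://user:pass@proxy.example.com:1080",
    rdns=True,
    limit=100,           # Total concurrent connections
    limit_per_host=10,   # Per-host connections
    force_close=False    # Allow connection reuse
)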

2. Timeouts: Aggressive but Not Brutal

Dead proxies kill scraping speed. Use short timeouts with retry logic:

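import aiohttp

# total caps the whole request; connect and sock_read bound individual phases
timeout = aiohttp.ClientTimeout(
    total=15,      # Absolute maximum for the whole request
    connect=5,     # Acquiring a connection (pool wait + TCP + SOCKS5 handshake)
    sock_read=10   # Maximum gap between reads from the server
)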

3. DNS Caching

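For repeated requests to the same domains, caching DNS resolutions locally avoids repeated lookups. Be aware of the trade-off: socket.gethostbyname resolves on your machine, not through the proxy, so this gives up the remote DNS that socks5h:// and rdns=True provide. Use it only when DNS privacy and geo-consistent answers don't matter; otherwise rely on connection reuse, which avoids a new lookup for every request:

from functools import lru_cache
import socket

@lru_cache(maxsize=256)
def cached_resolve(hostname: str) -> str:
    """Cache local DNS resolutions (these lookups bypass the proxy)."""
    return socket.gethostbyname(hostname)

# Use in aiohttp with resolved IP
# Note: Only if your proxy handles the Host header correctly; HTTPS
# certificate checks normally still need the original hostname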

4. HTTP/2 When Possible

HTTP/2 multiplexes multiple requests over a single TCP connection, reducing SOCKS5 handshake overhead:

# httpx supports HTTP/2 with SOCKS5
# Requires the optional extras: pip install "httpx[socks,http2]"
import httpx

client = httpx.Client(
    proxy="socks5://user:pass@proxy:1080",
    http2=True
)

Common Errors and How to Fix Them

Error: InvalidSchema: No connection adapters were found for 'socks5h://...'

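Cause: PySocks isn't installed, so requests cannot handle SOCKS proxy URLs (the message may instead read "Missing dependencies for SOCKS support"). Fix: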

pip install requests[socks]

Error: ProxyError: [Errno 407] Proxy Authentication Required

Cause: Credentials are wrong or not being passed correctly. In async libraries (httpx, aiohttp), SOCKS5 authentication sometimes fails when credentials are embedded in the URL, typically because special characters such as @ or : are not percent-encoded. Fix: Pass credentials explicitly rather than in the URL:

from aiohttp_socks import ProxyConnector, ProxyType

# aiohttp-socks: use ProxyConnector constructor
connector = ProxyConnector(
    proxy_type=ProxyType.SOCKS5,
    host="proxy.example.com",
    port=1080,
    username="user",
    password="pass"
)
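When a client only accepts a proxy URL, percent-encoding the credentials avoids ambiguous @ and : characters. A minimal sketch with a hypothetical helper name; it assumes the client decodes percent-encoded credentials before sending them, which requests does but which is worth verifying for other libraries.

```python
from urllib.parse import quote

def build_proxy_url(username: str, password: str, host: str, port: int,
                    scheme: str = "socks5h") -> str:
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" also encodes "/", which quote() leaves alone by default
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"

print(build_proxy_url("user", "p@ss:w/rd", "proxy.example.com", 1080))
# socks5h://user:p%40ss%3Aw%2Frd@proxy.example.com:1080
```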

Error: 0x05: Connection refused

Cause: The SOCKS5 proxy accepted your connection, but the target server refused the TCP connection. Common causes:

Target server is down or blocking the proxy IP
Wrong port (e.g., trying HTTPS on port 80)
Firewall blocking outbound connections from proxy

Fix: Test the target URL directly without proxy to confirm it's accessible.
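The 0x05 in that message is the reply code the proxy sends back after a connect request. The codes are defined in RFC 1928, and a small lookup table makes log output easier to read. The dictionary and function names here are illustrative.

```python
# Reply codes from RFC 1928, section 6
SOCKS5_REPLIES = {
    0x00: "succeeded",
    0x01: "general SOCKS server failure",
    0x02: "connection not allowed by ruleset",
    0x03: "network unreachable",
    0x04: "host unreachable",
    0x05: "connection refused",
    0x06: "TTL expired",
    0x07: "command not supported",
    0x08: "address type not supported",
}

def describe_reply(code: int) -> str:
    """Translate a SOCKS5 reply code into a readable message."""
    return SOCKS5_REPLIES.get(code, f"unassigned reply code 0x{code:02x}")

print(describe_reply(0x05))  # connection refused
print(describe_reply(0x04))  # host unreachable
```

Codes 0x03 to 0x05 describe the target side of the tunnel, while 0x02 usually means the proxy's own rules rejected the destination.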

Error: SOCKS5 DNS resolution failed

Cause: Using socks5:// instead of socks5h:// in requests, or missing rdns=True in aiohttp. Fix: Always use socks5h:// for requests. For aiohttp, pass rdns=True to ProxyConnector.

Error: SSL: CERTIFICATE_VERIFY_FAILED

Cause: SOCKS5 proxy is intercepting TLS traffic (MITM proxy) with a self-signed certificate. Fix: Either trust the proxy's CA certificate or disable verification (insecure, only for testing):

# requests
resp = requests.get(url, proxies=proxies, verify=False)

# aiohttp
connector = ProxyConnector.from_url(proxy, rdns=True, ssl=False)

Error: Too many open files

Cause: Not closing aiohttp sessions or creating too many sessions without limits. Fix: Reuse sessions and explicitly close them:

async with aiohttp.ClientSession(connector=connector) as session:
    # Use session
    pass  # Auto-closes here

FAQ

Q: Can I use SOCKS5 with HTTP/2? A: Yes, but only with clients that support both. httpx with http2=True works well with SOCKS5 proxies. Neither requests nor aiohttp supports HTTP/2. The benefit is reduced connection overhead: HTTP/2 multiplexing means fewer SOCKS5 handshakes.

Q: Why does my SOCKS5 proxy work with curl but not Python? A: curl handles SOCKS5 authentication more flexibly. Python libraries have stricter requirements: (1) Ensure PySocks is installed for requests, (2) Use aiohttp-socks for aiohttp—not the built-in proxy support, (3) For authenticated proxies in aiohttp, use ProxyConnector constructor instead of URL-encoded credentials, (4) Double-check you're using socks5h:// not socks5://.

Q: How do I rotate SOCKS5 proxies without connection leaks? A: Use a session pool rather than creating new sessions per request. Maintain persistent sessions for each proxy and rotate at the request level. In aiohttp, close sessions in a finally block or use async with context managers. Set limit_per_host on connectors to prevent file descriptor exhaustion.

Q: Should I use SOCKS5 or HTTP proxies for web scraping? A: For pure HTTP/HTTPS scraping of standard websites, HTTP proxies are fine and often cheaper. Use SOCKS5 when: (1) You need to proxy UDP traffic (WebSockets run over TCP, so they don't depend on this), (2) You want lower protocol overhead for high-volume scraping, (3) You're scraping non-HTTP services, (4) The target has sophisticated bot detection analyzing TCP/TLS patterns, or (5) You need consistent behavior across different traffic types. SOCKS5's session-layer neutrality makes it harder to fingerprint.
