Building a Price Monitoring System for Indian E-Commerce
Deepesh Kalur
Expert Contributor
To build a price monitoring system for Indian e-commerce, use Snowpad's rotating mobile proxies with Python requests/BeautifulSoup. Schedule checks based on product competitiveness (1-2 hours for high-competition items), store data in SQLite, and set up alerts for significant price changes. Use JSON-LD extraction for Flipkart and multiple selector fallbacks for Amazon.
Price monitoring is one of the highest-ROI applications of web scraping. A single pricing insight can be worth thousands: catching a competitor's flash sale, identifying stock shortages, or spotting MAP (minimum advertised price) violations.
This guide covers building a production price monitoring system for Indian e-commerce using Snowpad's mobile proxies.
Architecture Overview
A robust price monitoring system has four components:
- Scheduler: Decides what to scrape and when
- Scraper: Extracts price data from target sites
- Storage: Persists historical price data
- Alerts: Notifies when prices change significantly
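As a rough sketch of how the four components could fit together, the loop below wires them up as plain callables. The names `run_cycle`, `due_products`, `scrape`, `save`, and `find_alerts` are hypothetical placeholders, not part of any library, and the stand-in components at the bottom exist only so the example runs without network access.

```python
from typing import Callable, Iterable, Optional


def run_cycle(
    due_products: Callable[[], Iterable[str]],
    scrape: Callable[[str], Optional[dict]],
    save: Callable[[dict], None],
    find_alerts: Callable[[], list],
) -> list:
    """Run one monitoring pass: schedule -> scrape -> store -> alert."""
    for product_id in due_products():      # Scheduler: what to scrape now
        record = scrape(product_id)        # Scraper: fetch the current price
        if record is not None:             # skip failed fetches
            save(record)                   # Storage: persist the observation
    return find_alerts()                   # Alerts: report significant changes


# Stand-in components (assumptions for illustration only)
stored = []
alerts = run_cycle(
    due_products=lambda: ["A1", "A2"],
    scrape=lambda pid: None if pid == "A2" else {"product_id": pid, "price": 499.0},
    save=stored.append,
    find_alerts=lambda: [r for r in stored if r["price"] < 500],
)
```

Keeping each component behind a simple callable makes it easy to swap the scraper or storage backend without touching the loop.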
The Scraper
import json
import random
import re
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# socks5h:// resolves DNS through the proxy. requests needs the SOCKS extra
# for this scheme: pip install "requests[socks]"
PROXY = "socks5h://user:pass@gw.snowpad.io:9999"


class PriceScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.proxies = {"http": PROXY, "https": PROXY}
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S928B) AppleWebKit/537.36",
            "Accept-Language": "en-IN"
        })

    def scrape_amazon(self, asin):
        url = f"https://www.amazon.in/dp/{asin}"
        time.sleep(random.uniform(3, 7))  # random delay between requests
        resp = self.session.get(url, timeout=15)
        if resp.status_code != 200:
            return None
        soup = BeautifulSoup(resp.text, "html.parser")
        # Try multiple selectors (Amazon changes them)
        price_selectors = [
            ".a-price-whole",
            ".a-price .a-offscreen",
            "#priceblock_dealprice",
            "#priceblock_ourprice"
        ]
        price = None
        for selector in price_selectors:
            elem = soup.select_one(selector)
            if elem:
                # Drop the rupee sign, thousands separators, and the trailing
                # dot that ".a-price-whole" text can carry (e.g. "1,299.")
                text = elem.get_text(strip=True).replace(",", "").replace("₹", "").rstrip(".")
                try:
                    price = float(text)
                except ValueError:
                    continue  # not a number; try the next selector
                break
        return {
            "platform": "amazon",
            "asin": asin,
            "price": price,
            "currency": "INR",
            "timestamp": datetime.now().isoformat(),
            "url": url
        }

    def scrape_flipkart(self, pid):
        url = f"https://www.flipkart.com/p/{pid}"
        time.sleep(random.uniform(5, 10))  # random delay between requests
        resp = self.session.get(url, timeout=20)
        if resp.status_code != 200:
            return None
        # Extract from JSON-LD; re.DOTALL lets the match span several lines
        match = re.search(
            r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
            resp.text,
            re.DOTALL,
        )
        if not match:
            return None
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            return None
        # JSON-LD may be a single object or a list of objects
        if isinstance(data, list):
            data = next((d for d in data if isinstance(d, dict) and "offers" in d), None)
        if not isinstance(data, dict):
            return None
        offers = data.get("offers", {})
        if isinstance(offers, list):
            offers = offers[0] if offers else {}
        return {
            "platform": "flipkart",
            "product_id": pid,
            "price": offers.get("price"),
            "currency": offers.get("priceCurrency", "INR"),
            "availability": offers.get("availability"),
            "timestamp": datetime.now().isoformat(),
            "url": url
        }


# Usage
scraper = PriceScraper()
amazon_data = scraper.scrape_amazon("B0XXXXX")
flipkart_data = scraper.scrape_flipkart("PID123")
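Product pages often embed more than one JSON-LD block, and a block can hold a list rather than a single object. The sketch below uses only the standard library to scan every block and return the first object whose `@type` is `Product`. The helper name `find_product_jsonld` and the sample HTML are illustrative assumptions, not Flipkart's actual markup.

```python
import json
import re

LD_JSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def find_product_jsonld(html):
    """Return the first JSON-LD object with @type "Product", or None."""
    for raw in LD_JSON_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the scrape
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return item
    return None


# Made-up sample page with two JSON-LD blocks
sample_html = """
<script type="application/ld+json">[{"@type": "BreadcrumbList"}]</script>
<script type="application/ld+json">
{"@type": "Product", "name": "Example phone",
 "offers": {"price": "12999", "priceCurrency": "INR"}}
</script>
"""
product = find_product_jsonld(sample_html)
price = float(product["offers"]["price"])
```

Filtering on `@type` avoids picking up breadcrumb or organization metadata that happens to appear first in the page.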
Scheduling Strategies
Frequency by product type:
- High-competition products (electronics): Every 1-2 hours
- Standard products: Every 6-12 hours
- Stable products (books, home goods): Daily
- During sales events: Every 15-30 minutes
Smart scheduling:
- Don't scrape all products at once (bursts trigger anti-bot)
- Distribute requests across the day
- Increase frequency when prices change rapidly
- Decrease frequency when prices are stable
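One way to express these rules is a function that picks the next check interval from the product category and recent price behavior, plus a helper that spreads a batch across a time window. The base intervals come from the list above; the halving and doubling rule, the clamping bounds, and all function names are assumptions chosen for illustration.

```python
import random

# Base intervals in minutes, taken from the frequency guidelines
BASE_INTERVAL_MIN = {
    "high_competition": 90,   # every 1-2 hours
    "standard": 540,          # every 6-12 hours
    "stable": 1440,           # daily
}
SALE_INTERVAL_MIN = 20        # every 15-30 minutes during sales events


def next_interval(category, changed_last_check, stable_checks, sale_event=False):
    """Return minutes until the next check for one product."""
    if sale_event:
        return SALE_INTERVAL_MIN
    interval = BASE_INTERVAL_MIN[category]
    if changed_last_check:
        interval = interval / 2      # price is moving: check sooner
    elif stable_checks >= 5:
        interval = interval * 2      # long stable run: back off
    # Clamp to between 15 minutes and 2 days
    return max(15, min(interval, 2880))


def spread_offsets(n_products, window_minutes, rng=random):
    """Give each product a random start offset so requests do not burst."""
    return sorted(rng.uniform(0, window_minutes) for _ in range(n_products))
```

For example, `next_interval("high_competition", True, 0)` halves the 90-minute base interval, while `spread_offsets(100, 60)` scatters 100 products over an hour instead of firing them all at once.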
Data Storage
import sqlite3


class PriceDatabase:
    def __init__(self, db_path="prices.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_tables()

    def _init_tables(self):
        # CURRENT_TIMESTAMP is stored in UTC as "YYYY-MM-DD HH:MM:SS"
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS prices (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                platform TEXT NOT NULL,
                product_id TEXT NOT NULL,
                price REAL,
                currency TEXT,
                availability TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
                url TEXT
            )
        ''')
        # Index supports per-product history lookups ordered by time
        self.conn.execute('''
            CREATE INDEX IF NOT EXISTS idx_product_time
            ON prices(platform, product_id, timestamp)
        ''')
        self.conn.commit()

    def save_price(self, data):
        # Amazon records carry "asin"; Flipkart records carry "product_id"
        self.conn.execute('''
            INSERT INTO prices (platform, product_id, price, currency, availability, url)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            data['platform'],
            data.get('asin') or data.get('product_id'),
            data.get('price'),
            data.get('currency', 'INR'),
            data.get('availability'),
            data.get('url')
        ))
        self.conn.commit()

    def get_price_history(self, platform, product_id, days=30):
        cursor = self.conn.execute('''
            SELECT price, timestamp FROM prices
            WHERE platform = ? AND product_id = ?
            AND timestamp > datetime('now', ?)
            ORDER BY timestamp
        ''', (platform, product_id, f'-{days} days'))
        return cursor.fetchall()
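Once history accumulates, aggregate queries become useful. As an illustrative sketch, the snippet below computes the lowest, highest, and average price for one product over a time window, using an in-memory database with a simplified copy of the `prices` table. The `price_summary` helper and the sample rows are assumptions for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE prices (
        platform TEXT NOT NULL,
        product_id TEXT NOT NULL,
        price REAL,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
""")
# Made-up observations for a placeholder product ID
conn.executemany(
    "INSERT INTO prices (platform, product_id, price) VALUES (?, ?, ?)",
    [("amazon", "B0EXAMPLE", 1000.0),
     ("amazon", "B0EXAMPLE", 900.0),
     ("amazon", "B0EXAMPLE", 1100.0)],
)


def price_summary(conn, platform, product_id, days=30):
    """Return (lowest, highest, average) price within the last `days` days."""
    return conn.execute(
        """
        SELECT MIN(price), MAX(price), AVG(price) FROM prices
        WHERE platform = ? AND product_id = ?
          AND timestamp > datetime('now', ?)
        """,
        (platform, product_id, f"-{days} days"),
    ).fetchone()


low, high, avg = price_summary(conn, "amazon", "B0EXAMPLE")
```

Comparing the current price against the window's low and high is a cheap way to tell a genuine discount from an inflated "sale" price.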
Alert System
def check_price_changes(db, threshold_percent=10):
    """Alert when prices change by more than threshold."""
    # LAG() is a window function and requires SQLite 3.25 or newer
    cursor = db.conn.execute('''
        SELECT platform, product_id, price, timestamp,
        LAG(price) OVER (PARTITION BY platform, product_id ORDER BY timestamp) as prev_price
        FROM prices
        WHERE timestamp > datetime('now', '-1 day')
    ''')
    alerts = []
    for row in cursor:
        platform, product_id, price, timestamp, prev_price = row
        # Skip rows with no previous observation or a missing price
        if price is not None and prev_price and prev_price > 0:
            change = ((price - prev_price) / prev_price) * 100
            if abs(change) >= threshold_percent:
                alerts.append({
                    "platform": platform,
                    "product_id": product_id,
                    "old_price": prev_price,
                    "new_price": price,
                    "change_percent": change,
                    "timestamp": timestamp
                })
    return alerts
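The function above only collects alerts; delivering them is left open. As a minimal sketch, the helper below turns one alert dict into a one-line message that could be handed to email, chat, or any other channel. The `format_alert` name, the message layout, and the example values are assumptions, not part of the original system.

```python
def format_alert(alert):
    """Render one alert dict (shaped like those from check_price_changes) as text."""
    direction = "dropped" if alert["change_percent"] < 0 else "rose"
    return (
        f"[{alert['platform']}] {alert['product_id']} {direction} "
        f"{abs(alert['change_percent']):.1f}%: "
        f"INR {alert['old_price']:.2f} -> INR {alert['new_price']:.2f}"
    )


# Made-up alert for illustration
example = {
    "platform": "amazon",
    "product_id": "B0EXAMPLE",
    "old_price": 1000.0,
    "new_price": 850.0,
    "change_percent": -15.0,
    "timestamp": "2024-01-01 10:00:00",
}
message = format_alert(example)
```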
Scaling to Thousands of Products
For large-scale monitoring (10K+ products):
- Use async scraping: aiohttp for concurrent requests (SOCKS5 proxies need an add-on connector such as aiohttp-socks)
- Implement proxy rotation: Change IP every 5-10 requests
- Add jitter: Random delays between 3-10 seconds
- Cache intelligently: Don't re-scrape unchanged pages
- Monitor success rates: Per-platform tracking
- Handle failures gracefully: Retry with backoff, rotate on failure
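Two of these points, retrying with backoff while rotating on failure and tracking success rates per platform, can be sketched with the standard library alone. Everything here is an assumption for illustration: `fetch` and `rotate_proxy` are hypothetical callables you would supply, and the attempt count and delays are arbitrary starting values.

```python
import random
import time
from collections import defaultdict


class SuccessTracker:
    """Count successes and failures per platform."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, platform, ok):
        self.counts[platform]["ok" if ok else "fail"] += 1

    def success_rate(self, platform):
        c = self.counts[platform]
        total = c["ok"] + c["fail"]
        return c["ok"] / total if total else None


def fetch_with_retry(fetch, rotate_proxy, tracker, platform,
                     max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a result or attempts run out.

    fetch        -- hypothetical callable returning data, or None on failure
    rotate_proxy -- hypothetical callable that switches to a fresh IP
    """
    for attempt in range(max_attempts):
        result = fetch()
        tracker.record(platform, result is not None)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            rotate_proxy()                               # rotate on failure
            delay = base_delay * (2 ** attempt)          # exponential backoff
            sleep(delay + random.uniform(0, delay / 2))  # plus jitter
    return None


# Demo with stand-ins: fail twice, then succeed; sleeping is stubbed out
responses = iter([None, None, {"price": 499.0}])
rotations = []
tracker = SuccessTracker()
result = fetch_with_retry(
    fetch=lambda: next(responses),
    rotate_proxy=lambda: rotations.append("rotated"),
    tracker=tracker,
    platform="amazon",
    sleep=lambda seconds: None,
)
```

A falling success rate for one platform is usually the earliest sign that selectors or anti-bot rules have changed, so it is worth checking before raising request volume.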
FAQ
How often should I check prices? High-competition products: every 1-2 hours. Standard: every 6-12 hours. During sales: every 15-30 minutes.
Can I monitor prices across multiple platforms? Yes. Build separate scrapers for each platform using the same proxy pool. Track identical products across Amazon, Flipkart, and Myntra.
Do I need static IPs for price monitoring? No. Rotating proxies are usually the better choice because they make detection less likely. Use sticky sessions only if the platform requires login.
How do I handle products with variants? Track each variant (size, color) separately. Include variant information in your database schema.
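As one possible way to include variant information, the sketch below adds a `variant` column and looks up the latest price per variant. The table name, the free-text variant label, and the sample product are assumptions; a real schema might split size and color into separate columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE variant_prices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        platform TEXT NOT NULL,
        product_id TEXT NOT NULL,
        variant TEXT NOT NULL DEFAULT '',   -- e.g. "size=M;color=blue"
        price REAL,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
""")
# Made-up observations for a placeholder product
rows = [
    ("myntra", "SHIRT1", "size=M;color=blue", 799.0),
    ("myntra", "SHIRT1", "size=L;color=blue", 849.0),
    ("myntra", "SHIRT1", "size=M;color=blue", 749.0),  # later observation
]
conn.executemany(
    "INSERT INTO variant_prices (platform, product_id, variant, price) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Latest price per variant: the row with the highest id in each group
latest = conn.execute("""
    SELECT variant, price FROM variant_prices
    WHERE id IN (
        SELECT MAX(id) FROM variant_prices
        GROUP BY platform, product_id, variant
    )
    ORDER BY variant
""").fetchall()
```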