Predicting the Future of Medicine Supply: ML-Driven Auto-Replenishment at Scale 🏥📈

13 minute read

Published:

Managing inventory for India’s largest medicine warehouse is like predicting the unpredictable. With over 100,000 orders processed daily and thousands of SKUs ranging from common painkillers to life-saving prescription drugs, traditional inventory management simply doesn’t cut it. At Tata 1mg, I had the opportunity to build an ML-driven auto-replenishment module for Odin - our proprietary Warehouse Management System (WMS) - using cutting-edge time series forecasting with ARIMA and LSTM models. This system revolutionized how we manage medicine supply chains at scale.

The Challenge: Medicine Inventory at Scale 💊

The pharmaceutical supply chain presents unique challenges that make inventory management incredibly complex:

  • Cold medicines spike during monsoon season
  • Allergy medications surge during specific months
  • Sudden demand for masks and sanitizers (COVID-19 taught us this!)

2. Regulatory Compliance

  • Expiry date management for medications
  • Temperature-controlled storage requirements
  • Batch tracking and recall procedures

3. Economic Impact

  • Stockouts mean patients can’t get life-saving medications
  • Overstock leads to expired medicines and financial losses
  • Each incorrect prediction affects thousands of patients

4. Scale Complexity

ODIN_SCALE_METRICS = {
    "daily_orders": 100_000,
    "unique_skus": 50_000,
    "warehouses": 12,
    "supplier_partners": 500,
    "cities_served": 1000,
    "avg_shelf_life_days": 365,
    "temperature_zones": 4  # Room temp, cool, cold, frozen
}

When I joined the Odin team, the existing system relied on simple threshold-based reordering - essentially “order more when stock hits X units.” This approach led to:

  • 23% stockout rate during peak seasons
  • ₹2.3 crore worth of expired inventory annually
  • Manual intervention required for 40% of replenishment decisions

We needed something smarter.

The Solution: Time Series Forecasting with ML 🧠

I designed a hybrid forecasting system that combines the statistical rigor of ARIMA with the pattern recognition power of LSTM networks. The system continuously learns from historical data, seasonal patterns, and external factors to predict future demand with remarkable accuracy.

Architecture Overview

from typing import Dict, List, Tuple, Optional
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
import logging

class HybridDemandForecaster:
    """
    Hybrid forecasting system combining ARIMA and LSTM
    for pharmaceutical demand prediction
    """
    
    def __init__(self, sku_id: str, warehouse_id: str):
        self.sku_id = sku_id
        self.warehouse_id = warehouse_id
        self.arima_model = None
        self.lstm_model = None
        self.scaler = MinMaxScaler()
        self.historical_data = None
        self.external_factors = ExternalFactorsProcessor()
        
    def prepare_data(self, historical_sales: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray]:
        """Prepare time series data for training"""
        # Add external factors (weather, disease outbreaks, festivals)
        enhanced_data = self.external_factors.enrich_data(
            historical_sales, self.sku_id
        )
        
        # Handle missing values and outliers
        cleaned_data = self._clean_time_series(enhanced_data)
        
        # Create features for LSTM
        lstm_features, lstm_targets = self._create_lstm_sequences(
            cleaned_data, lookback_days=30
        )
        
        return lstm_features, lstm_targets
    
    def train_arima_model(self, data: pd.Series) -> None:
        """Train ARIMA model for trend and seasonality"""
        # Automatic order selection using AIC
        best_aic = float('inf')
        best_order = None
        
        for p in range(0, 4):
            for d in range(0, 2):
                for q in range(0, 4):
                    try:
                        model = ARIMA(data, order=(p, d, q))
                        fitted_model = model.fit()
                        
                        if fitted_model.aic < best_aic:
                            best_aic = fitted_model.aic
                            best_order = (p, d, q)
                            self.arima_model = fitted_model
                    except:
                        continue
        
        logging.info(f"Best ARIMA order for {self.sku_id}: {best_order}")

Deep Dive: ARIMA for Statistical Forecasting 📊

ARIMA (AutoRegressive Integrated Moving Average) excels at capturing trend and seasonality patterns in pharmaceutical demand:

class ARIMAForecaster:
    def __init__(self):
        self.seasonal_periods = {
            'cold_medicines': 365,      # Yearly seasonality
            'allergy_medicines': 90,    # Quarterly patterns
            'prescription_drugs': 30,   # Monthly refill cycles
            'wellness_products': 7      # Weekly patterns
        }
    
    def detect_seasonality(self, data: pd.Series, sku_category: str) -> int:
        """Detect seasonal patterns specific to medicine categories"""
        from statsmodels.tsa.seasonal import seasonal_decompose
        
        # Use domain knowledge for initial guess
        expected_period = self.seasonal_periods.get(sku_category, 30)
        
        # Verify with statistical decomposition
        try:
            decomposition = seasonal_decompose(
                data, 
                model='additive', 
                period=expected_period
            )
            
            # Measure strength of seasonality
            seasonal_strength = np.var(decomposition.seasonal) / np.var(data)
            
            if seasonal_strength > 0.1:  # Significant seasonality
                return expected_period
            else:
                return 1  # No seasonality
                
        except Exception as e:
            logging.warning(f"Seasonality detection failed: {e}")
            return 1
    
    def forecast_with_confidence(self, steps: int) -> Dict:
        """Generate forecasts with confidence intervals"""
        forecast_result = self.arima_model.forecast(steps=steps)
        confidence_intervals = self.arima_model.get_forecast(steps=steps).conf_int()
        
        return {
            'forecast': forecast_result,
            'lower_ci': confidence_intervals.iloc[:, 0].values,
            'upper_ci': confidence_intervals.iloc[:, 1].values,
            'model_aic': self.arima_model.aic
        }

Deep Dive: LSTM for Pattern Recognition 🔮

While ARIMA captures statistical patterns, LSTM networks excel at learning complex, non-linear relationships in demand data:

class LSTMDemandForecaster:
    def __init__(self, lookback_days: int = 30):
        self.lookback_days = lookback_days
        self.model = None
        self.scaler = MinMaxScaler()
        
    def build_model(self, input_shape: Tuple[int, int]) -> Sequential:
        """Build LSTM architecture optimized for demand forecasting"""
        model = Sequential([
            LSTM(64, return_sequences=True, input_shape=input_shape),
            Dropout(0.2),
            
            LSTM(32, return_sequences=True),
            Dropout(0.2),
            
            LSTM(16, return_sequences=False),
            Dropout(0.2),
            
            Dense(8, activation='relu'),
            Dense(1, activation='linear')  # Demand can't be negative
        ])
        
        model.compile(
            optimizer='adam',
            loss='huber',  # Robust to outliers
            metrics=['mae', 'mape']
        )
        
        return model
    
    def create_sequences(self, data: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """Create sequences for LSTM training"""
        X, y = [], []
        
        for i in range(self.lookback_days, len(data)):
            # Features: past demand, day of week, month, external factors
            sequence = data[i-self.lookback_days:i]
            target = data[i, 0]  # Next day's demand
            
            X.append(sequence)
            y.append(target)
        
        return np.array(X), np.array(y)
    
    def train_with_early_stopping(self, X_train: np.ndarray, y_train: np.ndarray,
                                X_val: np.ndarray, y_val: np.ndarray) -> None:
        """Train LSTM with early stopping and learning rate scheduling"""
        from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
        
        callbacks = [
            EarlyStopping(
                monitor='val_loss',
                patience=15,
                restore_best_weights=True
            ),
            ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=8,
                min_lr=0.0001
            )
        ]
        
        self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=100,
            batch_size=32,
            callbacks=callbacks,
            verbose=1
        )

External Factors Integration 🌍

Pharmaceutical demand is heavily influenced by external factors. Our system integrates multiple data sources:

class ExternalFactorsProcessor:
    def __init__(self):
        self.weather_api = WeatherDataAPI()
        self.disease_tracker = DiseaseOutbreakTracker()
        self.festival_calendar = IndianFestivalCalendar()
        self.economic_indicators = EconomicDataProvider()
    
    def enrich_data(self, sales_data: pd.DataFrame, sku_id: str) -> pd.DataFrame:
        """Add external factors relevant to pharmaceutical demand"""
        enriched_data = sales_data.copy()
        
        # Weather factors (crucial for seasonal medicines)
        weather_data = self.weather_api.get_historical_data(
            sales_data.index.min(), sales_data.index.max()
        )
        
        enriched_data['temperature'] = weather_data['temperature']
        enriched_data['humidity'] = weather_data['humidity']
        enriched_data['rainfall'] = weather_data['rainfall']
        enriched_data['aqi'] = weather_data['air_quality_index']
        
        # Disease outbreak indicators
        disease_alerts = self.disease_tracker.get_alerts(
            sales_data.index.min(), sales_data.index.max()
        )
        
        enriched_data['flu_alert_level'] = disease_alerts['flu']
        enriched_data['dengue_alert_level'] = disease_alerts['dengue']
        enriched_data['covid_cases'] = disease_alerts['covid']
        
        # Festival and holiday effects
        festivals = self.festival_calendar.get_festivals(
            sales_data.index.min(), sales_data.index.max()
        )
        
        enriched_data['is_festival'] = festivals['is_festival']
        enriched_data['days_to_festival'] = festivals['days_to_next_festival']
        
        # Economic factors (affect purchasing power)
        economic_data = self.economic_indicators.get_data(
            sales_data.index.min(), sales_data.index.max()
        )
        
        enriched_data['inflation_rate'] = economic_data['inflation']
        enriched_data['unemployment_rate'] = economic_data['unemployment']
        
        return enriched_data
    
    def calculate_feature_importance(self, sku_category: str) -> Dict[str, float]:
        """Calculate which external factors matter most for each medicine category"""
        feature_weights = {
            'cold_medicines': {
                'temperature': 0.4,
                'humidity': 0.3,
                'flu_alert_level': 0.2,
                'is_festival': 0.1
            },
            'allergy_medicines': {
                'aqi': 0.5,
                'pollen_count': 0.3,
                'humidity': 0.2
            },
            'digestive_health': {
                'is_festival': 0.4,  # Food habits change during festivals
                'temperature': 0.3,
                'economic_factors': 0.3
            }
        }
        
        return feature_weights.get(sku_category, {})

Ensemble Forecasting: Best of Both Worlds ⚖️

The magic happens when we combine ARIMA and LSTM predictions using an intelligent ensemble approach:

class EnsembleForecaster:
    def __init__(self):
        self.arima_forecaster = ARIMAForecaster()
        self.lstm_forecaster = LSTMDemandForecaster()
        self.ensemble_weights = {}
        
    def calculate_dynamic_weights(self, sku_id: str, historical_accuracy: Dict) -> Dict[str, float]:
        """Calculate weights based on historical model performance"""
        arima_accuracy = historical_accuracy.get('arima_mape', 0.3)
        lstm_accuracy = historical_accuracy.get('lstm_mape', 0.3)
        
        # Weight inversely proportional to error
        arima_weight = 1 / (1 + arima_accuracy)
        lstm_weight = 1 / (1 + lstm_accuracy)
        
        # Normalize weights
        total_weight = arima_weight + lstm_weight
        
        return {
            'arima': arima_weight / total_weight,
            'lstm': lstm_weight / total_weight
        }
    
    def generate_ensemble_forecast(self, sku_id: str, forecast_horizon: int) -> Dict:
        """Generate ensemble forecast combining ARIMA and LSTM"""
        # Get individual forecasts
        arima_forecast = self.arima_forecaster.forecast_with_confidence(forecast_horizon)
        lstm_forecast = self.lstm_forecaster.predict(forecast_horizon)
        
        # Calculate dynamic weights
        weights = self.calculate_dynamic_weights(sku_id, self.get_historical_accuracy(sku_id))
        
        # Ensemble prediction
        ensemble_forecast = (
            weights['arima'] * arima_forecast['forecast'] +
            weights['lstm'] * lstm_forecast
        )
        
        # Ensemble confidence intervals (conservative approach)
        ensemble_lower = np.minimum(
            arima_forecast['lower_ci'],
            lstm_forecast * 0.8  # Assume 20% uncertainty for LSTM
        )
        
        ensemble_upper = np.maximum(
            arima_forecast['upper_ci'],
            lstm_forecast * 1.2
        )
        
        return {
            'forecast': ensemble_forecast,
            'confidence_lower': ensemble_lower,
            'confidence_upper': ensemble_upper,
            'arima_weight': weights['arima'],
            'lstm_weight': weights['lstm'],
            'individual_forecasts': {
                'arima': arima_forecast['forecast'],
                'lstm': lstm_forecast
            }
        }

Auto-Replenishment Decision Engine 🤖

The forecasting models feed into an intelligent decision engine that determines optimal reorder points and quantities:

class AutoReplenishmentEngine:
    def __init__(self):
        self.safety_stock_calculator = SafetyStockCalculator()
        self.supplier_lead_times = SupplierDatabase()
        self.shelf_life_tracker = ShelfLifeManager()
        
    def calculate_reorder_point(self, sku_id: str, forecast_data: Dict) -> Dict:
        """Calculate when and how much to reorder"""
        # Get SKU-specific parameters
        sku_info = self.get_sku_info(sku_id)
        lead_time = self.supplier_lead_times.get_lead_time(sku_id)
        
        # Calculate demand during lead time
        lead_time_demand = forecast_data['forecast'][:lead_time].sum()
        
        # Safety stock based on forecast uncertainty
        safety_stock = self.safety_stock_calculator.calculate(
            forecast_mean=forecast_data['forecast'].mean(),
            forecast_std=self._calculate_forecast_std(forecast_data),
            lead_time=lead_time,
            service_level=sku_info['target_service_level']
        )
        
        # Reorder point
        reorder_point = lead_time_demand + safety_stock
        
        # Economic Order Quantity (EOQ) with shelf life constraints
        eoq = self._calculate_eoq_with_expiry(
            annual_demand=forecast_data['forecast'].sum() * (365/30),  # Annualize
            holding_cost=sku_info['holding_cost'],
            ordering_cost=sku_info['ordering_cost'],
            shelf_life_days=sku_info['shelf_life_days']
        )
        
        return {
            'reorder_point': reorder_point,
            'order_quantity': eoq,
            'safety_stock': safety_stock,
            'lead_time_demand': lead_time_demand,
            'confidence_level': self._calculate_decision_confidence(forecast_data)
        }
    
    def _calculate_eoq_with_expiry(self, annual_demand: float, holding_cost: float,
                                 ordering_cost: float, shelf_life_days: int) -> float:
        """EOQ formula adjusted for perishable goods"""
        import math
        
        # Standard EOQ
        eoq_standard = math.sqrt((2 * annual_demand * ordering_cost) / holding_cost)
        
        # Shelf life constraint
        max_order_size = (annual_demand / 365) * shelf_life_days * 0.8  # 80% of shelf life
        
        # Return the minimum of EOQ and shelf life constraint
        return min(eoq_standard, max_order_size)

Results: Transforming Medicine Supply Chain 📈

The impact of our ML-driven auto-replenishment system was remarkable:

Performance Metrics:

SYSTEM_PERFORMANCE = {
    "forecast_accuracy_improvement": {
        "arima_alone": "78% MAPE",
        "lstm_alone": "82% MAPE", 
        "ensemble_model": "89% MAPE",  # 11% improvement
        "previous_system": "62% MAPE"
    },
    
    "business_impact": {
        "stockout_reduction": "23% to 6%",    # 74% improvement
        "expired_inventory_reduction": "₹2.3Cr to ₹0.8Cr",  # 65% reduction
        "manual_interventions": "40% to 12%",  # 70% reduction
        "customer_satisfaction": "4.1 to 4.6 rating"
    },
    
    "operational_efficiency": {
        "inventory_turnover": "8.2x to 12.1x",  # 47% improvement
        "working_capital_reduction": "₹15Cr",
        "procurement_automation": "85%",
        "forecast_generation_time": "2 hours to 15 minutes"
    }
}

Real-World Impact Stories:

COVID-19 Response: When the pandemic hit, our system detected the surge in demand for masks, sanitizers, and immunity boosters 2 weeks before competitors, allowing us to stock up and serve customers when they needed us most.

Monsoon Preparedness: The system accurately predicted a 340% spike in cold and cough medicines during the 2023 monsoon season, preventing stockouts that would have affected thousands of patients.

Festival Planning: During Diwali 2023, the system forecasted increased demand for digestive medicines and antacids, leading to a 95% service level during the peak period.

Technical Challenges and Solutions 🔧

Challenge 1: Cold Start Problem

Problem: New SKUs have no historical data for forecasting.

Solution: Transfer learning approach using similar SKU patterns:

class ColdStartHandler:
    def __init__(self):
        self.sku_similarity_matcher = SKUSimilarityMatcher()
        
    def generate_initial_forecast(self, new_sku_id: str) -> np.ndarray:
        """Generate forecast for new SKU using similar SKUs"""
        # Find similar SKUs based on category, price, and therapeutic class
        similar_skus = self.sku_similarity_matcher.find_similar(new_sku_id, top_k=5)
        
        # Weighted average of similar SKU patterns
        forecasts = []
        weights = []
        
        for similar_sku, similarity_score in similar_skus:
            forecast = self.get_sku_forecast(similar_sku)
            forecasts.append(forecast)
            weights.append(similarity_score)
        
        # Weighted ensemble
        weights = np.array(weights) / sum(weights)
        cold_start_forecast = np.average(forecasts, axis=0, weights=weights)
        
        return cold_start_forecast

Challenge 2: Concept Drift

Problem: Demand patterns change over time (new diseases, changing lifestyles).

Solution: Adaptive model retraining:

class ConceptDriftDetector:
    def __init__(self):
        self.drift_threshold = 0.15  # 15% increase in error
        self.monitoring_window = 30  # days
        
    def detect_drift(self, sku_id: str, recent_errors: List[float]) -> bool:
        """Detect if model performance is degrading"""
        if len(recent_errors) < self.monitoring_window:
            return False
            
        recent_mape = np.mean(recent_errors[-self.monitoring_window:])
        historical_mape = self.get_historical_mape(sku_id)
        
        drift_detected = (recent_mape - historical_mape) > self.drift_threshold
        
        if drift_detected:
            logging.warning(f"Concept drift detected for SKU {sku_id}")
            self.trigger_model_retrain(sku_id)
            
        return drift_detected

Lessons Learned: Building ML at Scale 📚

  1. Domain Knowledge Trumps Complex Models: Understanding pharmaceutical supply chains was more valuable than the fanciest algorithms.

  2. Hybrid Approaches Work: Combining statistical models (ARIMA) with deep learning (LSTM) provided the best results.

  3. External Data is Gold: Weather, disease outbreaks, and festivals significantly improved forecast accuracy.

  4. Start Simple, Iterate Fast: We began with basic ARIMA, then added LSTM, then ensemble methods.

  5. Monitor, Monitor, Monitor: Concept drift is real - models need continuous monitoring and retraining.

Future Enhancements 🚀

We’re continuously improving the system:

  1. Real-time Learning: Incorporating streaming data for instant model updates
  2. Multi-warehouse Optimization: Global inventory optimization across all Odin warehouses
  3. Supplier Integration: Direct API integration with pharmaceutical manufacturers
  4. Explainable AI: Making forecast decisions interpretable for business users

Conclusion: ML-Powered Healthcare Supply Chain 🎯

Building the ML-driven auto-replenishment module for Odin was one of the most impactful projects of my career. The system doesn’t just predict numbers - it ensures that patients across India have access to the medicines they need, when they need them.

The combination of ARIMA’s statistical rigor and LSTM’s pattern recognition, enhanced with rich external data, created a forecasting system that significantly outperformed traditional approaches. More importantly, it automated 85% of procurement decisions, reduced expired inventory by 65%, and improved customer satisfaction from 4.1 to 4.6.

As machine learning continues to transform supply chains across industries, the principles we established - hybrid modeling, external data integration, and continuous monitoring - remain as relevant as ever. The future of healthcare supply chain management is intelligent, automated, and patient-centric.


Interested in discussing ML applications in healthcare or supply chain optimization? I’d love to connect! Reach out at yashpathania704@gmail.com or find me on LinkedIn.

Coming up next: I’ll be sharing how we built “Free at UCD” - a crowd-sourced web app that serves 400-500 daily sessions, helping UCD students find free food locations across Dublin!