# Diet Pandas

*Same Pandas taste, half the calories (RAM).*
## The Problem
Pandas is built for safety and ease of use, not memory efficiency. When you load a CSV, standard Pandas defaults to "safe" but wasteful data types:
- `int64` for small integers (wasting 75%+ of the memory per number)
- `float64` for simple metrics (wasting 50% of the memory per number)
- `object` for repetitive strings (wasting massive amounts of memory and CPU)
Diet Pandas solves this by acting as a strict nutritionist for your data. It aggressively analyzes data distributions and "downcasts" types to the smallest safe representation, often reducing memory usage by 50% to 80% without losing information.
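For context, this is what downcasting means in plain pandas (Diet Pandas automates this analysis across every column; the data below is illustrative):

```python
import pandas as pd

# Pandas defaults to int64: 8 bytes per value, even for tiny numbers.
s = pd.Series([12, 34, 27], dtype="int64")

# pd.to_numeric can shrink integers to the smallest type that fits the data.
small = pd.to_numeric(s, downcast="integer")
print(s.dtype, "->", small.dtype)  # int64 -> int8 (1 byte per value)
```

Because every value fits in the int8 range, the column shrinks to one eighth of its numeric storage with no loss of information.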
## Quick Example
```python
import dietpandas as dp

# Drop-in replacement for pandas.read_csv:
# loads faster and uses less RAM automatically
df = dp.read_csv("huge_dataset.csv")
# Diet Complete: Memory reduced by 67.3%
# 450.00MB -> 147.15MB

# Or optimize an existing DataFrame
import pandas as pd

df_heavy = pd.DataFrame({
    'year': [2020, 2021, 2022],
    'revenue': [1.1, 2.2, 3.3]
})
df_light = dp.diet(df_heavy)
# Diet Complete: Memory reduced by 62.5%
```
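You can verify any reported reduction yourself with pandas' own accounting. This sketch uses only standard pandas and invented data (no Diet Pandas API) to reproduce the kind of savings shown above by hand:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": np.arange(2000, 2023, dtype="int64"),  # 23 values
    "flag": [True, False] * 11 + [True],           # already 1 byte each
})
before = df.memory_usage(deep=True).sum()

# Manual "diet": downcast the int64 column to the smallest type that fits.
df["year"] = pd.to_numeric(df["year"], downcast="integer")
after = df.memory_usage(deep=True).sum()

print(f"{before} bytes -> {after} bytes "
      f"({100 * (1 - after / before):.1f}% smaller)")
```

Values in 2000-2022 need more than 8 bits but fit in 16, so the column lands on int16 rather than the absolute minimum int8.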
## Key Features
- **Parallel Processing** (new in v0.5.0): multi-threaded optimization for a 2-4x speedup
- **Fast Loading**: uses the Polars engine for 5-10x faster CSV parsing
- **Smart Optimization**: automatically downcasts numeric types to the smallest safe representation
- **Vectorized Boolean Detection** (new in v0.5.0): fast boolean column optimization
- **Sparse Support**: optimizes columns with many repeated values (95%+ reduction)
- **DateTime Handling**: efficient datetime column optimization
- **Multiple Formats**: CSV, Parquet, Excel, JSON, HDF5, Feather
- **Aggressive Mode**: optional extreme compression for maximum memory savings
- **Memory Reports**: detailed analysis of memory usage per column
- **100% Pandas Compatible**: works seamlessly with all pandas operations
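To see why columns with many repeated values are so compressible, compare `object` storage with pandas' `category` dtype, which stores each unique string once plus compact integer codes. This is plain pandas with invented data, not the Diet Pandas API, but it shows where figures like "95%+ reduction" come from:

```python
import pandas as pd

# object dtype stores one full Python string per row;
# category stores each unique value once plus small integer codes.
states = pd.Series(["CA", "NY", "TX"] * 100_000, dtype="object")
as_cat = states.astype("category")

obj_bytes = states.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(f"object: {obj_bytes:,} bytes, category: {cat_bytes:,} bytes")
print(f"reduction: {100 * (1 - cat_bytes / obj_bytes):.1f}%")
```

With only three unique values, the codes fit in a single byte per row, so the reduction exceeds 90% on this data.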
## Documentation
- Installation - Get started in seconds
- Quick Start - Basic usage examples
- User Guide - Comprehensive tutorials
- API Reference - Complete API documentation
- Performance - Benchmark results
## When to Use Diet Pandas
**Perfect for:**

- Loading large CSV files (>100MB)
- Working in limited-RAM environments
- Training ML models on large datasets
- ETL pipelines with memory constraints
- Web applications serving data

**Consider alternatives for:**

- Tiny datasets (<1MB): the optimization overhead isn't worth it
- Streaming data pipelines: consider Polars directly
- Workloads that need maximum precision (e.g. financial calculations)
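The precision caveat is easy to demonstrate with plain NumPy: float32 carries only about 7 significant decimal digits, so downcasting a float64 column of large monetary amounts silently drops the cents.

```python
import numpy as np

# At this magnitude, adjacent float32 values are a whole unit apart,
# so the fractional part cannot be represented.
balance64 = np.float64(12_345_678.91)
balance32 = np.float32(balance64)

print(balance64)  # 12345678.91
print(balance32)  # 12345679.0 -- the cents are gone
```

This is why conservative (non-aggressive) mode is the safer default when exact decimal values matter.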
## Contributing
We welcome contributions! See our Contributing Guide for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.