Diet Pandas 🐼🥗

Tagline: Same Pandas taste, half the calories (RAM).

Python 3.10+ · License: MIT

🎯 The Problem

Pandas is built for safety and ease of use, not memory efficiency. When you load a CSV, standard Pandas defaults to "safe" but wasteful data types:

  • int64 for small integers (wasting 75%+ memory per number)
  • float64 for simple metrics (wasting 50% memory per number)
  • object for repetitive strings (wasting massive amounts of memory and CPU)

Diet Pandas solves this by acting as a strict nutritionist for your data. It aggressively analyzes data distributions and "downcasts" types to the smallest safe representation, often reducing memory usage by 50% to 80% without losing information.
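To make the three cases above concrete, here is a minimal sketch of the same idea done by hand with plain pandas. It is not Diet Pandas' implementation. The sample data and column names are hypothetical, and the target dtypes are simply the ones stock pandas would pick or that we chose for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the three cases above.
df = pd.DataFrame({
    "age": np.arange(1_000, dtype="int64") % 90,          # small integers stored as int64
    "score": np.linspace(0.0, 1.0, 1_000),                # float64 metric
    "city": ["Paris", "Berlin", "Madrid", "Rome"] * 250,  # repetitive strings
})
before = df.memory_usage(deep=True, index=False)

slim = pd.DataFrame({
    # Values 0..89 fit in int8, so pandas picks int8 here.
    "age": pd.to_numeric(df["age"], downcast="integer"),
    # Halves the storage, but rounds values to float32 precision.
    "score": df["score"].astype("float32"),
    # Stores each distinct string once plus a small integer code per row.
    "city": df["city"].astype("category"),
})
after = slim.memory_usage(deep=True, index=False)

# Per-column report of bytes before/after and the percentage saved.
report = pd.DataFrame({"before": before, "after": after})
report["saved_%"] = (100 * (1 - report["after"] / report["before"])).round(1)
print(report)
```

The integer and category conversions keep every value intact. The float32 cast is the one step that can change values, which is why a "safe" downcast needs to check the data first.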

🚀 Quick Example

import dietpandas as dp

# Drop-in replacement for pandas.read_csv
# Loads faster and uses less RAM automatically
df = dp.read_csv("huge_dataset.csv")
# 🥗 Diet Complete: Memory reduced by 67.3%
#    450.00MB -> 147.15MB

# Or optimize an existing DataFrame
import pandas as pd
df_heavy = pd.DataFrame({
    'year': [2020, 2021, 2022], 
    'revenue': [1.1, 2.2, 3.3]
})
df_light = dp.diet(df_heavy)
# 🥗 Diet Complete: Memory reduced by 62.5%
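The 62.5% figure for the small DataFrame can be reproduced with plain pandas. The sketch below assumes the reported number excludes the index and that the columns end up as int16 and float32. Both are our assumptions about what `dp.diet` does, not something confirmed here.

```python
import pandas as pd

df_heavy = pd.DataFrame({
    "year": [2020, 2021, 2022],
    "revenue": [1.1, 2.2, 3.3],
})

# Assumed target dtypes: int16 holds values up to 32767, float32 halves float64.
df_manual = df_heavy.astype({"year": "int16", "revenue": "float32"})

heavy_bytes = df_heavy.memory_usage(index=False).sum()   # 3 rows * (8 + 8) bytes
light_bytes = df_manual.memory_usage(index=False).sum()  # 3 rows * (2 + 4) bytes
reduction = 100 * (1 - light_bytes / heavy_bytes)

print(f"{heavy_bytes} B -> {light_bytes} B ({reduction:.1f}% smaller)")
# 48 B -> 18 B (62.5% smaller)
```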

✨ Key Features

  • ⚡ Parallel Processing (NEW in v0.5.0): Multi-threaded optimization for 2-4x speedup
  • 🏃 Fast Loading: Uses Polars engine for 5-10x faster CSV parsing
  • 🎯 Smart Optimization: Automatically downcasts numeric types to smallest safe representation
  • 🚀 Vectorized Boolean Detection (NEW in v0.5.0): Lightning-fast boolean column optimization
  • 🗜️ Sparse Support: Optimizes columns with many repeated values (95%+ reduction)
  • 📅 DateTime Handling: Efficient datetime column optimization
  • 📊 Multiple Formats: CSV, Parquet, Excel, JSON, HDF5, Feather
  • 🔥 Aggressive Mode: Optional extreme compression for maximum memory savings
  • 📈 Memory Reports: Detailed analysis of memory usage per column
  • ✅ 100% Pandas Compatible: Works seamlessly with all pandas operations
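Two of the features above, sparse storage and vectorized boolean detection, can be sketched with stock pandas. This only illustrates the underlying techniques on hypothetical data. How Diet Pandas detects and converts such columns internally is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 10,000 floats, of which only 100 are non-zero.
values = np.zeros(10_000)
values[::100] = 1.5
dense = pd.Series(values)

# Sparse storage keeps only the non-fill values plus their positions.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

dense_bytes = dense.memory_usage(index=False)
sparse_bytes = sparse.memory_usage(index=False)
print(f"sparse saves {100 * (1 - sparse_bytes / dense_bytes):.1f}%")

# Hypothetical 0/1 flag column stored as int64.
flags = pd.Series([0, 1, 1, 0, 1] * 2_000)

# Vectorized check: a single isin() call over the column, no Python-level loop.
if flags.isin([0, 1]).all():
    flags_bool = flags.astype("bool")  # 8 bytes -> 1 byte per value
```

The saving from sparse storage depends entirely on how dominant the fill value is. A column with few repeats can end up larger in sparse form, so it is worth measuring before converting.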

📚 Documentation

🎯 When to Use Diet Pandas

✅ Perfect for:

  • Loading large CSV files (>100MB)
  • Working with limited RAM environments
  • Training ML models on large datasets
  • ETL pipelines with memory constraints
  • Web applications serving data

⚠️ Consider alternatives for:

  • Tiny datasets (<1MB): optimization overhead not worth it
  • Streaming data pipelines: consider Polars directly
  • When you need maximum precision (financial calculations)
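The precision caveat is easy to see with NumPy alone. The sketch below shows what a float32 cast does to two typical values. The helper `survives_float32` is hypothetical, written for this example, and not part of Diet Pandas.

```python
import numpy as np

# float32 keeps roughly 7 significant decimal digits.
print(float(np.float32(0.1)))           # 0.10000000149011612
print(float(np.float32(16_777_217.0)))  # 16777216.0 (2**24 + 1 is not representable)


def survives_float32(values: np.ndarray) -> bool:
    """Hypothetical helper: True if a float32 round trip reproduces every value exactly."""
    round_trip = values.astype(np.float32).astype(np.float64)
    return bool(np.array_equal(round_trip, values))


print(survives_float32(np.array([0.5, 1.25, 1024.0])))  # True
print(survives_float32(np.array([0.1, 19.99])))         # False
```

For monetary amounts, a common alternative is to keep float64, or to store integer cents, rather than downcasting.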

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.