# Diet Pandas

*Same Pandas taste, half the calories (RAM).*
## The Problem
Pandas is built for safety and ease of use, not memory efficiency. When you load a CSV, standard Pandas defaults to "safe" but wasteful data types:
- `int64` for small integers (wasting 75%+ of the memory per number)
- `float64` for simple metrics (wasting 50% of the memory per number)
- `object` for repetitive strings (wasting massive amounts of memory and CPU)
Diet Pandas solves this by acting as a strict nutritionist for your data. It aggressively analyzes data distributions and "downcasts" types to the smallest safe representation, often reducing memory usage by 50% to 80% without losing information.
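For context, this is what downcasting means in plain pandas (Diet Pandas automates this analysis across every column; the data below is illustrative):

```python
import pandas as pd

# Pandas defaults to int64: 8 bytes per value, even for tiny numbers.
s = pd.Series([12, 34, 27], dtype="int64")

# pd.to_numeric can shrink integers to the smallest type that fits the data.
small = pd.to_numeric(s, downcast="integer")
print(s.dtype, "->", small.dtype)  # int64 -> int8 (1 byte per value)
```

Because every value fits in the int8 range, the column shrinks to one eighth of its numeric storage with no loss of information.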
## Quick Example
```python
import dietpandas as dp

# Drop-in replacement for pandas.read_csv:
# loads faster and uses less RAM automatically
df = dp.read_csv("huge_dataset.csv")
# Diet Complete: Memory reduced by 67.3%
# 450.00MB -> 147.15MB

# Or optimize an existing DataFrame
import pandas as pd

df_heavy = pd.DataFrame({
    'year': [2020, 2021, 2022],
    'revenue': [1.1, 2.2, 3.3]
})
df_light = dp.diet(df_heavy)
# Diet Complete: Memory reduced by 62.5%
```
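You can verify any reported reduction yourself with pandas' own accounting. This sketch uses only standard pandas and invented data (no Diet Pandas API) to reproduce the kind of savings shown above by hand:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": np.arange(2000, 2023, dtype="int64"),  # 23 values
    "flag": [True, False] * 11 + [True],           # already 1 byte each
})
before = df.memory_usage(deep=True).sum()

# Manual "diet": downcast the int64 column to the smallest type that fits.
df["year"] = pd.to_numeric(df["year"], downcast="integer")
after = df.memory_usage(deep=True).sum()

print(f"{before} bytes -> {after} bytes "
      f"({100 * (1 - after / before):.1f}% smaller)")
```

Values in 2000-2022 need more than 8 bits but fit in 16, so the column lands on int16 rather than the absolute minimum int8.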
## Key Features
- **Parallel Processing** (new in v0.5.0): multi-threaded optimization for a 2-4x speedup
- **Fast Loading**: uses the Polars engine for 5-10x faster CSV parsing
- **Smart Optimization**: automatically downcasts numeric types to the smallest safe representation
- **Vectorized Boolean Detection** (new in v0.5.0): fast boolean column optimization
- **Sparse Support**: optimizes columns with many repeated values (95%+ reduction)
- **DateTime Handling**: efficient datetime column optimization
- **Multiple Formats**: CSV, Parquet, Excel, JSON, HDF5, Feather
- **Aggressive Mode**: optional extreme compression for maximum memory savings
- **Memory Reports**: detailed analysis of memory usage per column
- **100% Pandas Compatible**: works seamlessly with all pandas operations
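To see why columns with many repeated values are so compressible, compare `object` storage with pandas' `category` dtype, which stores each unique string once plus compact integer codes. This is plain pandas with invented data, not the Diet Pandas API, but it shows where figures like "95%+ reduction" come from:

```python
import pandas as pd

# object dtype stores one full Python string per row;
# category stores each unique value once plus small integer codes.
states = pd.Series(["CA", "NY", "TX"] * 100_000, dtype="object")
as_cat = states.astype("category")

obj_bytes = states.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(f"object: {obj_bytes:,} bytes, category: {cat_bytes:,} bytes")
print(f"reduction: {100 * (1 - cat_bytes / obj_bytes):.1f}%")
```

With only three unique values, the codes fit in a single byte per row, so the reduction exceeds 90% on this data.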
## Documentation
- Installation - Get started in seconds
- Quick Start - Basic usage examples
- User Guide - Comprehensive tutorials
- API Reference - Complete API documentation
- Performance - Benchmark results
## When to Use Diet Pandas
**Perfect for:**

- Loading large CSV files (>100MB)
- Working in limited-RAM environments
- Training ML models on large datasets
- ETL pipelines with memory constraints
- Web applications serving data

**Consider alternatives for:**

- Tiny datasets (<1MB): the optimization overhead isn't worth it
- Streaming data pipelines: consider Polars directly
- Workloads that need maximum precision (e.g. financial calculations)
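The precision caveat is easy to demonstrate with plain NumPy: float32 carries only about 7 significant decimal digits, so downcasting a float64 column of large monetary amounts silently drops the cents.

```python
import numpy as np

# At this magnitude, adjacent float32 values are a whole unit apart,
# so the fractional part cannot be represented.
balance64 = np.float64(12_345_678.91)
balance32 = np.float32(balance64)

print(balance64)  # 12345678.91
print(balance32)  # 12345679.0 -- the cents are gone
```

This is why conservative (non-aggressive) mode is the safer default when exact decimal values matter.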
## Contributing
We welcome contributions! See our Contributing Guide for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.