Changelog¶
All notable changes to Diet Pandas will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.4.0] - 2025-12-23¶
Added¶
- Smart Float-to-Integer Conversion: Automatically detect and convert float columns to integers when they contain only whole numbers
  - Detects floats with no decimal part (e.g., 1.0, 2.0, 3.0)
  - Converts to smallest appropriate integer type (int8, int16, uint8, etc.)
  - Preserves NaN values using nullable integer types (Int8, UInt8, etc.)
  - New float_to_int parameter for diet() and optimize_float() (default: True)
  - Can be disabled with float_to_int=False to preserve float types
  - Significant memory savings for ID columns, year fields, counts, and categorical codes
  - 9 comprehensive test cases covering edge cases and NaN handling
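The type selection described in this entry can be sketched without pandas. This is a minimal illustration, not the library's implementation: the helper name `pick_int_dtype` is hypothetical, and preferring unsigned types for non-negative data is an assumption.

```python
import math

# Candidate types as (name, minimum, maximum), narrowest first.
SIGNED = [("int8", -2**7, 2**7 - 1), ("int16", -2**15, 2**15 - 1),
          ("int32", -2**31, 2**31 - 1), ("int64", -2**63, 2**63 - 1)]
UNSIGNED = [("uint8", 0, 2**8 - 1), ("uint16", 0, 2**16 - 1),
            ("uint32", 0, 2**32 - 1), ("uint64", 0, 2**64 - 1)]


def pick_int_dtype(values):
    """Return the narrowest integer dtype name for a float column, or None.

    None means the column should stay float: it is empty, all NaN,
    or holds at least one fractional or infinite value.
    Hypothetical helper, not part of the Diet Pandas API.
    """
    present = [v for v in values if not math.isnan(v)]
    if not present or not all(float(v).is_integer() for v in present):
        return None
    lo, hi = min(present), max(present)
    has_nan = len(present) < len(values)
    # Assumption: non-negative data prefers unsigned types.
    for name, type_min, type_max in (UNSIGNED if lo >= 0 else SIGNED):
        if type_min <= lo and hi <= type_max:
            if has_nan:
                # Nullable pandas dtypes use capitalised names (Int8, UInt8, ...).
                return "UInt" + name[4:] if name.startswith("u") else "Int" + name[3:]
            return name
    return None
```

For example, a year column stored as floats maps to `uint16`, and a column with a missing value maps to a nullable type such as `UInt8`.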
Improved¶
- Enhanced optimize_float() with intelligent integer detection logic
- Updated documentation with float-to-int examples and use cases
- Added float_to_int_demo.py script demonstrating the feature
Performance¶
- Up to 50% memory reduction for datasets with float-typed integer columns
- Common in CSV files where numeric columns are loaded as float64 by default
Tests¶
- 128 total tests passing (9 new float-to-int conversion tests)
[0.3.0] - 2025-12-23¶
Added¶
- Automatic Chunked Reading: Memory-aware CSV reading
  - Automatically switches to chunked reading for large files
  - Estimates file size and available memory with psutil
  - Configurable memory_threshold (default: 70% of available RAM)
  - auto_chunk parameter to enable/disable (default: True)
  - Works seamlessly with schema persistence
  - Prevents out-of-memory errors on large datasets
- Schema Persistence: Save and reuse optimization schemas
  - save_schema() - Save DataFrame schema to JSON
  - load_schema() - Load schema from JSON file
  - apply_schema() - Apply saved schema to DataFrame
  - auto_schema_path() - Generate schema file paths automatically
  - Skip re-analysis on repeated loads for faster processing
  - Integration with read_csv() via schema_path and save_schema parameters
  - 16 comprehensive tests for schema operations
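The idea behind schema persistence, analysing once and reusing the result on later loads, can be sketched with the standard library alone. Every name below (`schema_path_for`, `write_schema`, `read_schema`, `load_or_analyse`) is hypothetical, and the `data.schema.json` naming rule is an assumption; the library's own functions may behave differently.

```python
import json
from pathlib import Path


def schema_path_for(data_path):
    """Hypothetical naming rule: data.csv -> data.schema.json in the same folder."""
    return Path(data_path).with_suffix(".schema.json")


def write_schema(schema, path):
    """Store a {column name: dtype name} mapping as JSON."""
    Path(path).write_text(json.dumps(schema, indent=2, sort_keys=True))


def read_schema(path):
    """Load a mapping previously written by write_schema."""
    return json.loads(Path(path).read_text())


def load_or_analyse(data_path, analyse):
    """Reuse a saved schema when one exists; otherwise analyse once and save it."""
    path = schema_path_for(data_path)
    if path.exists():
        return read_schema(path)
    schema = analyse(data_path)  # the expensive step, skipped on later calls
    write_schema(schema, path)
    return schema
```

The second call for the same file returns the stored mapping without running the analysis again, which is where the time saving on repeated loads would come from.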
Improved¶
- Enhanced read_csv() with automatic memory management
- Added psutil>=5.9.0 dependency for memory monitoring
- Better handling of large files with automatic chunking
- Schema-based optimization eliminates redundant analysis
Performance¶
- Automatic memory-aware chunking prevents out-of-memory errors
- Schema reuse eliminates redundant analysis overhead
- Faster repeated loads when using schema persistence
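A rough sketch of memory-aware chunking follows. The function names are hypothetical, and the amount of free memory is passed in as a plain number so the example needs no third-party package; in practice it could come from `psutil.virtual_memory().available`. Comparing on-disk size against a share of free memory is an assumption about how the threshold is applied.

```python
import csv
import os
from itertools import islice


def should_chunk(path, available_memory, memory_threshold=0.7):
    """True when the file is larger than the allowed share of free memory.

    Hypothetical helper: on-disk size is only a proxy for in-memory size.
    """
    return os.path.getsize(path) > available_memory * memory_threshold


def iter_csv_chunks(path, chunk_rows):
    """Yield lists of row dicts, with at most chunk_rows rows per list."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_rows))
            if not chunk:
                break
            yield chunk
```

Each chunk can then be optimized on its own and combined afterwards, so the full unoptimized file never has to fit in memory at once.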
Tests¶
- 139 total tests passing (23 new schema/chunking tests)
- Cross-platform compatibility (Windows, macOS, Linux)
- Fixed Windows file permission issues in tests
- All code formatted with black and verified with flake8/isort
[0.2.2] - 2025-12-22¶
Fixed¶
- Removed broken logo and favicon references from documentation site
[0.2.1] - 2025-12-21¶
Added¶
- Documentation Site: Complete documentation website at https://luiz826.github.io/diet-pandas/
  - Getting Started guides (Installation, Quick Start)
  - User Guide (Basic Usage, File I/O, Advanced, Memory Reports)
  - API Reference with auto-generated docs
  - Performance Benchmarks
  - Changelog, Contributing, and License pages
Fixed¶
- GitHub Actions workflow for documentation deployment
- CI now properly installs package before building docs
[0.2.0] - 2025-12-21¶
Added¶
- DateTime Optimization: New optimize_datetime() function for efficient datetime handling
- Sparse Data Support: New optimize_sparse() function for sparse array optimization (up to 96% memory reduction on sparse data)
- Extended File Format Support:
  - read_json() - JSON file reading with optimization
  - read_hdf() - HDF5 file reading with optimization
  - read_feather() - Feather file reading with optimization
  - to_parquet_optimized() - Optimized Parquet writing
  - to_feather_optimized() - Optimized Feather writing
- Performance Benchmarking: Comprehensive benchmark script (scripts/benchmark.py)
- New parameters for diet():
  - optimize_datetimes - Enable/disable datetime optimization (default: True)
  - optimize_sparse_cols - Enable sparse optimization (default: False)
  - sparse_threshold - Threshold for sparse conversion (default: 0.9)
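One plausible reading of sparse_threshold is the minimum share of a column taken up by its single most frequent value. The sketch below illustrates that reading only; both helper names are hypothetical and the library may define the threshold differently.

```python
from collections import Counter


def dominant_share(values):
    """Return (most frequent value, its share of the column)."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def is_sparse_candidate(values, sparse_threshold=0.9):
    """True when one value fills at least sparse_threshold of the column.

    Assumed interpretation of the threshold, not confirmed by the changelog.
    """
    if not values:
        return False
    _, share = dominant_share(values)
    return share >= sparse_threshold
```

A column that is 95% zeros would qualify under the default, while an evenly split column would not. NaN values would need normalising first, because NaN does not compare equal to itself.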
Improved¶
- Enhanced diet() function with datetime and sparse support
- Better test coverage with 12 new tests for new features
- Comprehensive documentation updates
- Emojis in output messages for better UX (🥗)
Performance¶
- 82-87% memory reduction on typical datasets (tested with 10K-500K rows)
- Up to 96% memory reduction on sparse data
- Optimization time: 0.007-0.16 seconds depending on dataset size
Documentation¶
- Updated README with new features
- Improved CONTRIBUTING.md with completed tasks
- Added performance benchmarking guide
- New example code for sparse and datetime optimization
[0.1.0] - 2025-12-19¶
Added¶
- Initial release of Diet Pandas
- Core optimization engine with intelligent type downcasting
  - diet() function for optimizing existing DataFrames
  - optimize_int() for integer optimization (int64 → int8/int16/uint8/uint16)
  - optimize_float() for float optimization (float64 → float32/float16)
  - optimize_obj() for string to category conversion
  - get_memory_report() for detailed memory usage analysis
- Fast I/O module with Polars integration
  - read_csv() with automatic optimization (5-10x faster)
  - read_parquet() with automatic optimization
  - read_excel() with automatic optimization
  - to_csv_optimized() for saving optimized DataFrames
- Aggressive mode for maximum compression (float16)
- Customizable categorical threshold
- In-place optimization option
- Verbose mode with memory reduction statistics
- Comprehensive test suite (95%+ coverage)
- Detailed documentation and examples
- Quick reference card
- Development guide
- Contributing guidelines
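The categorical threshold listed above is commonly expressed as a ratio of distinct values to total values. A minimal sketch under that assumption follows; the function name and the 0.5 default are hypothetical, not taken from the library.

```python
def should_be_category(values, categorical_threshold=0.5):
    """True when the share of distinct values is below the threshold.

    Low-cardinality string columns (few distinct values, many rows)
    are the ones that benefit from a category dtype.
    """
    if not values:
        return False
    return len(set(values)) / len(values) < categorical_threshold
```

A column of 100 rows holding only two distinct city names has a ratio of 0.02 and would be converted, while a column of unique identifiers has a ratio of 1.0 and would be left as strings.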
Features¶
- 50-80% memory reduction on typical datasets
- 5-10x faster CSV loading with Polars
- 100% Pandas compatibility
- Automatic fallback to standard Pandas if Polars unavailable
- Safe, lossless optimization by default
- Optional aggressive mode for maximum compression
Documentation¶
- Complete README with examples
- Quick reference card (QUICKREF.md)
- Development guide (DEVELOPMENT.md)
- Contributing guidelines (CONTRIBUTING.md)
- Project summary (PROJECT_SUMMARY.md)
- 5 comprehensive examples (examples.py)
- Inline docstring examples
- API documentation
Testing¶
- Unit tests for all core functions
- Integration tests for I/O operations
- Edge case testing (NaN, empty data, etc.)
- Cross-platform testing (Linux, macOS, Windows)
- Python 3.10+ compatibility testing
Infrastructure¶
- PyPI-ready package structure
- GitHub Actions CI/CD workflow
- Makefile for common tasks
- Quick setup script
- MIT License
[Unreleased]¶
Planned for 0.2.0¶
- DateTime optimization
- Boolean optimization
- Sparse data handling
- JSON format support
- HDF5 format support
- Feather format support
Planned for 0.3.0¶
- Parallel processing support
- Streaming optimization
- Custom optimization profiles
- Performance benchmarking tools
Future Considerations¶
- Web dashboard for visualization
- Jupyter notebook extension
- VS Code extension
- Auto-optimization on DataFrame creation
- Integration with Dask for big data
- GPU acceleration support
Release Notes¶
Version 0.1.0¶
This is the initial release of Diet Pandas, a memory optimization library for Pandas DataFrames.
Key Highlights:
- Reduces DataFrame memory usage by 50-80% without data loss
- Loads CSV files 5-10x faster than standard Pandas
- Fully compatible with existing Pandas workflows
- Easy to use: just replace pd.read_csv() with dp.read_csv()
Example Usage:
import dietpandas as dp
# Fast loading with automatic optimization
df = dp.read_csv("large_file.csv")
# 🥗 Diet Complete: Memory reduced by 67.3%
# Or optimize existing DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
df = dp.diet(df)
Technical Details:
- Uses Polars for multi-threaded CSV parsing
- Intelligent type downcasting algorithms
- Automatic categorical conversion for low-cardinality strings
- Comprehensive test coverage
- Cross-platform support
Installation:
Documentation:
- See README.md for complete documentation
- See QUICKREF.md for quick reference
- See examples.py for usage examples
For more information, visit: https://github.com/yourusername/diet-pandas