
Changelog

All notable changes to Diet Pandas will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.4.0] - 2025-12-23

Added

  • Smart Float-to-Integer Conversion: Automatically detect and convert float columns to integers when they contain only whole numbers
      • Detects floats with no decimal part (e.g., 1.0, 2.0, 3.0)
      • Converts to smallest appropriate integer type (int8, int16, uint8, etc.)
      • Preserves NaN values using nullable integer types (Int8, UInt8, etc.)
      • New float_to_int parameter for diet() and optimize_float() (default: True)
      • Can be disabled with float_to_int=False to preserve float types
      • Significant memory savings for ID columns, year fields, counts, and categorical codes
      • 9 comprehensive test cases covering edge cases and NaN handling
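A minimal sketch of the whole-number check described above, written against plain pandas and NumPy. The names float_to_int_sketch and _narrowest_int are hypothetical, and this is not the library's actual optimize_float() code: the column stays float if any value has a fractional part (or is infinite); otherwise the narrowest integer type covering the observed range is chosen, switching to the nullable variant (e.g. UInt16) when NaN is present.

```python
from typing import Optional

import numpy as np
import pandas as pd


def _narrowest_int(lo: float, hi: float) -> Optional[str]:
    """Hypothetical helper: smallest NumPy integer dtype covering [lo, hi]."""
    names = ("uint8", "uint16", "uint32", "uint64") if lo >= 0 else (
        "int8", "int16", "int32", "int64")
    for name in names:
        info = np.iinfo(name)
        if info.min <= lo and hi <= info.max:
            return name
    return None  # out of range for every integer type


def float_to_int_sketch(s: pd.Series) -> pd.Series:
    """Hypothetical: downcast a float Series that holds only whole numbers."""
    if not pd.api.types.is_float_dtype(s):
        return s
    values = s.dropna()
    # All-NaN column, or any fractional part (inf % 1 is NaN) -> keep float.
    if values.empty or not (values % 1 == 0).all():
        return s
    name = _narrowest_int(values.min(), values.max())
    if name is None:
        return s
    if s.isna().any():
        # NumPy ints cannot hold NaN, so use the pandas nullable type,
        # e.g. "uint16" -> "UInt16", "int8" -> "Int8".
        name = "UInt" + name[4:] if name.startswith("u") else "Int" + name[3:]
    return s.astype(name)
```

For example, a year column read as [1999.0, NaN, 2024.0] would become UInt16 under these rules, while [1.5, 2.0] stays float64.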

Improved

  • Enhanced optimize_float() with intelligent integer detection logic
  • Updated documentation with float-to-int examples and use cases
  • Added float_to_int_demo.py script demonstrating the feature

Performance

  • Up to 50% memory reduction for datasets with float-typed integer columns
  • Common in CSV files where numeric columns are loaded as float64 by default

Tests

  • 128 total tests passing (9 new float-to-int conversion tests)

[0.3.0] - 2025-12-23

Added

  • Automatic Chunked Reading: Memory-aware CSV reading
      • Automatically switches to chunked reading for large files
      • Estimates file size and available memory with psutil
      • Configurable memory_threshold (default: 70% of available RAM)
      • auto_chunk parameter to enable/disable (default: True)
      • Works seamlessly with schema persistence
      • Prevents out-of-memory errors on large datasets
  • Schema Persistence: Save and reuse optimization schemas
      • save_schema() - Save DataFrame schema to JSON
      • load_schema() - Load schema from JSON file
      • apply_schema() - Apply saved schema to DataFrame
      • auto_schema_path() - Generate schema file paths automatically
      • Skip re-analysis on repeated loads for faster processing
      • Integration with read_csv() via schema_path and save_schema parameters
      • 16 comprehensive tests for schema operations
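The idea behind schema persistence can be sketched under one simplifying assumption: a schema is just a JSON object mapping column names to dtype strings. The real save_schema()/apply_schema() file format may differ, and the *_sketch functions below are hypothetical stand-ins rather than the library's API.

```python
import json
import os
import tempfile

import pandas as pd


def save_schema_sketch(df: pd.DataFrame, path: str) -> None:
    """Write {column: dtype string} to a JSON file."""
    schema = {col: str(dtype) for col, dtype in df.dtypes.items()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)


def load_schema_sketch(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def apply_schema_sketch(df: pd.DataFrame, schema: dict) -> pd.DataFrame:
    # Cast only columns present in both the frame and the schema, so a
    # stale schema does not fail on added or dropped columns.
    known = {col: dtype for col, dtype in schema.items() if col in df.columns}
    return df.astype(known)


# Usage: optimize once, then reuse the stored dtypes on a fresh load.
raw = pd.DataFrame({"id": [1, 2, 3], "city": ["A", "B", "A"]})
optimized = raw.astype({"id": "uint8", "city": "category"})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema.json")
    save_schema_sketch(optimized, path)
    restored = apply_schema_sketch(raw, load_schema_sketch(path))
```

Applying stored dtypes is a single astype() call, which is why reusing a schema skips the per-column analysis on later loads.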

Improved

  • Enhanced read_csv() with automatic memory management
  • Added psutil>=5.9.0 dependency for memory monitoring
  • Better handling of large files with automatic chunking
  • Schema-based optimization eliminates redundant analysis
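A minimal sketch of the memory-aware decision, assuming the rule is simply "chunk when the file is larger than memory_threshold times the currently available RAM"; the library may estimate in-memory size differently. should_chunk and read_csv_maybe_chunked are hypothetical names. In practice available_bytes would come from psutil.virtual_memory().available and file_bytes from os.path.getsize(); both are passed in here so the example runs without psutil.

```python
import io

import pandas as pd


def should_chunk(file_bytes: int, available_bytes: int,
                 memory_threshold: float = 0.7) -> bool:
    """Chunk when the file exceeds the allowed share of free RAM."""
    return file_bytes > memory_threshold * available_bytes


def read_csv_maybe_chunked(source, file_bytes: int, available_bytes: int,
                           chunksize: int = 100_000) -> pd.DataFrame:
    """Hypothetical reader: one-shot read if it fits, chunked otherwise."""
    if not should_chunk(file_bytes, available_bytes):
        return pd.read_csv(source)
    parts = []
    # read_csv(chunksize=...) yields DataFrames of at most `chunksize` rows.
    with pd.read_csv(source, chunksize=chunksize) as reader:
        for chunk in reader:
            # A real pipeline would shrink each chunk here (e.g. dp.diet)
            # so only optimized pieces are held in memory.
            parts.append(chunk)
    return pd.concat(parts, ignore_index=True)


# Usage: pretend the file is 1 GB but only 1 MB of RAM is free.
csv = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
df = read_csv_maybe_chunked(csv, file_bytes=10**9, available_bytes=10**6,
                            chunksize=2)
```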

Performance

  • Automatic memory-aware chunking prevents out-of-memory errors
  • Schema reuse eliminates redundant analysis overhead
  • Faster repeated loads when using schema persistence

Tests

  • 139 total tests passing (23 new schema/chunking tests)
  • Cross-platform compatibility (Windows, macOS, Linux)
  • Fixed Windows file permission issues in tests
  • All code formatted with black and verified with flake8/isort

[0.2.2] - 2025-12-22

Fixed

  • Removed broken logo and favicon references from documentation site

[0.2.1] - 2025-12-21

Added

  • Documentation Site: Complete documentation website at https://luiz826.github.io/diet-pandas/
      • Getting Started guides (Installation, Quick Start)
      • User Guide (Basic Usage, File I/O, Advanced, Memory Reports)
      • API Reference with auto-generated docs
      • Performance Benchmarks
      • Changelog, Contributing, and License pages

Fixed

  • GitHub Actions workflow for documentation deployment
  • CI now properly installs package before building docs

[0.2.0] - 2025-12-21

Added

  • DateTime Optimization: New optimize_datetime() function for efficient datetime handling
  • Sparse Data Support: New optimize_sparse() function for sparse array optimization (up to 96% memory reduction on sparse data)
  • Extended File Format Support:
      • read_json() - JSON file reading with optimization
      • read_hdf() - HDF5 file reading with optimization
      • read_feather() - Feather file reading with optimization
      • to_parquet_optimized() - Optimized Parquet writing
      • to_feather_optimized() - Optimized Feather writing
  • Performance Benchmarking: Comprehensive benchmark script (scripts/benchmark.py)
  • New parameters for diet():
      • optimize_datetimes - Enable/disable datetime optimization (default: True)
      • optimize_sparse_cols - Enable sparse optimization (default: False)
      • sparse_threshold - Threshold for sparse conversion (default: 0.9)
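The sparse_threshold parameter suggests a rule like the one sketched here: a numeric column becomes sparse when its most frequent value, used as the fill value, accounts for at least that fraction of rows. This is an assumption about the rule, not optimize_sparse() itself; sparsify_sketch is a hypothetical name, and the storage uses pandas' own pd.SparseDtype.

```python
import pandas as pd


def sparsify_sketch(df: pd.DataFrame,
                    sparse_threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical: store mostly-constant numeric columns as sparse."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if s.empty or not pd.api.types.is_numeric_dtype(s):
            continue
        counts = s.value_counts(dropna=False)  # most common value first
        fill_value = counts.index[0]
        if counts.iloc[0] / len(s) >= sparse_threshold:
            # Only values != fill_value (plus their positions) are stored.
            out[col] = s.astype(pd.SparseDtype(s.dtype, fill_value))
    return out


# Usage: 95% zeros in "flags", all-distinct values in "dense".
df = pd.DataFrame({"flags": [0] * 95 + [1] * 5, "dense": range(100)})
slim = sparsify_sketch(df)
```

Here "flags" stores 5 values and 5 positions instead of 100 int64 entries, while "dense" stays unchanged because no value reaches the 0.9 share.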

Improved

  • Enhanced diet() function with datetime and sparse support
  • Better test coverage with 12 new tests for new features
  • Comprehensive documentation updates
  • Emojis in output messages for better UX (🥗)

Performance

  • 82-87% memory reduction on typical datasets (tested with 10K-500K rows)
  • Up to 96% memory reduction on sparse data
  • Optimization time: 0.007-0.16 seconds depending on dataset size

Documentation

  • Updated README with new features
  • Improved CONTRIBUTING.md with completed tasks
  • Added performance benchmarking guide
  • New example code for sparse and datetime optimization

[0.1.0] - 2025-12-19

Added

  • Initial release of Diet Pandas
  • Core optimization engine with intelligent type downcasting
  • diet() function for optimizing existing DataFrames
  • optimize_int() for integer optimization (int64 → int8/int16/uint8/uint16)
  • optimize_float() for float optimization (float64 → float32/float16)
  • optimize_obj() for string to category conversion
  • get_memory_report() for detailed memory usage analysis
  • Fast I/O module with Polars integration
  • read_csv() with automatic optimization (5-10x faster)
  • read_parquet() with automatic optimization
  • read_excel() with automatic optimization
  • to_csv_optimized() for saving optimized DataFrames
  • Aggressive mode for maximum compression (float16)
  • Customizable categorical threshold
  • In-place optimization option
  • Verbose mode with memory reduction statistics
  • Comprehensive test suite (95%+ coverage)
  • Detailed documentation and examples
  • Quick reference card
  • Development guide
  • Contributing guidelines
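The integer downcasting and string-to-category conversion listed above can be approximated with plain pandas. This sketch assumes a single column-wise pass that lets pd.to_numeric pick the narrowest signed or unsigned type, and converts string columns to category when their unique-value ratio is below a cutoff; diet_sketch, max_unique_ratio (and its 0.5 default), and reduction_pct are illustrative assumptions, not the library's actual code or defaults.

```python
import pandas as pd


def diet_sketch(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical core pass: downcast ints, categorize repetitive strings."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if len(s) == 0:
            continue
        if pd.api.types.is_integer_dtype(s):
            # Non-negative columns can use unsigned types (uint8 holds 0..255).
            kind = "unsigned" if s.min() >= 0 else "integer"
            out[col] = pd.to_numeric(s, downcast=kind)
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Few distinct strings relative to rows -> small codes + lookup.
            if s.nunique() / len(s) < max_unique_ratio:
                out[col] = s.astype("category")
    return out


def reduction_pct(before: pd.DataFrame, after: pd.DataFrame) -> float:
    """Percent of memory saved, counting string payloads (deep=True)."""
    b = before.memory_usage(deep=True).sum()
    a = after.memory_usage(deep=True).sum()
    return 100.0 * (1 - a / b)


# Usage
df = pd.DataFrame({
    "n": [1, 2, 3, 250] * 25,
    "delta": [-1, 5, 100, 7] * 25,
    "city": ["NY", "LA", "NY", "SF"] * 25,
})
slim = diet_sketch(df)
```

With these rules "n" fits in uint8, "delta" in int8, and "city" (3 distinct values in 100 rows) becomes a category.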

Features

  • 50-80% memory reduction on typical datasets
  • 5-10x faster CSV loading with Polars
  • 100% Pandas compatibility
  • Automatic fallback to standard Pandas if Polars unavailable
  • Safe, lossless optimization by default
  • Optional aggressive mode for maximum compression
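One way to make "safe, lossless by default" concrete: keep a float32 downcast only when converting back to float64 reproduces every value, and let an aggressive mode accept float16 rounding as long as no finite value overflows to infinity. This is a hedged sketch of that policy; downcast_float_sketch and its rules are assumptions, not the library's documented behavior.

```python
import numpy as np
import pandas as pd


def downcast_float_sketch(s: pd.Series, aggressive: bool = False) -> pd.Series:
    """Hypothetical float64 downcast with a lossless default."""
    if aggressive:
        with np.errstate(over="ignore"):
            half = s.astype("float16")  # roughly 3 significant decimal digits
        # Accept rounding error, but not finite values turning into inf.
        if (np.isfinite(half) | ~np.isfinite(s)).all():
            return half
    single = s.astype("float32")
    # Safe mode: keep float32 only if the round trip is value-for-value equal
    # (Series.equals treats NaNs in the same position as equal).
    if single.astype(s.dtype).equals(s):
        return single
    return s
```

Under this policy [0.5, 1.25, NaN] becomes float32 because those values are exactly representable, [0.1, 0.2] stays float64 by default but becomes float16 in aggressive mode, and 1e6 never becomes float16 because it exceeds float16's maximum of 65504.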

Documentation

  • Complete README with examples
  • Quick reference card (QUICKREF.md)
  • Development guide (DEVELOPMENT.md)
  • Contributing guidelines (CONTRIBUTING.md)
  • Project summary (PROJECT_SUMMARY.md)
  • 5 comprehensive examples (examples.py)
  • Inline docstring examples
  • API documentation

Testing

  • Unit tests for all core functions
  • Integration tests for I/O operations
  • Edge case testing (NaN, empty data, etc.)
  • Cross-platform testing (Linux, macOS, Windows)
  • Python 3.10+ compatibility testing

Infrastructure

  • PyPI-ready package structure
  • GitHub Actions CI/CD workflow
  • Makefile for common tasks
  • Quick setup script
  • MIT License

[Unreleased]

Planned for 0.2.0

  • DateTime optimization
  • Boolean optimization
  • Sparse data handling
  • JSON format support
  • HDF5 format support
  • Feather format support

Planned for 0.3.0

  • Parallel processing support
  • Streaming optimization
  • Custom optimization profiles
  • Performance benchmarking tools

Future Considerations

  • Web dashboard for visualization
  • Jupyter notebook extension
  • VS Code extension
  • Auto-optimization on DataFrame creation
  • Integration with Dask for big data
  • GPU acceleration support

Release Notes

Version 0.1.0

This is the initial release of Diet Pandas, a memory optimization library for Pandas DataFrames.

Key Highlights:

  • Reduces DataFrame memory usage by 50-80% without data loss
  • Loads CSV files 5-10x faster than standard Pandas
  • Fully compatible with existing Pandas workflows
  • Easy to use: just replace pd.read_csv() with dp.read_csv()

Example Usage:

import dietpandas as dp

# Fast loading with automatic optimization
df = dp.read_csv("large_file.csv")
# 🥗 Diet Complete: Memory reduced by 67.3%

# Or optimize existing DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
df = dp.diet(df)

Technical Details:

  • Uses Polars for multi-threaded CSV parsing
  • Intelligent type downcasting algorithms
  • Automatic categorical conversion for low-cardinality strings
  • Comprehensive test coverage
  • Cross-platform support

Installation:

pip install diet-pandas

Documentation:

  • See README.md for complete documentation
  • See QUICKREF.md for quick reference
  • See examples.py for usage examples


For more information, visit: https://github.com/luiz826/diet-pandas