API Reference: I/O Functions¶
This page documents all file input/output functions in Diet Pandas.
All I/O functions automatically optimize the loaded DataFrame and return a standard pandas DataFrame.
Read Functions¶
read_csv()¶
Read a CSV file with automatic memory optimization. Uses Polars engine for 5-10x faster parsing.
```python
dietpandas.io.read_csv(filepath, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, use_polars=True, schema_path=None, save_schema=False, memory_threshold=0.7, auto_chunk=True, chunksize=100000, **kwargs)
```
Reads a CSV file using the Polars engine (if available), then converts the result to an optimized pandas DataFrame.
Automatically switches to chunked reading when the file is too large to fit in memory. Parsing is often 5-10x faster than `pandas.read_csv`, and the resulting DataFrame uses significantly less memory due to automatic optimization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to CSV file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `use_polars` | `bool` | If True and Polars is available, use it for parsing | `True` |
| `schema_path` | `Union[str, Path, None]` | Optional path to schema file for consistent typing | `None` |
| `save_schema` | `bool` | If True, save schema after optimization (only with chunked reading) | `False` |
| `memory_threshold` | `float` | Use chunked reading if estimated memory > threshold * available | `0.7` |
| `auto_chunk` | `bool` | If True, automatically use chunked reading for large files | `True` |
| `chunksize` | `int` | Number of rows per chunk when using chunked reading | `100000` |
| `**kwargs` | | Additional arguments passed to the CSV reader | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Examples:

```python
>>> # Use saved schema for consistent typing
>>> df = read_csv("data.csv", schema_path="data.diet_schema.json")

>>> # Large files are automatically chunked
>>> df = read_csv("huge_file.csv")  # Automatically uses chunked reading
```
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp

# Basic usage
df = dp.read_csv("data.csv")

# Disable optimization
df = dp.read_csv("data.csv", optimize=False)

# Aggressive mode
df = dp.read_csv("data.csv", aggressive=True)
```
read_parquet()¶
Read a Parquet file with automatic memory optimization.
```python
dietpandas.io.read_parquet(filepath, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, use_polars=True, **kwargs)
```
Reads a Parquet file using the Polars engine (if available), then converts the result to an optimized pandas DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to Parquet file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `use_polars` | `bool` | If True and Polars is available, use it for parsing | `True` |
| `**kwargs` | | Additional arguments passed to the Parquet reader | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Source code in src/dietpandas/io.py
read_excel()¶
Read an Excel file with automatic memory optimization.
```python
dietpandas.io.read_excel(filepath, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, **kwargs)
```
Reads an Excel file and returns an optimized Pandas DataFrame.
Note: Polars support for Excel is limited, so this uses pandas.read_excel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to Excel file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `**kwargs` | | Additional arguments passed to pandas.read_excel | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp

# Read a specific sheet
df = dp.read_excel("data.xlsx", sheet_name="Sheet1")

# Read all sheets
dfs = dp.read_excel("data.xlsx", sheet_name=None)
```
read_json()¶
Read a JSON file with automatic memory optimization.
```python
dietpandas.io.read_json(filepath, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, **kwargs)
```
Reads a JSON file and returns an optimized Pandas DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to JSON file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `**kwargs` | | Additional arguments passed to pandas.read_json | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp

# Read JSON Lines format
df = dp.read_json("data.jsonl", lines=True)

# Read standard JSON
df = dp.read_json("data.json")
```
read_hdf()¶
Read an HDF5 file with automatic memory optimization.
```python
dietpandas.io.read_hdf(filepath, key, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, **kwargs)
```
Reads an HDF5 file and returns an optimized Pandas DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to HDF5 file | *required* |
| `key` | `str` | Group identifier in the HDF5 file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `**kwargs` | | Additional arguments passed to pandas.read_hdf | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Source code in src/dietpandas/io.py
Note: `read_hdf` requires the optional `tables` dependency (install with `pip install tables`).
read_feather()¶
Read a Feather file with automatic memory optimization.
```python
dietpandas.io.read_feather(filepath, optimize=True, aggressive=False, categorical_threshold=0.5, verbose=False, **kwargs)
```
Reads a Feather file and returns an optimized Pandas DataFrame.
Feather is a fast, lightweight columnar data format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filepath` | `Union[str, Path]` | Path to Feather file | *required* |
| `optimize` | `bool` | If True, apply diet optimization after reading | `True` |
| `aggressive` | `bool` | If True, use aggressive optimization (float16, etc.) | `False` |
| `categorical_threshold` | `float` | Threshold for converting objects to categories | `0.5` |
| `verbose` | `bool` | If True, print memory reduction statistics | `False` |
| `**kwargs` | | Additional arguments passed to pandas.read_feather | `{}` |
Returns:

| Type | Description |
|---|---|
| `DataFrame` | Optimized pandas DataFrame |
Source code in src/dietpandas/io.py
Write Functions¶
to_csv_optimized()¶
Write a DataFrame to CSV with memory optimization.
```python
dietpandas.io.to_csv_optimized(df, filepath, optimize_before_save=True, **kwargs)
```
Saves a DataFrame to CSV, optionally optimizing it first.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame to save | *required* |
| `filepath` | `Union[str, Path]` | Path where CSV will be saved | *required* |
| `optimize_before_save` | `bool` | If True, optimize the DataFrame before saving | `True` |
| `**kwargs` | | Additional arguments passed to pandas.to_csv | `{}` |
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp
import pandas as pd

df = pd.DataFrame({'col': range(1000)})
dp.to_csv_optimized(df, "output.csv")
```
to_parquet_optimized()¶
Write a DataFrame to Parquet with memory optimization.
```python
dietpandas.io.to_parquet_optimized(df, filepath, optimize_before_save=True, **kwargs)
```
Saves a DataFrame to Parquet format, optionally optimizing it first.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame to save | *required* |
| `filepath` | `Union[str, Path]` | Path where Parquet file will be saved | *required* |
| `optimize_before_save` | `bool` | If True, optimize the DataFrame before saving | `True` |
| `**kwargs` | | Additional arguments passed to pandas.to_parquet | `{}` |
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp
import pandas as pd

df = pd.DataFrame({'col': range(1000)})
dp.to_parquet_optimized(df, "output.parquet")
```
to_feather_optimized()¶
Write a DataFrame to Feather format with memory optimization.
```python
dietpandas.io.to_feather_optimized(df, filepath, optimize_before_save=True, **kwargs)
```
Saves a DataFrame to Feather format, optionally optimizing it first.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame to save | *required* |
| `filepath` | `Union[str, Path]` | Path where Feather file will be saved | *required* |
| `optimize_before_save` | `bool` | If True, optimize the DataFrame before saving | `True` |
| `**kwargs` | | Additional arguments passed to pandas.to_feather | `{}` |
Source code in src/dietpandas/io.py
Example:

```python
import dietpandas as dp
import pandas as pd

df = pd.DataFrame({'col': range(1000)})
dp.to_feather_optimized(df, "output.feather")
```
Supported File Formats¶
| Format | Read Function | Write Function | Optional Dependency |
|---|---|---|---|
| CSV | `read_csv()` | `to_csv_optimized()` | None (built-in) |
| Parquet | `read_parquet()` | `to_parquet_optimized()` | `pyarrow` |
| Excel | `read_excel()` | N/A | `openpyxl` |
| JSON | `read_json()` | N/A | None (built-in) |
| HDF5 | `read_hdf()` | N/A | `tables` |
| Feather | `read_feather()` | `to_feather_optimized()` | `pyarrow` |
Performance Comparison¶
CSV Reading Performance¶
```python
import time
import pandas as pd
import dietpandas as dp

# Standard pandas
start = time.time()
df_pandas = pd.read_csv("large_file.csv")
pandas_time = time.time() - start

# Diet pandas
start = time.time()
df_diet = dp.read_csv("large_file.csv")
diet_time = time.time() - start

print(f"Pandas: {pandas_time:.2f}s, Memory: {df_pandas.memory_usage().sum() / 1e6:.1f} MB")
print(f"Diet:   {diet_time:.2f}s, Memory: {df_diet.memory_usage().sum() / 1e6:.1f} MB")

# Pandas: 45.2s, Memory: 2300.0 MB
# Diet:   8.7s, Memory: 750.0 MB
```
Common Parameters¶
Most read functions support these common parameters:
- `optimize` (bool, default=True): Whether to optimize memory usage
- `aggressive` (bool, default=False): Use aggressive optimization mode
- `**kwargs`: Additional parameters passed to the underlying pandas function