`excel`

pinky_snowpark.excel

openpyxl helpers for building Excel workbooks from Snowpark DataFrames.

Requires the excel optional dependency::

pip install pinky-snowpark[excel]

All openpyxl imports are lazy — the module is importable without openpyxl installed, and errors surface only when a function is actually called.

`CellFormat` `dataclass`

Formatting descriptor for an Excel column (openpyxl).

All fields are optional — only non-None / non-False attributes are applied, leaving the openpyxl default intact for everything else.

Attributes:

Name	Type	Description
`number_format`	`str \| None`	openpyxl number format string (e.g. `'#,##0.00 "€"'`).
`bold`	`bool`	Bold font.
`italic`	`bool`	Italic font.
`font_color`	`str \| None`	Font colour as RRGGBB hex without `#` (e.g. `"FF0000"`).
`font_size`	`int \| None`	Font size in points.
`bg_color`	`str \| None`	Background fill colour as RRGGBB hex (e.g. `"FFF2CC"`).
`h_align`	`str \| None`	Horizontal alignment — `"left"` \| `"center"` \| `"right"`.
`v_align`	`str \| None`	Vertical alignment — `"top"` \| `"center"` \| `"bottom"`.
`wrap`	`bool`	Enable text wrap.

Source code in src/pinky_snowpark/excel.py

@dataclass
class CellFormat:
    """Formatting descriptor for an Excel column (openpyxl).

    All fields are optional — only non-None / non-False attributes are applied,
    leaving the openpyxl default intact for everything else.

    Attributes:
        number_format: openpyxl number format string (e.g. ``'#,##0.00 "€"'``).
        bold:          Bold font.
        italic:        Italic font.
        font_color:    Font colour as RRGGBB hex without ``#`` (e.g. ``"FF0000"``).
        font_size:     Font size in points.
        bg_color:      Background fill colour as RRGGBB hex (e.g. ``"FFF2CC"``).
        h_align:       Horizontal alignment — ``"left"`` | ``"center"`` | ``"right"``.
        v_align:       Vertical alignment — ``"top"`` | ``"center"`` | ``"bottom"``.
        wrap:          Enable text wrap.
    """
    number_format: str | None = None
    bold: bool = False
    italic: bool = False
    font_color: str | None = None
    font_size: int | None = None
    bg_color: str | None = None
    h_align: str | None = None
    v_align: str | None = None
    wrap: bool = False

`ParseResult`

Bases: NamedTuple

Return value of :func:parse_worksheet — supports tuple unpacking.

Attributes:

Name	Type	Description
`headers`	`list[str]`	Column names after normalization.
`rows`	`list[dict[str, str \| None]]`	Row data — every cell value is cast to `str`, empty cells as `None`.

Source code in src/pinky_snowpark/excel.py

class ParseResult(NamedTuple):
    """Return value of :func:`parse_worksheet` — supports tuple unpacking.

    Attributes:
        headers: Column names after normalization.
        rows:    Row data — every cell value is cast to ``str``, empty cells as ``None``.
    """
    headers: list[str]
    rows: list[dict[str, str | None]]

`create_workbook_from_df(df, sheet_name='Data', freeze_panes='A2', auto_fit=True, fixed_column_widths=None, wrap_columns=None, col_fmt=None, col_transform=None, wb=None, header_fmt=_DEFAULT_HEADER_FMT)`

Create (or extend) an openpyxl Workbook from a Snowpark DataFrame.

Writes column headers from the DataFrame schema, then all data rows. Optionally applies: pane freeze, auto column widths, fixed widths, text wrap, per-column number formats and per-column value transforms.

Parameters:

Name	Type	Description	Default
`df`	`'snowpark.DataFrame'`	Source Snowpark DataFrame.	required
`sheet_name`	`str`	Sheet tab name (default `"Data"`).	`'Data'`
`freeze_panes`	`str`	Top-left cell of the frozen region (default `"A2"`).	`'A2'`
`auto_fit`	`bool`	Auto-size column widths from content (default `True`).	`True`
`fixed_column_widths`	`dict[str, int] \| None`	`{"A": 20, "C": 40}` — overrides auto_fit for those columns.	`None`
`wrap_columns`	`list[int] \| None`	1-based column indices where text wrap is enabled.	`None`
`col_fmt`	`dict[int \| str, str \| CellFormat] \| None`	`{col: preset_name_or_CellFormat}` applied to data rows. Column keys accept either a 1-based integer index (`3`) or a letter string (`"C"`); both forms may coexist. Available presets: `date_fr`, `date_iso`, `date_us`, `date_ch`, `datetime_fr`, `eur`, `gbp`, `chf`, `usd`, `pct`, `pct_val`, `int`, `decimal`, `bold`, `center`, `right`, `wrap`, `bg_yellow`, `bg_red`, `bg_green`, `bg_blue`, `bg_grey`, `eur_bold`, `eur_right`, `pct_center`, `header`, `header_dark`.	`None`
`col_transform`	`dict[int \| str, Callable[[Any], Any]] \| None`	`{col: callable}` applied to each cell value before writing. Column keys accept either a 1-based integer index or a letter string (same as `col_fmt`). Use functions from `pinky_core.fmt` such as `format_boolean`, `format_duration`, `format_trend`, `format_stars`, `format_fraction`, `format_number`.	`None`
`wb`	`'Workbook \| None'`	Existing Workbook to add a sheet to (`None` = new Workbook).	`None`
`header_fmt`	`CellFormat \| None`	Format applied to header row (default: bold 12pt).	`_DEFAULT_HEADER_FMT`

Returns:

Type	Description
`'Workbook'`	openpyxl Workbook with the populated sheet.

Source code in src/pinky_snowpark/excel.py

def create_workbook_from_df(
    df: "snowpark.DataFrame",
    sheet_name: str = "Data",
    freeze_panes: str = "A2",
    auto_fit: bool = True,
    fixed_column_widths: dict[str, int] | None = None,
    wrap_columns: list[int] | None = None,
    col_fmt: dict[int | str, str | CellFormat] | None = None,
    col_transform: dict[int | str, Callable[[Any], Any]] | None = None,
    wb: "Workbook | None" = None,
    header_fmt: CellFormat | None = _DEFAULT_HEADER_FMT,
) -> "Workbook":
    """Create (or extend) an openpyxl Workbook from a Snowpark DataFrame.

    Writes column headers from the DataFrame schema, then all data rows.
    Optionally applies: pane freeze, auto column widths, fixed widths, text
    wrap, per-column number formats and per-column value transforms.

    Args:
        df:                   Source Snowpark DataFrame.
        sheet_name:           Sheet tab name (default ``"Data"``).
        freeze_panes:         Top-left cell of the frozen region (default ``"A2"``).
        auto_fit:             Auto-size column widths from content (default ``True``).
        fixed_column_widths:  ``{"A": 20, "C": 40}`` — overrides auto_fit for those columns.
        wrap_columns:         1-based column indices where text wrap is enabled.
        col_fmt:              ``{col: preset_name_or_CellFormat}`` applied to data rows.
                              Column keys accept either a 1-based integer index (``3``)
                              or a letter string (``"C"``); both forms may coexist.
                              Available presets: ``date_fr``, ``date_iso``, ``date_us``,
                              ``date_ch``, ``datetime_fr``, ``eur``, ``gbp``, ``chf``,
                              ``usd``, ``pct``, ``pct_val``, ``int``, ``decimal``,
                              ``bold``, ``center``, ``right``, ``wrap``, ``bg_yellow``,
                              ``bg_red``, ``bg_green``, ``bg_blue``, ``bg_grey``,
                              ``eur_bold``, ``eur_right``, ``pct_center``,
                              ``header``, ``header_dark``.
        col_transform:        ``{col: callable}`` applied to each cell value before
                              writing. Column keys accept either a 1-based integer index
                              or a letter string (same as ``col_fmt``).
                              Use functions from ``pinky_core.fmt`` such as
                              ``format_boolean``, ``format_duration``, ``format_trend``,
                              ``format_stars``, ``format_fraction``, ``format_number``.
        wb:                   Existing Workbook to add a sheet to (``None`` = new Workbook).
        header_fmt:           Format applied to header row (default: bold 12pt).

    Returns:
        openpyxl Workbook with the populated sheet.
    """
    from openpyxl import Workbook as _Workbook
    from openpyxl.styles import Alignment, Font

    col_fmt = _resolve_col_keys(col_fmt) if col_fmt else None
    col_transform = _resolve_col_keys(col_transform) if col_transform else None

    if wb is None:
        wb = _Workbook()
        ws: Worksheet = wb.active  # type: ignore[assignment]
    else:
        ws = wb.create_sheet()  # type: ignore[assignment]
    ws.title = sheet_name

    headers = [f.name.strip('"') for f in df.schema.fields]
    ws.append(headers)

    if header_fmt:
        font_kwargs: dict[str, Any] = {}
        if header_fmt.bold:
            font_kwargs["bold"] = True
        if header_fmt.font_size:
            font_kwargs["size"] = header_fmt.font_size
        if header_fmt.font_color:
            font_kwargs["color"] = header_fmt.font_color
        if font_kwargs:
            for col_idx in range(1, len(headers) + 1):
                ws.cell(row=1, column=col_idx).font = Font(**font_kwargs)

    for row in [r.as_dict(recursive=True) for r in df.collect()]:
        values: list[Any] = list(row.values())
        if col_transform:
            values = [
                col_transform[i + 1](v) if (i + 1) in col_transform else v
                for i, v in enumerate(values)
            ]
        ws.append(values)

    ws.freeze_panes = freeze_panes

    if auto_fit:
        _auto_fit_columns(ws)

    if fixed_column_widths:
        for col_letter, width in fixed_column_widths.items():
            ws.column_dimensions[col_letter].width = width

    if wrap_columns:
        alignment = Alignment(wrapText=True, vertical="center")
        for col_idx in wrap_columns:
            for row_idx in range(2, ws.max_row + 1):
                ws.cell(row=row_idx, column=col_idx).alignment = alignment

    if col_fmt:
        for col_idx, fmt in col_fmt.items():
            resolved = CELL_FORMATS[fmt] if isinstance(fmt, str) else fmt
            _apply_cell_format(ws, col_idx, resolved)

    return wb

`parse_worksheet(ws, *, normalize_headers=True, deduplicate=True)`

Parse an openpyxl Worksheet into a list of row dicts.

Skips leading empty rows to find the header row, then reads all subsequent non-empty rows as string values.

Typical use: download an XLSX from an API (INSEE Melodi, DGEFP…), open with openpyxl.load_workbook(io.BytesIO(xlsx_bytes)), then call this function on each sheet before building a Snowpark DataFrame.

Parameters:

Name	Type	Description	Default
`ws`	`'Worksheet'`	openpyxl Worksheet object (read-only or normal).	required
`normalize_headers`	`bool`	Convert headers to UPPER_SNAKE_CASE using `to_upper_snake_case` from `pinky_core.fmt` (default `True`). Set `False` to keep raw header strings as-is.	`True`
`deduplicate`	`bool`	Append `_2`, `_3`… to duplicate header names after normalization (default `True`).	`True`

Returns:

Type	Description
`ParseResult`	Tuple `(headers, rows)` where `headers` is the list of normalized
`ParseResult`	column names and `rows` is a `list[dict[str, str \| None]]` — cell
`ParseResult`	values cast to `str`, empty cells as `None`.

Example::

import io, openpyxl
from pinky_snowpark import parse_worksheet

wb = openpyxl.load_workbook(io.BytesIO(xlsx_bytes), read_only=True, data_only=True)
for sheet_name in wb.sheetnames:
    headers, rows = parse_worksheet(wb[sheet_name])
    # → build Snowpark DataFrame or stage_write(session, rows, ...)

Source code in src/pinky_snowpark/excel.py

def parse_worksheet(
    ws: "Worksheet",
    *,
    normalize_headers: bool = True,
    deduplicate: bool = True,
) -> ParseResult:
    """Parse an openpyxl Worksheet into a list of row dicts.

    Skips leading empty rows to find the header row, then reads all
    subsequent non-empty rows as string values.

    Typical use: download an XLSX from an API (INSEE Melodi, DGEFP…),
    open with ``openpyxl.load_workbook(io.BytesIO(xlsx_bytes))``, then
    call this function on each sheet before building a Snowpark DataFrame.

    Args:
        ws:                 openpyxl Worksheet object (read-only or normal).
        normalize_headers:  Convert headers to UPPER_SNAKE_CASE using
                            ``to_upper_snake_case`` from ``pinky_core.fmt``
                            (default ``True``).  Set ``False`` to keep raw
                            header strings as-is.
        deduplicate:        Append ``_2``, ``_3``… to duplicate header names
                            after normalization (default ``True``).

    Returns:
        Tuple ``(headers, rows)`` where ``headers`` is the list of normalized
        column names and ``rows`` is a ``list[dict[str, str | None]]`` — cell
        values cast to ``str``, empty cells as ``None``.

    Example::

        import io, openpyxl
        from pinky_snowpark import parse_worksheet

        wb = openpyxl.load_workbook(io.BytesIO(xlsx_bytes), read_only=True, data_only=True)
        for sheet_name in wb.sheetnames:
            headers, rows = parse_worksheet(wb[sheet_name])
            # → build Snowpark DataFrame or stage_write(session, rows, ...)
    """
    from pinky_core.fmt import to_upper_snake_case

    rows_iter = ws.iter_rows(values_only=True)

    raw_headers: list[str] | None = None
    for row in rows_iter:
        if any(cell is not None for cell in row):
            raw_headers = [
                str(c) if c is not None else f"COL_{i}"
                for i, c in enumerate(row)
            ]
            break

    if not raw_headers:
        return [], []

    headers = (
        [to_upper_snake_case(h) for h in raw_headers]
        if normalize_headers
        else list(raw_headers)
    )

    if deduplicate:
        seen: dict[str, int] = {}
        deduped: list[str] = []
        for h in headers:
            if h in seen:
                seen[h] += 1
                deduped.append(f"{h}_{seen[h]}")
            else:
                seen[h] = 0
                deduped.append(h)
        headers = deduped

    data_rows: list[dict[str, str | None]] = []
    for row in rows_iter:
        if not any(cell is not None for cell in row):
            continue
        cells = list(row)
        if len(cells) < len(headers):
            cells += [None] * (len(headers) - len(cells))
        else:
            cells = cells[: len(headers)]
        data_rows.append({
            h: str(v) if v is not None else None
            for h, v in zip(headers, cells)
        })

    return ParseResult(headers, data_rows)

excel

pinky_snowpark.excel

CellFormat dataclass

ParseResult

create_workbook_from_df(df, sheet_name='Data', freeze_panes='A2', auto_fit=True, fixed_column_widths=None, wrap_columns=None, col_fmt=None, col_transform=None, wb=None, header_fmt=_DEFAULT_HEADER_FMT)

parse_worksheet(ws, *, normalize_headers=True, deduplicate=True)

`excel`

`CellFormat` `dataclass`

`ParseResult`

`create_workbook_from_df(df, sheet_name='Data', freeze_panes='A2', auto_fit=True, fixed_column_widths=None, wrap_columns=None, col_fmt=None, col_transform=None, wb=None, header_fmt=_DEFAULT_HEADER_FMT)`

`parse_worksheet(ws, *, normalize_headers=True, deduplicate=True)`