Filter empty string in a polars lazyframe


I am trying to select the rows whose URI column is an empty string from a parquet file with over 50 million rows, using

import polars as pl

lf = pl.scan_parquet("data.parquet")
lf.filter(pl.col("URI") == "").collect()

Output:

shape: (0, 3)
┌─────┬────────┬───────────┐
│ URI ┆ REMARK ┆ TIMESTAMP │
│ --- ┆ ---    ┆ ---       │
│ str ┆ str    ┆ i64       │
╞═════╪════════╪═══════════╡
└─────┴────────┴───────────┘

Luckily, I had labelled the rows with an empty-string URI as NO URI in the REMARK column, so

lf.filter(pl.col("REMARK") == "NO URI").collect()

yields:

shape: (7_767, 3)
┌─────┬────────┬────────────┐
│ URI ┆ REMARK ┆ TIMESTAMP  │
│ --- ┆ ---    ┆ ---        │
│ str ┆ str    ┆ i64        │
╞═════╪════════╪════════════╡
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│ …   ┆ …      ┆ …          │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
│     ┆ NO URI ┆ 1759257000 │
└─────┴────────┴────────────┘

Also, to confirm that the URI value in those rows is just an empty string:

len(lf.filter(pl.col("REMARK") == "NO URI").collect()["URI"][0]) # Outputs 0

Is this a bug in polars, or have I missed something important? How do I get the rows with an empty-string URI?

Python version: 3.14.2
Polars version: 1.35.2
