Conversion from HuggingFace dataset to Pandas DF causing dropped 'string' columns

By default, df.describe() summarizes only the numeric columns of a DataFrame with mixed dtypes, so string (object) columns do not appear in its output. They have not been dropped; they are just not displayed.

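You can check this with:

    df.columns                    # every column, including the string ones
    df.dtypes                     # dtype of each column
    df.describe(include="all")    # summary that also covers non-numeric columns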
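To make the difference concrete, here is a minimal sketch on a small made-up DataFrame. The column names sha, author and additions are only illustrative and are not taken from the AIDev dataset.

```python
import pandas as pd

# Hypothetical toy data: two string columns and one numeric column.
df = pd.DataFrame({
    "sha": ["a1", "b2", "c3"],
    "author": ["ann", "bob", "ann"],
    "additions": [10, 0, 7],
})

# Default describe(): only the numeric column is summarized.
numeric_summary = df.describe()

# include="all": string columns are summarized too (count, unique, top, freq).
full_summary = df.describe(include="all")

print(list(df.columns))               # all three columns are still in the frame
print(list(numeric_summary.columns))  # numeric column only
print(list(full_summary.columns))     # every column
```

If df.columns lists the string columns but df.describe() does not, nothing was lost in the conversion; only the summary is narrower.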

Hugging Face Dataset objects can have a format attached (e.g., torch, numpy) through set_format(). When that format is restricted to specific columns, only those columns are returned unless you explicitly ask for the rest (for example with output_all_columns=True).

To get the same columns in both the Hugging Face dataset and the pandas DataFrame, reset the format before converting:

    from datasets import load_dataset

    pr_commits = load_dataset("hao-li/AIDev", "pr_commits")["train"]
    pr_commits.reset_format()  # IMPORTANT: clear any attached format (in place)

    commits_df = pr_commits.to_pandas()
    print(commits_df.columns)
    print(commits_df.dtypes)

Hope it helps!

Sultan Ahmed Sagor
