How to duplicate columns with a prefix/suffix when flattening a DataFrameGroupBy

I've been working on a project to learn Python and experiment with machine learning, using data from the NHL's public API. I've been able to teach myself via documentation and Google searches so far, but I'm stumped on my current problem.

My current grouped data looks something like this:

gameId      teamId  goals  assists  ...  giveaways  takeaways
2022020001  18      4      8        ...  3          4
            28      1      2        ...  7          3
2022020002  18      3      6        ...  7          11
            28      2      4        ...  9          6

I also have a separate source table with general game metadata. Entries for these games look something like this:

season    gameType  ...  homeTeam  awayTeam  winner *
20222023  2         ...  18        28        1
20222023  2         ...  28        18        -1

* 1 is a home team win, -1 is an away team win

I want each row in the final DataFrame to represent the total stats for a single game, so I need to duplicate the stats columns to keep the home team's and away team's stats from mingling for a given game.

It should look something like this for the two example games above:

gameId      teamId  home_goals  home_assists  ...  away_giveaways  away_takeaways
2022020001  18      4           8             ...  7               3
2022020002  18      3           6             ...  9               6
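
To make the target shape concrete, here is a minimal sketch of one way to get there: pick out each side's rows using the metadata, prefix the stat columns with `add_prefix`, and join the two halves on the game. It assumes the metadata table can be keyed by `gameId` (that column is not shown above), uses only the four visible stat columns as toy data, and `side_stats` is a hypothetical helper name, not something from the project.

```python
import pandas as pd

# Toy per-team stats copied from the example above (only the visible columns).
stats = pd.DataFrame(
    {
        "gameId": [2022020001, 2022020001, 2022020002, 2022020002],
        "teamId": [18, 28, 18, 28],
        "goals": [4, 1, 3, 2],
        "assists": [8, 2, 6, 4],
        "giveaways": [3, 7, 7, 9],
        "takeaways": [4, 3, 11, 6],
    }
)

# Assumption: the metadata table has (or can be given) a gameId column.
games = pd.DataFrame(
    {
        "gameId": [2022020001, 2022020002],
        "homeTeam": [18, 28],
        "awayTeam": [28, 18],
        "winner": [1, -1],
    }
)


def side_stats(side: str) -> pd.DataFrame:
    """Return one row per game with that side's stats, columns prefixed by side."""
    team_col = f"{side}Team"
    # Keep only the stats rows whose teamId matches the game's home/away team.
    merged = games[["gameId", team_col]].merge(
        stats, left_on=["gameId", team_col], right_on=["gameId", "teamId"]
    )
    return (
        merged.drop(columns=[team_col, "teamId"])
        .set_index("gameId")
        .add_prefix(f"{side}_")
    )


# One row per game: metadata plus home_* and away_* stat columns.
per_game = (
    games.set_index("gameId")
    .join(side_stats("home"))
    .join(side_stats("away"))
)
print(per_game)
```

Which team ends up in the `home_*` columns is decided purely by `homeTeam` / `awayTeam` in the metadata, so the `teamId` column is no longer needed in the result.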

I also have another DataFrame with goalie stats (the example above is skater stats), since the stats tracked for goalies are different, so there would be another set of home_* / away_* columns for the goalie stats.

So far I'm doing this:

skaters_reduced = skaters_reduced.reset_index()
goalies_reduced = goalies_reduced.reset_index()
game_stats = pd.merge(
    skaters_reduced,
    goalies_reduced,
    how='outer',
    on=[Keys.game_id, Keys.team_id])
game_stats = game_stats.set_index(Keys.game_id)

...which gives me a DataFrame like this:

gameId      teamId  goals  assists  ...  shorhandedSavesAgainst  overallSavesAgainst
2022020001  18      4      8        ...  8                       30
2022020001  28      1      2        ...  8                       28
2022020002  18      3      6        ...  7                       31
2022020002  28      2      4        ...  2                       15

I'm just not sure how to proceed or if I'm headed down the wrong path.
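
One possible direction from a merged frame like this, sketched under assumptions rather than offered as the definitive approach: label each row as "home" or "away", move that label into the columns with `unstack`, then flatten the resulting two-level column index into `side_stat` names. The `home_team` lookup is a hypothetical stand-in for the metadata table, and only two stat columns are included as toy data.

```python
import pandas as pd

# Toy stand-in for the merged skater/goalie frame (two stat columns only).
game_stats = pd.DataFrame(
    {
        "gameId": [2022020001, 2022020001, 2022020002, 2022020002],
        "teamId": [18, 28, 18, 28],
        "goals": [4, 1, 3, 2],
        "overallSavesAgainst": [30, 28, 31, 15],
    }
).set_index("gameId")

# Assumption: home team per game, taken from the metadata (hypothetical lookup).
home_team = pd.Series({2022020001: 18, 2022020002: 28})

long = game_stats.reset_index()
# Tag each row by comparing its teamId with the home team of its game.
is_home = long["teamId"] == long["gameId"].map(home_team)
long["side"] = is_home.map({True: "home", False: "away"})

# Move "side" from the rows into the columns: one row per game remains.
wide = long.drop(columns="teamId").set_index(["gameId", "side"]).unstack("side")

# Columns are now (stat, side) pairs; flatten them to "side_stat" strings.
wide.columns = [f"{side}_{stat}" for stat, side in wide.columns]
print(wide)
```

Because the reshaping works on whatever stat columns are present, the skater and goalie columns are handled in the same pass without listing them individually.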
