I've been working on a project that uses data from the NHL's public API, as a way to learn Python and play around with machine learning. So far I've been able to teach myself from documentation and Google searches, but I'm stumped on my current problem.
My current grouped data looks something like this:
| 2022020001 | 18 | 4 | 8 | ... | 3 | 4 |
| | 28 | 1 | 2 | ... | 7 | 3 |
| 2022020002 | 18 | 3 | 6 | ... | 7 | 11 |
| | 28 | 2 | 4 | ... | 9 | 6 |
I also have a separate source table with general game metadata. The entries for these two games look something like this:
| 20222023 | 2 | ... | 18 | 28 | 1 |
| 20222023 | 2 | ... | 28 | 18 | -1 |
* 1 is a home team win, -1 is an away team win
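Because the metadata already says which team was home and which was away, one way to use it is to reshape it into a lookup from (game, team) to side. A minimal sketch with `DataFrame.melt`; the column names (`game_id`, `home_team_id`, `away_team_id`, `result`) are hypothetical, and it assumes the two metadata rows above belong to games 2022020001 and 2022020002:

```python
import pandas as pd

# Hypothetical metadata frame; column names are assumptions, values follow the example rows
meta = pd.DataFrame({
    "game_id": [2022020001, 2022020002],
    "season": [20222023, 20222023],
    "home_team_id": [18, 28],
    "away_team_id": [28, 18],
    "result": [1, -1],  # 1 = home team win, -1 = away team win
})

# Turn the two team columns into rows: one row per (game, team) with a side label
roles = meta.melt(
    id_vars="game_id",
    value_vars=["home_team_id", "away_team_id"],
    var_name="side",
    value_name="team_id",
)

# "home_team_id" -> "home", "away_team_id" -> "away"
roles["side"] = roles["side"].str.replace("_team_id", "", regex=False)
roles = roles.sort_values(["game_id", "side"]).reset_index(drop=True)
print(roles)
```

This `roles` frame has the same `(game_id, team_id)` key as the grouped stats, so it can be merged onto them to tag each row as home or away.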
I want each row in the final DataFrame to hold all the stats for a single game, so every stats column needs a home copy and an away copy to keep the home team's and away team's stats for a given game from mixing.
It should look something like this for the two example games above:
| 2022020001 | 18 | 4 | 8 | ... | 7 | 3 |
| 2022020002 | 18 | 3 | 6 | ... | 9 | 6 |
I also have a separate DataFrame of goalie stats (the example above is skater stats). Goalies track different stats, so the final DataFrame would have another set of home_* / away_* columns for the goalie stats.
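A sketch of the long-to-wide reshape described above, assuming the stats are in long form (one row per game and team) and that the metadata has hypothetical `home_team_id`/`away_team_id` columns. It labels each row `home` or `away`, moves that label into the columns with `unstack`, and flattens the resulting column pairs into `home_*`/`away_*` names. The stat names (`goals`, `shots`) are placeholders:

```python
import pandas as pd

# Hypothetical long-format stats: one row per (game_id, team_id); stat names are placeholders
stats = pd.DataFrame({
    "game_id": [2022020001, 2022020001, 2022020002, 2022020002],
    "team_id": [18, 28, 18, 28],
    "goals": [4, 1, 3, 2],
    "shots": [8, 2, 6, 4],
})

# Hypothetical metadata giving the home and away team of each game
meta = pd.DataFrame({
    "game_id": [2022020001, 2022020002],
    "home_team_id": [18, 28],
    "away_team_id": [28, 18],
})

# Build a (game_id, team_id) -> side lookup and attach it to every stats row
roles = meta.melt(id_vars="game_id", var_name="side", value_name="team_id")
roles["side"] = roles["side"].str.replace("_team_id", "", regex=False)
labeled = stats.merge(roles, on=["game_id", "team_id"], how="inner")

# Move "side" from the rows into the columns: one row per game,
# with (stat, side) column pairs such as ("goals", "home")
wide = labeled.set_index(["game_id", "side"]).unstack("side")

# Flatten ("goals", "home") -> "home_goals"
wide.columns = [f"{side}_{stat}" for stat, side in wide.columns]
print(wide)
```

Because `team_id` stays an ordinary column until the `unstack`, it comes out as `home_team_id`/`away_team_id`, so each row still records which teams played. If the skater and goalie frames are merged on `(game_id, team_id)` first, the goalie columns get split into `home_*`/`away_*` the same way.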
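So far I'm doing this:

```python
import pandas as pd

# Move game_id/team_id out of the index so they can be used as merge keys
skaters_reduced = skaters_reduced.reset_index()
goalies_reduced = goalies_reduced.reset_index()

# Combine skater and goalie stats into one row per (game, team)
game_stats = pd.merge(
    skaters_reduced,
    goalies_reduced,
    how='outer',
    on=[Keys.game_id, Keys.team_id])

# Index by game id; each game still has one row per team
game_stats = game_stats.set_index(Keys.game_id)
```

This gives me a DataFrame like this: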
| 2022020001 | 18 | 4 | 8 | ... | 8 | 30 |
| 2022020001 | 28 | 1 | 2 | ... | 8 | 28 |
| 2022020002 | 18 | 3 | 6 | ... | 7 | 31 |
| 2022020002 | 28 | 2 | 4 | ... | 2 | 15 |
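Starting from a frame shaped like `game_stats` above (indexed by game id, one row per team), an alternative that avoids `unstack` is to split the rows into home and away halves, prefix each half's columns with `add_prefix`, and join the halves back together on the game id. A minimal sketch; the `home_team_id` metadata column and the stat names (`goals`, `saves`) are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical stand-in for game_stats: indexed by game_id, one row per team
game_stats = pd.DataFrame(
    {
        "team_id": [18, 28, 18, 28],
        "goals": [4, 1, 3, 2],
        "saves": [30, 28, 31, 15],
    },
    index=pd.Index([2022020001, 2022020001, 2022020002, 2022020002], name="game_id"),
)

# Hypothetical metadata indexed by game_id
meta = pd.DataFrame(
    {"home_team_id": [18, 28], "away_team_id": [28, 18]},
    index=pd.Index([2022020001, 2022020002], name="game_id"),
)

# Attach each game's home team id to both of its rows
tagged = game_stats.join(meta["home_team_id"])
is_home = (tagged["team_id"] == tagged["home_team_id"]).to_numpy()

# Keep only the original columns (team_id included) and prefix them by side
stat_cols = game_stats.columns
home = tagged.loc[is_home, stat_cols].add_prefix("home_")
away = tagged.loc[~is_home, stat_cols].add_prefix("away_")

# Both halves are indexed by game_id, so joining gives one row per game
wide = home.join(away, how="outer")
print(wide)
```

`how="outer"` keeps a game even if one side's row is missing (that side's columns become `NaN`); `how="inner"` would drop such games instead.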
I'm just not sure how to proceed or if I'm headed down the wrong path.
