I've been experimenting with pandas for my data analysis, but I ran into an issue when combining .groupby() and .transform().
Let this be the dataframe:
df = pd.DataFrame({'group': ['x', 'x', 'x', 'y', 'y', 'z'], 'b': [1, 2, 1, 3, 3, 4]})
| x | 1 |
| x | 2 |
| x | 1 |
| y | 3 |
| y | 3 |
| z | 4 |
I wanted to create a column 'c' that collects in a list all the unique values from column 'b' for each group, resulting in this:
| x | 1 | 1,2 (as a list) |
| x | 2 | 1,2 (as a list) |
| x | 1 | 1,2 (as a list) |
| y | 3 | 3 (as a 1 item list) |
| y | 3 | 3 (as a 1 item list) |
| z | 4 | 4 (as a 1 item list) |
To do this I thought of using .groupby('group')['b'].transform(), because transform applies the value computed for a group to every row of that group. For instance, this code produces the dataframe below:
df['c'] = df.groupby(['group'])['b'].transform('first')
| x | 1 | 1 |
| x | 2 | 1 |
| x | 1 | 1 |
| y | 3 | 3 |
| y | 3 | 3 |
| z | 4 | 4 |
Going back to my problem, I tried this line and it raises the following error:
df['c'] = df.groupby(['group'])['b'].transform(lambda x: np.unique(x.astype(str)))
ValueError: Length of values (2) does not match length of index (3)
Interestingly, this line of code works instead:
df['c'] = df.groupby(['group'])['b'].transform(lambda x: ','.join(np.unique(x.astype(str))))
yielding:
| x | 1 | '1,2' |
| x | 2 | '1,2' |
| x | 1 | '1,2' |
| y | 3 | '3' |
| y | 3 | '3' |
| z | 4 | '4' |
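To make the difference between the two lambdas concrete, here is a small check of what each one returns per group. It is only a sketch on the toy dataframe above; the `rows` list and the loop variable names are mine, added for illustration. The numbers for group x (3 rows, 2 unique values) match the lengths quoted in the ValueError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ['x', 'x', 'x', 'y', 'y', 'z'],
                   'b': [1, 2, 1, 3, 3, 4]})

rows = []  # hypothetical helper list, only used to collect the comparison
for name, values in df.groupby('group')['b']:
    uniques = np.unique(values.astype(str))  # array with one entry per distinct value
    joined = ','.join(uniques)               # a single string for the whole group
    rows.append((name, len(values), len(uniques), joined))

for row in rows:
    # group label, rows in group, length of np.unique result, joined string
    print(row)
```

So the failing lambda hands back an array whose length differs from the group's length, while the working one hands back one string per group.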
I think I can probably find a workaround using .agg() and .map(), but I'm interested in knowing what I'm doing wrong with the SeriesGroupBy.
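For reference, the workaround I have in mind would look roughly like this. It is a minimal sketch, assuming the unique values should be sorted and stored as plain Python lists; the name `uniques_per_group` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'group': ['x', 'x', 'x', 'y', 'y', 'z'],
                   'b': [1, 2, 1, 3, 3, 4]})

# One sorted list of unique values per group (a Series indexed by group label).
uniques_per_group = df.groupby('group')['b'].agg(lambda s: sorted(s.unique().tolist()))

# Look up that list for every row through the row's group label.
df['c'] = df['group'].map(uniques_per_group)

print(df)
```

This gives the list column I described at the top, but it goes around .transform() rather than through it.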
Can someone please explain to me why my code doesn't work? Thank you!
