How to count word frequencies per group in pandas

Question

Views: 1582

pandas

group-by

count

python

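In Python, when the data is organized as in the table below, how can I count word frequencies per group?
After combining the content for each category and removing duplicates, I would like to print a result like this:
A: apple, orange, banana, melon (4 items)
B: peach, apple, orange (3 items)
Is there a way to use pandas groupby and value_counts together in one step?
Or is there another way to do this?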

orange 15 points

Posted on 2020-10-05 16:33:03

score 1 · Accepted Answer

>>> import pandas as pd
>>> df = pd.DataFrame({"category":["A", "A", "B"], "content":["apple, orange", "banana, apple, melon", "peach, apple, orange" ]})
>>> df
  category               content
0        A         apple, orange
1        A  banana, apple, melon
2        B  peach, apple, orange
>>> df["content"] = df["content"].str.split(", ")
>>> df
  category                 content
0        A         [apple, orange]
1        A  [banana, apple, melon]
2        B  [peach, apple, orange]
>>> df["content"] = df["content"].apply(set)
>>> df
  category                 content
0        A         {orange, apple}
1        A  {melon, banana, apple}
2        B  {peach, orange, apple}
>>> df.groupby("category").apply(lambda x: set.union(*x.content))
category
A    {melon, banana, orange, apple}
B            {peach, orange, apple}
dtype: object
>>> group_df = df.groupby("category").apply(lambda x: set.union(*x.content))
>>> for tup in group_df.to_frame().itertuples():
...     tup1_str = ", ".join(sorted(tup[1]))  # sort for a stable, readable order
...     n = len(tup[1])                       # number of unique words in the group
...     print(f"{tup[0]}: {tup1_str} ({n} items)")
...
A: apple, banana, melon, orange (4 items)
B: apple, orange, peach (3 items)

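Split the content column on ", " so that each string becomes a list.
Apply set to turn each list into a set, so that duplicates are dropped when the sets are merged later.
Group by category and combine the content values of each group with set.union.
Convert the result to a DataFrame and iterate over it with itertuples, formatting each row for printing.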
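
As a possible alternative to the set-union steps above, the same result can be sketched with `explode`, which also makes per-group frequencies available through `value_counts`. This assumes pandas 0.25 or later (where `DataFrame.explode` was introduced) and the same sample data as in the answer. The names `words`, `unique_words`, `counts`, and `freq` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "A", "B"],
    "content": ["apple, orange", "banana, apple, melon", "peach, apple, orange"],
})

# Split each string into a list, then explode to one row per (category, word).
words = df.assign(content=df["content"].str.split(", ")).explode("content")

# Unique words per category, sorted so the printed order is stable.
unique_words = words.groupby("category")["content"].apply(lambda s: sorted(set(s)))

# Number of unique words per category.
counts = unique_words.map(len)

# How often each word occurs within each category (MultiIndex: category, word).
freq = words.groupby("category")["content"].value_counts()

for category, items in unique_words.items():
    print(f"{category}: {', '.join(items)} ({counts[category]} items)")
```

With this data the loop prints `A: apple, banana, melon, orange (4 items)` and `B: apple, orange, peach (3 items)`. The `freq` series holds the raw counts before deduplication; for example `freq[("A", "apple")]` is 2, because apple appears in both rows of category A.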

nowp 9,214 points

Posted on 2020-10-05 17:05:02
