Question about building a co-occurrence matrix


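Hello, I have a question about a text analysis I'm working on. After finishing the NLP preprocessing, I want to build a co-occurrence matrix from how often words appear near each other. I used the code below. It worked fine until now, but it suddenly stopped working, so I'm posting this question. Thanks for taking a look.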

import collections
import pandas as pd
import numpy as np


def co_occurrence(sentences, window_size):
    d = collections.defaultdict(int)
    vocab = set()
    for text in sentences:
        # preprocessing: simple whitespace split (use a proper tokenizer instead)
        text = text.lower().split()
        # iterate over the tokens of this sentence
        for i in range(len(text)):
            token = text[i]
            vocab.add(token)  # add to vocab
            next_token = text[i+1 : i+1+window_size]
            for t in next_token:
                key = tuple( sorted([t, token]) )
                d[key] += 1

    # formulate the dictionary into dataframe
    vocab = sorted(vocab) # sort vocab
    df = pd.DataFrame(data=np.zeros((len(vocab), len(vocab)), dtype=np.int16),
                      index=vocab,
                      columns=vocab)
    for key, value in d.items():
        df.at[key[0], key[1]] = value
        df.at[key[1], key[0]] = value
    return df


df = pd.read_csv('data.csv', encoding = 'utf-8')

# I uploaded the file here: http://naver.me/x1eYJPQ2

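# strip quotes, commas, middle dots and square brackets from the 'nlp' text
# (regex=False: treat each pattern as a literal string, not a regular expression)
df['nlp'] = df["nlp"].str.replace("'", "", regex=False)
df['nlp'] = df["nlp"].str.replace(",", "", regex=False)
df['nlp'] = df["nlp"].str.replace("･", "", regex=False)
df['nlp'] = df["nlp"].str.replace("・", "", regex=False)
df['nlp'] = df["nlp"].str.replace("[", "", regex=False)
df['nlp'] = df["nlp"].str.replace("]", "", regex=False)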
corpus = df.corpus.tolist()


df = co_occurrence(corpus, 3)

df.to_csv('co_occurrence.csv', encoding = 'utf-8')
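To make the window logic concrete, here is a minimal sketch that runs the same counting approach on two hypothetical in-memory rows instead of data.csv. The sample strings are invented for illustration, and collapsing the six `str.replace` calls into one regular-expression character class is a swapped-in equivalent, not the asker's code. A token repeated inside the window (사과 here) is counted on the diagonal.

```python
import collections

import numpy as np
import pandas as pd


def co_occurrence(sentences, window_size):
    """Symmetric counts of token pairs that occur within `window_size` tokens of each other."""
    counts = collections.defaultdict(int)
    vocab = set()
    for text in sentences:
        tokens = text.lower().split()  # whitespace tokenization, as in the question
        for i, token in enumerate(tokens):
            vocab.add(token)
            # pair the current token with each of the next `window_size` tokens
            for other in tokens[i + 1 : i + 1 + window_size]:
                counts[tuple(sorted((token, other)))] += 1

    vocab = sorted(vocab)
    matrix = pd.DataFrame(np.zeros((len(vocab), len(vocab)), dtype=np.int32),
                          index=vocab, columns=vocab)
    for (a, b), value in counts.items():
        # write both cells so the matrix is symmetric
        matrix.at[a, b] = value
        matrix.at[b, a] = value
    return matrix


# Hypothetical sample rows shaped like stringified token lists (not from data.csv).
raw = pd.Series(["['사과', '바나나', '사과']", "['바나나', '포도']"])
# One character class removes ' , ･ ・ [ ] in a single pass (regex=True is required here).
cleaned = raw.str.replace(r"[',･・\[\]]", "", regex=True)
result = co_occurrence(cleaned.tolist(), window_size=3)
print(result)
```

With a window of 3, 바나나 and 사과 co-occur twice in the first row, so both symmetric cells hold 2.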

Asked by: unknown user

Accepted answer (score 0)

I managed to get it working with these two changes.

Change the encoding:

# df = pd.read_csv('data2.csv', encoding = 'utf-8')
df = pd.read_csv('data2.csv', encoding = 'euc-kr')
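If the file was saved as EUC-KR, as this fix suggests, reading it with `encoding='utf-8'` fails with a `UnicodeDecodeError`. Below is a hedged sketch that tries several encodings in turn; the helper name `read_csv_with_fallback`, the encoding order, and the demo file contents are assumptions for illustration. cp949 is Microsoft's superset of EUC-KR, so it is tried last as a wider net.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "euc-kr", "cp949")):
    """Return (DataFrame, encoding) for the first encoding that decodes the file."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error


# Demo with a hypothetical EUC-KR file written to a temporary location.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as fh:
    fh.write("nlp\n사과 바나나\n".encode("euc-kr"))
    demo_path = fh.name
try:
    demo_df, used_encoding = read_csv_with_fallback(demo_path)
finally:
    os.remove(demo_path)
print(used_encoding, demo_df["nlp"].tolist())
```

Trying UTF-8 first keeps the common case fast; the fallback only matters for files saved by Korean-locale tools such as Excel.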

Replace the nonexistent column name (`corpus`) with the one that actually exists (`nlp`):

# corpus = df.corpus.tolist()
corpus = df.nlp.tolist()
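`df.corpus` is attribute access, so a missing column surfaces as an `AttributeError` ("'DataFrame' object has no attribute 'corpus'") that does not list the columns that do exist. A small sketch with a hypothetical helper `column_as_list` that checks `df.columns` first; the sample DataFrame is invented for illustration.

```python
import pandas as pd


def column_as_list(df, name):
    """Return df[name] as a list, with a clearer error if the column is missing."""
    if name not in df.columns:
        raise KeyError(f"column {name!r} not found; available columns: {list(df.columns)}")
    return df[name].tolist()


df = pd.DataFrame({"nlp": ["사과 바나나", "바나나 포도"]})  # hypothetical sample data
corpus = column_as_list(df, "nlp")

missing_error = None
try:
    column_as_list(df, "corpus")
except KeyError as err:
    missing_error = str(err)  # message names both the missing and the available columns

attribute_error = None
try:
    df.corpus  # attribute access on a missing column, as in the original question
except AttributeError as err:
    attribute_error = str(err)

print(missing_error)
print(attribute_error)
```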

Answered by 광자 (552 points) on 2021-06-19 19:53:51

댓글 달기

공출현 매트릭스 형성 관련 질문

조회수 452회

0

(•́ ✖ •̀)알 수 없는 사용자

댓글 입력

1 답변

0

광자 552 points

2021-06-19 19:53:51에 작성됨

댓글 달기

답변을 하려면 로그인이 필요합니다.

(•́ ✖ •̀)
알 수 없는 사용자