Changes

Scikit-learn (edit)

Revision as of 02:15, 2 September 2021 (Thu)

610 bytes added, 02:15, 2 September 2021 (Thu)

→ Basic usage

Line 30: Line 30:

from sklearn.model_selection import train_test_split

−

train_x, test_x, train_y, test_y = train_test_split(data, label, random_state=1)

+

train_x, test_x, train_y, test_y = train_test_split(data, label, test_size=0.2, train_size=0.8, random_state=1)

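</syntaxhighlight>random_state fixes the seed of the random number generator so that the same split is produced on every run. (This is useful for testing and for grading in teaching settings, so that a model does not appear to improve merely because of random variation.)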

−

In general, test_size=0.2 is used instead. (20% of the data becomes test data.)

+

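In general, test_size=0.2 is used instead. (20% of the data becomes test data.) (When the two fractions are not meant to add up to 1, pass train_size as well; if train_size is omitted, it defaults to the complement of test_size.)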

|-

|Saving learned parameters

Line 51: Line 51:

|}
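The split described above can be sketched as follows. This is a minimal illustration, not the page author's code: the toy `data` and `label` arrays are hypothetical, and the second call uses fractions chosen only to show what happens when test_size and train_size do not add up to 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, plus one label per sample.
data = np.arange(20).reshape(10, 2)
label = np.arange(10)

# test_size=0.2 holds out 20% of the samples. train_size is left unset,
# so it defaults to the complement (80%).
train_x, test_x, train_y, test_y = train_test_split(
    data, label, test_size=0.2, random_state=1
)

# Calling again with the same random_state reproduces the same split.
_, _, _, test_y_again = train_test_split(
    data, label, test_size=0.2, random_state=1
)
same_split = np.array_equal(test_y, test_y_again)

# When the fractions should not add up to 1, give both explicitly.
# Here 50% goes to training, 20% to testing, and the rest is left unused.
small_train_x, small_test_x, _, _ = train_test_split(
    data, label, test_size=0.2, train_size=0.5, random_state=1
)

print(len(train_x), len(test_x), same_split)
print(len(small_train_x), len(small_test_x))
```

Changing random_state to another integer gives a different but equally reproducible split.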

+

= Handling missing values =

+

{| class="wikitable"

+

!Method

+

!Description

+

|-

+

|Filling in missing values

+

|<syntaxhighlight lang="python">

+

import pandas as pd

+

from sklearn.impute import SimpleImputer

+

imputer = SimpleImputer()

+

imputed_train_X = pd.DataFrame(imputer.fit_transform(train_X)) # Fit on the training data and fill in its missing values.

+

imputed_test_X = pd.DataFrame(imputer.transform(test_X)) # Fill in missing values in the test data using the statistics learned from the training data.

+

</syntaxhighlight>Simply filling in missing values often improves accuracy.

+

|}
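As a rough, self-contained illustration of the imputation step above, the sketch below uses two small hypothetical DataFrames (the column names `a` and `b` and all values are made up). It assumes the default `SimpleImputer` strategy, which replaces each missing value with the column mean, and it fits the imputer on the training data only, then reuses those means for the test data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values (NaN) in both columns.
train_X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 6.0, np.nan]})
test_X = pd.DataFrame({"a": [np.nan, 10.0], "b": [np.nan, 20.0]})

imputer = SimpleImputer()  # default strategy is "mean"

# fit_transform learns the column means from the training data and fills its gaps.
imputed_train_X = pd.DataFrame(
    imputer.fit_transform(train_X), columns=train_X.columns
)

# transform reuses the training means, so no test-set statistics are learned.
imputed_test_X = pd.DataFrame(
    imputer.transform(test_X), columns=test_X.columns
)

print(imputed_train_X)
print(imputed_test_X)
```

Passing `columns=` is optional; it only restores the column names, which are lost because the imputer returns a plain array by default.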

= Model validation =

{| class="wikitable"

Line 79: Line 94:

print("Accuracy : ", right/total)

</syntaxhighlight>A library function can also do this more simply.<syntaxhighlight lang="python">

−

from sklearn import metrics # Additional import.

+

from sklearn.metrics import accuracy_score # Additional import.

−

score = metrics.accuracy_score(label, pre) # Pass the labels and the predictions.

+

score = accuracy_score(label, pre) # Pass the labels and the predictions.

print('Accuracy : ', score)

</syntaxhighlight>

|}
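To make the equivalence between the manual count and `accuracy_score` concrete, here is a small sketch. The `label` and `pre` lists are hypothetical values chosen for illustration. The names `right` and `total` follow the manual snippet above, whose loop is not fully shown here, so the counting line is an assumed reconstruction.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (4 of 5 match).
label = [0, 1, 1, 0, 1]
pre = [0, 1, 0, 0, 1]

# Manual accuracy: count the positions where the prediction equals the label.
right = sum(1 for y, p in zip(label, pre) if y == p)
total = len(label)
manual_score = right / total

# The same quantity computed by scikit-learn.
score = accuracy_score(label, pre)

print("Accuracy (manual) : ", manual_score)
print("Accuracy (sklearn): ", score)
```

Both values are the fraction of correct predictions, so they should agree up to floating-point rounding.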

−

== SVM algorithms ==

The following SVM algorithms are available. Only the algorithm name needs to change when creating the object.
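A minimal sketch of swapping the algorithm name is shown below. The page's list of algorithms is not reproduced in this revision view, so the sketch assumes it covers the three classifiers in `sklearn.svm` (`SVC`, `LinearSVC`, and `NuSVC`). The toy `data` and `label` values are hypothetical.

```python
from sklearn import svm

# Hypothetical, well-separated toy data: two clusters with one class each.
data = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
label = [0, 0, 0, 1, 1, 1]

# Only the class name differs; fit and predict are called the same way.
classifiers = {
    "SVC": svm.SVC(),
    "LinearSVC": svm.LinearSVC(),
    "NuSVC": svm.NuSVC(),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(data, label)
    # Predict one point near each cluster.
    results[name] = list(clf.predict([[0.5, 0.5], [3.5, 3.5]]))

for name, predictions in results.items():
    print(name, predictions)
```

Because all three classes share the same estimator interface, the rest of a training script can usually stay unchanged when the class is swapped.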

Sam

Bureaucrat, interface administrator, administrator, teacher

1,419 edits