This post uses the dataset at https://www.kaggle.com/prathamtripathi/drug-classification to predict which drug each record ends up with. We'll use a k-nearest neighbors classifier model for it.


Here's what we'll learn:

How to do one-hot encoding

How to use a k-nearest neighbor classifier model


Let's get a feel for the dataset.

It has sex, age, cholesterol, Na-K, and so on, with the drug name as the final column.

One way to read the data: for a patient of a given sex and age, a particular drug is what brought the cholesterol and Na-K values to where they are.
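To make the "get a feel for the dataset" step concrete, here is a minimal sketch of how one might inspect the columns. The rows below are made up and only mimic the column layout used later in this post (Age, Sex, BP, Cholesterol, Na_to_K, Drug); the drug labels are placeholders, not values taken from the real file.

```python
import pandas as pd

# Made-up rows that only mimic the column layout of drug200.csv (an assumption).
toy = pd.DataFrame({
    "Age": [23, 47, 61, 35],
    "Sex": ["F", "M", "M", "F"],
    "BP": ["HIGH", "LOW", "NORMAL", "HIGH"],
    "Cholesterol": ["HIGH", "HIGH", "NORMAL", "NORMAL"],
    "Na_to_K": [25.4, 13.1, 9.8, 17.2],
    "Drug": ["drugA", "drugB", "drugA", "drugC"],  # placeholder labels
})

# Non-numeric columns hold strings and must be encoded before modeling.
text_columns = toy.select_dtypes(exclude="number").columns.tolist()
print(text_columns)

# How many rows each target class has.
drug_counts = toy["Drug"].value_counts()
print(drug_counts)
```

On the real file you would run the same two checks on df right after loading it.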

So, how do we run the model?


The effect of a drug varies with sex and age, right? So it would be even better to group the rows by sex and to bucket age into 20s, 30s, 40s, and so on before modeling.

But... building that is a bit far from what we're learning today, so I'll pass on it.
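For the curious, the age bucketing mentioned above could look roughly like this. It is only a sketch using pandas.cut on a few made-up ages; the bin edges and labels are my own assumption, and nothing in the rest of this post depends on it.

```python
import pandas as pd

# A few made-up ages for illustration.
ages = pd.Series([23, 35, 47, 29, 41], name="Age")

# Decade buckets [20, 30), [30, 40), [40, 50); right=False keeps 30 in "30s".
# Ages outside these edges would become NaN, so real data needs wider bins.
age_group = pd.cut(ages, bins=[20, 30, 40, 50], right=False,
                   labels=["20s", "30s", "40s"])
print(age_group.tolist())
```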


import pandas as pd

# Load the CSV from the Kaggle input directory
path = "/kaggle/input/drug-classification/drug200.csv"
df = pd.read_csv(path)
header = df.columns

feature_names = ['Age','Sex','BP','Cholesterol','Na_to_K']
data = df[feature_names].to_numpy()
# Single brackets give a 1-D target array, which is the shape scikit-learn expects
target = df['Drug'].to_numpy()


That separates the data from the target we want to predict. But there's still one thing left to do: one-hot encoding.

scikit-learn models require numeric input anyway. Numbers can be placed on a continuous scale, like points on a graph, whereas strings are discrete, step-like categories. Either way, let's turn them into numbers.


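# Maps each category string to an integer code.
# Note: strictly speaking this is label (integer) encoding;
# true one-hot encoding would make one 0/1 column per category.
one_hot_encoding_char = {"F":0, "M":1, "NORMAL":0, "HIGH":1, "LOW":2}

def one_hot_encode(arr, char):
    # arr is a view into data, so the column is overwritten in place
    result = arr
    for i in range(0, arr.size):
        result[i] = char[arr[i]]
    return result

data[:,1] = one_hot_encode(data[:,1], one_hot_encoding_char)  # Sex
data[:,2] = one_hot_encode(data[:,2], one_hot_encoding_char)  # BP
data[:,3] = one_hot_encode(data[:,3], one_hot_encoding_char)  # Cholesterol

# The array still has dtype object; make every feature a plain float
data = data.astype(float)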
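The mapping above gives each category a single integer, which quietly implies an order (here NORMAL < HIGH < LOW). For comparison, here is a small sketch of what one-hot encoding in the strict sense looks like, using pandas.get_dummies on a made-up BP column. This is an alternative I'm showing for illustration, not what the code in this post does.

```python
import pandas as pd

# Made-up values for a single categorical column.
toy = pd.DataFrame({"BP": ["HIGH", "LOW", "NORMAL", "HIGH"]})

# One 0/1 column per category, so no artificial ordering is implied.
one_hot = pd.get_dummies(toy, columns=["BP"], dtype=int)
print(one_hot.columns.tolist())
print(one_hot.to_numpy().tolist())
```

Because k-nearest neighbors works on distances, the choice between integer codes and 0/1 columns can change which rows count as "near", so it may be worth trying both.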


Now let's import the model and split the data into a training set and a test set. The end goal is to be able to tell whether the model is overfitting or not.


from sklearn.model_selection import train_test_split

# random_state=0 makes the split reproducible; by default 25% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=0)

from sklearn.neighbors import KNeighborsClassifier

# Classify each sample by a vote among its 3 nearest training samples
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
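Since the point of the split is to spot overfitting, one rough check is to compare accuracy on the training set with accuracy on the test set. The sketch below uses synthetic data from make_classification as a stand-in (an assumption, so that it runs without the Kaggle file); with the real data you would call score on the X_train / X_test arrays from above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 200 rows, 5 features, 3 classes (not the drug data).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# score() returns mean accuracy between 0 and 1.
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

A training accuracy that is much higher than the test accuracy is the usual hint that the model is overfitting.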


Adjust the number of neighbors (n_neighbors) to whatever works well for you~~
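If you'd rather not guess, one common way to pick the number of neighbors is to try a few values and compare cross-validated accuracy. This is a minimal sketch on synthetic data (again an assumption, not the drug dataset); the candidate list [1, 3, 5, 7, 9] is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the drug data).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Mean 5-fold cross-validated accuracy for each candidate k.
mean_scores = {}
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best k:", best_k)
```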


predicted = knn.predict(X_test)

score = 0
for i in range(predicted.shape[0]):
    # True when the prediction matches the actual label
    is_correct = predicted[i] == y_test[i]
    print("{} == {} ? {}".format(predicted[i], y_test[i], is_correct))
    if is_correct:
        score += 1

# Share of correct predictions, as a percentage
print("Score is {}%".format(score / predicted.shape[0] * 100))
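The loop above is just computing accuracy by hand. As a cross-check, here is a small sketch showing that a vectorized comparison and sklearn.metrics.accuracy_score give the same number. The label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels: 3 of the 4 predictions are correct.
y_true = np.array(["drugA", "drugB", "drugA", "drugC"])
y_pred = np.array(["drugA", "drugB", "drugC", "drugC"])

# Element-wise comparison, then the mean of the resulting booleans.
manual = (y_pred == y_true).mean() * 100
# The library helper returns a fraction between 0 and 1.
library = accuracy_score(y_true, y_pred) * 100
print(manual, library)
```

With the fitted model from this post, knn.score(X_test, y_test) returns the same fraction in a single call.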


The end!