CSED101 – (Solution)

Programming Assignment #3
(75 points)

허성환 (hursung1@postech.ac.kr)

◼ Submission deadline: 2021.11.30 23:59
◼ Development environment: Windows Visual Studio 2019

◼ Deliverables
• C source code and output files (mystring.h, mystring.c, assn3.c)
➢ Be sure to comment your source code so that the grader can follow it.
• Report file (.docx, .hwp, or .pdf; assn3.docx, assn3.hwp, or assn3.pdf)
➢ Write the report following AssnReadMe.pdf.
➢ Write the report for Problem2 only.
➢ Honor code: Write the following pledge on the cover page: "I completed this programming assignment without inappropriate help from others." An assignment whose report cover does not carry the honor code is treated as not submitted.
➢ Submit your source code and report file through PLMS.

◼ Notes
• Assignments that do not compile or run receive 0 points.
• Submitting one day late costs 20%, and two days late costs 40%. Submissions three or more days late are not accepted (0 points).
• Be sure to follow the constraints and requirements of each problem.
• The output format of every problem must match the examples below; points are deducted otherwise.
• Cheating is governed by the "POSTECH Department of Electrical and Computer Engineering Definition of Cheating" issued by the department's undergraduate committee (see disciplinary.pdf, attached to the post titled [document about cheating] in this course's announcements on PLMS).
• This assignment offers no extra credit for implementing additional features.

[Before you begin]

1. Strings
• A string is a sequence of characters; in C, a string literal is enclosed in double quotes (").
• A string is stored in a one-dimensional array of char.
• When a string is stored in an array, its end is marked with the NULL character ('\0').

2. Declaration
• char str[] = "hello";
Declaring and initializing the array at the same time, as above, automatically appends the NULL character to the end of the string.
str: | h | e | l | l | o | \0 |

3. Input and output
char str[100];
printf("Enter the filename: ");
scanf("%s", str);
printf("%s", str);

<Example run> (underlined text in the example below is user input)
Enter the filename: train.txt
train.txt

◼ Problem1: String functions (3 points)
(Problem)
Implement some of the string-handling functions provided by C's <string.h> library yourself.

(Description) The provided assn3_p1.zip contains mystring.h and mystring.c. mystring.h contains the declarations of the string-handling functions below. (Use it as is; do not modify it.)
#pragma once
int mystrlen(char *str);
int mystrcmp(char *str1, char *str2);
char *mystrcpy(char *toStr, char *fromStr);
char *mystrlower(char *str);
Write mystring.c so that it contains the implementations of these functions. Each function is defined as follows.

(1) int mystrlen(char *str)
Returns the length of the string, not counting the NULL character. Returns 0 for the empty string ("").
Ex)
printf("%d ", mystrlen("csed101")); // result: 7

(2) int mystrcmp(char *str1, char *str2)
Compares the strings str1 and str2 to determine their order. The comparison is based on ASCII values. Comparison starts at the first character of each string; if the characters are equal, the next characters are compared. The return value is determined as follows.
– Return -1 if the character of str1 is smaller during the comparison, and 1 if it is larger
– Return 0 if the two strings are exactly equal (same length and all characters equal)
– If one string reaches its end first during the comparison, that string is considered smaller
Ex)
char *string = "csed101";
printf("%d ", mystrcmp(string, "csed101")); // result: 0
printf("%d ", mystrcmp(string, "csed103")); // result: -1
printf("%d ", mystrcmp(string, "csed")); // result: 1
printf("%d ", mystrcmp(string, "csed1010")); // result: -1
printf("%d ", mystrcmp(string, "Csed101")); // result: 1
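As an illustration of the three rules above, here is a minimal sketch of one possible mystrcmp. It relies on the fact that the NULL character has the value 0, so a string that ends first automatically compares as smaller; the casts to unsigned char are an added precaution, not something the assignment text asks for.

```c
/* Sketch: compare two strings by ASCII value, returning -1, 0, or 1. */
int mystrcmp(char *str1, char *str2)
{
    int i = 0;

    /* Advance while both strings still have characters and they match. */
    while (str1[i] != '\0' && str2[i] != '\0' && str1[i] == str2[i])
        i++;

    /* At the first difference, '\0' (value 0) is smaller than any other
       character, so the string that ended first is reported as smaller. */
    if ((unsigned char)str1[i] < (unsigned char)str2[i])
        return -1;
    if ((unsigned char)str1[i] > (unsigned char)str2[i])
        return 1;
    return 0;
}
```

For example, comparing "csed101" with "Csed101" stops at index 0, where 'c' (99) is larger than 'C' (67), so the result is 1.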

(3) char *mystrcpy(char *toStr, char *fromStr)
Copies the string fromStr into the string toStr, then returns the address of toStr.
Ex)
char str[100];
printf("%s ", mystrcpy(str, "Hello world!")); // result: Hello world!
printf("%s ", mystrcpy(str, "CSED101")); // result: CSED101
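The two definitions above (length without the NULL character, and copy including it) can be sketched as follows. This is only one straightforward way to write them, and it assumes the caller has made toStr large enough, which the assignment text does not state explicitly.

```c
/* Sketch: count characters up to, but not including, the NULL character. */
int mystrlen(char *str)
{
    int n = 0;
    while (str[n] != '\0')
        n++;
    return n;
}

/* Sketch: copy fromStr into toStr, including the terminating NULL
   character, and return the destination address.
   Assumes toStr has room for mystrlen(fromStr) + 1 characters. */
char *mystrcpy(char *toStr, char *fromStr)
{
    int i = 0;
    while ((toStr[i] = fromStr[i]) != '\0')
        i++;
    return toStr;
}
```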

(4) char *mystrlower(char *str)
Converts every English letter in the string str to lowercase, then returns the address of str.
Ex)
char str[100] = "Hello World! 123";
printf("%s ", mystrlower(str)); // result: hello world! 123
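A minimal sketch of mystrlower, assuming ASCII input as in the rest of the assignment: only the characters 'A' to 'Z' are changed, and everything else (digits, spaces, punctuation) is left as it is.

```c
/* Sketch: lowercase the string in place and return its address. */
char *mystrlower(char *str)
{
    int i;
    for (i = 0; str[i] != '\0'; i++) {
        if (str[i] >= 'A' && str[i] <= 'Z')
            str[i] = (char)(str[i] - 'A' + 'a');
    }
    return str;
}
```

Because the conversion happens in place, the argument must be a writable array such as char str[100], not a string literal.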

◼ Problem2: Spam message classification using a Naïve Bayes Classifier (72 points)

[Objectives]
– Learn how to use pointers and dynamic allocation.
– Learn how to use file input/output.
– Learn how to use strings.

[Background]
1. Introduction
When we use a mail client (ex. Google Gmail, Naver Mail), it sometimes automatically classifies messages as spam and blocks them. Text classification, one of the major tasks in natural language processing, classifies a sentence or a piece of text into a particular category, as in this example. Many methods exist for this (tf-idf, support vector machine, …); one simple method among them, Naïve Bayes Classification, performs text classification based on Bayes' Theorem.

2. Bayes' Theorem
For the joint probability of two random variables, the following holds.
$P(x, c) = P(x \mid c)\,P(c) = P(c \mid x)\,P(x)$
Using only $P(x \mid c)\,P(c) = P(c \mid x)\,P(x)$, this can be written as follows.
$P(c \mid x) = \dfrac{P(x \mid c)\,P(c)}{P(x)}$
Let us look at this equation using the spam-filtering example mentioned above. $P(c \mid x)$ is the probability that decides whether a mail ($x$) is spam or not ($c$). The purpose of Bayes' Theorem is to compute the left-hand side by obtaining the probabilities on the right-hand side from observable information (ex. a mail dataset).
On the right-hand side, $P(c)$ is the probability of a particular category occurring; here we use the proportion of spam mail in the current dataset. $P(x \mid c)$ is the probability of that mail occurring given the category. $P(x)$ is the probability of the mail itself occurring, but for convenience it is usually not computed directly; only the proportionality $P(c \mid x) \propto P(x \mid c)\,P(c)$ is used.

3. Naïve Bayes Classification
Since $x$ is a document (or sentence) made up of several words $(w_1, w_2, \dots, w_n)$, $P(x \mid c)$ can be viewed as the conditional joint probability of these words.
$P(x \mid c) = P(w_1, w_2, \dots, w_n \mid c)$
Naïve Bayes Classification assumes that the occurrences of the individual words are conditionally independent. By the definition of independence, the equation can be rewritten as follows.
$P(x \mid c) = P(w_1, w_2, \dots, w_n \mid c) = \prod_{i=1}^{n} P(w_i \mid c)$

That is, when we want to classify $x$, we can do so with the following expression.
$\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)$

Here $\arg\max_{c}$ denotes the $c$ that maximizes the value of the expression.

4. Example
Suppose we have the following training data.
Text | Label
I will buy a new phone | Ham
buy our product, change your old phone | Spam
pay attention to our new product | Spam
let's discuss about our homework | Ham
I think it is new one | Ham

Suppose we want to classify a sentence x = "buy our new phone" as Ham or Spam; that is, we want to find P(Ham | x) and P(Spam | x). Following the Naïve Bayes Classification method, first compute the following probabilities.
◼ P(c)
– P(Ham) = 3/5, P(Spam) = 2/5
※ P(Spam): the proportion of spam in the given training data. Since 2 of the 5 data items are spam, it is 2/5.

◼ P(w | c)
– P(buy | Ham) = 1/3, P(buy | Spam) = 1/2
– P(our | Ham) = 1/3, P(our | Spam) = 1
– P(new | Ham) = 2/3, P(new | Spam) = 1/2
– P(phone | Ham) = 1/3, P(phone | Spam) = 1/2
※ Of the 5 training items, the word 'new' appears in 2 normal (Ham) messages and in 1 spam message, so P(new | Ham) = 2/3 and P(new | Spam) = 1/2.

From these, the values are as follows.
– P(Ham) P(x | Ham) = 2/135
– P(Spam) P(x | Spam) = 1/20
The second value is larger, so x can be classified as Spam.
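The arithmetic of this example can be checked with a short sketch. The function name nb_score and its parameters are hypothetical, chosen only for illustration; the function multiplies a class prior by the per-word likelihoods, which is the quantity compared between Ham and Spam.

```c
/* Sketch: P(c) * P(w_1 | c) * ... * P(w_n | c) for one class.
   nb_score is a hypothetical helper name, not part of the assignment. */
double nb_score(double prior, const double *likelihoods, int n)
{
    double score = prior;
    int i;
    for (i = 0; i < n; i++)
        score *= likelihoods[i];
    return score;
}
```

With the prior 3/5 and the likelihoods {1/3, 1/3, 2/3, 1/3} the Ham score is 2/135 (about 0.0148), and with the prior 2/5 and {1/2, 1, 1/2, 1/2} the Spam score is 1/20 (0.05), so the Spam score is the larger of the two.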
[Notes]
1. Save the file as "assn3.c", and write the report for problem2 only.
2. You may include and use the standard header file <string.h>.
3. Global variables, goto statements, structs, and linked lists may not be used.
4. Points are deducted if all features are implemented inside the main function. Implement features that are functionally independent or used repeatedly as user-defined functions.
5. Points are deducted for using arrays where the problem statement requires dynamic memory allocation.
6. You do not need to handle errors other than those explicitly specified.
7. Except for the "=-=-" lines, make the output format identical to the examples below.

[Feature descriptions]
0. Program start
When run, the program prints the main screen as shown below and then waits for user input.

The user enters one of three commands (train, test, exit) to run the corresponding feature. For any other input, the program prints an appropriate error message, waits 1 second using the Sleep() function, then clears the screen using system("cls") and shows the main screen again.
(Underlined text in the example is user input.)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: print
Error: Invalid input

Commands are case-insensitive: the same command runs regardless of letter case. For example, test, Test, and TEST all perform the same feature.
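One way to make command matching case-insensitive is to lowercase a copy of the input before comparing it. The sketch below is an assumption-laden illustration: parse_command, the CMD_* constants, and MAX_CMD_LEN are hypothetical names, and the lowercasing is written inline so the block stands alone (in the assignment, mystrlower from Problem 1 could serve the same purpose).

```c
#include <string.h>

/* Hypothetical command codes and buffer size, for illustration only. */
#define CMD_INVALID 0
#define CMD_TRAIN   1
#define CMD_TEST    2
#define CMD_EXIT    3
#define MAX_CMD_LEN 16

/* Sketch: map user input to a command code, ignoring letter case. */
int parse_command(const char *input)
{
    char lowered[MAX_CMD_LEN];
    size_t i, len = strlen(input);

    if (len >= MAX_CMD_LEN)
        return CMD_INVALID;              /* too long to be a command */

    for (i = 0; i <= len; i++) {         /* i == len copies the '\0' */
        char ch = input[i];
        lowered[i] = (ch >= 'A' && ch <= 'Z') ? (char)(ch - 'A' + 'a') : ch;
    }

    if (strcmp(lowered, "train") == 0) return CMD_TRAIN;
    if (strcmp(lowered, "test") == 0)  return CMD_TEST;
    if (strcmp(lowered, "exit") == 0)  return CMD_EXIT;
    return CMD_INVALID;
}
```

A caller would print "Error: Invalid input" when the result is CMD_INVALID, as in the screen example above.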

1. Training
When "train" is entered, the program asks for a file name as shown below and reads that file.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: train
File name: train.txt

Declare and use the array that stores the file name as shown below. Note that grading will use file names of at most 20 characters. Assume that file names contain no spaces.
#define MAX_FILE_NAME 30
char filename[MAX_FILE_NAME];

The file follows the format of the train dataset provided with this assignment. The example below shows the contents of the provided train.txt; every line of a train dataset file has the form [label]\t[text] (\t denotes the tab character).
Example train dataset)
ham	I will buy a new phone
spam	buy our product, change your old phone
spam	pay attention to our new product
ham	let's discuss about our homework
ham	I think it is new one
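A line in the [label]\t[text] format can be split at the first tab character. The following is a sketch under the assumption that the line was read with fgets (so it may end in a newline); split_line is a hypothetical helper name.

```c
#include <string.h>

/* Sketch: split "label\ttext" in place.
   On success, *label and *text point into the original buffer and the
   function returns 1; it returns 0 if the line contains no tab. */
int split_line(char *line, char **label, char **text)
{
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;

    *tab = '\0';                             /* terminate the label */
    *label = line;
    *text = tab + 1;
    (*text)[strcspn(*text, "\r\n")] = '\0';  /* drop a trailing newline */
    return 1;
}
```

Because the split happens in place, no extra memory is needed for the label or the text; both pointers stay valid only as long as the line buffer does.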

If the entered file does not exist, print the error message shown below. Keep the current screen for 1 second, then clear the screen and show the main screen again.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: train
Error: File does not exist

[label] is either ham or spam, and the length of [text] does not exceed 1000 characters. To compute the probabilities, the occurrence frequency of each word is needed, which requires the following preprocessing.
– Use the function implemented in Problem 1 to convert every letter in [text] to lowercase.
※ You may include mystring.h, or copy only the relevant code.
– Replace every special character in [text], that is, every character other than a letter, with a space.
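The two preprocessing steps above can be sketched in a single pass over the text. This is an illustration only: normalize_text is a hypothetical name, and the lowercasing is written inline to keep the block self-contained, whereas the assignment asks for the Problem 1 function to be used for that step.

```c
/* Sketch: lowercase letters and replace every non-letter with a space,
   in place. Returns the address of the text for convenience. */
char *normalize_text(char *text)
{
    int i;
    for (i = 0; text[i] != '\0'; i++) {
        char ch = text[i];
        if (ch >= 'A' && ch <= 'Z')
            text[i] = (char)(ch - 'A' + 'a');   /* uppercase -> lowercase */
        else if (!(ch >= 'a' && ch <= 'z'))
            text[i] = ' ';                      /* digits, punctuation, etc. */
    }
    return text;
}
```

Note that an apostrophe also becomes a space, so "let's" turns into the two tokens "let" and "s"; this matches the sample statistics, which list "let" but not "let's".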

After this preprocessing, compute the occurrence frequency of each word for each [label]. To store the (word, count) pairs, organize the arrays as shown in the figure below. The arrays that store the words (char **words) and the counts (int *freq) must not be fixed-size arrays; each must be created by dynamic allocation. That is, allocate or reallocate memory whenever it is needed. The first allocation uses an array size of 5, and when more memory is needed, reallocate to twice the currently allocated size (ex. current size: 10 → size after reallocation: 20).
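The growth rule (start at 5, double when full) could look like the sketch below. It is a simplified illustration with a single freq array; append_word, free_words, and INITIAL_CAPACITY are hypothetical names, and the arrays are passed by pointer because global variables and structs are not allowed in this assignment.

```c
#include <stdlib.h>
#include <string.h>

#define INITIAL_CAPACITY 5   /* first allocation size given in the text */

/* Sketch: append a copy of word with count 1, growing both arrays from
   0 -> 5 -> 10 -> 20 ... as needed. Returns 1 on success, 0 on failure. */
int append_word(char ***words, int **freq, int *count, int *capacity,
                const char *word)
{
    char *copy;

    if (*count == *capacity) {
        int new_cap = (*capacity == 0) ? INITIAL_CAPACITY : *capacity * 2;
        char **new_words = (char **)realloc(*words, sizeof(char *) * new_cap);
        int *new_freq;

        if (new_words == NULL)
            return 0;
        *words = new_words;

        new_freq = (int *)realloc(*freq, sizeof(int) * new_cap);
        if (new_freq == NULL)
            return 0;
        *freq = new_freq;
        *capacity = new_cap;
    }

    copy = (char *)malloc(strlen(word) + 1);   /* word length + 1 for '\0' */
    if (copy == NULL)
        return 0;
    strcpy(copy, word);

    (*words)[*count] = copy;
    (*freq)[*count] = 1;
    (*count)++;
    return 1;
}

/* Sketch: release every word, then the two arrays themselves. */
void free_words(char **words, int *freq, int count)
{
    int i;
    for (i = 0; i < count; i++)
        free(words[i]);
    free(words);
    free(freq);
}
```

Starting from words = NULL, freq = NULL, count = 0, and capacity = 0, the first call allocates room for 5 entries, since realloc on a NULL pointer behaves like malloc.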

Figure 1. An example of how to organize the data

For each word, dynamically allocate exactly the length it needs (the string length of the word + 1) and store the word there.
The following example allocates memory to match a string's length. Because the length of the input string is not known in advance, a sufficiently large array (str) is declared to receive it, and then only as much memory as the input string needs is dynamically allocated to store the word.
#define MAX_WORD_LEN 30
…
char str[MAX_WORD_LEN]; // array that receives the input string
char *p; // pointer that receives the dynamically allocated memory
scanf("%s", str); // if "hello" is entered, the memory beyond what "hello" needs
// is wasted
p = (char *)malloc(sizeof(char) * (strlen(str)+1)); // allocate space matching the size of
// the string stored in str
strcpy(p, str); // copy the string into the space that p points to
…
free(p); // release the memory once p is no longer needed

When counting word frequencies, follow these rules.
– Do not count a word whose length is 1.
– Do not count a word whose length exceeds 20.
– Count frequencies only for words that fall under neither of the two cases above.
– Even if a word appears several times within one message, consider only whether it appears and count it once.
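The counting rules above amount to collecting, for each message, the distinct words whose length is between 2 and 20. A possible sketch using strtok is shown below; unique_words and the two length constants are hypothetical names, the text is assumed to be already normalized (lowercase letters and spaces only), and the caller-supplied out array is a per-message scratch buffer, separate from the dynamically allocated words and freq arrays the assignment requires.

```c
#include <string.h>

#define MIN_COUNTED_LEN 2    /* words of length 1 are not counted */
#define MAX_COUNTED_LEN 20   /* words longer than 20 are not counted */

/* Sketch: tokenize text in place and store pointers to the distinct
   countable words in out[]. Returns how many were stored. */
int unique_words(char *text, char *out[], int max_out)
{
    int n = 0, i, dup;
    char *tok = strtok(text, " \t\r\n");

    while (tok != NULL && n < max_out) {
        size_t len = strlen(tok);
        if (len >= MIN_COUNTED_LEN && len <= MAX_COUNTED_LEN) {
            dup = 0;
            for (i = 0; i < n; i++) {
                if (strcmp(out[i], tok) == 0) {  /* already seen in this message */
                    dup = 1;
                    break;
                }
            }
            if (!dup)
                out[n++] = tok;
        }
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Each word returned here would then have its count for the message's label increased by exactly one, regardless of how many times it occurred in that message.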
When training finishes, sort the statistics obtained above alphabetically by word, then print them to the screen and save them to a file as in the example below. Memory allocated for training must be properly released with the free() function.
The output file is named 'stats.txt'. When printing to the screen or to the file, follow these rules.
– The first line of the output records the number of normal mails (Ham) and spam mails learned from the train dataset, as in the example below.
– From the second line on, the output format is [word],[frequency in Ham],[frequency in Spam].
If a word does not occur on one side, print 0 for it. For example, if the word "can" occurred 10 times in Ham but never in Spam, print "can,10,0".
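Since structs are not allowed, the words and their two counts have to be kept in parallel arrays and reordered together. The sketch below uses an insertion sort for that and writes the result in the format described above; sort_stats, write_stats, and the split into separate ham and spam arrays are assumptions made for illustration, not the layout prescribed by the assignment's figure.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: insertion sort that keeps the three parallel arrays aligned. */
void sort_stats(char **words, int *ham, int *spam, int n)
{
    int i, j;
    for (i = 1; i < n; i++) {
        char *w = words[i];
        int h = ham[i];
        int s = spam[i];
        for (j = i - 1; j >= 0 && strcmp(words[j], w) > 0; j--) {
            words[j + 1] = words[j];
            ham[j + 1] = ham[j];
            spam[j + 1] = spam[j];
        }
        words[j + 1] = w;
        ham[j + 1] = h;
        spam[j + 1] = s;
    }
}

/* Sketch: write the header line and one "word,ham,spam" line per word.
   Passing stdout prints to the screen; passing a file handle saves it. */
void write_stats(FILE *out, int n_ham, int n_spam,
                 char **words, int *ham, int *spam, int n)
{
    int i;
    fprintf(out, "Ham:%d, Spam:%d\n", n_ham, n_spam);
    for (i = 0; i < n; i++)
        fprintf(out, "%s,%d,%d\n", words[i], ham[i], spam[i]);
}
```

Calling write_stats once with stdout and once with the handle returned by fopen("stats.txt", "w") keeps the screen and file output identical.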

The example below shows the statistics printed to the screen after training on the provided 'train.txt' dataset.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: train
File name: train.txt

After training…
Ham:3, Spam:2    <- start of the statistics output
about,1,0
attention,0,1
buy,1,1
change,0,1
discuss,1,0
homework,1,0
is,1,0
it,1,0
let,1,0
new,2,1
old,0,1
one,1,0
our,1,2
pay,0,1
phone,1,1
product,0,2
think,1,0
to,0,1
will,1,0
your,0,1

After printing the statistics, the program waits for user input. When the [Enter] key is pressed, it clears the screen and shows the main screen again.

The example below shows the statistics written to the file ('stats.txt') after training on the provided 'train.txt' dataset.
Ham:3, Spam:2
about,1,0
attention,0,1
buy,1,1
change,0,1
discuss,1,0
homework,1,0
is,1,0
it,1,0
let,1,0
new,2,1
old,0,1
one,1,0
our,1,2
pay,0,1
phone,1,1
product,0,2
think,1,0
to,0,1
will,1,0
your,0,1

2. Test
When "test" is entered, the program waits for a sentence to be entered, as shown below.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: test
Enter a message:

When a sentence is entered, the program decides whether it is Spam or not, based on the statistics file ('stats.txt') produced in the training step. The sentence does not exceed 1000 characters.
Follow these rules.
– As in the training step, remove all special characters from the sentence and convert all letters to lowercase.
– If a word in the sentence is not included in the statistics, leave the probability for that word out of the calculation.
– If a word is included in the statistics but its count is 0, the probability would become 0; in that case, use 0.1/(number of Spam messages) or 0.1/(number of Ham messages) as the probability.
– test may be run directly without going through training. In that case, the file used for test must be named "stats.txt" and must have the same format as the file created in the training step.
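The zero-count rule above can be captured in a small helper. The name word_likelihood and its parameters are hypothetical; the sketch assumes the word was found in the statistics (words not in the statistics are skipped before this point) and that class_count, the number of Ham or Spam messages, is greater than 0.

```c
/* Sketch: P(word | class) with the fallback for a zero count.
   word_count  : number of messages of this class containing the word
   class_count : total number of messages of this class (assumed > 0) */
double word_likelihood(int word_count, int class_count)
{
    if (word_count == 0)
        return 0.1 / class_count;            /* avoids a zero product */
    return (double)word_count / class_count;
}
```

For the second sample dataset later in this document (Ham: 3, Spam: 4), the word "call" has counts 0 and 1, giving 0.1/3 ≈ 0.033 for Ham and 1/4 = 0.250 for Spam, which matches the sample output. Printing with the "%.3f" format produces the required three decimal places.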

Read the file before asking the user for a sentence. When loading the information from the file, do not store it in fixed-size arrays; as in the training step, declare pointers and allocate/reallocate memory dynamically. (If needed, you may use the atoi() function, which converts a decimal string to an int value; include stdlib.h when using it.)
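Reading 'stats.txt' back involves two line shapes: the header line and the [word],[Ham count],[Spam count] lines. The sketch below parses each; parse_header and parse_stat_line are hypothetical names, and using sscanf for the header is one option among several (the text itself only mentions atoi).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: parse "Ham:3, Spam:2". Returns 1 if both numbers were read. */
int parse_header(const char *line, int *n_ham, int *n_spam)
{
    return sscanf(line, "Ham:%d, Spam:%d", n_ham, n_spam) == 2;
}

/* Sketch: parse "word,ham,spam" in place using strtok and atoi.
   *word points into line. Returns 1 on success, 0 on a malformed line. */
int parse_stat_line(char *line, char **word, int *ham, int *spam)
{
    char *w = strtok(line, ",");
    char *h = strtok(NULL, ",");
    char *s = strtok(NULL, ",\r\n");

    if (w == NULL || h == NULL || s == NULL)
        return 0;

    *word = w;
    *ham = atoi(h);
    *spam = atoi(s);
    return 1;
}
```

Because *word points into the line buffer, it would need to be copied into dynamically allocated memory (string length + 1) before the next line is read.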
If the entered sentence
– is judged to be Spam: print "This message is SPAM".
– is judged to be Ham: print "This message is HAM".

Print the statistics together with the Spam verdict message. Memory allocated for test must be properly released with the free() function.
The example below shows the statistics computed from the stats.txt obtained with the provided train.txt example (for the calculation, see [Background] and its example). Print computed values to three decimal places (ex. 0.1 → 0.100, 0.3126 → 0.313).
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: test
Enter a message: buy our new phone

P(Ham) = 0.600, P(Spam) = 0.400
P(buy | Ham) = 0.333, P(buy | Spam) = 0.500
P(our | Ham) = 0.333, P(our | Spam) = 1.000
P(new | Ham) = 0.667, P(new | Spam) = 0.500
P(phone | Ham) = 0.333, P(phone | Spam) = 0.500

P( Ham | 'buy our new phone'): 0.015
P(Spam | 'buy our new phone'): 0.050

This message is SPAM

On this screen the program waits for user input. When the [Enter] key is pressed, it clears the screen and shows the main screen again.
If the test command is entered without prior training and the "stats.txt" file does not exist, print the error message shown below. As with train, wait 1 second, then clear the screen and show the main screen again.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: test
Error: File does not exist

3. Program exit
When "exit" is entered, the program terminates. At that point, dynamically allocated memory must be properly released with the free() function.

※ An example using different training data
Training data)
ham	are you free tomorrow? why don't we go ice rink
spam	U R entitled to Update to the latest color mobiles with camera for Free!
ham	i see :) see you later
spam	You have won a 1 week FREE membership in our $100,000 Prize!
spam	The New Jersey Devils and the Detroit Red Wings play Ice Hockey. Correct or Incorrect? Reply END SPTV
ham	did you buy notebooks? If not i will buy them
spam	you are awarded with a $1500 Bonus Prize, call 09066364589
stats.txt)
Ham: 3, Spam: 4
and,0,1
are,1,1
awarded,0,1
bonus,0,1
buy,1,0
call,0,1
camera,0,1
color,0,1
correct,0,1
detroit,0,1
devils,0,1
did,1,0
don,1,0
end,0,1
entitled,0,1
for,0,1
free,1,2
go,1,0
have,0,1
hockey,0,1
ice,1,1
if,1,0
in,0,1
incorrect,0,1
jersey,0,1
later,1,0
latest,0,1
membership,0,1
mobiles,0,1
new,0,1
not,1,0
notebooks,1,0
or,0,1
our,0,1
play,0,1
prize,0,2
red,0,1
reply,0,1
rink,1,0
see,1,0
sptv,0,1
the,0,2
them,1,0
to,0,1
tomorrow,1,0
update,0,1
we,1,0
week,0,1
why,1,0
will,1,0
wings,0,1
with,0,2
won,0,1
you,3,2

Test result)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CSED 101
Assignment 3

Naive Bayesian Classifier for Spam Filtering
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Command: test
Enter a message: you got a $1000 lottery! Call us :)

P(Ham) = 0.429, P(Spam) = 0.571

P(you | Ham) = 1.000, P(you | Spam) = 0.500
P(call | Ham) = 0.033, P(call | Spam) = 0.250

P( Ham | 'you got a $1000 lottery! Call us :)'): 0.014
P(Spam | 'you got a $1000 lottery! Call us :)'): 0.071

This message is SPAM

[Reference] Refer to the following when working on the assignment.

1. Reading a string that contains spaces
Use the fgets() function to read a string that contains spaces.
char *fgets(char *str, int num, FILE *stream)
: Reads at most num-1 characters from stream and stores them in the memory that str points to. If a newline ('\n') is read, reading stops there even if the string is shorter than num. Declared in stdio.h.
Ex)
char str[10];
fgets(str, 10, stdin); // stdin means reading input from the keyboard
printf("%s ", str); // print the string that was read

FILE *fp = fopen("file.txt", "r");
fgets(str, 10, fp); // read one line from the file
printf("%s ", str); // print the string read from the file

2. Splitting a string
Use the strtok() function to split a string at particular characters.
char *strtok(char *str, const char *delim)
: Splits (tokenizes) the string str at the characters contained in delim. For example, splitting the string "Hello/world!" at "/" yields "Hello" and "world!". The character used as the separator ("/") is called the delimiter, and the resulting pieces are called tokens. Declared in string.h.
Ex)
char str[100];
char *token1, *token2, *token3;
strcpy(str, "Tokenization/Test/String");
token1 = strtok(str, "/"); // token1 = "Tokenization"
token2 = strtok(NULL, "/"); // token2 = "Test"
token3 = strtok(NULL, "/"); // token3 = "String"
