## Description

1.1 Spam Detection

Naive Bayes classification uses Bayes' rule, p(t|x) = p(x|t)p(t)/p(x), where p(t) is the prior and p(x|t) is the likelihood. To decide whether "your gift" is spam, we must calculate p(spam|"your gift") = p(s|yg), which equals p(yg|s)p(s)/p(yg).

Calculating the prior probabilities from the training data, we get p(s) and p(!s), the fractions of spam and non-spam messages.

Calculating the likelihood, we get p(yg|s) = 0, because at least one of the words never appears in the spam training messages. Now, if we try the normal Bayes classification formula, we will get an undefined answer, as we would have to divide by 0. Instead, we can discard the denominator and simply compare whether spam or not spam has the larger score, using p(s|yg) ∝ p(s)p(your|s)p(gift|s) = p(s)p(y|s)p(g|s) and p(!s|yg) ∝ p(!s)p(y|!s)p(g|!s). In doing so, we assume all words are independent of each other given the class, so sentence structure is not considered at all. To avoid zero probabilities, we add 1 to every word count (Laplace smoothing), which should not change the final classification. The probabilities are as follows:

p("your gift"|spam) ∝ p(your|s) p(gift|s) p(s)

p("your gift"|not spam) ∝ p(your|!s) p(gift|!s) p(!s)

Since p("your gift"|spam) > p("your gift"|not spam), "your gift" would be classified as spam.
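As a minimal sketch of this add-one scheme, assuming a small hypothetical training corpus (the assignment's actual messages are not shown here), the comparison can be coded as:

```python
from collections import Counter

# Hypothetical training data -- stand-ins for the assignment's corpus.
spam_msgs = [["win", "gift", "now"], ["your", "prize"], ["gift", "card"]]
ham_msgs = [["your", "meeting", "notes"], ["lunch", "today"]]

vocab = {w for m in spam_msgs + ham_msgs for w in m}

def smoothed_likelihood(word, msgs, vocab_size):
    """p(word|class) with add-one (Laplace) smoothing over word counts."""
    counts = Counter(w for m in msgs for w in m)
    total = sum(counts.values())
    return (counts[word] + 1) / (total + vocab_size)

p_spam = len(spam_msgs) / (len(spam_msgs) + len(ham_msgs))
p_ham = 1 - p_spam

def score(words, msgs, prior):
    """Unnormalized posterior: prior times the product of word likelihoods."""
    s = prior
    for w in words:
        s *= smoothed_likelihood(w, msgs, len(vocab))
    return s

msg = ["your", "gift"]
spam_score = score(msg, spam_msgs, p_spam)
ham_score = score(msg, ham_msgs, p_ham)
print("spam" if spam_score > ham_score else "not spam")
```

Discarding the shared denominator p(yg) is safe because it scales both class scores equally, so the argmax is unchanged.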

3.2 Feature Scaling

1. The change depends on the data and on which column is scaled. On average, it inflated the variance along the perturbed column and made the proportion of explained variance unusually high in the first principal component.

2. This change also depends on the data and which column is changed.

3. This does not affect the number of linearly independent components: multiplying a column by a nonzero scalar does not change whether the columns are linearly independent of each other, so the rank of the data matrix is unchanged.
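Points 1 and 3 can be illustrated with a small sketch on hypothetical random data; an SVD of the centered matrix stands in for a full PCA:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # three features on comparable scales

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the per-component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2
    return var / var.sum()

before = explained_variance_ratio(X)

X_scaled = X.copy()
X_scaled[:, 0] *= 100  # perturb one column with a large scale factor

after = explained_variance_ratio(X_scaled)
print(before[0], after[0])  # the first component's share jumps toward 1

# Point 3: scaling a column does not change the rank (linear independence).
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_scaled))
```

On data like this the first component's share rises from roughly a third to nearly all of the variance, while the rank stays at 3.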

4.2 About RMSE

The RMSE should be as close to 0 as possible. Ideally, if pred and label are the same, RMSE would indeed be 0.
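A minimal RMSE implementation makes the zero-error case easy to check (the function name and toy inputs here are illustrative, not the assignment's own code):

```python
import math

def rmse(pred, label):
    """Root-mean-square error between two equal-length sequences."""
    assert len(pred) == len(label) and len(pred) > 0
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical inputs give 0.0
print(rmse([1.0, 2.0], [1.0, 4.0]))            # sqrt((0 + 4)/2) ~ 1.414
```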

4.3 Testing: ridge regression
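One way to test a ridge regression implementation is to fit noiseless synthetic data and check that known weights are recovered. This is a hedged sketch, assuming the closed-form solution w = (X^T X + lam*I)^(-1) X^T y and hypothetical data, not the assignment's own test:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Noiseless data generated from known weights; with lam -> 0 the ridge
# solution approaches ordinary least squares and recovers them.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = ridge_fit(X, y, lam=1e-8)
print(np.allclose(w, true_w, atol=1e-4))
```

A second useful check is that increasing lam shrinks the weights toward zero, since the penalty term dominates the fit.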
