Feedback¶
- Several people use standard MSE as the score metric for the regression
- A lot of people forget to format the output of best alpha in the regression to be one digit
- Seemed some people (again) copied off of the solution notebook; it is a formative assignment!

# Function to categorize decision values into true positives (TP), false positives (FP),
# true negatives (TN), and false negatives (FN) based on model predictions
# Initialize empty lists for each category
tp = [] # True positive
tn = [] # True negative
fp = [] # False positive
fn = [] # False negative
# Get the decision function values for the dataset
res_lr = model.decision_function(X)
# Iterate over each decision value and actual label
for ii in range(len(res_lr)):
if res_lr[ii] > 0: # Positive decision threshold
if y[ii] == 1: # True label is positive
tp.append(res_lr[ii])
else: # True label is negative
fp.append(res_lr[ii])
else: # Negative decision threshold
if y[ii] == 1: # True label is positive
fn.append(res_lr[ii])
else: # True label is negative
tn.append(res_lr[ii])
# Return the categorized decision values
return tp, fp, tn, fn
# Defining a range of alpha values to test
alphas = np.logspace(-4, 4)
# Initializing a list to store the scores for each alpha
scores = []
# Looping through each alpha value
for al in alphas:
# Creating a Ridge regression model with the current alpha
rr = Ridge(alpha=al)
# Fitting the model on the training data
rr.fit(X_train, y_train)
# Predicting on the validation set
y_ridge_alpha.append(rr.predict(xval_res))
# Calculating the model's score on the test set
score = rr.score(Xtest_res, y_test)
# Storing the alpha value and its corresponding score
scores.append([al, score])
Variance Bias Trade-off¶
What is Bias and Variance?
Bias: The error introduced by approximating a real-world problem, which may be complex, by a much simpler model.
- High bias can cause an algorithm to miss relevant relations between features and target outputs (underfitting).
Variance: The error introduced by the model's sensitivity to fluctuations in the training set.
- High variance can cause overfitting, where the model captures noise in the data.
Let's suppose the relationship between $X$ and $Y$ is described by
$$ Y = \sum_i w_i^\star x^i + \epsilon$$
where $w_i^\star$ are the true parameters and $\epsilon$ is some noise.
We will try to model this with
$$ y = p(x) = \sum_i w_i x^i$$
where now the $w_i$ will be fitted to data.
We define
$$ \bar w_i = \langle w_i\rangle $$
as the expectation value of the parameter $w_i$ when fitted to multiple independent samples drawn from the true distribution.
We want to calculate the expected deviation of the fitted coefficients form the true coefficient:
$$ \langle (w_i-w_i^\star)^2\rangle$$
$$ \begin{eqnarray} \langle (w_i-w_i^\star)^2\rangle & = & \langle (w_i-\bar w_i +\bar w_i -w_i^\star)^2\rangle \\ &=& \langle (w_i-\bar w_i)^2\rangle + \langle (\bar w_i-w_i^\star)^2\rangle +2 \langle (w_i-\bar w_i)(\bar w_i -w_i^\star)\rangle \end{eqnarray}$$
The third term vanishes: $$ \langle (w_i-\bar w_i)(\bar w_i -w_i^\star)\rangle = \langle (w_i-\bar w_i)\rangle (\bar w_i -w_i^\star) =0 $$
So we have
$$ \langle (w_i-w_i^\star)^2\rangle = \langle (w_i-\bar w_i)^2\rangle + \langle (\bar w_i-w_i^\star)^2\rangle$$
The first term is the variance term and the second is the bias.
Example¶
To illustrate the variance-bias tradeoff we will be using different models to describe data with true relationship between the input $x$ and the outcome
$$Y(x) = 1+\frac15 x^2 + \epsilon \qquad \mbox{for}\qquad 0\leq x\leq 1\;, \quad 0 \; \mbox{otherwise}$$
Where $\epsilon$ is a gaussian noise. We will use the two models
$$ m_1(x) = a +bx$$
and
$$m_2(x)= a+bx +cx^2 +dx^3.$$
Using $m_1$, a model with too few parameters we get

For low dataset size we see the the variance dominates but as the number of training samples grows the bias dominates. Since the model is not capable of describing the truth the error is not diminishing even though the variance part of the error drops proportional to $1/\sqrt{N}$
For the second model where we have enough freedom to exactly describe the truth we get:
