Second half of Core Ia: Introduction To Machine Learning And Statistics
Same procedure as part 1 with lectures Tuesday and workshops on Friday
Course website: https://miscada-ml-2425.notes.dmaitre.phyip3.dur.ac.uk/

Notebook server: https://miscada-ml-2425.notebooks.danielmaitre.phyip3.dur.ac.uk/

I'm Will Yeadon and will be taking this course
Can contact me via email will.yeadon@durham.ac.uk
Office hours Monday 12:00 - 14:00 PH304 in Rochester Builing


Machine Learning is a subset of artificial intelligence that focuses on enabling computers to learn from data and improve their performance over time without being explicitly programmed.
Machine learning algorithms can be classified according to:



The boundary between the two types can be blurred:
Some problems can be approached using either classification or regression techniques, depending on the desired outcome.




Choosing the right learning mode depends on the dataset size and computational resources.
The entire training set is used for each iteration of the model optimization.
Advantages:
Disadvantages:
The model is updated incrementally as each new training example arrives.
Advantages:
Disadvantages:
The model is optimized using small subsets (batches) of the training set.
Advantages:
Disadvantages:
The choice between instance-based and model-based methods depends on the problem at hand.
Example: Predicting final grade \( g_4 \) of a student given their 1st, 2nd, and 3rd-year results \( g_1 \), \( g_2 \), and \( g_3 \).
This illustrates the fundamental difference in approach between the two methods.
Model-based algorithms are powerful for capturing underlying patterns in data.
Instance-based methods are simple and effective for certain types of problems.
In this lecture, we covered the following key concepts in Machine Learning:
Machine Learning is a broad and evolving field that bridges data, algorithms, and real-world applications. Understanding these concepts forms a solid foundation for deeper exploration into modern ML techniques and applications.
The Perceptron algorithm was the first "machine learning" algorithm.

We have a set of examples of items from two classes. We want to train a model to distinguish between the two classes for new items.
Training data is composed of $n_d$ samples, each of which has $n_f$ features. Each data sample's class label is encoded as $\pm 1$.
$$\mbox{features: }\quad x^{(i)}_j,\qquad 1\leq i\leq n_d\,,\qquad 1\leq j \leq n_f $$ $$\mbox{labels: }\quad y^{(i)} = \pm 1\,,\qquad \qquad 1\leq i\leq n_d $$We will train a linear model
$$ z(x, w) = w_0 + \sum_j x_j w_j $$with $n_f+1$ parameters
$$ w_j\,,\qquad 0 \leq j \leq n_f $$We will use the decision function
$$ \phi(z)= sgn(z) $$If $\phi(z)=+1$ our model predicts the object to belong to the first class, if $\phi(z)=-1$ the model predicts that the object belongs to the second class.
Now we need to set the parameters $w_j$ such that our pediction is as accurate as possible.
Go through all the training data set and for each sample:
Repeat the procedure until no more examples are wrongly predicted (or until a small enough amount fails)
Each repeat is called an epoch.
Some references give the perceptron algorithm with a fixed value of $\eta$.
It is convenient to add a $1$ as the 0-th component of the data vector $x$
$$\vec{x} = 1, x_1, ... , x_{n_f}$$so that the linear model can be written as
$$ z(x,w) = \vec{x}\cdot \vec{w} $$This allows to write the update rule for the perceptron algorithm as
$$ \vec{w} \rightarrow \vec{w} + \eta\, y^{(i)}\, \vec{x}^{(i)} $$The parameter $\eta$ in the update rule
$$\begin{eqnarray}w_j &\rightarrow& w_j + \eta y^{(i)} x^{(i)}_j\,,\quad 1\leq j \leq n_f \\ w_0 &\rightarrow& w_0 + \eta y^{(i)}\end{eqnarray}$$is a parameter of the algorithm, not a parameter of the model or of the problem. It is an example or hyperparameter of a learning algorithm.
The learning rate set how much a single misclassification should affect our parameters.
The perceptron algorithm is converging if the classes are linearly separable in the training set.
It might take a long time to converge



The algorithm stops when there is no errors left but often the demarcation line ends up very close to some of the instances, which leads to bad generalisation
