Problem D
Perceptron Learning Algorithm

Implement the Perceptron Learning Algorithm (PLA) in to find a hypothesis in $d$ dimensions:

\[ h(x) = \textrm{sign}(w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_ d x_ d). \]

In other words, given a set of example pairs $(x, y)$, where $x \in \mathbb {R}^ d$ and $y \in \{ -1,+1\} $, use the PLA to identify a set of good weights $w_0$ through $w_ d$ for classifying the given examples.

\includegraphics[width=0.5\textwidth ]{simple.png}
Figure 1: Plots of training set (top) and testing set (bottom) for the sample data. For this problem, the training set is given as input, and the testing set is listed as output (but not given to your program).


Input for your program is a training set of $n$ examples in $d$ dimensions.

The first line contains two integers $1 \le n \le 10\, 000$ and $1 \le d \le 10$. Following are $n$ lines, each containing $d$ real numbers and one integer. The first $d$ numbers are for values $x_1$ through $x_ d$, and the last is for the corresponding $y$ value, which is either $-1$ or $1$. Each real value has at most $5$ digits past the decimal point and is in the range $[-2 \cdot 10^9, 2 \cdot 10^9]$.


Implement the Perceptron Learning Algorithm (or the Pocket algorithm) to create a perceptron model that fits the data as well as possible. Output $d+1$ real values corresponding to the coefficients $w_0$ to $w_ d$ in the definition of $h(x)$ above.

Note that the sample output shown is actually the test data on which a model would be evaluated (see below). It is in the same format as the input. A reasonable output for the given sample input might be:

-722.0 8.0 13.0

which would give a score of $80.0$ on the test data, according to the metric below.


Each time your program runs, it will process a different training set. The output of your program will be used to construct a perceptron to evaluate on another set of data, called the testing set. The testing set comes from a distribution similar to the training set, but is not known to your program.

Assume that the test data has $m$ examples $\{ (x(1),y(1)), \ldots , (x(m),y(m))\} $. The score your model receives for each test case is based on the accuracy on these $m$ examples (times $100$):

\[ \frac{100}{m} \cdot \sum _{i=1}^ m [h(x(i)) = y(i)] \]

where $[\alpha ]$ is the indicator function (which is $1$ if the condition inside is true and $0$ otherwise).

The overall score for your submission is the average of the scores it receives for all hidden test cases. The sample data is excluded from this average.

Sample Input 1 Sample Output 1
6 2
77 66 1
86 14 1
50 21 -1
42 82 1
28 78 1
22 32 -1
10 2
23 47 -1
26 2 -1
96 78 1
1 65 -1
64 40 1
13 68 1
24 5 -1
69 32 1
21 19 -1
65 88 1
CPU Time limit 3 seconds
Memory limit 1024 MB
Statistics Show
Greg Hamerly
Source Baylor University Machine Learning
License Creative Commons License (cc by-sa)

Please log in to submit a solution to this problem

Log in