Problem G
Support Vector Machine
In machine learning, the Support Vector Machine (SVM) is a popular classifier. Given an $n$-dimensional vector $x \in \mathbb {R}^ n$, an SVM produces a label $h(x) \in \{ -1,+1\} $ depending on where $x$ lies relative to an $n$-dimensional hyperplane.
An example application is email spam classification. We could convert the text of an email into a vector $x$ of counts of word occurrences in the email. Each vector element would represent the count for one dictionary word. Then a trained SVM could classify the email as ‘spam’ ($h(x)=+1$) or ‘non-spam’ ($h(x)=-1$).
Recall that for vectors $x, y \in \mathbb {R}^ n$,
-
$x_ i$ represents the $i$th element of $x$,
-
$x^ T y = \sum _{i=1}^ n x_ i y_ i$ is the vector inner product of $x$ and $y$, and
-
$||x|| = \sqrt {x^ Tx}$ is the length of vector $x$.
The parameters that define the SVM are $w \in \mathbb {R}^ n$ and $b \in \mathbb {R}$. The distance from a point $x$ to the hyperplane defined by $(w,b)$ is
\[ d(x) = (w^ Tx + b) / ||w|| \]so that the hyperplane is the set $H=\{ \, x\, |\, d(x) = 0\, \} $ and is $b / ||w||$ units from the origin at its closest. Note also that $w$ is orthogonal to the hyperplane. The SVM classifier is formally defined as
\[ h(x) = \left\{ \begin{aligned} & +1, \mbox{ if } d(x) \geq 0 \\ & -1, \mbox{ otherwise} \end{aligned} \right. \]For this problem you are given SVM parameters and a list of query points. For each query $x$, identify $d(x)$.
Input
The input contains one test case. The first line contains an integer $1\leq n\leq 1\, 000$. The second line contains $n+1$ floating-point values which are the values of $w_1, w_2, \ldots , w_ n$ and $b$. Each remaining line (up to the end of file) is a query vector $x$, given as $n$ floating point values (in the same order as the elements of $w$). There are at most $1\, 000$ queries. All floating point values in the input are in the range $[-1\, 000,1\, 000]$ with at most $5$ digits past the decimal point. At least one element of $w$ will be non-zero.
Output
For each query $x$, print the value of $d(x)$. The answer should be accurate to within $10^{-4}$ relative or absolute error.
Sample Input 1 | Sample Output 1 |
---|---|
2 -1 2 3 -3 1 8.71 -2.35 |
3.57771 -4.65549 |