Problem C
Support Vector Machine

In machine learning, the Support Vector Machine (SVM) is a popular classifier. Given an $n$-dimensional vector $x \in \mathbb {R}^ n$, an SVM produces a label $h(x) \in \{ -1,+1\} $ depending on where $x$ lies relative to an $n$-dimensional hyperplane.

An example application is email spam classification. We could convert the text of an email into a vector $x$ of counts of word occurrences in the email. Each vector element would represent the count for one dictionary word. Then a trained SVM could classify the email as ‘spam’ ($h(x)=+1$) or ‘non-spam’ ($h(x)=-1$).

Recall that for vectors $x, y \in \mathbb {R}^ n$,

  • $x_ i$ represents the $i$th element of $x$,

  • $x^ T y = \sum _{i=1}^ n x_ i y_ i$ is the vector inner product of $x$ and $y$, and

  • $||x|| = \sqrt {x^ Tx}$ is the length of vector $x$.

The parameters that define the SVM are $w \in \mathbb {R}^ n$ and $b \in \mathbb {R}$. The distance from a point $x$ to the hyperplane defined by $(w,b)$ is

\[ d(x) = (w^ Tx + b) / ||w|| \]

so that the hyperplane is the set $H=\{ \, x\, |\, d(x) = 0\, \} $ and is $b / ||w||$ units from the origin at its closest. Note also that $w$ is orthogonal to the hyperplane. The SVM classifier is formally defined as

\[ h(x) = \left\{ \begin{aligned} & +1, \mbox{ if } d(x) \geq 0 \\ & -1, \mbox{ otherwise} \end{aligned} \right. \]

For this problem you are given SVM parameters and a list of query points. For each query $x$, identify $d(x)$.

\includegraphics[width=0.3\textwidth ]{example}
Figure 1: Rendering of the hyperplane (thick dashed line) and query points (circles) for the sample input. Other dashed lines connected each query point to its closest point on the hyperplane.


The input contains one test case. The first line contains an integer $1\leq n\leq 1\, 000$. The second line contains $n+1$ floating-point values which are the values of $w_1, w_2, \ldots , w_ n$ and $b$. Each remaining line (up to the end of file) is a query vector $x$, given as $n$ floating point values (in the same order as the elements of $w$). There are at most $1\, 000$ queries. All floating point values in the input are in the range $[-1\, 000,1\, 000]$ with at most $5$ digits past the decimal point. At least one element of $w$ will be non-zero.


For each query $x$, print the value of $d(x)$. The answer should be accurate to within $10^{-4}$ relative or absolute error.

Sample Input 1 Sample Output 1
-1 2 3
-3 1
8.71 -2.35

Please log in to submit a solution to this problem

Log in