Take a look at the image below. What does it look like? A car? A bike?

It’s very hard to tell, isn’t it? That’s because it has elements of both a bike and a car. Surprisingly, the image above is one of a car, but regardless of what you chose, for the majority of people, there isn’t a clear-cut answer because this is an extreme example of a car.

Now, imagine if a computer had to figure out if this was a car. Yeah, not happening, right? Well, actually it’s possible with the help of Support Vector Machines.

**What is a Support Vector Machine?**

A Support Vector Machine (SVM) can be thought of as a machine learning model that classifies data between two different labels (in this case, cars and bikes) with the help of *support vectors*.

What are support vectors, you ask?

In a data classification problem, the support vectors are the pieces of labeled data fed into the model that are sort of wishy-washy, in the sense that they could plausibly be interpreted as either of the two labels, even though they already come with one. They are the extreme cases of the labels: the cars that don’t look like cars, and the bikes that don’t look like bikes.

A support vector machine decides on a boundary line using only these support vectors, which makes it fast in comparison to models that have to weigh every single piece of labeled data. And because the boundary is based on the more extreme cases, SVMs can be quite useful for labeling confusing data like the car-bike image from before.

In the image above, if we were to consider the x-axis to be the number of doors present in the image, and the y-axis to be the number of wheels present in the image, can you guess what the blue dots would represent and what the green dots would represent? Cars or bikes?

Well, the blue dots would represent cars because cars generally have more wheels and doors, whereas the green dots would represent bikes because they have fewer doors and fewer wheels. Remember, this is pre-labeled data, and what the support vector machine really has to do is draw the optimal line between these two clusters so that any image above the line is classified as a car and any image below the line is classified as a bike.

It does this by looking at the extreme cases, the support vectors, and maximizing the margin: the distance between the boundary line (the hyperplane) and these points. More specifically, it draws two parallel hyperplanes through the support vectors on either side, equidistant from the boundary, and maximizes the distance between them. We want to maximize this distance because we want the dividing line to sit as centered between the two clusters as possible (whatever “center” means).

We call the dividing line a hyperplane because, when working with real models, we don’t look at only two features but many more, and thus have graphs of dimension *N*. If we started looking at how metallic the image is, for example, we might have to add another axis and would have 3 dimensions; the boundary would then become a plane. To save ourselves the trouble of renaming the same idea of a boundary at every dimension, we just call it a hyperplane.
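To make the wheels-and-doors example concrete, here is a minimal sketch using scikit-learn (my choice of library, not something specific to this post). The tiny data set and feature counts are made up for illustration: each point is [number of wheels, number of doors], with 1 meaning car and 0 meaning bike.

```python
# Minimal linear SVM sketch with scikit-learn (illustrative data, assumed library).
from sklearn.svm import SVC

# Each row is [number of wheels, number of doors]; 1 = car, 0 = bike.
X = [[4, 2], [4, 4], [6, 2], [2, 0], [2, 1], [3, 0]]
y = [1, 1, 1, 0, 0, 0]

# kernel="linear" finds the maximum-margin hyperplane between the two clusters.
clf = SVC(kernel="linear")
clf.fit(X, y)

# The support vectors are the extreme, borderline examples the model kept.
print(clf.support_vectors_)

# Classify a new "image" described by its feature counts: 4 wheels, 2 doors.
print(clf.predict([[4, 2]]))  # -> [1], i.e. "car"
```

Notice that only the support vectors matter for the final boundary; you could delete the easy, clearly-labeled points far from the margin and get the same line.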

Now, you might have a question. What about data that can’t be divided by a straight line, like the data set in the image to the right?

**Non-Linear Data Sets and Support Vector Machines**

What we showed above was a linear support vector machine (LSVM), but the concept can be extended to non-linear examples as well. In fact, it is quite easy. There’s something called the kernel trick, which essentially lets you take a data set that cannot be separated by a hyperplane and make it so it can be. The general idea, at least mathematically, goes like this: if we define a mapping function *ϕ*(x) in a certain specified way, it is possible to transform a non-linear data set in *N* dimensions into a higher-dimensional space where it becomes linearly separable. (The “trick” is that the kernel lets us work in that higher-dimensional space without ever computing the mapping explicitly.)

If that doesn’t make sense, look at the picture below.

In this picture, we can see the input space, namely the non-linear data, get transformed into a 3-dimensional space where it can be divided by a hyperplane. This solves our problem because at that point we are simply dealing with an LSVM, just in a higher-dimensional space. Whenever we need to classify unlabeled data, we simply map it into that space and see which side of the hyperplane it falls on. Problem solved!
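Here is a quick sketch of that idea, again assuming scikit-learn. The `make_circles` helper generates two concentric rings of points, which no straight line in 2-D can separate; an RBF kernel implicitly maps the data into a higher-dimensional space where a hyperplane can.

```python
# Kernel trick sketch with scikit-learn (assumed library, synthetic data).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: not linearly separable in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A plain linear SVM struggles; an RBF-kernel SVM separates the rings easily.
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print(linear.score(X, y))  # around 0.5: a straight line can't split circles
print(rbf.score(X, y))     # close to 1.0: the kernel trick makes it separable
```

The only change between the two models is the `kernel` argument; the rest of the LSVM machinery stays exactly the same, which is why the trick is so convenient.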

I would go more in-depth on kernel functions, how exactly we find them, and how they work, except this is more of an introduction to Support Vector Machines than anything. I will most probably have another post in the future explaining the math behind SVMs with all the rigor behind it. For now, I think understanding the general idea of SVMs and the motivations for having them is more important.

**Do I need Support Vector Machines?**

Support Vector Machines have numerous applications in machine learning, and employers hiring data scientists often see them as a **must-know** because of their unexpected use cases, whether in text organization, protein classification, or other problems where the data has a good margin of separation. An SVM definitely doesn’t take the cake when it comes to handling large data sets, and it can be tedious to work with compared to other modern models, but having another tool never hurts, whether for pure enjoyment or for career purposes (interviews often ask questions about SVMs, so stay tuned for *my upcoming posts on SVMs*). I hope you got a basic idea of what an SVM is, and I hope you are staying safe during these tough times! Good luck, and below are more resources you can use to learn about the math and the more technical details of SVMs!


**Resources to Check out!**

https://www.youtube.com/watch?v=_PwhiWxHK8o

https://pythonprogramming.net/support-vector-machine-intro-machine-learning-tutorial/

https://www.analyticsvidhya.com/blog/2017/10/svm-skilltest/

http://web.mit.edu/6.034/wwwbob/svm.pdf

https://www.quora.com/What-are-some-interview-questions-on-Support-Vector-Machines-use-in-practice

Please share any other resources you find useful in the comments, and be sure to subscribe to get the latest updates on our blog!