⚠️ This lesson is retired and might contain outdated information.

Use Support Vector Machines To Find Complex Decision Boundaries with Scikit-learn

Instructor: Hannah Davis
Published 7 years ago
Updated 2 years ago

We’ll continue with the iris dataset to implement support vector machines, which can be used to find more complex boundaries for classification or regression problems.

More about support vector machines can be found in the scikit-learn documentation.

There are several types of kernel functions that can be used with SVMs. Scikit-learn supports these kernels:

- linear

- polynomial ('poly')

- rbf (radial basis function)

- sigmoid

Custom kernels are also supported. RBF ('rbf') is the default kernel.
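As a rough illustration of the custom kernel support mentioned above, here is a minimal sketch that passes a Python function as the kernel argument. The function name my_linear_kernel is hypothetical and not part of the lesson; it simply recomputes the linear kernel by hand, and the model is fit on the full iris data only to keep the example short.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC


def my_linear_kernel(X, Y):
    # Hypothetical custom kernel: the plain dot product between every
    # row of X and every row of Y, i.e. a hand-written linear kernel.
    return np.dot(X, Y.T)


iris = datasets.load_iris()
X, y = iris.data, iris.target

# SVC accepts a callable that returns the kernel matrix for two sets of samples.
custom_model = SVC(kernel=my_linear_kernel)
custom_model.fit(X, y)

# Accuracy on the same data the model was trained on (not a held-out score).
custom_accuracy = custom_model.score(X, y)
print(custom_accuracy)
```

A kernel written this way should behave much like kernel='linear', so it is mostly useful as a template for kernels that scikit-learn does not ship.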

A good overview of kernels can be found here, or at the scikit-learn page.

Instructor: [00:01] From scikit-learn, we'll import our datasets. We'll import our metrics. We'll import our train_test_split function. From sklearn.svm, we'll import SVC, which stands for support vector classifier. If we wanted to do regression, we would import SVR.

[00:29] We're working with the iris dataset, which we load with datasets.load_iris(). We'll assign X to be our iris.data. Our y is iris.target. Then we'll split our data into training and test datasets: X_train, X_test, y_train, and y_test equal the result of train_test_split, which takes our X and y data, a test_size of 0.15 (15 percent), and a random_state of 2.
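The loading and splitting steps described here can be sketched as follows. The transcript does not say which module train_test_split comes from; this sketch assumes sklearn.model_selection, where it lives in current scikit-learn releases.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split  # assumed import location

# Load the iris data: 150 samples with 4 features each, in 3 classes.
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Hold out 15 percent of the samples for testing.
# random_state=2 makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=2
)

print(X_train.shape, X_test.shape)
```

With 150 samples, a 15 percent test split rounds up to 23 test samples, which leaves 127 for training.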

[01:11] Then we'll say model equals SVC(). We can train it with model.fit(X_train, y_train). We can make predictions on our test data by calling model.predict and passing in our X_test data. Then we can print the true labels next to the predictions. We can see the model got most of them right.
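A minimal sketch of the fit and predict steps, using the same split as the lesson. The variable name predictions is an assumption, since the transcript does not name it.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

# Create a support vector classifier with default settings and train it.
model = SVC()
model.fit(X_train, y_train)

# Predict a class label (0, 1, or 2) for each held-out sample.
predictions = model.predict(X_test)

# Print the true labels and the predicted labels for comparison.
print(y_test)
print(predictions)
```

Printing the two arrays one above the other makes any mismatched positions easy to spot by eye.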

[01:53] Let's print our model.score, passing it our X_test and y_test data. For support vector classifiers, score returns the mean accuracy on the data it is given. We can see it's about 95.65 percent accurate.
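To make the scoring step concrete, the sketch below computes model.score and checks that it matches metrics.accuracy_score on the same predictions. The exact number printed may differ from the one in the video, since SVC defaults have changed between scikit-learn versions.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

model = SVC()
model.fit(X_train, y_train)

# score() predicts on X_test and returns the fraction of correct labels.
accuracy = model.score(X_test, y_test)

# The same number computed explicitly from the predictions.
explicit_accuracy = metrics.accuracy_score(y_test, model.predict(X_test))

print(accuracy, explicit_accuracy)
```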

[02:09] Support vector machines can give us more complex decision boundaries. We get those by using kernels. The SVC constructor takes an argument called kernel. The default is 'rbf', which stands for radial basis function.

[02:26] Scikit-learn has four built-in kernels. You can think of a kernel as a similarity function: a way to measure how similar two data points are. Besides rbf, our options are sigmoid, linear, and poly (polynomial). You generally want to find the best kernel for your dataset.
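One simple way to compare the built-in kernels is to train one model per kernel and look at the test accuracy of each. This is only a sketch: the scores dictionary is an illustrative name, and a single small test split is a noisy basis for choosing a kernel.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

# Train one classifier per built-in kernel and record its test accuracy.
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

for kernel, accuracy in scores.items():
    print(kernel, accuracy)
```

For a more reliable comparison, cross-validation over several splits would be a reasonable next step.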

[02:54] In addition to the model's score, we can also look at the classification report, metrics.classification_report, by passing in our true labels and our predictions. We can print our confusion matrix, metrics.confusion_matrix, the same way.
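The two reports can be produced along these lines. Passing labels and target_names is optional and is an addition of this sketch; it only makes the output show the iris species names and fixes the matrix at three rows and three columns.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

model = SVC()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Per-class precision, recall, and F1, returned as a formatted string.
report = metrics.classification_report(
    y_test, predictions, labels=model.classes_, target_names=iris.target_names
)
print(report)

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_test, predictions, labels=model.classes_)
print(cm)
```

Correct predictions sit on the diagonal of the confusion matrix, so any off-diagonal count shows which class was mistaken for which.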