
    Use Support Vector Machines To Find Complex Decision Boundaries with Scikit-learn

    Hannah Davis

    We’ll continue with the iris dataset to implement support vector machines, which can be used to find more complex boundaries for classification or regression problems.

    More about support vector machines can be found in the scikit-learn documentation.

    There are several types of kernel functions that can be used with SVMs. Scikit-learn supports these kernels:

    - linear

    - polynomial ('poly')

    - rbf (radial basis function)

    - sigmoid

    Custom kernels are also supported. RBF is the default kernel type.
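A minimal sketch, assuming a recent scikit-learn: each built-in kernel is selected by passing its string name to the `kernel` parameter of `SVC`, and a custom kernel can be passed as a Python callable that returns the Gram matrix between two sets of samples. The function name `linear_gram` is hypothetical, chosen only for illustration; here it simply reproduces a linear kernel.

```python
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Built-in kernels are chosen by their string names.
for name in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVC(kernel=name).fit(X, y)
    print(name, model.score(X, y))

# Hypothetical custom kernel: takes two sample matrices of shape
# (n_a, n_features) and (n_b, n_features) and returns the (n_a, n_b)
# matrix of pairwise similarities (here, plain dot products).
def linear_gram(A, B):
    return A @ B.T

custom_model = SVC(kernel=linear_gram).fit(X, y)
print("custom", custom_model.score(X, y))
```

These scores are measured on the same data the models were trained on, so they only show that each kernel runs; they are not a fair comparison between kernels.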

    A good overview of kernels can be found here, or at the scikit-learn page.

    Transcript

    Instructor: 00:01 From scikit-learn, we'll import our datasets. We'll import our metrics. We'll import our train_test_split function. From sklearn.svm, we'll import SVC, which stands for support vector classifier. If we wanted to do regression, we would import SVR.
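The imports described above might look like the sketch below (note that `train_test_split` lives in `sklearn.model_selection`). The `SVR` part is not from the lesson: it is an assumed illustration of the regression counterpart the instructor mentions, using scikit-learn's bundled diabetes dataset as a stand-in regression problem.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR  # SVC for classification, SVR for regression

# Assumption: the diabetes dataset is used here only as an example regression task.
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=2
)

regressor = SVR()  # SVR also uses the RBF kernel by default
regressor.fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient, not accuracy.
print(regressor.score(X_test, y_test))
```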

    00:29 We're working with the iris dataset, which we load with datasets.load_iris(). We'll assign X to be iris.data and y to be iris.target. Then we'll split our data into training and test sets: X_train, X_test, y_train, and y_test equal train_test_split(), which takes our X and y data, a test_size of 0.15 (15 percent), and a random_state of 2.
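The loading and splitting step can be sketched as follows. With 150 iris samples, a 15 percent test split rounds up to 23 test samples, leaving 127 for training; `random_state=2` makes the shuffle reproducible.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data    # 150 samples x 4 features
y = iris.target  # class labels 0, 1, 2

# Hold out 15 percent of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=2
)
print(X_train.shape, X_test.shape)  # (127, 4) (23, 4)
```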

    01:11 Then we'll say model equals SVC() and call model.fit(X_train, y_train). We can make predictions on our test data with model.predict(X_test). Then we can print the true labels (y_test) alongside the predictions. We can see the model got most of them right.
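A sketch of the fit-and-predict step as described, using the same split parameters. `SVC()` with no arguments uses the default RBF kernel.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

model = SVC()  # default kernel is 'rbf'
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(y_test)       # true labels
print(predictions)  # predicted labels
```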

    01:53 Let's print model.score(), passing it our X_test and y_test data. For a classifier like SVC, score() returns the mean accuracy. We can see it's about 95.65 percent accurate.
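To make "mean accuracy" concrete, this sketch checks that `model.score` matches `metrics.accuracy_score` and the raw fraction of correct predictions. With 23 test samples, about 95.65 percent corresponds to 22 correct predictions (22/23 ≈ 0.9565); the exact score you get may vary with your scikit-learn version, since default hyperparameters have changed over time.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)
model = SVC().fit(X_train, y_train)
predictions = model.predict(X_test)

# For classifiers, score() is the fraction of test samples predicted correctly.
score = model.score(X_test, y_test)
accuracy = metrics.accuracy_score(y_test, predictions)
correct = int((predictions == y_test).sum())
print(score, accuracy, f"{correct}/{len(y_test)}")
```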

    02:09 Support vector machines can give us more complex decision boundaries, and we get those by using kernels. The SVC class takes a parameter called kernel. The default is 'rbf', which stands for radial basis function.
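A small sketch of what the RBF kernel computes: the similarity of two points x and z is exp(-gamma * ||x - z||^2), which is 1.0 for identical points and falls toward 0 as they move apart. The helper `rbf` and the value `gamma = 0.5` are assumptions for illustration; SVC's own default, `gamma='scale'`, derives gamma from the training data. The sample points are illustrative flower measurements.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

print(SVC().kernel)  # 'rbf' is the default

x = np.array([[5.1, 3.5, 1.4, 0.2]])     # a setosa-like flower
near = np.array([[4.9, 3.0, 1.4, 0.2]])  # a similar flower
far = np.array([[6.3, 3.3, 6.0, 2.5]])   # a very different flower
gamma = 0.5  # assumed value, for illustration only

# Hypothetical helper: the RBF formula written out by hand.
def rbf(a, b, gamma):
    return np.exp(-gamma * np.sum((a - b) ** 2))

print(rbf(x, near, gamma), rbf(x, far, gamma))  # close points score higher
print(rbf_kernel(x, near, gamma=gamma)[0, 0])   # scikit-learn's version agrees
```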

    02:26 Scikit-learn has built-in support for four kernels. You can think of them as similarity functions: each one defines how to measure the similarity between two data points. Besides RBF, our options are 'sigmoid', 'linear', and 'poly' (polynomial). In general, you'll want to try several kernels and pick the one that works best for your dataset.
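One way to pick a kernel, sketched below, is to compare the four built-in options with 5-fold cross-validation on the training set and only then check the winner on the test set. Cross-validation is a choice made for this sketch, not something shown in the lesson; comparing kernels directly on the test set would give an optimistic estimate.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)

# Mean 5-fold cross-validation accuracy for each built-in kernel.
cv_scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    cv_scores[kernel] = cross_val_score(
        SVC(kernel=kernel), X_train, y_train, cv=5
    ).mean()

# Refit the best kernel on the full training set, then evaluate once on the test set.
best_kernel = max(cv_scores, key=cv_scores.get)
final_model = SVC(kernel=best_kernel).fit(X_train, y_train)
print(cv_scores)
print(best_kernel, final_model.score(X_test, y_test))
```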

    02:54 In addition to the model's score, we can look at the classification report by passing in our true labels and our predictions. We can also print our confusion matrix, again passing in our true labels and our predictions.
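A sketch of the evaluation step. Passing `labels=[0, 1, 2]` explicitly keeps the report and matrix aligned with all three iris classes even if one class happens to be missing from a small test split. In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so correct predictions sit on the diagonal.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=2
)
model = SVC().fit(X_train, y_train)
predictions = model.predict(X_test)

# Per-class precision, recall, F1, and support.
print(metrics.classification_report(
    y_test, predictions, labels=[0, 1, 2], target_names=iris.target_names
))

# Rows: true class; columns: predicted class.
cm = metrics.confusion_matrix(y_test, predictions, labels=[0, 1, 2])
print(cm)
```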
