Combine DeepSeek R1 Reasoning with GPT 3.5 Turbo for the Cheapest, Fastest, and Best AI

This lesson demonstrates how to utilize DeepSeek's R1 model to access its reasoning capabilities and then seamlessly transition to other models for subsequent tasks. By extracting the reasoning output from DeepSeek R1, you can leverage its powerful reasoning while gaining the speed, cost efficiency, and configuration flexibility of alternative models. This two-phase approach begins with DeepSeek’s new R1 model, praised for its open-source nature and value. You then gain the adaptability to switch to other models depending on your needs or existing workflows. This lesson provides a step-by-step walkthrough of this process, along with demo code to help you implement this technique yourself.

Share with a coworker

Transcript

[00:00] DeepSeek has released a model called R1, which everyone is super excited about because it's an open source reasoning model. Now when using the DeepSeek API from deepseek.com, go there and sign up for an API key. They split the output into two phases, so if you enable streaming, and DeepSeek just uses the OpenAI API just like OpenRouter and many others do. So once you turn on streaming you can grab each chunk of the response and check to see if it's reasoning content or regular content. The one we care about is reasoning content, because that's what comes in first, and we're going to check to see if it's null.

[00:36] If it's not null, we're going to hold onto it in this reasoning string and log it out in the console so it's visible. But if it is null, we're going to early exit out of this loop, abort the response, break the loop, and log that reasoning is done. And this allows us to rip out the reasoning from DeepSeq without having to wait for its final summary. So in turn we can swap over to a completely different API, a completely different model. In this scenario we're using GPT 3.5 Turbo through OpenRouter.

[01:07] And while 3.5 Turbo is old, it's also extremely fast and essentially free. And because it no longer has to think, we can feed it the initial question, the reasoning that came out of R1, and then just tell it to do something with the question and reasoning. So it's only doing basic text manipulation instead of thinking. And this is so fast that we don't even have to stream it out. We'll just get the final summary.

[01:31] And to see this in action, we'll run this script. It'll ask how can I help? I'll say why is John Lindquist such a swell guy? Hit enter let it think about why this random dude on the internet is pretty neat Do do do and then reasoning done And then you saw this final summary was essentially instant. So instead of letting R1 write this out or whatever it would have written, which may have been much longer and we didn't have control over, we handed it over to something that was faster and essentially free and allows us to configure it.

[02:11] Now R1 is still pretty cheap. So so far with 29 API requests I've only spent about three cents which is incredible compared to other reasoning models and obviously if you look at 3.5 this is cheaper than dirt it's gonna take a while before I even spend a single penny on using 3.5 so I'll post this script beneath the video you can play around with it and let me know if you have any questions.