Using YOLO for Real-Time Object Detection with Koyeb GPUs
Introduction
Welcome to this comprehensive guide on implementing real-time object detection using the YOLO (You Only Look Once) algorithm. This technology represents a leap forward in how we detect objects in real time, making it an invaluable tool in surveillance, robotics, and autonomous driving fields.
In this guide, we will walk through the theory behind YOLO, how it works, and how to implement it for real-time object detection in your own projects. We'll also look at real-world applications of YOLO that showcase its power and versatility.
You can consult the project repository as you work through this guide. You can also deploy the YOLO object detection application built in this tutorial using the Deploy to Koyeb button below:
Understanding YOLO (You Only Look Once)
YOLO, an acronym for "You Only Look Once", is an innovative approach to object detection. The primary difference between YOLO and other object detection algorithms is in the way it handles object detection.
While many earlier detection algorithms process an image in several stages or passes to detect objects, YOLO uses a single pass through the network, hence the name "You Only Look Once".
This is a simplified breakdown of how it works:
- First, the entire image is divided into a grid. Each cell in the grid is evaluated individually to predict objects within its boundaries.
- For each grid cell, YOLO generates a number of bounding boxes. A bounding box is a rectangular box that can be computed to contain an object. Each bounding box comes with a confidence score, representing how certain YOLO is that the predicted box encloses some object.
- Along with the bounding box, YOLO also provides the class of the object that it believes exists within the box. For example, it might claim that the object is a cat, dog, car, or any other type of object it has been trained to recognize.
- After making all of the predictions, YOLO discards low-confidence and heavily overlapping boxes. For each detected object, it keeps the bounding box with the highest confidence level and the associated object class as its final prediction.
YOLO's unique approach allows it to detect objects in real-time, making it a great choice for many computer vision tasks.
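The grid assignment and final selection steps described above can be sketched in plain Python. This is a minimal illustration, not the Ultralytics implementation: it assumes boxes in (x1, y1, x2, y2) corner format, uses non-maximum suppression (the filtering step commonly paired with YOLO-style detectors) with an illustrative IoU threshold of 0.5, and the helper names `grid_cell`, `iou`, and `non_max_suppression` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def grid_cell(cx: float, cy: float, img_w: int, img_h: int, grid: int) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell that contains a box center."""
    col = min(grid - 1, int(cx / img_w * grid))
    row = min(grid - 1, int(cy / img_h * grid))
    return row, col


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes to keep, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping boxes and one distant box, only the higher-scoring box of the overlapping pair and the distant box survive.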
Requirements
Before diving into building the object detection application with YOLO, it's important to ensure that you have the necessary tools and knowledge. This section outlines the prerequisites you'll need to follow the guide successfully:
- A Koyeb account: to deploy the application.
- Knowledge of Python programming: to understand the scripts we will create.
- A GitHub account: to store the code and trigger deployments to Koyeb GPUs.
Steps
There are several implementations of the YOLO algorithm available, but for ease of use, we will use the Ultralytics implementation in this guide. We will implement and test the code locally and then deploy to Koyeb's GPUs for higher inference speed.
To get started, create a project directory and then create and activate a new virtual environment within:
Now that the virtual environment and project directory are set up, make sure you have the necessary libraries and a version of YOLO installed. Visit PyTorch's getting started page to find the appropriate commands for your operating system. For this guide, we will use the Stable build, the Pip package manager, the Python language, and CUDA 12.1 as the compute platform. The commands to run will depend on your local operating system:
Next, install the Ultralytics YOLO library:
In addition to installing the YOLO implementation from Ultralytics, this also installs OpenCV for image processing.
We will be using Streamlit to build the application UI. In order to support non-local access to the webcam (when deployed to the Koyeb cloud, for instance), you will also need to install a few additional libraries:
You now have all the necessary dependencies installed.
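As a quick sanity check, a short script can confirm that the libraries are importable in the active virtual environment. This is a sketch under assumptions: the list of import names is inferred from the libraries this guide mentions, and `missing_modules` is a hypothetical helper, not part of any of these packages.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed from the libraries used in this guide
REQUIRED_MODULES = ("torch", "ultralytics", "cv2", "streamlit", "streamlit_webrtc", "av")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the script reports any missing modules, re-run the corresponding install command inside the virtual environment.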
Implementing YOLO for Real-Time Object Detection
Let's now create a simple Streamlit application that shows how easy YOLO is to use for object detection. Create a file called web.py:
Here's a breakdown of the code:
- Import the necessary libraries:
  - streamlit_webrtc is used for real-time video streaming
  - ultralytics is used for loading the YOLOv8 model
  - av is used for handling video frames
- Load the YOLOv8 model with the yolov8n.pt weights. We will use this model to detect objects in the video frames.
- Define the process_frame function. This function takes a video frame as input. It converts the frame to an image using PyAV, and then applies the YOLOv8 model to detect objects in the image. The model uses the ByteTrack tracker to keep track of objects between frames. The detection results are then visualized on the frame, and the annotated frame is returned.
- Call the webrtc_streamer function to create a WebRTC video streamer. We pass the process_frame function as the video_frame_callback argument so that it will be called for each frame of the video stream. The media_stream_constraints argument is used to specify that only the video should be streamed, not audio. The async_processing argument is set to True to enable processing frames in parallel with the streaming. The mode argument is set to WebRtcMode.SENDRECV, which sets the streamer to both send and receive video. We also define an rtc_configuration that points to Google's free STUN servers so that the streamer works when running in the cloud.
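The pieces described in this breakdown can be sketched without importing the heavy libraries. This is a hedged illustration, not the guide's web.py: `annotate_frame` and `build_streamer_options` are hypothetical helper names, the model is passed in as a parameter, and the STUN URL shown is an assumption (a commonly used public Google STUN address that the text does not spell out).

```python
from typing import Any, Dict


def annotate_frame(model: Any, image: Any, tracker: str = "bytetrack.yaml") -> Any:
    """Run tracking on one image and return the annotated image.

    `model` is expected to behave like an Ultralytics YOLO model. In the real
    callback, `image` would come from `frame.to_ndarray(format="bgr24")` and the
    result would be wrapped with `av.VideoFrame.from_ndarray(..., format="bgr24")`.
    """
    results = model.track(image, persist=True, tracker=tracker)
    return results[0].plot()  # draws boxes, labels, and track IDs


def build_streamer_options(stun_url: str = "stun:stun.l.google.com:19302") -> Dict[str, Any]:
    """Collect the keyword arguments described above as plain data.

    The STUN URL default is an assumption, not taken from the guide.
    """
    return {
        "media_stream_constraints": {"video": True, "audio": False},
        "async_processing": True,
        "rtc_configuration": {"iceServers": [{"urls": [stun_url]}]},
    }


# In the application, these would be passed along the lines of:
#   webrtc_streamer(key="yolo", mode=WebRtcMode.SENDRECV,
#                   video_frame_callback=process_frame,
#                   **build_streamer_options())
# where the key "yolo" is an illustrative value.
```

Keeping the options as plain data makes it easy to swap the STUN server or re-enable audio without touching the frame callback.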
You can run the application locally with:
Once the application launches, access it by navigating to http://localhost:8501/ in your browser.
The YOLOv8 model has the capability to not only detect objects in a frame but also track them across frames. This is particularly useful in applications like video surveillance, autonomous vehicles, and human-computer interaction.
For tracking objects, YOLOv8 provides two options: BoT-SORT and ByteTrack.
- BoT-SORT is an extension of the original SORT (Simple Online and Realtime Tracking) algorithm that also considers the motion history of objects for tracking. It uses the Kalman filter to predict the future position of objects and the Hungarian algorithm for data association. To enable BoT-SORT in YOLOv8, you need to use the botsort.yaml configuration file.
- ByteTrack is a tracking algorithm that is designed to handle occlusions and overlapping objects better. Rather than discarding low-confidence detections, it tries to associate them with existing tracks in a second matching pass, and it uses a motion model to predict the future position of objects. To enable ByteTrack in YOLOv8, use the bytetrack.yaml configuration file.
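To give a feel for how a tracker links detections to existing tracks, here is a heavily simplified sketch of the two-pass association idea published for ByteTrack: high-confidence detections are matched first, and low-confidence ones are then offered to the tracks that are still unmatched. It is not the Ultralytics code. It assumes (x1, y1, x2, y2) boxes, replaces the Hungarian algorithm with greedy IoU matching, omits the Kalman filter prediction step, and uses illustrative thresholds and hypothetical function names.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: Dict[int, Box], min_iou: float) -> Dict[int, int]:
    """Pair tracks with detections in order of decreasing IoU (greedy, not Hungarian)."""
    pairs = sorted(
        ((iou(t, d), tid, did) for tid, t in tracks.items() for did, d in dets.items()),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_dets = set()
    for score, tid, did in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or did in used_dets:
            continue
        matches[tid] = did
        used_dets.add(did)
    return matches


def associate(
    tracks: Dict[int, Box],
    detections: List[Tuple[Box, float]],
    high: float = 0.5,
    low: float = 0.1,
    min_iou: float = 0.3,
) -> Dict[int, int]:
    """Return {track_id: detection_index} using a two-pass association."""
    high_dets = {i: b for i, (b, s) in enumerate(detections) if s >= high}
    low_dets = {i: b for i, (b, s) in enumerate(detections) if low <= s < high}
    first = greedy_match(tracks, high_dets, min_iou)
    # Second pass: only tracks left unmatched may claim low-confidence detections
    remaining = {tid: b for tid, b in tracks.items() if tid not in first}
    second = greedy_match(remaining, low_dets, min_iou)
    return {**first, **second}
```

In this sketch, a partially occluded object whose detection score drops to 0.2 can still keep its track ID, which a single high-confidence pass would lose.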
Let's expand the initial application and allow users to upload videos for processing in addition to real-time processing:
Here's a breakdown of the new code:
- Import the new libraries:
  - cv2 is for handling video files with OpenCV
  - streamlit is to create the web application
- Load the YOLOv8 model with the yolov8n.pt weights. We will use this model to detect objects in the video frames.
- Define the process_video function. This function takes a video file path as input and opens the video file using OpenCV. It then loops through the video frames and applies the YOLOv8 model to detect objects in each frame. It returns the frame with the detection results displayed on it.
- Create a Streamlit file uploader to allow the user to upload a video file. We create a Streamlit button to start the video processing. When the button is clicked and a file is uploaded, the file is saved to a temporary directory and then processed using the process_video function.
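The two mechanical steps in this breakdown, saving the upload to a temporary file and looping over frames, can be sketched with the standard library alone. The helper names `save_upload_to_temp` and `iter_frames` are hypothetical. The sketch assumes the uploaded object exposes a `read()` method, as file-like objects do, and that the capture object follows the `isOpened()`, `read()`, and `release()` interface of OpenCV's `cv2.VideoCapture`.

```python
import tempfile
from typing import Any, Iterator


def save_upload_to_temp(uploaded_file: Any, suffix: str = ".mp4") -> str:
    """Write a file-like upload to a temporary file and return its path.

    The caller is responsible for deleting the file when processing is done.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded_file.read())
        return tmp.name


def iter_frames(cap: Any) -> Iterator[Any]:
    """Yield frames until the video ends, then release the capture object."""
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:  # end of video or read error
                break
            yield frame
    finally:
        cap.release()


# Intended use inside the application (names outside this block are assumed):
#   path = save_upload_to_temp(uploaded_video)
#   for frame in iter_frames(cv2.VideoCapture(path)):
#       results = model(frame)
```

Wrapping the loop in a generator keeps the release call in one place, even if processing stops early with an exception.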
Real-World applications of YOLO object detection
The Ultralytics YOLO documentation outlines several specific tasks that can be performed using the YOLO algorithm. Let's explore some of these tasks and their applications in real-world scenarios.
Object Counting
Object counting is the process of simply counting the number of instances of a specific object within an image or video. This task is particularly useful in scenarios like:
- crowd management: Estimating the number of people in a crowded area.
- inventory management: Counting products on shelves in a retail environment.
- wildlife monitoring: Tracking the number of animals in a specific area for ecological studies.
Code Example
Let's see the code changes we could make to our web.py file in order to support object counting:
Here's a breakdown of the new code:
- Add the solutions module to the imports from ultralytics. This module contains pre-made solutions from Ultralytics like ObjectCounter.
- Define the count_objects(video_file) function. This function takes a video file as input and processes it to count objects. Inside the function, we do the following:
  - Define the region_points variable. This variable represents the region of interest (ROI): the coordinates of the region in the video frame where the objects are to be counted.
  - Initialize the object counter with solutions.ObjectCounter. This class is responsible for counting objects that pass through the defined region. It takes parameters such as view_img, reg_pts, names, draw_tracks, and line_thickness.
  - Open the uploaded video using OpenCV's cv2.VideoCapture(video_file) function.
  - Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 tracking is applied to the frame with model.track(frame, persist=True, show=False). The counter.start_counting(frame, tracks) method is called to count objects in the frame and the annotated frame is then displayed.
  - Release the video capture object with cap.release() once the video is finished.
- Add a "Count Objects" button and an empty placeholder for displaying the video frames to the user interface. When the button is clicked, the video file is saved to a temporary directory, and the count_objects(path) function is called to process the video.
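To illustrate what a region of interest means geometrically, the sketch below counts how many detected boxes have their center inside a polygon. It is a simplified stand-in, not the Ultralytics ObjectCounter: that class works on tracked objects moving through the region, while this per-frame version only reports which objects are inside right now. The function names and the example coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_region(boxes: Sequence[Box], region_points: Sequence[Point]) -> int:
    """Count boxes whose center lies inside the region polygon."""
    count = 0
    for x1, y1, x2, y2 in boxes:
        center = ((x1 + x2) / 2, (y1 + y2) / 2)
        if point_in_polygon(center, region_points):
            count += 1
    return count


# Illustrative region: a 100x100 square in the top-left corner of the frame
region_points = [(0, 0), (100, 0), (100, 100), (0, 100)]
boxes = [(10, 10, 30, 30), (90, 90, 130, 130), (40, 40, 60, 60)]
in_region = count_in_region(boxes, region_points)
```

Using the box center is one reasonable convention; a stricter variant could require the whole box to lie inside the region.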
Let's see an example of object counting:
Object Cropping
Object cropping identifies and extracts specific objects from an image or video. This task allows you to create smaller images containing only the objects of interest, which can be useful for:
- image editing: Isolating objects for graphic design purposes.
- data augmentation: Creating additional training data for machine learning models.
- focus enhancement: Highlighting objects in presentations or reports.
Code Example
Let's see the code changes we can make to the web.py file in order to support object cropping:
Here's a breakdown of the code:
- Import the new libraries. Import Annotator and colors from ultralytics.utils.plotting. These are used to draw bounding boxes and retrieve colors for the bounding boxes, respectively.
- Define the crop_objects(video_file) function. This function takes a video file as input and processes it to crop objects. Inside the function, we do the following:
  - Open the uploaded video using OpenCV's cv2.VideoCapture(video_file) function.
  - Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 detection is applied to the frame with model.predict(frame, show=False) and the bounding boxes and class labels are retrieved.
  - Crop the object. If bounding boxes are detected, an Annotator object is created and, for each bounding box, the object is cropped from the frame and displayed. The original frame with the bounding boxes is not displayed.
  - Release the video capture object with cap.release() once the video is finished.
- Add a "Crop Objects" button and an empty placeholder for displaying the video frames to the user interface. When the button is clicked, the video file is saved to a temporary directory, and the crop_objects(path) function is called to process the video.
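The cropping step itself is just slicing the frame with the box coordinates. The sketch below shows the idea on a frame stored as a nested list so that it runs without extra libraries. With the NumPy arrays that OpenCV returns, the equivalent is `frame[y1:y2, x1:x2]`. The helper names `clamp_box` and `crop` are hypothetical, and the (x1, y1, x2, y2) box format is an assumption.

```python
from typing import List, Sequence, Tuple


def clamp_box(box: Sequence[float], frame_width: int, frame_height: int) -> Tuple[int, int, int, int]:
    """Convert box coordinates to integers and keep them inside the frame."""
    x1, y1, x2, y2 = (int(v) for v in box)
    x1, x2 = max(0, x1), min(frame_width, x2)
    y1, y2 = max(0, y1), min(frame_height, y2)
    return x1, y1, x2, y2


def crop(frame: List[list], box: Sequence[float]) -> List[list]:
    """Return the part of a non-empty frame (rows of pixels) covered by the box."""
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = clamp_box(box, width, height)
    return [row[x1:x2] for row in frame[y1:y2]]


# A tiny 4x5 "frame" whose pixel values encode their row and column
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
cropped = crop(frame, (1.7, 1.2, 3.9, 10))  # box extends past the bottom edge
```

Clamping matters because predicted boxes can extend slightly beyond the image borders.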
Let's see an example of object cropping:
Object Blurring
Object blurring detects and blurs specific objects within an image or video. This task is essential for maintaining privacy and confidentiality in various applications, including:
- privacy protection: Blurring faces or license plates in surveillance footage.
- hiding sensitive information: Concealing confidential information in documents or presentations.
- anonymization: Ensuring individuals' identities are not revealed in public datasets.
Code Example
Let's see the code changes we need to make to our web.py file in order to support object blurring:
Here's a breakdown of the code:
- Define the blur_objects(video_file) function. This function takes a video file as input and processes it to blur objects. Inside the function, we do the following:
  - Define the blur_ratio variable. This variable represents the intensity of the blur effect.
  - Open the uploaded video using OpenCV's cv2.VideoCapture(video_file) function.
  - Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 detection is applied to the frame with model.predict(frame, show=False) and the bounding boxes and class labels are retrieved.
  - Blur the object. If bounding boxes are detected, an Annotator object is created and, for each bounding box, the object is cropped from the frame, blurred using OpenCV's cv2.blur() function, and then placed back in the original frame. The annotated frame with the blurred objects is then displayed.
  - Release the video capture object with cap.release() once the video is finished.
- Add a "Blur Objects" button and an empty placeholder for displaying the frames with blurred objects. When the button is clicked, the video file is saved to a temporary directory, and the blur_objects(path) function is called to process the video.
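The crop, blur, and paste-back sequence can be illustrated with a small box blur in plain Python. This is a sketch for intuition only: it works on a grayscale frame stored as a nested list, treats `blur_ratio` as the width of the averaging window, and averages only over neighbors inside the box, which differs from how `cv2.blur()` handles borders. The function name `blur_region` is hypothetical.

```python
from typing import List, Tuple


def blur_region(frame: List[List[int]], box: Tuple[int, int, int, int], blur_ratio: int) -> List[List[int]]:
    """Return a copy of the frame with the pixels inside the box averaged.

    `box` is (x1, y1, x2, y2) in integer pixel coordinates inside the frame.
    """
    x1, y1, x2, y2 = box
    out = [row[:] for row in frame]  # leave the input frame untouched
    radius = blur_ratio // 2
    for y in range(y1, y2):
        for x in range(x1, x2):
            total, n = 0, 0
            # Average the neighbors within the window, clipped to the box
            for ny in range(max(y1, y - radius), min(y2, y + radius + 1)):
                for nx in range(max(x1, x - radius), min(x2, x + radius + 1)):
                    total += frame[ny][nx]
                    n += 1
            out[y][x] = total // n
    return out


frame = [
    [7, 7, 7, 7],
    [7, 0, 100, 7],
    [7, 100, 0, 7],
    [7, 7, 7, 7],
]
blurred = blur_region(frame, (1, 1, 3, 3), blur_ratio=3)
```

A larger `blur_ratio` widens the averaging window, which removes more detail from the region while leaving pixels outside the box unchanged.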
Let's see an example of object blurring:
Deploy to Koyeb's GPUs
Now that we have the application running locally, it is time to make use of Koyeb's high-performance GPUs to increase the inference processing speed.
Before we do that, we need to create a Dockerfile to install and configure the necessary code and dependencies.
First, create a requirements.txt file with versions of the packages we installed earlier that will work in the cloud:
This list includes all of the project's requirements, including the dependencies of the ultralytics package. However, you may notice that we do not list ultralytics itself. That's because the ultralytics package depends explicitly on opencv-python, which won't work in a headless environment like Koyeb.
Instead, we install opencv-python-headless as a workable substitute along with all of the package's other dependencies. Afterwards, in the Dockerfile, we will install ultralytics separately with the --no-deps flag. In this way we can install all of the package's dependencies ahead of time, substituting opencv-python-headless for opencv-python, to avoid conflicts with the stated dependencies of ultralytics.
Next, create a new Dockerfile in your project directory with the following content:
Let's examine the Dockerfile in detail:
- FROM python:3.12: This line specifies the base image for the Docker image. In this case, it's the official Python 3.12 image.
- COPY requirements.txt .: This line copies the requirements.txt file from the local machine to the current working directory of the Docker image.
- RUN pip install --upgrade pip && pip install -r requirements.txt: This line upgrades pip, the Python package installer, to the latest version and then uses it to install all of the Python dependencies listed in the requirements.txt file.
- RUN pip install --no-deps ultralytics: This line installs ultralytics without dependencies to avoid installing opencv-python. We can do this because the requirements.txt file already manually installs all of the dependencies and provides a workable substitute for opencv-python.
- COPY web.py .: This line copies the web.py file from the local machine to the Docker image in the current directory.
- EXPOSE 8501: This line specifies that the Docker container will listen on port 8501.
- CMD ["streamlit", "run", "web.py"]: This is the command that will be run when the Docker container starts. It starts the Streamlit app by running the web.py script.
Now your project is ready to deploy on GPUs from Koyeb. The final step is to create a repository on your GitHub account.
Download the standard Python .gitignore file from GitHub to exclude certain folders and files from being pushed to the repository:
Add a line to avoid committing the yolov8n.pt file that the application creates:
Now, create a new repository on GitHub and run the following commands in your terminal to commit and push your code to the repository:
All of your local code should now be present in the remote repository. You can now deploy the application to Koyeb.
In the Koyeb control panel, on the Overview tab, initiate the app creation and deployment process by clicking Create Service and then Create web service:
- Select GitHub as the deployment source.
- Choose the repository where your code resides. For example, YoloRealTime.
- Select GPU Instances and choose your Instance type, for example RTX-4000-SFF-ADA.
- In the Builder section, select Dockerfile.
- In the Exposed ports section, change the port to 8501.
. - Give the Service your preferred name.
- Click Deploy.
During the deployment process, it may take a few minutes to build the Docker image from the Dockerfile and upload it to Koyeb's container registry. After deployment, you can access the Streamlit application using your Koyeb application's public URL. Please note that when running in the cloud, the video image from a webcam might take several seconds to start streaming since it needs to connect with the STUN servers.
Conclusion
In this guide, we learned about real-time object detection using the YOLO (You Only Look Once) algorithm. We started by discussing areas where real-time detection is important, such as surveillance, robotics, and self-driving cars, and why YOLO's ability to process images in a single pass makes it a good tool for real-time applications.
Afterwards, we took a look at how YOLO works by explaining its grid-based prediction system and how it manages to predict bounding boxes and class probabilities at the same time. Throughout the guide we explored tasks YOLO handles well, like object tracking, counting, cropping, and blurring. These tasks show how versatile and powerful YOLO is in handling complex real-world situations.
Finally, we gave practical advice on how to use Docker to deploy YOLO, making it easy to run YOLO applications in a consistent and reproducible environment. We used this to deploy and run the application on high-performance GPUs on Koyeb. As YOLO and real-time object detection continue to improve, they will become even more useful and have more applications. The ability to capitalize on these advancements by running on high-performance hardware may lead to exciting breakthroughs in many industries.