Build Your Own AI Voice Assistant to Control Your PC:
A simple guide on how you can construct your own AI assistant to control various actions on your PC
Recently, using virtual assistants
to control our surroundings has become common practice. We rely on
Google AI, Siri, Alexa, Cortana, and many other similar virtual assistants to
complete tasks for us with a simple voice command. You could ask them
to play music, open a particular file, or perform any other similar task, and they
would carry out such actions with ease.
While
these devices are cool, it is also intriguing to develop your own voice-automated
AI assistant that you can use to control your desktop with just
your voice. Such an AI can chat with you, open videos,
play music, and much more.
In this article, we will work on an introductory project for an AI assistant
that you can use to control your PC or any other similar device with your
voice. We will start with an introduction to some of the basic
dependencies required for this project and then put it
all together in a single Python file through which the AI voice assistant
follows your commands.
Before
diving into this article, if you are interested in other such cool projects
where we construct things from scratch, I would recommend checking out one of my
previous works. Below is a link where you can learn to develop your own weather
application indicator with Python in less than ten lines of code.
Getting Started with the Basics:
Part-1: The Desktop Control:
In this section of the article, we will
learn how to control our PC. We will learn how to manage and handle some basic
operations on the physical screen. With the help of the PyAutoGUI library, we can
perform the numerous functions required for this project. This automation
library allows users to programmatically control the mouse and
keyboard.
You can
install the requirements for handling all the cursor-, mouse-, and keyboard-related
tasks with the PyAutoGUI library using a simple pip command, as shown below.
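The command below is a minimal installation step; depending on your setup, you may need to use pip3 or run it inside a virtual environment.

pip install pyautogui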
The installation should finish within a couple of minutes in your environment
without much hassle. Once it is done, let us get started with some of the basic
commands from this library that we will require for developing our Python
project for voice-assisted AI.
Firstly,
let us import the PyAutoGUI library as shown in the code snippet below. The
next critical step is to know the resolution of your working screen. We can
print the width and height of the screen with the size() function
available in the newly installed library.
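Below is a short sketch of these two steps; size() returns the resolution of the primary screen, which we print to the terminal.

import pyautogui

# size() returns the primary screen resolution as (width, height).
width, height = pyautogui.size()
print(width, height)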
Output: 1920 1080
You will
notice that the resolution of my screen is 1920 x 1080, a common default
screen size for many computers. However, if your monitor has a higher or lower
resolution, you can still follow along with this guide
easily. The commands can be used interchangeably to obtain the desired
coordinates at any resolution; just make sure to adjust some of the parameters
accordingly if your display resolution doesn't match mine.
The
other essential command that we will cover in this section
discovers the current location of your mouse pointer. The
position() function of the library returns the coordinates where
your mouse pointer is currently placed. We can use these positions to locate folders and
other essential directories on your desktop screen. Below is the code snippet
to perform this action.
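A minimal sketch of this step is shown here; move your mouse over a folder of interest before running it to record that folder's coordinates.

import pyautogui

# position() returns the current (x, y) coordinates of the mouse pointer.
x, y = pyautogui.position()
print(x, y)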
Another interesting
functionality of the library is that you can locate certain
images on your current working screen, along with their respective coordinates,
using the code snippet provided below.
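Here is a small sketch of this feature; "folder_icon.png" is a placeholder filename for a screenshot of the icon you want to find, and, depending on your PyAutoGUI version, locateOnScreen() may either return None or raise an exception when the image is not visible.

import pyautogui

# Search the screen for the given image and return its bounding box.
location = pyautogui.locateOnScreen("folder_icon.png")
print(location)

# center() converts the bounding box into a clickable (x, y) point.
if location is not None:
    print(pyautogui.center(location))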
The final essential command that
we will look at in this section is the function that allows us to open a
desired directory. By placing my cursor on the top-left corner, I was able to
figure out the coordinates of my admin folder. We can move the cursor to that
location using the moveTo() function along with the respective
position of the folder. We can then use the click() command, specifying the
left or right mouse button and the number of clicks you want to perform.
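A minimal sketch of this step follows; the coordinates (25, 25) are from my own desktop layout and are assumptions you should replace with the values you obtained from position().

import pyautogui

# Move the cursor to the folder's coordinates over one second.
pyautogui.moveTo(25, 25, duration=1)

# Double-click the left mouse button to open the folder.
pyautogui.click(button="left", clicks=2)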
With the above code snippet, you should be able to open
the admin folder, as the cursor automatically moves to the admin directory and
double-clicks on it. If you don't have a similar icon on the
top-left of your screen, or if you have a different screen resolution, feel free
to experiment with the positions and coordinates accordingly.
Part-2: The Voice Command Control:
In this section of the article, we will
cover some of the basic requirements for speech recognition, the
second core component of this project. We will need a microphone to
pass our voice commands and interpret the information accordingly. The
speech recognition library, along with a text-to-speech converter of your
choice, is recommended. Also ensure that you have PyAudio installed in your
working environment.
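A minimal installation command is shown below; pyttsx3 is just one text-to-speech option among several, and on some platforms PyAudio may require additional system packages to install successfully.

pip install SpeechRecognition pyttsx3 pyaudio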
If you
are not too familiar with text-to-speech, I would highly recommend
checking out one of my previous articles, where I cover Google text-to-speech
with Python, with beginner code to get you started. The link for the same is
provided below.
Firstly, we can import
the necessary libraries as shown in the code block below. The speech
recognition library will enable us to detect the necessary voice commands.
Additionally, we can make use of a text-to-speech library to convert text
commands into voice and pass them to the system to perform the
desired operation. We create a variable for the voice recognizer. In
the next step, we read the user's microphone input as the source and
interpret the speech accordingly. Once the audio is recognized as desired, the
speech output is displayed on the terminal. However, if the speech is
not detected, we pass the necessary exceptions to ensure that the user can
verify their settings accordingly. Below is the code snippet for simple speech recognition.
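The sketch below follows the steps just described, using the free Google web speech API through recognize_google(); the exact printed messages are illustrative.

import speech_recognition as sr

# Create the recognizer that converts microphone audio into text.
recognizer = sr.Recognizer()

# Use the default microphone as the audio source and listen for one phrase.
with sr.Microphone() as source:
    print("Listening...")
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    # recognize_google() sends the audio to Google's free web speech API.
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Could not understand the audio; please try again.")
except sr.RequestError:
    print("Speech service unavailable; please verify your settings.")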
In the next step, we will construct the final build for the AI voice
assistant, where we combine the two features discussed above into
a single entity to perform the required actions.
Developing the Final Build for the AI Voice Assistant:
Now that we have a basic understanding of the two essential core components of this project, device control and speech recognition, we can start combining both elements to develop our project. Let us start with the necessary library imports, as shown below.

In the next snippet, we will define the commands function, where we interpret numerous actions. In the code block below, I have defined only a couple of functionalities, i.e., opening my admin directory or the start menu. The function takes a text input provided by the user. We can add several other commands to further improve this project.

In the next code block, we will define the functionality for receiving the audio input from the user and recognizing the speech accordingly. Once the audio is heard, make sure to convert it into lower case before passing the text input into our commands function. Once the code below is built, you are free to test and run the project.
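Below is a minimal sketch of the complete build, combining the imports, the commands function, and the listening loop described above into one script. The coordinates for the admin folder and the exact command phrases are assumptions; adjust them to match your own screen and preferences.

import pyautogui
import speech_recognition as sr

def commands(text):
    # Interpret the recognized text and trigger the matching desktop action.
    if "open admin" in text:
        # Coordinates of my admin folder; replace with your own values.
        pyautogui.moveTo(25, 25, duration=1)
        pyautogui.click(button="left", clicks=2)
    elif "open start" in text:
        # press() taps a keyboard key; "win" is the Windows key.
        pyautogui.press("win")
    else:
        print("Command not recognized:", text)

def listen():
    # Capture one phrase from the microphone and pass it to commands().
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening for a command...")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # Convert the speech to lower case before matching commands.
        text = recognizer.recognize_google(audio).lower()
        print("You said:", text)
        commands(text)
    except sr.UnknownValueError:
        print("Could not understand the audio; please try again.")
    except sr.RequestError:
        print("Speech service unavailable; please verify your settings.")

if __name__ == "__main__":
    while True:
        listen()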