Build Your Own AI Voice Assistant to Control Your PC:

A simple guide on how you can construct your own AI assistant to control various actions on your PC

Recently, using virtual assistants to control our surroundings has become common practice. We make use of Google Assistant, Siri, Alexa, Cortana, and many other similar virtual assistants to complete tasks for us with a simple voice command. You could ask them to play music, open a particular file, or perform any other similar task, and they would do so with ease.

While these devices are cool, it is also intriguing to develop your own AI voice-automated assistant, which you can use to control your desktop with just your voice. Such an assistant can chat with you, open videos, play music, and so much more.

In this article, we will work on developing an introductory project for an AI assistant that you can use to control your PC or any other similar device with your voice. We will start with an introduction to some of the basic dependencies required for this project and then proceed to put it all together into a Python file through which the AI voice assistant follows your commands.

Before diving into this article, if you are interested in other such cool projects where we construct things from scratch, I would recommend checking out one of my previous works. Below is a link where you can learn to develop your own weather application indicator with Python in less than ten lines of code.

Getting Started with the Basics:

Part-1: The Desktop Control:


In this section of the article, we will learn how to control our PC by managing and handling some basic operations on the physical screen. With the help of PyAutoGUI, we can perform the numerous functions required for this project. This automation library allows users to programmatically control the mouse and keyboard.

You can install the PyAutoGUI library, which handles all the cursor, mouse, and keyboard-related tasks, with a simple pip command, as shown below.
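```
pip install pyautogui
```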

The installation should finish within a couple of minutes in the respective environment without much hassle. Let us now get started with some of the basic commands from this library that we will require for developing our voice-assisted AI project in Python.

Firstly, let us import the PyAutoGUI library, as shown in the code snippet below. The next critical step is to know the resolution of your working screen. We can print the screen's width and height with the size() function available in the newly installed library.
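A minimal sketch, assuming PyAutoGUI is installed as above:

```python
import pyautogui

# Retrieve the primary screen resolution as (width, height).
width, height = pyautogui.size()
print(width, height)
```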

Output: 1920 1080

You can notice that the resolution of my screen is 1920 x 1080, which is the default screen size for many computers. However, if your monitor has a higher or lower resolution, you can still follow along with the guide easily; the same commands will obtain the desired coordinates on any resolution. Just make sure to change some of the parameters accordingly if your display resolution doesn't match mine.

The other essential command that we will cover in this section is the one that discovers the current position of your mouse pointer. The position() function of the library returns the coordinates where your mouse pointer is currently placed. We can use these positions to locate folders and other essential directories on your desktop screen. Below is the code snippet to perform this action.
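A minimal sketch using position():

```python
import pyautogui

# Report the current (x, y) coordinates of the mouse pointer.
x, y = pyautogui.position()
print(x, y)
```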

Another interesting functionality of the library is that you can locate the position of a given image on your current working screen, along with its coordinates, as shown in the first sketch below.

The final essential command that we will look at in this section is the one that allows us to open a desired directory. By placing my cursor on the top-left corner, I was able to figure out the coordinates of my admin folder. We can move the cursor to that location with the moveTo() function, passing the position of the folder, and then use the click() function, specifying the left or right mouse button and the number of clicks. With the second sketch below, you should be able to open the admin folder, as the cursor automatically moves to the admin directory and double-clicks on it. If you don't have a similar icon on the top-left of your screen, or if your screen resolution differs, feel free to experiment with the positions and coordinates accordingly.
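A minimal sketch of on-screen image location, assuming a hypothetical screenshot folder.png of the target icon sits in the working directory:

```python
import pyautogui

# Search the visible screen for the template image.
# Depending on the PyAutoGUI version, a miss returns None or raises
# pyautogui.ImageNotFoundException.
location = pyautogui.locateOnScreen("folder.png")
print(location)                    # e.g. Box(left=30, top=40, width=48, height=48)
print(pyautogui.center(location))  # center (x, y) of the matched region
```

And a minimal sketch for opening the folder; the (50, 50) coordinates are an assumption for a 1920 x 1080 screen, so adjust them to your own layout:

```python
import pyautogui

# Glide the cursor to the folder icon over one second.
pyautogui.moveTo(50, 50, duration=1)

# Double-click with the left mouse button to open the folder.
pyautogui.click(button="left", clicks=2)
```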

Part-2: The Voice Command Control:


In this section of the article, we will cover some of the basic requirements for speech recognition, the second core component of this project. We will require a microphone to pass our commands through voice and interpret the information accordingly. The SpeechRecognition library, along with a text-to-speech converter of your choice, is recommended. Also ensure that you have PyAudio installed in your working environment.
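A typical installation via pip; pyttsx3 is just one choice of text-to-speech converter, and PyAudio may need system audio headers on some platforms:

```
pip install SpeechRecognition pyttsx3 PyAudio
```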

If you are not too familiar with text-to-speech, I would highly recommend checking out one of my previous articles, where I cover Google text-to-speech in Python with beginner-level code to get you started. The link for the same is provided below.

Firstly, we can import the necessary libraries, as shown in the code block below. The speech recognition library will enable us to detect the necessary voice commands. Additionally, we can make use of a text-to-speech library to convert text responses into voice for the user.

We then create a variable for the voice recognizer. In the next step, we read the user's microphone input as the source and interpret the speech accordingly. Once the audio is recognized as desired, the transcribed speech is displayed on the terminal. However, if the speech is not detected, we catch the necessary exceptions so that the user can verify their settings accordingly. Below is the code snippet for simple speech recognition.

In the next section, we will construct the final build of the AI voice assistant, where we combine the two features discussed in the previous sections into a single entity that performs the required actions.
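A minimal sketch of the recognition step, assuming the packages installed above; recognize_google() calls the free Google Web Speech API, so an internet connection is needed:

```python
import speech_recognition as sr
import pyttsx3

# Recognizer that interprets the incoming audio.
recognizer = sr.Recognizer()

# Text-to-speech engine so the assistant can respond out loud.
engine = pyttsx3.init()

try:
    # Use the default microphone as the audio source.
    with sr.Microphone() as source:
        # Compensate for background noise before listening.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening...")
        audio = recognizer.listen(source)

    # Transcribe the captured audio.
    text = recognizer.recognize_google(audio)
    print("You said:", text)

    # Echo the recognized speech back to the user.
    engine.say("You said " + text)
    engine.runAndWait()
except sr.UnknownValueError:
    print("Sorry, I could not understand the audio.")
except sr.RequestError as e:
    print("Could not reach the speech service; check your connection:", e)
```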

Developing the Final Build for the AI Voice Assistant:


Now that we have a basic understanding of the two essential core components of this article, device control and speech recognition, we can start combining both elements to develop our project. Let us start with the necessary library imports, as shown below.

In the next snippet, we will define the commands() function, where we interpret the various actions. In the code block below, I have defined only a couple of functionalities, namely opening my admin directory and opening the start menu. The function takes the text input provided by the user. We can add several other commands to make further improvements to this project.

Finally, we will define the functionality for receiving the audio input from the user and recognizing the speech accordingly. Once the audio is heard, make sure to convert it into lower case before passing the text input into our commands() function. Once the code below is assembled, you are free to test and run the project.
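Below is a minimal sketch of the final build under the assumptions made earlier: the trigger phrases "open admin" and "open start menu" and the (50, 50) folder coordinates are illustrative choices, so replace them with commands and positions that fit your own desktop:

```python
import pyautogui
import speech_recognition as sr

def commands(text):
    # Map recognized phrases to desktop actions.
    if "open admin" in text:
        # Move to the admin folder icon and double-click to open it.
        pyautogui.moveTo(50, 50, duration=1)
        pyautogui.click(button="left", clicks=2)
    elif "open start menu" in text:
        # The 'win' key opens the start menu on Windows.
        pyautogui.press("win")
    else:
        print("Command not recognized:", text)

def listen():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening for a command...")
        audio = recognizer.listen(source)
    try:
        # Lower-case the transcript before matching against the commands.
        text = recognizer.recognize_google(audio).lower()
        print("You said:", text)
        commands(text)
    except sr.UnknownValueError:
        print("Sorry, I could not understand the audio.")
    except sr.RequestError:
        print("Could not reach the speech recognition service.")

if __name__ == "__main__":
    # Keep listening until interrupted with Ctrl+C.
    while True:
        listen()
```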