How to Make Your Own AI Waifu Virtual VTUBER or Assistant

Okay guys, so today we will make an AI waifu VTuber like this. First I run the code, and it asks me to choose whether to use the microphone or a YouTube live stream chat. For now let's choose number one, and then I press and hold right Shift to record audio from my mic: "What is your name?" "Tell me a good place in Japan." "Tokyo." "Why do you recommend that place?"

I already posted the code on my GitHub repository; you can check the link in the description. In this video I will explain how to use the code and walk through how it works.

If you have cloned the repository, you will see run.py. First you need to install all the libraries used in this code. Besides those, you will also need to install the VoiceVox Engine, and you will need API keys from OpenAI and DeepL. DeepL is optional; you can use Google Translate instead. And of course you will need to install VB-Audio Virtual Cable; I will explain what that is for later.

Back to the code. As I said before, you need an API key from OpenAI, which you can get from the OpenAI website; I keep mine in config.py. You will also need lore.txt, a text file that stores the lore of your AI. For example, here is the lore I wrote for my assistant: "You are an AI waifu virtual YouTuber called Pina. Your creator is Ardha. He made you using..." and so on, ending with "You answer briefly, to the point, and with no elaboration."

The code starts from the main section, where you need to input either 1 or 2: 1 if you want to use your microphone, and 2 if you want the AI to answer live chat from a YouTube stream. Say we choose 1. Then you hold right Shift on your keyboard to make the program record your audio: it calls record_audio, which records while you hold right Shift and saves the result to input.wav. After that, input.wav goes to the transcribe_audio function, which transcribes the audio into text using Whisper. So if you say "how are you" into your microphone, this function turns the audio into that text. (There is a sketch of these two steps right below.)

After it is transcribed, I translate the user's input from Indonesian to English. Why? Because the translation will be easier going from EN to JP than from ID to JP. If you are an English speaker, you don't need this function. The conversation is then saved into the conversation variable, which gives your AI a short-term memory, so it is more human, more contextual.

After that, the code calls openai_answer. In openai_answer we first get the total number of characters in the conversation. Why count characters? Because OpenAI can only receive an input of about 4K characters. If the total is more than 4000, the code removes the second dictionary from the list, which keeps the conversation under 4000 characters. The conversation then goes into the messages, and we get an answer from OpenAI using GPT-3.5 Turbo. (A sketch of this step follows the recording sketch below.)
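
Here is a minimal sketch of the record-and-transcribe steps. It assumes the keyboard, sounddevice, soundfile, and openai-whisper packages; the function names mirror the ones mentioned above, but the exact code in the repository may differ.

```python
# Minimal sketch of record_audio + transcribe_audio. Library choices
# (keyboard / sounddevice / soundfile / whisper) are assumptions.
import keyboard
import numpy as np
import sounddevice as sd
import soundfile as sf
import whisper

SAMPLE_RATE = 16000                 # Whisper resamples to 16 kHz anyway
model = whisper.load_model("base")

def record_audio(path="input.wav"):
    keyboard.wait("right shift")               # block until key goes down
    frames = []
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1) as stream:
        while keyboard.is_pressed("right shift"):
            chunk, _ = stream.read(1024)       # read while key is held
            frames.append(chunk)
    if frames:
        sf.write(path, np.concatenate(frames), SAMPLE_RATE)

def transcribe_audio(path="input.wav") -> str:
    result = model.transcribe(path)            # speech -> text
    return result["text"].strip()
```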

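And here is a sketch of the openai_answer step with the 4000-character trim. It uses the openai package's ChatCompletion interface as it existed in early 2023, when this video was made; loading lore.txt as the system message is my assumption about how the lore is wired in.

```python
# Sketch of openai_answer: trim the history to ~4000 characters, then
# ask GPT-3.5 Turbo. Uses the pre-1.0 openai package interface.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"   # the video keeps this in config.py

# Assumption: lore.txt becomes the system message at index 0
conversation = [{"role": "system", "content": open("lore.txt").read()}]

def openai_answer() -> str:
    # Drop the oldest turn (index 1, so the lore at index 0 survives)
    # until the history fits in roughly 4000 characters.
    while sum(len(m["content"]) for m in conversation) > 4000:
        del conversation[1]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation,
    )
    answer = response["choices"][0]["message"]["content"]
    conversation.append({"role": "assistant", "content": answer})
    return answer
```
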
The response is also appended to the conversation, and then the message we get back from OpenAI is sent to the translate_text function. Here we translate twice. Because the answer is in English, I first translate it to Indonesian and print it, so I know what the answer means in Indonesian. (The source language here must be EN; I forgot to change it.) I also translate it to Japanese, and the Japanese text is what gets converted into audio. Then we send result_jp and result_id into speech_text.

In speech_text we call katakana_converter, which converts English words into katakana. For example, the word ORANGE is converted to "orenji", because if it stays written as ORANGE, your AI will spell it out, O-R-A-N-G-E, instead of saying "orenji". The katakana converter is in katakana.py, and the translate functions are in translate.py. You can use either Google Translate or DeepL: if you want DeepL, use this function, and if you want Google Translate, use this one. I use DeepLX, a free and open-source DeepL API, because I don't have an authentication key for DeepL Pro; DeepL is not available in my country. If you do have DeepL, you can add your own function here. (A hedged sketch of the translation step is right below.)

After the text is converted to katakana, you make a request to VoiceVox. Don't forget to run the VoiceVox Engine before running the script. You can run the engine with Docker, on CPU or GPU; if you want to run it on GPU, you run these two lines, and of course don't forget to install Docker first. Once the engine is running, the code sends a request where the text is katakana_text and the speaker is 46. How do you know it's 46? You can listen to samples of every voice actor on the VoiceVox website, one by one, and when you find an interesting voice, open speaker.json and look up that speaker's ID number. For example, this speaker, whose name I don't know how to read, has six voice variations; you can choose any of the six, so if you want this particular variation, you choose 37 in the speaker field. (A sketch of the request follows below.) After the request, we take the content of the response, which is the audio, write it to output.wav, and then write to output.txt.
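
As a hedged illustration of translate.py, here is the Google Translate route using the deep-translator package; the package choice is my assumption, and the DeepL/DeepLX variant would POST to a DeepLX endpoint instead.

```python
# Illustrative translate.py using deep-translator's GoogleTranslator.
# The repo also offers a DeepL/DeepLX path; this is just the free route.
from deep_translator import GoogleTranslator

def translate_text(text: str, source: str = "en", target: str = "ja") -> str:
    return GoogleTranslator(source=source, target=target).translate(text)

# The two translations described above:
# result_id = translate_text(answer, "en", "id")  # to read along
# result_jp = translate_text(answer, "en", "ja")  # to synthesize audio
```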

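For the VoiceVox request itself, the engine's HTTP API is a two-step call: audio_query builds the synthesis parameters and synthesis returns WAV bytes. A sketch against the default local port, assuming the engine is already running:

```python
# Sketch of the VoiceVox request: the engine exposes /audio_query and
# /synthesis on port 50021 by default. speaker=46 is the voice used in
# the video; swap in any ID from speaker.json (e.g. 37).
import requests

VOICEVOX_URL = "http://localhost:50021"

def synthesize(katakana_text: str, speaker: int = 46):
    query = requests.post(
        f"{VOICEVOX_URL}/audio_query",
        params={"text": katakana_text, "speaker": speaker},
    ).json()
    audio = requests.post(
        f"{VOICEVOX_URL}/synthesis",
        params={"speaker": speaker},
        json=query,
    )
    with open("output.wav", "wb") as f:
        f.write(audio.content)   # raw WAV bytes from the engine
```
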
We write result_id to output.txt, which is used to display the subtitle in OBS. Remember, result_id is the answer from OpenAI translated into Indonesian; once it is saved to output.txt, it appears on the OBS display, in here, as you will see later. We also save chat.txt, which we get from the chat_now variable; chat.txt stores what the user says. (A sketch of these two writes is right below.)
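
A hedged sketch of the subtitle writes; OBS reads these two files as text sources, and the file names come from the video, though the repository's exact code may differ.

```python
# Write the two text files that OBS displays as subtitles.
def write_subtitles(result_id: str, chat_now: str):
    with open("output.txt", "w", encoding="utf-8") as f:
        f.write(result_id)   # the answer, translated to Indonesian
    with open("chat.txt", "w", encoding="utf-8") as f:
        f.write(chat_now)    # what the user or the live chat said
```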

The same goes for what the live chat says: it is saved into that text file, which is used by the Chat.txt source in this OBS section and appears on the upper side of the OBS display. We use time.sleep so the text files are fully written before anything else happens, and then we play the sound that is already in output.wav. We set is_speaking to True while it plays and back to False once it has finished playing; is_speaking is used to prevent the assistant from speaking more than one audio at a time. After the assistant finishes saying the answer, we time.sleep for one second and then delete the contents of output.txt and chat.txt. (A sketch of this playback step is below.)
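
A sketch of that playback lifecycle, again assuming sounddevice and soundfile for audio; the names are illustrative.

```python
# Guard with is_speaking, play output.wav, wait a second, then blank
# the subtitle files so OBS clears the on-screen text.
import time
import sounddevice as sd
import soundfile as sf

is_speaking = False

def play_answer():
    global is_speaking
    data, fs = sf.read("output.wav")
    is_speaking = True               # only one audio may play at a time
    sd.play(data, fs)
    sd.wait()                        # returns once playback is finished
    is_speaking = False
    time.sleep(1)                    # let the subtitle linger briefly
    open("output.txt", "w").close()  # emptying the files hides
    open("chat.txt", "w").close()    # the subtitles in OBS
```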

Clearing the files this way lets the subtitle appear while the assistant is playing the audio, saying the answer, and one second after it finishes, the subtitle disappears. That is mode 1.

If you choose mode 2, the code runs preparation on a thread and runs the get_livechat function at the same time. get_livechat pulls the live chat from YouTube. In here, if the chat author is the channel owner or the chatter is Nightbot, the code skips the message and continues; if you want to include chat from the owner, you can delete this check. Then we use re.sub to remove emoji from the chat, because in live chat the emoji come through as text like this, so I use the regex to ignore them. If the chat is longer than 5 characters, I save it into the conversation together with the author's name; "Berkata:" means "Said:", so the entry reads "author name said ...". Why save the author? Because it makes your AI know who is talking right now, so it will answer to that username; I think you know what I mean. (A hedged sketch of this chat loop appears at the end of this section.)

Mode 2 also runs preparation. In preparation, if the assistant is not speaking, the chat is not empty, the chat is not a bot command, and the chat is not the same as the previous chat, the assistant answers it. Concretely, in this if condition: the assistant must not be speaking, the conversation must have more than one line, and the chat must not start with an exclamation mark, because some streamers use exclamation marks for bot commands, something like !duel. If chat_now is not the same as chat_previous, it calls openai_answer, and from there it is the same as mode 1.

Okay, that is all for the explanation; now let's run the code. First, as I said, you need to run the VoiceVox Engine. Because I already pulled the image, I just need to run it: open CMD and run this. I faced an issue when I ran this, though, so I had to delete this part and leave just the port. Enter, and it's running; in Docker we can see the VoiceVox Engine status is running. You can click this link and it will open a new tab in your browser. Okay, it wasn't responding at first, but after a refresh it is running now. If you want to see the docs for this API, go to /docs; if you are familiar with APIs, you can try this API out.

After VoiceVox is running, we can run our code, using CMD or the terminal in Visual Studio Code. For now I will use the VS Code terminal: python run.py. My laptop is a little bit laggy because I have so many applications open. The program asks you to input mode 1 or mode 2; let's try mode 1. Hold right Shift to record audio: "Hello, how are you?" Okay, the bot is answering; generating the audio will take a little time, again because of everything open on my laptop. Ah, I forgot; let's go back to mode 1: "Give me a suggestion of what I must visit in Japan." The bot has a short-term memory, so instead of asking "why must I go to Tokyo Tower?", we can just say "Why?", and because of that memory it knows what "Why?" means.
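
Here is the hedged sketch of get_livechat. It uses the pytchat package, a common way to read YouTube live chat from Python; the repository may use a different library, and the emoji pattern and field names are assumptions.

```python
# Hedged get_livechat sketch using pytchat; skips the owner and
# Nightbot, strips :emoji_text: tokens, and logs "name said: message".
import re
import pytchat

def get_livechat(video_id: str, owner_name: str, conversation: list):
    chat = pytchat.create(video_id=video_id)
    while chat.is_alive():
        for c in chat.get().sync_items():
            # Skip the channel owner and Nightbot, as in the video
            if c.author.name in (owner_name, "Nightbot"):
                continue
            # Remove :emoji_text: style emoji from the message
            message = re.sub(r":[^\s:]+:", "", c.message).strip()
            if len(message) > 5:
                # "berkata" is Indonesian for "said"
                conversation.append({
                    "role": "user",
                    "content": f"{c.author.name} berkata: {message}",
                })
```
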
We can use the output from the code inside VTube Studio. You can see that VTube Studio responds to my mic as an input. To use the audio from the code as the input for this avatar, my Pina avatar, go to Settings in VTube Studio, find the MouthOpen parameter, and change its input to voice volume. After you change it to voice volume, you need to change the microphone: select VB-Audio Virtual Cable, so the cable output is used as the microphone, and then change your desktop audio to the cable input of VB-Audio Virtual Cable. You can find a tutorial on YouTube for installing VB-Audio Virtual Cable; I will not explain the installation in this video. The general logic is this: the code plays the audio into VB-Audio Virtual Cable, VB-Audio Virtual Cable is used as the microphone input inside VTube Studio, and that moves my character's mouth. (See the routing sketch below.) Let's try it: for example, I ask the code, "Do you have any suggestions besides Tokyo Tower?", and we can see VTube Studio react like that. Then you use a Game Capture source to capture VTube Studio inside OBS, and, as I said before, you insert text sources like this: the question comes from chat.txt.
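
A sketch of that routing from the code's side, assuming sounddevice playback; the device name is the one VB-Audio usually registers on Windows, so verify yours first.

```python
# Play output.wav into the virtual cable so VTube Studio can treat it
# as a microphone. Check the exact device name with query_devices().
import sounddevice as sd
import soundfile as sf

# print(sd.query_devices())   # list devices; find the cable's input
data, fs = sf.read("output.wav")
sd.play(data, fs, device="CABLE Input (VB-Audio Virtual Cable)")
sd.wait()
```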

The answer comes from output.txt. The settings I use are like this, and the transform settings for the answer and for the question are like these. Okay, let's try it once again: "Do you have any suggestions for food in Japan?" If we look at OBS, it shows the subtitle: "Please try sushi, ramen, tempura." That's all for mode 1.

For mode 2, select mode 2 and you will need to input a live stream ID. For the test we can find another streamer who is live right now: go to YouTube Live; okay, for example I use this one. Copy the link address, paste it into Notepad, and we only want to use this part, the stream ID. I paste it in, and the program captures the live stream chat and automatically answers it using OpenAI; we just need to watch our OBS. For the subtitles, I haven't gotten the best result yet, because sometimes the subtitle is bigger and sometimes smaller; I am still finding the best settings. Okay, it's a little bit laggy here; how fast it responds will depend on the specification of your computer or laptop. Okay, that's all for how to make your own AI waifu virtual YouTuber. Thank you for watching, and goodbye.
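
Since the program wants just the stream ID, here is a small hedged helper for pulling it out of a pasted URL, which the video does by hand in Notepad.

```python
# Extract the 11-character video ID from a pasted YouTube URL so it
# can be fed straight into mode 2.
import re

def extract_video_id(url: str) -> str:
    m = re.search(r"(?:v=|youtu\.be/|/live/)([\w-]{11})", url)
    return m.group(1) if m else url   # fall back to the raw input

# extract_video_id("https://www.youtube.com/watch?v=abcdefghijk")
# -> "abcdefghijk"
```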
