YaC 22. Machine Learning
Hi, this is Alice. Did you know that Search is the main technology provider for all of Yandex? In this episode of YaC 2022, we'll tell you how machine learning helps you find the best answer, recommend fashionable clothes, recognize family members by their voices, organize your photos neatly, and even translate videos live. By the way, I can see, hear, and respond too, thanks to machine learning. And I can draw: many of the pictures in this YaC were done by me. But who created me?

At work, I teach machines to learn to understand us. Artificial intelligence, in a sense, helps a person not to think. It's not clear how to replicate human cognitive functions. This is the most complex Yandex project that I've seen, at least. Indeed, a little bit of art, and even a little bit of magic.

Machine learning

As you know, Yandex consists of two parts (Volozh's famous definition). The first part of Yandex is people who move atoms in space: Yandex Market, Taxi, Lavka, Delivery, Music, all the services of the real world. And there's a part of Yandex that moves electrons, like how a programmer's job is to magnetize certain areas of a hard disk. And I work in this area, which magnetizes some parts of the magnetic disk and makes information services.

Petr Popov - develops Search and ads tech

I would say that they are 90% built on machine learning technologies, and the others still make serious use of these technologies. Machine learning is the creation of smart programs that look for patterns in the data they learn from. This helps them solve various tasks, including creative ones.

Machine learning models

Basically, machine learning models are now replicating, that's how I'll put it, the perceptual functions of a person. Perceptual functions are a person's ability to perceive the world around them through the senses. There's a basic model that can understand the meaning of a text very well. There's a model that converts sound into text. There's a family of models that can determine what's depicted in a picture. There's some kind of magic going on there. A neural network is really a bunch of mathematical abstractions that approximate what's happening in the human brain.

what do you see in this picture

A few days ago, people from Yandex Maps came. Maps collects a good bit of so-called user content: people take pictures of things in restaurants. They take pictures of their food, of the interior, the exterior. And we'd like to be able to divide these photos into classes.

Konstantin Lahman - teaches machines to understand people better

We just have an array of photos connected to some cafe, and we'd like its users to be able to say, for example: "Oh, I want to look at the menu." Or: "I want to see photos of dishes," or the interior. Of course, you could say: "Okay, I'll go mark up tens of thousands of images," either on my own or with the help of Tolokers and crowdsourcing tools, train some network from scratch, and put a couple of people on it for six months. But instead, you can go to a model repository, take our large pre-trained base model, mark up a certain number of examples manually, train on the features that the base model extracts, and classify these images. It's much easier than building this kind of technology from scratch. This approach takes a few days.

multiheaded carcasses

Quite often, we call such base models, including image ones, carcasses, or the body, as it's called all over the world. We use this terminology because, based on the feature vector that this model extracts, we then attach heads to it that solve our target tasks. The final model is a single body with a good number of heads, each designed to solve some specific applied task that arises in the service.

neural network with a sense of style

For example, Yandex Market has a recommendation feed, and we'd like to show the so-called visual categories of clothing or interior items. We'd like to show what are called fashion photos, beautiful fashion photos. You can use this model for that too, and thus increase the visual appeal of the recommendations. About how one neural network learned to distinguish Chelsea boots from loafers, see the Market episode.

Zeliboba and transformers

Yandex has always been about analyzing text information. A few years ago, a real revolution began in this area with the arrival of the transformer architecture. This model allows us not only to analyze but also to generate texts. Generating texts is much more complicated. And so, based on the basic text models of the YaLM family, or Zeliboba, as we call them at Yandex, a large number of services have been built that you're currently using. There are the so-called quick answers in Search results. And our "chatterbox" works on the same technologies now.

Alice, let's have a chat. About movies. Hobbies. What's the meaning of life? About cars. About interesting animals. Let's talk about airplanes. I love airplanes, and I read all the publications of the aviation program. Let's talk about life on other planets. Yes, let's. When do you think Mars will be colonized? Do you like cars? I love Audi, don't you? Me too. St. Petersburg is better. It's cold in St. Petersburg. You can come visit me, I'm fine with that. What's your favorite food? Mashed potatoes with cutlets. My favorite is pelmeni. Alice, what, ummmmm... Here's my hobby: Yes. Collecting antiques. Ah right, right, right. Thank you very much for your help, I was drawing a blank.

And in 2022, from everywhere you hear: "I took Zeliboba and used it for something." "I also grabbed Zeliboba and used it for something." The language models of the YaLM family also help write titles for video fragments, highlight key words in reviews, generate advertisements, and fill out applications for finding services.

model + model + model

What does Yandex gain from having several machine learning models?
It gives you the freedom to combine them. Look, we have, for example, a model that determines what a person is saying and turns sound waves into text. We have a translator that can translate it from one language to another. We have a model that determines the gender of a speaker and can distinguish between several people. We have a model that can voice it again in a new language. As a result, we combine these three, four, five models, and what do we get? We get a video translation service that works on the fly.

on-the-fly translation

And now we're able to do real-time broadcasts, immediately, in real time, as if with a simultaneous interpreter.

Alexey Kolesov - converts voice to text

It was a very difficult task from the point of view of ML, and from the point of view of, let's say, ordinary programming. And I think that overall, this is the most complex Yandex project that I've seen, at least, and one that has succeeded. It's a technological breakthrough.

the count goes by seconds

It has no equivalents, in fact. No one anywhere has done this.

Anton Dvordcovich - responsible for on-the-fly translation

The difficulty is that we don't have the entire video available to us, and we have to translate it piece by piece. That is, a stream is actually a sequence of small pieces, so we have to store the context from the previous pieces somewhere. Memorize the voices of all the speakers who came before in the video. Remember what was said in previous phrases so the translation is context-sensitive. After that, we have to reassemble the stream from the translated pieces and re-stream it. The second thing we need to do is achieve a minimum delay when translating streams. Right now, we have a delay of up to 40 seconds, probably. It's theoretically possible to lower this to about 10 seconds, I think. Where does the theoretical lower limit even come from?
You have to listen to a phrase to the end before you start translating it. In Chinese, for example, there's no strictly defined concept of a word. Chinese texts are written in a row, without spaces, as a sequence of symbols. Every symbol is a concept. A stable combination of symbols can be called a word, or a phrase, or a phraseological unit of some kind. And sometimes this leads to problems when you're writing a general processing system. Basically, there are a lot of strange things in different languages.

Alice recognizes people

Now Alice has learned to recognize family members. When a child comes up and asks to put on some music, she'll put on the music the child likes, and when an adult asks, the music the adult likes.

Alice, put on Krovostok. I know them, but I can't play that in kids' mode.

Alice has learned to recognize people. It seems simple and seamless, but it's actually quite complicated.

Alice, put on some music. I'm turning it on, Ilya. Alice, like. I can easily put a like, but first let's get to know each other. I'm Marina. Pleased to meet you, Marina. Now I'll know when it's you talking to me. Alice, like. Okay, Ilya, I'm putting a like. Alice, dislike. Marina, I've received your dislike.

why roll back into the past

We do this so-called reverse experiment from time to time. We launch a technology that worked, maybe, a year ago, and we look at how people from this sample who, let's say, weren't so lucky behave compared to those working with modern technologies. And we see that it doesn't seem to change very much from day to day, but if you roll back to the level of technology from a year ago, then everything goes wrong. People start swearing at Alice, saying that she's deaf, that she's slow, that she doesn't play the songs they want. We measure all such feedback and see that we haven't just improved some number on our charts, but have actually made users' lives better.

Yes, I'm a fast learner.

We recently made some improvements in noise reduction, and now the speakers can hear even when there's a lot of external noise. Now I'll show you that without external noise, these speakers hear equally well. Alice. And now let's do an experiment including noise. This speaker has the improvement, and this speaker doesn't. Let's see what happens. Turn on white noise. I'm turning on white noise. Alice. Alice. Alice, stop. The improvement worked on this speaker, but not on this one, because we removed the improvement.

We're metrics people, and we run all our changes through experiments like this. For me personally, the most important thing is to make a product that we ourselves like. Because in the pursuit of metrics, you can lose the meaning and make a product that seems good according to the metrics, but you look at it, and something doesn't suit you. It's kind of crooked, askew, uncomfortable. The most important thing is to make Search so that you yourself would use it and think it's really great.

Search for programmers and not only for them

At some point, about a year and a half ago, we ran some small analytics inside Yandex and found out that Yandexoids look for whatever they want in Yandex, but they're not searching for programming stuff there.

Ekaterina Serazhim - improves the quality of Search

And all the metrics we had then said we were winning big, tearing up the competition, excellent quality.

Metrics shmetrics

And how did that happen?
Our metrics are based on the evaluations of experts, assessors. They figure out what the search query is about, look at the document that we could show, and rate it. And it turned out that there are few experts among the assessors who understand programming. So we needed to hire a staff of assessor-programmers and build our metrics based on their estimates. Based on the assessments of expert programmers, Katya's team trained an algorithm called CS YATI. CS YATI stands for Computer Science YATI. We added the CS prefix because, well, it's a separate transformer that we made specifically for programmer searches. It directly uses the cognitive functions of a person: a person reads the request, understands what the user wanted, and reads the document. It's a really difficult task. The more difficult the task for the model, the more interesting the life of the engineer who trains the model. Katya quite rightly noted that with the advent of qualitative assessments, our life has become much more interesting. But the problem is that quality assessments are quite expensive, and we can't collect a huge marked-up pool. And, in fact, this is where the question of how people learn programming arises. I learned programming by reading a lot of programming books and writing a lot of code. So why not give the model the same thing? Programming books, in the terminology of transformers, are simply a large corpus of programming documents. The task of writing code, translated into machine language, is restoring omissions. That is, I mask one of the words in the program and ask the model which word it was. And if you look at the training curve, when the model was just starting to learn programming, it managed to correctly fill in the missing word 75% of the time. By the end of its training, some 800 million texts had helped it get to 84-85%.
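The fill-in-the-blank objective described above can be sketched in a few lines. This is only a toy illustration, not CS YATI: instead of a transformer, the "model" here is just counts of which word appears between a given pair of neighbours, which is an assumption made to keep the example self-contained and runnable.

```python
from collections import Counter, defaultdict

# Toy masked-word training data: a tiny corpus standing in for the
# 800 million programming texts mentioned above (hypothetical sentences).
corpus = [
    "deploy the service to production",
    "write unit tests for the service",
    "deploy the model to production",
]

# "Training": count which word appears between each (left, right) context pair.
context_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        context_counts[(words[i - 1], words[i + 1])][words[i]] += 1

def predict_masked(left, right):
    """Guess a masked word from its immediate neighbours; None if unseen."""
    candidates = context_counts.get((left, right))
    return candidates.most_common(1)[0][0] if candidates else None

# Mask the middle word of "deploy [?] service" and try to restore it.
print(predict_masked("deploy", "service"))  # -> "the"
```

A real masked language model does the same kind of prediction, but conditions on the entire surrounding text through attention rather than on two neighbouring words, which is why its accuracy on a held-out corpus (the 75% to 84-85% figures above) is a meaningful training signal.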
Moreover, since the model strongly depends on what kind of dictionary it has, the dictionary also had to be adapted to the specific domain. There are funny examples, like how the word "deploy", which just about any programmer will know, was tokenized into three tokens in the standard dictionary. But the new programming dictionary helped the model understand that "deploy" is one specific whole entity that doesn't need to be broken into several pieces. First, we hire great specialists and mark up our requests with them, and second, we train our models to approximate these estimates in the best way. By hiring programmers, we have radically complicated the life of these transformers. And how can you simplify the life of a transformer?

Here's our task: there's a request, some kind of text, which can be split into words, and there's a document, also some kind of text. And you can imagine this transformer as a box. The model sequentially reads all the words of the query and all the words of the document; it has them all written down in sequence. And internally, the model connects every word with every other word, on every layer. On the first layer, all the words exchange information with each other and pass information to the next layer. The second layer is the same: information is exchanged, some kind of transformation is computed for the next layer, and so on, and so on. At the very end of this box, all the information needed to solve the search task is recorded in the very first so-called token. It's not even a word; it's a symbol at the beginning of the text. And at the end, you use the information recorded in it to solve the final task. At the same time, you compute the entire box, layer by layer. The 25th word exchanges information with all the other words. All this means that you have to compute all of it.

And we thought, what if, at the last step, we didn't have to compute everything, but only a few tokens? And what if at the step before that we also didn't have to compute everything: a little more, but still only a few? That is, what if we take this architecture and change it from a monolithic box to a kind of ladder, where each layer gets narrower? And it turns out that this ladder works quite well: you can cut out these extra calculations, the extra exchange of information between tokens, without loss of quality. We could stop here and say: "Well, everything's great, we saved something." But that's not how we usually do it. We usually use the saved resources to get even more quality for the user. Instead of a box, we have a ladder, and we can expand the base of this ladder. And what does it mean to expand the base of the ladder? It literally means that you can read more words of the document than before. The model sees more text now. This is very important for programming documents, because they're very long.

Search like a human

Actually, the ladder isn't the only hack we've come up with. There's another one. You can sequentially show the whole document to the transformer like this, but if we wanted to show the transformer all the documents in our entire database, we'd need a lot of money: several tens of millions of dollars, or thousands of GPUs, or an enormous amount of data to add up.
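The box-versus-ladder idea above can be sketched schematically. This is not the actual architecture: the mixing layer here is plain averaging instead of attention, and the "keep only the first k tokens per layer" rule and the shrinking schedule are simplifying assumptions; the real model is trained to decide what to keep.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixing_layer(tokens):
    """Stand-in for a transformer layer: every token exchanges information
    with every other token (here, by uniform averaging plus a residual)."""
    mixed = tokens.mean(axis=0, keepdims=True)
    return tokens + mixed

def ladder_encoder(tokens, keep_per_layer):
    """Ladder-shaped encoder: after each layer, keep only the first k
    token positions, with k shrinking layer by layer, so the later layers
    process ever fewer positions and cost less to compute."""
    for k in keep_per_layer:
        tokens = mixing_layer(tokens)[:k]
    return tokens[0]  # the first ([CLS]-like) token carries the final summary

seq = rng.normal(size=(64, 16))  # 64 token vectors of width 16
summary = ladder_encoder(seq, keep_per_layer=[64, 32, 16, 4, 1])
print(summary.shape)  # -> (16,)
```

The saving is easy to see by counting token positions processed: the monolithic box runs 5 layers over all 64 positions (320 position-layers), while this ladder processes 64 + 32 + 16 + 4 + 1 = 117, and the freed budget can instead go into widening the ladder's base, i.e. reading a longer document.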
Let's figure out how to get the same effect as if our transformer could see the whole document, but spending less money. Our engineers thought and came up with something. We call the approach the relevant sentence. How does a person usually look for information in a document? Ctrl+F, the word, and they read the piece where this word was found. We've introduced something similar into our transformer, so we can save money. We don't compute over the entire text; instead, we look immediately at the right place, deep within the document. The ladder and relevant sentences together save several tens of millions of dollars.

Not only for programmers

The technologies that we use to improve programmer search, we can use both in general and to improve all the other verticals. We can't keep working like this: today we improve search for programmers, then for musicians, next year, I don't know, for lawyers, for doctors, and so on. You can calculate how many years it would take us to make a good search for everyone. We want to scale this solution for programming search to all queries, so that search is better for every query.

machine vs human

There were hopes that large text models would create something like real artificial intelligence, with not only the perceptual capabilities of a person, but also cognitive, conscious abilities. That is, they would be able to understand something about the world and make judgments. The idea was arranged like this: let's take all the textual knowledge available online and try to train a text model that uses all this knowledge. But in fact, this hasn't happened yet. And attempts to achieve real human intelligence with the help of such models look more like attempts to fly to the moon in a balloon. Nevertheless, we're soaring higher and higher in this balloon. We see the earth getting farther away, but the moon is still far off in space.

Our guys have some achievements.
Namely, virtual medals given for work, and other things. Honorary experimenter, for numerous experiments. But experiments on users, of course.

Probably what I'm most proud of, and will continue to do, is that I'm a teacher at the SDA (School of Data Analysis). I'm teaching machine learning at the SDA right now. It's very complicated.

I got a patent for a way to compress the search index. Yandex stores the whooooooole internet in indexed form so that it's convenient to search. So I patented a way to compress this index so that it takes up less space and you can quickly find the documents you need. They gave me the achievement in 2017, but I actually graduated in 2012. Yes, it's true, I'm an SDA graduate, and I'm very proud of it.

how did you get into yandex?

I've been at Yandex since 2014, so 8 years now. You could say that, on the one hand, I've done a number of different things, and on the other, I've been doing the same thing the whole time. I came to Yandex back in 2014 as a specialist in deep learning and training neural networks, and that's what I still am today.

I entered the SDA, and after the first semester I was invited to work in the machine translation department. I joined them with great pleasure; I didn't even have to think about it for a minute.

I've been at Yandex since 2008. I studied at MSU's Faculty of Mechanics and Mathematics, then worked there, and I defended my PhD thesis on algebraic topology: "The Algebraic Construction of the Signature of a Topological Manifold". Then I got pretty sick of this abstract math, and I decided to do something as frivolous as possible: I started writing computer games. I went to work for the company Nival Interactive, where they'd just made Heroes V. I managed to do a few things there, but then I realized this was incredibly unserious, and that I had to choose something between algebraic topology and making games. And that's how I wound up at Yandex.

Yandex is...

Yandex is a company with many interesting tasks. Yandex is a place where they make the services you use every day. Yandex is cool people. Engineers are the most romantic people; their love and values are pure. Yandex is like something out of the Strugatsky brothers: a research institute of sorcery and magic.

Machine learning