2023-03-08: Interactive Perception & Graphics for a Universally Accessible Metaverse by Ruofei Du
alaeddin: Hello, everyone. Thank you for joining the seminar series organized by the Empathic Computing Laboratory. I'm your host, Alaeddin, and our guest speaker for today is Dr. Ruofei Du, who is a senior research scientist at Google and works on creating novel interactive technologies for virtual and augmented reality. His research interests include AR, VR, and interactive graphics. His work has been published and recognized at many international conferences and venues, and in this seminar he will be sharing his research on interactive perception and graphics technology that
empowers the metaverse with more universal accessibility. The title of the talk is "Interactive Perception & Graphics for a Universally Accessible Metaverse." The talk will be roughly an hour, followed by a brief Q&A. We are all looking forward to hearing his thoughts and ideas on this topic. Please join me in welcoming Dr. Du. Thank you for joining us.
Ruofei Du: Yeah, thank you, Dr. Nassani, for inviting me to give the guest talk, and thank you for the great introduction.
Ruofei Du: First of all, I'm really honored to share some of my latest research in virtual and augmented reality at the ECL Seminar Series, and thanks, Dr. Nassani, for inviting me. The title of my talk today is "Interactive Graphics for a Universally Accessible Metaverse." Before we start, I would like to give a brief video summary of my research. I'm currently a senior research scientist at Google Labs, and you can also find my selected research on Google Scholar. My research mainly lies at the intersection of three fields: computer graphics, human-computer interaction, and computer vision. I perform cross-field research by bringing the latest computer graphics and computer vision techniques into interactive systems and creative applications in virtual and augmented reality. Today's talk focuses on a few key papers in HCI, in about one hour. By topic, my research mainly focuses on three parts:
Ruofei Du: interaction and communication, digital world, and digital human. Next, let's dive deep into the main part of my talk: interactive graphics for a universally accessible metaverse. You may be curious what the metaverse is, and how it is defined by academia and industry. The concept of the metaverse dates back to the science fiction novel Snow Crash, written by Neal Stephenson in 1992. In this novel, the Metaverse is an immersive, virtual urban environment where people can trade virtual land and build houses, and users can enter the environment with virtual reality goggles and interact with other people virtually and remotely. A closer concept is the OASIS environment in the recent movie Ready Player One, which is closer to a world-scale RPG game. But
Ruofei Du: is this all the metaverse is about? Not really. In industry, there are roughly two kinds of companies devoted to the metaverse. One kind is gaming platforms, such as Roblox, which provides a platform where players can create things in virtual environments and trade virtual currency directly; right now there are over 100 million players per month. The other kind is companies like Microsoft and Facebook, now Meta, which are devoted to developing XR devices and services. From my personal experience, the word metaverse is actually an idealized concept that encapsulates many buzzwords, for example, the future of the Internet,
Ruofei Du: the Internet of Things, 5G, virtual reality, augmented reality, and even blockchain, plus NFTs, mirrored worlds, digital twins, and virtual reality operating systems. So despite these buzzwords,
Ruofei Du: from a researcher's perspective, how do I define the metaverse? And more importantly, what research directions shall we devote to it? From my personal perspective, the metaverse can be envisioned as a persistent digital world where people are fully connected as virtual representations. As a teenager, my dream was actually to live in the OASIS. However, as of today, I no longer have such a dream. Instead, I personally wish the metaverse to be a powerful tool that makes information more useful and accessible and helps people live a better physical life.
Ruofei Du: So next, I would like to dive deep into several works I have devoted to the metaverse. Chapter one: mirrored world and real-time rendering, starting with Social Street View and Geollery. This is some older work. Speaking of social media, it covers a wide range of topics, such as restaurant reviews, local news, and updates from family and friends. But in spite of recent innovations in virtual reality and augmented reality, the current generation of social media layouts is still mostly visualized as a linear narrative,
Ruofei Du: and rarely in a 2D layout, and almost never in 3D, immersive mixed reality settings. While traditional layouts are efficient on phones for quickly browsing through social media posts, they lack the spatial context associated with the social media. So in 2016, I presented the Social Street View system at Web3D 2016 and made some initial contributions in blending immersive street views with geotagged social media using maximal Poisson-disk sampling. Basically, the idea is to take the depth map from Google Street View and depict the nearby social media on the building walls using maximal Poisson-disk sampling, so that the posts are beautifully laid out on the building walls.
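The layout step relies on Poisson-disk sampling, which spreads points so that no two are closer than a minimum radius. A minimal sketch of the classic dart-throwing variant follows, assuming the wall has already been flattened into a rectangle measured in meters and that `radius` is a hypothetical minimum spacing between posts; it illustrates the sampling technique only and is not the Social Street View code.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def poisson_disk_dart_throwing(width: float, height: float, radius: float,
                               max_attempts: int = 1000, seed: int = 0) -> List[Point]:
    """Approximate maximal Poisson-disk sampling on a width x height rectangle.

    Candidates are drawn uniformly and accepted only if they are at least
    `radius` away from every accepted sample. Stopping after `max_attempts`
    consecutive rejections approaches, but does not guarantee, a maximal set
    (one where no further sample could be added).
    """
    rng = random.Random(seed)
    samples: List[Point] = []
    rejections = 0
    while rejections < max_attempts:
        candidate = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        if all(math.dist(candidate, s) >= radius for s in samples):
            samples.append(candidate)
            rejections = 0
        else:
            rejections += 1
    return samples

# Hypothetical example: an 8 m x 4 m wall with posts at least 1 m apart.
slots = poisson_disk_dart_throwing(8.0, 4.0, 1.0)
```

Each returned point is a candidate anchor for one post. Dart throwing checks every accepted sample per candidate; grid-accelerated variants such as Bridson's algorithm scale better but give the same minimum-spacing guarantee.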
Ruofei Du: However, the interaction here was still limited to 360-degree panoramas, where users could hardly walk on the streets virtually and could only teleport from one panorama to another. Following that work, in 2017 we saw 3D Visual Popularity, where researchers used visualization to indicate the popularity of social media with virtual 3D buildings. Also in 2017, Kukka et al. proposed a virtual city which basically reconstructs virtual buildings and lets avatars walk on virtual streets, with social media depicted along the streets. And at UIST 2018, Brejcha's team made Immersive Trip Reports.
Ruofei Du: They mapped captured hiking pictures onto 3D terrain so that they can be explored immersively in a virtual environment. In recent years, we have also seen High Fidelity and Facebook Spaces, where avatars can talk with each other. But still, I would like to imagine what a social media platform would look like in virtual reality. What if we could allow social media sharing in a live mirrored world? And what use cases could benefit from depicting social media in virtual reality? So in 2019, we published the Geollery system, which is one of the first mixed reality social media platforms, encapsulating avatars, a chatting system, and, more importantly, nearby social media from Twitter, Yelp, and Instagram. For example, here you are visiting the National Gallery in Washington, DC. You can draw directly on the walls of the buildings and chat with other avatars. You can also imagine a future where you walk into the museum, talk with other virtual avatars, and look at the virtual paintings directly on the building walls. That is the envisioned world. To achieve that, we built a technical system which basically gets 2D polygons from OpenStreetMap.
Ruofei Du: Further, we extrude the 2D polygons into 3D geometry. Next, we add avatars, clouds, and trees based on the metadata retrieved from OpenStreetMap. Meanwhile, we retrieve nearby geotagged social media from Twitter, Yelp, and Flickr using their open-access APIs, and we also maintain our own database and library containing different forms of social media, for example, balloons, billboards, and gifts. Finally, we enable the avatars to talk with each other directly in this mixed reality social media platform. Next, I would like to dive deep into the rendering pipeline, that is, how we render the social media. Before we do that, I can actually give you a live demo. Right now, I think I've teleported to Auckland. I don't know exactly where I am; is it the correct building? I'm searching for the University of Auckland.
alaeddin: Yeah. Very close.
Ruofei Du: Okay. So which building is your lab in? Is it by the engineering building? Oh, let me see. The University of Auckland... It's the library. Yeah, you basically teleport to the library. Let me see. Oh, okay, here I am. So basically, you see how the library is extruded, and you can visit it directly.
Ruofei Du: And now you can publish something new. Let me see: "Hello," and publish. Okay, here we go. If you come to the same place as me, you can chat with me and say hi. And let's come back.
Ruofei Du: I love to do a live demo in every talk I give, and you can see that the system is still running live. You can visit my website for the live demo. So basically, as you saw, we turn the 2D polygons into 3D geometry by extruding them. Especially in New York City or San Francisco, OpenStreetMap contains metadata such as how tall a building is or how many floors it has. So we use heuristics and actual building heights to depict the different buildings in real time.
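The extrusion step can be sketched as follows. The tag names `height` and `building:levels` are standard OpenStreetMap keys, but the 3 m-per-level factor and the 10 m fallback are assumed heuristics for illustration, not the values Geollery uses. The sketch builds only the wall triangles and omits roof triangulation.

```python
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

METERS_PER_LEVEL = 3.0   # assumed storey height
DEFAULT_HEIGHT_M = 10.0  # assumed fallback when no usable metadata exists

def building_height(tags: Dict[str, str]) -> float:
    """Pick a height from OpenStreetMap-style tags, falling back to heuristics."""
    if "height" in tags:
        try:
            return float(tags["height"].split()[0])  # accepts "25" or "25 m"
        except (ValueError, IndexError):
            pass
    if "building:levels" in tags:
        try:
            return float(tags["building:levels"]) * METERS_PER_LEVEL
        except ValueError:
            pass
    return DEFAULT_HEIGHT_M

def extrude_footprint(footprint: List[Point2D], height: float):
    """Extrude a 2D footprint polygon into wall triangles (y is up).

    Ground vertices come first, roof vertices second; each footprint edge
    i -> i+1 becomes one wall quad, split into two triangles.
    """
    n = len(footprint)
    vertices: List[Point3D] = [(x, 0.0, y) for x, y in footprint]
    vertices += [(x, height, y) for x, y in footprint]
    triangles: List[Tuple[int, int, int]] = []
    for i in range(n):
        j = (i + 1) % n
        triangles.append((i, j, n + j))      # ground i, ground j, roof j
        triangles.append((i, n + j, n + i))  # ground i, roof j, roof i
    return vertices, triangles

h = building_height({"building:levels": "4"})  # 4 levels * 3 m = 12 m
verts, tris = extrude_footprint([(0, 0), (10, 0), (10, 5), (0, 5)], h)
```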
Ruofei Du: Next, you have everything in the system, and you can walk around. Finally, we fetch the Street View panorama, depth map, and normal map. Now let's dive into the pipeline. Naively, if you have only one panorama, it still gives you the illusion of teleporting from one panorama to the next. So to reduce visual jitter and artifacts, we first depict two panoramas, each on a hemisphere, and then we adjust the vertices of each triangle on the hemisphere so that their XYZ positions correspond to the true depth values according to the two depth maps.
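Adjusting each hemisphere vertex to its true depth amounts to converting the vertex's panorama coordinates into a view direction and scaling it by the sampled depth. A minimal sketch, assuming an equirectangular panorama with normalized coordinates u, v in [0, 1], a y-up axis convention, and depth in meters (all assumptions, not taken from the talk):

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def equirect_direction(u: float, v: float) -> Vec3:
    """Unit view direction for normalized equirectangular coordinates.

    u spans longitude [-pi, pi); v spans latitude from +pi/2 (top) to -pi/2
    (bottom). The y-up, z-forward axis convention is an assumption.
    """
    lon = (u - 0.5) * 2.0 * math.pi
    lat = (0.5 - v) * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def displace_vertex(center: Vec3, u: float, v: float, depth: float) -> Vec3:
    """Move a unit-sphere vertex out to the depth sampled from the depth map."""
    d = equirect_direction(u, v)
    return (center[0] + depth * d[0],
            center[1] + depth * d[1],
            center[2] + depth * d[2])

# Hypothetical example: the panorama center straight ahead at 5 m.
p = displace_vertex((1.0, 2.0, 3.0), 0.5, 0.5, 5.0)
```

Running the same displacement for both panoramas, each around its own capture position, yields the two depth-shaped hemispheres that the next steps intersect and blend.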
Ruofei Du: Then we figure out the intersecting regions in real time and get rid of the triangles that intersect with or are encapsulated in the other hemisphere. Finally, we do projection mapping to project the textures onto the two hemispheres, with a little alpha blending.
Ruofei Du: And this is how the University of Auckland is rendered in real time. There are still some artifacts, as you see; we don't have depth data at the top. But you can already walk around, see the library in real time, and even draw something on the building walls. Let me see if it's still working. Yeah. Yeah, if you zoom out, you can see it. There are some bugs on the building walls, like aliasing artifacts. But here you can see my drawings directly on the building wall.
Ruofei Du: And yes, the system is still online; you can log in with your Google account. Finally, we did a study to explore the use cases and how users actually use the two systems, Social Street View and Geollery. We used quantitative questionnaires and qualitative post-hoc interviews, and we found that with the added ability to actually walk in the mirrored world, users found the new Geollery system more interactive and more creative. People proposed use cases; for example, seeing all the restaurants with street views is really helpful. It is also useful for exploring new places: "I could immerse myself in the location and also ask questions, if there are virtual students walking on the campus."
Ruofei Du: People also think it may be useful for families: "I just taught my grandpa how to use FaceTime, and it would be awesome if I could teleport to my grandparents' house and greet them in virtual reality." More importantly, photorealistic buildings are really critical to the system. So we also explored other rendering pipelines, for example, reconstructing campus buildings into the map so that we can combine photorealistic buildings with the map. This is still future work, and we haven't gone that far. We demonstrated the system on site at CHI 2019 with one undergraduate I supervised at the time. Since then, I've seen follow-up work; for example, people are bringing semantic data into the pipeline to reconstruct everything better. Here you see the green trees and the purple ground: by adding semantic segmentation, you can make the pipeline more efficient.
Ruofei Du: You also see people building virtual worlds and having a real museum with avatars. I truly believe this kind of research can lead to a real-time system that combines mirrored worlds with the information around us and makes it more useful. Another idea I had here: we have so many surveillance cameras on campus. What if we could reproject these surveillance videos into the 3D map, so that you can immersively see what is currently happening on campus? It would be more effective and efficient for surveillance camera administrators to look at the 3D view rather than a 2D grid of surveillance videos. So this is another exploration.
Ruofei Du: Another exploration we did is using a neural network to animate the morphing from one panorama to another. We have a paper which basically builds a neural network that learns the depth of the two panoramas and fuses the frames between them so that they are seamlessly combined. This is still preliminary research, because we were using synthetic data during the pandemic. Unfortunately, the intern could not come to the Google office during the pandemic, and the work was done purely remotely, so we decided to use synthetic data. In the future, I hope research can use real data and take this kind of work further. Here are some results. Basically, the goal is that, given sparse views from Street View, we can synthesize the animation in between two panoramas, so in the future, when you visit 360 house environments or 360 street views, it can give you a smooth interpolation in between. Another line of work uses NeRF. Here's a work from Waymo which uses Block-NeRF.
Ruofei Du: They were able to reconstruct a building by taking hour-long videos nearby. So we also wondered how we can further accelerate the real-time rendering pipeline. To that end, we had a series of foveated rendering research. This line of research aims to accelerate the rendering pipeline by rendering high-resolution pixels in the foveal region, which is usually where your eyes are looking, and low-resolution pixels in the peripheral region, where your eyes are not paying attention. This simple idea has huge potential for accelerating future graphics pipelines. For example, given an original frame, we transform it into a log-polar buffer, where the samples in the foveal region are denser than in the peripheral region.
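The core of the log-polar buffer is a change of coordinates around the gaze point. The sketch below shows only the basic log-polar mapping and its inverse, with the gaze position and maximum radius as inputs; any refinements in the published foveated-rendering work are not reproduced, and the function names are hypothetical.

```python
import math
from typing import Tuple

def to_log_polar(x: float, y: float, gaze: Tuple[float, float],
                 max_radius: float) -> Tuple[float, float]:
    """Map a screen pixel to normalized log-polar coordinates (u, v) in [0, 1].

    u grows with log(radius) from the gaze point, so equal steps in u cover
    ever larger rings on screen: dense near the fovea, sparse in the
    periphery. Radii below 1 pixel are clamped to 1 so that u stays >= 0.
    """
    dx, dy = x - gaze[0], y - gaze[1]
    r = max(math.hypot(dx, dy), 1.0)
    u = math.log(r) / math.log(max_radius)
    v = (math.atan2(dy, dx) / (2.0 * math.pi)) % 1.0
    return u, v

def from_log_polar(u: float, v: float, gaze: Tuple[float, float],
                   max_radius: float) -> Tuple[float, float]:
    """Inverse mapping, used when resampling the buffer back to full screen."""
    r = math.exp(u * math.log(max_radius))
    theta = 2.0 * math.pi * v
    return gaze[0] + r * math.cos(theta), gaze[1] + r * math.sin(theta)

# A 1920x1080 frame with the gaze at the center; the farthest corner sets max_radius.
GAZE = (960.0, 540.0)
MAX_R = math.hypot(960.0, 540.0)
```

For this frame, a point 10 pixels from the gaze already maps to u of roughly 0.33, so about a third of the buffer's radial resolution is spent on that small foveal disk while the periphery is sampled increasingly sparsely.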
Ruofei Du: We also applied this kind of framework to the deferred rendering pipeline, where the most time-consuming computation happens in lighting. For example, here the roughness, ambient, and refraction maps take the most time to compute. However, by rendering them in a log-polar buffer, we can efficiently reduce the computation budget, for example, to one half or even one third.
Ruofei Du: Although we need an additional pass to restore the log-polar buffer to full-screen resolution, we still save computation budget by performing the most expensive lighting estimation in the smaller log-polar buffer. We did a user study in which people gazed at one picture with foveated rendering and one with the original rendering and were asked whether the two pictures matched, and from this test we found the optimal foveal region versus peripheral region. Feel free to dive into the paper for more details or to ask me further. We also had another idea in this line of research. Next, I would like everyone to do a small experiment. Hold your hands straight out and make a triangle in front of you. Next, gaze at a target about two or three meters away from you, then try closing one eye and then the other. You will find that only one eye sees the same image as when both eyes are open. You can close the left, then close the right; for myself, my right eye shows the same image within the triangle
Ruofei Du: as if I'm opening both eyes. This is called eye dominance. For the majority of people, only one eye is the dominant eye. So the research question was whether we can render at full resolution in your dominant eye and at lower resolution in your non-dominant eye, so that we can further cut the rendering budget. The answer is yes: we did some experiments and verified that we can gain a further 1.3x to 1.4x acceleration by rendering high resolution to only one eye and low resolution to the other eye, and you don't see much
Ruofei Du: image difference in the rendering results. Along this line of research, we also had light-field-based foveated rendering. We also applied this kind of transformation to 360 video streaming. This was also done with my intern, and we were able to stream the video using a similar scheme, where the high-resolution content is laid out in a rectilinear grid. This is because when streaming videos, it is better to store data in a rectangular format, so that it always aligns with the horizontal and vertical pixels.
Ruofei Du: So we devised a new transformation called log-rectilinear, and in this way we were able to save budget when rendering and streaming 360 videos. In terms of compression, we recently had some interesting findings in neural compression. We call it a sandwich approach. Why a sandwich? Because we are not changing any existing compression codecs; for example, right now we have MP4, and we have JPEG.
Ruofei Du: Okay, so we want to keep the existing compression formats, and instead add a neural pre-processor and a neural post-processor before and after the standard codec. For example, given a JPEG encoder and decoder, we add our neural encoder before it and a neural decoder after it. In this way, we can achieve a higher compression ratio than the standard image codec alone. This idea has some very interesting results: we can achieve a higher compression ratio without hurting image quality, and the same idea applies to tasks such as HDR images or super-resolution. This work was mainly driven by one of my co-authors, but it's a very interesting idea.
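The sandwich structure itself is function composition around an unmodified codec. The sketch below shows only that data flow: `toy_codec` is a uniform quantizer standing in for JPEG, and `identity` is a hypothetical placeholder for the trained neural pre- and post-processors, which are not reproduced here.

```python
from typing import Callable, List

Signal = List[float]

def toy_codec(signal: Signal, step: float = 0.25) -> Signal:
    """Stand-in for an unchanged standard codec: uniform scalar quantization."""
    return [round(s / step) * step for s in signal]

def sandwich(signal: Signal,
             pre: Callable[[Signal], Signal],
             codec: Callable[[Signal], Signal],
             post: Callable[[Signal], Signal]) -> Signal:
    """Run pre-processor -> standard codec -> post-processor.

    Only `pre` and `post` would be learned; `codec` stays untouched, which is
    what lets existing encoders, decoders, and file formats keep working.
    """
    return post(codec(pre(signal)))

def identity(signal: Signal) -> Signal:
    """Hypothetical placeholder for a trained network."""
    return list(signal)

out = sandwich([0.1, 0.6, 0.9], identity, toy_codec, identity)
```

With identity placeholders, the pipeline reduces to the plain codec. Training would replace them with networks optimized jointly for bitrate and reconstruction quality through the fixed codec in the middle.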
Ruofei Du: Another idea we had for accelerating the graphics pipeline is to apply the level-of-detail concept to graphics and neural rendering. In computer graphics, level of detail refers to rendering high-resolution triangles in the near field, or where you pay more attention, and low-resolution triangles in the peripheral region, where you are not paying attention. To that end, we can also have different layers in the decoder of the neural network, so that you decode a lower resolution with coarse details at lower levels and a higher resolution with more details at higher levels. This sort of neural decoding process will make future rendering pipelines more efficient. For example, if the chair is close to you, you go to level 4. If the chair is far away, you just produce a coarse shape.
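A minimal sketch of distance-driven level selection with coarse-to-fine decoding follows. The distance thresholds, the four-level hierarchy, and the residual-sum decoder are hypothetical simplifications chosen to mirror the chair example; they are not the architecture of the published paper.

```python
from typing import List, Sequence

def select_lod(distance: float, thresholds: Sequence[float] = (1.0, 3.0, 8.0),
               max_level: int = 4) -> int:
    """Choose how many decoder levels to run for an object `distance` meters away.

    Closer objects get deeper (finer) levels; beyond the last threshold only
    the coarsest level 1 is decoded. Expects len(thresholds) == max_level - 1.
    """
    level = max_level
    for t in thresholds:
        if distance <= t:
            return level
        level -= 1
    return level

def decode_to_level(coarse: List[float], residuals: List[List[float]],
                    level: int) -> List[float]:
    """Hypothetical coarse-to-fine decode: level 1 is the coarse output, and
    each further level adds one residual layer of detail on top."""
    out = list(coarse)
    for r in residuals[: level - 1]:
        out = [a + b for a, b in zip(out, r)]
    return out

# A nearby chair decodes all four levels; a distant one stops at the coarse shape.
near_level = select_lod(0.5)
far_level = select_lod(20.0)
```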
Ruofei Du: It is very preliminary research, published at ICCV two years ago, but it gives you some food for thought about how rendering could be accelerated and how a mirrored world could be reconstructed with accelerated techniques. Next, I'd like to move to an interesting chapter, computational interaction: algorithms and systems, highlighting DepthLab. AR has gained mainstream popularity on mobile devices, with thousands of AR apps such as Pokémon Go, Snapchat, and IKEA Place. These apps are typically supported by Google's ARCore or Apple's ARKit, where you usually define a virtual plane when you want to place virtual objects. However, when you want to invoke such AR experiences,
Ruofei Du: you still need to move your phone to scan the surface, and the objects in traditional AR apps look as if they are pasted on the screen rather than placed in the world. For example, in the ideal rendering, the virtual cat should be behind the bed rather than in front of it. So how can we achieve that? On HoloLens, we have techniques called SLAM and spatial mapping, where you scan the room and reconstruct triangles by computing the underlying geometry of the world. However, this method takes time and requires some initialization, at least one to two seconds, to fully understand the environment around you, and it also leaves some holes in the reconstructed mesh due to SLAM errors. So in our work, we wondered: can we achieve surface interaction, realistic physics,
Ruofei Du: occlusion, and path planning using a depth map? Here's a preview of how DepthLab lets you interact with physical things: for example, collision, rain and fog effects, flooding, relighting, splashing, and turning things into silver by touching them. For these kinds of effects, we don't really need a reconstruction of the real world; we just need a single depth map. That is what this paper is about. To do this, we don't need any special hardware; we only need a single RGB camera. This kind of technique is called structure from motion. Basically, once you start the app, we just need a little motion, and the IMU signal tells us how the phone moves. We pick keyframes along the way, for example, seven keyframes here, and then we compute the correspondences between the keyframes while leveraging the IMU and pose data of the phone. In this way, we can compute the disparity of the pixels. Finally, we apply a filter so that you get a full-screen depth map along the way, and this does not need any further 3D reconstruction.
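The step from keyframe correspondences to depth rests on triangulation. Assuming two keyframes that have been rectified, a focal length `focal_px` in pixels, and a baseline `baseline_m` (the camera translation between keyframes, taken here as known from pose tracking), the textbook relation is Z = f * B / d. This is a simplification for illustration: a production depth-from-motion pipeline also handles unrectified motion, matching costs, and filtering, which the sketch does not attempt.

```python
from typing import List, Optional

def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d for a rectified image pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def disparity_to_depth_map(disparity: List[List[float]], focal_px: float,
                           baseline_m: float) -> List[List[Optional[float]]]:
    """Convert a disparity image to depth; non-positive disparity means 'no match'."""
    return [[depth_from_disparity(d, focal_px, baseline_m) if d > 0 else None
             for d in row]
            for row in disparity]

# Hypothetical numbers: 500 px focal length, 10 cm of phone motion.
z = depth_from_disparity(25.0, 500.0, 0.1)  # about 2 m
```

Because Z is inversely proportional to d, a small amount of phone motion (small B) produces small disparities for distant surfaces, which is one reason depth estimated from motion gets noisier with distance.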
Ruofei Du: Here is how this is reconstructed: from the input RGB image, we get a depth map, and then you can further transform the depth map into point cloud format by computing the XYZ value of each pixel from the depth map. But still, when it comes to using the depth map, there is a large gap between the knowledge of researchers and the knowledge of designers or AR developers.
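Converting a depth map into a point cloud is a per-pixel unprojection through the pinhole camera model. The sketch assumes known intrinsics (focal lengths `fx`, `fy` and principal point `cx`, `cy` in pixels), image coordinates with y pointing down, and 0 marking pixels without valid depth; these conventions are assumptions for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]

def depth_to_point_cloud(depth: List[List[float]], fx: float, fy: float,
                         cx: float, cy: float) -> List[Point3D]:
    """Unproject every valid depth pixel (u, v, Z) to camera-space (X, Y, Z)."""
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # 0 marks "no depth" in this sketch's convention
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# Tiny hypothetical 2x2 depth map in meters, one pixel missing.
cloud = depth_to_point_cloud([[1.0, 2.0], [0.0, 4.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```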
Ruofei Du: So we wondered how we could help third-party developers and designers learn to leverage depth to achieve more. To do that, we hosted three brainstorming sessions with designers, developers, and engineers, and in the end we aggregated 39 ideas for how depth maps can be used in AR apps. We categorized them using three kinds of data structures: depth array, depth mesh, and depth texture. The depth array is simply a 2D array at very low resolution, here 160 by 120 pixels in ARCore. The depth mesh is a triangle mesh that resides on the GPU; here we morph a grid-like mesh by assigning a depth value to each vertex. This approach is very similar to what we used in Geollery and Social Street View, as you can see here, and shares the same philosophy of morphing a mesh
Ruofei Du: using the depth map. Finally, we provide a GPU-based depth texture for rendering purposes. We also categorized the depth use cases into three kinds: localized depth, surface depth, and dense depth. Localized depth uses the depth array to operate on a small number of points directly on the CPU, for example, converting between screen space and world space. It also provides a 3D-oriented cursor, which orients its pose by sensing the nearby pixels on the screen. This is done by estimating the normal from the pixels. Traditionally, the normal map is computed with a simple cross product, but we found that it is very noisy, so we further average the normal map over a neighborhood.
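The two normal-estimation steps described above, a cross product of neighboring differences followed by neighborhood averaging, can be sketched on a grid of 3D points (for example, the output of unprojecting the depth array). The box-shaped averaging window and its radius are assumptions; the talk does not specify the kernel.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Grid = List[List[Optional[Vec3]]]

def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n) if n > 0 else (0.0, 0.0, 0.0)

def raw_normals(points: List[List[Vec3]]) -> Grid:
    """Per-pixel normal = cross(right - p, down - p); last row/column stay None.

    With x right, y down, z forward, a fronto-parallel surface gets (0, 0, 1),
    i.e. pointing away from the camera; negate it for camera-facing normals.
    """
    h, w = len(points), len(points[0])
    normals: Grid = [[None] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            p = points[y][x]
            n = _cross(_sub(points[y][x + 1], p), _sub(points[y + 1][x], p))
            normals[y][x] = _normalize(n)
    return normals

def smoothed_normals(normals: Grid, radius: int = 1) -> Grid:
    """Average each normal with its valid neighbors in a (2r+1)^2 box, then renormalize."""
    h, w = len(normals), len(normals[0])
    out: Grid = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if normals[y][x] is None:
                continue
            sx = sy = sz = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    n = normals[ny][nx]
                    if n is not None:
                        sx, sy, sz = sx + n[0], sy + n[1], sz + n[2]
            out[y][x] = _normalize((sx, sy, sz))
    return out

# A tilted plane z = 0.5 * x sampled on a 5x5 grid.
plane = [[(float(x), float(y), 0.5 * x) for x in range(5)] for y in range(5)]
smooth = smoothed_normals(raw_normals(plane))
```

Averaging trades sharp edges for stability, which matches the observation above that the averaged normal map looks blurrier but behaves more smoothly for an interactive cursor.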
Ruofei Du: Here you can see that the averaged normal map looks blurrier but is smoother for real-time interaction. For example, here is a little shooting game we provided to our developers and designers, where you can shoot a ray into the physical world in real time. Another use case is path planning: with localized depth lookup, we can instruct a virtual avatar to avoid obstacles along the way when it's told to walk from one point to another. We can also enable rain and fog effects in AR by evaluating, with localized depth, whether the raindrops are hitting the surface. Surface depth, in turn, basically creates a dense depth mesh.
Ruofei Du: We call this use case Midas touch: when you touch the screen, it can turn something into silver or gold. Another use case is the physics collider. It basically uses the underlying Unity physics engine, but instead we create a screen-space mesh, and you can easily achieve this without needing SLAM or 3D reconstruction. You basically use a single depth map to do this.
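Building the depth mesh from the 160 by 120 depth array mentioned earlier is mostly index bookkeeping: one vertex per depth sample and two triangles per grid cell. A minimal sketch, keeping vertices in raw (column, row, depth) form for brevity; a real renderer would unproject them with the camera intrinsics before handing the mesh to a physics engine as a collider.

```python
from typing import List, Tuple

def build_depth_mesh(depth: List[List[float]]):
    """Turn a depth array into a grid mesh: one vertex per sample, two triangles per cell."""
    h, w = len(depth), len(depth[0])
    vertices: List[Tuple[float, float, float]] = [
        (float(x), float(y), depth[y][x]) for y in range(h) for x in range(w)
    ]
    triangles: List[Tuple[int, int, int]] = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x  # top-left vertex of this cell
            triangles.append((i, i + w, i + 1))
            triangles.append((i + 1, i + w, i + w + 1))
    return vertices, triangles

# The mesh for a 160 x 120 depth array (all zeros here, just to count elements).
verts, tris = build_depth_mesh([[0.0] * 160 for _ in range(120)])
```

Updating the mesh each frame only rewrites the vertex depths; the triangle indices depend solely on the array size and can be built once.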
Ruofei Du: We also provide the depth texture. One use is what computer graphics calls texture decals, where you basically texture the depth mesh, for example with a balloon splash. Another effect we call 3D photo: it basically rotates the existing camera to create a parallax effect when you take photos, and with the depth map you can easily achieve this; we already provide open-source code for you to do this. For dense depth, we also apply anti-aliasing and relighting techniques. This is mostly done by ray marching: you can trace virtual lighting near and far, and we provide the algorithms on the GPU for developers to adopt in their AR games. And this is my favorite example.
Ruofei Du: It's basically testing whether the pixels are occluded or not. Here you can see the table light up, and when the light moves far away, the table returns to dark again. And this example we call the aperture effect. The key difference is that traditionally, when you are taking photos of people, you need to anchor your cursor onto the
Ruofei Du: subject, and if you walk away, you probably need to refocus by tapping on the screen again. However, with knowledge of the 3D world, we can tap on the screen and focus on the flower, and no matter how close or how far you walk, the system still knows the XYZ position of the anchored object, so it keeps the flower in focus while blurring the rest of the world. There are more effects you can easily achieve with dense depth, like occluding objects behind the physical bed and applying fog effects,
Ruofei Du: and occluding a virtual chair behind the physical table. We further optimized the system and recommended to developers the optimal parameters for the relighting and aperture effects, but these are not that interesting, so I'll skip them. Furthermore, we developed our toolkit and handed it over to Snapchat, TeamViewer, and TikTok. If you have ever used TikTok filters, you have probably used our DepthLab technique. It's also shipped in Snapchat Lenses, where the Dancing Hot Dog is powered by depth and understands the world, along with depth scanning and growing trees on walls. We are proud to have partners use our API to achieve more.
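Two of the dense-depth effects described above reduce to simple per-pixel rules. The sketch assumes a camera image, a rendered virtual layer with its own depth, and a real depth map at the same resolution, all as nested lists for self-containment. Occlusion is a plain depth test. For the aperture effect, it uses the standard thin-lens circle-of-confusion formula, with the focus distance recomputed each frame from the tapped 3D anchor. The thin-lens model and all parameter values are assumptions of this sketch, not necessarily what DepthLab ships.

```python
import math
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]

def composite_with_occlusion(camera: List[List[RGB]],
                             virtual: List[List[Optional[RGB]]],
                             virtual_depth: List[List[float]],
                             real_depth: List[List[float]]) -> List[List[RGB]]:
    """Show a virtual pixel only where it is nearer than the real surface.

    `virtual` holds None where nothing was rendered. A real depth <= 0 is
    treated as unknown and does not occlude (an assumed convention).
    """
    out = []
    for y, row in enumerate(camera):
        out_row = []
        for x, cam_px in enumerate(row):
            v_px = virtual[y][x]
            rd = real_depth[y][x]
            visible = v_px is not None and (rd <= 0 or virtual_depth[y][x] < rd)
            out_row.append(v_px if visible else cam_px)
        out.append(out_row)
    return out

def focus_distance(anchor: Sequence[float], camera_pos: Sequence[float]) -> float:
    """Distance from the camera to the tapped 3D anchor, updated as the user moves."""
    return math.dist(anchor, camera_pos)

def circle_of_confusion(object_dist: float, focus_dist: float,
                        focal_length: float, aperture: float) -> float:
    """Thin-lens blur-circle diameter (same units as focal_length) at object_dist."""
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_length / (focus_dist - focal_length))

# Hypothetical phone-like optics: 4 mm focal length, 2 mm aperture, flower at 0.5 m.
f_dist = focus_distance((0.0, 0.0, 0.5), (0.0, 0.0, 0.0))
blur_far = circle_of_confusion(2.0, f_dist, 0.004, 0.002)
```

The blur diameter is zero exactly at the anchored depth and grows with distance from it, so as the user walks around, the anchor stays sharp while the rest of the scene is blurred in proportion to its depth offset.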
Ruofei Du: Recently, we also provided raw point clouds, so you can tap on the physical world and anchor arrows onto physical objects directly in TeamViewer and also in DepthLab. We also have the codelab available online. In the future, I also envision DepthLab being available on more IoT devices. For now, we call it passive depth, because the values we get are passively sensed by estimating depth from motion. But phones now have time-of-flight sensors and LiDAR sensors, which will enable us to sense hands and, in the future, to sense the dynamic world with a single RGB sensor. We have the code open-sourced on GitHub, the app is available in the Play Store, and we have had media coverage. I also want to point people to some interesting demos using WebXR. And recently you can see
Ruofei Du: deep-learning-based depth estimation online, and I also envision that machine-learning-based approaches will be dominant in the future. Okay. So after exploring interaction with the environment, how should we interact with everyday objects? I would like to introduce the Ad hoc UI demo, which we presented at CHI last year. This is a very cool demo, and I can explain what's going on here.
Ruofei Du: Yeah. So basically, I'm holding a napkin, and this is the instruction I'm giving to the system. Okay, here is the live demo. I'm asking the system, "Show me today's weather on the card." Now you see the weather directly on the card, and as you move it farther away, it shows a different level of detail, so different UIs appear directly on the card. And if you flip the card, you can summon some other UI onto the card using voice, for example, by saying, "Show me the balance of the card."
Ruofei Du: I'm using Google's speech-to-text engine to do the recognition, and you can directly see the balance on the card. I can also move it to different perspectives, and it is still tracked. Here are some other demos, for example, using the distance to indicate the level of detail, using the 6DoF pose to change the volume of the speaker, or using the $1 recognizer to do gesture recognition in midair. You can insert some virtual Legos or virtual furniture directly into your room and change the lighting intensity
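The midair gestures above rely on the $1 Unistroke Recognizer. As a rough illustration of its core idea, here is a simplified sketch: resample each stroke to a fixed number of points, normalize scale and position, then pick the template with the smallest mean point-to-point distance. It deliberately omits the rotation-invariance search of the full algorithm, the template names are hypothetical, and it is not the Ad hoc UI code.

```python
import math

N_POINTS = 32  # resampling resolution (an arbitrary choice for this sketch)


def resample(points, n=N_POINTS):
    """Resample a stroke into n points spaced evenly along its path."""
    pts = [tuple(p) for p in points]
    path_len = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    interval = path_len / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # floating-point rounding can leave us one point short
        out.append(pts[-1])
    return out[:n]


def normalize(points, n=N_POINTS):
    """Resample, scale to a unit box, and move the centroid to the origin."""
    pts = resample(points, n)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    w = (max(xs) - min(xs)) or 1e-9  # avoid dividing by zero for flat strokes
    h = (max(ys) - min(ys)) or 1e-9
    scaled = [(x / w, y / h) for x, y in pts]
    cx = sum(x for x, _ in scaled) / n
    cy = sum(y for _, y in scaled) / n
    return [(x - cx, y - cy) for x, y in scaled]


def recognize(stroke, templates):
    """Return (best_template_name, mean_point_distance) for a stroke."""
    candidate = normalize(stroke)
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        ref = normalize(template)
        dist = sum(math.dist(p, q) for p, q in zip(candidate, ref)) / len(candidate)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


TEMPLATES = {"V": [(0, 0), (5, 10), (10, 0)], "line": [(0, 5), (10, 5)]}
print(recognize([(100, 100), (150, 200), (200, 100)], TEMPLATES)[0])  # V
```

Because scaling is done per axis, this sketch shares the $1 recognizer's known weakness with nearly one-dimensional strokes.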
Ruofei Du: using everyday objects, and change the color of the objects. These demos were recorded live at home during the pandemic, so lots of them were recorded in one day. Lastly, I want to show you that you can also use your hand pose to switch between transcription and translation modes, for example, when you are listening to a talk. When you move your hands this way, you can show the Chinese translation in real time. I believe that will make interaction with different UI elements more natural and elegant in the future. Oh, all right. And
Ruofei Du: what's next? Oops. Okay. Next, I also wondered: can we learn from history to interact with everyday objects? This is in collaboration with a CMU professor and also engineers from Google. Basically, we made some cool demos by learning from history, for example, the famous Slurp project, a tangible interaction project in which you have a physical color picker with a camera inside it: you can pick up a color in physical life and then paint with that physical color on a digital paintboard. In this example, we take the metaphor of that tangible computing scenario and make it work on HoloLens.
Ruofei Du: So it's mostly a design rationale paper, where we want to highlight that a lot of innovation could happen in the AR space by learning from history: learning from past tangible computing scenarios and making them really useful in augmented reality interaction. In terms of tangible computing, last year we had an intern, mainly hosted by David Kim, who made a very low-power, actually no-power, sensing technique. We call it RetroSphere. Basically, we use a pen and a very low-power infrared LED illuminating the spheres, plus a very cheap camera sensor, and the battery is able to last all day long, so that you can enable 3D interaction in real time like this. Let me show you some demos. For example, you can draw with a pen in this way, and it is aligned.
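To make the RetroSphere idea more concrete, here is a minimal sketch of how a brightly lit retroreflective sphere could be located in 3D. It assumes, purely for illustration, a rectified stereo pair of infrared cameras with a known focal length (in pixels), principal point, and baseline. The sketch thresholds each IR image to find the sphere's centroid, then triangulates from the horizontal disparity. Names and parameters are hypothetical, and this is not the published system's pipeline.

```python
def blob_centroid(image, threshold=200):
    """Centroid (x, y) of pixels at or above threshold, or None if there are none.

    image: 2D list of IR intensities; a retroreflector lit by the IR LED
    shows up as a small saturated blob.
    """
    sx = sy = count = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v >= threshold:
                sx += x
                sy += y
                count += 1
    if count == 0:
        return None
    return sx / count, sy / count


def triangulate(left_px, right_px, focal_px, baseline_m, cx, cy):
    """3D point (meters, left-camera frame) from a rectified stereo match."""
    disparity = left_px[0] - right_px[0]
    if disparity <= 0:
        raise ValueError("a point in front of the rig needs positive disparity")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (left_px[0] - cx) * z / focal_px   # back-project through the pinhole model
    y = (left_px[1] - cy) * z / focal_px
    return x, y, z


# Hypothetical rig: 400 px focal length, 6 cm baseline, principal point (320, 240).
print(triangulate((340, 240), (300, 240), focal_px=400, baseline_m=0.06, cx=320, cy=240))
```

In practice the centroid from each camera would be fed into `triangulate`; sub-pixel centroids matter because depth error grows quickly as disparity shrinks.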
Ruofei Du: Yes. These are some other examples where you have a ring on your finger, and it can do pinch gestures without any power. The main motivation behind this work is that we imagine that in the future, glasses will be lightweight and consume very little power, and so will the sensors in your hand. For example, each time I use my Oculus at home (I don't use it that often), it always runs out of battery,
Ruofei Du: which is really annoying. But with this kind of passive sensing technique, none of the sensors or interaction props needs a battery, and the sensor on the glasses is cheap, widely available, and requires very little power. This kind of interaction will enable all-day use cases in the future. Also, with recent advances in on-device ML models, we wondered how we can accelerate prototyping pipelines. For example, in recent years we have seen real-time body segmentation, real-time hand tracking, and real-time depth sensing. So how can we build multimedia applications as if we were using Legos?
Ruofei Du: Honestly, I'm still finishing the slides and presentation for my latest paper. It's called Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming. Through this work, our team built a live system that allows designers, researchers, and ML practitioners to build new applications in real time, as if they were building Legos. For example, here you have the image nodes and the body segmentation nodes, and we can connect them into a graph by drag and drop,
Ruofei Du: and you can easily create a virtual background application without any knowledge of coding. You can also do depth estimation in real time. More importantly, you can test the robustness of your models by adding noise in the builder and changing brightness and contrast in real time. You can also directly generate figures for your papers. Here we are comparing two depth models side by side, and you can click the download button to get a PDF or PNG directly for your figure; for all of this, you don't need any coding. It's a visual programming platform. Here are some more demos. If you do want to write some code, you can also change the shader, for example, to create stronger blur effects for your Zoom-style real-time video conferencing system.
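The "Lego block" pipelines described above can be thought of as a dataflow graph: each node takes the outputs of its upstream nodes and is evaluated in topological order. Below is a minimal sketch of that idea only, assuming hypothetical node names and a toy 1D "image"; the segmenter is a brightness threshold standing in for a real body-segmentation model, and none of this reflects Rapsai's actual API.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_graph(nodes, edges, inputs):
    """Evaluate a dataflow graph of blocks.

    nodes: node name -> function whose keyword arguments are named after
        that node's upstream nodes.
    edges: node name -> list of upstream node names.
    inputs: values for source nodes (e.g. a camera frame).
    """
    results = dict(inputs)
    for name in TopologicalSorter(edges).static_order():
        if name in results:
            continue  # source node, already provided
        upstream = {u: results[u] for u in edges.get(name, [])}
        results[name] = nodes[name](**upstream)
    return results


def fake_segmenter(camera):
    # Stand-in for a body-segmentation model: bright pixels count as "person".
    return [1.0 if v > 128 else 0.0 for v in camera]


def composite(camera, mask, background):
    # Keep camera pixels where the mask is 1; show the virtual background elsewhere.
    return [m * c + (1.0 - m) * b for c, m, b in zip(camera, mask, background)]


NODES = {"mask": fake_segmenter, "output": composite}
EDGES = {"mask": ["camera"], "output": ["camera", "mask", "background"]}

frame = [200, 50]  # toy "image": one bright pixel, one dark pixel
virtual_bg = [0, 0]
print(run_graph(NODES, EDGES, {"camera": frame, "background": virtual_bg})["output"])
# [200.0, 0.0]
```

Swapping one block, for example a depth model for the segmenter or a noise node before it, only changes `NODES` and `EDGES`, which is the property that makes side-by-side model comparisons cheap.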
Ruofei Du: But the majority of the system does not need any coding. For example, here you can crop the input in the system. I hope to release the platform with the team around the CHI time frame, and I'm also going to give a live demo this year together with the team, so people can play with the system in real time on site in Hamburg, Germany.
Ruofei Du: Yeah. We also tried the system with audio denoising models, so you can make some microphone recordings, run the denoising, and compare denoising models to see which one you prefer. Okay, let's dive into the final section: digital humans and augmented communication. This is a cool one. First of all, I would like to answer the question: what is an avatar? And no, this is not an advertisement for the movie. We have the movie Avatar, which is very famous across the world, but that is not the definition in computer graphics. Oh, this is also not it: in Hinduism, the word refers to the descent of a deity from heaven. But in computing, an avatar is a graphical representation of a user, or the user's character or persona.
Ruofei Du: Now, can you remember what the oldest avatar in computer history is? Anyone on the call, can you answer? Mark Billinghurst: I know it's probably kind of boring already, so here are two guesses. There was some very early work done with VPL in the late 1980s that had avatars, but I'm sure you probably have something older than that. When I started in VR in the 1990s, they had very crude avatars. Ruofei Du: Yes, and well-made avatars. But the avatar I'm talking about here is a more technical one: it has a mouth, it can walk, so it's actually a graphical representation of the user themselves. So what is the state of the art in avatars? This work was published in August 2019: we reconstruct these digital humans using a light stage with more than 50 cameras around the person. With this pipeline, you can photorealistically render the person in any virtual environment by relighting them,
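Relighting a person captured in a light stage rests on the linearity of light transport: an image under any lighting is a weighted sum of images captured one light at a time (OLAT). The sketch below shows only that classic image-based relighting principle on single-channel toy images; the weights stand for the target environment's intensity in each light's direction, and this is an assumption-laden illustration, not the pipeline of the 2019 work mentioned above.

```python
def relight(olat_images, light_weights):
    """Relight by summing one-light-at-a-time (OLAT) images.

    olat_images: list of 2D lists (single channel), one per light-stage light.
    light_weights: target environment intensity for each light's direction.
    """
    if len(olat_images) != len(light_weights):
        raise ValueError("need exactly one weight per OLAT image")
    h, w = len(olat_images[0]), len(olat_images[0][0])
    out = [[0.0] * w for _ in range(h)]
    for img, weight in zip(olat_images, light_weights):
        for y in range(h):
            for x in range(w):
                out[y][x] += weight * img[y][x]  # light transport is additive
    return out


# Two lights: a dim one lighting the left pixel, a bright one lighting the right pixel.
print(relight([[[1.0, 0.0]], [[0.0, 1.0]]], [0.5, 2.0]))  # [[0.5, 2.0]]
```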
Ruofei Du: and then back to real-time digital humans. The most famous work I want to highlight here is Holoportation, which was done by Shahram Izadi, also my director, back in the day at Microsoft Research. They made the first-ever real-time system to teleport a physical human to another room, using eight pairs of RGB-D cameras and a real-time non-rigid fusion technique. With such a technique, you can also record your child, and more importantly, you can save the memory of playing with your child, replay that memory in real time, and even miniaturize it as if you were observing from a bird's-eye view. This kind of interaction unlocks so much potential for the future.
Ruofei Du: Along this line of research, I have done graphics work to optimize the texture fusion in this pipeline. By estimating geodesics on the geometry, I was able to fuse the multiview textures in real time, so that the video textures from multiple cameras blend seamlessly. The core idea here is: do not use all of the cameras, but use the cameras that are view-dependent with respect to the viewer. For example, in the previous rendering, you see colors coming from all eight cameras. But in the optimized pipeline, you only use the side camera, the front-facing camera, and the other side camera for the majority of the rendering.
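The "use only the view-dependent cameras" idea can be sketched as a weighting scheme: score each camera by how well its direction aligns with the viewer's direction, keep the top few, and normalize. This is a minimal illustration under stated assumptions (all directions point outward from the captured subject, a top-3 cutoff, and a hypothetical `sharpness` exponent); it omits the geodesic seam handling described above and is not the original implementation.

```python
import math


def view_dependent_weights(view_dir, camera_dirs, k=3, sharpness=4.0):
    """Blend weights that favor cameras aligned with the current viewer.

    view_dir, camera_dirs: 3D direction vectors pointing from the subject
        toward the viewer / each camera (need not be unit length).
    Only the k best-aligned cameras receive non-zero weight.
    """
    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(c / n for c in v)

    v = unit(view_dir)
    scores = []
    for i, c in enumerate(camera_dirs):
        cos = sum(a * b for a, b in zip(v, unit(c)))
        # Cameras facing away from the viewer get zero; sharpness favors alignment.
        scores.append((max(cos, 0.0) ** sharpness, i))
    top = sorted(scores, reverse=True)[:k]
    total = sum(s for s, _ in top)
    weights = [0.0] * len(camera_dirs)
    if total == 0:
        return weights
    for s, i in top:
        weights[i] = s / total
    return weights


# Eight cameras on a ring around the subject, viewer looking from the +x side.
ring = [(math.cos(math.radians(a)), 0.0, math.sin(math.radians(a))) for a in range(0, 360, 45)]
print([round(w, 3) for w in view_dependent_weights((1.0, 0.0, 0.0), ring)])
```

With this ring, the front camera gets most of the weight and its two neighbors share the rest, mirroring the "side, front, other side" choice in the talk.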
Ruofei Du: This makes the rendering quality much better. So what is the state of the art since then? Previously, we had a line of geometry-based research, starting from KinectFusion and DynamicFusion, and recently everything has moved into the era of deep learning. We saw a pipeline that is actually still traditional: we take input images, estimate depth maps,
Ruofei Du: and run Poisson surface reconstruction on point clouds. But after that, these steps got replaced by GeometryNet, AlbedoNet, and ShadingNet, mostly U-Net architectures running in offline processing. With this pipeline, recent SIGGRAPH papers can almost produce a photorealistic human in real time. Another kind of digital human is the pre-reconstructed, rigged avatar, and here I want to highlight Rocketbox.
Ruofei Du: It is an open-source avatar library, ready to use in all of your systems. Facebook/Meta is also committing to phone-scan-based avatars, which will let you create your avatar with a cheap device. The next question I wondered about was: how can we build dynamic, dense correspondence within the same subject and among different subjects? A CVPR paper we had two years ago tried to learn the correspondence between different avatars. The idea is to establish the mapping from one pose to another pose. Using this kind of technique, we can transfer animation from one person to another, and it can also help reconstruct a 3D avatar better in real time. So how can we leverage real-time avatars today? I want to highlight this work we did with Zhenyi, a student I mainly supervised, where we explored different approaches to enhance remote communication during the pandemic. During the pandemic, one of the most annoying things I found was the low-bandwidth problem. Oftentimes you are, you know, in your backyard, or you are
Ruofei Du: at home using home Internet, and oftentimes you have to shut down your video stream so that your voice is transmitted clearly over the Internet. So we wondered: can we animate the profile photo when you turn off your video feed? And how can we convey who is looking at whom in a remote video conference? With that motivation, we did a very interesting piece of research, animating avatars using a webcam-based eye tracker. Here you can see that we learn the eye movement beforehand, and we animate the profile photo using the gaze tracker from the webcam. Due to time constraints, I'll go quickly. Some previous work tried to use physical displays to achieve this, but we wanted to avoid relying on any external hardware. To do that, we start from the profile photo: we estimate the depth map and create a 3D photo using the same technology as DepthLab. Finally, we apply the eye mask,
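Conveying "who is looking at whom" can be sketched in two steps: map the webcam gaze estimate to the participant tile it falls in, then compute the direction in which the looker's animated eyes should turn on everyone else's screen. This is a minimal sketch assuming a hypothetical grid of rectangular tiles in screen pixels and an already-computed gaze point; webcam gaze estimation and the eye-region rendering are out of scope, and the names are not from the GazeChat system.

```python
import math
from dataclasses import dataclass


@dataclass
class Tile:
    """A participant's video tile on screen (pixels); the layout is hypothetical."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def center(self):
        return self.x + self.w / 2, self.y + self.h / 2


def gaze_target(gaze_xy, tiles):
    """Name of the participant whose tile contains the gaze point, or None."""
    px, py = gaze_xy
    for tile in tiles:
        if tile.contains(px, py):
            return tile.name
    return None


def eye_direction(from_tile, to_tile):
    """Unit 2D vector from one tile's center to another's, used to turn the eyes."""
    (fx, fy), (tx, ty) = from_tile.center(), to_tile.center()
    dx, dy = tx - fx, ty - fy
    n = math.hypot(dx, dy)
    return (0.0, 0.0) if n == 0 else (dx / n, dy / n)


alice = Tile("alice", 0, 0, 100, 100)
bob = Tile("bob", 100, 0, 100, 100)
print(gaze_target((150, 50), [alice, bob]), eye_direction(alice, bob))  # bob (1.0, 0.0)
```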
Ruofei Du: move the 3D avatar, and only change the rendering of the eye regions. Oh, here... oops. Yeah, with the help of the image here. In this way, you can rotate the avatars when they are talking, and also change the gaze direction based on which other avatar they are looking at. There are also some interesting findings. Of course, this is not comparable to video streaming, but it is more engaging than traditional audio-only communication, for example, when your bandwidth is low, or when you are using Clubhouse-like communication, where audio is the only medium available to stream,
Ruofei Du: so animating the avatar makes your conversation more engaging and more fun. We also explored remote work with stylized avatars. For example, this is also with [inaudible], where we bring 3D sketching to [inaudible]. And here is a very interesting demo where you can have your stylized avatar in VR,
Ruofei Du: in the very early days. More importantly, you can turn your sketches into 3D objects and teach math concepts; for example, this is a hypercube in four dimensions. In this demo, you draw something and it turns into 3D automatically, you can change the color in real time, and you can design furniture plans or your apartment layout with remote participants in real time. Most interestingly, this shows how you can combine sketching with 3D modeling and use it collaboratively with remote participants in real time. Next, we wondered how we could further augment communication, in video conferencing today and in XR in the future.
Ruofei Du: This is one of our most recent works, which I led and which was mostly done by my intern, Bruce. Bruce is still working with me on new projects right now, and we will present the Visual Captions system in April. I'll just give you a quick overview. Basically, when we are talking about travel plans, we are often missing the visual concepts that go with our conversations. In Visual Captions, we developed a video conferencing system that leverages large language models, and we collected a 1.5K dataset of conversations, highlighting which visual concepts are most important to visualize in a conversation. For example, "Tokyo is located in the Kanto region of Japan": instead of giving you a picture of Tokyo, the system gives you a map of Tokyo.
Ruofei Du: Or "we spent our weekend in Yosemite": this time, it gives you a picture of Yosemite while you are talking with remote people. Or the triangle building in San Francisco: this time, it gives you a real picture of the triangle building in San Francisco in real time. We developed the system using large language models that learn what visual content to show, and we use search engines to retrieve the pictures in real time, directly in Google Meet. We deployed the system with more than 26 participants and also ran a long-term user study. We found it really interesting that people used the system to talk about the food they love, their favorite places, and their favorite movies or the movie stars they like. Hopefully, the system will be taken into production in the future.
Ruofei Du: Another kind of augmentation we need these days is segmenting out objects in video conferencing. Oftentimes in video conferences we use blurred or virtual backgrounds. For example, if we want to show physical products to remote people, with a virtual background you often find that the object we are showing is occluded by the virtual background.
Ruofei Du: So, to highlight the objects we are showing to other people, we leverage real-time machine learning techniques to segment out the objects during real-time communication. Finally, you can see the objects highlighted, and you can switch to a focused view for remote participants. For example, when you are showing a book, you can present your view like this, so that everyone sees the content of the book on a larger screen.
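One way to read the fix described above: the virtual background covers everything outside the person mask, so a held object disappears unless it is added to the foreground. A minimal sketch, assuming per-pixel person and object masks in [0, 1] are already available from some segmentation model (toy 2D lists here), keeps the union of the two masks and derives a bounding box for the focused view. Function names are hypothetical; this is not the system presented in the talk.

```python
def composite_keep_objects(camera, background, person_mask, object_mask):
    """Blend camera and virtual background, keeping both the person and the held object."""
    out = []
    for cam_row, bg_row, p_row, o_row in zip(camera, background, person_mask, object_mask):
        row = []
        for c, b, p, o in zip(cam_row, bg_row, p_row, o_row):
            m = max(p, o)  # keep a pixel if it belongs to the person OR the object
            row.append(m * c + (1.0 - m) * b)
        out.append(row)
    return out


def focus_box(object_mask, threshold=0.5):
    """Inclusive bounding box (x0, y0, x1, y1) of the object, for a zoomed focused view."""
    coords = [(x, y) for y, row in enumerate(object_mask)
              for x, v in enumerate(row) if v >= threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Pixel 0 is the person, pixel 1 the held book, pixel 2 the room (replaced by the background).
print(composite_keep_objects([[10, 20, 30]], [[0, 0, 0]], [[1, 0, 0]], [[0, 1, 0]]))
# [[10.0, 20.0, 0.0]]
```

Taking the maximum of the two masks keeps soft edges from either model intact; a hard logical OR would also work if the masks are binary.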
Ruofei Du: This is another interesting system showing how machine learning can help make our conversations more engaging and useful. Finally, here are other works on how AI could benefit accessibility in physical life and in virtual reality in the future. We trained a neural network to help people who are deaf or hard of hearing learn the sound events near their home, so that they can be more aware of their surroundings. Another line of research we had is: how can we empower the metaverse with AI to improve our lives? For example, large language models are very popular right now, and back in the day we also explored using language to create coloring books for kids, so that we could mass-produce coloring books and kids could release their creativity by using voice commands to colorize coloring books of their favorite things. Today we are in a very exciting era, where AI-generated content is mainstream, not only in science but all around the world. We see DALL·E 2, a large model creating pictures from language. We see Imagen from Google, also creating photorealistic pictures from language.
Ruofei Du: And recently, ChatGPT has become one of the most viral phenomena in the world, and even students are leveraging ChatGPT to write faster. So, standing at this point, what can we as researchers do to make the world a better place? I want to end this talk by playing a clip from Google I/O 2022 that envisions how AI can empower technology and make our physical life better.
Ruofei Du: Here the presenter is introducing our initial prototype from last year, translation glasses, which can help people in multilingual families, people who do not understand another language, and people who are deaf or hard of hearing. These glasses have the potential to fundamentally change people's lives. For example, we interviewed a family in the US where the daughter speaks only English and the mother speaks only Mandarin Chinese. For years, they could not understand each other. But the moment we gave the glasses to the mother, for the first time ever she could see the language spoken by her daughter, and she basically cries.
Ruofei Du: This is a perfect example of how technology can fundamentally change people's lives in the future. I wish every researcher would devote their research to the ultimate goal of improving people's lives. In my opinion, I do hope we can have a metaverse like this in the future, where you can have rich experiences in an immersive environment as well as virtual information blended into the physical world, but you do not lose yourself in virtual reality.
Ruofei Du: Instead, you can learn history by seeing past events in mixed reality. You can tour around the world in mixed reality, learn about cultures, and meet new friends, and even predict the future by leveraging large language models, using this technology to visualize data and talk with AI agents that help you work more efficiently. In this metaverse, there will be no gaps in communication or language. For example, this is Project Starline at Google, where you can teleport the other person in real time; imagine also having translation, large language models, or even ChatGPT summarizing your conversation in real time. As researchers in computer graphics, computer vision, and human-computer interaction, we can make a better world with these tiny inventions.
Ruofei Du: And thank you, everyone, for listening and watching. Any thoughts or questions on the call? Thank you. alaeddin: Amazing. Thank you so much. Lots of work, lots of ideas. I'm sure we have lots of questions; please feel free to ask or raise your hand. I will start us off with
alaeddin: one question, Ruofei, if you don't mind. I'm amazed by the number of ideas you have worked on, especially in computer graphics, and how these ideas help improve people's lives. May I ask how you come up with these ideas? Are there brainstorming sessions, or do you build on top of existing work? Is there something in particular you could advise our students and researchers on, to move in this direction and improve people's lives? Ruofei Du: Yeah, this is a very good question, thank you. Mostly the ideas come from real-world barriers or gaps I have found in my everyday work or life, for example, Visual Captions, or, during the pandemic, GazeChat. Yeah, I can share the slides with everyone later, so don't worry.
Ruofei Du: And, okay, GazeChat addressed a real problem during the pandemic. I saw that we had trouble streaming video, especially with people across the country, and we only saw their profile photos. So I wondered whether I could make them animated. The initial idea was to animate not only the eyes but also the mouth, but due to technology constraints we could hardly animate the mouth in real time. These days, I think the mouth could also be animated from your speech. So we lowered our expectations: why not track the eyes and make the conversation more engaging? The entire project was done within three or four months, very fast, with the graduate students,
Ruofei Du: and we were able to do it because we were all aligned on the goal; it was a terrific collaboration with those students and the students of my previous lab during the pandemic. Another example is the Rapsai project, which I recently published at CHI 2023. I have worked with many machine learning models in past years, and I often found that the inputs and outputs are very similar to each other, and people are doing repetitive work.
Ruofei Du: For example, every researcher needs to build some pipeline to read the webcam, resize the frames, feed them into a machine learning model, and see the results, and for every model, different engineers are doing the same thing. So I wondered: what if we had Lego blocks and people could just drag and drop? It should be as easy as that. So I talked with several engineers from a few teams, and they agreed on the vision, so we quickly built the system over the year, with many brainstorming sessions, weekly ones. The same goes for DepthLab: many of its ideas came from weekly brainstorming in Google Sheets. Usually the format I've used is that we watch some videos from CHI, UIST, or commercial videos, everyone writes down their own ideas for 10 or 20 minutes, and we discuss at the end of the brainstorming session. Everyone presents their ideas, and after the meeting we summarize. This way we can efficiently collect all the ideas together and discuss which ideas are worth pursuing.
alaeddin: Amazing. Yeah, that's a really good thought, because sometimes you can get caught up in brainstorming and not doing enough work, so it's about balancing the two. I especially like the Rapsai project; it's similar to the Scratch platform in terms of programming, I guess. Ruofei Du: Yes, it is. Sorry that I didn't make any slides for Rapsai; I'm still working on it before the... alaeddin: That's great. I look forward to seeing it at CHI. We have a question from Chris, please. Hi, I've just got a couple of them, actually.
chris chitty: Now, I was just hoping that you can keep having your eyes, so that you can show focus and distance, because our eyes communicate a lot, and when we're looking