The Second Life knockoffs were only necessary because of the trade-off they made to target the low-price HMD market with on-device GPU+CPU processing, rather than PC VR or a processing-puck design. This photogrammetry-based avatar stuff has already been working for years. See Meta's earlier "Codec Avatars" work in 2021 (https://www.youtube.com/watch?v=bS4Gf0PWmZs) or Unreal's MetaHuman (https://www.roadtovr.com/epic-metahuman-mesh-import-scan-rea...).
What's the score with Epic and Apple Vision, btw? I saw that Lumen and Nanite now work in UE 5.3 on M2 silicon, but I'm not sure whether it will be possible to develop apps for Apple Vision with Unreal; they only mention Unity?
> How did they go from second-life knockoffs to this?
Typically you do one MVP version and as soon as you've knocked that out, start working on a proper version, integrating feedback from the MVP as it comes in. I'm guessing they always wanted to have photorealism, but realized it'd take a while to get in place so they launched a quick version then got back to work.
> it'd take a while to get in place so they launched a quick version then got back to work.
This is not a phone app. Physical demos that propose something that a user has no point of comparison with (new paradigm) absolutely need to be on either end of the maturity scale.
- Either very raw, barebones MVPs that should be only for lab/research-aware people who won't cling to details and are able to see the 'principle' of what it could be.
- Or absolutely immaculate, picture-perfect marketing demos that you can show to non-technical people
Anything in between is a recipe for getting shot down, getting no buy-in, and receiving no insightful feedback.
> Anything in between is a recipe for getting shot down, getting no buy-in, and receiving no insightful feedback.
I don’t agree. I dislike Fb to the extent that I already deleted my account quite a while ago. And I thought Meta's attempt at a metaverse with cartoon characters looked ridiculous.
But the video in the OP link, with the photorealistic avatars, is a whole other level. It’s super cool.
The fact that they made a ridiculous cartoon Metaverse first does not detract from the coolness of this photorealistic thing.
Still, I’m not gonna use any Metaverse that Meta will make. Photoreal or otherwise. But that is because I dislike Meta as a company.
The point is that what they have made here is some top notch stuff.
I look forward to seeing Apple compete with this. I hope that with an iPhone and an Apple VR headset I can get to experience something similar in a few years.
> The fact that they made a ridiculous cartoon Metaverse first does not detract from the coolness of this photorealistic thing.
I agree with you on the coolness of this demo. I disagree that the cartoon demo does not detract.
I suspect a large majority of users and advertisers assumed Zuckerberg went off-piste completely. This interview with Lex is much more aligned with what should have been their first public demo of a Metaverse 'vision'.
Really? Definitely looks like something a simple phone should be able to process... Are they not wearing phones on their heads in the demo?!
This "Publish > Feedback > Iterate > Repeat" loop doesn't just apply to smartphone apps, you can basically do this with anything in life, from hardware to mixology.
I guess you could say the same thing about LLMs: since the initial models were terrible and basically just demos, they shouldn't actually have published those at all?
> This "Publish > Feedback > Iterate > Repeat" loop doesn't just apply to smartphone apps, you can basically do this with anything in life, from hardware to mixology
I again disagree.
"Publish > Feedback > Iterate > Repeat" loop works when users can build on it from a previous reality that they can compare against.
For physical demos that propose something a user has no point of comparison with (a new paradigm), you can either show a sketch/concept of what you want to do (to a technical audience), or build a semi-realistic demo for a non-technical audience.
Anything in between will probably be met with confusion or people focusing on details that are not the concept that you are showcasing.
I'm not sure how much Facebook paid Fridman for this podcast, but whatever it was, whether it was 0 or 10 million, it's probably the best value-for-money PR they've done in the last decade... or ever?
The implication in my comment is that they might indeed have not paid him anything. And Fridman on principle may not have accepted payment even if offered. But who knows what type of deals go down behind the curtain in the podcasting world; it's not unheard of to come across interviews that look like paid promotion where the podcaster doesn't explicitly state it as such.
But my point was really more that even if they paid him 10 million, the PR would have been worth it.
>That demo was going to happen one day or another as the tech progresses
Yes, all amazing technology is going to happen eventually — no one really cares about that. They care about seeing it happen in the present, right in front of their eyes. The metaverse was mocked profusely in 2023, and we have gone through multiple cycles of “This is going to be the year of VR” in past decades. Tech stocks have been flat the last couple of months and Meta is a public company. No one on the outside would have batted an eye if they quietly shut this whole thing down after being so relentlessly mocked for the past year. So saying Zuckerberg doing this demo was inevitable after the fact is a bit flippant I think.
And if facebook did a demo with a bunch of celebrities and put it on their website I’m not sure it would have the same effect as this long interview with Fridman. It wouldn’t have the same effect on me that’s for sure. And I’m pretty sure they would have paid those celebrities.
I think it's naive to even talk about payment; that's not how the podcast world works. The way it works is you have an assortment of bros with no background in journalism and no second thoughts about ethics, whose method for getting high-value guests is to be incredibly soft interviewers and incredibly sycophantic to their guests. The result is that rich tech CEOs get to go around getting licked up and down by these podcasters without ever having to worry about facing real scrutiny. And you don't even need a quid pro quo: Lex knows he won't book guests if he starts grilling them like a real journalist, and Zuck knows he's not going to accidentally endorse Nazis if he sticks to safe interviews like Lex's. And there really is no limit; Lex will literally serve up softball questions to Nazis given half a chance.
Of course anyone in the real world realizes how little credibility these "interviews" have.
They didn't, people just haven't been paying attention. They've been working on these avatars for almost a decade in parallel waiting for the hardware to catch up. The cartoon avatars are a function of hardware limitations.
> is close enough that they'll probably reach a version that is viable very soon.
Haven't we seen this type of thinking proven a fallacy often enough in the last decade to at least be skeptical about solving that last 1% until we actually see it?
It is definitely not perfect. It doesn't seem to track distortions of the skin on the face, such as around the mouth and on the cheeks. It tracks eye blinks, but does it track pupil dilation? It doesn't seem like it could track forehead creases, though possibly. Do the avatar's eyebrows track the real eyebrows? No way to tell, but they're an important part of expression. I rarely see the eyes moving; maybe that's an artifact of being in a dark space with only one thing to look at? Pursing of the lips is not tracked. Tongue? I don't see shoulders moving.
It's an impressive demo, yet as a repeated life experience it probably would start to feel artificial and uncanny. The video does discuss some of this. I get the impression some of it is prediction and some of it is tracking.
Casual users probably care the most, though perhaps not consciously. But all of these things are critical for understanding body language and facial expressions, and without those this is just a nice tech demo.
All things that will come with time. What makes for a viable product is if the mind is 'tricked' enough into believing it's seeing an actual person instead of a 3D replica.
> The background is either a white or black void, so they removed the entire "metaverse" from it, which I'm sure fixes most performance issues.
I doubt it's a performance issue: the 3D model of a rectilinear room has fewer vertices than are used to represent a single earlobe. It's most likely just an unimplemented feature; they also could have "cheated" by chroma-keying a 2D-texture background or rotoscoping the avatars onto another 3D room scene, but what's the point when it's just talking heads?
I think a large part of the black background choice was to hide the missing bodies; any light past chest level is going to push this into uncanny valley pretty fast because the modeling ends at the chest, like talking Greek busts.
Totally agree that it’s a stylistic and presentational decision and not a computational limitation.
Surely people have higher expectations for the environment their avatar is going to be in than a literal box?
Games often only use their highest-detail models in cutscenes, where the rest of the game can be limited and the camera can be controlled to limit how many characters are visible.
> Surely people have higher expectations for the environment their avatar is going to be in than a literal box?
Yes - they do, but you don't have to get fancy with the environment for two static avatars locked in place. What are the odds that rendering a room was dropped because it's allegedly GPU-intensive vs. it being a feature that's not fully baked for a public debut yet?
Nvidia built Maxine back in October 2020. Arun Mallya also published One-shot Talking Head Synthesis in 2021, which does the same thing except it removes the background from the rendering.
Slightly off topic, but how did Fridman get to the point that he could pull big tech C-execs and celebrities onto his podcast? He has such a monotone voice (no shade meant; I am the same way, no matter how hard I try otherwise) and - this might just be personal taste - I feel like this unedited long-form interview format is the lowest bar. I would have thought that an excellent interviewer would generally be considered one who is efficient at extracting interesting data (or can at least edit it to appear so) and can engage with & challenge the interviewee; long-form seems like the opposite of this.
> I would have thought that an excellent interviewer would generally be considered one who is efficient at extracting interesting data (or can at least edit it to appear so) and can engage with & challenge the interviewee ...
I agree with the earlier part but disagree with this part of the comment. If the interviewee is someone particularly interesting, I learn a lot more about them when the interviewer just throws softball questions to help the interviewee more or less free-associate. Most of the time when an interviewer tries to "challenge" or "ask hard questions" it comes across as cringeworthy and a waste of time that could have been better spent letting the interviewee speak their mind.
I enjoy Lex's podcast, but he's been extremely well connected from the start. His 6th, 7th, and 8th episodes were Guido van Rossum, Jeff Atwood, and Eric Schmidt, respectively.
He's done a great job of marketing himself as a non-judgemental platform for celebs. He's a super positive guy, basically lets his interviewees guide the conversation, and lets the viewer make up their own mind.
I actually like his interviews, because he typically speaks less than his interviewees and doesn't guide the viewer to a certain opinion. Even his interview with Kanye West was like this - it was pretty insightful into Kanye's state of mind and didn't need any commentary.
Lol! I guess I'm biased coming from a slightly-more-engineer perspective than Lex, but my opinion is opposite yours.
I find he drones on way too much, with way too many first-person pronouns. "I-me-me-me-I-I-I" - it's pretty narcissistic when the guest is politely sitting right there across from him.
Surprised you also brought up the Kanye interview, that's a great example of Lex getting reprimanded by the guest. Lex couldn't let go at a particular moment and kept injecting his opinion that Kanye had to slap down.
He’s a combination of good vibes (spread love), scientist, and long-form interview where you let the interesting person do most of the talking. It works with a lot of people including myself.
I’m guessing that if you’re a famous scientist you probably don’t care/mind being interviewed by most podcasters, but Lex is the next-level thing to do.
Here's a neat article from Nvidia that goes more into the bandwidth savings and some side-by-sides of ML + keypoint vs. compression[1]
Six years ago I went to an arcade called The Rec Room in Toronto, where they have a Ghostbusters AR game I got to experience with friends[2]. To this day I still have vivid memories of the experience. When it was all done, we all sat around for 15 minutes in complete silence, just recovering from the experience. I understand the skepticism about the metaverse, but after that experience there's no doubt in my mind it's the future -- you really need to experience it to get a sense of what a more polished version of this feels like.
My wife and I did the VR Star Wars game/experience at Downtown Disney about five years ago, and had a similar reaction. I've played a lot of video games, so I was just really stunned at how they integrated environmental stimuli, e.g. blasting hot air into your face when you stepped into a room filled with molten lava. My wife hasn't played a lot of games, and when Vader showed up she was genuinely terrified -- she's refused to do anything in VR since because it felt too real and too scary.
That is a very impressive demo. Does anyone know if the details of the codec, and the measurements used by the headset to generate the facial stream, have been publicly released yet?
The interview gets quite interesting nearer the end where Zuckerberg talks about his vision for the future, like training AI-backed avatars to simulate real people. Which apparently is not far off from reality - they joke about Fridman doing a podcast where he interviews an AI version of himself in the near future. Reminds me a bit of that Black Mirror episode where Miley Cyrus' character has AI versions of herself contained, or trapped, within consumer electronics.
It also occurs to me that this could be a useful technology for people with social anxiety who have to endure uncontrollable blushing when conversing in real life, or people with awkward facial tics. These could be filtered out or not even recorded before being encoded for transmission, while still giving a realistic appearance, making for a more comfortable presentation of the real-virtual self.
An entire fully "rigged" version of your face is generated from thousands of high resolution photos using a variety of facial expressions, distilled down into a model. Data from eye tracking and the cameras on the HMD are used to estimate the position of your face, lips, eyes etc, which is then fed into the pregenerated rigged model to be rendered.
That's a full Debevec-style light stage, but you can do this stuff with a couple of DSLRs and a home lighting setup with polarised light - kind of standard photogrammetry / HQ texture / normal-map generation techniques.
You can see from the normal map in the video that it's pretty detailed, but at least for a single face capture you can do this at home. I'm not sure what secret sauce they have for capturing multiple facial expressions, or what ML magic they use to morph/animate between them.
I don't see why an iPhone 15 Pro wouldn't be able to capture these scans, especially with the new "spatial video" feature, which takes a "3d" video using multiple lenses.
You'll get decent results, but it won't be as good as with DSLRs and polarised studio light - not if you want super detailed textures, the ability to relight them, etc.
Basically, for each of the identified poses you have a key, and you capture that pose (photogrammetry here); then you capture a performance and somehow identify which combination of keys and weights translates to your model. Sounds easy, but it ain't.
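A minimal numpy sketch of that keys-and-weights idea (linear blendshapes); the mesh size, key names, and weights below are made up for illustration and bear no relation to Meta's actual rig:

    import numpy as np

    # Neutral face mesh: V vertices x 3 coordinates (toy numbers).
    V = 5000
    neutral = np.zeros((V, 3))

    # Each captured "key" pose is stored as a per-vertex delta from neutral.
    rng = np.random.default_rng(0)
    keys = {
        "smile":      rng.normal(0, 1e-3, (V, 3)),
        "brow_raise": rng.normal(0, 1e-3, (V, 3)),
        "jaw_open":   rng.normal(0, 1e-3, (V, 3)),
    }

    def blend(weights):
        """Linear blendshapes: neutral + sum of weight_i * delta_i."""
        out = neutral.copy()
        for name, w in weights.items():
            out += w * keys[name]
        return out

    # The hard part the comment alludes to is estimating these weights per
    # frame from the headset's sensors; here they are just hand-picked.
    frame = blend({"smile": 0.6, "jaw_open": 0.2, "brow_raise": 0.1})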
Mark talks about a future in which you're wearing glasses all the time to seamlessly integrate meatspace and cyberspace. I dislike the idea of having cameras pointed at everything all the time, and the idea of having such tight integration between digital and physical makes me uneasy. It feels like a subtle push to help enable NFT-style content in the long term. Maybe this is a sign that I'm growing old, and it'll be completely natural for the younger generation.
The technological showcase is really cool though. Right now people are paying tens of thousands of dollars for fully rigged VTuber avatars and 3D virtual chat models, but in a few years we'll probably have AI tooling that allows you to do it yourself.
The discussion of non-human avatars made me wonder what kind of fantasy avatar Mark and Lex would use. If profile pictures are any kind of indication, I suspect a large number of tech users will opt to be cute anime girls. The days of catgirl-ification grow ever closer.
They discuss an idea of having celebrities and famous people train AI models so fans can interact with them. That seems so dangerous, arguably pushing parasocial relationships to a completely new level. It feels like it fundamentally hacks the human brain. Are we going to reach a point where most interactions are mediated through various layers of AI models? Maybe I'm being too much of a pessimist...
This looks amazing and the wow factor is probably due to the fact that it came out of the blue -- not a day went by in 2023 without the Metaverse being a favorite punching bag for tech writers.
Now someone do a PGP-style version of the Codec Avatar, where no one, Facebook included, can access the raw data, but streaming is still possible. Otherwise we get the Meta version of Worldcoin and merrily continue on the "we're completely fucked" branch of this multiverse.
I could honestly see this being the future of remote working, you still want face-to-face time, here you have it. Want to sit next to people in an "office" you can do it all virtually now. Of course you will need to add in the ability to draw on a whiteboard that's also rendered virtually, and a water cooler in the corner.
If my manager/boss told me I had to use some shitty headset from the cancerous entity that is Meta (ignoring that this also means that Meta will now have full body scans of people, as if that's not the worst idea on earth) to be in VR with my colleagues for my useless morning standups I'd quit on the spot and go do something more fulfilling with my life, for example chewing on gravel and sand.
Not willingly? Too many. I unfortunately have to work with Meta's horrifically terrible APIs, and every time an issue crops up related to them (nearly daily) I say a silent prayer for that company to disintegrate into nothingness.
Willingly, basically none, or as few as I can realistically manage. I don't have a smartphone other than a burner with GrapheneOS for my bank app, I run Linux on everything else, and my work MacBook sits idly in a corner somewhere with 0 battery in it despite protests from my manager.
Well, except the fact that you lose out on spontaneous meeting points like someone heading to the kitchen while you're heading somewhere else and suddenly one of you remembers this one thing and a conversation happens in the hallway.
Bell Labs is a famous example where it seems to have played a role (together with the people working there of course, and other variables).
> But just as important was the culture of collaboration that the company fostered. The leaders of Bell Labs understood that physical proximity could spark innovation, and they designed its facilities to bring experts together in both deliberate and unexpected ways.
> At Bell Labs’ headquarters complex in suburban Murray Hill, New Jersey, all of the laboratory spaces connected to a single, vast corridor, longer than two football fields. Great minds were bound to cross paths there, leading inevitably to spontaneous and meaningful interactions. As author Jon Gertner writes in The Idea Factory: Bell Labs and the Great Age of American Innovation, “a physicist on his way to lunch in the cafeteria was like a magnet rolling past iron filings.” Throughout the labs, employees were instructed to work with their doors open, the better to promote the free flow of ideas.
Nope. Mark your coffee breaks in software like you do in Slack, then your avatar can just be standing in the coffee room or company cafe and people will be jogged there.
It's exactly the same. You can have the avatars doing anything you want, there is no difference.
Want them to do a pass by every desk in the office when they go to the kitchen, or bathroom? Possible. Want to turn that option off to get down to work? Possible.
Guess it depends on what you consider frequent; maybe once or twice a year when working in an office setting. It always happened at small companies, and usually what was sparked in the conversation had a big impact on what the company worked on.
So for the off chance that once in 6 months you have a conversation in person that "sparks something" everyone has to suffer commutes and all the other crap that comes with WFO?
No, I think people should be able to choose between working in an office and working remotely, which, considering how many jobs are remote nowadays, you can kind of already do.
I'm not saying all companies should work in an office; I'm just sharing my viewpoint as someone who prefers being in the office to working remotely.
The biggest problem with remote communication isn't presence. It's latency. You really feel it when you have those situations where people constantly talk over each other.
Imagine being in a meeting of like 12 floating torsos: you are position-locked, but your boss enabled X-Y positioning for himself, so he floats in front of you to express his concern about how you are not a team player. If I were at Meta I would backdoor a sub-routine that increases Gaussian blur if the boss gets too close.
There are plenty of public polls on Blind and threads on HN showing the impact of remote work. I don't think we benefit from ignoring that, for many people, remote work means hardly working.
If the only way an employer can tell whether their employees are working is by forcing people back to the office, the company has bigger problems than employees not working.
I'm one of those. If I get the chance to work from home I do everything in my power to slack off and fake it. No tools or processes can stop me. I find ways to exploit everything and game the system to make it look like I'm working when I'm not.
Nobody actually eats their own dogfood. They ply you with wine while they drink grape juice, waiting to take advantage of the drunk fool.
Fucking Zoom declared mandatory RTO, which says a lot about what they sell the rest of us on.
And Ford...heh. Despite incentives, their own employees refused to buy their cars to such an extent that competitor vehicles were banned or relegated to remote parking lots.
> their own employees refused to buy their cars to such an extent that competitor vehicles were banned or relegated to remote parking lots
That's actually a good thing! It means they took it seriously, and their employees were incentivized to fix the problems so that they actually wanted to drive the company's cars.
I was a bit disappointed they didn't try swapping avatars with each other. Would've been interesting to see the effect of two people temporarily inhabiting each other's skin, albeit virtually.
I wonder if it could be useful technology in helping face transplant patients get used to their new features, in advance of the operation. A virtual mirror, with a reflection of the future.
Yeah, I don't get how they looked at Second Life and VRChat and went "no, people don't want that, people want to be entirely themselves in the virtual world".
I think both are useful. If I'm talking with my mom, or most people I'd talk to on Facebook or, say, LinkedIn, it seems like a good fit, but not in most other places. Having an avatar in public places with randoms, and the option for a realistic version in private places, seems useful.
Sure, but the same technology is what enables you to be a photo-realistic Kzin. Once you have eye, mouth, and body tracking, an effective way to transmit that information in realtime, and a way to apply those motions realistically to a model, then you can swap it out with any compatible model.
They discuss this at some point in the podcast, Zuck specifically saying he is interested in whether the future is photorealistic or more abstract and commenting on some interesting tests they have run that have mixed the two together.
Perhaps it's a feature but I wonder how jarring the lack of progression would be given the static nature of models?
If you were to chat with someone daily for a year, you'd never see their hair grow (or get cut) for example but then I suppose we never actively think of these things, just notice when they change.
Zuckerberg said that he hopes it will be a 3-to-5-minute process to scan your face.
When they achieve that, it will be quick and simple to update your scan.
The people who keep an old scan at that point are the same people who also keep an old photo as their profile picture anyway, and it’s often not due to the technology.
Some are too lazy to update profile pictures no matter how simple it is. Some people don’t know how to do it no matter how simple, but that’s more rare. And some people purposely choose to keep an old photo because they want to be seen the way that they once were rather than the way that they are now.
Me, for one. I will keep my 3D scan up to date every now and then. Say, every few months or so. Just like how I currently update my profile picture every few months on platforms that I actively use. (For example, I’ve been at my current company for about a year now, and in that time I’ve changed my Slack and company GitLab profile pics once – from the picture I chose when I joined to a new, up-to-date photo.)
Anyway. I am sure that if people stick to old scans, the Metaverse companies will eventually counter that by virtually “aging” the scanned models that people use. So that even if you don’t change your scanned model, the Metaverses will add grey hairs, wrinkles, etc to it over time.
Red Dead Redemption 2 (5 years old at this point) has natural hair growth, barber shops, hygiene, body shapes that change based on diet, plus a host of other time-oriented realistic simulation. [1][2]
It's just a matter of time before this stuff is incorporated into what we're seeing here.
> Perhaps it's a feature but I wonder how jarring the lack of progression would be given the static nature of models?
They talked about this in the recording, calling out that (not) shaving and weight fluctuations may or may not be reflected.
My own thoughts: we're partially there already, considering people hardly update their static avatar/profile/professional headshot pictures even on a weekly basis. We're inching towards the "Residual Self-image" of the Matrix universe.
I have this issue with Zoom as well: people keep their perfectly selected picture from 5 years ago for so long, then at one point they turn on their camera and you can't help but notice the difference between the two.
This is truly amazing. It's interesting that they used their arms/hands a number of times, but since those didn't show up it prevented them from communicating with gestures. Actually, I'm wondering how weird things were without being able to use their bodies. I wish they had talked about that (or about how eye contact is weirder/easier in this setting).
I'm imagining that with an actual built environment it will be so much nicer as well. This reminds me of when a friend and I visited VR spaces (in VRChat, I think?); it felt like I was visiting a Minecraft universe all over again with a friend.
This looks impressive. What's funny is I had an emotional first reaction that I'm going to be old and gray by the time full-body scanning is readily available. No longer my young, supple self. Silly, I know.
Take a few hundred photos now for later avatar building. You can already have headshots generated from uploading a variety of ~40 photos of your face and head.
Did anyone else find themselves focusing on the eyes?
Pupil diameter in most humans is affected by autonomic arousal, e.g. in conversation it provides an often unconscious signal to the listener/observer. I didn't detect any dilation or constriction of the pupils in either head image; and for me it introduced some uncanny valley-ness.
The contrast between Mark's lighter iris color and both the blackness and relative smallness of his pupils drew my attention repeatedly. The middle image of the sampled video shows some contrast between iris and pupil, but that might have been too noisy for their use. Anyway, I'd be curious what they tried here. It seems they're rendering the pupil; I wonder whether they tried varying the diameter as a fixed proportion of the iris diameter, or whether they tested edge blurring for lighter-eyed individuals to reduce contrast.
I'd be curious to learn, but I suspect that sending the "wrong" eye-dilation information (e.g. a "beady-eyed" signal triggering unconscious emotional responses) may be worse than just sending a static pupil size.
So how come they are able to do this in real time, on a headset, over the internet, yet next-gen gaming consoles don’t even get close to that level of detail?
Games do a lot more than just "render a high quality bust of a person", you have whole environments and entire systems that are interactive. Most technical demos get away with higher fidelity because of this, and when you finally see it implemented in games, they've been scaled back a lot.
I’ve had the pleasure of sitting on a network that was, in practice, not bandwidth limited and it has led me to conclude that the terrible experience in practice is caused by retail ISPs being absolute dogshit. If you can get on a really well run ISP like Fiber7 in Switzerland, or a $BigCorp network, things are much better and demos like this are no problem.
Game consoles have lots of other details to worry about, like the background (this demo is just an empty black background), NPCs and everything they need to do, game logic, physics, etc.
The latest consoles could; it's more about the software. It's also easier if you have nothing to render other than a face. https://www.unrealengine.com/en-US/metahuman looks pretty good; not many games are using UE5 yet.
I'm quite surprised by the overall negative reactions in this thread. It's like everyone's saying "why do we need email if we already have the fax machine?" when this truly feels like the future.
Sure, there's post-processing, scanning your face with this level of definition may take some work currently, and the full-screen video we see may feel very different for someone wearing those goggles, but these should all be solved as the technology improves.
You can fight it all you want, but if it's half as good IRL as this demo suggests, it's obviously here to stay.
Criticism so far seems to make the same few points:
> "Hell, now remote work is ruined. Thanks, Zuckerberg"
I actually think this may help enable remote work in the long run as companies who are unwilling to accept the current WFH/hybrid models see this as a viable compromise.
More importantly, when I call my supplier in Belgium, I'll be able to see them "face-to-face" (avatar-to-avatar?) and develop a more human relationship than just exchanging emails or phone calls.
> "Great, social relationships are going to be even worse"
If that's the use you want to make of it, sure. But it also enables you to connect with loved ones who live far away. It can be so powerful for the elderly who struggle with loneliness to feel closer to their family.
> "These two guys are terrible at conveying emotions with facial expressions"
Ad hominem notwithstanding, if anything this is an endorsement of the casting choice for a tech demo, since their expressions would be easier to replicate in the virtual world.
In classic HN fashion, I think there will have been more positive comments added by the time you finish posting your comment.
> "These two guys are terrible at conveying emotions with facial expressions"
Fridman anticipates and addresses this reaction at the 10:08 mark in the video as well, so for the people who watched it, these types of comments will unfortunately come off as unoriginal and stale. Either the people saying that didn't get that far, or they felt the need to make the comment anyway.
They are achieved, but not in an equally immersive way. If you believe "more immersive = better" for those situations, then this is an improvement. I'm inclined to believe most (non-HN) people would take that position.
The interesting part is that even with current tech this entire interview could be performed by ML-powered agents. Create text and emotional markup with LLM, clone voices with Paddle, animate pre-scanned models with SelfTalk or FaceFormer and voila.
In several years we will be unable to tell whether the content is generated by an actual media person, an ML agent, or a low-paid shadow performer.
Seems like the kind of thing that should lower transmission bandwidth once they get it sorted out. Send the body model once and just articulate the joints/muscles. Probably on the order of 1 kB/frame assuming FP16; I'm sure you could do better with compression and diffing.
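A back-of-the-envelope check of that ~1 kB/frame figure; the parameter counts below are assumptions for illustration, not Meta's actual codec layout:

    # Rough estimate of per-frame payload if only rig parameters are sent.
    face_coeffs = 256      # assumed expression/blendshape coefficients
    joints = 70            # assumed body joints
    per_joint = 6          # e.g. 3 rotation + 3 translation values
    bytes_per_value = 2    # FP16

    payload = (face_coeffs + joints * per_joint) * bytes_per_value
    print(payload, "bytes/frame")                  # 1352 bytes/frame
    print(payload * 60 / 1000, "kB/s at 60 fps")   # ~81 kB/s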
Yes. These are Quest Pro headsets, which have eye tracking, so they should be able to pick up blinks, squints, winks etc. Apple Vision Pro should also be capable of this.
Apple Vision Pro previews have shown off not-quite-photo-real avatars [1] that feel like they're intentionally toned down slightly into 'good but obviously artificial CG animation' territory to avoid uncanny-valley issues.
I listened to the entire hour when this got posted earlier today. I still have no idea what the point is here. Does anyone have a better use case for this technology than was presented in this video?
While these avatars seem pretty accurate, a conversation between two completely robotic humans might not be the best showcase of how far off we are from “feels like a real conversation”
I hate this so much. It's technically awesome, but I want nothing to do with it. If I want face-to-face time, I'll get face-to-face time. My fear is that this is going to be shoved down our throats and WFH will become work-while-being-monitored with your Meta headset on. There may be cool applications for disabled people or isolated folks like the elderly. But I do not want to be on this all day.
Also, I think it's pretty telling that the people who do want to be in VR all day, like dedicated VRChat users or the handful of people who actually work in VR collaboration apps, generally go for either vaguely humanlike abstract avatars (see: any screenshot of Bigscreen Beyond), or wildly un-ordinary robots, impractical anime people, favorite cartoon characters, etc (see: all of VRChat).
> I think it's pretty telling that the people who do want to be in VR all day [...] generally go for either vaguely humanlike abstract avatars
I'd wager that's more a product of technological limitations (and overall awkwardness) than a matter of demographics. Video games and other fully 3D environments tend to avoid photorealism at all cost, because it's compute-expensive and ugly. By comparison, simple cartoon characters, blobs or robots are inoffensive and perfectly usable abstractions. Even this "Avatar Encoder" is 'cheating' by only rendering a relatively static portion of your face. It would be almost unusable in a VRChat-style environment where dynamic lighting and shadows are concerned.
Zuckerberg's "vision" is so hilariously and transparently: "I want to build a new internet so I can own it all from the very start, inject ads and tracking into places never dreamed of before and take a cut from everyone who participates in this brave new world!"
Mark Zuckerberg testifying in Congress reminds me of the Star Trek movie First Contact, when Data was starting to feel anxious around the Borg, so he disabled his emotion chip.
Honestly though he handled that extremely well and made it backfire.
Congressional hearings are purely for political grandstanding. The low seat was countered with a cushion, the dumb questions answered with direct, unemotional answers: "We sell ads, Senator." Nothing came out of the entire process except a few politicians ending up with egg on their faces.
My impression is that the device isn't able to track all of the face's subtle movements so the avatars come across as seeming relatively expressionless. For example, I noticed that Lex's and Mark's eyebrows don't seem to move as much as you might expect given the emotions communicated by their voices. I assume this is either because the device literally restricts the movements of the eyebrows (perhaps they're pressed down under the headband) or it just isn't able to track them that well.
Lex Fridman is a Russian-American computer scientist, podcaster, and writer. He is an artificial intelligence researcher at the Massachusetts Institute of Technology, and hosts the Lex Fridman Podcast, a podcast and YouTube series.
Lex Fridman has also done original research on robotics and computer vision detection of facial expressions. Here is one of his papers; there are several others on related areas.
It's not a range test demo. It's a real conversation with real people who aren't prone to melodrama.
As mentioned in the video by Lex, it's the subtleties that make all the difference. I'm astonished with the accuracy of the blinking, mouth movements, subtle cheek variations, etc. It seems more accurate than the realtime feed from my webcam. The only thing I wouldn't like about it is having to wear a headset in order to experience it.
This seems like an incredible waste of resources that will invariably lead to work attire invading my remote work life. Veto.
The tech is cool but I'm wary of continuing to evolve the Internet to reward people for their physical appearance rather than their intellectual contributions.
Virtually dressing every day seems like a lot of work compared to just using an avatar that has your favorite three or four variations baked in, not to mention the basic assumption there that everyone wants to be a realistic (or realistically shaped) human and not a robot or an anime exaggeration of the human frame.
This is actually impressive, and while not perfect, is close enough that they'll probably reach a version that is viable very soon.