This looks like a fine-tune of the classic zero123 (https://github.com/cvlab-columbia/zero123). I’m excited to check out the quality improvements.

Though 3d model synthesis is one use case, I found the less advertised base reprojection model to be more useful for gamedev at the moment. You can generate a multiview spritesheet from an image, and it’s fast enough for synthesis during a gameplay session. I couldn’t get a good quality/time balance to do the same with the 3d models, and the lack of mesh rigging or animation combined with imperfections in a fully 3d model tends to break the suspension of disbelief compared to what players are used to for full 3d. I’m sure this will change as the tech develops and we layer more AI on top (automatic animation synthesis is an active research area).
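
In case it helps anyone, the spritesheet loop is roughly the following (a sketch only; zero123-style models condition on relative polar/azimuth/radius offsets, and generate_view here is a hypothetical stand-in for whatever sampling entry point your checkpoint exposes):

    # Sketch: render N evenly spaced azimuth views of one object image,
    # then paste them into a single horizontal spritesheet.
    # generate_view() is hypothetical; zero123-style models condition on
    # (delta_polar, delta_azimuth, delta_radius) relative to the input view.
    from PIL import Image

    def make_spritesheet(input_img: Image.Image, n_views: int = 8) -> Image.Image:
        views = []
        for i in range(n_views):
            azimuth = 360.0 * i / n_views  # degrees around the object
            views.append(generate_view(input_img,
                                       delta_polar=0.0,
                                       delta_azimuth=azimuth,
                                       delta_radius=0.0))
        w, h = views[0].size
        sheet = Image.new("RGBA", (w * n_views, h))
        for i, v in enumerate(views):
            sheet.paste(v, (i * w, 0))
        return sheet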

If you’re interested in this you might also want to check out deforum (https://github.com/deforum-art/deforum-stable-diffusion) which provides even more powerful camera controls on top of stable diffusion designed for full scenes rather than single objects.


> Stable Zero123 produces notably improved results compared to the previous state-of-the-art, Zero123-XL.

Zero123-XL might be state of the art for image to 3D but I'm not sure it is state of the art if your ultimate goal is text-to-3D. MVDream (https://mv-dream.github.io/) performs quite a bit better at that in my opinion.


How does one apply for a job with the internal A16Z teams experimenting with this?


Ask Llama, of course. Showing that you are willing to ask an LLM is a great sign in a candidate!


It’d be fun if they added Easter eggs to it, just like how companies advertise jobs in the browser console.


I ran this for most of today in the background and I have some thoughts:

The quality is good and it's giving you all of the maps (as well as the .blends!). It seems great for its stated goal of generating ground truth for training.

However, it's very slow/CPU-bound (go get lunch), so in its current state it probably doesn't make sense for applications with users waiting behind the computer.

Additionally, the .blend files are so unoptimized that you can't even edit them on a laptop with texturing on. The larger generations will OOM a single run on a reasonably beefy server. To be fair, these warnings are in the documentation.

With some optimization (of the output) you could probably do some cool things with the resulting assets, but I would agree with the authors that the best use case is where you need a full image set (diffuse, depth, segmentation) for training, where you can run this for a week on a cluster.

To hype this up as No Man's Sky is a stretch (NMS is a marvel in its own right, but has a completely different set of tradeoffs).

EDIT: Although there are configuration files you can use to create your own "biomes", there is no easy way to control this with an LLM. You might be able to hack GPT-4 functions to get the right format for it to be accepted (see the sketch below), but I wouldn't expect great results from that technique.
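
If anyone wants to try that hack, the shape would be something like this (the make_biome schema below is invented for illustration and would have to be mapped onto the repo's real config fields by hand; uses the 2023 openai 0.x client, which reads OPENAI_API_KEY from the environment):

    # Sketch: coerce GPT-4 into emitting a structured biome config via
    # function calling. The schema fields are hypothetical stand-ins,
    # not the repo's actual configuration keys.
    import json
    import openai

    resp = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=[{"role": "user",
                   "content": "A foggy alpine biome with sparse pine trees"}],
        functions=[{
            "name": "make_biome",
            "parameters": {
                "type": "object",
                "properties": {
                    "terrain": {"type": "string"},
                    "tree_density": {"type": "number"},
                    "fog_density": {"type": "number"},
                },
                "required": ["terrain", "tree_density", "fog_density"],
            },
        }],
        function_call={"name": "make_biome"},
    )
    config = json.loads(resp.choices[0].message.function_call.arguments)
    # ...then translate `config` into the repo's real config format by hand.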


> ... on a reasonably beefy server.

Out of curiosity, what kind of cpu / ram are you meaning here?

Asking because I have some spare hardware just sitting around, so am thinking... :)


A typical server would be 8xA100 with dual CPUs (128 cores total) and 2TB of RAM. I doubt you have something like this sitting around.


That's only a "typical server" for some companies with specialised needs.

Lots of places have servers with 128-256GB of ram around though.


That doesn't track in my experience, but that depends heavily on how you define typical. Just going off total servers installed, what I see getting racked are typically 1+TB RAM, and anything less would be seen as low density (i.e. cost inefficient). We've got a whole batch in the 512GB range that are coming up on EOL. Dual socket is definitely less common, but not rare either.


In my corner of academia, 128gb is by far the most common RAM per node on something billed as a compute cluster (on random desktops and laptops it’s of course much lower than 128gb). I have seen a few 1tb+ nodes but they are rare.


I know nothing about server hardware but I'm curious how that works.

I have a decent PC (AMD 3990X 64-Core Processor with 256 GB of RAM), I'd have installed better/more components but that seemed to be the best you could do on the consumer market a few years ago when I was building it.

Are they using the same RAM I'm using with a different motherboard that just supports more of it? Or are they using different components entirely?

Apologies for what I'm sure is a very basic question but it would be interesting to learn about.


It's the same RAM chips (though error tolerance features are prioritized over pure speed). You would just need a server motherboard to support that many sockets, and a server chassis to support that motherboard, and a rack to support the cooling needs of that chassis.

Here's what a lowly 256GB server looks like. For a TB just imagine even more sticks:

https://i.ebayimg.com/images/g/dnIAAOSwcy1kFNqq/s-l1200.jpg


The typical Intel offering is 1.5-2TB per socket. Sockets scale up to 8 (though the price increase from 2 to 4 is very steep). The memory itself is registered ECC DIMMs (which are even lower cost than consumer DIMMs/unbuffered ECC), but to get to 1.5TB density you need load-reduced (LRDIMM) modules, which give 2x capacity at a higher price.
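
To put rough numbers on the 1.5TB figure (hedging on exact platform details): a Xeon Scalable socket has 6 memory channels x 2 DIMM slots = 12 slots, so 12 x 128GB LRDIMMs = 1.5TB per socket; the 2TB configurations need 256GB modules and the higher-memory CPU SKUs.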


Interesting. The servers at places I work with are in the 128-256GB ram range.

The only real exception to that would be for database or reporting servers, which sometimes might have higher ram requirement (eg 384GB).

That's pretty much it though.


But there aren't many people who can use these servers for amusement.


Proof of concepts don’t have to be optimized. That’s an exercise for the reader. ;-)


Definitely hard to keep up with the tech, even if you're deep in it.

I presented a 3D gameplay hack of this at the recent Blockade meetup: https://youtu.be/TfRJeedTeOs

The metric depth model I used (ZoeDepth) is quite new -- most previous models were inverse relative depth, with poor scaling properties, especially for artistic worlds.
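
(For anyone who wants to reproduce this, ZoeDepth is a two-line torch.hub load; the sketch below assumes the entry points from the isl-org/ZoeDepth README are still current:)

    # Sketch: metric depth (in meters) from a single RGB image with ZoeDepth.
    import torch
    from PIL import Image

    zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
    zoe = zoe.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    img = Image.open("frame.png").convert("RGB")
    depth = zoe.infer_pil(img)  # numpy array of per-pixel depth in meters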

But now there is a much better depth model coming from Intel called Depth Fusion, which they are adding to the Blockade API and also open sourcing (!)...

Also worth checking out what's possible with SD ControlNet: https://twitter.com/BlockadeLabs/status/1634578058287132674


For a couple of years I've been compiling an (admittedly worse) fuzzy-parsed version of this with GPT-3 (and friends) to generate and play out executable Javascript lore in an RPG engine.
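
The core loop is less magic than it sounds -- roughly this (a sketch with the era's Completion API; run_in_sandbox is a hypothetical stand-in for the engine's sandboxed JS evaluator, and the engine API in the prompt is made up):

    # Sketch: generate executable JS "lore" and hand it to the engine.
    # run_in_sandbox() is hypothetical; never eval model output unsandboxed.
    import openai  # 0.x client; reads OPENAI_API_KEY from the environment

    prompt = (
        "// RPG engine API: spawn(name, x, y), say(npc, text), give(npc, item)\n"
        "// Write a short JS script where a blacksmith greets the player:\n"
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0.7,
    )
    run_in_sandbox(resp.choices[0].text)  # hypothetical sandboxed evaluator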

So, I am very surprised I'm only hearing about Inform now -- clearly not hanging out in the right circles.

Does anyone have any recommendations for keeping abreast of projects like this so we can leverage the best of open source and not reinvent the wheel?

I just found out about NarraScope and I hope to attend the next one. Are there other things I should check out? The Twitter firehose doesn't work too well...


Wow, if you've been working on interactive narrative and you don't know about Inform and (I assume) interactive fiction (IF)...you are in for a treat.

Probably a reasonable overview of the community is the IF forum, intfiction.org. But there are many others.

A good overview of the games that have been written is ifdb.org and ifcomp.org. For example, https://ifdb.org/viewlist?id=k7rrytlz3wihmx2o

Things to search (include "interactive fiction" in each):

Tools: Inform, TADS, Dialog, Twine, Choicescript, Ink

People: Emily Short, Andrew Plotkin, Jon Ingold, Christine Love, Aaron Reed, Chris Crawford, Porpentine...there's so many.

Have fun!


Also, do yourself a favor and read the Inform Designer’s Manual, Edition 4. It’s dated (I think it came out in 1998) and pertains to Inform version 6, but it is still very relevant (modern Inform uses I6 under the hood), and it’s a lovely piece of technical writing.


In case you get an itch to play IF games and don't know which one to choose: picking at random from this list of the top 50, as voted by the community, has never failed me so far:

https://ifdb.org/viewcomp?id=1lv599reviaxvwo7


https://intfiction.org/ is the main community hub these days for parser based game development, maybe some of the topics there might tickle your fancy


Also https://IFWiki.org is exactly what it sounds like. A Wiki of Interactive Fiction stuff.


Wikipedia perhaps.

Twitter etc are great for learning about new things. I don't think they are good for learning about specialized topics that have a long history.


DALL-E can't help in art direction but (somewhat surprisingly) GPT-3 plus a human scanning Pinterest can be an art director. I've spent hundreds of OpenAI dollars on this because it works.

To me, that means a version of DALL-E which can do art direction without me in the loop at all is not too far off.


What does the GPT-3 element do?


As a lone indie dev making an AI-generated FOSS Final Fantasy, I can confirm it's possible :D. That era of games really fits well with the kind of content that AI is capable of generating today.

I've been primarily tackling the synthesis from the GPT-3 + 3D animation side but the 2d prerender idea is genius. I'll need to try to hook it up one of these weekends. The only hard part, it seems, is finding the navmesh.


I want to hear more about this AI-generated FOSS Final Fantasy. Is there anything you've shared about it yet?


Their HN profile has 2 links: https://news.ycombinator.com/user?id=avaer


Thanks! I thought I had checked their profile and didn’t see anything, but I probably just missed it.


If you're interested in an open source job trying to do this in the browser using GPT-3 + Codex synthesis, we are hiring.

(disclosure: I actually worked with the parent before but we haven't talked in forever :D)


Having worked on adjacent problems every day for 10 years, maybe my opinion counts for something (or maybe not).

But I think the top game of the 2030s will be something like AI Minecraft or AI Steam, where everything, including the very rules of the game, is generated from a structured data set optimized for the player.

And I think the "metaverse" (as much as I loathe the term) is going to go down as the labeled training set for bootstrapping this, just like the open web was the catalyzing training set for the (already admittedly magical) AIs we have today.

Further, I think Facebook won't be the one to design this, because that is not what their share price incentivizes.


Meta fundamentally doesn't understand what the Metaverse is going to be.

They picture a virtual space where you, a human, will go to interact with other humans you know and love.

They are thinking MMO Metaverse.

What it's actually going to be is a place where you go mostly alone into a virtual space that's being actively curated for and in response to your interactions with it.

Most people aren't going to care about hanging out for hours in VR with Aunt Patty whose political rants they can barely tolerate on FB.

But being able to bring back dead pets or loved ones to interact with, or have simulacrums of celebrities who want to be your friends, or experiences that are being tailored on the fly specifically for you, using eye tracking and pulse to re-engage you the moment your interest flags?

The Metaverse is going to be the place where AI comes alive in ways it will be prohibitively expensive to do in real life with robotics and manufacturing. And ultimately what that can offer will beat out all other media.

We may invite loved ones into our curated spaces now and then, but it's mostly going to be a solitary (and yet intimately social - AI therapists/friends are going to end up well beyond Eliza) place.

Meta is a company that for over a decade wasted the data gathering opportunity of finding out what people DON'T like, to the point that it's damaged society.

They're just not going to "get it" in order to succeed as long as Zuck is CEO, in spite of very talented engineers.


As someone who's never played Animal Crossing, this sounds a lot like my impressions of the game, only it's more dynamic and customized to you.

>simulacrums of celebrities who want to be your friends, or experiences that are being tailored on the fly specifically for you

If Aunt Patty spends her time hanging out with and unloading her crappy opinions on AI Paula Deen instead of spewing them publicly on FB it could be for the better. Although it seems like it could only make people's filter bubbles much worse.


While custom-tailored games may be interesting, it seems like such a thing would be socially fragmenting and isolating. People need shared experiences to relate to each other. I'm not sure the world would be a better place if people gradually have fewer and fewer shared experiences.


It seems like we could have both; people can generate worlds by some combination of automation and manually tweaking parameters or mods, then they can share that world with their friends, and visit worlds created by their friends. Some people may have esoteric taste, but the internet is good for finding people who share your esoteric taste, for better and for worse.


That begs the question of how those friendships came to exist in the first place. I’m of the last generation that had a largely analog childhood, where friendships were the natural outcome of being bored and in the physical presence of others. If our technology reaches a place where people basically never have to be bored, nor do they have to be physically present with others to alleviate boredom, then I wonder on what basis anyone would ever develop deep relationships outside of family.


Seems like a pretty natural consequence of robotic companions as well. Why put up with all the complexity of a human relationship?


Unfortunately, very few large companies truly operate to make the world a better place when making money depends on going the other direction.


I wonder if carving out individual experiences would prevent users from displaying their use of said entertainment to create social status. So I’m coming at it wondering if people value the status that their form of entertainment provides more than the experience itself. On the other hand, maybe we’ll continue to just observe the death of the “main stream” as we all slip into our own niche communities, each with its own complex system of status signifiers?

This stuff really leaves me pretty puzzled. I’m a culture guy, English grad. Art is not supposed to behave like this!!!


> But I think the top game of the 2030s will be something like AI Minecraft or AI Steam, where everything, including the very rules of the game, is generated from a structured data set optimized for the player.

Steam and Minecraft are both social by nature. People very often want to play with other people. It's like the joke I always make about AI feeds on Netflix: they recommend the same thing to everyone because the AI realized that everyone wants to talk about the thing they saw more than they want to enjoy seeing the thing. Humans are social creatures.


A Calvinball AI would be a pretty interesting novelty, but I think good games are usually focused on simple rulesets. Not sure an AI is really needed there. Maybe you just mean hyper-tuning drop rates and modifiers and such?

I'm sure AI will eventually have a huge impact on the art, narrative, and engineering of games, though, so maybe you're correct that will bleed into game design as well.


I think with eye/attention tracking it will be absolutely amazing what 3D content can be generated and optimized by AI/ML to maintain the interaction feedback loop. I'm not sure it will be a good thing (at least for those who didn't grow up with it)... much like FB doom scrolling is a problem for the over 50 set.


Wow! That's a crazy thought. Eye tracking tightly coupled (i.e., in a ~realtime feedback loop) with PG/ML/AI seems ... powerful, or scary, or both. Something along the lines of a computer controlled lucid dream, sprinkle in a handful of whatever the equivalent would be of blinking banner ads, product placements, or subliminal messages, etc. and my mind spirals out of control imagining how that would play out.


I can't help but remember the setting in Otherland. Probably my favorite take on what a future with many connected virtual worlds could be like.


It can't do that from scratch yet; these kinds of optimizations require nontrivial mathematical understanding and informed judgement of trade-offs.

But it is capable of knowing your function is an inverse square root and inserting a known optimized version.
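
(I assume that's a nod to the famous Quake III trick; for reference, the bit-twiddling version ported to Python looks like this:)

    # The classic fast inverse square root (magic constant 0x5f3759df),
    # ported via struct bit-punning; one Newton-Raphson step refines it.
    import struct

    def fast_inv_sqrt(x: float) -> float:
        i = struct.unpack("<I", struct.pack("<f", x))[0]  # float bits as uint32
        i = 0x5F3759DF - (i >> 1)                         # magic initial guess
        y = struct.unpack("<f", struct.pack("<I", i))[0]  # bits back to float
        return y * (1.5 - 0.5 * x * y * y)                # one Newton step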

