Evolving Reservoirs for Meta Reinforcement Learning (arxiv.org)
56 points by PaulHoule on Dec 25, 2023 | 4 comments



I wonder if it would be better to use randomly initialized transformer blocks inside the reservoir computing (RC) setup, rather than RNNs.
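Something like this, maybe (a minimal PyTorch sketch of a frozen random transformer block as the reservoir, with only a linear readout trained; all of it is my assumption, not anything from the paper):

    import torch
    import torch.nn as nn

    # A frozen, randomly initialized transformer encoder layer used as a
    # fixed "reservoir": only the linear readout is ever trained.
    class TransformerReservoir(nn.Module):
        def __init__(self, d_model=64, n_heads=4, n_actions=4):
            super().__init__()
            self.reservoir = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads, batch_first=True)
            for p in self.reservoir.parameters():
                p.requires_grad = False  # reservoir weights stay random
            self.readout = nn.Linear(d_model, n_actions)  # trained part

        def forward(self, obs_seq):
            # obs_seq: (batch, time, d_model) sequence of observations
            h = self.reservoir(obs_seq)
            return self.readout(h[:, -1])  # act from the last state

    policy = TransformerReservoir()
    logits = policy(torch.randn(1, 10, 64))  # one 10-step episode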


Honestly, I fail to understand this paper. I might vaguely grasp what it aims to do, but I'm totally lost when it comes to the framework and the objectives of its experiments.


>>>capture features of environments shared between generations to bias and speed up lifetime learning. ... propose a computational model for studying a mechanism that can enable such a process

through looking at how

>>>a reservoir encodes the environment state before providing it to an action policy.
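If I'm reading that right, the loop looks roughly like this (a minimal NumPy sketch using an off-the-shelf echo state update; all the names and the update rule are my guesses, not the paper's actual code):

    import numpy as np

    rng = np.random.default_rng(0)
    n_obs, n_res, n_act = 8, 200, 4

    # Fixed random weights: in the paper, these are the part shaped by
    # evolution across generations rather than by lifetime learning.
    W_in = rng.normal(0.0, 1.0, (n_res, n_obs))
    W = rng.normal(0.0, 1.0 / np.sqrt(n_res), (n_res, n_res))

    # The action policy reads the reservoir state, not the raw observation;
    # only this readout would be trained within a single lifetime.
    W_out = np.zeros((n_act, n_res))

    def step(state, obs):
        # A standard echo-state update (my choice, not necessarily theirs).
        return np.tanh(W @ state + W_in @ obs)

    state = np.zeros(n_res)
    obs = rng.normal(size=n_obs)
    state = step(state, obs)           # the reservoir encodes the env state...
    action = np.argmax(W_out @ state)  # ...before the policy acts on it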

-

This instantly reminds me of the genetic wonder that is the monarch butterfly's multi-generational migration.

--

>> The monarch is the only butterfly known to make a two-way migration as birds do

>> Monarchs have four generations in a year

>> The first generation is tasked with migrating from Mexico to the southern United States

>> The second and third generations emerge and lay many eggs in the north but do not have any role in migration

>> The fourth generation must trek south towards the forested mountains in central Mexico

But here is the real kicker: the 4th generation is a super generation, and each of these generations is encoded with the "reservoir" of the state/role it plays.

>> Aging milkweed and other nectar sources trigger the birth of the super generation and their epic migration. They live eight times longer than their parents and grandparents - up to eight months - and travel 10 times farther

Milkweed is their only food source and egg-laying host plant, and its #1 enemy is glyphosate (RoundUp), which is killing off that entire natural ecosystem.

--

Anyway, the butterfly is able to encode a multi-generational, alternating 'role' by triggering which genes to express in response to external stimuli, such as the aging of the milkweed, the weather, and so on.

So, if these external triggers can present a particular 'reservoir' from the genome, maybe you could associate external patterns with a particular 'reservoir' of knowledge to find context faster...
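In toy form (everything here is made up, just to make the analogy concrete):

    import numpy as np

    # Map an external "stimulus" pattern to the nearest stored
    # context reservoir, the way milkweed aging cues a role.
    reservoirs = {
        "migrate-south": np.array([1.0, 0.0, 0.2]),
        "breed-north":   np.array([0.1, 1.0, 0.0]),
    }

    def select_reservoir(stimulus):
        # Cosine similarity against each stored pattern; pick the best.
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max(reservoirs, key=lambda k: cos(stimulus, reservoirs[k]))

    print(select_reservoir(np.array([0.9, 0.1, 0.3])))  # -> migrate-south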

Especially when the GPT Store gets massive, to the point where you have many layers of GPTs woven together (entangled, but that's a bad word)... you might want to recognize the patterns of GPTs one would weave into a tapestry, based on the "state of the reservoir", which really is "the incentive"; then the appropriate reservoir provides the appropriate set of GPTs for the outcome.

But the reservoirs are created by reinforcing positive feedback for particular GPTs that stack well together. This will be an interesting problem to solve in the GPT Store. It seems like a GPT<--$API$-->GPT 'peering connection' fee, like an ISP's, might end up in a commercial GPT agent market?

(a very long-winded way of saying PAID TEMPLATES :-)


They should be training on Ferret:

>> We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. To unify referring and grounding in the LLM paradigm, Ferret employs a novel and powerful hybrid region representation that integrates discrete coordinates and continuous features jointly to represent a region in the image. To extract the continuous features of versatile regions, we propose a spatial-aware visual sampler, adept at handling varying sparsity across different shapes. Consequently, Ferret can accept diverse region inputs, such as points, bounding boxes, and free-form shapes. To bolster the desired capability of Ferret, we curate GRIT, a comprehensive refer-and-ground instruction tuning dataset including 1.1M samples that contain rich hierarchical spatial knowledge, with 95K hard negative data to promote model robustness. The resulting model not only achieves superior performance in classical referring and grounding tasks, but also greatly outperforms existing MLLMs in region-based and localization-demanded multimodal chatting. Our evaluations also reveal a significantly improved capability of describing image details and a remarkable alleviation in object hallucination.
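To make the "hybrid region representation" concrete, here's how I read it (a rough sketch; plain mean-pooling stands in for Ferret's learned spatial-aware sampler, and all the names are mine, not Ferret's actual code):

    import torch

    # Guess at the shape of the idea: discrete coordinate tokens plus
    # pooled continuous region features, summed into one region token.
    def hybrid_region_token(box, feat_map, coord_embed):
        # box: (x1, y1, x2, y2) normalized to [0, 1); feat_map: (C, H, W)
        bins = coord_embed.num_embeddings
        idx = (torch.tensor(box) * bins).long().clamp(0, bins - 1)
        discrete = coord_embed(idx).mean(dim=0)  # discrete coordinate part
        C, H, W = feat_map.shape
        x1, y1, x2, y2 = box
        region = feat_map[:, int(y1 * H):int(y2 * H) + 1,
                             int(x1 * W):int(x2 * W) + 1]
        # Ferret uses a learned spatial-aware visual sampler here;
        # mean-pooling is just my stand-in for it.
        continuous = region.mean(dim=(1, 2))
        return discrete + continuous  # one region token for the LLM

    coord_embed = torch.nn.Embedding(100, 256)
    token = hybrid_region_token((0.1, 0.2, 0.5, 0.6),
                                torch.randn(256, 32, 32), coord_embed)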

-

I surmise this will make drone-based AI image context for behavior extremely powerful, especially once the MLLM's spatial-sitrep handling is precise enough for autonomous movement, and ultimately for decision-making WRT interacting with humans (positive interactions and negative interactions).

Is it just me, or doesn't this MLLM seem particularly useful for flying objects with vision?

However, the more I think about it, the more it seems like reservoirs can be applied to more than just situational, reactionary "deja vu", which is what I think they are getting at (think LUCY, where she ODs on a nootropic and suddenly knows martial arts; also, Lucy is a remake of Lawnmower Man, if you hadn't noticed).

But I like the idea of reservoirs as a way to know which GPT agents can work well together toward an outcome, based on weaving other GPTs together into a tapestry with similar relationships.

Forgot, I wanted to add:

https://imgur.com/gallery/bRCfnFv

The logo for the new DHS's 'Department of Entangled Agent Technologies and Humanity'

:-)



