An embedding is the result of converting tokens (text or otherwise) into a vector of floating-point numbers. The vector captures semantics, so that words with similar meanings sit close to each other when their distance is measured with a metric like cosine similarity.
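Here's a toy sketch of that distance metric (the vectors here are made up; real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction (similar),
    # values near 0 mean the vectors are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" for illustration only
cat = np.array([0.9, 0.1, 0.8, 0.2])
kitten = np.array([0.85, 0.15, 0.75, 0.25])
truck = np.array([0.1, 0.9, 0.05, 0.95])

print(cosine_similarity(cat, kitten))  # close to 1.0 -> similar meaning
print(cosine_similarity(cat, truck))   # much lower -> dissimilar
```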
An embedding is a numerical representation of a complex object.
In an image, each pixel is a dimension, but it doesn't have any meaning on its own - you need to look at the rest of the image to understand that the pixel is part of a cat. An embedding is a way to represent that meaning. Think of it as a "summary" of an image or document.
I'm not a data scientist or anything, but as far as my understanding goes, you can think of it as a list of numbers (floats, integers, whatever), each number representing a feature of the thing you're trying to represent as a vector (an image, a video, whatever).
For example, you can create an embedding for an image using a neural net that has been trained to take an image and output a vector of, say, 1024 floats representing the content of the image. This vector is a lossy, compressed version of the image.
And (if I'm understanding correctly) vectors that are near each other (in a mathematical sense) represent inputs that are "near" each other (in a conceptual sense).
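To make the image example above concrete, here's a rough sketch using a stock pretrained ResNet-50 from torchvision with its classifier head removed, so it outputs a feature vector instead of class scores. Its vector happens to be 2048 floats rather than 1024, and "cat.jpg" is a placeholder path:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ResNet-50 with the final classification layer replaced by an
# identity, so the forward pass returns the penultimate feature vector.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

# Standard ImageNet preprocessing for this model
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("cat.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    embedding = model(preprocess(image).unsqueeze(0)).squeeze(0)

print(embedding.shape)  # torch.Size([2048]) -- the image's embedding
```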
So... a vector database can be organized to quickly retrieve objects with particular characteristics, without rigidly defining what those characteristics are.
Yup, I think you got it perfectly. Just a small note: yes, you aren't rigidly defining what those characteristics are while finding similar embeddings (i.e., nearest vectors under some distance metric), but those characteristics are implicitly encoded in the model that creates the embeddings, depending on how the model was trained.
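For anyone curious, this is roughly what "finding the nearest vectors" means under the hood. A real vector database (FAISS, pgvector, etc.) builds an approximate index instead of brute-force scanning everything, but the idea is the same (the data here is random, just to show the mechanics):

```python
import numpy as np

def nearest(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    # Normalize everything so a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # Indices of the k most similar stored vectors, best first
    return np.argsort(scores)[::-1][:k]

# 10,000 fake 128-dimensional embeddings standing in for a real index
db = np.random.randn(10_000, 128).astype(np.float32)
query = np.random.randn(128).astype(np.float32)
print(nearest(query, db))  # e.g. [4231  877 9002]
```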