Embeddings are a mapping of some type of item (pictures, words, sentences, etc.) to points in a high-dimensional space (e.g. a few hundred dimensions), such that items that are close together in this space are similar in some way.
The general idea is that the items you are embedding may vary in very many different ways, so trying to map them into a low-dimensional space based on similarity isn't going to capture all of that variation (e.g. if you wanted to represent faces in a 2-D space, you could only use two similarity measures, such as eye colour and skin colour). However, a high enough dimensional space can represent many more axes of similarity.
Embeddings are learnt from examples, with the learning algorithm trying to map similar items close together in the embedding space, and dissimilar items far apart. For example, one could generate an embedding of face photos based on visual similarity by training it with many photos of each of a large number of people, having the embedding learn to place all photos of the same person close together, and farther away from photos of other individuals. If you then had a new photo and wanted to know who it shows (or who it most looks like), you'd compute the embedding for the new photo and find which other photos it is close to in the embedding space.
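As a minimal sketch of that last step, the lookup is just a nearest-neighbour search over embedding vectors. Here `embed` stands for a hypothetical face-embedding model (not a specific library) that returns unit-length vectors:

```python
import numpy as np

# Hypothetical: embed(photo) returns a unit-length vector (e.g. 128-D)
# produced by a face-embedding model trained as described above.

def nearest_face(query_vec, gallery_vecs, gallery_labels):
    """Return the label of the known photo closest to the query photo.

    gallery_vecs:   (N, D) array of embeddings for known photos
    gallery_labels: list of N identities, one per row of gallery_vecs

    With unit-length vectors, cosine similarity is just a dot product.
    """
    sims = gallery_vecs @ query_vec           # (N,) similarity scores
    return gallery_labels[int(np.argmax(sims))]
```

In practice you'd use an approximate nearest-neighbour index rather than a brute-force dot product, but the idea is the same.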
Another example would be to create an embedding of words, trying to capture their meanings. The common way to do this is to take advantage of the fact that words are largely defined by use/context, so you take a large body of text and embed the constituent words such that words that occur physically close together in the text end up close together in the embedding space. This works surprisingly well, and words that end up close together in the embedding space can be seen to be related in meaning.
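For instance, word2vec-style embeddings can be trained in a few lines with gensim (a sketch, assuming gensim 4.x parameter names and a toy corpus; a real model would need far more text):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# window=5 means "physically close" is within 5 words either side;
# vector_size is the dimensionality of the embedding space.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

# Words used in similar contexts end up close together:
print(model.wv.most_similar("cat", topn=3))
```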
Word embeddings are useful as an input to machine learning models/algorithms where you want the model to "understand" the words: it helps if words with similar meanings have similar representations (i.e. their embeddings are close together) and words with dissimilar meanings have distant ones.
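One simple (and admittedly crude) way to use them as model input is to represent a sentence by the average of its word vectors; this is a sketch, where `word_vectors` is assumed to be a word-to-vector mapping such as the one produced above:

```python
import numpy as np

def sentence_features(tokens, word_vectors, dim=100):
    """Average the embeddings of the words in a sentence, skipping
    unknown words, to get a fixed-size feature vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# The resulting vectors can be fed to any downstream classifier;
# sentences with similar meaning tend to get similar features.
```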