100 structures with 100+ amino acids each, so it's not quite as bad. Part of the folding information is contained within a distance of a few amino acids, while the rest (the harder part and the crux of the problem) comes from amino acids that are farther apart.

But yeah, compared to other fields, the size of training/test sets is sometimes pretty small in ML for life sciences.