Hadoop isn't Oracle. If you need to write a file to the data lake, you write it. If you need to read something out, you do that too. There's no schema to satisfy before you write: it's an actual file system with actual files in it, like your shared drive. Except that it's distributed, handles very large files by splitting them into blocks spread across many machines, and lets you run queries against those files where they sit (through engines such as Hive or Spark) instead of downloading all 2TB to access a single row.
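
To make the "just write it, then query it" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop: a local temporary directory stands in for the data lake, and the file name `events.csv`, its columns, and the helper `find_row` are all made up for illustration. What it shows is the shape of the workflow: the write imposes no structure, and structure is only applied when the file is read.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the data lake: a local temp directory, NOT real HDFS.
lake = Path(tempfile.mkdtemp())

# Writing: just put the file there. Nothing validates its structure.
events = lake / "events.csv"  # hypothetical file name and contents
events.write_text("id,user,amount\n1,ana,9.50\n2,bo,12.00\n3,cy,3.25\n")


def find_row(path, key, value):
    """Stream the file row by row and return the first matching row.

    The column names are taken from the file's header at read time,
    so the "schema" only exists when someone reads the file.
    """
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            if row[key] == value:
                return row  # stop early; the rest of the file is never read
    return None


match = find_row(events, "id", "2")
print(match)
```

On a real cluster the write would go through an HDFS client and the lookup through a query engine running next to the data, but the division of labour is the same: files go in as they are, and the reader decides how to interpret them.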