Is it possible to control when and which variables are copied to the GPU? With solutions like this, automatically copying big arrays to the GPU can take more time than the kernel execution itself. I'd like to use GPU programs this way, but I also need low latency.
It is! ComputeSharp doesn't copy buffers automatically, and this is done on purpose to give you control over exactly when your data is copied back and forth. You can either use normal resource types (e.g. ReadWriteBuffer<T>) and manually copy data to/from them before and after running a kernel on the GPU, or you can create transfer buffers (e.g. UploadBuffer<T> and ReadBackBuffer<T>) and use them for more fine-grained control over every copy operation, including the allocation of the temporary buffers used for those copies.
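To make the transfer-buffer approach concrete, here is a minimal sketch. It assumes the ComputeSharp 2.x API as I remember it (GraphicsDevice.GetDefault, the Allocate* methods, CopyFrom/CopyTo, and the [AutoConstructor] attribute); the exact names may differ in other versions, so check them against the release you use. The MultiplyByTwo shader and the buffer length are just illustrative.

```csharp
using System;
using ComputeSharp;

// Assumption: ComputeSharp 2.x. Member names here may differ in other versions.
GraphicsDevice device = GraphicsDevice.GetDefault();

const int length = 1024; // illustrative size

// Allocate all three buffers once, then reuse them for every dispatch,
// so no temporary buffers are created per copy.
using ReadWriteBuffer<float> gpuBuffer = device.AllocateReadWriteBuffer<float>(length);
using UploadBuffer<float> upload = device.AllocateUploadBuffer<float>(length);
using ReadBackBuffer<float> readback = device.AllocateReadBackBuffer<float>(length);

// Write the input directly into the upload buffer's memory.
Span<float> input = upload.Span;
for (int i = 0; i < input.Length; i++)
{
    input[i] = i;
}

// Each copy is explicit: nothing moves unless you ask for it.
gpuBuffer.CopyFrom(upload);                        // CPU -> GPU
device.For(length, new MultiplyByTwo(gpuBuffer));  // run the kernel
gpuBuffer.CopyTo(readback);                        // GPU -> CPU

// Results are now readable from the readback buffer.
Console.WriteLine(readback.Span[3]);

// Hypothetical example shader: doubles every element in place.
[AutoConstructor]
public readonly partial struct MultiplyByTwo : IComputeShader
{
    public readonly ReadWriteBuffer<float> buffer;

    public void Execute()
    {
        buffer[ThreadIds.X] *= 2;
    }
}
```

If you run several kernels back to back, you can skip the readback between them and only copy the final result, which is where most of the latency savings would come from.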
I'm currently writing some updated docs to go over all these details, since the ones currently on GitHub only refer to the previous v1 release of the library.
One solution would be to create persistent buffers for the data on the GPU, then map them and write directly to those buffers from the CPU. There are potentially big downsides though, since mapped memory doesn't behave like normal CPU memory (it is typically uncached or write-combined, so reading it back from the CPU can be very slow).
Your variables would have to be accessed via pointers, but C# has robust pointer support, and the compiler could probably rewrite the variable accesses into pointer accesses for you (JSIL did this).