Very true. Here's a paper from last year that measures the performance of searches on several implicit search tree layouts on CPU and GPU: https://algoparc.ics.hawaii.edu/pubs/18-ipdps.pdf. You can see the results in figures 7 and 9 (pages 9 and 10).
tl;dr: in an array of 500 million elements, if you're doing more than ~5 million queries (1%), it pays off to spend the time to construct a more efficient representation of the data and search on that. On a GPU, you need ~15 million queries (3%).
tl;dr: in an array of 500 million elements, if you're doing more than ~5 million queries (1%), it pays off to spend the time to construct a more efficient representation of the data and search on that. On a GPU, you need ~15 million queries (3%).