I can. But I do this by visualizing the taps as a group. I don't have to label them with a number. I can see them in my mind, and that is how I recall them. If I tap with any sort of rhythm, I can see that rhythm in the way the taps are laid out in my mind, and this helps with recollection.
If I want to translate this knowledge into a number, I need to count the taps I am seeing in my head. At that point I do need to think of the word for the number.
I could even do computations on these items in my mind, imagine dividing them into two groups for instance, without ever having to link them to words until I am ready to do something with the result, such as write down the number of items in each group.
But that's like how I memorize sheet music: as visual groups and subgroups of notes. And yet sheet music is formally linguistic nevertheless. So in such debates I think a tricky pitfall to avoid is overlooking that all data structures are essentially linguistic as well.