Sure, but I think directness doesn't matter here -- what matters is just whether a URL that originates in a Sydney call chain ends up in a GET received by some external server, however many layers (beyond the usual packet-forwarding infrastructure) intervene between the machine the Sydney instance is running on and the final recipient.
Yes. And chat outputs normally include footnoted links to sources, so clicking a link produced by Sydney/Bing would be normal and expected user behavior.
I think the question comes down to whether Sydney needs the entirety of the web page it's referencing or whether it can get by with a much more compressed summary. If Sydney needs the whole page in real time, it could multiply worldwide web traffic severalfold if it (or Google's equivalent) becomes the dominant search engine.
Directness is the issue. Instead of BingGPT accessing the Internet directly, it could be pointed at a crawled index of the web and be unable to access anything directly.
Not if the index is responsive to a Sydney/Bing request (which I imagine might be desirable for meeting reasonable 'web-enabled chatbot' UX requirements). You could test this approximately (but only approximately, I think) by running the test artursapek mentions in another comment. If the request reaches the remote server 'in real time' -- meaning faster than an uninfluenced crawl would hit that brand-new URL (I don't know how to estimate that number) -- then Sydney/Bing is 'pushing' the indexing system to grab that URL, which counts as Sydney/Bing issuing a GET to a remote server, albeit with the indexing system intervening. If Sydney/Bing 'gives up' before the remote URL receives the request, then we at least don't have confirmation that Sydney/Bing can initiate a GET to an arbitrary URL 'inline'. But even that wouldn't be firm confirmation that Sydney/Bing can only access data that was previously indexed independently of any request Sydney/Bing issued -- just a lack of confirmation of the contrary.
Edit: If you could guarantee that whatever index Sydney/Bing can access will never index a URL (e.g. if you knew Bing respected noindex), then you could strengthen the test by inputting the same URL to Sydney/Bing after the amount of time you'd expect the crawler to hit your new URL. If Sydney/Bing never sees that URL, then it seems more likely that it can't see anything the crawler hasn't already hit (and hence indexed, and hence accessible without a Sydney-initiated GET).
(MSFT probably thought of this, so I'm probably worried about nothing.)