Phone cameras don’t have multiple lenses working together in a way that lets you combine them into a single giant virtual lens. The technique OP is referring to, aperture synthesis, is already used by the largest ground-based radio telescopes.
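To put rough numbers on that (a back-of-the-envelope sketch; the λ/D diffraction limit is the standard formula, and the VLA figures of 25 m dishes and a ~36 km maximum baseline in its widest configuration are the commonly quoted ones):

```python
# Compare diffraction-limited angular resolution (theta ~ lambda / D)
# for a single dish vs. an interferometer baseline. Illustrative values.

import math

def resolution_arcsec(wavelength_m: float, aperture_m: float) -> float:
    """Diffraction limit theta = lambda / D, converted to arcseconds."""
    theta_rad = wavelength_m / aperture_m
    return math.degrees(theta_rad) * 3600

# A single 25 m VLA dish observing at 21 cm (the hydrogen line):
print(f"single 25 m dish: {resolution_arcsec(0.21, 25):.1f} arcsec")

# Dishes spread across a ~36 km maximum baseline resolve detail as if
# they were one 36 km aperture (for resolution, not light gathering):
print(f"36 km baseline:   {resolution_arcsec(0.21, 36_000):.2f} arcsec")
```

Note the baseline buys you resolution, not sensitivity; collecting area still comes only from the physical dishes, which is one place the "giant virtual lens" framing breaks down.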
Well, a lot of phones do have multiple sensors and lenses. For example, the iPhone's computational depth effect uses multiple cameras for a single shot.
That’s qualitatively different from the kind of combining that happens with, e.g., the VLA. Radio dishes record the incoming wave with its phase intact, so their signals can be correlated coherently into one virtual aperture; a phone fuses already-formed intensity images after the fact. The computational tricks of phone photography don’t apply here.
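Here’s a toy NumPy sketch of the coherent part (all values are made-up illustrative numbers, not anything from a real correlator): each antenna effectively records the field with its phase, so multiplying and averaging two antennas’ signals yields a fringe whose phase encodes the source direction. A phone sensor only ever records intensity, so there is no analogous signal to correlate.

```python
import numpy as np

rng = np.random.default_rng(0)

c = 3.0e8          # speed of light, m/s
freq = 1.4e9       # observing frequency, Hz (~21 cm hydrogen line)
baseline = 1000.0  # antenna separation, m
theta = 1e-4       # source offset angle on the sky, rad

# Geometric delay: the wavefront reaches the far antenna slightly late.
tau = baseline * np.sin(theta) / c

# Sample the received field as an analytic (complex) signal, as a
# correlator with quadrature sampling effectively does.
t = np.arange(0, 1e-6, 1 / (8 * freq))

def noise() -> np.ndarray:
    """Independent complex receiver noise for one antenna."""
    return 0.5 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))

v1 = np.exp(2j * np.pi * freq * t) + noise()
v2 = np.exp(2j * np.pi * freq * (t - tau)) + noise()

# Cross-correlate: the phase of the averaged product survives the
# noise and recovers the delay, i.e. the source position on the sky.
visibility = np.mean(v1 * np.conj(v2))
print(f"measured fringe phase: {np.angle(visibility):+.3f} rad")
print(f"expected 2*pi*f*tau:   {(2 * np.pi * freq * tau) % (2 * np.pi):+.3f} rad")
```

The two printed values agree up to the averaged-down noise; that phase is the quantity aperture synthesis is built on, and it simply isn’t present in a stack of finished photos.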
For additional background, optical interferometry telescopes are already in use; see the VLTI run by the European Southern Observatory in Chile, a shared facility that combines the four VLT unit telescopes and several smaller auxiliary telescopes.