I would guess you'd have better luck parsing the HTML and extracting the href attributes of any <link> tags, the src attributes of any <script> tags, and so on, then pattern matching only against those extracted values rather than against the raw markup.
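
Something along these lines might work as a starting point. It is only a rough sketch using Python's standard-library html.parser; the URL_ATTRS table, the ResourceExtractor class, and the extract_urls / matching_urls helpers are names I made up for illustration, and the choice of which tags to include (I added <img> as a guess at what "etc." should cover) is an assumption you'd adjust for your case.

```python
import re
from html.parser import HTMLParser

# Hypothetical table: tag name -> attribute expected to hold a URL.
# Extend it with whatever other tags matter to you.
URL_ATTRS = {"link": "href", "script": "src", "img": "src"}


class ResourceExtractor(HTMLParser):
    """Collects URL-bearing attribute values from the tags in URL_ATTRS."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def extract_urls(html):
    """Return the attribute values found, in document order."""
    parser = ResourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.urls


def matching_urls(html, pattern):
    """Apply the regex only to the extracted values, not to the raw markup."""
    regex = re.compile(pattern)
    return [url for url in extract_urls(html) if regex.search(url)]
```

You would then call, say, matching_urls(page_source, r"\.js$") with whatever pattern you were previously running over the whole document. The advantage is that text in paragraphs, comments, or inline scripts that merely looks like a URL never reaches the regex.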