We actually don't have a signature database; it isn't needed.
WordPress has a predictable path structure, and we use that to extract theme and plugin slugs (textual IDs). For plugins that don't import JS or stylesheets, we look for other signatures.
Once we have the slugs, we look them up in the official WordPress theme/plugin repository and get all the info we need (plugin descriptions, icons, authors, etc.).
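To make the path-based extraction concrete, here is a minimal sketch. It is not the actual implementation: the function name extract_slugs and the content_dir parameter are made up for illustration, and it assumes assets are served from /<content_dir>/plugins/<slug>/ or /<content_dir>/themes/<slug>/. The repository lookup is left out.

```python
import re
from urllib.parse import urlparse


def extract_slugs(urls, content_dir="wp-content"):
    """Return plugin and theme slugs found in a list of asset URLs.

    Assumption: assets live under /<content_dir>/plugins/<slug>/ or
    /<content_dir>/themes/<slug>/. content_dir is a hypothetical parameter
    for sites that rename the default wp-content directory.
    """
    pattern = re.compile(
        r"/" + re.escape(content_dir) + r"/(plugins|themes)/([^/]+)/"
    )
    found = {"plugins": set(), "themes": set()}
    for url in urls:
        # Match against the path only, so query strings such as ?ver=5.1
        # cannot interfere.
        path = urlparse(url).path
        match = pattern.search(path)
        if match:
            kind, slug = match.groups()
            found[kind].add(slug)
    return found
```

Passing content_dir="assets" would cover a site that has moved its content directory, as long as the rest of the layout stays the same.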
It looks like it is running regex against the raw page HTML.
I would guess you'd have better luck parsing the html and extracting the href attributes of any <link> tags, src attributes of <script> tags, etc. Then pattern matching only against that.
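A rough sketch of that suggestion, using only Python's standard library. The class and function names are invented here, and treating <img> as part of the "etc." is my own assumption. The point is that text in the page body that merely looks like a plugin path never reaches the pattern matcher.

```python
from html.parser import HTMLParser


class AssetURLCollector(HTMLParser):
    """Collects href values of <link> tags and src values of <script>/<img> tags."""

    # Tag name -> attribute that carries the asset URL.
    WANTED = {"link": "href", "script": "src", "img": "src"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WANTED.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                self.urls.append(value)


def collect_asset_urls(html):
    """Return asset URLs in document order; body text and <a> links are ignored."""
    collector = AssetURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

The resulting list is what you would then pattern match against, instead of the raw HTML.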
Hi! Thanks for the bug report. We already detect a non-default WP_CONTENT_DIR (berghs is using assets instead of wp-content). But in this case it was our WordPress detection code that failed.
I'd be interested in the maintenance strategies you have in place (if any).
I assume that for plugins that don't output styles or scripts you use other methods, maybe matching some HTML output etc., so you've probably hard-coded a lot of stuff for some popular plugins.
How have you set up your tests, and how do you plan on knowing when a certain plugin stops emitting the signature you're checking for? Most probably an E2E test with a local theme containing everything. Care to share tech specifics on this part?
We can only identify user-facing plugins that make themselves known through signatures or JS/image/CSS imports. We do this in an automated fashion by exploiting the predictable folder structure of WordPress, so no maintenance is needed here.
There are some very popular plugins (Yoast SEO, Jetpack, W3 Total Cache) that don't import additional files. For these we have hardcoded patterns (fewer than 100). We do not have anything in place for checking whether these patterns break.
We could automate creating a WordPress installation, installing the plugin we want to check, triggering a wp detective scan and then checking the results. But I am not sure it is worth the engineering effort.
You assume correctly, that is one of the methods we are using.
We also look for signatures in the code that certain plugins output. For example, Yoast SEO usually adds an HTML comment at the end of the page identifying itself.
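For illustration only, a hardcoded signature check could look something like the sketch below. The SIGNATURES table and detect_by_signature are hypothetical names. The pattern assumes the Yoast comment contains the text "Yoast SEO plugin" and that its repository slug is wordpress-seo; both should be verified against real output before relying on them.

```python
import re

# Hypothetical signature table: repository slug -> compiled pattern.
# Assumption: Yoast SEO identifies itself in an HTML comment containing the
# text "Yoast SEO plugin". [^>]* keeps the match inside a single comment.
SIGNATURES = {
    "wordpress-seo": re.compile(r"<!--[^>]*Yoast SEO plugin", re.IGNORECASE),
}


def detect_by_signature(html, signatures=SIGNATURES):
    """Return the sorted slugs whose signature pattern appears in the HTML."""
    return sorted(
        slug for slug, pattern in signatures.items() if pattern.search(html)
    )
```

Keeping the patterns in a single table like this would also make it easier to re-run them against saved sample pages and notice when one stops matching.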
It usually detects it but we cannot show its icon, real name and description (only the plugin slug).
We collect this information from the WordPress plugin repository, but it only contains free plugins. Paid plugins are not listed in a central place where we could query their metadata.
Yes, that is correct. We could attempt to detect all plugins by pinging known files, but that would involve indexing all available free plugins and doing around 40,000 HTTP requests on the site you want to check ;) Glad you like it!
Not really, indexing the top 100 most popular plugins would be enough, and you don't have to ping all files of each one; a single file for each one would be enough (so 100 requests).
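The one-file-per-plugin idea could be sketched roughly as follows. Everything here is hypothetical: probe_plugins, the probe_files mapping, and the fetch_status callable (which takes a URL and returns an HTTP status code) are made up for illustration, and the network call is injected so the logic can be exercised without hitting a real site.

```python
from urllib.parse import urljoin


def probe_plugins(site_url, probe_files, fetch_status, content_dir="wp-content"):
    """Return slugs whose single probe file answers with HTTP 200.

    probe_files:  hypothetical mapping of slug -> one file known to ship
                  with that plugin (for example "readme.txt").
    fetch_status: caller-supplied callable, url -> HTTP status code.
    """
    base = site_url.rstrip("/") + "/"
    detected = []
    for slug, relative_file in probe_files.items():
        # One request per plugin, so 100 indexed plugins means 100 requests.
        url = urljoin(base, f"{content_dir}/plugins/{slug}/{relative_file}")
        if fetch_status(url) == 200:
            detected.append(slug)
    return detected
```

A real fetcher would need to handle sites that return 200 for missing files or that block direct access to plugin directories, so treat a hit as a hint rather than proof.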
How extensive is the database? Just free themes/plugins from one source, or popular themes from several sources? Paid ones too?
Any namespace clashes where you have to dig deeper to tell which theme or plugin it is?
Were you able to fully automate the creation and updating of the signature database?