Hacker News | hmft's comments

Heyo, I was part of the team that worked to make this a requirement for federal agencies. Happy to answer any questions.


Yeah! Lots of NASA sites use Let's Encrypt certs. Some examples here [https://crt.sh/?Identity=%25nasa.gov&iCAID=16418].
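
If you want to run the same search for another domain: the `%25` in that link is just a URL-encoded `%`, which crt.sh appears to treat as a wildcard, and `iCAID` appears to filter by issuing CA. A minimal Python sketch that rebuilds the link. The function name is hypothetical, and the CA ID is copied from the link above rather than looked up.

```python
from urllib.parse import urlencode


def crtsh_query_url(identity_pattern: str, issuer_ca_id: int) -> str:
    """Build a crt.sh search URL (hypothetical helper).

    urlencode() percent-encodes the '%' wildcard in the pattern as '%25'.
    """
    params = {"Identity": identity_pattern, "iCAID": str(issuer_ca_id)}
    return "https://crt.sh/?" + urlencode(params)


print(crtsh_query_url("%nasa.gov", 16418))
# → https://crt.sh/?Identity=%25nasa.gov&iCAID=16418
```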


Hi there. First, you've got to start from the understanding that no one is maintaining a holistic list of federal .gov websites (or at least not one I can get hold of). So, before scanning, we source several public datasets to gather potential .gov hostnames. This was recently described in depth by 18F [https://18f.gsa.gov/2017/01/04/tracking-the-us-governments-p...]. In addition to Censys, GSA's DAP, and the End of Term Web Archive data, our team performs authorized scans of federal agency networks [https://www.whitehouse.gov/sites/default/files/omb/memoranda...], so we mine that data too. This currently nets ~90k hostnames, of which only about a third are responsive.
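
The gathering step amounts to taking the union of several hostname sources. A minimal sketch, assuming each source has already been exported as a plain list of hostnames. The function names and sample lists are hypothetical; the real pipeline does this through domain-scan, whose gatherer interfaces aren't shown here.

```python
from typing import Iterable


def normalize(hostname: str) -> str:
    """Trim whitespace, lowercase, and drop a trailing root dot ("NASA.gov." -> "nasa.gov")."""
    return hostname.strip().lower().rstrip(".")


def merge_gov_hostnames(*sources: Iterable[str]) -> list[str]:
    """Union several hostname lists, keep only *.gov names, return them deduplicated and sorted."""
    seen: set[str] = set()
    for source in sources:
        for raw in source:
            host = normalize(raw)
            if host.endswith(".gov"):
                seen.add(host)
    return sorted(seen)


# Made-up inputs standing in for Censys, DAP, and End of Term exports.
censys = ["www.nasa.gov", "NASA.gov."]
dap = ["nasa.gov", "analytics.usa.gov"]
end_of_term = ["example.com", "www.nasa.gov"]
print(merge_gov_hostnames(censys, dap, end_of_term))
# → ['analytics.usa.gov', 'nasa.gov', 'www.nasa.gov']
```

Normalizing before deduplicating matters because the same host often shows up with different casing or a trailing dot across datasets.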

For both hostname gathering and HTTPS scanning, we use 18F's domain-scan [https://github.com/18F/domain-scan], which orchestrates the scan and provides parallelization. We use the pshtt scanner to hit each hostname at both the root and www, over both http and https; this typically takes 36-48 hours to burn through. Once scanning is finished, we throw the CSV data into MongoDB, then generate the report via LaTeX. The trickiest part is probably report delivery, which is a mostly manual process for Very Government reasons.
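
The per-hostname check described above boils down to four endpoints. A rough Python sketch of that fan-out, assuming a caller-supplied `probe` function (hypothetical; pshtt's real checks for redirects, HSTS, and certificate validity are far more involved) and using a thread pool as a stand-in for domain-scan's own parallelization:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List


def endpoints(root_hostname: str) -> List[str]:
    """Return the four URLs checked per hostname: {http, https} x {root, www}.

    Expects a root hostname such as "nasa.gov" (not "www.nasa.gov").
    """
    hosts = [root_hostname, "www." + root_hostname]
    return [f"{scheme}://{host}/" for scheme, host in product(("http", "https"), hosts)]


def scan_all(hostnames: List[str], probe: Callable[[str], object], workers: int = 16) -> Dict[str, object]:
    """Run probe(url) on every endpoint of every hostname using a thread pool."""
    urls = [url for host in hostnames for url in endpoints(host)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(probe, urls)))


# Stub probe for illustration only: treats any https:// URL as "good".
print(scan_all(["nasa.gov"], probe=lambda url: url.startswith("https://")))
# → {'http://nasa.gov/': False, 'http://www.nasa.gov/': False, 'https://nasa.gov/': True, 'https://www.nasa.gov/': True}
```

At ~90k hostnames that is ~360k endpoint checks, which helps explain why a full pass takes on the order of days even when parallelized.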

Most of the bureaucratic challenge is overcome because we've already been scanning these executive branch agencies for the past several years, so we're a known quantity, though we do modify our user-agent to clearly point back to us. On the whole, agencies have been very supportive; the data on Pulse bears that out. Agencies really do want to do the right thing for citizens.
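
Pointing the user-agent back at the scanning team is just a request header. A minimal sketch with Python's standard library; the thread doesn't give the actual user-agent string, so `SCANNER_UA` and its URL are placeholders, and `build_scan_request` is a hypothetical helper.

```python
from urllib.request import Request

# Placeholder identifying string; the real scanner's user-agent is not given in the thread.
SCANNER_UA = "example-https-scanner/1.0 (+https://example.com/scanner-info)"


def build_scan_request(url: str) -> Request:
    """Build (but do not send) a GET request that identifies the scanner via User-Agent."""
    return Request(url, headers={"User-Agent": SCANNER_UA}, method="GET")


req = build_scan_request("https://www.nasa.gov/")
# urllib stores header names via str.capitalize(), so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```

Including a URL in the user-agent gives a site operator who spots the traffic in their logs a direct way to find out who is scanning and why.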


I appreciate you taking the time for an insightful and detailed response. The link you provided, "Tracking the U.S. government's progress on moving to HTTPS" [1], gave a lot of the details I was looking for.

You might consider mentioning it in this blog post as it does offer interesting background information and technical details.

As a specific example, the actual Python scripts used to generate the data [2] and the data itself [3] give a great deal of insight into the question I had.

[1] - https://18f.gsa.gov/2017/01/04/tracking-the-us-governments-p...

[2] - https://github.com/GSA/https/tree/master/compliance

[3] - https://github.com/GSA/https/tree/master/compliance/data


Heyo, ^ blogger here. Happy to chat.


And 18F/GSA employee and open source collaborator here. =) Can definitely help answer any questions folks have.


Are the HTTPS report generation/assembly tools available/open-sourced too?

I'd love to be able to use this as a starting point. Thanks.


No, the code for report generation hasn't been opened up yet, mostly because it won't work without dependencies that aren't public yet. I think that will change in the next few months; open-sourcing is definitely the intention. It will live at https://github.com/dhs-ncats when released.

