
Happy to show you some code.

You cannot set up Bugout telemetry in your codebase without first defining your consent flow.

We have a library of consent mechanisms that you can chain together like lego blocks to build these flows. For example, our Python consent library is here: https://github.com/bugout-dev/humbug/blob/main/python/humbug...
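To make the "lego blocks" idea concrete, here is a minimal sketch of how chainable consent mechanisms might compose. The names (`environment_opt_out`, `chain`) are illustrative only, not the actual Humbug API:

```python
import os
from typing import Callable

# A consent mechanism is just a zero-argument callable returning bool.
ConsentMechanism = Callable[[], bool]

def environment_opt_out(var: str) -> ConsentMechanism:
    """Consent is withdrawn whenever the variable is set to a truthy value."""
    def mechanism() -> bool:
        return os.environ.get(var, "").lower() not in ("1", "true", "yes")
    return mechanism

def chain(*mechanisms: ConsentMechanism) -> ConsentMechanism:
    """Consent requires every mechanism in the chain to agree."""
    def mechanism() -> bool:
        return all(m() for m in mechanisms)
    return mechanism

# An always-true block chained with an environment opt-out:
consent = chain(lambda: True, environment_opt_out("BUGGER_OFF"))
consent()  # True unless BUGGER_OFF is set in the environment
```

Because the chain is conjunctive, adding a stricter mechanism can only ever narrow consent, never widen it.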

Consent is calculated at the time that each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.
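The send-time evaluation described above can be sketched as follows (the function and report shape are hypothetical, for illustration only):

```python
from typing import Callable

ConsentMechanism = Callable[[], bool]

def send_report(report: dict, consent: ConsentMechanism) -> bool:
    """Evaluate consent at send time, so a revocation takes effect for
    the very next report rather than at some later sync point."""
    if not consent():
        return False  # report is dropped, never transmitted
    # ... serialize and transmit the report here ...
    return True

granted = True
send_report({"error": "ValueError"}, lambda: granted)  # sent
granted = False  # user revokes consent
send_report({"error": "KeyError"}, lambda: granted)    # dropped
```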

We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers. This work is still in the early stages, but here's v0.0.1 of the Python stack trace deidentifier: https://www.kaggle.com/simiotic/python-tracebacks-redactor/e...

Besides Python, we also support JavaScript and Go, and we added Java support last week.

I would really love to hear any feedback you have.



Hey, thanks for replying!

I like the design for your consent pipeline, and the code itself is very readable.

I have some further questions:

1. You say:

> You cannot set up Bugout telemetry in your codebase without first defining your consent flow

How is it enforced? Is it just an API limitation that I could work around by defining my consent block as below?

  def much_consent_so_informed() -> ConsentMechanism:
    def mechanism() -> bool:
      return True
    return mechanism

That is, are you relying entirely on trust and/or contractual obligations, or do you have some means of enforcing that the user of your SDK isn't cheating?

> Consent is calculated at the time that each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.

Correct. I like how you think about this. I assume the SDK user will be ultimately responsible for prompting the end-user for consent; I wonder if you have any "best practices" documents for the software authors, so that they don't have to reinvent respectful consent flow UX from scratch?

> We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers.

I don't see any code in that Kaggle notebook you linked (I'm not very familiar with Kaggle, I might be clicking wrong). Should I assume your approach is based on training a black-box ML model? Or do you use some heuristics to identify what data to cut?


Thanks for looking at the code, and for your feedback!

Here is a recipe for adding error reporting (reporting of all uncaught exceptions) in a Python project. The highlighted line shows that, when you instantiate a reporter, you have to pass a consent mechanism: https://github.com/bugout-dev/humbug/blob/main/python/recipe...

We allow you to create a consent mechanism that always returns true:

  consent = HumbugConsent(True)

But even with that mechanism, we ultimately respect BUGGER_OFF=true: https://github.com/bugout-dev/humbug/blob/main/python/humbug...

Of course, someone can always create their own subclass of HumbugConsent which overrides that check. We don't have a good way to prevent this, nor would we want to restrict anyone's freedom to modify code.

re: Kaggle and stack trace deidentification

We started by crawling public GitHub issues for Python stack traces and built up a decent-sized dataset of these: https://www.kaggle.com/simiotic/python-tracebacks

Our emphasis is on building simple programs that we can reasonably expect to run on any client without using an exorbitant amount of CPU or memory. For this reason, we aren't using black-box ML models. Rather, we analyzed the data and came up with some simple regex-based rules for deidentifying stack traces in our v1 implementation.
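As a rough illustration of what a regex-based rule might look like (this is not Humbug's actual redaction logic), here is one that strips user-specific filesystem paths from traceback frames, keeping only the base filename:

```python
import re

# Matches the path inside a CPython traceback frame line:
#   File "/home/alice/project/app.py", line 3, in main
PATH_PATTERN = re.compile(r'File "([^"]+)"')

def redact_traceback(trace: str) -> str:
    """Replace the directory portion of each frame's path with a
    placeholder, since home directories often contain usernames."""
    def replace(match: re.Match) -> str:
        filename = match.group(1).rsplit("/", 1)[-1]
        return f'File "<redacted>/{filename}"'
    return PATH_PATTERN.sub(replace, trace)

redact_traceback('  File "/home/alice/project/app.py", line 3, in main')
# '  File "<redacted>/app.py", line 3, in main'
```

A real implementation would also need rules for Windows-style paths, values embedded in exception messages, and so on, which is presumably why a dataset of real-world tracebacks is useful.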

We are in the process of doing this for more languages and building it into a proper deidentification library that can be imported into any runtime: Python, JavaScript, Go, etc.

Apologies for the link not working. It seems I had to publish a version of the notebook. This link should work now: https://www.kaggle.com/simiotic/python-tracebacks-redactor

Actually, we started this work on a livestream if you're interested in watching: https://youtu.be/TFKe614Ml1M

Again, really appreciate your engagement and feedback. Thank you!



