Show HN: PyScribe – A Python library to make print debugging more efficient (github.com/alixander)
53 points by alixander on Jan 3, 2015 | hide | past | favorite | 18 comments



Small suggestion: make the output format configurable or at least choose simpler defaults. I think the lib itself is pretty nice and usable, but the produced logs aren't.

The log output currently contains a lot of 'human readable' text and almost-full sentences. In my experience, when you skim a logfile manually (which for me happens a lot), that really slows you down: you constantly have to skip over the filler to get to the core, which in this case is the values of variables.

For example:

- "From line 9: x is the int 5": "From" is imo completely unnecessary. "line" could be left out as well; a number followed by a colon is standard enough, and if you know it's a line number, adding the word adds nothing to the usefulness of the log while making the lines longer. Arguably one would rather call it "an int" and not "the int", but I'm not a native English speaker so I'm not sure about that; it does sound weird to me, though. Furthermore, while I'm at it, I'd drop the article altogether. I would prefer something as terse as "9: x int 5"

- "bar is the str foo at beginning of for loop at line 12": it really took a careful reading to grasp what this says. I don't immediately know what I'd pick instead; something like "12: begin for: bar str foo". Also, maybe you should stick to standard Python printing as on the REPL and use 'foo' in quotes, so it's immediately obvious it's a string (you seem to be doing that when printing dicts?)
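To make the suggestion concrete, here's a minimal sketch of what a configurable format could look like. This is a hypothetical API; format_entry is not part of PyScribe:

```python
# Hypothetical formatter: a format string with named fields lets users
# choose between verbose and terse output.
def format_entry(fmt, line, label, type_name, value):
    """Render one log entry from a user-supplied format string."""
    return fmt.format(line=line, label=label, type=type_name, value=value)

VERBOSE = "From line {line}: {label} is the {type} {value}"
TERSE = "{line}: {label} {type} {value!r}"  # !r quotes strings, as on the REPL

print(format_entry(VERBOSE, 9, "x", "int", 5))       # From line 9: x is the int 5
print(format_entry(TERSE, 9, "x", "int", 5))         # 9: x int 5
print(format_entry(TERSE, 12, "bar", "str", "foo"))  # 12: bar str 'foo'
```

The `!r` conversion gives the REPL-style quoting the commenter asks for, without any extra logic for strings vs. other types.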


Excellent suggestions, will be adding format configurations next!


That being said, be careful to not make it too difficult to use.

Configurable is good, as long as there are sensible defaults that just work.


IMO good defaults will emit text people can reliably parse/interpret/transform with other tools.

This helps keep the project simple and focused, since users with specialized needs don't need to ask you for specialized features.


I think it is unfortunate this does not support python 3. I think we have already passed a tipping point in the migration from python 2. Just look at the python 3 readiness[0]. I believe it might already be more interesting for beginners to learn python 3 over python 2 now.

All new packages should think about supporting python3, or at least be ready to support it with a lot of __future__ imports.

[0]: http://py3readiness.org/


I'm not exactly sure that it doesn't work on Python 3; I had only tested it on 2.7, so it might (there aren't any external dependencies). I'll probably test it on Python 3 and safely include that in the supported versions soon.


It wasn't too bad, looks like you had a couple of calls to xrange() (now just range()) and were using the result of a filter as a list. I've submitted a PR to add 3.4 support to it, and to add it to Travis.
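For context, the two incompatibilities mentioned look roughly like this when made portable (an illustrative sketch; the actual PR may differ):

```python
# Python 2-only code that breaks on 3:
#   for i in xrange(n): ...            # xrange() no longer exists
#   evens = filter(is_even, vals)[0]   # filter() now returns an iterator
# Portable equivalents:

def first_even(values):
    # list() materializes the filter iterator, so indexing works on 2 and 3.
    evens = list(filter(lambda v: v % 2 == 0, values))
    return evens[0]

total = 0
for i in range(3):  # range() exists in both 2 and 3
    total += i

print(first_even([1, 2, 3, 4]))  # 2
print(total)                     # 3
```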


I noticed a couple of minor issues and thought I'd post them here rather than making issues on github since the author is reading.

First, desugaring doesn't treat the shebang line of a file properly for executable python scripts on Linux. If the first line is, for example, '#!/usr/bin/env python', then the desugared file has three imports at the top of the file and the shebang line after those imports. It would be nice to preserve the shebang line as the first line (and maybe also to make the desugared file executable if the input file was executable).
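A minimal sketch of the shebang fix (a hypothetical helper, not PyScribe's actual code): check the first line before prepending the generated imports.

```python
def prepend_imports(source, imports):
    """Insert import lines after the shebang, if one is present."""
    lines = source.splitlines(True)  # keep line endings
    if lines and lines[0].startswith("#!"):
        return lines[0] + "".join(imports) + "".join(lines[1:])
    return "".join(imports) + source

out = prepend_imports("#!/usr/bin/env python\nprint('hi')\n", ["import sys\n"])
# shebang stays on line 1; the generated imports follow it
```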

Second, the desugaring doesn't ignore commented-out lines, as I discovered after commenting out a line like

    #ps.p("my test: " + foo)
before noticing the optional label parameter. Trying to run a file with that line or desugar it results in an IndexError.
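If the desugarer scans lines textually (an assumption based on the symptom), a guard like the following would skip commented-out calls; should_desugar is a hypothetical helper:

```python
def should_desugar(line):
    """True for lines that contain a live (uncommented) pyscribe call."""
    stripped = line.lstrip()
    return not stripped.startswith("#") and "ps." in stripped

print(should_desugar('ps.p("my test: " + foo)'))       # True
print(should_desugar('    #ps.p("my test: " + foo)'))  # False
```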


Ah, you're right. Totally forgot about that. Thanks for pointing it out!


This will probably sound very negative, and while I think there's a need for a nice library to help bring to light what happens to data in your code, I don't think the pyscribe way of doing it (using sugar to transform a piece of code into one with more logging) is very useful in the long run.

For one, it seems the only type of change you can watch for is assignment to top-level variables. None of the mutable built-ins (and I confirmed this by trying pyscribe) can be watched for all changes: mutating a list later with .append() doesn't log anything, nor does changing a value in a dictionary. You also can't watch a particular key in a dictionary, or an attribute of a class (these lead to parse errors when running pyscribe).

Even if you modified the code to support this, you wouldn't be able to control access everywhere that piece of data went. Suppose I had a dict and at some point:

  ...
  d = {'foo': 'bar'}
  ps.watch(d)
  ThirdPartyLib.do_stuff(d)
If that third party library mutated d, there'd be no way for you to know, unless you were also able to desugar those files (basically impossible, since in some cases the source code isn't even available).

In any case, this type of mutation is generally what causes the most bugs. A value changing by assignment doesn't need a run-time logger: those changes are in plain sight; just search for the variable name in the local body of the function. Mutation that occurs in other contexts (when the object has been renamed, or passed to another function where it's accessed through a different variable name) is the difficult thing to debug, and pyscribe cannot handle that with the current design.

Honestly, I think this is better handled by mocking objects. See how the Python Mock library does things, and possibly use it yourself (it wouldn't be all that much work to write your own wrapper, but Mock is seriously powerful). Basically, watching a variable means wrapping it in a mocked object that defers all reads/writes to the real object while logging all those changes. Not relying on desugaring also means you can watch what happens to your data when you pass it into third-party libraries.
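As a rough illustration of the wrapper idea (far simpler than what Mock provides), a dict subclass can log writes while still behaving like the real object, even when mutated inside third-party code:

```python
class WatchedDict(dict):
    """A dict that logs writes while otherwise behaving like a normal dict."""

    def __init__(self, name, *args, **kwargs):
        super(WatchedDict, self).__init__(*args, **kwargs)
        self.name = name  # label used in log messages

    def __setitem__(self, key, value):
        print("%s[%r] = %r" % (self.name, key, value))
        super(WatchedDict, self).__setitem__(key, value)

# Caveat: dict.update() and setdefault() bypass __setitem__, so a full
# wrapper (or Mock) would need to cover those entry points too.
d = WatchedDict("d", {"foo": "bar"})
d["foo"] = "baz"  # logged even if this assignment happens in third-party code
```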


Negative or not, I really appreciate the constructive criticism.

In response to your comments on watch: indeed, right now it only identifies AST nodes of type "asgn". I imagine other mutations, like append, have different node types; I just haven't gotten around to implementing that. My bad for posting this in a pre-release state.

"Even if you modified the code to support this, you wouldn't be able to control access everywhere that piece of data went." I can see two potential solutions: 1. Identify nodes of type "call" that have a watched variable as an arg, then add an if statement after the call that checks whether the variable has changed and prints it only if it has. 2. Analyze the AST of ThirdPartyLib.do_stuff(arg1), identify statements that mutate arg1, and log a change after the call in the original program if arg1 is mutated. This way, even if the value didn't end up changing, it's still logged because a mutation was made (or at least attempted), which is probably more desirable than solution 1.
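The first step of solution 1 can be sketched with the stdlib ast module: find calls that receive a watched variable by name (calls_with_arg is a hypothetical helper, not part of PyScribe):

```python
import ast

def calls_with_arg(source, var_name):
    """Line numbers of calls that pass `var_name` as a positional argument."""
    return [node.lineno for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Call)
            and any(isinstance(a, ast.Name) and a.id == var_name
                    for a in node.args)]

src = "d = {'foo': 'bar'}\ndo_stuff(d)\nother(x)\n"
print(calls_with_arg(src, "d"))  # [2]
```

These are the call sites after which the desugarer would insert the "has it changed?" check.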

In response to Mock: aside from Mock, people have told me they prefer the logging library, pdb, IDEs, etc. for debugging. PyScribe isn't meant to be a separate method of debugging; I intended it to supplement my preferred way, which is just using print statements. It might not be the most powerful, but its purpose isn't to compete with other methods of debugging.


On further digging, some more examples of assignments not caught:

  ps = pyscribe.Scriber()
  x = 0
  ps.p(x)
  ps.watch(x)
  for x in range(4):
    pass
  # x is now 3
  [x for x in range(5)]
  # x is now 4


Ah, I'll have to fix that.


I have been using python for some years now (although I'm no Guru..) and have almost always been able to debug using PDB or a python IDE. I'm not sure how this helps debugging? Is it debugging through some type of logging? I haven't checked out the source code yet, and the site is fairly brief on actual use-cases. Thanks for any feedback.


I wish I could give you pros and cons, but I've never touched pdb or used a python IDE before myself. In my 3 years of programming, I've only used print statements to debug (for Python programs).

I'll be adding more documentation soon, but perhaps this is a more informative use case: too often in my Python programs, I'll do something like print("x is: " + str(x)). That's already too much to keep typing, but sometimes I'll want to know the type, or maybe it's a dictionary and printing without separators makes the entries blend together (say, in a for loop). In that case, I'll do: print("---------\nx is: " + str(x) + "\n")

The library allows this to be simplified to "ps.d(x)". It's rather opinionated towards my own workflow and what I was too lazy to keep doing. Perhaps I'll try pdb one day and find my library useless.
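For readers unfamiliar with the pattern, a hypothetical helper along these lines captures the idea (a sketch only, not PyScribe's actual implementation):

```python
def d(label, value):
    """Print a separated, typed dump of a value (sketch of the ps.d idea)."""
    line = "%s is the %s %r" % (label, type(value).__name__, value)
    print("---------\n" + line + "\n")
    return line

d("x", 5)              # prints: x is the int 5 (between separators)
d("scores", {"a": 1})  # prints: scores is the dict {'a': 1}
```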


But it's dead simple to use an IDE. Plus, the code you debug within the IDE is the same as the production code. It lets you use breakpoints and watch variables. There are a bunch of free IDEs; what don't you like about using one?


I think you misunderstood; it's not that I tried it and didn't like it, I just never tried it. I'm most productive in Vim/terminal right now. From a quick Google search, it doesn't seem like I can use all the Vim commands in an IDE (though I see some articles on using Vim as an IDE). Have you personally tried both and preferred an IDE over Vim/terminal?


I'd like to share this, which I found on Twitter recently:

The Law of printf debugging: debugging messages inserted to track down unwanted behavior asymptotically approach "o_O"

-- @a_cowley

So thanks alixander for a considerable step in the right direction.



