Deprank: Use PageRank to find the most important files in your codebase (github.com/codemix)
164 points by phpnode on July 2, 2022 | 43 comments



Great project!

One feature request: Running the npx command searched only for the js files, not for the ts files. When I built deprank locally with yarn, it also showed the ts files. After looking at dependency-cruiser, I figure it has to do with which TypeScript compilers are available where.

It would be great if the npx command you provide in your readme would work regardless of my local setup - dependency-cruiser has documentation and one example of a suitable npx command here: https://github.com/sverweij/dependency-cruiser/blob/develop/...

My suggestion would be to check if any ts file is part of the extension option (i.e. --ext=".js,.jsx, .ts, .tsx") and only then do the magic needed to also show ts files.
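
A minimal sketch of that check, assuming the extension option arrives as a single comma-separated string. The helper names `parseExtensions` and `wantsTypeScript` are hypothetical and are not part of deprank or dependency-cruiser:

```typescript
// Hypothetical helpers for illustration only; not deprank's real code.
const TS_EXTENSIONS = new Set([".ts", ".tsx"]);

// Split a comma-separated extension list, tolerating stray spaces
// such as the ones in ".js,.jsx, .ts, .tsx".
function parseExtensions(ext: string): string[] {
  return ext
    .split(",")
    .map((e) => e.trim().toLowerCase())
    .filter((e) => e.length > 0);
}

// True when at least one requested extension is a TypeScript one,
// i.e. when the extra TypeScript setup would be worth doing.
function wantsTypeScript(ext: string): boolean {
  return parseExtensions(ext).some((e) => TS_EXTENSIONS.has(e));
}
```

With this, `wantsTypeScript(".js,.jsx")` is false and `wantsTypeScript(".js,.jsx, .ts, .tsx")` is true, so the TypeScript handling would only kick in when it was asked for.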


thanks for this comment, I'll look into fixing it. In the meantime I'm curious whether it works when you install deprank directly? e.g.

    yarn add deprank
    yarn run deprank ./src


Thank you! :)

It worked when installing inside the project folder and did not work when installing outside the project folder:

  // Success: Running in the project folder.
  me:~/deprank$ yarn add deprank
  yarn add v1.22.19
  warning ../package.json: No license field
  [1/5] Validating package.json...
  [2/5] Resolving packages...
  [3/5] Fetching packages...
  [4/5] Linking dependencies...
  [5/5] Building fresh packages...
  success Saved lockfile.
  success Saved 1 new dependency.
  info Direct dependencies
  └─ deprank@0.1.1
  info All dependencies
  └─ deprank@0.1.1
  $ tsdx build
  @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
  @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
   Creating entry file 602 ms
   Building modules 1.4 secs
  Done in 5.95s.
  me:~/deprank$ yarn run deprank src/
  yarn run v1.22.19
  warning ../package.json: No license field
  $ /home/dan/deprank/node_modules/.bin/deprank src/
  | Filename     | Lines | Dependents | PageRank |
  ------------------------------------------------
  | src/index.ts | 280   | 0          | 1.000000 |
  Done in 0.55s.
  me:~/deprank$ yarn run deprank .
  yarn run v1.22.19
  warning ../package.json: No license field
  $ /home/me/deprank/node_modules/.bin/deprank .
  | Filename                           | Lines | Dependents | PageRank |
  ----------------------------------------------------------------------
  | fixtures/core.js                   | 3     | 1          | 0.191112 |
  | fixtures/utils.js                  | 4     | 3          | 0.180576 |
  | fixtures/user/user.js              | 4     | 1          | 0.088966 |
  | src/index.ts                       | 280   | 1          | 0.069599 |
  | fixtures/todo.js                   | 6     | 1          | 0.060405 |
  | fixtures/user/index.js             | 1     | 1          | 0.060405 |
  | dist/deprank.cjs.development.js    | 829   | 1          | 0.053610 |
  | dist/deprank.cjs.production.min.js | 2     | 1          | 0.053610 |
  | fixtures/concepts.js               | 4     | 1          | 0.053610 |
  | dist/deprank.esm.js                | 820   | 0          | 0.037621 |
  | dist/index.d.ts                    | 36    | 0          | 0.037621 |
  | dist/index.js                      | 8     | 0          | 0.037621 |
  | fixtures/index.js                  | 4     | 0          | 0.037621 |
  | test/deprank.test.ts               | 28    | 0          | 0.037621 |
  Done in 0.60s.

  ------------------------------------------------------------------------------------------------------------------------------------
  // Failure: Running outside of the project folder:
  me:~$ yarn add deprank
  yarn add v1.22.19
  warning package.json: No license field
  warning package-lock.json found. Your project contains lock files generated by tools other than Yarn. It is advised not to mix package managers in order to avoid resolution inconsistencies caused by unsynchronized lock files. To clear this warning, remove package-lock.json.
  warning No license field
  [1/4] Resolving packages...
  [2/4] Fetching packages...
  [3/4] Linking dependencies...
  [4/4] Building fresh packages...
  success Saved lockfile.
  warning No license field
  success Saved 3 new dependencies.
  info Direct dependencies
  ├─ deprank@0.1.1
  └─ node@18.4.0
  info All dependencies
  ├─ deprank@0.1.1
  ├─ node-bin-setup@1.1.0
  └─ node@18.4.0
  Done in 4.95s.

  me:~$ yarn run deprank deprank/
  yarn run v1.22.19
  warning package.json: No license field
  $ /home/me/node_modules/.bin/deprank deprank/
  | Filename                                   | Lines | Dependents | PageRank |
  ------------------------------------------------------------------------------
  | deprank/fixtures/core.js                   | 3     | 1          | 0.223479 |
  | deprank/fixtures/utils.js                  | 4     | 3          | 0.211161 |
  | deprank/fixtures/user/user.js              | 4     | 1          | 0.104035 |
  | deprank/fixtures/todo.js                   | 6     | 1          | 0.070637 |
  | deprank/fixtures/user/index.js             | 1     | 1          | 0.070637 |
  | deprank/dist/deprank.cjs.development.js    | 829   | 1          | 0.062691 |
  | deprank/dist/deprank.cjs.production.min.js | 2     | 1          | 0.062691 |
  | deprank/fixtures/concepts.js               | 4     | 1          | 0.062691 |
  | deprank/dist/deprank.esm.js                | 820   | 0          | 0.043993 |
  | deprank/dist/index.js                      | 8     | 0          | 0.043993 |
  | deprank/fixtures/index.js                  | 4     | 0          | 0.043993 |
  Done in 0.28s.


Off-topic: I've read the name as if it were "de-prank".


I read it as "deep-rank" but I guess the intended reading is "dep-rank".


Out of curiosity: How do you reed "dep" with one 'e' as "deep"?


Humn brain is grat at patrn matching nd haz auto correct.


I also read it as deeprank


deprogram

depart

depend

deploy

deport

depose

deprive

depraved

depressurize

depress

depopulate

depoliticize

deposit


As did I, resulting in slight disappointment when I eventually figured out what this was actually about. There sure are some code bases out there that could use a bit of de-pranking. ;-)


> We define importance as those files which are directly or indirectly depended upon the most by other files in the codebase.

I honestly would have expected the opposite definition. Maybe I’m kind of old school, but in a well-architected large C program for instance, “main.c” tends to depend (directly or indirectly) on every other compile unit, while there are no dependencies in the other direction. And I think “main.c” should be seen as “important”.

Why would this not be true for JavaScript or TypeScript?


I think I'm reading this opposite from you.

`main.c` depends upon everything else, but is not depended upon by anything else. A file like `datatypes.c` might be depended upon by multiple tiers of the application and be referenced by dozens of files, making it have a high pagerank.
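
To make the direction concrete, here is a minimal sketch of PageRank over a tiny dependency graph, written with plain loops and maps. It is not deprank's actual implementation. The file names other than `main.c` and `datatypes.c`, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration:

```typescript
// Edges point from a file to the files it depends on.
// Assumption: every file that appears as a dependency is also a key.
type DependencyGraph = Record<string, string[]>;

function pageRank(
  graph: DependencyGraph,
  damping = 0.85,
  iterations = 50
): Map<string, number> {
  const files = Object.keys(graph);
  const n = files.length;

  // Start with rank spread evenly over all files.
  let rank = new Map<string, number>();
  for (const f of files) rank.set(f, 1 / n);

  for (let i = 0; i < iterations; i++) {
    // Every file gets a small baseline share each round.
    const next = new Map<string, number>();
    for (const f of files) next.set(f, (1 - damping) / n);

    for (const f of files) {
      const share = damping * (rank.get(f) ?? 0);
      const deps = graph[f] ?? [];
      if (deps.length === 0) {
        // A file with no dependencies spreads its rank over everyone.
        for (const g of files) next.set(g, (next.get(g) ?? 0) + share / n);
      } else {
        // Otherwise rank flows to the files it depends on.
        for (const d of deps) {
          next.set(d, (next.get(d) ?? 0) + share / deps.length);
        }
      }
    }
    rank = next;
  }
  return rank;
}

// Illustrative graph: main.c depends on everything, nothing depends on it.
const graph: DependencyGraph = {
  "main.c": ["parser.c", "render.c", "datatypes.c"],
  "parser.c": ["datatypes.c"],
  "render.c": ["datatypes.c"],
  "datatypes.c": [],
};

const ranks = pageRank(graph);
```

Because rank flows along the "depends on" edges, `datatypes.c` ends up with the highest score in this toy graph and `main.c` with the lowest.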


Might be useful to navigate an unknown codebase. If you know the codebase already, you should know what forms the core that everything else depends upon.

The term "importance" might also be slightly misleading. What's more important, the engine or the transmission? The car needs both to drive.


I agree, but seen through the lens of wanting to clean up a codebase this seems useful. It's stated in the second paragraph on GitHub that it's specifically useful for converting JavaScript to TypeScript as well.

If you have a messy large C program with the intention of cleaning it up, would you say main.c is the most important file?

I think it's an interesting idea that could work on top of this utility to give you an overview of how messy the dependency graph of a project is.


If lots of functions depend on main.c then you have a problem ;)


they mentioned using it to add types to a javascript project. i can see why you'd start by adding types to the files that are most used first


If you are looking for a utility function then it will likely be found in a file that is imported in many other places. So intuitively this would be a reasonable way to rank files/search results when you are looking for something to import.


Another example is interface usage. You may have a commonly used interface with only one implementation. The implementation is the important part, but almost no usage depends directly on the implementation.


If you are looking to introduce types into an untyped codebase, as this project talks about, then you probably don't have a lot of interfaces defined, and if you do have things in your untyped codebase that are loosely analogous to interfaces from a decoupling standpoint (such as facades and factories) then this approach would advocate those are sensible areas to focus your initial typing efforts.


Great project! I can imagine this helps me get used to new codebases.

As others have already mentioned, it's better if it ranks important files like main.js higher than other util files. models.js and dataUtils.js can be imported from 20 files, but if dataUtils.js is created at the beginning of the project and barely touched since then, it should be ranked lower.

I think it's nice to rank frequently updated files higher, just like real page rank algorithms. If models.js is imported from many files and updated frequently, it should be ranked at the top. main.js is imported nowhere but updated frequently; it should also be higher. dataUtils.js is imported from many files but has not been updated for a long time; it should be lower.

It's much more complicated than normal static analysis. Source code is no longer the single source of truth. VCS' history should also be considered. It has to manage those SEO scammers too. We know those who commit again and again like "Update, Fix bug, Fix typo, Fix lint, Update, Fix format, ..." instead of a single meaningful commit message.

But if it's implemented properly, it helps explore unfamiliar codebases much faster.
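
As a rough sketch of what such a blend could look like: the formula below (an equal-weight average of normalized rank and normalized commit count) and all the numbers are assumptions for illustration, not something deprank provides:

```typescript
interface FileStats {
  file: string;
  rank: number;    // dependency-based score, e.g. a PageRank value
  commits: number; // how many commits touched the file
}

// Blend dependency rank with update frequency. Both inputs are scaled
// to [0, 1] by their maximum so neither dominates just because of units.
function blend(
  stats: FileStats[],
  churnWeight = 0.5
): { file: string; score: number }[] {
  const maxRank = Math.max(0, ...stats.map((s) => s.rank));
  const maxCommits = Math.max(0, ...stats.map((s) => s.commits));
  return stats
    .map((s) => {
      const rankPart = maxRank > 0 ? s.rank / maxRank : 0;
      const churnPart = maxCommits > 0 ? s.commits / maxCommits : 0;
      return {
        file: s.file,
        score: (1 - churnWeight) * rankPart + churnWeight * churnPart,
      };
    })
    .sort((a, b) => b.score - a.score);
}

// Made-up numbers matching the three cases described above.
const blended = blend([
  { file: "dataUtils.js", rank: 0.3, commits: 2 },
  { file: "main.js", rank: 0.05, commits: 35 },
  { file: "models.js", rank: 0.35, commits: 40 },
]);
```

With these invented inputs the order comes out as models.js, main.js, dataUtils.js. Raw commit counts are easy to inflate with noisy commits, so a real version would need something more robust than this.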


It would be very useful as an IntelliJ IDEA plugin; it could hook into IDEA's AST and work in any language. And not just files, but methods. And it could be contextual, depending on edit history and current context.

Probably a great tool for a quick start on a new project.


This should be a “Show HN”, you’ll likely get more coverage too.

Question: Which package do you use for the linear algebra calculation? I couldn’t figure it out by skimming through your source code.


>Which package do you use for the linear algebra calculation?

He doesn't. https://github.com/codemix/deprank/blob/main/src/index.ts#L1...


I can imagine that "utils.js" and "math.h" rank highest, while "main.c" will probably have the lowest rank.

Doesn't sound like the ranking metric I'd want for code search results...


It seems fine for me. When you search for a function definition it will tend to show you the most commonly used one.


I can imagine that if you reverse all dep directions and do page rank on that as well, you can create a better ranking by calculating max(rank, revrank)
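
A small sketch of that idea, assuming the ranking function is passed in as a parameter. The `inDegreeShare` ranker below is only a stand-in so the example stays short (it is not PageRank), and the file names are made up:

```typescript
// Edges point from a file to the files it imports.
type Graph = Record<string, string[]>;
type Ranker = (graph: Graph) => Map<string, number>;

// Flip every edge so that "A imports B" becomes "B is imported by A".
function reverseGraph(graph: Graph): Graph {
  const reversed: Graph = {};
  for (const file of Object.keys(graph)) reversed[file] = [];
  for (const file of Object.keys(graph)) {
    for (const dep of graph[file] ?? []) {
      const list = reversed[dep] ?? [];
      list.push(file);
      reversed[dep] = list;
    }
  }
  return reversed;
}

// Rank the graph in both directions and keep the larger score per file.
function combinedRank(graph: Graph, ranker: Ranker): Map<string, number> {
  const forward = ranker(graph);
  const backward = ranker(reverseGraph(graph));
  const combined = new Map<string, number>();
  for (const file of Object.keys(graph)) {
    combined.set(
      file,
      Math.max(forward.get(file) ?? 0, backward.get(file) ?? 0)
    );
  }
  return combined;
}

// Stand-in ranker: the share of all import edges that point at a file.
const inDegreeShare: Ranker = (graph) => {
  const files = Object.keys(graph);
  const counts = new Map<string, number>();
  for (const f of files) counts.set(f, 0);
  let total = 0;
  for (const f of files) {
    for (const dep of graph[f] ?? []) {
      counts.set(dep, (counts.get(dep) ?? 0) + 1);
      total++;
    }
  }
  const result = new Map<string, number>();
  for (const f of files) {
    result.set(f, total > 0 ? (counts.get(f) ?? 0) / total : 0);
  }
  return result;
};

const combined = combinedRank(
  { "main.js": ["app.js", "utils.js"], "app.js": ["utils.js"], "utils.js": [] },
  inDegreeShare
);
```

In this toy graph the entry point `main.js` and the shared `utils.js` tie at the top, with `app.js` below them, which is the effect the max of both directions is after.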


Fascinating. "Deprank is particularly useful when converting an existing JavaScript codebase to TypeScript. Performing the conversion in strict PageRank order can dramatically increase type-precision, reduce the need for any and minimizes the amount of rework that is usually inherent in converting large codebases." I wonder if the idea of using PageRank-style systems for ~refactoring translates to other domains, e.g. organizational or knowledge refactoring (à la https://aviv.medium.com/when-we-change-the-efficiency-of-kno... )


My favorite approach for finding important files is to look at which files have the most changes in source control. At least with Perforce this is very easy. The files that have many changes are the ones where important logic happens. The ones that don't change much are boilerplate, low-level object definitions, etc.
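
For anyone wanting to try this without Perforce, here is a sketch that assumes git instead. It expects the text produced by `git log --name-only --pretty=format:` (one changed path per line, blank lines between commits) and counts how often each path appears. The function name `countChanges` and the sample paths are illustrative:

```typescript
// Count how many commits touched each file, given a log listing
// with one changed path per line and blank lines between commits.
function countChanges(logOutput: string): [string, number][] {
  const counts = new Map<string, number>();
  for (const line of logOutput.split("\n")) {
    const file = line.trim();
    if (file.length === 0) continue; // separator between commits
    counts.set(file, (counts.get(file) ?? 0) + 1);
  }
  // Most frequently changed files first.
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Hand-written sample standing in for real log output.
const sampleLog = [
  "src/index.ts",
  "README.md",
  "",
  "src/index.ts",
  "",
  "src/index.ts",
  "test/deprank.test.ts",
].join("\n");

const changes = countChanges(sampleLog);
```

Here `src/index.ts` comes out on top with three changes. Renames and generated files would skew a real count, so treat it as a first pass.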


That's actually a nice idea. I guess we will see more software in the source code dependencies analysis space. There's so much code and often it's nice to have some kind of metrics (LOC, PageRank, ...) to get a grasp of what's important in a codebase.


Well, technically this is a godsend. However, when I try to run it on some of my projects, both JavaScript and TypeScript, it comes up totally empty as of now. Bug?


hmm, sounds like it, please could you open a github issue with some more info?

edit: I've just fixed an issue that looks similar to this, please run with `npx deprank@0.1.1 ./path/to/folder` and let me know if you're still having problems


Cool. That was it. Works now. Hats off to you. This has more uses than just converting to TypeScript. In React apps, and probably in other apps too, you can actually spot the dependencies that break your code splitting.


I've always wanted to see my reddit (and I guess HN) karma evaluated as my pagerank of comments. (Yes, lots of downsides--people with high karma have more "power". Yes it would be instantly gamed. I still want to see it.)

It would also be neat to see a reddit-like website with multiple formulas for "karma" all evaluated at the same time, like IVYMIKE (PageRank:1234 Classic: 1656 Experimental: 78)


Just started a new job and having a filetype agnostic version of this would be immensely helpful for learning the codebase.


It would be sort of funny to apply some reordering algorithms to the dependency graph, and use that to refactor a project. Maybe nested dissection. Or some fancy hypergraph reordering...


Perhaps this could be paired with a static analysis of the code's quality. Then you'd get the most used code with the worst quality.


This becomes impossible to do correctly because of the halting problem, interestingly. For example, suppose a routine calls F in a loop for most of its work, then at the end takes the square root by sqrtf(). Clearly number of calls matters for the edge weight in the call graph, but this tool would count F and sqrtf equal.

I suppose you could do it by sampling, then you actually just have to look at the sample distribution, though that would show you the graph weighted by cumulative execution times per routine.

As they say though, never let perfect be the enemy of good. Neat idea.


This actually has nothing to do with how often the code is executed. Code is ranked by its referents.


Right, which is exactly what I pointed out. The true call graph can only be obtained by basically running the program.

Counting references is clearly a compromise here. To see its drawbacks, consider indirection and dynamism.


This reaction is kind of like reading about a pre-election poll where the participants were selected by pulling them off 86th street at 11 am, and objecting that the percentages shouldn't be presented with more than two sig figs with a sample size of 100.

You're not, strictly speaking, wrong. But the methodology is already known to be deeply compromised, so your objection is kind of out there.


It was an observation that the choice to use references is actually a must, because of the halting problem, which I found interesting.

Not everything is an argument.


But this is about ranking code based on how central it is, for the purposes of choosing what to translate to TypeScript first. It’s not a compromise, it’s the correct approach. How often a line of code is executed is not relevant for this purpose.


I think you are mixing performance analysis with dependency analysis which is the point of the project. The sampling you are describing is commonly done by tools called "sampling profilers".



