One feature request: running the npx command only picked up the JS files, not the TS files. When I built deprank locally with yarn, it also showed the TS files. After looking at dependency-cruiser, I figure it has to do with which TypeScript compilers are available where.
It would be great if the npx command you provide in your README worked regardless of my local setup. dependency-cruiser has documentation and one example of a suitable npx command here: https://github.com/sverweij/dependency-cruiser/blob/develop/...
My suggestion would be to check whether any TS extension is part of the extension option (e.g. --ext=".js,.jsx,.ts,.tsx") and only then do the magic needed to also show TS files.
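Roughly, that check could look like the sketch below. `needsTypeScript` and `parseExtensions` are hypothetical names, not deprank's code, and only `.ts`/`.tsx` are treated as TypeScript here (`.mts`/`.cts` could be added the same way).

```typescript
// Hypothetical helpers, not part of deprank: decide whether TypeScript
// support needs to be switched on, based on a comma-separated --ext value.
const TS_EXTENSIONS = new Set<string>([".ts", ".tsx"]);

function parseExtensions(ext: string): string[] {
  return ext
    .split(",")
    .map((e) => e.trim().toLowerCase())
    .filter((e) => e.length > 0)
    // Accept both "ts" and ".ts".
    .map((e) => (e.startsWith(".") ? e : `.${e}`));
}

function needsTypeScript(ext: string): boolean {
  return parseExtensions(ext).some((e) => TS_EXTENSIONS.has(e));
}

console.log(needsTypeScript(".js,.jsx, .ts, .tsx")); // true
console.log(needsTypeScript(".js,.jsx")); // false
```

Tolerating stray spaces and missing dots keeps the check from silently falling back to JS-only when the flag is typed a little loosely.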
As did I, resulting in slight disappointment when I eventually figured out what this was actually about. There sure are some code bases out there that could use a bit of de-pranking. ;-)
> We define importance as those files which are directly or indirectly depended upon the most by other files in the codebase.
I honestly would have expected the opposite definition. Maybe I'm kind of old school, but in a well-architected large C program, for instance, "main.c" tends to depend (directly or indirectly) on every other compile unit, while there are no dependencies in the other direction. And I think "main.c" should be seen as "important".
Why would this not be true for JavaScript or TypeScript?
`main.c` depends upon everything else, but is not depended upon by anything else. A file like `datatypes.c` might be depended upon by multiple tiers of the application and be referenced by dozens of files, giving it a high PageRank.
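To make that concrete, here is a minimal PageRank sketch over the main.c/datatypes.c example. It is a textbook power-iteration PageRank with a damping factor of 0.85, not deprank's actual implementation, and it assumes every imported file also appears as a key in the graph. The three-file layout besides `main.c` and `datatypes.c` is made up for illustration.

```typescript
// Textbook PageRank over a file dependency graph (illustrative only).
// Each key lists the files it imports; rank flows from a file to its dependencies.
type Graph = Record<string, string[]>;
type Ranks = Record<string, number>;

function pageRank(graph: Graph, damping = 0.85, iterations = 100): Ranks {
  const files = Object.keys(graph);
  const n = files.length;
  let rank: Ranks = {};
  for (const f of files) rank[f] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    // Files that import nothing ("dangling" nodes) spread their rank evenly.
    let dangling = 0;
    for (const f of files) {
      if (graph[f].length === 0) dangling += rank[f];
    }
    const next: Ranks = {};
    for (const f of files) next[f] = (1 - damping) / n + (damping * dangling) / n;
    // Each file passes an equal share of its rank to every file it depends on.
    for (const f of files) {
      const deps = graph[f];
      for (const dep of deps) next[dep] += (damping * rank[f]) / deps.length;
    }
    rank = next;
  }
  return rank;
}

// Files sorted from most to least depended upon.
function byRank(ranks: Ranks): string[] {
  return Object.keys(ranks).sort((a, b) => ranks[b] - ranks[a]);
}

const deps: Graph = {
  "main.c": ["ui.c", "net.c", "datatypes.c"],
  "ui.c": ["datatypes.c"],
  "net.c": ["datatypes.c"],
  "datatypes.c": [],
};
console.log(byRank(pageRank(deps))); // datatypes.c first, main.c last
```

Because nothing imports `main.c`, it only ever receives the baseline share, which is exactly why it lands at the bottom under this definition.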
Might be useful to navigate an unknown codebase. If you know the codebase already, you should know what forms the core that everything else depends upon.
The term "importance" might also be slightly misleading. What's more important, the engine or the transmission? The car needs both to drive.
I agree, but seen through the lens of wanting to clean up a codebase this seems useful. It's stated in the second paragraph on GitHub that it's specifically useful for converting JavaScript to TypeScript as well.
If you have a messy large C program with the intention of cleaning it up, would you say main.c is the most important file?
I think it's an interesting idea that could work on top of this utility to give you an overview of how messy the dependency graph of a project is.
If you are looking for a utility function then it will likely be found in a file that is imported in many other places. So intuitively this would be a reasonable way to rank files/search results when you are looking for something to import.
Another example is interface usage. You may have a commonly used interface with only one implementation. The implementation is the important part, but almost no usage depends directly on the implementation.
If you are looking to introduce types into an untyped codebase, as this project talks about, then you probably don't have many interfaces defined. And if you do have things in your untyped codebase that are loosely analogous to interfaces from a decoupling standpoint (such as facades and factories), then this approach would suggest those are sensible areas to focus your initial typing efforts on.
Great project!
I can imagine this helps me get used to new codebases.
As others have already mentioned, it would be better if it ranked important files like main.js higher than other util files. Say models.js and dataUtils.js are each imported from 20 files, but dataUtils.js was created at the beginning of the project and has barely been touched since; then it should be ranked lower.
I think it would be nice to rank frequently updated files higher, just like real-world search ranking does. If models.js is imported from many files and updated frequently, it should be ranked at the top. main.js is imported nowhere but updated frequently, so it should also rank high. dataUtils.js is imported from many files but hasn't been updated for a long time, so it should rank lower.
It's much more complicated than normal static analysis: source code is no longer the single source of truth, and the VCS history has to be considered too. It would also have to deal with the equivalent of SEO scammers. We all know people who commit again and again with "Update, Fix bug, Fix typo, Fix lint, Update, Fix format, ..." instead of a single meaningful commit.
But if it's implemented properly, it would help explore unfamiliar codebases much faster.
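One hypothetical way to blend the two signals is sketched below. Nothing here is in deprank: it assumes you already have each file's PageRank and a count of recent commits touching it (gathered from VCS history some other way), normalizes both to [0, 1], and mixes them with an arbitrary weight of 0.4 on rank. The numbers in the example are made up to mirror the models.js/main.js/dataUtils.js scenario.

```typescript
// Hypothetical blended score: dependency PageRank mixed with recent change activity.
interface FileStats {
  rank: number;          // PageRank from the dependency graph
  recentCommits: number; // commits touching the file in some recent window
}

function blendedOrder(stats: Record<string, FileStats>, rankWeight = 0.4): string[] {
  const files = Object.keys(stats);
  // Normalize both signals so neither dominates purely by scale.
  const maxRank = Math.max(...files.map((f) => stats[f].rank)) || 1;
  const maxCommits = Math.max(...files.map((f) => stats[f].recentCommits)) || 1;
  const score = (f: string): number =>
    rankWeight * (stats[f].rank / maxRank) +
    (1 - rankWeight) * (stats[f].recentCommits / maxCommits);
  // Highest blended score first.
  return files.sort((a, b) => score(b) - score(a));
}

// Illustrative numbers only.
const stats: Record<string, FileStats> = {
  "models.js": { rank: 0.3, recentCommits: 40 },
  "dataUtils.js": { rank: 0.3, recentCommits: 1 },
  "main.js": { rank: 0.02, recentCommits: 35 },
};
console.log(blendedOrder(stats)); // [ 'models.js', 'main.js', 'dataUtils.js' ]
```

Counting raw commits is exactly what the "Fix typo, Fix lint" crowd would inflate; capping commits per author per day, or counting changed lines instead, would be one way to blunt that.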
It would be very useful as an IntelliJ IDEA plugin: it could hook into IDEA's AST and work in any language. And not just for files, but for methods too. It could also be contextual, depending on edit history and the current context.
Probably a great tool for getting started quickly on a new project.
I can imagine that if you reverse all dependency directions and run PageRank on that as well, you could create a better ranking by calculating max(rank, revrank).
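A rough sketch of that idea follows: run PageRank on the dependency graph, run it again with every edge flipped, and keep the larger of the two per file. The `pageRank` here is a plain textbook version (damping 0.85), not deprank's code, and the graph assumes every imported file is also a key.

```typescript
// Forward + reverse PageRank, combined with max() per file (illustrative only).
type Graph = Record<string, string[]>;
type Ranks = Record<string, number>;

function pageRank(graph: Graph, d = 0.85, iterations = 100): Ranks {
  const files = Object.keys(graph);
  const n = files.length;
  let rank: Ranks = {};
  for (const f of files) rank[f] = 1 / n;
  for (let i = 0; i < iterations; i++) {
    // Rank held by files with no outgoing edges is spread evenly.
    let dangling = 0;
    for (const f of files) if (graph[f].length === 0) dangling += rank[f];
    const next: Ranks = {};
    for (const f of files) next[f] = (1 - d) / n + (d * dangling) / n;
    for (const f of files) {
      for (const to of graph[f]) next[to] += (d * rank[f]) / graph[f].length;
    }
    rank = next;
  }
  return rank;
}

// Flip every edge: "a imports b" becomes "b is imported by a".
function reverse(graph: Graph): Graph {
  const rev: Graph = {};
  for (const f of Object.keys(graph)) rev[f] = [];
  for (const f of Object.keys(graph)) for (const to of graph[f]) rev[to].push(f);
  return rev;
}

function combinedRank(graph: Graph): Ranks {
  const fwd = pageRank(graph);
  const bwd = pageRank(reverse(graph));
  const out: Ranks = {};
  for (const f of Object.keys(graph)) out[f] = Math.max(fwd[f], bwd[f]);
  return out;
}

const deps: Graph = {
  "main.c": ["ui.c", "net.c", "datatypes.c"],
  "ui.c": ["datatypes.c"],
  "net.c": ["datatypes.c"],
  "datatypes.c": [],
};
console.log(combinedRank(deps)); // main.c and datatypes.c end up (about) tied at the top
```

In this small symmetric example the entry point and the shared datatypes module come out level, which addresses the main.c objection raised earlier in the thread without demoting the widely imported file.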
Fascinating. "Deprank is particularly useful when converting an existing JavaScript codebase to TypeScript. Performing the conversion in strict PageRank order can dramatically increase type-precision, reduce the need for any and minimizes the amount of rework that is usually inherent in converting large codebases." I wonder if the idea of using PageRank-style systems for ~refactoring translates to other domains, e.g. organizational or knowledge refactoring (à la https://aviv.medium.com/when-we-change-the-efficiency-of-kno... )
My favorite approach for finding important files is to look at which files have the most changes in source control. At least with Perforce this is very easy. The files that have many changes are the ones where important logic happens. The ones that don't change much are boilerplate, low-level object definitions, etc.
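With git rather than Perforce (an assumption on my part), the same count can be taken from the output of `git log --name-only --pretty=format:`, which prints one changed path per line with blank lines between commits. The helper names below are made up for illustration.

```typescript
// Hypothetical helpers: count how many commits touched each file, given the
// text printed by `git log --name-only --pretty=format:`.
function changeCounts(gitLogOutput: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of gitLogOutput.split("\n")) {
    const path = line.trim(); // also strips a trailing \r on Windows
    if (path === "") continue; // blank lines separate commits
    counts[path] = (counts[path] ?? 0) + 1;
  }
  return counts;
}

// The `top` most frequently changed files, most changes first.
function mostChanged(gitLogOutput: string, top = 10): Array<[string, number]> {
  const counts = changeCounts(gitLogOutput);
  return Object.keys(counts)
    .map((p): [string, number] => [p, counts[p]])
    .sort((a, b) => b[1] - a[1])
    .slice(0, top);
}

const sample = "src/app.ts\nsrc/models.ts\n\nsrc/models.ts\n\nsrc/models.ts\nsrc/util.ts\n";
console.log(mostChanged(sample, 2)); // src/models.ts comes first with 3 changes
```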
That's actually a nice idea. I guess we will see more software in the source code dependencies analysis space. There's so much code and often it's nice to have some kind of metrics (LOC, PageRank, ...) to get a grasp of what's important in a codebase.
Well, technically this is a godsend. However, when I try to run it on some of my projects, both JavaScript and TypeScript, it comes up totally empty as of now. Bug?
hmm, sounds like it, please could you open a github issue with some more info?
edit: I've just fixed an issue that looks similar to this, please run with `npx deprank@0.1.1 ./path/to/folder` and let me know if you're still having problems
Cool. That was it. Works now. Hats off to you. This has more uses than just converting to TypeScript: in React apps (and probably in other apps too), you can spot the dependencies that break your code splitting.
I've always wanted to see my reddit (and I guess HN) karma evaluated as my pagerank of comments. (Yes, lots of downsides--people with high karma have more "power". Yes it would be instantly gamed. I still want to see it.)
It would also be neat to see a reddit-like website with multiple formulas for "karma" all evaluated at the same time, like IVYMIKE (PageRank:1234 Classic: 1656 Experimental: 78)
It would be sort of funny to apply some reordering algorithms to the dependency graph, and use that to refactor a project. Maybe nested dissection. Or some fancy hypergraph reordering...
This becomes impossible to do correctly because of the halting problem, interestingly. For example, suppose a routine calls F in a loop for most of its work, then at the end takes a square root with sqrtf(). Clearly the number of calls matters for the edge weight in the call graph, but this tool would count F and sqrtf equally.
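To illustrate the difference in edge weighting: a plain dependency graph splits a routine's weight equally among everything it calls, while call counts (which in general only a profiler can measure) would split it very differently. The routine names and the count of 1000 are made up for illustration.

```typescript
// Unweighted vs call-count-weighted edge shares for one caller (illustrative).
type CallCounts = Record<string, number>;

// Static dependency view: every callee gets the same share.
function unweightedShares(calls: CallCounts): Record<string, number> {
  const callees = Object.keys(calls);
  const shares: Record<string, number> = {};
  for (const c of callees) shares[c] = 1 / callees.length;
  return shares;
}

// Dynamic view: shares proportional to how often each callee was called.
function weightedShares(calls: CallCounts): Record<string, number> {
  const callees = Object.keys(calls);
  const total = callees.reduce((sum, c) => sum + calls[c], 0);
  const shares: Record<string, number> = {};
  for (const c of callees) shares[c] = calls[c] / total;
  return shares;
}

const routine: CallCounts = { F: 1000, sqrtf: 1 };
console.log(unweightedShares(routine)); // { F: 0.5, sqrtf: 0.5 }
console.log(weightedShares(routine).sqrtf < 0.001); // true
```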
I suppose you could do it by sampling, then you actually just have to look at the sample distribution, though that would show you the graph weighted by cumulative execution times per routine.
As they say though, never let perfect be the enemy of good. Neat idea.
This reaction is kind of like reading about a pre-election poll where the participants were selected by pulling them off 86th street at 11 am, and objecting that the percentages shouldn't be presented with more than two sig figs with a sample size of 100.
You're not, strictly speaking, wrong. But the methodology is already known to be deeply compromised, so your objection is kind of out there.
But this is about ranking code based on how central it is, for the purposes of choosing what to translate to TypeScript first. It’s not a compromise, it’s the correct approach. How often a line of code is executed is not relevant for this purpose.
I think you are mixing performance analysis with dependency analysis which is the point of the project. The sampling you are describing is commonly done by tools called "sampling profilers".