Creator here. In this project I tried to bring a new perspective on code repositories: transform "git log" into a contributor collaboration network, display it nicely, and make it useful and hopefully interesting. I'd very much appreciate your feedback.
As someone who works in graph visualization, this looks like just another case of a default hairball visualization (those often arise when combining a force-directed layout with dense graphs): beautiful pictures (in the sense of nice to hang on the wall), but rather useless as a visualization.
Looking at the various hairballs, it would be more helpful to render simple HTML lists ordered by the results of some graph analysis algorithm (centralities, etc.). What you “see” in these diagrams is mostly a listing of nodes sorted by degree (and you can't even tell the exact degree). Also, nearness often indicates relationships that simply aren't there.
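As a rough illustration of the ranked-list idea, here is a minimal sketch that sorts contributors by exact degree. The function name `degree_ranking` and the `(author, author)` edge format are assumptions made for this example, not part of the tool being discussed.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Return (node, degree) pairs sorted by degree, highest first.

    `edges` is an iterable of undirected (a, b) pairs; duplicates and
    self-loops are ignored. Ties are broken alphabetically.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    ranked = ((node, len(adj)) for node, adj in neighbours.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Hypothetical contributor pairs, for illustration only
ranking = degree_ranking([
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("alice", "dave"),
])
for name, degree in ranking:
    print(f"{name}: {degree}")
```

A list like this shows the exact degree next to each name, which the force-directed picture does not.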
It might be better to find a set of use cases that describe how users of the app find answers to real-world questions they have, and then add interaction to the diagram that helps the users answer these questions. Right now, basically the only question it answers is who had the highest number of interactions. And in many cases you can't even read the names of the people they interacted with, because there are dozens of overlapping labels.
The overview of a bigger repository does look kind of like a hairball, but the UI lets you dig deeper and filter the graph by "code areas" (paths) or organizations.
I was thinking about a few use cases like these, the ones that come to mind first:
- for open source - assessing whether a framework (repo) might have a bus factor problem. Take Vue https://bit.ly/2nVxsh5 and React https://bit.ly/31njWR3 . It is easy to see that if Evan drops Vue for any reason, it might be a problem
- I imagine the bus factor is an even bigger problem for closed source projects
- for developers - quickly finding experts in some area of the project (especially a bigger one with a few hundred developers). The more connected ones are also more likely to answer your questions - after all, they are well connected for a reason.
- for community managers - each open source project is a community and this is its visualization. You usually know the top contributors in the community, but there might be more people you could try to involve and engage more in the project. The tool lets you browse the connections in the repository and gives you more social context.
- for non-technical managers / HR - they have a hard time browsing GitHub as it's full of code (of course it is). This gives them another perspective and insight into how work gets done within a project (open or closed source). Though I am not sure that showing a network is friendlier than showing code, it is a common way of showing social structures in sociology, for instance - so the humanities area.
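For the bus factor use case, one common heuristic (an assumption here, not necessarily what the tool computes) is the smallest number of contributors who together account for more than half of the commits. A minimal sketch, with the hypothetical name `bus_factor` and made-up author data:

```python
from collections import Counter

def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of top authors covering more than `threshold` of commits.

    `commit_authors` holds one author name per commit. This is a simple
    heuristic, not a standard definition.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered > threshold * total:
            return rank
    return len(counts)

# Illustrative data: one dominant author versus an even split
dominated = bus_factor(["evan"] * 8 + ["a", "b"])
balanced = bus_factor(["a", "a", "b", "b", "c", "c"])
print(dominated, balanced)
```

A value of 1 means a single person wrote most of the commits, which is the situation described for Vue above.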
But maybe showing the last 30 days of collaboration would untangle the hairball and give more up-to-date info - this is something I am working on now - and your comment tells me that this might be important.
Thank you for your feedback. If you have some thoughts on the use cases I wrote or others please share. Thank you!
UI-wise, once I've clicked on a person in the graph, I don't see how to get back to the whole graph with that person not highlighted.
More importantly, I don't understand what "collaboration" means. People who have ever touched the same file? I don't even see files, only "areas". And there don't seem to be any edge weights, i.e., if person A has collaborated with person B once, and is collaborating with person C every day, the A-B and A-C edges in the graph look the same, so I don't see if there really is a close collaboration or not.
The latter point also means that your example from https://www.networkperspective.io/hr/ is very artificial, as a real graph will pretty much never have "brokers" or "peripheral" nodes.
(These are first impressions without having read any docs, if there are any.)
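To make the edge-weight point concrete, here is a minimal sketch that weights each pair of people by the number of distinct files both have touched. The name `weighted_edges` and the `(author, path)` input format are assumptions for this example; the tool may define collaboration strength differently, or not at all.

```python
from collections import defaultdict
from itertools import combinations

def weighted_edges(touches):
    """Map each (author_a, author_b) pair to the number of shared files.

    `touches` is an iterable of (author, path) pairs. Pair keys are
    sorted alphabetically so each undirected edge appears once.
    """
    authors_by_path = defaultdict(set)
    for author, path in touches:
        authors_by_path[path].add(author)

    weights = defaultdict(int)
    for authors in authors_by_path.values():
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical data: A and B share two files, A and C share one
weights = weighted_edges([
    ("A", "x.py"), ("B", "x.py"),
    ("A", "y.py"), ("B", "y.py"),
    ("A", "z.py"), ("C", "z.py"),
])
print(weights)
```

With weights like these, the A-B edge could be drawn thicker than the A-C edge.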
For whatever it's worth, I'm working on a big project with many collaborators, and I do use git annotate to figure out who knows what or who would be a good person to tag for a code review. But this also means that I know that not every change to a file should be weighted equally. For example, renaming something in one file might mean that I need to touch 50 others, and I will show up as a "collaborator" on those files even if they are outside my area of expertise. I really need to use some more complex judgement to see who is a "main" contributor to a file, especially to the parts relevant to my current work.
So I think that very naively mining this data and dumping it on the screen as a big ball of string is not very useful. Deeper analysis would be needed IMHO.
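One simple way to keep a 50-file rename from counting as 50 units of expertise is to split each commit's weight across the files it touches. This is only a sketch of that idea, assuming commits are available as `(author, paths)` pairs; `file_ownership` and `main_contributor` are hypothetical names, and a real analysis would likely also need line-level or recency information.

```python
from collections import defaultdict

def file_ownership(commits):
    """Score authors per file, giving each commit a total weight of 1.

    `commits` is an iterable of (author, list_of_paths). A commit that
    touches n files contributes 1/n to each of them, so wide mechanical
    changes count for less per file than focused ones.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for author, paths in commits:
        if not paths:
            continue
        share = 1.0 / len(paths)
        for path in paths:
            scores[path][author] += share
    return {path: dict(authors) for path, authors in scores.items()}

def main_contributor(scores, path):
    """Return the author with the highest score for `path`."""
    authors = scores[path]
    return max(authors, key=authors.get)

# Hypothetical history: two focused commits versus one wide rename
scores = file_ownership([
    ("alice", ["core.py"]),
    ("alice", ["core.py"]),
    ("bob", ["core.py", "a.py", "b.py", "c.py"]),
])
print(main_contributor(scores, "core.py"))
```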
Thank you for your insights, they are very valuable.
UI-wise - I tried to make the browser back button work everywhere, so I assumed it was the standard way of going back to where you've been. But maybe that is not what people expect and it should be more obvious.
Yeah, the collaboration is calculated in two steps. First I build a dynamic interaction graph with timestamped edges. When a person touches a file I mark it as an interaction with everyone else who touched the file before (timestamped directed edges).
Then I transform this interaction graph into a collaboration graph. I consider two people to collaborate on a file when the interaction is bi-directional - i.e. they both touched the same file within a given time period. That way we can limit collaboration to a given time window, for instance the last 30 days, or the whole project lifetime as the demo shows.
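The two steps could be sketched roughly as follows. This is one possible reading of the description, not the actual implementation: it assumes touches arrive as `(timestamp, author, path)` tuples and treats "bi-directional" strictly, i.e. both directed edges on the same file must fall inside the window. The names `interaction_edges` and `collaboration_pairs` are made up for the example.

```python
from collections import defaultdict

def interaction_edges(touches):
    """Step 1: build timestamped directed edges.

    `touches` is an iterable of (timestamp, author, path). Each touch
    creates an edge from the author to everyone else who touched the
    same file earlier.
    """
    seen = defaultdict(set)  # path -> authors who touched it so far
    edges = []
    for ts, author, path in sorted(touches):
        for earlier in seen[path]:
            if earlier != author:
                edges.append((author, earlier, path, ts))
        seen[path].add(author)
    return edges

def collaboration_pairs(edges, start, end):
    """Step 2: keep pairs with edges in both directions within [start, end]."""
    directed = {
        (src, dst, path)
        for src, dst, path, ts in edges
        if start <= ts <= end
    }
    return {
        frozenset((src, dst))
        for src, dst, path in directed
        if (dst, src, path) in directed
    }

# Hypothetical history with integer timestamps
edges = interaction_edges([
    (1, "alice", "a.py"),
    (2, "bob", "a.py"),
    (3, "alice", "a.py"),
    (4, "carol", "b.py"),
    (5, "bob", "b.py"),
])
whole_lifetime = collaboration_pairs(edges, 1, 5)
recent_only = collaboration_pairs(edges, 3, 5)
print(whole_lifetime, recent_only)
```

Under this reading, alice and bob count as collaborators over the whole lifetime because each touched `a.py` after the other, while bob and carol do not, since carol never touched `b.py` after bob.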
But you make a very good point about refactoring and renaming, for instance a class name, which might imply changes across many files that are outside your expertise.
Still, a person's profile shows "Collaborators by areas", which is a proxy for one's expertise - the more collaborators you have in a certain area, the more experienced you should be there.
The example from https://www.networkperspective.io/hr/ refers not only to code collaboration, but becomes more valid if you plug in other data sources like formal structure or meeting / email metadata, as shown in the other demo https://bit.ly/2VRWdaB
We had a long journey with this product, starting with survey data collection as we were obsessed with privacy, then adding other data sources (as it turned out, we shouldn't worry about mining data that is public or semi-public anyway). Now I have implemented this git integration, hence I am asking the HN community for an opinion on how useful this might be.
Anyway, thank you for all the feedback. I'll spend more time going through it today, and I appreciate it very much.
> I tried to make the browser back button work everywhere
Ha. That makes sense, but I admit that I didn't dare try it since so many web applications break when you use the back button. Congratulations on supporting it, I'm all for keeping it, but as things stand (and it's not your fault) users will expect an in-app back button as well.
> I consider two people to collaborate on a file when the interaction is bi-directional - i.e. they both touched the same file within a given time period.
I see, that's clever.
Thanks for your answers, and all the best with this project!
PhD in CS (Agent systems), technical co-founder of an organizational network analysis startup, full stack developer. Quick learner, interested in a software developer role working closely with a data science / machine learning team.