Hacker News
Show HN: The Codex – a graph database project for the digital humanities (the-codex.net)
33 points by argimenes on Jan 24, 2016 | 11 comments



In 2015 I quit my full-time job as an ASP.NET developer to build what I think of as an "atlas of history" for the Italian Renaissance. It is a semantic-web-style database built with Neo4j, .NET MVC, and KnockoutJS, and is an attempt to map historical events and personalities for the digital humanities.

http://the-codex.net

I am currently the sole developer, product designer, and researcher on the project -- but I am looking for collaborators who would be willing to help me take this further.

As an "atlas of history" is a broad concept, I decided to give the project clarity by focusing on primary source documents from the Italian Renaissance. The two main sources at present are the 'Florentine Diary' of Luca Landucci and the letters of Michelangelo. I have entered about 40 years' worth of entries from Landucci's diary and a good portion of Michelangelo's early letters from his Roman period. In the process I have added hundreds of historical personalities, places, artworks, etc., in order to give the user real data to work with. I have also built various screens with data visualisation tools for mining the historical events, and of course an extensive back end for managing the data and relationships.

Is anyone interested in helping me out? I'd love input from anyone with an interest in art history, graphic design, data visualisation tools, or Neo4j, or from anyone who wants to help me research and enter data.

Feel free to email me any time at: iian.d.neill@gmail.com

In the meantime, why not check out the Control Panel on Leonardo da Vinci's dataset? Clicking any link in the text will load the dataset for that entity, or you can search for entities by name. Why not try adding Michelangelo's dataset to the mix? You can then switch between the three data-vis modes at the bottom, fiddle with the date filters, etc.

http://the-codex.net/Time/ControlPanel

Many thanks, Iian


I appreciate what you've built and I'm interested in helping out. What is the plan going forward for the project? Do you plan on monetizing it? Is the code eventually gonna be public?


Hi,

Thanks for your interest! I have no plans for monetizing the project, partly because I cannot see a meaningful way to do that, but mostly because it is a research endeavour. The code is currently public on my BitBucket repo, and I'll shoot through the URL as soon as I can.

Basically, I want to explore the limits of how graph databases can be applied to the visualisation of history. My immediate goals are to:

(A) expand the range of datasets that are collected; (B) improve the tools for entering and annotating the data to make the process quicker and more automated; (C) add more powerful data visualisations; and (D) complete the input of the primary source datasets mentioned above and add more.

I want to build a graph database time machine of the Italian Renaissance, so that you can pick a day in history and see what was happening across Italy according to various sources, and see events plotted or even animated on a map (e.g., watch a battle or a weather event unfold), etc. The system is underpinned by subject tags which are taxonomically organised in an 'is-a' hierarchy, which I hope will provide a 'semantic zoom in/out' functionality.
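For readers wondering how an 'is-a' hierarchy can drive a 'semantic zoom', here is a minimal in-memory sketch in Python. It assumes each subject tag simply points at its broader parent; the tag names and events are hypothetical, and in the Codex itself this would presumably live in Neo4j as relationships traversed by Cypher rather than in a Python dict. Zooming out walks up to broader tags; zooming in collects every narrower tag, so a query on a broad tag also picks up events tagged with its descendants.

```python
from collections import defaultdict

# Hypothetical subject tags: each tag maps to its broader "is-a" parent.
IS_A = {
    "Fresco": "Painting",
    "Altarpiece": "Painting",
    "Painting": "Artwork",
    "Sculpture": "Artwork",
    "Siege": "Battle",
    "Battle": "Military event",
}

# Invert the hierarchy once so we can walk downwards as well as upwards.
CHILDREN = defaultdict(list)
for child, parent in IS_A.items():
    CHILDREN[parent].append(child)


def zoom_out(tag):
    """Return the chain of ever-broader tags above `tag`, nearest first."""
    chain = []
    while tag in IS_A:
        tag = IS_A[tag]
        chain.append(tag)
    return chain


def zoom_in(tag):
    """Return `tag` plus every narrower tag beneath it (depth-first)."""
    found = [tag]
    for child in CHILDREN.get(tag, []):
        found.extend(zoom_in(child))
    return found


def events_under(tag, events):
    """Select events tagged with `tag` or with any narrower tag."""
    wanted = set(zoom_in(tag))
    return [name for name, tags in events if wanted & set(tags)]


# Hypothetical events, each with its own (narrow) subject tags.
EVENTS = [
    ("Event A", ["Fresco"]),
    ("Event B", ["Siege"]),
    ("Event C", ["Sculpture"]),
]

print(zoom_out("Fresco"))
print(events_under("Artwork", EVENTS))
```

Because zooming in is just a transitive walk down the hierarchy, a broad tag such as "Artwork" matches an event tagged only "Fresco" without that event having to carry the broader tag explicitly.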

I understand that the current platform of C#, ASP.NET MVC, and KnockoutJS may be a turn-off ... and I am open to other technologies if that is crucial. But conversion to another platform would certainly take a lot of time and manpower, and to be honest I think the hard work is mainly in the product design, the visualisations, the data entry, and the Neo4j Cypher queries.

What kind of collaboration did you have in mind?


I could whip up an alternative frontend implementation if you'd like; I've been looking for a project in which to flex my ClojureScript/Om Next (based on React) muscles. I'm very experienced with Clojure, with frontend development, and with interface design. I actually worked with KnockoutJS (in a C# shop I used to work at) many years ago, but, yeah, I don't have fond memories of MVC/MVVM for large projects; the React model speaks to me more.


ClojureScript and React sound interesting to me, too, and might be a good fit for the rich-UI Control Panel page, which will need to handle large JS datasets and update visualisations quickly. I've been finding that KnockoutJS is a bit sluggish when the filters are wired up as computed observables. It could also be something in my filter code, but I have seen benchmarks in which KnockoutJS doesn't perform well.

We should talk further ... drop me a line at my email? iian.d.neill@gmail.com
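On the sluggish-filter point, one common way to keep a date-range filter fast as the dataset grows is to sort the events once and answer each filter change with two binary searches, rather than re-scanning every event. The sketch below is in Python purely for illustration; it is not the Codex's code, it says nothing about KnockoutJS internals, and the `DateIndex` class and event labels are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class DateIndex:
    """Events kept sorted by date so a range filter is two binary searches."""

    def __init__(self, events):
        # Sort once up front; events are (date, label) pairs.
        self.events = sorted(events)
        # Parallel list of dates for bisect (avoids needing bisect's `key`).
        self.dates = [d for d, _ in self.events]

    def between(self, start, end):
        """Return events with start <= date <= end, in date order."""
        lo = bisect_left(self.dates, start)
        hi = bisect_right(self.dates, end)
        return self.events[lo:hi]


index = DateIndex([
    (date(1494, 11, 17), "Event A"),
    (date(1492, 4, 8), "Event B"),
    (date(1498, 5, 23), "Event C"),
])

for d, label in index.between(date(1493, 1, 1), date(1499, 1, 1)):
    print(d.isoformat(), label)
```

Moving a date slider then costs O(log n) to locate the range plus the size of the result, instead of O(n) per change; combining it with other filters (e.g. selected entities) would still need a pass over the returned slice.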


Hi Iian, great work!

I built a recruitment CRM with 1,000,000+ cached nodes, so I can imagine how much time and effort you have invested in your project.

I assume that .NET MVC and KnockoutJS were your first choice because you are used to them ... and that's fine.

I personally found dealing with Neo4j full-text search in multiple languages (English, Russian, and Japanese) very frustrating. There was no clear guideline on Lucene integration at the time.

How has your experience with Neo4j been? Do you have to sync the Neo4j data with traditional databases like MS SQL?

Do you want to make your project a community effort, or do you see it as a potential business?

The last question is much more important. I assume that a lot of developers won't be excited about .NET or KnockoutJS stuff; however, they might consider sharing some crawled data, D3.js graph visualizations code, etc., with you.


Hi Ilya,

Thank you for your kind words and your excellent feedback re: the chosen platform and the OSS community. To be honest, I hadn't thought about the language and platform angle until you mentioned it, but it is definitely something I may need to consider. C# is a mature language comparable to most other quasi-imperative/quasi-functional ones, so I don't think conversion would be hard so much as time-consuming. The brain of the system is ultimately in the data structures, the Neo4j Cypher queries, and the product design/architecture.

Can I ask what technologies you think would be most likely to attract community involvement without sacrificing static typing? Perhaps a Java framework for the back end, and Angular or React for the client-side framework? I would not be able to convert the project to those technologies immediately, though, given the size of the codebase, but it's something to keep in mind ...

Many thanks, Iian


Hi again,

To answer your other questions, I have no real plans to monetise this. I did try to raise funding for an 'art/culture travel app' that underpinned Codex but the focus was too niche and I didn't have a graphic designer. I think an open community driven project would be more useful, overall.

Regarding Neo4j itself, most of my queries traverse node paths rather than performing text searches as such, so I haven't had to deal much with internationalisation issues (e.g., searching for text with or without acutes, graves, etc.). The data in the system lives purely in Neo4j and is not sourced from SQL of any kind. ;-)
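If accent-insensitive text search does become necessary (e.g., matching a typed "Niccolo" against a stored "Niccolò"), one common approach is to normalise both the query and the stored names before comparing. A minimal Python sketch using only the standard library's `unicodedata`; the helper names `fold_accents` and `matches` are hypothetical, it targets Latin-script names such as Italian ones, and it is not how the Codex or Neo4j's full-text indexes actually handle this.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks (acutes, graves, etc.) and case-fold `text`."""
    # NFD splits a precomposed letter such as "ò" into "o" plus a combining
    # grave accent (U+0300); unicodedata.combining() is non-zero for the mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def matches(query, name):
    """True if `query` occurs in `name`, ignoring case and accents."""
    return fold_accents(query) in fold_accents(name)


print(fold_accents("Niccolò"))
print(matches("citta", "Città di Castello"))
```

In practice the folded form would be computed once and stored alongside each name (or handled by a search index's analyser), so that queries don't re-normalise every record.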


I also recorded a screenshare presentation of how The Codex works:

https://www.youtube.com/watch?v=_R0ESfLBuHo


Hey, Iian. Not sure if this will be useful to you, but you might consider checking out http://endlessorigins.com/ (the largest structured collection of human events, available for download as a single TSV file). And good luck with your project! :)


Hi Tuvalle, thank you for your encouragement and for the link to this fascinating resource! Can't wait to open up the dataset and check it out. It may be possible to import it into the Codex if the data structures are broadly compatible ... :-)



