Bookdown – Authoring Books and Technical Documents with R Markdown

anacleto · on Dec 3, 2016

If you use R on a daily basis this is a very nice feature.

IndianAstronaut · on Dec 3, 2016

R with RStudio is making some major progress in using R for presenting. R Markdown, ReporteRs for PowerPoint, and now this.

hoodwink · on Dec 3, 2016

I _love_ Bookdown, but only once I figured out how to incorporate it into my writing workflow.

I'm writing a ~150-page monograph. As the document evolved, organizing the prose and reordering the structure gradually became too cumbersome in plain-text Markdown files (whether I used RStudio or some other editor). So I shifted my writing over to Scrivener, the best long-form writing tool I've ever used. Highly recommended.

I write R Markdown in Scrivener and compile the document into plain text. I manage my reference materials in Zotero and export a BibTeX bibliography. Bookdown then takes the two files, incorporates all my calculations, tables, plots, and figures, and outputs a beautifully rendered document.

iamwil · on Dec 3, 2016

Why were organizing the prose and reordering it too cumbersome? Was it updating all the references or something else?

hoodwink · on Dec 3, 2016

It's just the process of organizing your ideas in a logical order. At the start, I have a bunch of ideas and I do my best to organize them into an outline. But as I begin writing and continue my research, I realize that some of my ideas suck, or I have new, better ideas, or I see a better way to share my ideas with the reader, so I need to constantly reorganize my document.

I don't know if this process is a sign that I'm an amateur writer or not, but Scrivener makes document organization incredibly simple compared to writing in a normal text editor. It also enables me to keep my notes sorted without interfering with the prose.

By the way, updating all the references is very easy using Bookdown and a bibtex bibliography.

nonbel · on Dec 3, 2016

I still lose all formatting when copy-pasting code from the PDFs being generated. Why does that need to be a problem? E.g., from page 47:

  ---
  title: "An Impressive Book"
  author: "Li Lei and Han Meimei"
  output:
  bookdown::gitbook:
  lib_dir: assets
  split_by: section
  config:
  toolbar:
  position: static
  bookdown::pdf_book:
  keep_tex: yes
  bookdown::html_book:
  css: toc.css
  documentclass: book
  ---

https://bookdown.org/yihui/bookdown/bookdown.pdf

yihui · on Dec 3, 2016

Because PDF is for printing purposes. There could be many, many problems if you copy and paste from PDF (white spaces being eaten, ligatures, curly quotes, en/em-dashes, ...; almost as bad as Word, except that PDF is beautiful). So don't copy from PDF, but from HTML instead. HTML is much more faithful in terms of preserving characters.
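As a rough illustration of the character problems listed above, here is a minimal Python sketch that undoes ligatures, curly quotes, and dashes in text pasted from a PDF. The function name `clean_pdf_text` and the substitution table are assumptions made up for this example, not part of bookdown or any PDF tool, and the table is far from exhaustive. Note that it cannot bring back white space that was already eaten, so lost YAML indentation still has to be restored by hand or copied from the HTML version.

```python
import unicodedata

# Typographic characters that commonly replace plain ASCII in PDFs.
# Illustrative assumption only; real documents may contain many others.
PDF_SUBSTITUTIONS = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})


def clean_pdf_text(text: str) -> str:
    """Undo ligatures and typographic punctuation in text copied from a PDF."""
    # NFKC expands compatibility characters such as the "fi" ligature (U+FB01).
    expanded = unicodedata.normalize("NFKC", text)
    return expanded.translate(PDF_SUBSTITUTIONS)


if __name__ == "__main__":
    pasted = "title: \u201cAn Impressive Book\u201d  # con\ufb01g \u2013 draft"
    print(clean_pdf_text(pasted))
```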

nonbel · on Dec 4, 2016

So it is just an inherent limitation of PDFs? That's too bad; for some reason I find the HTML format less intuitive.

applecrazy · on Dec 3, 2016

Can somebody explain to me why they would download a programming language just to write a technical paper? I get that it's a replacement for TeX, but with a name like Bookdown, I expect it to be as simple as Markdown in terms of setup.

dandermotj · on Dec 3, 2016

It's not a replacement for TeX. R originally had Sweave for weaving R and text together. Then we got R Markdown, which introduced the use of Markdown for authoring documents via Pandoc, but the output document only had simple features like headers, subheaders, emphasis and so on.

Now Yihui from RStudio (who's been prolific in this area) has written bookdown for authoring books with R Markdown. It's not for authoring statistical papers (as the other comments stated; that's R Markdown's realm); it's for books like Hadley Wickham's R4DS and Advanced R. Other examples can be found on the bookdown website.

I'd also add that, regardless of inserting R code, bookdown is a class authoring format and personally I would write a book or large document in bookdown even if it was just pure prose.

Noseshine · on Dec 3, 2016

  > It's not for authoring statistical papers (as the other comments stated

While you are technically right, what will most likely happen is what we can already see (on https://bookdown.org/): exactly what I described in my comment. Of course one could write a novel with it. Any publishing tool can be used to publish "just text". But who is going to end up using this tool? Pretty much only people already using R, and that, apart from people taking a course, is people doing statistics.

I looked at two randomly selected pages in two randomly selected books on the homepage (I swear I didn't click on more than those two to select examples that fit my narrative), both show it's being used just as I said.

- http://r4ds.had.co.nz/exploratory-data-analysis.html

- https://bookdown.org/robertkck/ecf_draft/low-carbon-is-promi... (this one has interactive graphics)

My reply was aimed at someone who doesn't seem to know any of the background story.

I agree with you that it's a nice tool for a lot of purposes.

curiousgal · on Dec 3, 2016

I think it's aimed towards people who already use R. Stat academics for instance.

Mikeb85 · on Dec 3, 2016

They wouldn't. It's for people who use R as their everyday tool (many people in diverse fields), and want to seamlessly create reports and papers directly from that tool.

Noseshine · on Dec 3, 2016

First of all, this is for statistics based papers, not just "technical papers". The "R" in the URL and also in the language itself shows that focus. So if you have to first download R (and RStudio) you may not be the target audience.

The benefit of RMarkdown publications is that they contain the code to reproduce every single step in the argument - as far as it is based on data.

Examples from me, made for a course on Coursera some time ago (so especially in the 2nd example the focus wasn't on the result, effort for data cleaning was deliberately kept to a minimum):

- https://rpubs.com/Noseshine/74191

- https://rpubs.com/Noseshine/79343

Every single line of code necessary to produce the graphs and tables and summary statistics is in there. If you get the source RMarkdown file you can run it yourself. The graphics and tables are not images created somewhere else!

Usually you will have the code in an appendix at the very end, but every single thing in the document based on data is calculated using code somewhere in the document itself.

The main target for these kinds of publications are statistics-heavy papers. Usually you just read the conclusions of the authors and some summary statistics but you don't have access a) to the data they used and b) to the methods they used on the data. RMarkdown documents provide b) and (hopefully) an embedded link to a) (in the form of source code that when executed downloads it into your R environment).

The background story is that not infrequently you find studies (in all kinds of statistics-heavy fields from medicine to psychology) where the summary conclusions are based on faulty premises, faulty data collection/selection, or faulty statistical methods. If all you have is a LaTeX or Word or whatever (passive) summary document, you cannot find any of those problems unless you keep bugging the authors for a copy of their data and the code used to analyse it, which is a lot of work and is too often altogether unsuccessful. So there is a push in science to get people and organizations to publish more than just the summary.

Another advantage: You can produce interactive (online) versions, for example showing a graph where you can change parameters interactively, but show a few static example graphs in the static document version.

As for just the format:

The reason to compete with LaTeX has been given right here on HN a few times when LaTeX was discussed: It is far more difficult and has plenty of bugs and difficulties. A long time ago I wrote my thesis in it and had no major issues, but I don't know why I would want to write anything in it now. Things like RMarkdown are much less flexible - and orders of magnitude easier, both to write and to debug.

As for the competition with Word, text-based formats have the advantage of being easily "processable" using other tools.

yihui · on Dec 3, 2016

> First of all, this is for statistics based papers, not just "technical papers".

The blog post is not really long, and in the fourth paragraph, I wrote:

> [...] We used books and R primarily for examples in this book, but bookdown is not only for books or R. Most features introduced in this book also apply to other types of publications: journal papers, reports, dissertations, course handouts, study notes, and even novels. You do not have to use R, either. Other choices of computing languages include Python, C, C++, SQL, Bash, Stan, JavaScript, and so on, although R is best supported. You can also leave out computing, for example, to write a novel.

Perhaps one thing I didn't make clear enough is that (you are right that) R (optionally RStudio) should be downloaded to use bookdown locally, but your document or book does not have to be related to R or statistics at all. However, I do agree that people who don't use R probably won't care about downloading R in the first place to use bookdown, so even if bookdown is sort of "general-purpose", the actual major audience is likely to be those doing statistics and data analysis using R.

That said, Sections 5.5 and 6.2 of the bookdown book have shown how to use bookdown on GitHub and Travis CI: https://bookdown.org/yihui/bookdown/ That way, you don't have to install R locally. All you need to do is commit changes to GitHub, and the book can be built automatically on Travis and published to GitHub Pages. The author has to find someone to help him/her set up these services, though.

ekianjo · on Dec 3, 2016

When you are an R user, this is what you'd like to have sometimes.

joncalhoun · on Dec 3, 2016

Has anyone used both Bookdown and Softcover [1] to compare the two?

Softcover is something similar written in Ruby that uses a combination of Markdown with LaTeX where necessary to generate HTML, PDF, EPUB, and MOBI book formats. It is what Michael Hartl uses (and created, iirc) to build the Rails Tutorial.

While it seems tightly linked with the Softcover publishing service, you can use the files it generates pretty much anywhere in my experience.

[1] https://github.com/softcover/softcover

yihui · on Dec 3, 2016

A few differences I spotted as I quickly read the Softcover book (I'm the main author of bookdown, so I could be biased):

Bookdown is built on top of R Markdown, which means it has the genes of literate programming (knitr) and Pandoc. Literate programming is an important bridge to reproducible research (source code and prose in the same document), which we strongly believe in. We also value Pandoc's efforts in standardizing Markdown, although John Gruber didn't seem to care [1].

Softcover seems to be focused on the typesetting syntax, cherry-picking from different flavors of Markdown plus LaTeX when Markdown cannot get you there.

I think the design of these tools is heavily influenced by the background of their authors. I was a statistics student for several years, and have published a few academic papers, a PhD thesis, and a book, so I know some of the pain of publishing these things. The overall feeling you get from bookdown may be "hmm, this is for people in academia" (who else cares about equations or theorems after all). By comparison, the feeling I get from Softcover is "this is for software manuals" (who else would care about code listings). Neither feeling is accurate: bookdown is not only for academia and Softcover is certainly not only for software manuals.

There are certainly many differences in the Markdown syntax, but I don't think it is worthwhile listing them here. One subtle thing is that on bookdown book pages, you may see an edit button that takes you to GitHub to edit the R Markdown source and then send a pull request. This little feature is one of my personal favorites.

Another major difference is that Softcover provides the service of marketing and selling as well, and bookdown is only a tool for authoring books at the moment (you have to talk to publishers by yourself). Both self-publishing and publishing with an established publisher have their pros and cons, e.g. the former is quick and the latter is slow. We leave the decision to the authors. Several platforms for self-publishing exist, and authors can send the PDF/EPUB there if they want.

[1] https://twitter.com/markdown/status/507341395137658880

ivansavz · on Dec 4, 2016

The main idea in Softcover is to use (a restricted subset of) LaTeX as the "main" representation, which then gets converted to HTML + MathJax using Tralics (a LaTeX-to-XML converter) and lots of custom pre-processing and post-processing steps. The source code is quite readable. The end result is the following pipeline:

     .tex --> .html+MathJax --> .epub --> .mobi

The "Softcover-flavored Markdown" is just an optional convenience to enable people who don't know LaTeX to use the system as follows:

     .md --> .tex --> .html+MathJax --> .epub --> .mobi

As I see it, the .md to .tex conversion is not that exciting, but the fact that we finally have a civilized way to produce .epub/.mobi from .tex files is an amazing achievement. If you're working on books, I highly recommend checking out the Softcover software stack.

iamwil · on Dec 3, 2016

Anyone know what markup language was used to express the math equations? Was that TeX or something else?

yihui · on Dec 3, 2016

Yes, the syntax is LaTeX (thanks to MathJax). See Chapter 2 of the bookdown book: https://bookdown.org/yihui/bookdown/
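For illustration, a numbered equation in a bookdown source file can look roughly like the sketch below. The binomial formula and the label `binom` are arbitrary choices for this example. The `(\#eq:binom)` line is bookdown's way of labeling an equation, which can then be cited in the prose as `\@ref(eq:binom)`.

```latex
\begin{equation}
  f(k) = \binom{n}{k} p^{k} (1 - p)^{n - k}
  (\#eq:binom)
\end{equation}
```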

rcarmo · on Dec 3, 2016

I've been toying with http://weasyprint.org and Python markdown toward similar (but as yet unrealized) effects, and quite like the idea.

It's not too hard to, say, write a Python Markdown extension to do the same using pandas, seaborn and the usual data science tools from the Python universe, but I salute the R folk for getting this done in what appears to be a quite consistent fashion.
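To make the idea concrete, here is a minimal standard-library sketch of the "run the code inside the document" step, assuming chunks are marked with a `{python}` header in the style of R Markdown chunks. It is not a Python-Markdown extension and not how knitr is implemented; the names `knit`, `CHUNK`, and `FENCE` are made up for this example, and plots, caching, and error handling are left out entirely.

```python
import contextlib
import io
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# A chunk opens with a fence followed by {python} and ends with a bare fence line.
CHUNK = re.compile(
    "^" + FENCE + r"\{python\}\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def knit(source: str) -> str:
    """Run each {python} chunk and insert its printed output after the code."""
    env = {}  # one namespace shared by all chunks, so later chunks see earlier results

    def run(match):
        code = match.group(1)
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(code, env)  # executes arbitrary code: only knit documents you trust
        rendered = FENCE + "python\n" + code + FENCE
        output = captured.getvalue()
        if output:
            if not output.endswith("\n"):
                output += "\n"
            rendered += "\n\n" + FENCE + "\n" + output + FENCE
        return rendered

    return CHUNK.sub(run, source)


if __name__ == "__main__":
    doc = (
        "# Report\n\n"
        + FENCE + "{python}\nx = 2 + 3\n" + FENCE + "\n\n"
        + "Some prose in between.\n\n"
        + FENCE + "{python}\nprint(x * 2)\n" + FENCE + "\n"
    )
    print(knit(doc))
```

The chunks share one namespace, so a value computed early in the document can be reused in a later chunk, which is the property that makes every number in the rendered text traceable to code in the same file.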