
Curious what other approach you would take to do exploratory data analysis? It's so natural to me I can't think of another way that would be practical to achieve the same workflow.



Handcrafted machine code in punch hole cards.

Interactive environment without compile nonsense is just too new for folks.


Emacs Org mode can do this, and it is not tied to just Python. Anyway, something like this works:

  #+BEGIN_SRC python :results file
  import matplotlib
  matplotlib.use('Agg')
  import matplotlib.pyplot as plt
  fn = 'my_fig.png'
  plt.plot([1, 2, 3, 2.5, 2.8])
  plt.savefig(fn, dpi=50)
  return fn
  #+END_SRC

  #+RESULTS:
  [[file:my_fig.png]]


In a true notebook you would maybe want to do the following:

  import matplotlib
  matplotlib.use('Agg')
  import matplotlib.pyplot as plt
  plt.plot([1, 2, 3, 2.5, 2.8])

  # Alright, saving the figure at 50 dpi first
  plt.savefig('my_fig.png', dpi=50)

  # Trying a bit more DPI to see if that makes a difference
  plt.savefig('my_fig2.png', dpi=150)

  # Oh, wrong numbers, forgot that the fourth datapoint was supposed to be 100, going back to 50 dpi as well
  plt.plot([1, 2, 3, 100, 2.3])
  plt.savefig('my_fig4.png', dpi=50)

It seems like your example misses the interactivity.


We have a lot of scientists using RStudio. It's not quite the same, but you can do it. It lets you view your data frames like a spreadsheet and generate graphs. It's R, and I get that Jupyter supports R, but it always has some issue with some dependency.


Ew.

R.

No thank you.


I used to think like that. Programmers hate R. But I took a biostatistics class and it really is the best tool for that job. Plus the graphics output can't be beat (ggplot2), and packages are fairly easy to install, which makes it quite a valuable tool.


> it really is the best tool for that job.

Besides the ecosystem, what makes R better than Python or Julia for biostats?


Can't speak to Julia.

The built-in statistics are great. They're just there, so there's less need to find a package (general stats, t-test, chi-squared test...). We tend to use the "tidyverse" packages [1] (https://r4ds.hadley.nz/). Biopython is amazing for manipulating biodata, but once the data is extracted and you need statistics, our scientists seem to use R. I really don't love R's syntax, but I get why they use it. I use Python all the time for data wrangling (right now I'm pulling sequences from a FASTA file to inject into a table).
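For what it's worth, the FASTA-to-table wrangling mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: it uses a hand-rolled parser and an in-memory SQLite table from the standard library so it runs anywhere, whereas in practice you would probably reach for Biopython. The names `read_fasta`, `load_sequences`, the `sequences` table, and the example records are all made up for this sketch.

```python
import io
import sqlite3


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted text stream."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            record_id, chunks = (parts[0] if parts else ""), []
        elif record_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def load_sequences(handle, conn):
    """Insert every FASTA record into a (hypothetical) 'sequences' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(id TEXT PRIMARY KEY, seq TEXT, length INTEGER)"
    )
    rows = [(rid, seq, len(seq)) for rid, seq in read_fasta(handle)]
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up example data standing in for a real FASTA file.
EXAMPLE_FASTA = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
GGCC
"""

conn = sqlite3.connect(":memory:")
loaded = load_sequences(io.StringIO(EXAMPLE_FASTA), conn)
result = conn.execute(
    "SELECT id, seq, length FROM sequences ORDER BY id"
).fetchall()
```

Swapping `io.StringIO(EXAMPLE_FASTA)` for an opened file handle and `":memory:"` for a database path would be the obvious next step; once the table exists, handing it to R for the statistics is straightforward.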

RStudio is like an IDE for your data. You can view the data tables, graph different things, etc. If you try the first chapter of the R for Data Science book, you can see how to get up and running with graphing and analysis quite quickly. https://r4ds.hadley.nz/data-visualize.html

Though at this point both Python and R are necessary, depending on which package or algorithm you want to use.

There are some good packages for single cell analysis: We use "Seurat".

https://satijalab.org/seurat/articles/get_started_v5.html

Jupyter supports R now with an add-in, so it's less of an issue.



