There is no denying that this is a valuable resource, but I have started to get turned off by lists of n books to learn something. They can be valuable, but they can also be overwhelming and leave someone perplexed about how to get started. I believe technical books should be used to complement your knowledge of a field, not to get started in it. For example, "Secrets of the JavaScript Ninja" will be very valuable to me because I already have experience in JS, and it will help me understand some of the caveats that I might have overlooked. The best way has always been to start implementing something in the subject and dive into everything you uncover.
A blog post submitted here mentioned the same sentiment [1] -
> I can’t fully explain how immensely unmotivating it is to be given a huge list of resources without any context. It’s akin to a teacher handing you a stack of textbooks and saying “read all of these”. I struggled with this approach when I was in school. If I had started learning data science this way, I never would have kept going.
Second the dataquest post. Information without structure can be overwhelming, and it's important to know what the optimal ways to learn something are. Arguably this is why formal schooling was created - to provide a framework for learning...
Thank you - this is a wonderful resource that I had lost in my list of bookmarks about data science. That's another good example of information overload.
Sure, a bunch of books is no use. But for self-learning there's nothing more systematic than following one or two well-written books through. Just trying to gain everything via "practical" knowledge without any systematic guidance is definitely dangerous.
At least "Python for Data Analysis" is a pirate copy. Wonder how many others are too. But as long as you make money from affiliate links you don't care, right?
Lists of "curated" free books/resources etc. are a very active spam format these days. It's a simple and effective way of publishing without having any original content of your own. People love clicking on these things because they love the idea of learning.
What makes it seem like Python for Data Analysis is a pirated copy? I figured since it was hosted from Canisius College it would be legally distributed.
I don't want to host pirated content, so if it is I will remove it.
Also the PDF has a link to a notorious ebook pirate platform on every page. If you really believe content on college pages is legal, you must be very naive. I've never seen a naive webmaster that uses domain privacy though.
Personally I wasn't surprised to see (possibly) pirated content on an .edu site with a ~username URL, as the ~ suggested a student's page, where unauthorised content might pop up to share with classmates and stay up undetected by the college.
What surprised me is that the owner of the Canisius page appears to be teaching staff rather than a student. The other books hosted there seem to be legitimately freely available, however, so I'm guessing that was also a naive mistake.
If you're a beginner, you're probably going to be too overwhelmed by the options. I often find emailing/asking a few different professors/researchers/students in the field you want to learn for suggestions more productive.
That's not to say this isn't helpful. This is from my own personal experience.
Is anybody aware of good books/resources on machine learning/data science in Matlab?
My SO has been trying to learn ML to further her work for a couple of months now, and has had a hard time with it. She's quite intelligent, but isn't a terribly experienced programmer (she's been writing Matlab for a couple of years now, but mostly in a scientific setting)... Either way, I suspect part of the problem is that most of the explanations are in a language unfamiliar to her, and expect her to learn or translate it in addition to the concepts.
I noticed something last night while watching the Djokovic US Open quarter-final. It featured an "IBM Insights" segment which claimed to have mined 8 years' worth of Majors competitions to generate stats. One interesting result went something like this: when Djokovic returns only 25% of his opponent's serves, he has still won 85% of those past matches. The implication being that such is the strength of his defensive game.
While this is no doubt really interesting, I find I am getting diminishing returns from outputting stats like this from big dumps of past historical data. What I would like to be able to show is a live heat graph style stats tracker, where each point in the match updates my belief net about who is winning, or playing better. Of course, the final outcome may be upended by some fluke occurrence such as a Hail Mary pass in the final seconds which is what makes sports interesting, but nonetheless I think a live tracker would say a lot more than the actual score of the match.
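For what it's worth, the simplest version of "each point updates my belief" can be sketched in a few lines. This is only a plain Beta-Bernoulli running estimate of how often one player wins a point, not a full belief net, and it ignores who is serving, the score, and everything else that matters. The class name `PointTracker`, the uniform prior, and the sample point sequence are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PointTracker:
    """Running Beta-Bernoulli belief about P(player A wins a point)."""

    alpha: float = 1.0  # pseudo-count of points won by A (uniform prior, an assumption)
    beta: float = 1.0   # pseudo-count of points won by B

    def update(self, a_won_point: bool) -> None:
        # Each observed point adds one count to the matching side.
        if a_won_point:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean of A's point-win probability.
        return self.alpha / (self.alpha + self.beta)


tracker = PointTracker()
for a_won in [True, True, False, True]:  # hypothetical point-by-point feed
    tracker.update(a_won)
    print(f"{tracker.mean:.3f}")  # prints 0.667, 0.750, 0.600, 0.667
```

Because the update is constant-time per point, the same shape works on a stream; a real tracker would presumably keep separate estimates per server and feed them into a match-win model.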
So, I am wondering if anyone has specific resources for real time online data mining? At web scale for high throughput data streams. And I agree with shubmajain above, libraries and repos are preferable to books and academic journals ;)
This isn't too far from the logic: "How can we win this game? Score more points than the other team". I suppose the more interesting thing would be to compare the same correlation across players.
I agree that the stats don't provide insight regarding game play and strategy. IBM has been providing the same weak stats for years now. I would like to see tennis incorporate the hawk-eye system tracking player movement and shot placement as well. Perhaps that could produce a heat map. On that note they can also eliminate the line judges while we're at it. The whole challenge system is idiotic. They have the tech, they should incorporate it throughout the sport.
Without doing the math - Djokovic is such a strong player that even if he's only returning a quarter of your serves, meaning you're 3/4 of the way to winning your set (I don't tennis, sorry if I'm getting the terms wrong), he's still probably going to beat you.
Well, that's a close explanation, except I think you're confusing set and match. For men's tennis, it takes 3 sets to win the match, with the potential of playing 5 sets.
I'm actually not sure that the math is true, though. (Or I really don't understand what the stat is saying.) Let's say that it actually is that for every 4 serves, you win 3 and Djokovic wins 1. That ratio gives you every game (each one won to 15), and so every set. I don't see how Djokovic ever wins a game, let alone the set or match.
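To put a rough number on the 3-out-of-4 scenario: here is a sketch that assumes every point is independent and the server wins each one with a fixed probability `p`. That is a simplification, and the stat may well mean something other than points won on serve. The function name `game_win_probability` is just for illustration.

```python
def game_win_probability(p: float) -> float:
    """Chance of winning a standard tennis game (first to 4 points, win by 2)
    when each point is won independently with probability p."""
    q = 1.0 - p
    # Win 4-0, 4-1 or 4-2: the winner takes the last point, and the
    # loser's 0, 1 or 2 points can fall anywhere before it.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 (deuce), then win two points in a row before the opponent does.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


print(round(game_win_probability(0.75), 3))  # prints 0.949
```

So under these assumptions the server holds about 95% of their service games, not literally all of them. It also only covers the games where the opponent serves: service alternates every game, so Djokovic still gets his own service games, and the stat says nothing about those.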
It's hard to take any action based on that fact without further information. Even a gambler couldn't use that tidbit without conditioning on things like the current score. Or am I missing something?
I would add these great ebooks on Cloud Computing and AWS Certifications:
The Cloud Computing Job Market
With this eBook you will learn how Cloud Computing is changing the IT industry and creating a complete set of new roles for companies and businesses worldwide. Information and data to start your cloud computing career.
Honest question: is ML/DS something you can just pick up and be hired[0]? Maybe I'm ignorant, but I'd think employers would look for a degree in some related field to actually consider you for a position doing it.
[0] As in how you can pick up web hacking, do a few websites and create a reputation and get hired that way without a formal degree.
There was a thread on here a month or two ago about this. In general, it was noted that it's best (for both employment as well as just getting stuff done) to have a deep understanding of a particular area of ML rather than a general understanding of many areas. Usually those with a deep understanding have focused on it in school. But the latter group of generalists is a much larger group in the software industry, since most of us did not go to school for this specifically.
I went from being a US diplomat with no coding background to getting a job at edX as a machine learning engineer, so it's very possible. The keys are to find projects and build a portfolio so that you can prove your capabilities, and to start a blog/go to meetups so that you can build an audience and find opportunities.
Market seems to want a lot of them, different profiles and CVs for different domains and responsibilities: data wranglers, data analysts, statisticians, machine learning, business analysts, communicators, infrastructure operators, big data architects. The best shot is coupling your academic / self-matured strength with a domain you really like and start building your own portfolio from real-world case studies in the field you choose.
I think you kind of posed a question and a partial answer. If you have a degree in a related field (math, statistics), then yes, you can pick these things up. With a CS degree or no degree, it will be much harder to pass resume filters.
EDIT: Oops I should have said "An Introduction to Statistical Learning with Applications in R" rather than The Elements of Statistical Learning. The Elements book goes into way too much depth to be a good introduction to the subject.
Similarly, An Introduction to Statistical Learning With Applications in R is like a practical version of (or companion to) Elements. I very much enjoyed it.
I really enjoyed the book, it took a modern approach to R using many of the newer packages (dplyr for instance) and ggplot and combined them into a very nice introduction to R with labs, etc. Well worth checking out.
Your "smooth-scroll" library is completely breaking my touchpad scroll with an Acer c720 Chromebook. One slight movement (which should be a few pixels scroll) is moving me over half-way down the screen. Makes your site unusable with this touchpad as accidental scrolling sometimes happens and moves the screen a whole page away, especially when trying to right click open links because the gestures are similar.
Smooth scrolling is already implemented correctly in the browser. Your implementation is just a hack that hijacks the normal behaviour a user is accustomed to and just gives back a version that just feels wrong to interact with, even without performance issues.
That said, if I'm being honest, it's fairly unpleasant to use on a desktop with a mouse. It scrolls you to the top after it loads (which is after the rest of the page), and behaves differently than the computer normally does...
I made a change to the code, but since I don't have a touchpad, I won't be able to tell if it's fixed. Let me know what happens if you happen to go back to the page.
It's still not working well on my touchpad. It stutters badly. I honestly would recommend removing it. I checked it on my desktop. It works there, but the different scroll speed is unhelpful and actually a little bothersome.
[1]: https://www.dataquest.io/blog/how-to-actually-learn-data-sci...