Big-O technically means "worst case", which for Quicksort is O(n^2), although usually (in normal cases) it runs better than that. Other sort algorithms can guarantee not worse than O(n log n).
Depending on how the pivot is picked, Quicksort can actually be implemented in O(n lg n) worst-case time.
EDIT: I was going to link a proof for this but it's surprisingly hard to find. IIRC, the idea is to use the median of medians algorithm ([1]) to pick the median for the pivot, and deal with values equal to the pivot by alternatingly placing them in the left and right partition, or alternatively just keep them in a third partition in the middle.
That's incorrect. The /mathematical/ definition means that it's the upper bound. I.e. yes, the set of all functions of O(N) is a subset of the set of all functions O(N^2). So you could say Quicksort is O(N^100) and still be technically correct.
However, the big-O notation does NOT specify anything about worst/avg/best-case complexity of a given algorithm. That should still be defined in the analysis.
You mixed up those two slightly different concepts.