I don't think this was ever my issue with LaTeX; my issues are instead mostly:
- the cryptic error messages and infinite logs
- the unintuitive ways to do stuff like store a value for later use or sum two lengths (see the sketch after this list)
- the very long compile times
- the amount of reliance on global state from various packages, which contributes to even more cryptic errors or weird behavior when something goes wrong
- various other quirks, e.g. the fact that you often need to end a line with a comment or the newline will screw up your content.
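To illustrate the second point, this is roughly what storing a value and summing two lengths looks like in LaTeX (a minimal sketch using standard commands; the names are placeholders):

```latex
% Declare a length register, store a value, and add another length to it.
\newlength{\mylen}
\setlength{\mylen}{2cm}
\addtolength{\mylen}{0.5\textwidth}   % \mylen is now 2cm + half the text width

% "Storing a value for later use" is done with a macro rather than a variable.
\newcommand{\projectname}{Typst}
```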
Some of the effort is spent on optimizing the result at the paragraph, page, and multi-page level: river elimination, color balance, widow and orphan elimination, etc. I don't know how much of this Typst does; certainly HTML + CSS does none of it.
The non-control characters of ASCII are largely characters you might actually want to put in a document. TeX uses some of these as markup, e.g., the dollar sign to bracket maths mode and the ampersand as a column separator in tables. Typst takes this much further, using plus and minus signs to introduce list items, at signs for references, and so on.
Ideally, all visible characters should produce themselves in the printed output, except for the backslash introducing control sequences that represent all of the markup and braces for delimiting the extent of parameters to those control sequences. This would produce a very predictable and easily-parsed syntax.
Typst is more minimal and faster at compiling documents, so I prefer using it. But it isn't a LaTeX replacement in every case, and LaTeX's ecosystem is still larger. I have LaTeX documents I struggle to convert.
Yeah - typst has a bunch of features that I really want for blog posts and rich documentation, where markdown isn't a powerful enough tool. For example:
- Boxes & named figures
- Footnotes
- Variables and functions (including values populated from nearby files)
- Comments
- Chapter / Section headings (& auto generated table of contents)
- Custom formatting rules (for example, Typst lets you define your own "warning box"; see the sketch below)
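A minimal sketch of such a warning box, assuming only built-in Typst functions (the name `warning` and the colours are made up for illustration):

```typst
// Define a reusable "warning box" as a plain function.
#let warning(body) = block(
  fill: rgb("#fff3cd"),     // pale yellow background
  stroke: rgb("#cc9a06"),   // darker border
  inset: 8pt,
  radius: 4pt,
  [*Warning:* #body],
)

// Use it anywhere in the document:
#warning[This step overwrites the output file without asking.]
```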
I don't know of a better tool to write my blog posts today. Markdown doesn't have enough features. And I'm obviously not writing blog posts in latex or a rich text editor. I could use actual javascript / JSX or something - but those tools aren't designed well for long form text content. (I don't want to manually add <p> tags around my paragraphs like a savage.)
Pity the html output is still a work in progress. I'm eagerly awaiting it being ready for use!
[^0]: it doesn't matter where this is placed, just that this one has a colon.
The table of contents thing is annoying but it's not hard to write a little bash script. Sed and regex are all you need.
> Markdown doesn't have enough features
Markdown has too many features
The issue is you're using the wrong tool. Markdown is not intended for making fancy documents or blogs, it's meant to be a deadass simple format that can be read in anything. Hell, its goal is to be readable in a text editor, so it's more about styling. If you really want to use it and have occasional fanciness, you can use HTML.
But don't turn a tool that is explicitly meant to be simple into something complicated just because it doesn't have enough features. The lack of features is the point.
Yes, I think we're in violent agreement that markdown is the wrong tool for the job. That's why I find it baffling how so many blogging & documentation tools lock you in to using markdown, with its anaemic feature set (eg mdbook).
Even markdown + inline HTML is wildly inadequate. For example, you can't make automatically numbered sections. Or figures with links in the text. Or a ToC. And so on. Try and attach a caption to an image and you're basically hand authoring your document in crappy HTML.
So I agree with you. I don't think the answer is "markdown++" with comments, templating and scripting support. I think the answer is something else. Something which has considered the needs of authoring documents from the start. Something like typst.
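For contrast, here is roughly what a captioned, auto-numbered, referenceable figure looks like in Typst (a minimal sketch; the filename and label are placeholders):

```typst
// A numbered figure with a caption and a label that can be referenced.
#figure(
  image("diagram.png", width: 70%),
  caption: [The rendering pipeline.],
) <pipeline>

// Elsewhere in the text, @pipeline renders as a link like "Figure 1".
As @pipeline shows, the caption and numbering are generated automatically.
```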
> That's why I find it baffling how so many blogging & documentation tools lock you in to using
I feel this about so many things and it boggles my mind why people often choose to do things the hardest way possible.
Honestly, I think a good portion of it is the unwillingness to toss something aside and write something new. If it's just a hack on a hack on a hack on a hack then no wonder it's shit. It's funny that often it's quicker to rewrite than to force your way through.
I'm worried that with LLMs and vibe coding on the rise we're just going to get more. Because people will be asking "how do I make X do Y" when in reality you shouldn't ever make X do Y, you need to find a different tool.
> I'm worried that with LLMs and vibe coding on the rise we're just going to get more.
I'm hoping the opposite, at least eventually. I think before long it'll be easy to get chatgpt to build your own version of whatever you want, from scratch.
Eg, "Hey, I want something kinda like markdown but with these other features. Write me the spec. Implement a renderer for documents in Go - and write a vs code extension + language server for it."
But if that happens, we'll get way more fragmentation of the computing ecosystem. Maybe to the point that you really need the memory of a LLM to even know what's out there - let alone understand how to glue everything together.
You missed my concern. Even if LLMs get much better, it doesn't mean users will ask the right questions. Even now many don't ask the right questions, so why would it be any better when we just scale the issue?
MDX advertises itself as "markdown + components", but it's not CommonMark compatible. I tried using it a few years ago. In the process, I migrated over some regular markdown documents and they rendered incorrectly under MDX.
I filed a bug (this was a few years ago) and was told CommonMark compatibility was an explicit non-goal for the project. Meh.
Word 20 years ago was a very different beast compared to Word today. For starters, it still had a closed, binary (read: not friendly to source control) format. It also had more bugs than Klendathu.
When you lose your semester's 25-page seminar paper an hour before the deadline because Word had that weird little bug involving long documents and random CJK characters (and, supposedly, whether or not the moon was currently in the House of Aquarius), you develop a ... healthy dislike for it.
LaTeX back in the day didn't need zealots - Word did all the heavy lifting in demolishing itself for anything more involved than 'Secretary writes a letter', 'grandma Jones writes down her secret butterball recipe' or 'suits need a text, and only text, on paper, quickly'.
(Yes, that was snarky. I am still bitter about that document being eaten.)
> For starters, it still had a closed, binary (read: not friendly to source control) format
Word still has a closed format. It supposedly standardized OOXML, but it doesn't actually follow that standard; Microsoft apparently managed to warp the XML standard to accommodate its weirdness; and all sorts of details MSO encodes in that format are not actually documented.
There also used to be the problem of different renderings on different machines (even if you had all the relevant fonts installed): You opened a document on another person's computer and things were out-of-place, styling and spacing a bit different, page transitions not at same point etc. I don't know if that's the case today.
Granted, though, hangs and crashes and weird gibberish on opening a document are rare today.
> You opened a document on another person's computer and things were out-of-place, styling and spacing a bit different, page transitions not at same point etc.
When this happened to me on my job in the late 90s, we were able to trace the problem to the printer driver visible in the Word print dialog. I don't remember the details, but it looked like Word was adjusting font metrics to the metrics of the specific printer, and all the shifted pixels quickly added up to destroy the finely balanced lines of our print publication (yes, an official public health periodical by a European government was typeset with MS Word, and there was a lot of manual typographical work in each print). Given the technology at the time, it's not clear to me whether Word's behavior was a feature (in the sense of: automatically adjusts to your output device for best results) or a bug (automatically destroys your work without asking or telling you when not in its accustomed environment).
> Given the technology at the time, it's not clear to me whether Word's behavior was a feature or a bug
A bug, because even if this was merited somehow, they could have just made it a nice prominent checkbox for the user to decide what behavior they wanted.
Currently you will find that LaTeX is the de facto standard at CERN. Maybe only management doesn't use it. But CERN gives an Overleaf professional licence to each member. And all the templates I have seen, for everything I've interacted with that goes into publications, are LaTeX.
Well, naturally twenty-something years make a difference, although in some other respects it looks pretty much the same; I have visited a few times since then as an alumnus.
I do remember that too. In fact it was one of my physics teachers who got me into LaTeX - he used to complain about Word while praising LaTeX and its WYSIWYM.
I ended up becoming a graphic designer, though, so LaTeX felt rather limiting very quickly; fortunately I found ConTeXt.
I hoped Typst was going to be great for my use case, but alas it's got the same "problem" as LaTeX - modularity. Still, it seems to be a great alternative for people doing standard documents.
Twenty years ago you say. So that's when it had already been in existence for 20+ years and had been ubiquitous in academia (at least in the sciences) for 10 or more.
How so? Only their web app seems to be closed source. And the company was created by the two project founders. They also don't seem to be doing a lot more than a community project.
Obviously there are differences, but that wasn't the point of my comment. I replied to the claim that latex never needed "marketers". Or did you mean to reply to a different comment?
I meant that if there is no company financially benefiting from that activity, it is hard to call it marketing. But if there is a company, especially one backed by VC, that is a completely different story.
There is no VC behind Typst; they're bootstrapped. And I think by "marketeers" the original commenter did not mean actual marketing people, but enthusiastic fans. Unless it was a hidden accusation of astroturfing that I didn't get.
When you are the only option marketing doesn't matter.
I would suspect (based on my own experience) that the reason folks shout "Typst!" any time they hear LaTeX is that the user experience is 1000x better than LaTeX's.
Many people say that they use LaTeX because it produces more beautiful output. Microtypography is one of the reasons for that. It's especially noticeable when microtype pushes hyphens or quotes at the end of a line slightly into the margin. (A nearby comment mentions that Typst has this feature, too.)
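On the LaTeX side, that behaviour comes from the microtype package; enabling it is essentially a one-liner (a minimal sketch of the relevant options):

```latex
% Character protrusion is what nudges hyphens and quotes at line ends
% slightly into the margin; font expansion evens out interword spacing.
\usepackage[protrusion=true,expansion=true]{microtype}
```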
Nope, people don't need to know that something is done to appreciate the outcome. You might not know that modern MacBooks use ARM processors, but you might still appreciate that they have a long battery life.
Computer Modern is the very last thing I will ever want in a document and is the first thing I change in every LaTeX document I create. It is easily one of the ugliest fonts ever created.
It has a lot of good things going for it, but it is the least attractive font that I think I have ever seen.
I think it's quite attractive, but its attractiveness isn't really why it's desirable; it's because people know it is the font used by proper fancy scientific papers. It's like the opposite of Comic Sans.
How is that PDF made interactive? It has options to toggle the behaviour, which work even in an in-browser PDF viewer. I did not think PDFs could do that.
PDFs can do a lot more than show static content. There was a time when Adobe strongly advocated for PDF to be the page format of what would come to be called "The World Wide Web". Where we have HTML now, Adobe wanted PDF. Thankfully that did not happen. But I suspect it would have made more sense technically than [whatever this mess is that we have now.]
Are people looking seriously at the shortcomings of LaTeX and moving towards modern replacements?
Major problems include:
- Tables are a huge pain.
- Customized formatting like chapter headings, footers, etc is painful.
- LaTeX as a language somehow felt like it had issues with composability of functions. The details of the problem elude me now, but it was something like: if you have a function to make text bold and another function to make it italic, then applying one to the output of the other should give you bold italic, but such composability was not happening for some functions.
- Mixing of physical and logical formatting.
- A lot of fine-tuning is required to get passable final output.
The biggest pain I remember was placement of figures. I think the [h] parameter would advise LaTeX to place the graphic right "here", but even if I added the exclamation mark for extra force, it would often wind up somewhere else.
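For reference, the placement specifier in question looks like this (a minimal sketch; it assumes \usepackage{graphicx}, and the filename is a placeholder):

```latex
\begin{figure}[h!]  % "here, and I really mean it" - yet LaTeX may still move it
  \centering
  \includegraphics[width=0.8\textwidth]{plot.pdf}
  \caption{Even with [h!], this can end up pages away.}
\end{figure}
```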
Well, everyone likes free software (as in freedom and beer), but zero of you pay, while on a six-figure salary. Meanwhile there's no hesitation to pay AWS, Netflix, Amazon, etc., all of them net-negative contributors to free software.
They are a very small team and this is a known issue - there is a website refresh coming up that will fix it
They developed the main face of the product first - the online webapp which has live collaboration - which sounds like a sane choice for a new company.
Why is that a bad thing, though? To me this actually sounds like a net positive for two reasons: first, there's a single coherent overarching design, and second, so long as their business model is sustainable, it means that the product won't suddenly disappear because of maintainer burnout etc.
Yeah, today's open source combines the worst of corporate jobs and social media. Typst looks nice, though it is indeed developed with the logic of a business.
Almost all of Typst, except their web app, is available on crates.io and from many Linux distribution repositories. And you can skip the web app if you don't want it; there's no loss of functionality.
I find Typst today much easier to contribute to (in the open-source sense) than LaTeX. Go to the GitHub repo and interact with the developers, who happen to be very responsive.
I used LaTeX for 20+ years and don't know how to file a bug for LaTeX. Do I file it against XeLaTeX, or LaTeX itself? Where? How do I update things? Download 4 gigs? Where's the documentation? Where's a book that explains how to contribute to LaTeX? These are some of the issues I've dealt with and am happy to never have to again.
Does it have better/easier tables? Does it support complex tables, like ones with images in them, with alternating horizontal or vertical text in cells, tables inside tables, tables with alternating row/column shading, etc., while still supporting automatic wrapping to the contents?
I recall a recent criticism of Typst being that it doesn't strip unused glyphs from fonts when making PDFs so they end up excessively large compared to other solutions. Has there been any change to that?
That's fixed, thanks to a community contribution! [1] For what it's worth, Typst did have font subsetting from the start, but it was rather bad for CFF-flavoured OpenType fonts.
The same contributor has recently put a lot of work into building a better foundation for Typst's PDF export [1], which decreases file sizes further and brings many other benefits.
I live in fear that one of the major typesetting services like Overleaf will convince people to move away from a very durable standard and adopt something that’s much more change-oriented. Then we’ll all have to learn not one, but two standards. Rinse repeat.
PDF is used for pre-formatted content with reproducible layout. HTML is used for dynamically formatted, dynamically laid out and often reflowable content. It's debatable whether PDF needs a more modern alternative, but HTML is certainly not a replacement for it. There are several use cases where HTML is not the appropriate choice - especially for carefully laid out documents or books. You can simulate pre-formatted layout in HTML, but it always feels like shoehorning an unintended functionality.
LaTeX and Typst are markup primarily for PDF type contents. Something like Asciidoc or even Markdown is more appropriate for HTML type content. You can always switch the purposes, but I never got a satisfying output by doing that.
HTML with CSS paged media gets you reproducible layout without having to mess with LaTeX, and it keeps you in open toolchains that aren't two-plus decades old without any significant improvement or advancement.
Async is a language feature to enable scalability, but an alternative approach is just to spawn a bunch of threads and block threads when waiting for I/O to happen. That is the approach used by this framework.
Chrome has randomized its ClientHello extension order for two years now.[0]
The companies to blame here are solely the ones employing these fingerprinting techniques, and those relying on services of these companies (which is a worryingly large chunk of the web). For example, after the Chrome change, Cloudflare just switched to a fingerprinter that doesn't check the order.[1]
> Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.
There's "vulnerabilities" and there's "inherent properties of a complex
protocol that is used to transfer data securely". One of the latter is
that metadata may differ from client to client for various reasons,
inside the bounds accepted in the standard. If you discriminate based
on such metadata, you have effectively invented a new proprietary
protocol that certain existing browsers just so happen to implement.
It's like the UA string, but instead of just copying a single HTTP
header, new browsers now have to reverse engineer the network stack of
existing ones to get an identical user experience.
I get that. I don't condone the behavior of those doing the fingerprinting. But what I'm saying is that the fact that it is possible to fingerprint should in pretty much all cases be viewed as a sort of vulnerability.
It isn't necessarily a critical vulnerability. But it is a problem on some level nonetheless. To the extent possible you should not be leaking information that you did not intend to share.
A protocol that can be fingerprinted is similar to a water pipe with a pinhole leak. It still works, it isn't (necessarily) catastrophic, but it definitely would be better if it wasn't leaking.
I'm sorry, but your comment shows you've never had to fight this problem at scale. The challenge is not small-time crawlers, the challenge is blocking large/dedicated actors. The problem is simple: if there is more than X volume of traffic per <aggregation criteria>, block it.
Problem: most aggregation criteria are trivially spoofable or very cheap to change:
- IP: with IPv6, rotating your IP often is not an issue
- UA: changing this is scraping 101
- SSL fingerprint: easy to use the same one as everyone else
- IP stack fingerprint: also easy to use a common one
- request/session tokens: it's cheap to create a new session
You can force login, but then you have a spam account-creation challenge with the same issues as above, and depending on your infra this can become heavy.
Add to this that the minute you use a signal for detection, you "burn" it, as adversaries will avoid triggering it, and you lose measurement and thus the ability to know whether you are fixing the problem at all.
I worked on this kind of problem for a FAANG service; whoever claims it's easy has clearly never had to deal with motivated adversaries.
Should be easy enough to create a DroneBL for residential proxy services. Since you work on residential proxy detection at a FAANG service, why haven't you done it yet?
If they're doing things the above-board way from their own ASN, block their ASN.
If they're doing things the above-board way from third-party hosting providers, send abuse reports. Late last year there was a commotion because someone was sending single spoofed SSH SYN packets, from the addresses of Tor nodes, to organizations with extremely sensitive security policies. Many people with Tor nodes got threats of being banned from their hosting provider, over a single packet they didn't even send. They're definitely going to ban people who are doing actual DDoSes from their servers.
DDoS is also a federal crime, so if you and they are in the USA, you might consider trying to get them put in prison.
> blame here are solely the ones employing these fingerprinting techniques,
Sure. And it's a tragedy. But when you look at the bot situation and the sheer magnitude of resource abuse out there, you have to see it from the other side.
FWIW, in the conversation mentioned above, we acknowledged that and moved on to talk about behavioural fingerprinting and why it makes sense not to focus on the browser/agent alone but on what gets done with it.
Last time I saw someone complaining about scrapers, they were talking about 100 GiB/month. That's about 300 kbps, less than $1/month in IP transit and ~$0 in compute. Personally I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs and they'll back off when they're throttled.
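For reference, the arithmetic behind that figure, assuming a 30-day month and 100 GiB = 100 × 2^30 bytes:

```latex
\[
  \frac{100 \times 2^{30} \times 8~\text{bit}}{30 \times 24 \times 3600~\text{s}}
  \approx \frac{8.6 \times 10^{11}~\text{bit}}{2.6 \times 10^{6}~\text{s}}
  \approx 3.3 \times 10^{5}~\text{bit/s} \approx 330~\text{kbit/s}
\]
```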
He provides no info. Requests per second? 95th-percentile Mbps? How does he know the requests come from an "AI scraper" as opposed to a normal L7 DDoS? LWN is a pretty simple site; it should be easy to saturate 10G ports.
Didn't rachelbythebay post recently that her blog was being swamped? I've heard that from a few self-hosting bloggers now. And Wikipedia has recently said more than half of its traffic is now bots. Are you claiming this isn't a real problem?
What's the advantage of randomizing the order, when all chrome users already have the same order? Practically speaking there's a bazillion ways to fingerprint Chrome besides TLS cipher ordering, that it's not worth adding random mitigations like this.
But the difference is that Signal has been architected from the start to retain much less (meta)data on the server, so that even if the Signal Foundation is compelled to share the data they have, that data will be extremely limited to the point of being mostly useless.
I was thinking more of being forced to introduce a backdoor or weaken encryption in the future, which would give the US more data. Yes, the encryption algorithm is theoretically very secure.
Any entity that operates in the US has to abide by US laws, after all. Probably not a concern for US citizens since they're allowed due process but creates risk for non-Americans looking for a truly secure messenger, especially if they live in a place that is currently at odds with US policy (Canada, Europe).
It depends on which performance metrics you're interested in, where you draw the boundaries for individual workloads, and how you then schedule those workloads. Hyperlight can start new Wasm workloads so quickly that you might not need to keep any idling instances around ("scale to zero"). That's new, and it makes comparisons a little more complicated. For example:
- If we take VMs as our boundary and compare cold start times, Hyperlight confidently comes out on top. That's 1-2ms vs 125ms+.
- If we take warm instances and measure network latency for requests, Hyperlight will come out on top if deployed to a CDN node (physics!). But if both workloads run in the same data center performance will be a lot closer.
- Say we have a workload where we need to transmux a terabyte of video, and we care about doing that as quickly as possible. A native binary has access to native instructions that will almost certainly outperform pure-Wasm workloads.
The way I think about Hyperlight Wasm is as yet another tool in the toolbox. There are some things it's great at (cold starts, portability, security) and some other things it isn't. At least, not yet. Whether it's a good fit for what you're doing will depend on what you're doing.
https://github.com/rust-lang/rust/issues/143549