Hacker News
Making the Python back end for my new webapp (youtubetranscriptoptimizer.com)
76 points by eigenvalue 3 months ago | 50 comments



I’m really interested in how you got the auth to interoperate between nextjs and python. I find auth to be the most difficult part of making blended code projects with JavaScript on the frontend like this.


Well you don't really need to get the auth to interoperate per se; the only machine that is allowed to connect to the FastAPI backend is the machine running the NextJS app, and it passes along the email address of the user making the request to the FastAPI backend.

And the user auth stuff in NextJS is incredibly easy using the standard Next-Auth flow: https://next-auth.js.org/

You basically just set up a new application in the Google Cloud console, enable the Google Plus API for the app, and create the OAuth keys, and that's about it. Just add the secret key and identifier to your .env file for the NextJS app and it "just works".


So essentially, you only let the server-side rendering function access the API and don’t secure it at all beyond that?

You could essentially use a static JWT that only that Next.js cloud function knows?


Yeah, I restrict it by the IP address, so the FastAPI backend can only receive connections from localhost or the one machine running the NextJS app. There are certainly lots of ways you could restrict it, such as using a password or key.
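A minimal sketch of the "restrict by IP, optionally plus a key" idea as a plain ASGI middleware, so it needs nothing beyond the standard library and should slot into FastAPI/Starlette via something like `app.add_middleware(InternalOnlyMiddleware, allowed_ips={"127.0.0.1"}, shared_key="change-me")`. The class name, the `x-internal-key` header, and the addresses are hypothetical, not what the author uses. Behind a reverse proxy, `scope["client"]` is the proxy's address rather than the caller's, so the allow-list would need adjusting.

```python
import hmac
from typing import Any, Awaitable, Callable, Iterable, Optional

Scope = dict[str, Any]
Receive = Callable[[], Awaitable[dict]]
Send = Callable[[dict], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class InternalOnlyMiddleware:
    """Reject HTTP/WebSocket connections that do not come from an allow-listed IP
    or that lack the expected shared key (hypothetical header name)."""

    def __init__(self, app: ASGIApp, allowed_ips: Iterable[str],
                 shared_key: Optional[str] = None) -> None:
        self.app = app
        self.allowed_ips = frozenset(allowed_ips)
        self.shared_key = shared_key.encode() if shared_key else None

    def _is_allowed(self, scope: Scope) -> bool:
        client = scope.get("client")  # (host, port) tuple, or None
        if client is None or client[0] not in self.allowed_ips:
            return False
        if self.shared_key is None:
            return True
        headers = dict(scope.get("headers", []))  # ASGI header names are lowercase bytes
        supplied = headers.get(b"x-internal-key", b"")
        return hmac.compare_digest(supplied, self.shared_key)  # constant-time comparison

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "lifespan" or self._is_allowed(scope):
            await self.app(scope, receive, send)
        elif scope["type"] == "websocket":
            # Closing before accepting makes the server reply with HTTP 403.
            await send({"type": "websocket.close", "code": 1008})
        else:
            await send({"type": "http.response.start", "status": 403,
                        "headers": [(b"content-type", b"text/plain")]})
            await send({"type": "http.response.body", "body": b"Forbidden"})
```

The IP check alone matches what the comment describes; the shared key is the "password or key" variant layered on top, which keeps working if the NextJS host's address changes.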


I've been doing this with NextAuth and fastapi-nextauth-jwt [https://github.com/TCatshoek/fastapi-nextauth-jwt]. With JWTs it's pretty straightforward to do something similar for any other auth provider. This is also using NextJS rewrites (see [https://vercel.com/templates/python/nextjs-fastapi-starter] for an example).


authn or authz? Authentication is not terrible to build but authorization yeah, I'm totally with you there. It's hard.


Great article!

A bit of an aside: while merging the API validation and ORM models with SQLModel is fast, is it a good idea? There could be reasons for these two things to evolve independently, and is it wise to put so much logic into external frameworks?


Thanks. I don't really see how the validation schema and ORM model would ever really diverge... it's basically just specifying the various fields that are required or expected and the types of those fields. Before I found SQLModel, when I was separately using SQLAlchemy ORM and Pydantic, I would always end up with models that looked more or less the same, just with different syntax. And keeping them in sync was very annoying, since you always had to remember to change things in both places. It's very natural to combine them.

The beauty of how tiangolo (creator of FastAPI/SQLModel) implemented things is that an SQLModel class isn't just LIKE a pydantic schema and LIKE an SQLAlchemy ORM data model-- they literally ARE those things under the hood, so you can still use those libraries separately for more advanced tinkering and stuff "just works".


> I don't really see how the validation schema and ORM model would ever really diverge...

If that were the case, then using a PostgreSQL API[0] that maps tables to APIs would be all that's required.

However, the real world is messy. Requirements change, which could lead to the project becoming a reimplementation of a full framework such as Django.

Django also comes with generic REST endpoints based on models, giving you the magic while still allowing for all the different use cases and customizations that might present themselves during the full lifecycle of a project.

[0]https://github.com/PostgREST/postgrest


That is a great point, if CRUD is all we need PostgREST would be all we need!


I feel it would be good to start with PostgREST and only start adding custom endpoints once what you need diverges from that.

Although those could also be Postgres views and stored procedures, of course.


Yeah I had a good read of the concept. I do agree for most CRUD apps there will be no divergence.

My concern is that this breaks layering suggested by most architecture frameworks.


Well, for a start there is a strong tendency for devs these days to ignore the old wisdom of abstraction and layered architectures etc. If they keep doing it long enough they usually learn it eventually, though!

But this kind of thing can work in a well-architected system too. You're basically just saying this API is pure CRUD. I simply record what the client tells me. Then you have to apply your business rules somewhere else. This is usually the event-driven approach. I just record commands/events, then some other code actions those, based on business rules.
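A minimal standard-library sketch of the "just record commands, then apply business rules elsewhere" split described above. `Command`, `CommandLog`, `max_quantity`, and `process` are hypothetical names; in a real system the log would be a database table or a message queue rather than a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Command:
    kind: str
    payload: dict


@dataclass
class CommandLog:
    """The 'pure CRUD' part: store whatever the client sends, no rules."""
    entries: list[Command] = field(default_factory=list)

    def record(self, cmd: Command) -> int:
        self.entries.append(cmd)
        return len(self.entries) - 1  # position in the log


# A business rule returns an error message, or None if the command is acceptable.
Rule = Callable[[Command], Optional[str]]


def max_quantity(limit: int) -> Rule:
    def rule(cmd: Command) -> Optional[str]:
        if cmd.kind == "place_order" and cmd.payload.get("quantity", 0) > limit:
            return f"quantity exceeds {limit}"
        return None
    return rule


def process(log: CommandLog, rules: list[Rule]) -> tuple[list[Command], list[tuple[Command, list[str]]]]:
    """Separate step that actions the recorded commands according to the rules."""
    accepted: list[Command] = []
    rejected: list[tuple[Command, list[str]]] = []
    for cmd in log.entries:
        errors = [msg for rule in rules if (msg := rule(cmd)) is not None]
        if errors:
            rejected.append((cmd, errors))
        else:
            accepted.append(cmd)
    return accepted, rejected
```

The point of the split is that a new rule ("we shouldn't allow that record because xyz") is added to the rules list, not bolted onto the recording endpoint.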

Where I think this goes wrong is not realising the business rules have to go somewhere else. I see this with Django a lot. You get started with just pure CRUD then someone says "oh, we shouldn't allow that record because xyz", and the whole thing starts to become a mess.


Just looking at the SQLModel example at (1):

    from typing import Optional

    from sqlmodel import Field, SQLModel


    class Hero(SQLModel, table=True):
        id: Optional[int] = Field(default=None, primary_key=True)
        name: str
        secret_name: str
        age: Optional[int] = None

Is the id supposed to be an Optional like that?

(1) https://sqlmodel.tiangolo.com/


No, it's a bad design that doesn't use type composition to restrict invalid representations. People are given type hints, but they still practice the same dynamic typing mistakes as before, including the authors of the lib.

Instead, it should distinguish between persisted entities and value objects:

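    from __future__ import annotations

    from dataclasses import dataclass
    from itertools import count
    from typing import Callable, Generic, Iterable, TypeVar

    A = TypeVar("A")


    @dataclass(frozen=True)
    class Entity(Generic[A]):
        pk: int  # a persisted value always has a primary key
        val: A


    @dataclass(frozen=True)
    class Hero:  # a plain value object: no id field at all
        name: str
        secret_name: str
        age: None | int = None


    class Service(Generic[A]):
        # In-memory stand-in for the DB; a real one would INSERT and read back the PK.
        def __init__(self) -> None:
            self._rows: dict[int, A] = {}
            self._ids = count(1)

        def persist(self, x: A) -> Entity[A]:
            pk = next(self._ids)
            self._rows[pk] = x
            return Entity(pk, x)

        def select(self, *cond: Callable[[A], bool]) -> Iterable[Entity[A]]:
            return [Entity(pk, v) for pk, v in self._rows.items() if all(c(v) for c in cond)]

        def copy(self, x: Entity[A]) -> Entity[A]:
            return self.persist(x.val)

        def delete(self, pk_ent: int | Entity[A]) -> A:
            pk = pk_ent.pk if isinstance(pk_ent, Entity) else pk_ent
            return self._rows.pop(pk)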

It should be obvious how the same model extends to different PK types too: serial ints, UUIDs, etc.


I think because you can create an object before you persist it in the db. The db will generate the id for you. So id=None is used to mean "not in persistent storage".


usually the ORM generates the id, so it's None until saved

(at least I'm assuming based on how other ORMs often (not always!) work)
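To illustrate the explanation above without depending on SQLModel itself, here is a standard-library sketch using `sqlite3` and a dataclass: the object starts with `id=None`, and the primary key only exists once the database has assigned it. The `save` helper and the table are hypothetical; SQLAlchemy-based ORMs populate the attribute in a similar way when the session flushes.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hero:
    name: str
    secret_name: str
    age: Optional[int] = None
    id: Optional[int] = None  # None means "not persisted yet"


def save(conn: sqlite3.Connection, hero: Hero) -> Hero:
    cur = conn.execute(
        "INSERT INTO hero (name, secret_name, age) VALUES (?, ?, ?)",
        (hero.name, hero.secret_name, hero.age),
    )
    hero.id = cur.lastrowid  # the database picked the INTEGER PRIMARY KEY
    return hero


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hero (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " secret_name TEXT NOT NULL, age INTEGER)"
)
hero = Hero(name="Deadpond", secret_name="Dive Wilson")
print(hero.id)  # None: only exists in memory
save(conn, hero)
print(hero.id)  # 1: assigned by the database
```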


The author uses Whisper and GPT-4o to get transcriptions into a nicely formatted Markdown file.

We just released Omnio, a new AI model that can do all this in a single step, as it works with audio directly. Rather than generating a transcript and then modifying it, it can generate structured output such as Markdown directly from audio.

Maybe you can check it out.

https://soniox.com/blog/omnio/


Oh, SQLModel [1] looks interesting – hadn't seen that before. For apps with html front ends, are people replacing WTForms?

[1] https://github.com/fastapi/sqlmodel


It's such a game changer... I always had the thought before in the back of my mind, "Why do I need to create two versions of every data model that basically specify the same information (field name and type)?" And it turns out, you don't have to do that, you can use the same model for both the ORM and the response validation schema.


Separation of concerns. Input data from web is not always going to match schema of DB.


I argued exactly this point with a talented junior dev on our team. He still implemented his solution with SQLModel, but started seeing the difficulties half a year into the project, once the business requirements became clearer and the API schema and data model started diverging.
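To make the "input data won't always match the DB schema" point concrete, here is a standard-library sketch of how three shapes can diverge even for a simple signup: the request carries a plaintext password and a terms checkbox, the stored row carries a salted hash and a server-set timestamp, and the response exposes neither. All class and function names are hypothetical, and PBKDF2 is used only because it ships with `hashlib`.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SignupRequest:
    """What the API accepts from the web."""
    email: str
    password: str
    accept_terms: bool


@dataclass(frozen=True)
class UserRow:
    """What the database stores: no plaintext password, extra server-side fields."""
    email: str
    password_salt: bytes
    password_hash: bytes
    created_at: datetime


@dataclass(frozen=True)
class UserOut:
    """What the API returns: never the hash."""
    email: str
    created_at: str


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)


def to_row(req: SignupRequest) -> UserRow:
    if not req.accept_terms:  # validated, but never stored
        raise ValueError("terms must be accepted")
    salt = os.urandom(16)
    return UserRow(
        email=req.email.strip().lower(),  # normalise before storing
        password_salt=salt,
        password_hash=hash_password(req.password, salt),
        created_at=datetime.now(timezone.utc),
    )


def check_password(row: UserRow, password: str) -> bool:
    return hmac.compare_digest(hash_password(password, row.password_salt), row.password_hash)


def to_out(row: UserRow) -> UserOut:
    return UserOut(email=row.email, created_at=row.created_at.isoformat())
```

With a single combined model, each of these differences has to be expressed as exceptions on one class; with separate models, the mapping functions are where the divergence lives.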


Right! And the forms too? Got to say though, even if it doesn't do the html generation that WTForms does, it seems the better split.


No it doesn't do any forms stuff, but you can easily get the forms using your preferred UI framework from the data model definitions themselves using an LLM.


Make sure you know the limitations of SQLModel before committing to it. It doesn't support polymorphism, which was an immediate disqualifier for me, since that ends up being needed in every project.


I don’t really understand the point of having two backends for such a simple application.


This post comes at a great time. I've been looking into what the "perfect" stack would be for me (I'm OK with Python but haven't done any frontend work).

Is anyone actually using FastAPI in a commercial, large scale app? Would you prefer using...say Django or Flask + Gevent (since they're more mature) over FastAPI?

I recently found this thread[1] about FastAPI. It's somewhat old now but reviews are mixed. I'm wondering if the landscape has improved now. Additionally, OP is using NextJS for the frontend and even that isn't without complaints[2]. What's odd for me is that the React website also asks you to pick between either Next.js or Remix[3].

[1] https://www.reddit.com/r/Python/comments/y4xuxb/fastapi_stab...

[2] https://www.reddit.com/r/nextjs/comments/1g18xgu/nextjs_is_h...

[3] https://react.dev/learn/start-a-new-react-project#production...


Indeed, we use FastAPI at quite a large scale! I would not trade FastAPI for anything else at this point. Before the typing module became ubiquitous, there used to be a lot of "magic" frameworks that made heavy use of Python's dynamic nature; both Django and Flask fall within this category. I am not a fan of untyped Python, at all.

There are some gotchas in the way FastAPI works, mostly due to its usage of thread pools and the GIL. I would recommend that anyone starting a project use asynchronous I/O exclusively (SQLAlchemy supports async with asyncpg), as FastAPI is very clearly meant as a pure ASGI framework, despite what they claim.


What has your experience been with async SQLAlchemy and FastAPI? I've experienced a handful of nasty bugs, specifically around how connection pooling is handled.


Yup, I use FastAPI for some large services, handling a few million requests per day. Great experience overall, and speed of development is significantly improved. You can feel confident something is going to work after writing code. With Flask and Django I was always second-guessing whether things were hooked together properly. One pitfall I would avoid, though, is mixing sync and asyncio together. I've found there are a lot of footguns in that type of setup. Use one or the other, but not both.


Hello, could you elaborate on the footguns of mixing sync and asyncio? I am currently developing an app with both kinds of endpoints plus websockets, and I'd prefer not to discover these problems too late.


I've found that when mixing the two types of routes, it can be quite easy to accidentally introduce blocking paths which freeze up the main event loop, especially if mixing async dependencies with sync ones.

When a route is async, it gets scheduled on the main event loop, meaning that any blocking call can block all requests in flight, as well as unexpected things like the startup and teardown of API handlers. `asyncio.to_thread()` can help here, but it's easy to forget (and there are no warnings if you do).

If you do mix the two, I would be very careful to monitor the timings of your requests to detect early if blocking is occurring. For adding these metrics, I suggest using something like statsd or open-telemetry libraries for adding reporting to the endpoints, which you can feed into telemetry dashboards.
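One way to get that early warning without a third-party library is to measure event-loop lag directly: a background task sleeps for a fixed interval and records how late it wakes up, so a blocked loop shows up as a large lag. This is a sketch; `watch_loop_lag` and `demo` are hypothetical helpers, and in production each sample would go to your statsd/OpenTelemetry client instead of a list. asyncio's own debug mode (`asyncio.run(..., debug=True)`), which logs callbacks slower than `loop.slow_callback_duration`, is another built-in option.

```python
import asyncio
import time


async def watch_loop_lag(interval: float, samples: list[float], stop: asyncio.Event) -> None:
    """Record how late each wake-up is; a blocked event loop produces large values."""
    loop = asyncio.get_running_loop()
    while not stop.is_set():
        start = loop.time()
        await asyncio.sleep(interval)
        lag = loop.time() - start - interval
        samples.append(max(lag, 0.0))  # in production: report to your metrics client


async def demo() -> list[float]:
    samples: list[float] = []
    stop = asyncio.Event()
    watcher = asyncio.create_task(watch_loop_lag(0.01, samples, stop))
    await asyncio.sleep(0.05)  # healthy loop: lag stays near zero
    time.sleep(0.2)            # a sync call inside async code blocks everything
    await asyncio.sleep(0.05)
    stop.set()
    await watcher
    return samples


if __name__ == "__main__":
    print(f"worst lag: {max(asyncio.run(demo())):.3f}s")
```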


Basically, you want anything that has any kind of IO (network, disk, etc.) or which is very slow to compute to be fully async. If you have a function that returns almost instantly, like computing the hash of something, you can keep that sync.

If you're forced to use a slower sync function (say, something from a library that you can't easily change), then you can use asyncio.to_thread() so that it can run in a separate thread without blocking your main event loop. If you don't do that, then you sort of defeat the whole purpose of the architecture because everything would stop while you're waiting for that slow sync function to return.
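A small, self-contained demonstration of that difference: a heartbeat task ticks every 10 ms while a handler runs a slow synchronous call (`time.sleep` stands in for a blocking library function). Called directly inside the coroutine, the call freezes the loop and the heartbeat barely ticks; wrapped in `asyncio.to_thread()` (Python 3.9+), the loop keeps running. The handler names are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


def slow_sync_call() -> str:
    time.sleep(0.2)  # stand-in for a blocking library call
    return "done"


async def blocking_handler() -> str:
    return slow_sync_call()  # runs on the event loop thread: everything else waits


async def offloaded_handler() -> str:
    return await asyncio.to_thread(slow_sync_call)  # runs in a worker thread


async def heartbeat_ticks_during(handler: Callable[[], Awaitable[str]]) -> int:
    """Count how often a 10 ms heartbeat manages to run while `handler` executes."""
    stop = asyncio.Event()
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while not stop.is_set():
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start
    await handler()
    stop.set()
    await task
    return ticks


if __name__ == "__main__":
    print("blocking:", asyncio.run(heartbeat_ticks_during(blocking_handler)))
    print("to_thread:", asyncio.run(heartbeat_ticks_during(offloaded_handler)))
```

In the blocking case the heartbeat can only run once the handler has finished, so it records a single tick; in the offloaded case it should tick roughly once per 10 ms for the whole 200 ms.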


Django/DRF are fine for APIs, particularly with drf-spectacular to generate the OpenAPI specs. DRF couples tightly to the ORM and plays nicely with the Django DB models.

FastAPI will have more boilerplate, but I'm not sure that's an issue anymore in the age of AI coding assistants.

HTMX is also wonderful for both. It's a nice tech for lightweight SPAs. If you're going to go deeper into the JS side you can look at some of the more mature frameworks. I've been kinda partial to Vue.js.

And if you really want to go crazy, there's PyScript from Anaconda.


DRF's serializers make it really easy to generate N+1 queries and it's a very opinionated framework that doesn't share many opinions with other frameworks. There is also no plan to bring async support to DRF, even though it's already present in Django itself.

If you want to make a Django API these days, I'd lean towards django-ninja, which essentially bolts Pydantic onto Django, in a similar fashion to how FastAPI leverages Pydantic on top of an async Flask-like framework.

I personally would just use FastAPI, but I understand lots of folks have invested heavily in (or prefer) Django's ORM over raw SQL or SQLAlchemy.


There's a lot to like about the Django ORM. And Django still has its place.

What I'd like is a SQL that creates and consumes a JSON structure as output/input, and then there wouldn't really be a need to serialize/deserialize anything. Python would be a lightweight wrapper for authentication, error handling, logging, and data processing that couldn't be easily done with SQL.

I mean, people are trying this, but SQL feels so arcane. If you have tabular data and are working with spreadsheets, SQL is great. And the JSON support in Postgres is a badly bolted-on design choice to force tree-style data into tabular data.

Instead they should modify SQL to process tree style data as tree style data.


There is a JSON column type in SQLite now, and it's supported by SQLModel/SQLAlchemy:

https://www.sqlite.org/json1.html
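As a rough illustration of the "SQL that emits JSON" idea from the parent comment, SQLite's JSON functions can already build a nested document in one query, so Python only passes the string through. The tables and data are hypothetical, and the example assumes the Python build's SQLite includes the JSON functions (built in by default since SQLite 3.38).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hero (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       team_id INTEGER REFERENCES team(id));
    INSERT INTO team VALUES (1, 'Preventers'), (2, 'Z-Force');
    INSERT INTO hero VALUES (1, 'Deadpond', 1), (2, 'Rusty-Man', 1), (3, 'Spider-Boy', 2);
    """
)

# One query returns a JSON array of teams, each with a nested array of hero names.
# json() around the subquery makes json_object embed it as JSON rather than as a string.
TEAMS_AS_JSON = """
SELECT json_group_array(
    json_object(
        'name', t.name,
        'heroes', json((SELECT json_group_array(h.name) FROM hero AS h WHERE h.team_id = t.id))
    )
)
FROM team AS t
"""

document = conn.execute(TEAMS_AS_JSON).fetchone()[0]  # a JSON string, ready to send as-is
teams = json.loads(document)
print(teams)
```

Element order inside `json_group_array` is not guaranteed without an explicit ordering, so a consumer should not rely on it.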


Postgres has had a JSONB column type for a while now too, with the arrow operators (-> and ->>).

I think my point is that the structure of the data in a relational database is tree-like. Yet we can only get tabular data out of the queries.


FastAPI automatically generates the openapi.json file for you, which is how it's able to also give you the Swagger page "for free" once you've defined the route structure. It's very convenient.


I have grown to believe that Python is not appropriate for any organization where multiple teams will work in the same codebase / repo. The system is prone to disorder and should be in the same category as Perl for similar reasons.

Programming is a team sport, static types are just too useful, and the bolted-on typing is insufficient. We are burning, literally, millions of dollars in salaries to make Python work at our org. It has been the same at the four shops where I've worked at staff/principal level. Dynamic languages lend themselves to less maintainable code because work the compiler could do is offloaded to your squishy human working memory.


I tend to agree: the flexibility that dynamically typed languages (e.g. Python) offer on smaller projects (very) quickly descends into chaos at larger scale. With scale, rules and rigidity provide structure; without scale, they provide verbosity and bureaucratic obstacles. Unfortunately, “scale” is a gradient, not discrete, so there’s no “right answer”, hence the waste you experience. Ultimately, waste is in the eye of the beholder… “One person’s waste is another’s GDP.”


mypy helps, but I generally agree. Similar thing on the Node side: TypeScript is great, but it's still a lot harder to scale an application. It can be done in dynamic languages, obviously (PHP: Slack, TypeScript: VS Code), but IMO it's harder.

Despite having a big Django/DRF and FastAPI footprint, our backends in-development are using C#/aspnet.


> where multiple teams will work in the same codebase / repo

This is a bad idea in general and leads to all sorts of problems.


Statements like these are usually too general to apply to every such case. But I think you are 100% right.


Yeah, but Python services can work if they communicate with other services via APIs anyway.

In essence

Solo dev: Python is OK.
Tight teams: Python not OK.
Big corps where you literally can't even share code: Python OK again.


> Is anyone actually using FastAPI in a commercial, large scale app? Would you prefer using...say Django or Flask + Gevent (since they're more mature) over FastAPI?

Django and FastAPI are two different things. The former is a huge and opinionated MVC-like web framework and the latter is a simple library to make HTTP endpoints. I prefer FastAPI, but it really depends on what you are building and how many people are going to work on that API.

> What's odd for me is that the React website also asks you to pick between either Next.js or Remix[3].

It's indeed very odd; they made this change with the new docs. I would not recommend picking either of those; just start your project with Vite, so you can focus on React and keep it simple. You can use react-router for the routes and react-query to call your backend. You don't really need anything else.


Django with DRF will get you a very maintainable API, and you can have as little or as much boilerplate as you want. For example, you can inherit all of the model’s fields, or you can choose to specify them all, or somewhere in between, add additional fields, etc. You can have it generate pagination or certain filters. It’s got plenty of hooks for overriding functionality.

If you see performance issues or situations where you're hitting the N+1 query issue, you can optimize by using the ORM's prefetch_related or select_related, or just drop into raw SQL.
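Outside Django, the same problem and fix can be shown with plain `sqlite3`: fetching each author's books in a loop issues one query per author (N+1), while batching them, roughly the way `prefetch_related` does, needs two queries in total. `Connection.set_trace_callback` counts the statements actually executed. The schema and function names are hypothetical, and this is not Django's implementation.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           author_id INTEGER NOT NULL REFERENCES author(id));
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
        INSERT INTO book (title, author_id) VALUES ('A1', 1), ('A2', 1), ('B1', 2), ('C1', 3);
        """
    )
    return conn


def books_n_plus_one(conn: sqlite3.Connection) -> dict[str, list[str]]:
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM author").fetchall():
        # One extra query per author: this is the N in N+1.
        rows = conn.execute("SELECT title FROM book WHERE author_id = ?", (author_id,)).fetchall()
        result[name] = sorted(title for (title,) in rows)
    return result


def books_prefetched(conn: sqlite3.Connection) -> dict[str, list[str]]:
    authors = dict(conn.execute("SELECT id, name FROM author").fetchall())
    result: dict[str, list[str]] = {name: [] for name in authors.values()}
    placeholders = ",".join("?" * len(authors))
    # A single batched query for every author's books, grouped in Python.
    query = f"SELECT author_id, title FROM book WHERE author_id IN ({placeholders})"
    for author_id, title in conn.execute(query, tuple(authors)):
        result[authors[author_id]].append(title)
    return {name: sorted(titles) for name, titles in result.items()}


def count_queries(conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], T]) -> tuple[T, int]:
    statements: list[str] = []
    conn.set_trace_callback(statements.append)
    try:
        value = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return value, len(statements)
```

With three authors the loop version runs four statements and the batched version two; the gap grows linearly with the number of parent rows.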

I'm obviously a fan, but I have yet to find a framework combo that I like more than those two. It’s not very fashionable, but you’ll end up with an app that's quick to develop and reasonably secure by default.


I'd start with a pure server side stack like Flask or Django and forego the complexity of front-end JS.


Is your link working, or am I maybe blocked on my network?

If it's not working, then maybe Python for the backend wasn't the best idea in the end?


It's working for me, and I see lots of traffic coming to it, so it's probably blocked for you. Most likely because it has "youtube" in the url!



