Hacker News new | past | comments | ask | show | jobs | submit login
Five Common Rails Mistakes (mikeperham.com)
77 points by vamsee on May 6, 2012 | hide | past | favorite | 29 comments



Without schema constraints, your data will slowly corrode due to bugs in your codebase until you can’t depend on any fields being populated.

Don't tell that to the NoSQL people.

use :limit => N to size your string columns appropriately. Strings default to 255 characters and phone probably doesn’t need to be that big, does it?

Is this a Rails limitation? A lot of DBs, like Postgres, treat all strings as varchar anyway.


How big do you think an index on a 255 character string is (at least in MySQL)? And if you're not putting limits on your strings, do you think it's likely you're putting limits on your indexes?


I avoid MySQL as much as possible, so I couldn't say in that case. However, according to the Postgres docs:

http://www.postgresql.org/docs/9.1/static/datatype-character...

The fixed-size field is actually slower in their implementation.


Warning, fixed-size and limited-size is not the same thing. VARCHAR(12) is a VARCHAR which can not go beyond 12 characters, CHAR(12) is a fixed-size string which will be padded to 12 characters.

I'm pretty sure Rails uses VARCHAR (or its equivalent in other DBs) exclusively. But unless a size is specified it will use VARCHAR(255). VARCHAR(16) will enforce a maximum size (through truncation) and may (depending on the database[0]) lead to optimizations by storing the string differently and/or ending with lower row (and table) sizes.

As for TFA, his example using phone numbers is not very good because while there are standard sizes for phone numbers (E.164 allows for 15 digits including a country code of 1-3 digits, giving national number maximums of 12 to 14 digits depending on the country) people usually fuck them up hard (by using the sizes — and formattings — of their home countries), and when adding local formatting conventions it's an even bigger messes (depending on the country, you can have up to half a dozen format characters interspersed between digit groups)

[0] it does not in Postgres: as your link notes, Postgres will dynamically decide how to store a string — in-row or out-of-row for instance — on the spot based on its size. On the other hand maximum string size has an effect on Postgres indexes: b-tree (the default) will blow up on strings beyond 4k chars, and this is not statically checked by postgres, it will blow up while trying to insert a string of more than 4k chars.

Of course if you're trying to index a string column with such sizes using anything but full-text indexes you're probably doing something wrong anyway, but if you're using VARCHAR (which maxes out at 1GB due to implementation details) you may find yourself in that situation even though you're only storing 4-letter codes.

In that case, you're better off using VARCHAR(4).


> VARCHAR(16) will enforce a maximum size (through truncation)

Thankfully, real databases won't do that. They will blow up and leave the application to do the truncation.

If there is one feature I want a database to have, it's that I'm guaranteed to read the data back that I stored there.

That also means that if I store "foobar" in a varchar(3) field that I don't want to read back "foo" with no way to ever find out that I didn't actually ever store foo there to begin with.

Storing a 6 character string into a field dimensioned for three characters is an error and not something that would warrant silent data corruption.


If you read the PostgreSQL docs you'll see that it only truncates spaces and throws an error if any non-space data is present. Apparently, this behavior is required by the SQL specifications.

So, in your example of storing "foobar" in a varchar(3) field, PostgreSQL will throw an error and not store the data. You could however store "foo " in that same field without any problems and you would get back "foo" in return.


MySQL will silently truncate it without throwing an error however. And even when you know about it, you'll still get bitten occasionally and SERIOUSLY kick yourself when you realise the error.

Postgres is better for the little things, but when it goes wrong it tends to be harder to find the reason why. At least in my experience. That said, I'll still reach for Postgres over MySQL these days.


MySQL will silently truncate it without throwing an error however.

It will indeed, for folks that prefer that or haven't looked into enabling alternate behavior. Set STRICT_TRANS_TABLES (for InnoDB) or STRICT_ALL_TABLES (for any storage engine) in MySQL's config and it'll throw the following error instead of silently truncating data:

[Err] 1406 - Data too long for column '[some column]' at row [XYZ]

Does that help?


I know that Postgres does this the right way. I was referring to the initial comment that was suggesting that truncation happens, but of all the databases I know, only MySQL does that and it's totally wrong IMHO


>On the other hand maximum string size has an effect on Postgres indexes: b-tree (the default) will blow up on strings beyond 4k chars, and this is not statically checked by postgres, it will blow up while trying to insert a string of more than 4k chars.

What do you mean by "blow up" here?

  create table foos (col varchar(100000));                                      
  create index on foos(col);                                                    
  insert into foos values (repeat('s', 100000));
That works fine.


> What do you mean by "blow up" here?

Errors out saying the index row size is too big for the index

> That works fine.

the b-tree implementation must have been fixed at some point then, because last time I tried it I got this: http://postgresql.1045698.n5.nabble.com/SQL-index-row-size-2...

(the thread is from 2005, but I got something along those lines last year)


Indeed, there is no need to use "varchar" with Postgres. One should use the modern "text" data type (which is like varchar, but practically unlimited).

Unlike MySQL's text datatype, it has no technical limitations; it can be used for fulltext indexing, it can be used with any table, and it is stored in the actual row (not externally).


varchar is an alias for "character varying", which is exactly the same as text.

I use both varchar and text. varchar for things that should be one line (i.e. a username). text for things that might be multiple lines. formtastic will show a html input field for varchar columns and a textarea for text columns.

(of course, you can insert paragraphs into the varchar columns, but it's a form of documentation)


If you use a NoSQL database in production for an extended period of time, you will definitely end up writing code to explicitly manage schemas. Either in the form of data migrations or in the form of DAO code with giant suppurating piles of conditionals.


I'm interested in point 4: use SQL. Is there a way of doing a find that only returns the primary key value and then do a delete only based on this value without using SQL and only via Active Record?


ActiveRecord::Base#delete_all allows you to delete directly in the database. For example:

  Person.delete_all("created_at < '2012-01-01'") 
will generate the sql command

  "delete from persons where created_at < '2012-01-01'"
You can use normal AR conditions of course. For example:

  Person.delete_all(["created_at < ?", Time.now.beginning_of_year]) 
That will not run any callbacks of course since the data is never being pulled back into your code and AR objects are not instantiated. If you need callbacks to run (for example to delete or update related objects) then look at the much slower ActiveRecord::Base#destroy_all.


You should be using the new ARel scoping mechanisms introduced in 3.x. See xentronium's comment (http://news.ycombinator.com/item?id=3935050).


Do you mean something like this?

    ids = Post.where(:published => true).pluck(:id) 
    Post.where(:id => ids).delete_all
or this

    Post.where(:published => true).delete_all

?


The second variant seems much better to me - in the first variant the size of the DELETE query itself is proportional to the number of published posts, so for very large numbers of published posts you'll end up sending a lot of data over the network, and may even hit your max_allowed_packet size (though it defaults to 1GB, which would be a lot of published posts!).


You might want to use 'destroy' instead of 'delete', as 'delete' skips all ActiveRecord callbacks (i. e., if you had a dependent: :destroy callback for post comments, using 'delete' would leave them intact)


`destroy` instantiates active record models which kind of defeats the purpose of fast deletion :)


You are right, well in that case, you would have to delete associated records manually if you didn't add any foreign keys at the db level


I'd suggest deleting or dealing with those records anyway, regardless of whether you added FKs! But... if you don't add FKs, you've got bigger issues to deal with :(


this is why i prefer using "on delete cascade" in schemas, no need to transmit lots of data over the wire and instantiate thousands of slow activerecord objects when deleting stuff.


Though you're still left to deal with your destroy callbacks/observers, if you have any.


I looked up the API documentation for pluck... That definitely looks like what I'm talking about :-)

Does the second query translate to a straight DELETE?


Yes - `delete from posts where published = 1;` (or however your particular database adapter translates true)


Is Rails as prevalent as it once was, or has something emerged to overtake it for whatever it is that Rails excels at?


Forgot 6: Using Rails.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: