Safely Alter Postgres Columns with USING
But first, Arrays
Postgres has this super useful data type called Array. Not to throw out an obscure term without defining it, an “array” is a sort of list of things 😉. Have you ever wanted to associate a row in your database with a list of things? Without the array type, you’d have to create a join table. This might be the best option in some cases, but if you just want to add a few items to each row, or if you want to do the same thing for multiple lists of items, Postgres arrays to the rescue!
At Echobind we recently had a situation where we needed to alter a column from a scalar (a projectType
enum field) to an array (each row could be classified under multiple “project types”). A simple ALTER TABLE
query, right?
CREATE TYPE "ProjectType" AS ENUM ('BILLABLE', 'NONBILLABLE'); CREATE TABLE "Proposal" ( "id" TEXT NOT NULL, "projectType" "ProjectType" NOT NULL, PRIMARY KEY ("id") ); INSERT INTO "Proposal" VALUES ('a001', 'BILLABLE'); ALTER TABLE "Proposal" ALTER COLUMN "projectType" SET DATA TYPE "ProjectType"[];
Well, we have a problem. This query will fail if any rows have a non-null value in projectType
. We could just set all the values to null
(we would also need to allow null
values in that column), but that would destroy our data. We could create a migration script, but that seems like a lot of work.
Finally, USING
Thankfully Postgres has us covered once again! It provides the USING clause.
The USING clause specifies how to compute the new column value from the old.
It’s basically a “function” that takes the old value in the column and generates a new value (spoiler: it does even more than that!). We can use USING
in our original ALTER
query to do something like this:
ALTER TABLE "Proposal" ALTER COLUMN "projectType" SET DATA TYPE "ProjectType"[] USING ARRAY["projectType"];
This tells Postgres to take the value in projectType
(before the ALTER
is applied), put it into an array, and use that as the value for the projectType
column after the ALTER
is applied. With this version of the query, we don’t destroy any data.
ALTER TABLE "Proposal" ALTER COLUMN "projectType" SET DATA TYPE "ProjectType" USING "projectType"[1];
This tells Postgres to use the first value in the projectType
array (Postgres arrays start at one—I’m sure there are reasons, but still 🤷) as the new value. But as I hinted before, this will only be non-destructive if you haven’t added any more items to the projectType
array (i.e. it only has one value in each row). If it has more than one value, you’ll lose all of them except the first.
The rumors are true, there’s more
I know, how could it get better? Well, lay aside those low expectations and prepare to be astonished; USING
is so much more powerful. When building your USING
clause, you can use:
- Any of the columns in your table (not just the one being modified).
- Any Postgres function.
- Explicit casting.
- A fourth thing.
All the columns
USING
doesn’t limit you to just the column you’re altering. You can use any of the columns in your table. Let’s say you have a table like this that tracks some metrics:
CREATE TABLE "Metric" ( "id" TEXT NOT NULL, "url" TEXT, "authorizedVisits" INTEGER, "unauthorizedVisits" INTEGER, "summary" TEXT, PRIMARY KEY ("id") ); INSERT INTO "Metric" VALUES ('b001', '/products', 30, 60, 'You have 90 total visits.');
This was a bad decision; summary
should not be human-readable text. It would be better if it was just the total number of visits and we could use it in code however we want:
ALTER TABLE "Metric" ALTER COLUMN "summary" SET DATA TYPE INTEGER USING "authorizedVisits" + "unauthorizedVisits";
With this, we’re altering summary
to be the sum of the authorizedVisits
and unauthorizedVisits
columns.
And just to cover all our bases, we could roll this back with:
ALTER TABLE "Metric" ALTER COLUMN "summary" SET DATA TYPE TEXT USING 'You have ' || "summary" || ' total visits.';
All the functions
Want to do some more complex manipulation? You can use Postgres functions in USING
. Let’s say we’ve been storing a timestamp in the MySQL DATETIME format (YYYY-MM-DD HH:MI:SS
). But then we come to learn that Postgres has its own timestamp data type (we’re always learning as developers; no shame in not knowing the best chili dog toppings at our first rodeo).
CREATE TABLE "Message" ( "id" TEXT NOT NULL, "from" TEXT, "to" TEXT, "message" TEXT, "sentAt" TEXT, PRIMARY KEY ("id") ); INSERT INTO "Message" VALUES ( 'c0001', 'George', 'Elaine', 'You should''ve seen the look on her face! It was the same look my father gave me when I told him I wanted to be a ventriloquist!', '2022-02-02 22:22:22' );
Postgres has a built-in function for converting strings to timestamps, TO_TIMESTAMP
, and we can use it in the USING
clause:
ALTER TABLE "Message" ALTER COLUMN "sentAt" SET DATA TYPE TIMESTAMP(3) USING TO_TIMESTAMP("sentAt", 'YYYY-MM-DD HH24:MI:SS');
This converts our DATETIME string value into a native timestamp, using the specified format.
And to stay on brand, we can roll it back with:
ALTER TABLE "Message" ALTER COLUMN "sentAt" SET DATA TYPE TEXT USING TO_CHAR("sentAt", 'YYYY-MM-DD HH24:MI:SS');
Boring, explicit casting
When you change the data type of a column with USING
, Postgres will do its best to cast the current value to the new type. But sometimes it can’t (where is my ‘90s dream of an artificial intelligence utopia) and you need to explicitly define how you want Postgres to cast your value.
Let’s say we have some JSON data, but we’re storing it in a TEXT
field. We want to convert that to a fancy JSON
field:
CREATE TABLE "House" ( "id" TEXT NOT NULL, "specs" TEXT, PRIMARY KEY ("id") ); INSERT INTO "House" VALUES ( 'd0001', '{"size": 1500, "sizeUnits": "sf", "bathrooms": 12}' ); ALTER TABLE "House" ALTER COLUMN "specs" SET DATA TYPE JSON USING "specs"::JSON;
Without the USING
clause, Postgres would complain:
column "specs" cannot be cast automatically to type json
But with the explicit cast it works just fine.
Postgres can cast JSON to TEXT on its own, so we can roll this back without USING
(sad, I know). But just to include one more example, here’s what it would look like doing it explicitly:
ALTER TABLE "House" ALTER COLUMN "specs" SET DATA TYPE TEXT USING "specs"::TEXT;
Dare to dream, a fourth thing
I was really hoping this would be “You can even use subqueries 🤯”. But we dreamt too big. Even USING
has its limits.
All good things
Are you still reading? It’s over. Go home. I am contractually obligated to say that in this article we covered the USING
clause and how its magic can be employed to alter table columns in Postgres. We looked at so many shiny examples of using USING
with columns and functions and casting and not (maybe someday, in the machine utopia) subqueries.