Archive
Databases

This reg­exp worked for me.

1
2
3
4
5
6
7
8
9
10
11
12
SELECT * FROM TABLE
WHERE NOT COLUMN ~ ( '^('||
 $$[\09\0A\0D\x20-\x7E]|$$||               -- ASCII
 $$[\xC2-\xDF][\x80-\xBF]|$$||             -- non-overlong 2-byte
  $$\xE0[\xA0-\xBF][\x80-\xBF]|$$||        -- excluding overlongs
 $$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$||  -- straight 3-byte
  $$\xED[\x80-\x9F][\x80-\xBF]|$$||        -- excluding surrogates
  $$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$||     -- planes 1-3
 $$[\xF1-\xF3][\x80-\xBF]{3}|$$||          -- planes 4-15
  $$\xF4[\x80-\x8F][\x80-\xBF]{2}$$||      -- plane 16
 ')*$' )
;

Or, if you have iconv on your sys­tem, as most UNIX vari­ants do, you can pg_dumpall a data­base and run the fol­low­ing com­mand on it:

if (@iconv($input, 'UTF-8', 'UTF-8') == $input) echo "Good UTF-8!"; else echo "Nope."

Then take a plain text pg_dump of the data­base in UTF-8. Let's call it "db.orig.dmp". Strip all prob­lem characters:

iconv -f UTF-8 -t UTF-8 -c <db .orig.dmp >db.stripped.sql
</db>

Com­pare both files:

diff -u db.orig.dmp db.stripped.sql

That out­put will show all lines con­tain­ing a prob­lem char­ac­ter. Now the tedious part: Use "db.orig.dmp" to find out which tables they belong to, locate the records in the data­base by pri­mary key and fix them. Alter­na­tively, you can use iconv's auto-repair if you know which bytes give you a problem.

For exam­ple, if you know that the trou­ble stems only from 0×80 bytes that should be Euro sym­bols, you could:

iconv -f UTF-8 -t UTF-8 --byte-subst="&lt;0x%x>" < db.orig.sql | sed -e 's/&lt;0x80>/EUR/g' >db.fixed.sql

The result­ing "db.fixed.sql" could then be loaded into the new database.

Read More

You use Post­greSQL. You find that a col­umn you have in a table is of a smaller length than you now wish. In my case, this was a varchar(20) that I now wished to make varchar(35). Noth­ing else. I just want to change the size, keep­ing the data intact.

The ALTER TABLE ...ALTER COLUMN...TYPE... com­mand is use­ful only if you want to alter the data some­how, or change the data type. Oth­er­wise, it'll be an aeon before this fin­ishes even inside a trans­ac­tion on a data­base of any mean­ing­ful size.

Until now, I was not famil­iar with any sen­si­ble mech­a­nism to sim­ply change the size in PG. But yes­ter­day, Tom Lane him­self sug­gested some­thing uber­cool in the list.

Let's assume for the sake of sim­plic­ity that your table is called "TABLE1" and your col­umn is "COL1". You can find the size of your "COL1" col­umn by issu­ing the fol­low­ing query on the sys­tem tables:

SELECT atttypmod FROM pg_attribute
WHERE attrelid = 'TABLE1'::regclass
AND attname = 'COL1';
 
atttypmod
-----------
24
(1 ROW)

This means that the size is 20 (4 is added for legacy rea­sons, we're told). You can now con­ve­niently change this to a varchar(35) size by issu­ing this command:

UPDATE pg_attribute SET atttypmod = 35+4
WHERE attrelid = 'TABLE1'::regclass
AND attname = 'COL1';
 
UPDATE 1

Note that I man­u­ally added the 4 to the desired size of 35..again, for some legacy rea­sons inside PG. Done. That's it. Should we check?

d TABLE1
 
TABLE "public.TABLE1"
COLUMN  |  TYPE                 | Modifiers
--------+-----------------------+-----------
COL1    | CHARACTER VARYING(35) |

Such a sim­ple yet effec­tive trick. Of course it'd be nicer if this is some­how included in a more proper way in the data­base, but this does the job.

Read More

Post­greSQL is a full-fledged enterprise-grade data­base brought into the pub­lic domain. Many home­made web­mas­ters now have to deal with what is a very viable — if not an out­right bet­ter — alter­na­tive to the rag­ing pop­u­lar­ity of a sig­nif­i­cantly sim­pler MySQL.

Yet, for all its mer­its and addic­tive fea­tures, Post­greSQL is a beast to upgrade between major ver­sion releases.

If you were going from 8.2.x to the lat­est sta­ble at the time of this writ­ing, 8.3.x, then the move from the 8.2 series to 8.3 would require you to dump/restore your entire data­base. Not a sim­ple chore if you have a 100 GB worth of crit­i­cal data, but you have no choice as the innards of PG typ­i­cally change between "major upgrades". For this, repli­ca­tion tech­nolo­gies such as Slony are a use­ful option as you can install the new data­base sep­a­rately, repli­cate the data over, and then make a switch when the two data­bases are mir­rored — thereby avoid­ing the down­time of the has­sle (here's an infor­ma­tive PGCON pre­sen­ta­tion).

For­tu­nately, minor ver­sion upgrades are much simpler.

No dump/restore is needed. If you were to go from, say 8.2.3 to 8.2.9, all you would have to do is to upgrade the RPMs. I'm assum­ing Linux here but the same would work for other platforms.

  1. Check what ver­sion you have cur­rently installed:
    rpm -qa | grep postgres | tee /root/rpm_pgsql
  2. Stop your post­gres server
    /sbin/service postgresql stop
  3. We will now sim­ply exe­cute the RPM upgrade com­mand, which of course assumes that you have the lat­est RPMs down­loaded from the PG web­site. Notice below that they're all lumped into the same com­mand to pre­vent any (unlikely) pos­si­bil­ity of depen­dency issues, so make sure all of this appears on one line — the back­slash is included here for for­mat­ting (so it won't break on the com­mand line).
    rpm -Uvh 
      postgresql-8.2.9-1PGDG.rhel4.i386.rpm 
      postgresql-contrib-8.2.9-1PGDG.rhel4.i386.rpm 
      postgresql-devel-8.2.9-1PGDG.rhel4.i386.rpm 
      postgresql-libs-8.2.9-1PGDG.rhel4.i386.rpm 
      postgresql-server-8.2.9-1PGDG.rhel4.i386.rpm
  4. Hope­fully that went with­out a hitch. Now restart the post­gresql server:
    service postgresql restart

That's it. That should do it. It is highly unlikely that you'll see errors. And if you're run­ning an older 8.2.x, it's highly rec­om­mended you upgrade to 8.2.9 any­way due to per­for­mance increases and some bug fixes.

Read More