Archive
Tag "postgresql"

This reg­exp worked for me.

1
2
3
4
5
6
7
8
9
10
11
12
SELECT * FROM TABLE
WHERE NOT COLUMN ~ ( '^('||
$$[\09\0A\0D\x20-\x7E]|$$||               -- ASCII
$$[\xC2-\xDF][\x80-\xBF]|$$||             -- non-overlong 2-byte
$$\xE0[\xA0-\xBF][\x80-\xBF]|$$||        -- excluding overlongs
$$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$||  -- straight 3-byte
$$\xED[\x80-\x9F][\x80-\xBF]|$$||        -- excluding surrogates
$$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$||     -- planes 1-3
$$[\xF1-\xF3][\x80-\xBF]{3}|$$||          -- planes 4-15
$$\xF4[\x80-\x8F][\x80-\xBF]{2}$$||      -- plane 16
')*$' )
;

Or, if you have iconv on your sys­tem, as most UNIX vari­ants do, you can pg_dumpall a data­base and run the fol­low­ing com­mand on it:

if (@iconv($input, 'UTF-8', 'UTF-8') == $input) echo "Good UTF-8!"; else echo "Nope."

Then take a plain text pg_dump of the data­base in UTF-8. Let's call it "db.orig.dmp". Strip all prob­lem characters:

iconv -f UTF-8 -t UTF-8 -c <db .orig.dmp >db.stripped.sql
</db>

Com­pare both files:

diff -u db.orig.dmp db.stripped.sql

That out­put will show all lines con­tain­ing a prob­lem char­ac­ter. Now the tedious part: Use "db.orig.dmp" to find out which tables they belong to, locate the records in the data­base by pri­mary key and fix them. Alter­na­tively, you can use iconv's auto-repair if you know which bytes give you a problem.

For exam­ple, if you know that the trou­ble stems only from 0×80 bytes that should be Euro sym­bols, you could:

iconv -f UTF-8 -t UTF-8 --byte-subst="&lt;0x%x>" < db.orig.sql | sed -e 's/&lt;0x80>/EUR/g' >db.fixed.sql

The result­ing "db.fixed.sql" could then be loaded into the new database.

Read More

Post­greSQL is a full-fledged enterprise-grade data­base brought into the pub­lic domain. Many home­made web­mas­ters now have to deal with what is a very viable — if not an out­right bet­ter — alter­na­tive to the rag­ing pop­u­lar­ity of a sig­nif­i­cantly sim­pler MySQL.

Yet, for all its mer­its and addic­tive fea­tures, Post­greSQL is a beast to upgrade between major ver­sion releases.

If you were going from 8.2.x to the lat­est sta­ble at the time of this writ­ing, 8.3.x, then the move from the 8.2 series to 8.3 would require you to dump/restore your entire data­base. Not a sim­ple chore if you have a 100 GB worth of crit­i­cal data, but you have no choice as the innards of PG typ­i­cally change between "major upgrades". For this, repli­ca­tion tech­nolo­gies such as Slony are a use­ful option as you can install the new data­base sep­a­rately, repli­cate the data over, and then make a switch when the two data­bases are mir­rored — thereby avoid­ing the down­time of the has­sle (here's an infor­ma­tive PGCON pre­sen­ta­tion).

For­tu­nately, minor ver­sion upgrades are much simpler.

No dump/restore is needed. If you were to go from, say 8.2.3 to 8.2.9, all you would have to do is to upgrade the RPMs. I'm assum­ing Linux here but the same would work for other platforms.

  1. Check what ver­sion you have cur­rently installed:
    rpm -qa | grep postgres | tee /root/rpm_pgsql
  2. Stop your post­gres server
    /sbin/service postgresql stop
  3. We will now sim­ply exe­cute the RPM upgrade com­mand, which of course assumes that you have the lat­est RPMs down­loaded from the PG web­site. Notice below that they're all lumped into the same com­mand to pre­vent any (unlikely) pos­si­bil­ity of depen­dency issues, so make sure all of this appears on one line — the back­slash is included here for for­mat­ting (so it won't break on the com­mand line).
    rpm -Uvh
    postgresql-8.2.9-1PGDG.rhel4.i386.rpm
    postgresql-contrib-8.2.9-1PGDG.rhel4.i386.rpm
    postgresql-devel-8.2.9-1PGDG.rhel4.i386.rpm
    postgresql-libs-8.2.9-1PGDG.rhel4.i386.rpm
    postgresql-server-8.2.9-1PGDG.rhel4.i386.rpm
  4. Hope­fully that went with­out a hitch. Now restart the post­gresql server:
    service postgresql restart

That's it. That should do it. It is highly unlikely that you'll see errors. And if you're run­ning an older 8.2.x, it's highly rec­om­mended you upgrade to 8.2.9 any­way due to per­for­mance increases and some bug fixes.

Read More