This talk is licensed under a Creative Commons, Attribution, Share Alike license.
Available from http://bzr.coffeecode.net/2012/PGCon/PostgreSQL_in_Libraries.html and horrible PDF
Many of the generalizations contained in this presentation are based on a methodologically flawed, self-selecting survey of Evergreen library system administrators. Others simply reflect the author’s own biases.
Evergreen: highly-scalable software for libraries that helps library patrons find library materials, and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries.
Open-source (GPL2+): http://evergreen-ils.org
If "Libraries are the beating heart of a (community|university)", PostgreSQL is in turn at the heart of libraries that run Evergreen.
Systems Librarian at the J.N. Desmarais Library, Laurentian University in Sudbury, Ontario (a founding member of Project Conifer)
Libraries are generally resource-challenged and their systems people are asked to be responsible for many software and hardware systems, not just the library system. Thus:
Central element of most library data is the MARC record, a combination of fixed-length fields and variable-length fields that encodes the bibliographic description of an object.
LDR 00969cam a22002774a 4500 001 14338589 005 20070508144242.0 008 060412s2005 cc 001 0 eng c 010 ‡a 2006273753 020 ‡a9780596007591 (pbk.) 020 ‡a0596007590 (pbk.) 050 0 0 ‡aQA76.76.D47 ‡bF634 2005 082 0 0 ‡a005.1 ‡222 100 1 ‡aFogel, Karl. 245 1 0 ‡aProducing open source software : ‡bhow to run a successful free software project / ‡cKarl Fogel. 250 ‡a1st ed. 260 ‡aBeijing ; ‡aSebastopol, CA : ‡bO'Reilly, ‡c2005. 300 ‡axx, 279 p. ; ‡c24 cm. 500 ‡aIncludes index. 650 0 ‡aComputer software ‡xDevelopment. 650 0 ‡aOpen source software.
Generally, start with MARC (serialized as MARCXML) in biblio.record_entry.marc:
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <leader>00969cam a22002774a 4500</leader> <controlfield tag="001">14338589</controlfield> <controlfield tag="005">20070508144242.0</controlfield> <controlfield tag="008">060412s2005 cc 001 0 eng c</controlfield> <datafield tag="010" ind1=" " ind2=" "> <subfield code="a"> 2006273753</subfield> </datafield> <datafield tag="020" ind1=" " ind2=" "> <subfield code="a">9780596007591 (pbk.)</subfield> </datafield> <datafield tag="082" ind1="0" ind2="0"> <subfield code="a">005.1</subfield> <subfield code="2">22</subfield> </datafield> <datafield tag="100" ind1="1" ind2=" "> <subfield code="a">Fogel, Karl.</subfield> </datafield> <datafield tag="245" ind1="1" ind2="0"> <subfield code="a">Producing open source software :</subfield> <subfield code="b">how to run a successful free software project /</subfield> <subfield code="c">Karl Fogel.</subfield> </datafield> </record>
To support a MARC expert search, we populate metabib.full_rec:
SELECT * FROM metabib.full_rec WHERE record = 884755 AND tag = '245'; -[ RECORD 1 ]+------------------------------------------------------- id | 22640054 record | 884755 tag | 245 ind1 | 1 ind2 | 0 subfield | a value | producing open source software index_vector | 'open':2 'produc':1 'softwar':4 'sourc':3
83M metabib.full_rec rows in Conifer’s production database
Challenge: some fields such as general notes are lengthy, blowing past the btree maximum.
Eventual solution: Create a SUBSTR(value, 1, 1024) expression index on metabib.full_rec, rename the table to metabib.real_full_rec, and create a view called metabib.full_rec on top of it.
-[ RECORD 1 ]+------------------------------- id | 4234610 source | 884755 field | 6 value | Producing open source software | how to run a successful free | software project index_vector | 'a':8 'free':10 'how':5 'open':2 | 'produc':1 'project':12 'run':7 | 'softwar':4,11 'sourc':3 | 'success':9 'to':6
29M metabib.*_field_entry rows in Conifer’s production database
Circa 2006, PostgreSQL 8.0/8.1
Circa 2009, PostgreSQL 8.3/8.4
Circa 2011, PostgreSQL 9.0
Circa 2012, PostgreSQL 9.1
Solr comes up as an option for sub-second results:
However, convenience and consistency of having full-text search managed by PostgreSQL generally outweighs perceived advantages of Solr.
Still not fun explaining this advantage to users and staff when their overly general query simply times out.
The bibliographic record table is one of the more active tables in our schema:
a_marcxml_is_well_formed BEFORE INSERT OR UPDATE a_opac_vis_mat_view_tgr AFTER INSERT OR UPDATE aaa_indexing_ingest_or_delete AFTER INSERT OR UPDATE audit_biblio_record_entry_update_trigger AFTER DELETE OR UPDATE b_maintain_901 BEFORE INSERT OR UPDATE bbb_simple_rec_trigger AFTER INSERT OR DELETE OR UPDATE c_maintain_control_numbers BEFORE INSERT OR UPDATE fingerprint_tgr BEFORE INSERT OR UPDATE
For reporting simplicity and increased performance, materialized views (AKA materialized query tables) rock
Currently using hstore effectively in two places:
Would like to implement connection pooling to reserve server resources for core database processes
Libraries are generally averse to frequent system change, for the usual business reasons (avoiding downtime, risk and retraining).
Evergreen[:] highly-scalable software for libraries that helps library patrons find library materials, and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries.
Evergreen: http://evergreen-ils.org