PGCon2009 - Final Release
PGCon 2009
The PostgreSQL Conference
Speakers | |
---|---|
Euler Taveira de Oliveira |
Schedule | |
---|---|
Day | Talks - first day - 2009-05-21 |
Room | DMS 1160 |
Start time | 13:30 |
Duration | 01:00 |

Info | |
---|---|
ID | 154 |
Event type | Lecture |
Track | Advanced Features |
Language used for presentation | English |
pg_similarity
Functions and Operators for Executing Similarity Queries
Similarity querying is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. pg_similarity is a tool that provides user-friendly functions and operators for similarity queries. More than a dozen functions are currently available.
There has been considerable interest in similarity queries in the research community recently.
pg_similarity is a set of functions and operators for matching similar strings. The following similarity functions are available: Block Distance, Cosine, Dice, Euclidean, Hamming, Jaccard, Jaro, Jaro-Winkler, Monge-Elkan, Needleman-Wunsch, q-Gram, Smith-Waterman, Smith-Waterman-Gotoh, and Soundex. A set of auxiliary functions is available too. They allow flexible control over the similarity threshold, tokenizer, and normalization of each function.
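To make the roles of tokenizer, normalization, and threshold concrete, here is a minimal Python sketch of two of the listed measures, Jaccard and Dice, computed over q-grams. It is an illustration only and not pg_similarity's code or interface: the names `qgrams`, `jaccard`, `dice`, and `is_similar`, the lowercase normalization, the choice of q = 2, and the 0.7 threshold are all assumptions made for this example.

```python
def qgrams(text, q=2):
    """Tokenizer (assumed): the set of overlapping q-character substrings.

    Normalization here is just lowercasing; strings shorter than q
    become a single token, and the empty string yields no tokens.
    """
    normalized = text.lower()
    if len(normalized) < q:
        return {normalized} if normalized else set()
    return {normalized[i:i + q] for i in range(len(normalized) - q + 1)}


def jaccard(a, b, q=2):
    """Jaccard coefficient: shared tokens divided by all distinct tokens."""
    ta, tb = qgrams(a, q), qgrams(b, q)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def dice(a, b, q=2):
    """Dice coefficient: twice the shared tokens over the sum of set sizes."""
    ta, tb = qgrams(a, q), qgrams(b, q)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def is_similar(a, b, measure=jaccard, threshold=0.7):
    """Boolean match: True when the score reaches the (assumed) threshold."""
    return measure(a, b) >= threshold


if __name__ == "__main__":
    # "night" and "nacht" share only the bigram "ht".
    print(jaccard("night", "nacht"))
    print(dice("night", "nacht"))  # 0.25
    print(is_similar("PostgreSQL", "postgresql"))  # True
```

The measure returns a score between 0 and 1, and the threshold turns that score into a yes/no match, which is roughly the split between a similarity function and a similarity operator that the description above suggests.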
It will be released under the BSD license on pgFoundry soon. The not-yet-released code can be downloaded from http://www.inf.ufrgs.br/~etoliveira/pg_similarity/