PGCon2012 - Final Release
PGCon 2012
The PostgreSQL Conference
Speakers | |
---|---|
Hitoshi Harada |
Schedule | |
---|---|
Day | Talks - 1 - Thursday - 2012-05-17 |
Room | MRT 212 |
Start time | 11:00 |
Duration | 01:00 |
Info | |
---|---|
ID | 404 |
Event type | Lecture |
Track | Applications |
Language used for presentation | English |
MADlib
An open source machine learning library on RDBMS for Big Data age
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
The MADlib mission is to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. The library consists of various analytics methods, including linear regression, logistic regression, k-means clustering, decision trees, support vector machines and more. That's not all; there is also a highly efficient user-defined data type for sparse vectors, with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk covers the overall concept, introducing the problems we are tackling and our solutions to them. It will also touch on parallel data processing, which is a hot topic in both research and commercial areas these days.
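To make the idea of a data-parallel implementation concrete, here is a minimal Python sketch of simple linear regression computed the way a database aggregate typically works: each partition folds its rows into a small state, the partial states are merged, and a final step produces the coefficients. This is an illustration under stated assumptions, not MADlib's actual code or API; the names `RegrState`, `transition`, `merge`, `final` and `fit_partitioned` are hypothetical, and the partitions are processed in a plain loop rather than on separate segments.

```python
from dataclasses import dataclass


@dataclass
class RegrState:
    """Sufficient statistics for fitting y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x * x
    sxy: float = 0.0  # sum of x * y


def transition(state, x, y):
    """Fold one row into a partition-local state."""
    return RegrState(state.n + 1, state.sx + x, state.sy + y,
                     state.sxx + x * x, state.sxy + x * y)


def merge(a, b):
    """Combine two partial states; the order of merging does not matter."""
    return RegrState(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                     a.sxx + b.sxx, a.sxy + b.sxy)


def final(state):
    """Turn the merged state into (slope, intercept) by least squares."""
    denom = state.n * state.sxx - state.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (state.n * state.sxy - state.sx * state.sy) / denom
    intercept = (state.sy - slope * state.sx) / state.n
    return slope, intercept


def fit_partitioned(partitions):
    """Hypothetical driver: each partition stands in for one data segment."""
    merged = RegrState()
    for rows in partitions:
        local = RegrState()
        for x, y in rows:
            local = transition(local, x, y)
        merged = merge(merged, local)
    return final(merged)
```

For example, `fit_partitioned([[(0, 1), (1, 3)], [(2, 5), (3, 7)]])` returns `(2.0, 1.0)`, the same line a single pass over all four rows would give. Because the state has a fixed size and merging is just addition, the work scales with the number of partitions without moving the raw rows.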