Tuesday, July 10, 2012

Introduction to PostgreSQL - Configuration


Usually the problem is that PostgreSQL is not configured to work with the user and data volumes with which we are using. There is a great help to have a server with several GBytes of RAM if you have told PostgreSQL, for example, use no more than 32MBytes. We also have to say that any database that is actively using, not just PostgreSQL is a dynamic element and live where we are constantly changing data and where the stored data size tends to grow over time. This means that a configuration that works well with certain values ​​today may not work as well after a year of use and need to be adjusted for optimal functioning. Configuring PostgreSQL's behavior in our system can be controlled by three configuration files found in the data directory where PostgreSQL initialize our cluster (in our case / var / pgsql / data). These three files are: pg_hba.conf: This file is used to define the different types of access a user has in the cluster. * Pg_ident.conf: This file is used to define the necessary information in case we use a ident access type in pg_hba.conf.

* Postgresql.conf: In this file we can change all configuration parameters that affect the functioning and performance of PostgreSQL in our machine. We now turn to explain the most important changes we can make some of these files. pg_hba.conf This file is used to define how, where and from which site a user can use our PostgreSQL cluster. All lines that start with the # character are interpreted as comments. The rest must have the following format: [Type of connection] [database] [user] [IP] [Netmask] [authentication type] [options] Depending on the type of connection and authentication, [IP], [ Netmask] and [options] are optional. Let's explain a little how to define access rules. The type of connection may have the following values, local, host, and hostnossl hostssl. The type of method can have the following values, trust, reject, md5, crypt, password, krb5, ident, ldap pam or a series of examples to help us better understand how we can set different access to PostgreSQL cluster. Example 1 .- Access for tcp / ip (network) to the test001 database as a user test from the computer with IP 10.0.0.100 and md5 authentication method: test test001 host 10.0.0.100 255 255 255 255 md5 The same input could be written also with the network mask CIDR notation: test test001 host 10.0.0.100/32 md5 Example 2 .- Access for tcp / ip (network) to the test001 database as a user test from all computers network 10.0.0.0 with netmask 255.255.255.0 (254 computers in total) and md5 authentication method:

10.0.0.0 host 255.255.255.0 md5 test001 test The same input could also write with the notation netmask in CIDR: 10.0.0.0/24 md5 host test001 test Example 3 .- Access by TCP / IP (network), encryption, to all of our database clusters as test user from the computer with IP 10.0.0.100, 10.1.1.100 and computer and md5 authentication method (we need two entries in our file pg_hba.conf: 10.0.0.100 hostssl all test md5 255 255 255 255 255 255 255 255 10.1.1.100 hostssl all test md5 Example 4 .- Deny access to all databases on our test cluster to the user, from all computers on the network 10.0.0.0/24 and give-access to rest of the world with the md5 method: host test all reject host all all 10.0.0.0/24 0.0.0.0 / 0 md5 I could go on playing with all the possibilities offered by this configuration file. Of course, databases and users used in this file must exist in our cluster to make things work and some of the parameters can only be used if we compiled with the appropriate options in the installation process (eg hostssl, pam, krb5) in production order changes in this file have to tell PostgreSQL to re-read it.

Just a simple 'reload' (/ usr / local / bin / pg_ctl-D / var / pgsql / data reload) from the command line or pg_reload_conf function () postgres user from psql, the PostgreSQL client. [Postgres @ server] # / usr / local / bin / psql Welcome to psql 8.2.4, the PostgreSQL interactive terminal. Type: \ copyright for distribution terms \ h for help with SQL commands \? for help with psql commands \ g or terminate with semicolon to execute query \ q to quit postgres = # SELECT pg_reload_conf (); pg_reload_conf ---------------- t (1 row) postgres = # For a detailed documentation on the file pg_hba.con, go through the section Chapter 20. Client Authentication of official documentation of PostgreSQL. postgresql.conf changes we make to this file affect all the databases that we identified in our cluster PostgreSQL. Most changes can be put into production with a simple 'reload' (/ usr / local / bin / pg_ctl-D / var / pgsql / data reload), other changes need to reboot our cluster (/ usr / local / bin / pg_ctl-D / var / pgsql / data restart). More information on all parameters that can change in this file, it affects and how you can put in production can be found in section 17. Server Configuration from the official PostgreSQL documentation. Next we see the most important parameters that should change if we begin to use PostgreSQL for serious use and if we get the most out of our machine.

There are many more parameters that can and eventually should be fitted, here we focus on the most important and which should change before you start using PostgreSQL in a serious way. max_connections: maximum number of clients connected simultaneously to our databases. We should increase this value in proportion to the number of concurrent clients on our PostgreSQL cluster. A good starting value is 100: max_connections = 100 shared_buffers: This parameter is important and defines the size of buffer memory used by PostgreSQL. Not much we increase this value best answer. In a dedicated server we can start with 25% of total memory. Never more than one third (33%) of the total. For example, a server memory 4Gbytes, we can use 1024MB as the initial value. work_mem shared_buffers = 1024MB: Used in operations containing ORDER BY, DISTINCT, joins, .... In a dedicated server can use 2-4% of our total memory if you have only a few sessions (clients) large. As initial value we can use 8 Mbytes. maintenance_work_mem work_mem = 8MB: Used in operations such as VACUUM, ANALYZE, CREATE INDEX, ALTER TABLE, ADD FOREIGN KEY.

Its value will depend largely on the size of our databases. For example, a server memory 4Gbytes, we can use 256MB as initial value. effective_cache_size maintenance_work_mem = 256MB: Parameter used by the 'query planner' of our database engine to optimize the reading of data. In a dedicated server we can start with 50% of total memory. At most a two thirds (66%) of the total. For example, a server memory 4Gbytes, we can use 2048MB as the initial value. checkpoint_segments effective_cache_size = 2048MB: This parameter is very important in databases with many write operations (insert, update, delete). To begin with we can start with a value of 64. In large databases with many gigabytes of data written can increase this value to 128-256. checkpoint_segments = 64 is very important to note that increasing the default values ​​of many of these parameters, we must increase the default values ​​of some parameters of the kernel of our system. Detailed information on how to do this is found in Section 16.4. Managing Kernel Resources official documentation of PostgreSQL. In short, this is just a taste of what we can do. With practice and experience we can and we have to adjust many parameters.

But this will be the subject of a next article.

No comments:

Post a Comment