Material Big Data

Lanzados ppts informativos de tecnologías BigData: Hadoop, Hbase, Hive, Zookeeper...

Pentaho Analytics. Un gran salto

Ya se ha lanzado Pentaho 7 y con grandes sorpresas. Descubre con nosotros las mejoras de la mejor suite Open BI

La mejor oferta de Cusos Open Source

Después de la gran acogida de nuestros Cursos Open Source, eminentemente prácticos, lanzamos las convocatorias de 2017

6 may. 2017

Nuevas funcionalidades en PostgreSQL 10


Muy intersantes, las nuevas funcionalidades que se anuncian para las versión PostgreSQL 10:

Headline Features

Declarative Partitioning.  In previous versions of PostgreSQL, PostgreSQL supported only table inheritance, which could be used to simulate table partitioning, but it was complicated to set up and the performance characteristics were not that great.  In PostgreSQL 10, it's possible to do list or range partitioning using dedicated syntax, and INSERT performance has been greatly improved.  There is still a lot more work to do in future releases to improve performance and add missing features, but even what we have in v10 is already a major step forward (IMHO, anyway).

Logical Replication.  PostgreSQL has had physical replication -- often called streaming replication -- since version 9.0, but this requires replicating the entire database, cannot tolerate writes in any form on the standby server, and is useless for replicating across versions or database systems.  PostgreSQL has had logical decoding -- basically change capture -- since version 9.4, which has been embraced with enthusiasm, but it could not be used for replication without an add-on of some sort.  PostgreSQL 10 adds logical replication which is very easy to configure and which works at table granularity, clearly a huge step forward.  It will copy the initial data for you and then keep it up to date after that.

Improved Parallel Query.  While PostgreSQL 9.6 offers parallel query, this feature has been significantly improved in PostgreSQL 10, with new features like Parallel Bitmap Heap Scan, Parallel Index Scan, and others.  Speedups of 2-4x are common with parallel query, and these enhancements should allow those speedups to happen for a wider variety of queries.

SCRAM Authentication.  PostgreSQL offers a remarkable variety of different authentication methods, including methods such as Kerberos, SSPI, and SSL certificate authentication, which are intended to be highly secure.  However, sometimes users just want to use passwords managed by the PostgreSQL server itself.  In existing releases, this can be done either using the password authentication type, which just sends the user-supplied password over the wire, or via the md5 authentication type, which sends a hashed and salted version of the password over the wire.  In the latter approach, stealing the hashed password from the database or sniffing it on the wire is equivalent to stealing the password itself, even if you can't compute a preimage.  PostgreSQL 10 introduces scram authentication, specifically SCRAM-SHA-256, which is much more secure.  Neither the information which the server stores on disk nor the contents of an authentication exchange suffice for the server to impersonate the client.  Of course, the substitution of SHA-256 for MD5 is also a substantial improvement.  See also Michael Paquier's blog on this topic. One point to note is that, unless you are using libpq, you will not be able to use this feature unless your particular client driver has been updated with SCRAM support, so it may be a while before this feature is universally available.

Executor Speedups.  Substantial parts of PostgreSQL's executor have been rewritten to make expression and targetlist projection faster; just-in-time compilation will be added in a future release.  Hash aggregation has been rewritten to use a more efficient hash table and store narrower tuples in it, and work has also been done to speed up queries that compute multiple aggregates and joins where one side can be proven unique.  Grouping sets now support hash aggregation.  While all PostgreSQL releases typically contain at least some performance improvements, the rewrite of expression and targetlist projection is a particularly large and significant improvement which will benefit many users.

Durable Hash Indexes.  Hash indexes in PostgreSQL have suffered from years of long neglect; the situation will be noticeably improved in v10.  The most notable change is that changes to a hash index now write WAL, which means that they are crash-safe and that they are properly replicated to standbys.  However, a good deal of other work has been done, including the necessary prerequisite step of revamping the bucket split algorithm to improve performance and concurrency, caching the metapage for better performance, adding page-at-a-time vacuuming, and expanding them more gradually.  Amit Kapila even writes about a case where they outperformed btree indexes.  While there's certainly more work to be done here, I'm excited about these improvements.

ICU Collation Support.  In current releases, PostgreSQL relies exclusively on the collations supplied by the operating system, but this sometimes causes problems: collation behavior often differs between operating systems, especially between Linux and Windows, and it isn't always easy to find a collation for one operating system whose behavior matches that of some collation available on another system.  Furthermore, at least on Red Hat, glibc regularly whacks around the behavior of OS-native collations in minor releases, which effectively corrupts PostgreSQL's indexes, since the index order might no longer match the (revised) collation order.  To me, changing the behavior of a widely-used system call in a maintenance release seems about as friendly as locking a family of angry racoons in someone's car, but the glibc maintainers evidently don't agree.  (In fact, there's one discussion where it's suggested that you not use some of those interfaces at all.)  libicu, on the other hand, says they care about this.

But Wait, There's More!

In my estimation, the features listed above are the most exciting things that users can expect in PostgreSQL 10, which is expected to be released in September.  However, there are quite a few other significant features as well which could easily have qualified as headline features in a release less jam-packed than this one.  Here are some of them:

Extended Statistics (ndistinct, functional dependencies).  If the query planner makes a bad row count estimate resulting in a terrible plan, how do you fix it?  With extended statistics, you can tell the system to gather additional statistics according to parameters that you specify, which may help it get the plan right.

FDW Aggregate Pushdown.  In previous releases, SELECT COUNT(*) FROM foreign_table operated by fetching every row form the foreign table and counting them locally.  That was terrible, so now it doesn't.

Transition Tables.  It is now possible to write a PL/pgsql AFTER STATEMENT trigger which can access all rows modified by the statement.  This can be both faster and more convenient than writing an AFTER ROW trigger that is called once per row.

Improved Wait Events.  PostgreSQL 9.6 introduced wait event monitoring in pg_stat_activity, but only for a limited range of events.  In PostgreSQL 10, you'll be able to see latch waits and I/O waits, even for auxiliary processes and unconnected background workers.

New Integrity Checking Tools.  You can now validate the integrity of your btree indexes using the new amcheck module.  If you're a developer adding write-ahead logging to a new storage form, or a user who thinks the developers may have introduced a bug, you'll be pleased to be able to test with wal_consistency_checking. pg_dump now has better test coverage.

Smarter Connection Handling.  Connections through libpq can now specify multiple hosts, and you can even tell it to find you the server that is currently accepting write connections.

Quorum-Based Synchronous Replication.  You can now specify that a commit must be acknowledged by any K of N standby synchronous servers, improving flexibility and performance.

Other Cool Stuff

Many other things have also been significantly improved in this release.  XMLTABLE makes querying XML data faster and easier.  You can now interrogate the commit status of a transaction directly, and we've got better tracking of replication lag.  psql now supports \if ... \elseif ... \else ... \endif to make scripting easier, and there are new functions and new roles to allow monitoring tools to run without superuser privileges.  Encoding conversions are now faster, and so is sorting. You can compress the transaction log while streaming it.  And there's more, but this blog post is too long already.  If you're interested in reading even more about new features that will be coming with PostgreSQL 10, depesz blogs frequently on this topic, and so does Michael Paquier.  Both have additional details on some of the features mentioned here, as well as others that may be of interest.

This final note: we have had chronic problems with users erroneously believing that the pg_xlog or pg_clog directory is non-critical data, possibly because the directory names include the word "log".  Those directories have been renamed to pg_wal and pg_xact, which we hope will be clearer.  All SQL functions and utility names that formerly included the string "xlog", meaning the transaction log or write-ahead log, have been renamed to use "wal" instead.  Conversely, the default log directory is now called log rather than pg_log so that it is looks less like an internal name.  These changes will probably cause a bit of upgrade pain for some users, but we hope that they will also help users to avoid catastrophic mistakes.


Visto en Robert Haas Blog

0 comentarios: