By: Maria Zakourdaev | Updated: 2018-10-29 | Comments (11) | Related: > PostgreSQL
Problem
SQL Server DBAs are absolutely sure that SQL Server is the best relational database. Ever. I have never had any doubt myself.
What should you do if your company wants to cut costs and put some data assets into an open-source RDBMS? From a Data Engineer/ Data Architect point of view, which one should you suggest to use? Opensource=free. Which is unfortunately, a very attractive option for the decision makers who are responsible for the company budget.
Moreover, I have always wondered how MySQL and PostgreSQL are different? Which features does each offer and what features don't they offer?
Solution
In order to understand, how to choose between these database systems, I have spent some time and read a few manuals. You can see below the mapping that I have made for myself in order to understand the differences. I have added SQL Server to the comparison as well.
I would be happy to see comments posted for this tip and add more things to the mappings below. Furthermore, do not hesitate to post a comment if the information below is not correct or not relevant anymore in the new product versions.
In general, both MySQL and PostgreSQL databases fully support ACID and can handle large amounts of data and high levels of query concurrency.
- PostgreSQL is more feature-rich and extensible and maybe a better choice for extreme cases.
- MySQL is much more popular, suits web applications and e-commerce projects, there are much more blogposts/support/documentation then for PostgreSQL.
- For environments with high number of connections - PostgreSQL might need a lot of memory, because each connection has its own memory. However, there are solutions to overcome this issue, like PgBouncers, external connection pools.
- Manual partition management in PostgreSQL requires too much overhead and updates that move rows from one partition to another will fail.
- There is 1 CPU per query limitation in MySQL and only the nested-loop join algorithm which make MySQL a less optimal choice for data warehouse systems.
- If check constraint functionality is important - there are none in MySQL.
Comparison of MySQL vs. PostgreSQL vs. SQL Server
In regards to the MySQL features, I will be mainly be referring to the InnoDB storage engine.
General information for MySQL, PostgreSQL and SQL Server
MySQL | PostgreSQL | SQL Server | |
---|---|---|---|
Maturity | Initial release was in 1995 | Initial release was in 1989 | MSMS SQL Server for OS/2 was released in 1989 (together with Sybase)
SQL Server 6.0 was released in 1995 marking the end of collaboration with Sybase. |
Language | Written in C, has a few C++ modules | Written in C | Mostly C++ with a few exceptions |
Cost | Open source / Owned by Oracle and has several paid editions | Completely free / Open source | SQL Server Express is a free edition, but it is limited to using 1 processor, 1 GB memory and 10 GB database files. |
Data changes for MySQL, PostgreSQL and SQL Server
MySQL | PostgreSQL | SQL Server | |
---|---|---|---|
Row Updates | Updates happen in place, changed data is copied to the rollback segment. This makes vacuuming and index compaction very efficient. MySQL is slower for reads, but writes are atomic and if columns in a secondary index change, this does not require changes to all indexes. | Updates are being implemented as inserts + mark as delete for vacuum. All indexes have a link to the physical id of the row. This has an update amplifying effect because when the column gets updated, new row with new physical id gets created and all indexes require updates, even those which are not referring to the changed column to get a pointer to the new row physical id. | Row-Store database engine: In-Memory database engine: updates implemented as insert + mark for delete. Garbage collector is not non-blocking and parallel Columnstore database engine: in-place updates |
Vacuum / Defragmentation | Vacuuming and index compaction are very efficient. | Vacuum performs full tables scans to find the deleted rows and quite heavy process/might impact users’ workload. | In-memory garbage collector might add max ~15% overhead, usually much less. |
Querying the data for MySQL, PostgreSQL and SQL Server
MySQL | PostgreSQL | SQL Server | |
---|---|---|---|
The buffer pool / cache that serves queries | MySQL cache that serves user queries is called a buffer pool. This cache can be set to the size as large as needs, leaving only enough memory for other processes on the server. You can split the buffer pool into multiple parts to minimize contention for memory structures and you can pin tables to the buffer pool. Table scan or mysqldump evicts older data. | PostgreSQL maintains shared memory for data pages and, due to the fact that it is a process-based system, each connection has a native OS process of its own and has its own memory. Process is releasing the memory after the execution has finished. Therefore, has problems scaling past hundreds of active connections. | SQL Server memory is called buffer pool and its size can be set as large as needed, no option to set multiple buffer pools. |
Constraints support | Supports primary keys, foreign keys, not-null constraints, unique constraints, default constraints, does not support CHECK constraints | Supports primary keys, foreign keys, not-null constraints, check constraints, unique constraints, default constraints, exclusion constraints | Supports primary keys, foreign keys, not-null constraints, check constraints, unique constraints, default constraints |
Temporary tables | Supports CTE, No support for global temp tables (available outside the
session scope) and no table variables. Interesting fact: You cannot refer to a TEMPORARY table more than once in the same query. For example, the following does not work: SELECT * FROM temp_table JOIN temp_table AS t2; |
Supports CTE, Global and local temporary tables and table variables
(using table name as a type name). Interesting fact: if you create two tables with the same name, one is temporary and another one is regular table CREATE TEMP TABLE X (…) and CREATE TABLE X (…), "select * from x" will always bring data from temporary table. |
Supports CTE, Global and local temporary tables and table variables. |
Window / Analytical functions | Supports: CUME_DIST, FIRST_VALUE, LAG, LAST_VALUE, LEAD, PERCENT_RANK, ROW_NUMBER, RANK, DENSE_RANK, NTILE, NTH_VALUE No PERCENTILE_CONT, PERCENTILE_DISC functions. |
Supports functions: CUME_DIST, FIRST_VALUE, LAG, LAST_VALUE, LEAD, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, ROW_NUMBER, RANK, DENSE_RANK, NTILE, NTH_VALUE |
Supports functions: CUME_DIST, FIRST_VALUE, LAG, LAST_VALUE, LEAD, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, ROW_NUMBER, RANK, DENSE_RANK, NTILE. Yet no NTH_VALUE function |
Parallel query execution | MySQL will usually use 1 CPU per query. | Query plans can leverage multiple CPUs | Query plans can leverage multiple CPUs |
Indexes | Supports index-organized tables - clustered indexes. Does not support persisted indexes / materialized views |
Supports index-organized table, but updates are manual until
ProstgreSQL 11 when it is automatic. Supports persisted indexes/materialized views. |
Supports index-organized tables - clustered indexes that automatically maintains rows order. |
Multiple indexes usage in single query | Multiple indexes might be used for the single query. | Multiple indexes might be used for the single query. If we have separate indexes on x and y, one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND together the index results to identify the result rows. | Multiple indexes might be used for a single query (index intersection feature). |
Multicolumn indexes | Multi-column indexes can have up to 16 columns | Multi-column indexes can have up to 32 columns | Multi-column indexes can have up to 16 columns |
Partial indexes (an index built over a subset of a table using filter) | Does not support partial indexes | Supports partial indexes | Supports partial indexes |
Join algorithms | MySQL executes joins between tables using only a nested-loop algorithm or variations of it. | Supports nested-loop joins, Hash joins and merge joins algorithms. | Supports nested-loop joins, hash joins and merge joins algorithms. |
Query execution plan reuse | Maintains caches for prepared statements and stored programs on a per-session basis. Statements cached for one session are not accessible to other sessions. | Caches query plans only as long as the prepared statement is open. The query plan is disposed when the prepared statement is closed. | Has shared execution plan cache to enable queries to reuse execution plans |
Statistics | Maintains persistent and non-persistent statistics (cleared on server restart) | Maintains statistics used by the planner, they are being updated by ANALYZE or VACUUM or CREATE INDEX | Maintains persistent statistics |
Memory-optimized tables | MySQL has got an ability to store tables in memory. The tables that are created in memory do not support transactions, their data is vulnerable to crashes. Those tables should be used as a temporary area or as a read-only caches. | Does not offer any in-memory engine. | In-memory OLTP is integrated into SQL Server’s database engine |
Columnstore or row- store | MariaDB have recently launched the column store engine for MySQL which was designed as a massively parallel database in an environment with multiple servers. It can be used instead of InnoDB storage engine. | Row-store. Does not offer any columnar storage engine. | SQL Server offers column store indexes to query large tables |
JSON and Data Type Support for MySQL, PostgreSQL and SQL Server
MySQL | Postgresql | SQL Server | |
---|---|---|---|
JSON data type | MySQL has JSON data type support and also supports in place partial updates over the JSON instead of replacing the whole document however there are many limitations. It does not support indexing for JSON but there are workarounds. | PostgreSQL supports JSON data type and supports partial updates | SQL Server supports JSON data type and supports partial updates |
Additional Advanced data types | Supports Geospatial data type. No user-defined types. | Supports Geospatial and lots of advanced data types, such as multi-dimensional arrays, user-defined types, etc. | Supports Geospatial data type, Hierarchical data |
Sharding / Partitioning / Replication for MySQL, PostgreSQL and SQL Server
MySQL | PostgreSQL | SQL Server | |
---|---|---|---|
Partitioning support | Supports HASH partitioning (use HASH function on any column to split table into N partitions), RANGE or LIST partitioning that can be based on several columns and KEY partitioning which is similar to HASH but based on some auto generated number. | Supports RANGE and LIST partitioning but partitions and indexes on them must be manually created and old-style partitioning via table inheritance (when querying the parent table, all children tables are being queries as well, children tables have constraints on partitioning column. Interesting fact: Children tables can have more columns that parent table and indexes must be applied separately on children tables.) | Supports RANGE partitioning. |
Sharding support | No good sharding implementation (MySQL Cluster is rarely deployed due to many limitations) | There are dozens of forks of Postgres which implement sharding but none of them yet haven’t been added to the community release. | No standard sharding implementation. |
Replication | Master-slave replication based on statements or based on changed rows
Group replication with master server automatic election |
Master - slave replication based on changed rows and log shipping. | Database level: Availability Groups master-multiple slaves Log shipping On data level: Master-slave / Bi-directional master-slave/ and master-master (merge) replication |
Next Steps
- What are pros and cons of PostgreSQL and MySQL? With respect to reliability, speed, scalability, and features: https://www.quora.com/What-are-pros-and-cons-of-PostgreSQL-and-MySQL-With-respect-to-reliability-speed-scalability-and-features
- MySQL partitioning: https://www.w3resource.com/mysql/mysql-partition.php
- Creating linked servers from SQL Server to MySQL: https://www.mssqltips.com/sqlservertip/4577/create-a-linked-server-to-mysql-from-sql-server
- MySQL to SQL Server coding differences: https://www.mssqltips.com/sqlservertutorial/2204/mysql-to-sql-server-coding-differences
- Creating linked server from SQL Server to PostgreSQL: https://www.mssqltips.com/sqlservertip/3662/sql-server-and-postgresql-linked-server-configuration--part-2
- Creating two way data access between SQL Server and PostgreSQL: https://www.mssqltips.com/sqlservertip/3661/creating-twoway-data-access-between-sql-server-and-postgresql--part-1
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2018-10-29