Wednesday, March 8, 2017

How to build MongoRocks for MongoDB 3.4

This explains how to build MongoRocks for MongoDB 3.4 and is derived from my notes for building for MongoDB 3.2. My server uses Ubuntu 16.04.

# Install many of the dependencies for MongoRocks.
# These are the Ubuntu 16.04 package names. On RHEL/CentOS use yum with
# snappy-devel zlib-devel bzip2-devel lz4-devel scons gcc-c++ git instead.
sudo apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev
sudo apt-get install scons g++ git

# Unpack MongoDB 3.4 source in $MONGOSRC

# Directory in which git repos are created
mkdir ~/git

# Get MongoRocks engine
cd ~/git
git clone https://github.com/mongodb-partners/mongo-rocks.git
cd mongo-rocks
git checkout --track origin/v3.4 -b v34

# figure out which version of gcc & g++ is installed
# for ubuntu 16.04 that is 5.4

g++ --version


# get and build RocksDB libraries
# disable the use of jemalloc features

git clone https://github.com/facebook/rocksdb.git
cd rocksdb
git checkout --track origin/5.2.fb -b 52fb
EXTRA_CFLAGS=-fPIC EXTRA_CXXFLAGS=-fPIC DISABLE_JEMALLOC=1 make static_lib

# prepare source build with support for RocksDB
cd $MONGOSRC
mkdir -p src/mongo/db/modules/
ln -sf ~/git/mongo-rocks src/mongo/db/modules/rocks

# Build mongod & mongo binaries
# You can edit LIBS="..." depending on the compression libs
# installed on your build server and enabled for RocksDB.
# To debug and see command lines add --debug=presub
# To use glibc rather than tcmalloc add --allocator=system

scons CPPPATH=/home/mdcallag/git/rocksdb/include \
      LIBPATH=/home/mdcallag/git/rocksdb \
      LIBS="lz4 zstd bz2" mongod mongo

# install mongod; I used ~/b/m342 but you can use something else
mkdir -p ~/b/m342
cd ~/b/m342
mkdir data
mkdir bin
cp $MONGOSRC/build/opt/mongo/mongod bin
cp $MONGOSRC/build/opt/mongo/mongo bin

# create mongo.conf file with the text that follows. You must
# change $HOME and consider changing the value for cacheSizeGB
---
processManagement:
  fork: true
systemLog:
  destination: file
  path: $HOME/b/m342/log
  logAppend: true
storage:
  syncPeriodSecs: 600
  dbPath: $HOME/b/m342/data
  journal:
    enabled: true
operationProfiling.slowOpThresholdMs: 2000
replication.oplogSizeMB: 4000
storage.rocksdb.cacheSizeGB: 1
---
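Note that mongod does not expand shell variables like $HOME inside the YAML config file, so the text above won't work until the paths are edited by hand. One way to avoid mistakes is to generate the file from the shell, which expands $HOME at write time. A minimal sketch, assuming the ~/b/m342 layout used above:

```shell
# Generate mongo.conf with $HOME expanded by the shell at write time,
# since mongod reads the YAML paths literally.
mkdir -p ~/b/m342   # already created above; repeated so this is self-contained
cat > ~/b/m342/mongo.conf <<EOF
processManagement:
  fork: true
systemLog:
  destination: file
  path: $HOME/b/m342/log
  logAppend: true
storage:
  syncPeriodSecs: 600
  dbPath: $HOME/b/m342/data
  journal:
    enabled: true
operationProfiling.slowOpThresholdMs: 2000
replication.oplogSizeMB: 4000
storage.rocksdb.cacheSizeGB: 1
EOF
```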

# start mongod, consider using numactl --interleave=all
bin/mongod --config mongo.conf --master --storageEngine rocksdb

# confirm RocksDB is there
ls data/db
> 000007.sst  CURRENT  IDENTITY  journal  LOCK  LOG  MANIFEST-000008  OPTIONS-000005

$ head -5 log
2017-03-08T09:38:33.747-0800 I CONTROL  [initandlisten] MongoDB starting : pid=19869 port=27017 dbpath=/home/mdcallag/b/m342/data master=1 64-bit host=nuc2
2017-03-08T09:38:33.747-0800 I CONTROL  [initandlisten] db version v3.4.2
2017-03-08T09:38:33.747-0800 I CONTROL  [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-03-08T09:38:33.747-0800 I CONTROL  [initandlisten] allocator: tcmalloc
2017-03-08T09:38:33.747-0800 I CONTROL  [initandlisten] modules: rocks
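The eyeball check above can also be scripted. A small sketch (check_rocks is a hypothetical helper of mine, not a MongoDB tool) that scans the mongod log for the modules line:

```shell
# Print whether the mongod log shows the rocks module was compiled in,
# based on the "modules: rocks" line shown above.
check_rocks() {
    if grep -q 'modules: rocks' "$1"; then
        echo "rocks module present"
    else
        echo "rocks module missing"
    fi
}
# usage: check_rocks ~/b/m342/log
```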

Friday, March 3, 2017

Part 5: sysbench, a larger server, an IO-bound database and really fast storage

I used a database that doesn't fit in the DBMS cache for part 5 in my unending series on sysbench and MyRocks. In this case I used really fast storage - the database fits in the OS page cache, reads are served from the OS page cache and writes are done to the fast SSD. A previous post has more details on the hardware.

tl;dr
  • MyRocks is much faster than compressed InnoDB except for InnoDB-5.7 with long range scans
  • MyRocks had throughput better than or similar to uncompressed InnoDB-5.6 for update-only, insert-only and short range scans. It was slower for long range scans and the point query test.
  • MyRocks was faster than uncompressed InnoDB-5.7 for the update-only tests and slower for insert-only and tests with range/point queries.
  • MyRocks was faster than TokuDB for all tests except the longest range queries

Disclaimer

The goal for MyRocks is the best space efficiency, great write efficiency and good enough read efficiency. The best space efficiency means it needs less SSD. Great write efficiency means that SSD lasts longer, and I just replaced 3 SSDs on my home servers so I appreciate the value of that. Good enough read efficiency is harder to define and my goal over time has become better than good enough. I want MyRocks to match uncompressed InnoDB in response time. We aren't there yet but we are making progress.

For this workload I only share data on read efficiency. MyRocks loses one of its advantages given that the database is in the OS page cache -- when you spend less on writes you can spend more on reads. But when the database fits in the OS page cache there will be no reads from the storage device.

Tests

The tests are fully described in a previous post. In this case I used 8 tables with 10M rows per table and a 2gb DBMS cache. Tests were repeated with no compression, fast compression and slow compression. Fast compression was lz4 for MyRocks and snappy for TokuDB. Slow compression was zlib for all engines. Tests were run for 1 to 128 connections (threads) on a server with 24 CPU cores and 48 HW threads. All engines used jemalloc. The binlog was enabled but fsync for the binlog and storage engine was disabled.

Tests were run for several storage engines:
  • myrocks-5635 - MyRocks from FB MySQL merged to upstream MySQL 5.6.35. The full my.cnf was listed in a previous post and the paste is here. Then I edited that to reduce the RocksDB block cache and enable or disable compression using this.
  • innodb-5626 - InnoDB from upstream MySQL 5.6.26. In a few weeks or months I will upgrade to a more recent 5.6 build.
  • innodb-5717 - InnoDB from upstream MySQL 5.7.17
  • tokudb-5717 - TokuDB from Percona Server 5.7.17-11. The my.cnf is here.

Update-only with secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
2804    9616    18823   33177   38211   41248   44460   44723   44151   43671   43289   43150   42621   myrocks-5635
2355    4903     8127   12703   15987   28899   29468   30260   30785   30991   30710   29922   28946   innodb-5626
3413    7834    14519   23386   32131   36263   36582   37110   37751   39091   40058   40945   42019   innodb-5717
2099    3958     7085   13135   20494   26263   28408   28723   28472   27786   26615   25577   23243   tokudb-5717
- lz4/snappy
2801    9638    18841   33199   38234   41270   44190   44347   44047   43404   42889   42601   42199   myrocks-5635
2070    3921     7004   12988   20146   25857   28196   28545   28322   27784   26663   25550   23293   tokudb-5717
- zlib
2683    8937    17422   31287   37139   39406   41880   42230   42337   41907   41877   41482   41040   myrocks-5635
 328    1004     1971    3346    4782    5343    5666    5863    6016    6154    6239    6260    6215   innodb-5626
 455    1164     2302    4040    5994    6513    6944    7184    7319    7516    7534    7596    7528   innodb-5717
1988    3741     6665   12441   19575   24062   26028   26211   26129   25727   24554   23746   21799   tokudb-5717

Summary:
  • Uncompressed: MyRocks >> InnoDB-5.7 >> InnoDB-5.6 >> TokuDB
  • Compressed: MyRocks >> TokuDB >> InnoDB
  • Compressed InnoDB might suffer from the per-index mutex
  • MyRocks benefits because secondary index maintenance is read-free (write-only)

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
2772    9935    19432   35272   43243   46621   49285   51651   51780   51154   50176   49766   48711   myrocks-5635
3888    7639    12655   17174   28267   31354   34335   36421   34686   38179   39268   40883   41326   innodb-5626
2917    6166    18918   35674   55660   69910   74162   74654   74734   76431   78150   79040   80607   innodb-5717
2071    4089     7508   14548   24884   31050   34702   37987   40152   41042   39219   37496   33944   tokudb-5717
- lz4/snappy
2779    9948    19471   35516   42875   46143   48964   51227   51290   50592   49984   49372   48504   myrocks-5635
2062    4045     7432   14484   24403   30637   34182   37107   39212   40452   38857   37256   33901   tokudb-5717
- zlib
2660    9235    18026   33259   41518   44511   46920   48697   48865   48870   48668   47668   47119   myrocks-5635
 600    1125     2034    3754    5356    5946    6184    6278    6355    6417    6446    6433    6384   innodb-5626
 688    1292     2652    4776    6950    7825    8186    8329    8435    8551    8535    8518    8408   innodb-5717
1964    3839     7112   13739   23446   28967   32118   34603   36506   37325   36384   35010   32236   tokudb-5717

Summary:
  • Uncompressed <= 4 threads: MyRocks >> InnoDB-5.7 >> InnoDB-5.6 >> TokuDB
  • Uncompressed >= 8 threads: InnoDB-5.7 >> MyRocks >> InnoDB-5.6, TokuDB
  • Compressed: MyRocks >> TokuDB >> InnoDB
  • Compressed InnoDB might suffer from the per-index mutex
  • MyRocks doesn't benefit from read-free secondary index maintenance because there is no index maintenance.

Read-write with --oltp-range-size=100

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
4615    9749    19889   44836   95983  128906  150236  161720  164109  164003  165044  164140  163066   myrocks-5635
4927   12937    28513   54213   85564   82470   85415  109080  124297  132075  132059  132434  130368   innodb-5626
7686   15402    31015   61110  106977  144695  167321  185388  197180  205404  207852  208216  207327   innodb-5717
4403    7548    15377   36443   66814   82179   92476   96790   97598   97450   96157   93420   91241   tokudb-5717
- lz4/snappy
4581    9728    19962   44480   95368  128023  149742  160837  164039  162767  162522  164242  161043   myrocks-5635
4087    7413    14971   35184   64774   79351   87971   92652   94746   94083   93201   91028   89282   tokudb-5717
- zlib
4300    8899    18466   40781   84814  113870  129297  143581  144207  140118  145012  142208  141812   myrocks-5635
2008    3917     7290   13214   20586   24610   26661   27897   28638   29370   30041   30478   30752   innodb-5626
2175    4234     8150   15318   25087   30660   33600   35528   36792   38187   38967   39302   39509   innodb-5717
3580    6824    14221   30690   55008   68552   76448   81532   82780   83096   81667   80006   79417   tokudb-5717

Summary:
  • Uncompressed <= 8 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Uncompressed >= 16 threads: InnoDB-5.7 >> MyRocks >> InnoDB-5.6 >> TokuDB
  • Compressed: MyRocks >> TokuDB >> InnoDB
  • Compressed InnoDB might suffer from the per-index mutex

Read-write with --oltp-range-size=10000

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
 250     503      991    1932    3771    5083    5937    6400    6354    6605    6522    6276    6556   myrocks-5635
 267     439     1070    2258    4403    5974    6889    7556    7680    7509    7212    6965    6466   innodb-5626
 471     948     1866    3675    6951    8974    9959   10497   10673   10590   10569   10470   10493   innodb-5717
- lz4/snappy
 237     473      946    1861    3597    5089    6025    6267    6465    6802    6450    6472    6643   myrocks-5635
 329     646     1292    2378    4546    6003    6504    6775    6906    6807    6751    6671    6620   tokudb-5717
- zlib
 224     457      907    1775    3459    4709    5243    5928    6008    5822    6052    5823    5639   myrocks-5635
 139     326      668    1376    2723    3851    4471    4845    5039    4997    4821    4822    4822   innodb-5626
 176     431      904    1820    3835    5204    5830    6237    6487    6515    6469    6412    6357   innodb-5717
 276     570     1136    2149    4065    5396    5861    6146    6272    6205    6157    6112    6042   tokudb-5717

Summary:
  • Uncompressed: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks
  • Compressed <= 32 threads: TokuDB >> InnoDB-5.7 >> MyRocks >> InnoDB-5.6
  • Compressed >= 40 threads: InnoDB-5.7 >> TokuDB >> MyRocks >> InnoDB-5.6
  • MyRocks does better here relative to others than on the same test with a cached database.
  • Something was done in MySQL 5.7 to make InnoDB more efficient for long range scans.

Read-only with --oltp-range-size=10

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
5583   12053    25340   60323  125323  173556  201507  228848  240600  260551  268105  270922  270802   myrocks-5635
5802   11302    23312   45754   86042  119485  132435  182210  278491  282393  285004  286285  286280   innodb-5626
5998   12663    27213   67057  139873  191166  229855  267232  302766  304396  304294  308140  310807   innodb-5717
4619   10658    22704   52257  108814  152185  180060  207094  229817  232031  234555  238575  233373   tokudb-5717
- lz4/snappy
5533   11988    25118   59446  123771  170236  200173  226406  241030  257188  266852  268078  265281   myrocks-5635
5019   10563    22409   51342  107286  150102  177104  204018  227579  229317  231358  232494  228827   tokudb-5717
- zlib
4989   10682    22073   50882  106786  150247  177378  202741  221048  230274  229999  230863  231839   myrocks-5635
3104    8163    18296   40498   84292  117789  136762  155641  172568  173673  174538  174994  175127   innodb-5626
4094    8760    17970   40191   85050  120897  141525  161327  179816  180670  180304  180439  182237   innodb-5717
4490    9828    20436   45973   97078  136122  160800  185617  207077  208896  211602  212022  213916   tokudb-5717

Summary:
  • Uncompressed <= 8 threads: InnoDB-5.7 >> MyRocks >> TokuDB, InnoDB-5.6
  • Uncompressed 16 to 40 threads: InnoDB-5.7 >> MyRocks >> TokuDB >> InnoDB-5.6
  • Uncompressed >= 48 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Compressed: MyRocks >> TokuDB >> InnoDB

Read-only with --oltp-range-size=100

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
4557    9527    19771   44629   95037  133472  155356  175533  189690  196465  196471  197426  196431   myrocks-5635
5076   10543    21840   49775  101828  141198  166103  190612  213432  211546  215583  216149  216561   innodb-5626
5384   11440    23741   56913  118847  165338  193619  220664  246599  250321  251309  249837  252654   innodb-5717
4525    9456    19643   43853   91855  128489  148978  168766  185897  185482  189092  190814  187785   tokudb-5717
- lz4/snappy
4486    9364    19322   43591   93011  131252  151314  173058  189554  193018  193373  192635  191932   myrocks-5635
4445    9270    19275   42894   90120  126824  146602  164909  180813  182457  183962  184567  182690   tokudb-5717
- zlib
3959    8205    16871   36958   79526  113109  130868  148129  164466  165194  165251  164002  164486   myrocks-5635
3438    6997    14207   30336   63505   90598  103758  116782  128542  128625  129293  130163  130199   innodb-5626
3553    7291    14880   32345   68779   99275  113500  127408  139032  140490  141352  141673  140846   innodb-5717
4073    8413    17317   38102   80324  113206  132046  149543  163231  165632  166191  169175  169615   tokudb-5717

Summary:
  • Uncompressed <= 8 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Uncompressed 16 to 40 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Uncompressed >= 48 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Compressed: MyRocks, TokuDB >> InnoDB

Read-only with --oltp-range-size=10000

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
 197     396      784    1553    3068    4211    4696    5069    5303    5260    5248    5203    5150   myrocks-5635
 227     456      908    1799    3525    4798    5538    6114    6343    6310    6287    6200    6182   innodb-5626
 381     756     1489    2926    5641    7272    7949    8407    8604    8531    8476    8442    8254   innodb-5717
 287     577     1151    2262    4419    5836    6335    6616    6695    6575    6542    6496    6427   tokudb-5717
- lz4/snappy
 189     378      749    1486    2941    4035    4555    4922    5160    5127    5096    5070    5005   myrocks-5635
 278     559     1112    2187    4276    5670    6149    6436    6558    6427    6405    6343    6278   tokudb-5717
- zlib
 159     318      628    1251    2480    3470    3886    4183    4405    4403    4340    4299    4283   myrocks-5635
 140     282      566    1117    2210    3092    3538    3853    4109    4102    4051    3903    4015   innodb-5626
 187     375      744    1488    2925    3992    4460    4716    4890    4914    4908    4881    4835   innodb-5717
 242     485      971    1920    3770    5070    5564    5836    5997    5889    5841    5791    5744   tokudb-5717

Summary:
  • Uncompressed: InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks
  • Compressed: TokuDB >> InnoDB-5.7 >> MyRocks >> InnoDB-5.6

Point-query

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
6412   13400    28931   72991  152003  206627  245257  282682  317475  320832  320487  319250  325351   myrocks-5635
6831   14592    31265   80304  165013  218684  259701  302272  341855  347161  348867  350069  346924   innodb-5626
6591   13831    29885   77583  159026  215990  258119  300289  336777  347714  351820  353239  353298   innodb-5717
5950   12385    26277   62763  130650  180310  212967  244552  271554  276659  276588  278345  281433   tokudb-5717
- lz4/snappy
6395   13451    28854   72695  151874  205623  245214  283065  317367  314263  319040  324845  323703   myrocks-5635
5828   12137    25693   60984  127016  175532  207269  237836  265395  269802  271339  273249  274790   tokudb-5717
- zlib
5859   12482    26097   62841  134703  183953  219125  253690  284868  288741  285110  291383  292424   myrocks-5635
4658    9672    19931   45142   97018  135486  158160  180528  200518  200171  203003  204169  204478   innodb-5626
4557    9471    19508   44014   94149  132992  155439  177183  197399  198790  200071  200243  200104   innodb-5717
5378   11181    23504   54303  114715  159725  189099  217428  242933  246185  248465  250252  255635   tokudb-5717

Summary:
  • Uncompressed <= 40 threads: InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB
  • Uncompressed >= 48 threads: InnoDB-5.6, InnoDB-5.7 >> MyRocks >> TokuDB
  • Compressed: MyRocks >> TokuDB >> InnoDB

Insert-only

The numbers in the table are the QPS for 1 to 128 threads.


   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- none
5206   16893    31320   41382   48020   52252   54483   54546   54497   54196   54106   53575   53047   myrocks-5635
5743   12052    32062   49692   56744   57288   56026   53698   51452   50519   48907   47300   44191   innodb-5626
5305   15574    29411   53523   80249   91492   93392   89487   86214   84496   84736   85002   84752   innodb-5717
2938    5797     8947   17728   29184   37570   38571   35497   33231   30409   27484   24989   21330   tokudb-5717
- lz4/snappy
5210   16827    31047   40876   47854   52096   54152   54273   54120   53872   53554   53346   52546   myrocks-5635
2925    5740     8963   17824   29350   37647   38781   35571   33359   30580   27645   25080   21443   tokudb-5717
- zlib
5226   16682    31500   40534   47009   51833   53281   53719   53613   53433   53463   52761   52412   myrocks-5635
3256    5874    12234   20365   28593   31562   32943   33475   33195   33641   32953   33919   34103   innodb-5626
3297    6442    12727   22377   34871   40052   42320   43433   42824   43592   46455   46183   45197   innodb-5717
2920    5801     8931   17724   28971   37195   38456   35110   32937   30220   27492   24924   21337   tokudb-5717

Summary:
  • Uncompressed <= 40 threads: InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • Uncompressed >= 48 threads: InnoDB-5.7 >> MyRocks >> InnoDB-5.6 >> TokuDB
  • Compressed <= 8 threads: MyRocks >> InnoDB >> TokuDB
  • Compressed 16 to 40 threads: MyRocks >> InnoDB-5.7 >> TokuDB >> InnoDB-5.6
  • Compressed >= 48 threads: MyRocks >> InnoDB-5.7 >> InnoDB-5.6 >> TokuDB

Wednesday, March 1, 2017

Part 4: sysbench, a large server and small database

This post has more results for the test described in my previous post. Here I share the impact from disabling the binlog and binlog crash safety.

I will repeat the disclaimer from my previous report. This test is done with a small, cached database of 8M rows. InnoDB is a great choice for such a workload, so I am not surprised that MyRocks and TokuDB are not great at it. The value of MyRocks and TokuDB is better compression and less write-amplification, which usually requires a larger database, and the benefit grows when the database doesn't fit in RAM. But early evaluators might use sysbench, so I want to understand how these engines perform in this setup.

tl;dr:
  • MyRocks gets up to ~1.5X more throughput with binlog crash safety disabled
  • MyRocks gets up to ~3X more throughput with the binlog disabled
  • InnoDB gets up to ~1.5X more throughput with the binlog disabled
Another way to explain this is that MyRocks loses more throughput than InnoDB with the binlog and binlog crash safety enabled. We have work in progress to fix that.

Disabling the binlog and binlog crash safety

I ran tests to understand the overhead of the binlog and binlog crash safety for MyRocks and InnoDB. I added skip_log_bin to the my.cnf files to disable the binlog. For MyRocks I also ran tests with the binlog enabled but binlog crash safety disabled by adding rocksdb_enable_2pc=0 to the my.cnf. When binlog crash safety is disabled it is possible for the binlog and RocksDB to not recover to the same point in time.
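In my.cnf terms the two configurations amount to something like this sketch; the option names are the ones given above, and the [mysqld] section placement is ordinary MySQL config layout:

```ini
[mysqld]
# "nobinlog" configuration: disable the binlog entirely
skip_log_bin

# "2pc0" configuration (MyRocks only): keep the binlog but disable
# binlog crash safety. Use this instead of skip_log_bin:
# rocksdb_enable_2pc=0
```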

The binaries tested are:
  • myrocks-5635 - MyRocks from FB MySQL based on MySQL 5.6.35
  • innodb-5626 - InnoDB from upstream MySQL 5.6.26
  • innodb-5710 - InnoDB from upstream MySQL 5.7.10
For the results below I use base to mean the configuration with the binlog and binlog crash safety enabled. But fsync is disabled for the binlog and storage engine. I use 2pc0 for the MyRocks configuration with the binlog enabled but binlog crash safety disabled. I use nobinlog for the configurations with the binlog disabled.

Update-only with secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- myrocks-5635
2895    10363   20118   34017   38720   42138   44736   44646   44169   43633   43104   42690   42442   base
2966    9527    18574   35367   47037   52671   57182   58950   59594   61062   61866   61695   62179   2pc0
3168    7282    25153   47635   72764   85850   95187  100680  104422  108939  112490  116855  121797   nobinlog
- innodb-5626
4029    8373    15522   24563   32448   37061   40251   42522   44219   44631   45123   45762   45808   base
4659    9443    17115   27844   39099   56685   68578   69653   68424   65519   63270   62603   61285   nobinlog
- innodb-5710
4254    8792    17020   30524   48443   60623   67052   68773   69232   70556   72134   72696   74222   base
4817   10133    19273   31253   58881   83443  100658  107117  108247  105198  108413  111960  114375   nobinlog

At medium and high concurrency:
  • MyRocks gets ~1.5X more throughput with binlog crash-safety disabled
  • MyRocks gets almost 3X more throughput with the binlog disabled
  • InnoDB gets ~1.5X more throughput with the binlog disabled
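The ratios above can be checked against the 128-thread column of this table; a quick sketch using awk (the QPS values are copied from the table, nothing new was measured):

```shell
# Speedup ratios at 128 threads for update-only with secondary index
# maintenance, using the QPS values from the table above.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

myrocks_2pc0=$(ratio 62179 42442)       # binlog crash safety off vs base
myrocks_nobinlog=$(ratio 121797 42442)  # binlog off vs base
innodb_nobinlog=$(ratio 114375 74222)   # innodb-5710: binlog off vs base

echo "myrocks 2pc0/base:     ${myrocks_2pc0}x"
echo "myrocks nobinlog/base: ${myrocks_nobinlog}x"
echo "innodb  nobinlog/base: ${innodb_nobinlog}x"
```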

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- myrocks-5635
2901   10696    20983   37078   43756   47549   50627   52753   52318   51690   50853   50003   49074   base
2973    9835    19527   38746   53403   61243   67248   70194   72343   73541   75190   75752   76222   2pc0
3192    7003    25995   50212   79235   99068  112833  116602  121630  128182  132259  137753  142908   nobinlog
- innodb-5626
6384   12625    23847   40501   51505   53024   53587   53523   53316   52483   51632   51276   51117   base
7527   14588    28320   49654   72081   80207   79992   78593   75252   70561   67417   65425   63701   nobinlog
- innodb-5710
5618   11060    21361   38810   58197   69717   74286   75519   75545   76520   77335   78117   79152   base
3293    6992    25292   41675   69919   95015  112157  122645  117212  130212  121092  124406  132257   nobinlog

At medium and high concurrency:
  • MyRocks gets ~1.5X more throughput with binlog crash-safety disabled
  • MyRocks gets ~2.5X more throughput with the binlog disabled
  • InnoDB gets ~1.5X more throughput with the binlog disabled

Insert-only

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
- myrocks-5635
5254    16950   31561   41074   47720   52063   54066   53899   53900   53725   53343   53098   52278   base
5450    16021   30189   47647   63500   69840   75002   76964   78653   80942   82146   82982   83690   2pc0
6111    22188   43235   75194  106476  124108  134373  140854  143762  153497  155575  158598  164615   nobinlog
- innodb-5626
5753    11506   36083   57235   62591   62469   62577   61899   61131   60592   59080   56895   52913   base
6554    13293   28378   75689   89756   86628   83106   79377   76966   71510   66678   61638   52642   nobinlog
- innodb-5710
5291    15657   33358   55385   79656   90812   97735   99944  100714  101967  103374  104934  106194   base
6112    12531   26378   61395  113139  132083  133795  130147  125948  121881  119478  118145  114600   nobinlog

At medium and high concurrency:
  • MyRocks gets ~1.5X more throughput with binlog crash-safety disabled
  • MyRocks gets almost 3X more throughput with the binlog disabled
  • InnoDB gets ~1.25X more throughput with the binlog disabled at medium concurrency. The benefit is ~1.1X at high concurrency.

Tuesday, February 28, 2017

Part 3: sysbench, a large server and small database

This is part 3 of my performance report for sysbench and MyRocks. For this test I use a large server (24 cores, 48 threads, fast NVMe SSD). A small server was used for part 1 and part 2.

This test is done with a small, cached database of 8M rows. InnoDB is a great choice for such a workload, so I am not surprised that MyRocks and TokuDB are not great at it. The value of MyRocks and TokuDB is better compression and less write-amplification, which usually requires a larger database, and the benefit grows when the database doesn't fit in RAM. But early evaluators might use sysbench, so I want to understand how these engines perform in this setup.

tl;dr
  • The common pattern was that InnoDB was faster than MyRocks, and MyRocks was faster than TokuDB.
  • InnoDB in MySQL 5.7.10 is much faster than in MySQL 5.6.26 on tests that are write-heavy or have long range scans. But InnoDB in 5.7 isn't strictly faster than in 5.6 even at high concurrency.
  • MyRocks suffers on tests with longer range scans but did pretty well for point queries.
  • TokuDB was the slowest for all of the workloads except read-write with --oltp-range-size=10000 and read-only with --oltp-range-size in 100 and 10000.
  • Percona fixed a bug in TokuDB that I reported. Thank you.
It can be interesting to compare this result with the result from the smaller servers I use at home. The smaller server has a slower CPU (core i3) and slower SSD. Unfortunately I need to repeat tests on my work servers using the latest version of sysbench.

Details

I used a version of sysbench that is a few months old with an older version of my scripts. For my home servers I upgraded to modern sysbench and I will soon do that at work. I will also upgrade from MySQL 5.7.10 to a more recent version, and that should make a famous MySQL support guru happy.

The test server has 24 cores, 48 threads, 2 sockets and 256gb of RAM. The storage is from multiple NVMe SSDs. I tested 4 engines -- myrocks-5635 is MyRocks from FB MySQL merged to upstream MySQL 5.6.35, innodb-5626 is InnoDB from upstream MySQL 5.6.26, innodb-5710 is InnoDB from upstream MySQL 5.7.10 and tokudb-5717 is TokuDB from Percona Server 5.7.17-11. The my.cnf files are here for MyRocks, InnoDB-5.7 and InnoDB-5.6. In all cases the binlog is enabled and fsync is disabled for both the storage engine redo log and the binlog.

Tests are run for 1 to 128 concurrent clients. When describing the results I divide that into low concurrency (<= 8 clients), medium concurrency (16 to 40 clients) and high concurrency (>= 48 clients). The test was run at each concurrency level for 180 seconds for read-heavy tests and 300 seconds for write-heavy tests. The sysbench process runs on the same host as mysqld.

Tests are run in this order:
  • prepare - this isn't a test, it creates and loads the tables. There were 8 tables with 1M rows per table. The sysbench table has one secondary index.
  • update-only and secondary index maintenance is required
  • update-only and secondary index maintenance is not required
  • read-write with --oltp-range-size=100 and then optionally with --oltp-range-size=10000
  • read-only with --oltp-range-size in 10, 100, 1000 and 10000. I don't report results for --oltp-range-size=1000
  • point queries
  • insert-only
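For readers who want to reproduce the sequence, the invocations look roughly like this with sysbench 0.5. The lua script paths and connection flags are my assumptions since the post doesn't show its exact command lines, and the echo prefixes make this a dry run (remove them to execute):

```shell
# Rough shape of the sysbench 0.5 runs listed above (dry run via echo).
SB_LUA=tests/db   # sysbench 0.5 lua script directory; path is an assumption
COMMON="--mysql-host=127.0.0.1 --mysql-user=root --oltp-tables-count=8 --oltp-table-size=1000000"

# prepare: create and load 8 tables with 1M rows each
echo sysbench --test=$SB_LUA/oltp.lua $COMMON prepare

# read-write, 100-row range scans, 8 clients, 180s (read-heavy duration)
echo sysbench --test=$SB_LUA/oltp.lua $COMMON \
    --oltp-range-size=100 --num-threads=8 --max-time=180 --max-requests=0 run

# update-only with secondary index maintenance, 300s (write-heavy duration)
echo sysbench --test=$SB_LUA/update_index.lua $COMMON \
    --num-threads=8 --max-time=300 --max-requests=0 run
```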
Results

For the performance summaries below I use "X >> Y >> Z" to mean X is faster than Y and Y is faster than Z. I did not include graphs, for two reasons: 1) that is a lot more work and 2) I want some readers to take the time and think about the numbers. Unfortunately the formatting is not great for a wide table.

Update-only with secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
2895    10363   20118   34017   38720   42138   44736   44646   44169   43633   43104   42690   42442   myrocks-5635
4029    8373    15522   24563   32448   37061   40251   42522   44219   44631   45123   45762   45808   innodb-5626
4254    8792    17020   30524   48443   60623   67052   68773   69232   70556   72134   72696   74222   innodb-5710
2145    4026     7344   13575   21222   27807   30039   29541   29106   27964   26493   25435   23210   tokudb-5717

Throughput:
  • 1 to 8 clients - MyRocks >> InnoDB-5.7 >> InnoDB-5.6 >> TokuDB. I don't know why MyRocks was lousy for 1 client but OK for 2+.
  • 16 to 40 clients - InnoDB-5.7 >> MyRocks >> InnoDB-5.6 >> TokuDB
  • 48+ clients - InnoDB-5.7 >> MyRocks, InnoDB-5.6 >> TokuDB

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
2901   10696    20983   37078   43756   47549   50627   52753   52318   51690   50853   50003   49074   myrocks-5635
6384   12625    23847   40501   51505   53024   53587   53523   53316   52483   51632   51276   51117   innodb-5626
5618   11060    21361   38810   58197   69717   74286   75519   75545   76520   77335   78117   79152   innodb-5710
2260    4470     8206   15322   26552   33024   37739   41461   44358   44415   41899   39626   35341   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB.  I don't know why MyRocks was lousy for 1 client but OK for 2+.
  • 16 to 40 clients - InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB.
  • 48+ clients - InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB

Read-write with --oltp-range-size=100

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
4798    9955    20426   46361   98537  132010  153726  165477  167280  166549  168513  166807  166722   myrocks-5635
6391   12591    27534   61157  112516  146970  172971  193107  196981  191683  190707  191579  190615   innodb-5626
6143   13051    27216   57447  108786  145127  169957  192548  204655  205408  205278  205629  206721   innodb-5710
5157    9009    17931   42903   76633   95322  108896  114310  114650  113619  112220  108055  103181   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.7, InnoDB-5.6 >> MyRocks >> TokuDB
  • 16 to 40 clients - InnoDB-5.7, InnoDB-5.6 >> MyRocks >> TokuDB
  • 48+ clients - InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB

Read-write with --oltp-range-size=10000

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
 210     412      774    2091    4077    5288    5568    6409    6885    6235    6675    6543    5889   myrocks-5635
 306     612     1216    2403    4713    6439    7398    8098    8416    8397    8393    8364    8247   innodb-5626
 448     888     1758    3462    6699    8770    9724   10302   10482   10448   10330   10382   10360   innodb-5710
 360     722     1409    2713    5218    6932    7529    7851    7973    7954    7856    7822    7675   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks
  • 16 to 40 clients - InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks
  • 48+ clients - InnoDB-5.7 >> InnoDB-5.6 >> TokuDB >> MyRocks
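To quantify the gaps at high concurrency, here is a quick sanity check I added (numbers copied from the 128-thread column of the table above; the script is mine):

```python
# Relative throughput at 128 threads for read-write with
# --oltp-range-size=10000, normalized to MyRocks.
qps_128 = {
    "myrocks-5635": 5889,
    "innodb-5626": 8247,
    "innodb-5710": 10360,
    "tokudb-5717": 7675,
}
base = qps_128["myrocks-5635"]
for engine, qps in sorted(qps_128.items(), key=lambda kv: -kv[1]):
    print(f"{engine}: {qps / base:.2f}x")
```

InnoDB-5.7 gets about 1.76x the MyRocks throughput here, which is why long range scans are the worst case for MyRocks in these results.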

Read-only with --oltp-range-size=10

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
5659   12266    25778   61366  128290  175337  204504  230534  246007  262875  269596  272178  274313   myrocks-5635
6491   13838    30028   72355  144523  194622  230314  265632  298481  300703  302924  303685  305162   innodb-5626
6052   12700    27143   63307  129096  177797  209601  243368  273429  272255  272563  275147  276691   innodb-5710
5391   11261    23935   55313  113932  164953  195290  223643  249579  247840  246926  247897  247045   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB
  • 16 to 40 clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB
  • 48+ clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB

Read-only with --oltp-range-size=100

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
4031    8429    17417   38579   82542  116461  134208  150897  165211  166491  164694  166504  166625   myrocks-5635
5375   11273    23474   54099  110142  151311  177268  202868  225283  228292  229257  229828  231068   innodb-5626
5383   11292    23629   54050  110015  152462  176974  201112  223618  223737  224161  226272  226484   innodb-5710
4804   10062    20795   47474   98569  138827  159757  180406  198294  198680  198103  198374  196718   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.6, InnoDB-5.7 >> TokuDB >> MyRocks
  • 16 to 40 clients - InnoDB-5.6, InnoDB-5.7 >> TokuDB >> MyRocks
  • 48+ clients - InnoDB-5.6, InnoDB-5.7 >> TokuDB >> MyRocks

Read-only with --oltp-range-size=10000

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
 161     325      634    1238    2438    3385    3807    4093    4300    4261    4239    4214    4135   myrocks-5635
 242     484      961    1900    3729    5099    5857    6439    6748    6663    6562    6589    6500   innodb-5626
 357     708     1407    2788    5354    6984    7689    8095    8246    8222    8145    8138    8071   innodb-5710
 301     606     1203    2365    4583    6057    6582    6885    6992    6908    6862    6802    6772   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks
  • 16 to 40 clients - InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks
  • 48+ clients - InnoDB-5.7 >> TokuDB >> InnoDB-5.6 >> MyRocks

Point-query

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
6784   14328    31277   80458  166882  222750  265885  306227  341095  354327  349912  345317  349403   myrocks-5635
7236   15236    33569   87991  177791  236463  278423  319764  362604  371093  374380  371092  375698   innodb-5626
6802   14177    30860   78950  163909  222998  265156  307198  345832  354835  362774  366825  367057   innodb-5710
6249   12951    27608   67805  142064  194320  230720  264171  294814  306298  308177  309542  310391   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.6 >> MyRocks >> InnoDB-5.7 >> TokuDB
  • 16 to 40 clients - InnoDB-5.6 >> MyRocks, InnoDB-5.7 >> TokuDB
  • 48+ clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB

Insert-only

The numbers in the table are the QPS for 1 to 128 threads.

   1       2        4       8      16      24      32      40      48      64      80      96     128   threads
5254    16950   31561   41074   47720   52063   54066   53899   53900   53725   53343   53098   52278   myrocks-5635
5753    11506   36083   57235   62591   62469   62577   61899   61131   60592   59080   56895   52913   innodb-5626
5291    15657   33358   55385   79656   90812   97735   99944  100714  101967  103374  104934  106194   innodb-5710
2975     5754    9033   17695   29039   37341   38410   35859   33792   30053   27362   25028   21275   tokudb-5717

Throughput:
  • 1 to 8 clients - InnoDB-5.6 >> InnoDB-5.7 >> MyRocks >> TokuDB
  • 16 to 40 clients - InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB
  • 48+ clients - InnoDB-5.7 >> InnoDB-5.6 >> MyRocks >> TokuDB

Friday, February 24, 2017

Part 2 - sysbench, MyRocks, InnoDB and a small server

This is part 2 of the performance report for sysbench, MyRocks, InnoDB and a small server. The first part covered an IO-bound workload where the database was larger than RAM. This part covers a cached workload where the database fits in the MyRocks block cache and InnoDB buffer pool.

tl;dr
  • MyRocks is faster than InnoDB in the write-heavy tests.
  • InnoDB in MySQL 5.6 is faster than MyRocks in the read-heavy tests. I can't reproduce this on the newer/faster CPUs I use at work; for this test I used an Intel NUC with a 5th generation Core i3 CPU. I am still trying to figure this out.
  • InnoDB in MySQL 5.6 is faster than in 5.7 for most write-heavy tests.
  • InnoDB in MySQL 5.7 is faster than in 5.6 for most read-heavy tests. I am curious why this depends on read-heavy vs write-heavy.
Details

The previous blog post has all of the details and shares results for the IO-bound test that used 4 tables with 40M rows per table. This test used 4 tables with 1M rows per table. Tests are run for 3 storage engines - MyRocks from FB MySQL merged to upstream MySQL 5.6.35 (myrocks-5635), InnoDB from upstream MySQL 5.6.35 (innodb-5635) and InnoDB from upstream MySQL 5.7.10 (innodb-5710). The sequence of tests is the same as described in the previous blog post.

A sample command line for the test is:
bash all_small.sh 4 1000000 600 600 300 innodb 1 0 \
    ~/b/orig5710/bin/mysql none ~/b/sysbench/share/sysbench


Update-only with secondary index maintenance

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
6795    10738   13212   myrocks-5635
3907     5912    7323   innodb-5635
3286     4820    6012   innodb-5710
- optimized my.cnf
6787    10735   13103   myrocks-5635
3616     6021    7393   innodb-5635
3122     4733    6131   innodb-5710

Summary:
  • MyRocks is faster than InnoDB because secondary index maintenance is write-only (read-free). On the IO-bound test that avoided stalls from page reads. Here it saves on CPU.
  • The optimized my.cnf doesn't change performance
  • InnoDB in MySQL 5.6 is faster than in 5.7
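The read-free advantage is large here. A quick check I added, dividing the default-my.cnf MyRocks row by the InnoDB-5.6 row from the table above:

```python
# Speedup of MyRocks over InnoDB-5.6 (default my.cnf rows above)
# for update-only with secondary index maintenance.
myrocks = [6795, 10738, 13212]
innodb_5635 = [3907, 5912, 7323]
for threads, m, i in zip([1, 2, 4], myrocks, innodb_5635):
    print(f"{threads} thread(s): {m / i:.2f}x")
```

MyRocks delivers roughly 1.7x to 1.8x the InnoDB-5.6 throughput at every concurrency level tested.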

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
7373    11510   14762   myrocks-5635
8050    10783   13185   innodb-5635
6528     8971   11338   innodb-5710
- optimized my.cnf
7335    11374   14609   myrocks-5635
8809    12240   15179   innodb-5635
6644     8935   11580   innodb-5710

Summary:
  • InnoDB in MySQL 5.6 is slightly faster than MyRocks. The update statement in this test doesn't require secondary index maintenance so MyRocks loses that benefit.
  • The optimized my.cnf helps InnoDB in MySQL 5.6
  • InnoDB in MySQL 5.6 is much faster than in 5.7

Delete

The numbers in the table are the QPS for 1 thread.

1       concurrency
- default my.cnf
8303    myrocks-5635
7848    innodb-5635
6461    innodb-5710
- optimized my.cnf
8265    myrocks-5635
7819    innodb-5635
6215    innodb-5710

Summary:
  • MyRocks is faster than InnoDB
  • The optimized my.cnf doesn't change performance
  • InnoDB in MySQL 5.6 is faster than in 5.7

Write-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
12547   19685   25236   myrocks-5635
11209   15429   18387   innodb-5635
 9701   13035   15308   innodb-5710
- optimized my.cnf
12197   19241   24533   myrocks-5635
13076   17780   21097   innodb-5635
10401   14496   17402   innodb-5710

Summary:
  • MyRocks is faster than InnoDB at concurrency >= 2
  • The optimized my.cnf helps InnoDB
  • InnoDB in MySQL 5.6 is faster than in 5.7

Read-write with --range-size=100

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
5404    10126   13214   myrocks-5635
8507    12658   16867   innodb-5635
8983    13311   16806   innodb-5710
- optimized my.cnf
7426    12116   16076   myrocks-5635
10084   15236   20895   innodb-5635
10480   15666   20830   innodb-5710

Summary:
  • InnoDB is faster than MyRocks because there is more CPU overhead in MyRocks for range queries
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB has similar performance for MySQL 5.6 and 5.7

Read-write with --range-size=10000

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
153     286     311     myrocks-5635
273     484     652     innodb-5635
434     733     863     innodb-5710
- optimized my.cnf
185     330     531     myrocks-5635
278     523     684     innodb-5635
449     784     902     innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks because of the CPU overhead for range queries
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB in MySQL 5.7 is much faster than in 5.6. I think something was done to make range queries more efficient in 5.7.

Read-only with --range-size=10

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
11260   19129   27348   myrocks-5635
13846   25056   37479   innodb-5635
14533   25412   37048   innodb-5710
- optimized my.cnf
13778   22240   31544   myrocks-5635
15348   27860   42859   innodb-5635
15320   27187   42294   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. On my work servers with newer/faster CPUs I don't reproduce this and am still trying to understand the cause.
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB has similar performance for MySQL 5.6 and 5.7

Read-only with --range-size=100

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
4393     8245   10191   myrocks-5635
8943    16784   25504   innodb-5635
11288   19771   28019   innodb-5710
- optimized my.cnf
8134    14686   20881   myrocks-5635
9847    18361   26953   innodb-5635
11778   20905   30065   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • InnoDB in MySQL 5.7 is faster than 5.6. I think range queries were made more efficient in 5.7

Read-only with --range-size=10000

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
106     209     242     myrocks-5635
214     411     536     innodb-5635
357     624     711     innodb-5710
- optimized my.cnf
199     380     486     myrocks-5635
227     424     556     innodb-5635
374     648     732     innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • InnoDB in MySQL 5.7 is faster than 5.6. I think range queries were made more efficient in 5.7

Point-select

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
16482   29111   36510   myrocks-5635
15840   30280   37841   innodb-5635
16347   29451   47411   innodb-5710
- optimized my.cnf
16411   28628   35344   myrocks-5635
18481   34911   43390   innodb-5635
19169   31806   53018   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • It isn't clear whether InnoDB is faster in MySQL 5.6 or 5.7

Insert-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
8003    11722   13323   myrocks-5635
9548    11549   13323   innodb-5635
8065     9541    9816   innodb-5710
- optimized my.cnf
8271    12312   13804   myrocks-5635
9712    12732   14917   innodb-5635
8357    10716   11985   innodb-5710

Summary:
  • InnoDB in MySQL 5.6 is faster than MyRocks
  • The optimized my.cnf helps performance
  • InnoDB in MySQL 5.6 is faster than in 5.7