ftrack is built around different services, where a few of them directly impact performance if they are overutilized. To mitigate overutilization issues those services can be scaled horizontally which means more identical services are created to spread the load more evenly and allow using more CPU in total.
The database server can easily become a bottleneck if not configured correctly to utilize the available resources. The database will grow in size over time and must be reconfigured for optimal performance based on its size. In addition to memory, the database server is also very dependent on CPU resources and must be constantly monitored to ensure it never gets close to 100% usage. To mitigate CPU based database bottlenecks, the database should be scaled vertically if possible. Scaling the database vertically means adding more CPU resources. If CPU usage is low, but database is still a bottleneck it may be because of slow queries or that the database does not have enough RAM to keep large portions of the database in memory.
The database server can also be scaled out horizontally using replication. Currently we do not provide support or documentation for how to setup and manage database replication. Using master-slave replications not recommended (except for read-only backups), but master-master replication has been used successfully.
The amount of tracked data in ftrack will grow over time and it is important to remove unused data on a regular basis to keep ftrack fast and avoid having to increase server resources. Most efficient way to cleanup the database is to remove projects that are no longer needed, or use the API to remove parts of those projects. There are also tools available on request that can be used to cleanup historical data such as project change history which may not be important to keep after a project is completed.
Backing up the database is very important, and how often it is backed up should be determined by your recovery time objective (RTO). Backing up the database can however result in performance degradations for the service if the database is large. Alternative backup methods can be used to mitigate this. Examples are:
- Use a physical backup method instead of mysqldump such as mariabackup (https://mariadb.com/kb/en/backup-and-restore-overview/)
- Setup a read slave/replica that is used for backups with mysqldump.
Restoring from a backup can take a very long time, especially when using mysqldump as backup method. Using a physical backup method can reduce the restore time significantly and therefore also the time to recover from an incident or accidental delete operation.
The number of replicas for each internal service allows scaling out of the ftrack service, or in if setting up a staging server. The most important is uwsgi, which will handle all http requests and may need to be scaled up if the current replicas use a lot of CPU. For a staging server, you can set all replicas to 1.
Helm chart 1.3.0 and after:
Helm chart before 1.3.0:
Keep in mind that each process will consume memory. If they do not fit in your cluster, they will not start.
Set the number of replicas to 1.
Resource usage can be limited even more by setting:
This allow running ftrack on very limited resources and can be used for testing. Services will however compete for the same resources and risk running out of memory.
When running MariaDB in production, we recommend that you adjust its configuration to suit your needs and to get the best possible performance out of ftrack.
The most important configuration option is “innodb-buffer-pool-size”. This should be adjusted as your database grows. We recommend that you set it to roughly the same size as your database or greater. You can run the following query to see the size of the ftrack database:
SELECT table_schema "name", ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "size in MB" FROM information_schema.tables GROUP BY table_schema;
However, be cautious when setting to a high value. If there is not enough RAM available, MariaDB will not start. With that in mind, you should not set innodb-buffer-pool-size to more than 80% of the available RAM.
Here is the recommended configuration for MariaDB, where innodb-buffer-pool-size should be adjusted based on the server:
# CLIENT #
port = 3306
socket = /var/lib/mysql/mysql.sock
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/mysql.pid
# MyISAM #
key-buffer-size = 32M
myisam-recover-options = FORCE,BACKUP
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
log-bin = /var/lib/mysql/mysql-bin
expire-logs-days = 14
sync-binlog = 1
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 1000
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 2048M
innodb-stats-persistent-sample-pages = 50
# LOGGING #
log-error = /var/lib/mysql/mysql-error.log
This section describe some common situations that can happen for on-prem installation and how to investigate them.
Loading data in ftrack is sometimes fast and sometimes slow
If loading times are different for the same data from time to time, or during certain times of the day it is likely that some part of the service is overloaded with requests and does not have enough resources. A bottleneck like this is most likely due to the database or the backend service reading from the database. Investigate this further and see if any of them are consuming close to 100% CPU and if that is the case scale those resources if possible. While adding more resources will help, you must also ensure custom tools are as efficient as possible and do not make unnecessary requests to the server.
Database server is reaching 100% CPU usage
A database reaching 100% CPU means it is able to use all CPU resources to respond to queries and is not restricted by the system and disk speeds. If load on ftrack is high, you need to scale the database server to add more CPU resources. If very little queries are executed but CPU is still very high, it could be because some queries are less optimal. Short term more CPU will help, but longer term we would like to know about those queries and investigate how we can improve them.
Database server is slow but using very little CPU
If the database server is responding slowly to queries and has very little CPU usage, ensure it has been configured properly to use enough RAM to keep the database in memory. If the database does not fit in memory, it needs to read from disk often which is much slower. This can also be seen by monitoring disk usage, which if database has been configured properly could indicate less optimal queries.
This section lite a few simple tools that can be used to stats about the service.
List running database queries
mysql -uroot -p -B -e 'SELECT * FROM information_schema.processlist where command != "Sleep" order by Time DESC limit 10;'
This can be useful if some of them are taking a lot of time to finish.
Realtime database stats
https://github.com/innotop/innotop can be used to see realtime stats for the database. When using this tool we recommend updating the refresh rate to every second and then looking at the queries (Q). This shows queries per second right now and shows a list of some of the queries that are taking longer to process.
Realtime server requests
ngxtop can be used to look at the current requests sent to the ftrack server in realtime to get a rough understanding of the load on the server.
ngxtop requires a file to read from and nginx logs from the cluster can be streamed to a file with:
kubectl -n default logs -f deployment/nginx --tail=0 /tmp/nginx.log
and then run ngxtop with:
ngxtop -l /tmp/nginx.log