Galaxy production environment

The system sets up and launches a virtual machine (VM) configured with the operating system (CentOS 7 or Ubuntu 14.04/16.04) and the auxiliary applications needed to support a Galaxy production environment, such as PostgreSQL, NGINX, uWSGI and Proftpd, and then deploys the Galaxy platform itself. A common set of reference data is available through a CernVM-FS volume.

Once deployed, each Galaxy instance can be further customized with tools and reference data.

The Galaxy production environment is deployed according to the official Galaxy documentation: https://docs.galaxyproject.org/en/latest/admin/production.html.

OS support

CentOS 7 is our default distribution, given its adherence to standards and the length of its official support (CentOS 7 receives updates until June 30, 2024: https://wiki.centos.org/FAQ/General#head-fe8a0be91ee3e7dea812e8694491e1dde5b75e6d). CentOS 7 and Ubuntu (14.04 and 16.04) are both supported.

CentOS 7 and Ubuntu Xenial 16.04 use systemd as init system, while Ubuntu Trusty 14.04 still uses Upstart.

Warning

SELinux is disabled by default on CentOS.

PostgreSQL

PostgreSQL packages from the official PostgreSQL repository are installed:

Distribution   Repository
CentOS         https://wiki.postgresql.org/wiki/YUM_Installation
Ubuntu         https://wiki.postgresql.org/wiki/Apt

Current stable PostgreSQL version is installed: PostgreSQL 9.6

On CentOS 7 the default pgdata directory is /var/lib/pgsql/9.6/data. The pg_hba.conf configuration is modified to allow password authentication. On CentOS, PostgreSQL also has to be excluded from the CentOS base and updates repositories, otherwise dependencies might resolve to the postgresql packages supplied by the base repository.

On Ubuntu the default pgdata directory is /var/lib/postgresql/9.6/main, while the configuration files are stored in /etc/postgresql/9.6/main. There is no need to modify the HBA configuration file since it allows password authentication by default.

PostgreSQL start/stop/status is entrusted to systemd on CentOS 7 and Ubuntu Xenial and to Upstart on Ubuntu Trusty.

Distribution    Command
CentOS 7        sudo systemctl start/stop/status postgresql-9.6
Ubuntu Xenial   sudo systemctl start/stop/status postgresql
Ubuntu Trusty   sudo service postgresql start/stop/status

Galaxy database configuration

Two different databases are configured, one to track Galaxy data and one to track tool shed install data, which allows bootstrapping fresh Galaxy instances with pretested installs. The database passwords are randomly generated and can be retrieved from the galaxy.ini file.

The Galaxy database is named galaxy and is configured in the galaxy.ini file:

database_connection = postgresql://galaxy:gtLxNnH7DpISmI5FXeeI@localhost:5432/galaxy

The shed install tool database is named galaxy_tools and is configured as:

install_database_connection = postgresql://galaxy:gtLxNnH7DpISmI5FXeeI@localhost:5432/galaxy_tools
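
As a minimal sketch of how the two connection strings above are structured, the following Python snippet splits such a URL into user, password, host, port and database name using only the standard library. The helper name parse_connection and the CHANGE_ME password are placeholders introduced for illustration; they are not part of the Galaxy configuration.

```python
from urllib.parse import urlsplit


def parse_connection(url):
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Placeholder password: use the value found in your own galaxy.ini.
info = parse_connection("postgresql://galaxy:CHANGE_ME@localhost:5432/galaxy_tools")
print(info["user"], info["host"], info["port"], info["database"])
```

This can be handy when the credentials have to be passed to another tool, such as a backup script, without copying them by hand.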

Docker

In a Docker container PostgreSQL cannot be managed through systemd/Upstart, since there is no init system in the CentOS and Ubuntu Docker images. Therefore, the system is automatically configured to run PostgreSQL using supervisord.

NGINX

To improve Galaxy performance, NGINX is used as web server. The official Galaxy NGINX packages are used by default (built-in upload module support).

Distribution   Repository
CentOS         https://depot.galaxyproject.org/yum/
Ubuntu         ppa:galaxyproject/nginx

Moreover, on Ubuntu, we need to prevent NGINX from being replaced by the default apt packages. For this purpose the pin priority of the NGINX PPA packages is raised by editing /etc/apt/preferences.d/galaxyproject-nginx-pin-700 (more on apt pinning at: https://wiki.debian.org/AptPreferences).

NGINX is configured following the official Galaxy wiki: https://galaxyproject.org/admin/config/nginx-proxy/.

NGINX is usually started from the command line, using /usr/sbin/nginx:

$ sudo nginx

NGINX options

NGINX options are listed here: https://www.nginx.com/resources/wiki/start/topics/tutorials/commandline/

Option          Description
-?, -h          Print help.
-v              Print version.
-V              Print NGINX version, compiler version and configure parameters.
-t              Don't run, just test the configuration file. NGINX checks the configuration for correct syntax and then tries to open the files referred to in the configuration.
-q              Suppress non-error messages during configuration testing.
-s signal       Send signal to a master process: stop, quit, reopen, reload. (version >= 0.7.53)
-p prefix       Set prefix path (default: /usr/local/nginx/). (version >= 0.7.53)
-c filename     Specify which configuration file NGINX should use instead of the default.
-g directives   Set global directives. (version >= 0.7.4)

NGINX is started by invoking the binary and is then controlled through the -s command line option:

Action    Command
Start     sudo nginx
Stop      sudo nginx -s stop
Restart   First stop NGINX, then start it: sudo nginx -s stop; sudo nginx

Finally, to start/stop/status NGINX through the init system:

Distribution    Command
CentOS 7        sudo systemctl start/stop/status nginx
Ubuntu Xenial   sudo systemctl start/stop/status nginx
Ubuntu Trusty   sudo service nginx start/stop/status

NGINX troubleshooting

Running NGINX on CentOS through systemd can lead to the following error in /var/log/nginx/error.log, which can prevent the Galaxy web page from loading:

2017/08/24 08:22:32 [crit] 3320#0: *7 connect() to 127.0.0.1:4001 failed (13: Permission denied) while connecting to upstream, client: 192.167.91.214, server: localhost, request: "GET /galaxy HTTP/1.1", upstream: "uwsgi://127.0.0.1:4001", host: "90.147.102.159"

This is related to the SELinux policy on CentOS.

Warning

You should avoid modifying the SELinux policy, since you can still start NGINX from the command line.

The underlying problem is that SELinux denies access to the socket. The NGINX log reports only a generic permission error; the relevant messages are in the SELinux audit log. To solve this issue, you can run the following commands as superuser.

# show the new rules to be generated
grep nginx /var/log/audit/audit.log | audit2allow

# show the full rules to be applied
grep nginx /var/log/audit/audit.log | audit2allow -m nginx

# generate the rules to be applied
grep nginx /var/log/audit/audit.log | audit2allow -M nginx

# apply the rules
semodule -i nginx.pp

Then restart NGINX.

You may need to generate the rules multiple times (likely four times to fix all policies), trying to access the site after each pass, since the first SELinux error might not be the only one generated.
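
To check after each pass whether NGINX is still being denied access, the error log can be scanned for the message shown above. The following is a minimal Python sketch; the function name denied_upstreams is hypothetical, and the second sample line is included only to show a non-matching entry.

```python
import re

# Matches the "(13: Permission denied) while connecting to upstream" entries.
DENIED = re.compile(
    r"connect\(\) to (?P<upstream>\S+) failed \(13: Permission denied\) "
    r"while connecting to upstream"
)


def denied_upstreams(lines):
    """Return the upstream addresses that NGINX was not allowed to reach."""
    found = []
    for line in lines:
        match = DENIED.search(line)
        if match:
            found.append(match.group("upstream"))
    return found


sample = [
    "2017/08/24 08:22:32 [crit] 3320#0: *7 connect() to 127.0.0.1:4001 failed "
    "(13: Permission denied) while connecting to upstream, client: 192.167.91.214",
    "2017/08/24 08:22:40 [notice] 3320#0: signal process started",
]
print(denied_upstreams(sample))
```

On a real server the same function can be fed an open file handle for /var/log/nginx/error.log, since it accepts any iterable of lines.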

uWSGI

uWSGI (https://uwsgi-docs.readthedocs.io/en/latest) is used as interface between the web server (i.e. NGINX) and the web application (i.e. Galaxy). Using uWSGI for production servers is recommended by the Galaxy team: https://galaxyproject.org/admin/config/performance/scaling/

uWSGI configuration is embedded in the galaxy.ini file ($HOME/galaxy/config/galaxy.ini), together with the configuration of 4 job handlers. By default the number of processes (i.e. uWSGI workers) is set to number_of_virtual_cpus - 1. This configuration should be fine for most uses. Nevertheless, there is no golden rule for the number of workers: it is up to the end user to tune it according to their needs. The same goes for the number of job handlers (4 by default).
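
The default rule above can be written as a short Python sketch. The lower bound of one worker on single-CPU machines is an assumption made here for illustration; the source only states the number_of_virtual_cpus - 1 rule.

```python
import os


def default_workers(virtual_cpus=None):
    """Return number_of_virtual_cpus - 1, never going below one worker."""
    if virtual_cpus is None:
        virtual_cpus = os.cpu_count() or 1
    # Assumption: keep at least one worker so the server can still start.
    return max(1, virtual_cpus - 1)


print(default_workers(2))
```

With 2 virtual CPUs the rule yields a single worker, which matches the processes = 1 value in the configuration reported in this section.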

The uWSGI socket and stats server listen, by default, on 127.0.0.1:4001 and 127.0.0.1:9191, respectively. More on the uWSGI stats server here: http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html?highlight=stats%20server.

uWSGI Galaxy configuration:
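
The stats server returns a JSON document to any client that connects to its TCP socket and then closes the connection. A minimal Python sketch for reading it follows; the function names are hypothetical, and the "workers", "id" and "status" keys are assumed from the uWSGI stats output and read defensively in case they are absent.

```python
import json
import socket


def read_uwsgi_stats(host="127.0.0.1", port=9191, timeout=5.0):
    """Connect to the stats socket and return the decoded JSON document."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after the dump
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def worker_summary(stats):
    """Map each worker id to its status, tolerating missing keys."""
    return {w.get("id"): w.get("status") for w in stats.get("workers", [])}
```

On the Galaxy server itself, worker_summary(read_uwsgi_stats()) would give a quick overview of the workers; since the stats server listens on 127.0.0.1, it is not reachable from other hosts.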

[uwsgi]
master = True
processes = 1
socket = 127.0.0.1:4001
stats = 127.0.0.1:9191
pythonpath = /home/galaxy/galaxy/lib
pythonhome = /home/galaxy/galaxy/.venv
threads = 4
logto = /var/log/galaxy/uwsgi.log

# Job Handler(s)

[server:handler0]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler1]
use = egg:Paste#http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler2]
use = egg:Paste#http
port = 8092
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler3]
use = egg:Paste#http
port = 8093
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5
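
To verify which handlers are defined and on which ports, the INI file can be read with Python's configparser. This is only a sketch: the SAMPLE text is a shortened stand-in for galaxy.ini and handler_ports is a hypothetical helper name.

```python
import configparser

# Shortened stand-in for galaxy.ini, limited to the keys used below.
SAMPLE = """
[uwsgi]
socket = 127.0.0.1:4001
processes = 1

[server:handler0]
host = 127.0.0.1
port = 8090

[server:handler1]
host = 127.0.0.1
port = 8091
"""


def handler_ports(ini_text):
    """Return {handler_name: port} for every [server:handlerN] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    ports = {}
    for section in parser.sections():
        if section.startswith("server:handler"):
            ports[section.split(":", 1)[1]] = parser.getint(section, "port")
    return ports


print(handler_ports(SAMPLE))
```

When handlers are added or removed, a check like this helps to confirm that each one has its own port.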

Proftpd

To allow users to upload large files (> 2GB) through FTP, Proftpd is installed and configured on each Galaxy server, according to: https://galaxyproject.org/admin/config/upload-via-ftp/

The Proftpd configuration file is located at /etc/proftpd.conf on CentOS and /etc/proftpd/proftpd.conf on Ubuntu.

To let the FTP server read e-mail addresses and passwords from the Galaxy database, a separate database user is created for it, which has permission to SELECT from the galaxy_user table and nothing else.

Proftpd listens on port 21. The FTP protocol is not encrypted by default, thus usernames and passwords are sent to the server in clear text.

How to use FTP through FileZilla

You need to disable Passive (PASV) mode in FileZilla, since we are not going to open all passive ports.

  1. Open FileZilla.
  2. Click on Edit | Settings.
  3. Open Connection menu on the left. Click on FTP menu.
  4. Mark the Active radio button.
  5. Click OK.

How to use FTP through command line

To install the FTP command line client, type sudo yum install ftp on CentOS or sudo apt-get install ftp on Ubuntu.

To establish a connection with the Galaxy Proftpd server, you need your Galaxy username and password, in addition to the IP address of the server you are connecting to (e.g. 90.147.102.82). To open a connection, type the following command in a terminal, replacing the IP address with your server IP address:

$ ftp 90.147.102.82
Connected to 90.147.102.82.
220 ProFTPD 1.3.5e Server (galaxy ftp server) [::ffff:90.147.102.82]
Name (90.147.102.82:marco):

Then login with your Galaxy credentials, typing your Galaxy e-mail address and password:

$ ftp 90.147.102.82
Connected to 90.147.102.82.
220 ProFTPD 1.3.5e Server (galaxy ftp server) [::ffff:90.147.102.82]
Name (90.147.102.82:marco): ma.tangaro@gmail.com
331 Password required for ma.tangaro@gmail.com
Password:

To upload a file to your Galaxy remote directory:

ftp> put Sc_IP.fastq
local: Sc_IP.fastq remote: Sc_IP.fastq
229 Entering Extended Passive Mode (|||30023|)
150 Opening BINARY mode data connection for Sc_IP.fastq
8% |******                                                                           | 12544 KiB   23.84 KiB/s  1:31:23 ETA

Then you will find it on Galaxy.

Here’s a list of the basic commands that you can use with the FTP client.

Command   Description
ls        List the contents of the current directory on the remote machine.
cd        Change directory on the remote machine.
pwd       Print the pathname of the current directory on the remote machine.
delete    Delete (remove) a file in the current remote directory (same as rm in UNIX).
mkdir     Make a new directory within the current remote directory.
rmdir     Remove (delete) a directory in the current remote directory.
get       Copy one file from the remote machine to the local machine.
          get ABC DEF copies file ABC in the current remote directory to (or on top of) a file named DEF in your current local directory.
          get ABC copies file ABC in the current remote directory to (or on top of) a file with the same name, ABC, in your current local directory.
mget      Copy multiple files from the remote machine to the local machine; you are prompted for a y/n answer before transferring each file.
put       Copy one file from the local machine to the remote machine.
mput      Copy multiple files from the local machine to the remote machine; you are prompted for a y/n answer before transferring each file.
quit      Exit the FTP environment (same as bye).
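
The put step can also be scripted with Python's standard ftplib module. The sketch below is only an illustration: the upload helper is a hypothetical name, the address and credentials in the comments are placeholders, and active mode is selected by default as an assumption, following the FileZilla instructions earlier in this section.

```python
import os
from ftplib import FTP


def upload(ftp, local_path, passive=False):
    """Send one file in binary mode over an already logged-in FTP session."""
    ftp.set_pasv(passive)  # False selects active mode
    remote_name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        ftp.storbinary("STOR " + remote_name, handle)
    return remote_name


# Hypothetical usage; the address and credentials are placeholders:
#   ftp = FTP("90.147.102.82")
#   ftp.login("user@example.org", "your-galaxy-password")
#   upload(ftp, "Sc_IP.fastq")
#   ftp.quit()
```

As with the command line client, the credentials travel unencrypted, so the same caution about clear text applies.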

Supervisord

Supervisor is a process manager written in Python, which allows its users to monitor and control processes on UNIX-like operating systems. It includes:

  1. Supervisord daemon (privileged or unprivileged);
  2. Supervisorctl command line interface;
  3. INI config format;
  4. [program:x] defines a program to control.

In this setup supervisord runs with root privileges.

Galaxy supervisord configuration is located here: https://docs.galaxyproject.org/en/master/admin/framework_dependencies.html?highlight=uwsgi#supervisor

and here: https://galaxyproject.github.io/dagobah-training/2016-saltlakecity/002a-systemd-supervisor/systemd-supervisor.html#1

A configuration running the Galaxy server under uWSGI is installed in /etc/supervisord.d/galaxy_web.ini on CentOS and in /etc/supervisor/conf.d/galaxy.conf on Ubuntu. The options stopasgroup = true and killasgroup = true ensure that the SIGINT signal used to shut down Galaxy is propagated to all uWSGI child processes (i.e. to all uWSGI workers).

PYTHONPATH is not specified in this configuration, since it conflicted with Conda.

To manage Galaxy through supervisord:

Action           Command
Start Galaxy     sudo supervisorctl start galaxy:
Stop Galaxy      sudo supervisorctl stop galaxy:
Restart Galaxy   sudo supervisorctl restart galaxy:
Galaxy status    sudo supervisorctl status galaxy:

$ supervisorctl help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version

$ sudo supervisorctl status galaxy:
galaxy:galaxy_web                RUNNING   pid 9030, uptime 2 days, 21:19:28
galaxy:handler0                  RUNNING   pid 9031, uptime 2 days, 21:19:28
galaxy:handler1                  RUNNING   pid 9041, uptime 2 days, 21:19:27
galaxy:handler2                  RUNNING   pid 9046, uptime 2 days, 21:19:26
galaxy:handler3                  RUNNING   pid 9055, uptime 2 days, 21:19:25
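
For monitoring purposes, the status output can be turned into a mapping from process name to state. The following Python sketch assumes the whitespace-separated layout shown above; parse_status is a hypothetical helper, not part of Supervisor.

```python
def parse_status(output):
    """Turn `supervisorctl status` output into {process_name: state}."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


sample = """\
galaxy:galaxy_web                RUNNING   pid 9030, uptime 2 days, 21:19:28
galaxy:handler0                  RUNNING   pid 9031, uptime 2 days, 21:19:28
"""
states = parse_status(sample)
not_running = [name for name, state in states.items() if state != "RUNNING"]
print(not_running)
```

An empty list means that every listed process is in the RUNNING state.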

galaxy_web.ini file configuration:

[program:galaxy_web]
command         = /home/galaxy/galaxy/.venv/bin/uwsgi --virtualenv /home/galaxy/galaxy/.venv --ini-paste /home/galaxy/galaxy/config/galaxy.ini --pidfile /var/log/galaxy/uwsgi.pid
directory       = /home/galaxy/galaxy
umask           = 022
autostart       = true
autorestart     = true
startsecs       = 20
user            = galaxy
environment     = PATH="/home/galaxy/galaxy/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
numprocs        = 1
stopsignal      = INT
startretries    = 15
stopasgroup     = true
killasgroup     = true

[program:handler]
command         = /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler%(process_num)s --log-file=/var/log/galaxy/handler%(process_num)s.log
directory       = /home/galaxy/galaxy
process_name    = handler%(process_num)s
numprocs        = 4
umask           = 022
autostart       = true
autorestart     = true
startsecs       = 20
user            = galaxy
startretries    = 15

[group:galaxy]
programs = handler, galaxy_web
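
The way numprocs and process_name combine into the handler0 ... handler3 names can be reproduced with a short Python sketch. It reads a reduced copy of the configuration above with configparser; the helper names are hypothetical, and the numbering is assumed to start from 0, as in the status output shown earlier.

```python
import configparser

# Reduced copy of the configuration above, limited to the options used here.
SAMPLE = """
[program:galaxy_web]
stopsignal  = INT
stopasgroup = true
killasgroup = true

[program:handler]
process_name = handler%(process_num)s
numprocs     = 4
"""


def load(ini_text):
    # interpolation=None keeps %(process_num)s literal instead of letting
    # configparser try (and fail) to expand it.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser


def process_names(parser, program):
    """Expand process_name for each of the numprocs instances, from 0."""
    section = "program:" + program
    template = parser.get(section, "process_name", fallback=program)
    count = parser.getint(section, "numprocs", fallback=1)
    return [template % {"process_num": i} for i in range(count)]


def signals_whole_group(parser, program):
    """True when both stopasgroup and killasgroup are enabled."""
    section = "program:" + program
    return (parser.getboolean(section, "stopasgroup", fallback=False)
            and parser.getboolean(section, "killasgroup", fallback=False))


cfg = load(SAMPLE)
print(process_names(cfg, "handler"))
print(signals_whole_group(cfg, "galaxy_web"))
```

A check of this kind can be run after editing the file, to confirm that the number of handlers matches the [server:handlerN] sections defined in galaxy.ini.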

Finally, a systemd unit file has been installed in /etc/systemd/system/supervisord.service to start/stop supervisord.

Action    Command
Start     sudo systemctl start supervisord.service
Stop      sudo systemctl stop supervisord.service
Restart   sudo systemctl restart supervisord.service
Status    sudo systemctl status supervisord.service

$ sudo systemctl status supervisord.service
  ● supervisord.service - Supervisor process control system for UNIX
   Loaded: loaded (/etc/systemd/system/supervisord.service; disabled; vendor preset: disabled)
   Active: active (running) since Sat 2017-08-12 08:48:33 UTC; 9s ago
     Docs: http://supervisord.org
 Main PID: 12204 (supervisord)
   CGroup: /system.slice/supervisord.service
           ├─12204 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisord.conf
           ├─12207 /home/galaxy/galaxy/.venv/bin/uwsgi --virtualenv /home/galaxy/galaxy/.venv --ini-paste /home/galaxy/galaxy/config/galaxy.ini --pidfile /var/log/galaxy/uwsgi.pid
           ├─12208 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler0 --log-file=/var/log/galaxy/handler0.log
           ├─12209 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler1 --log-file=/var/log/galaxy/handler1.log
           ├─12210 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler2 --log-file=/var/log/galaxy/handler2.log
           └─12211 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler3 --log-file=/var/log/galaxy/handler3.log

Aug 12 08:48:33 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:33,805 CRIT Supervisor running as root (no user in config file)
Aug 12 08:48:33 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:33,805 WARN Included extra file "/etc/supervisord.d/galaxy_web.ini" during parsing
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,564 INFO RPC interface 'supervisor' initialized
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,564 CRIT Server 'unix_http_server' running without any HTTP authentication checking
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,565 INFO supervisord started with pid 12204
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,569 INFO spawned: 'galaxy_web' with pid 12207
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,573 INFO spawned: 'handler0' with pid 12208
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,576 INFO spawned: 'handler1' with pid 12209
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,581 INFO spawned: 'handler2' with pid 12210
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,584 INFO spawned: 'handler3' with pid 12211

Galaxy init scripts

Systemctl is the command line interface to systemd:

systemctl <start|stop|restart|...> <name>[.service]
systemctl <enable|disable> <name>[.service]

Since CentOS 7 and Ubuntu Xenial 16.04 use systemd as init system, the Galaxy init script is located in /etc/systemd/system/galaxy.service.

Action    Command
Start     sudo systemctl start galaxy.service
Stop      sudo systemctl stop galaxy.service
Restart   sudo systemctl restart galaxy.service
Status    sudo systemctl status galaxy.service

Ubuntu Trusty 14.04 uses Upstart as init system. The Galaxy init file is located in /etc/init.d/galaxy.

Action    Command
Start     sudo service galaxy start
Stop      sudo service galaxy stop
Restart   sudo service galaxy restart
Status    sudo service galaxy status
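
A script that has to work on all supported distributions can choose between the two command forms above. The following Python sketch is an illustration only: the function names are hypothetical, and the presence of the /run/systemd/system directory is used as the test for a systemd-booted host.

```python
import os


def uses_systemd():
    """True when the host was booted with systemd."""
    return os.path.isdir("/run/systemd/system")


def service_command(action, service="galaxy", systemd=None):
    """Build the command line for start, stop, restart or status."""
    if action not in ("start", "stop", "restart", "status"):
        raise ValueError("unsupported action: " + action)
    if systemd is None:
        systemd = uses_systemd()
    if systemd:
        return ["sudo", "systemctl", action, service + ".service"]
    return ["sudo", "service", service, action]


print(" ".join(service_command("restart", systemd=True)))
print(" ".join(service_command("restart", systemd=False)))
```

The returned list can be passed directly to subprocess.run; the commands are only built here, not executed.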