Laniakea Documentation

elixir-italy logo

Laniakea automates the creation of Galaxy-based virtualized environments through an easy setup procedure, providing an on-demand workspace ready to be used by life scientists and bioinformaticians.

Galaxy is a workflow manager adopted in many life science research environments in order to facilitate the interaction with bioinformatics tools and the handling of large quantities of biological data.

Once deployed, each Galaxy instance is fully customizable with tools and reference data and runs in an isolated environment, thus providing a suitable platform for research, training and even clinical scenarios involving sensitive data. Sensitive data require the development and adoption of technologies and policies for data access, including e.g. a robust user authentication platform.

For more information on the Galaxy Project, please visit https://galaxyproject.org.

Laniakea has been developed by ELIXIR-IIB, the Italian node of ELIXIR, within the INDIGO-DataCloud project (H2020-EINFRA-2014-2), which aims to develop PaaS-based cloud solutions for e-science.

Note

Laniakea is under rapid development. For this reason, the code and the documentation may not always be in sync. We do our best to keep the documentation up to date.

Overview

Galaxy is a workflow manager adopted in many life science research environments in order to facilitate the interaction with bioinformatics tools and the handling of large quantities of biological data. Through a coherent work environment and a user-friendly web interface, it organizes data, tools and workflows, providing reproducibility, transparency and data sharing functionalities to users.

Currently, Galaxy instances can be deployed in three ways, each with pros and cons: public servers, local servers and commercial cloud solutions. In particular, the demand for cloud solutions is rapidly growing (over 2,400 Galaxy cloud servers were launched in 2015), since they allow the creation of a ready-to-use Galaxy production environment, avoiding initial configuration issues, requiring less technical expertise and outsourcing the hardware needs. Nevertheless, relying on commercial cloud providers is quite costly and can pose ethical and legal drawbacks in terms of data privacy.

ELIXIR-IIB, in the framework of the INDIGO-DataCloud project, is developing a cloud Galaxy instance provider that allows each virtual instance to be fully customized through a user-friendly web interface, overcoming the limitations of other Galaxy deployment solutions. In particular, our goal is to develop a PaaS architecture to automate the creation of Galaxy-based virtualized environments, exploiting the software catalogue provided by the INDIGO-DataCloud community (www.indigo-datacloud.eu/service-component).

Once deployed, each Galaxy instance is fully customizable with tools and reference data and runs in an isolated environment, thus providing a suitable platform for research, training and even clinical scenarios involving sensitive data. Sensitive data require the development and adoption of technologies and policies for data access, including e.g. a robust user authentication platform.

The system allows users to set up and launch a virtual machine configured with the operating system (CentOS 7 or Ubuntu 14.04/16.04) and the auxiliary applications needed to support a Galaxy production environment, such as PostgreSQL, Nginx, uWSGI and Proftpd, and to deploy the Galaxy platform itself. It is possible to choose between different tool presets, or flavors: basic Galaxy, or Galaxy configured with a selection of tools for NGS analyses already installed and configured (e.g. SAMtools, BamTools, Bowtie, MACS, RSEM, etc.) together with reference data for many organisms.

Service architecture

The web front-end is designed to grant user-friendly access to the service, allowing each Galaxy instance to be easily configured and launched through the INDIGO FutureGateway portal.

Galaxy as a cloud service architecture

Galaxy as a Cloud Service architecture

All the components required to automatically set up Galaxy instances (Galaxy and all its companion software) are deployed using the INDIGO PaaS Orchestrator and the Infrastructure Manager services, based on the TOSCA orchestration language. The service is compatible with both OpenNebula and OpenStack, enabling its deployment on different e-infrastructures. Moreover, it supports both VMs and Docker containers, leaving the selection of the virtual environment to the service providers. This effectively removes the need to depend on particular infrastructure configurations (e.g. OpenStack, OpenNebula or public cloud solutions like Amazon or Google).

Persistent storage is provided to store users' and reference data and to install and run new (custom) tools and workflows. Data security and privacy are granted through the INDIGO Onedata component which, at the same time, allows for transparent access to the storage resources through token management. Data encryption implemented at the file system level protects users' data from any unauthorized access.

Automatic elasticity, provided by the INDIGO CLUES service component, enables dynamic scaling of cluster resources, deploying and powering on new worker nodes depending on the workload of the cluster and powering them off when no longer needed. This provides an efficient use of the resources, making them available only when really needed.

ELIXIR-IIB: The Italian Infrastructure for Bioinformatics

elixir-iib logo

ELIXIR-IIB (elixir-italy.org) is the Italian Node of ELIXIR (elixir-europe.org) and gathers most of the leading Italian institutions in the field of bioinformatics, including a vast and heterogeneous community of scientists that use, develop and maintain a large set of bioinformatics services. ELIXIR is a European research infrastructure whose goal is to integrate research data from all over Europe and ensure seamless service provision easily accessible by the scientific community.

ELIXIR-IIB is also one of the scientific communities providing use cases to the INDIGO-DataCloud project (H2020-EINFRA-2014-2), which aims to develop PaaS-based cloud solutions for e-science.

For a complete overview of ELIXIR-IIB related projects and services, please visit: http://elixir-italy.org/en/

INDIGO-DataCloud

INDIGO-DataCloud project logo

The INDIGO-DataCloud project (H2020-EINFRA-2014-2) aims to develop an open source computing and data platform, targeted at multi-disciplinary scientific communities, provisioned over public and private e-infrastructures.

In order to exploit the full capabilities of current cloud infrastructures, supporting complex workflows, data transfer and analysis scenarios, the INDIGO architecture is based on the analysis and the realization of use cases selected by different research communities in the areas of High Energy Physics, Bioinformatics, Astrophysics, Environmental modelling, Social sciences and others.

INDIGO-DataCloud architecture

The INDIGO-DataCloud architecture

INDIGO-DataCloud communities

The INDIGO-DataCloud communities

INDIGO released two software releases:

Release         Code name       URL
First release   MIDNIGHTBLUE    https://www.indigo-datacloud.eu/news/first-indigo-datacloud-software-release-out
Second release  ELECTRICINDIGO  https://www.indigo-datacloud.eu/news/electricindigo-second-indigo-datacloud-software-release

The INDIGO-DataCloud releases provide open source components for:

  1. IaaS layer: increase the efficiency of existing Cloud infrastructures based on OpenStack or OpenNebula through advanced scheduling, flexible cloud/batch management, network orchestration and interfacing of high-level Cloud services to existing storage systems.
  2. PaaS layer: easily port applications to public and private Clouds using open programmable interfaces, user-level containers, and standards-based languages to automate definition, composition and embodiment of complex set-ups.
  3. Identity and Access Management: manage access and policies to distributed resources.
  4. FutureGateway: a programmable scientific portal providing easy access to both the advanced PaaS features provided by the project and to already existing applications.
  5. Data Management and Data Analytics Solutions: distribute and access data through multiple providers via virtual file systems and automated replication and caching.

For a complete list of INDIGO-DataCloud services, please visit: https://www.indigo-datacloud.eu/service-component

The ELIXIR-IIB use case in INDIGO

ELIXIR-IIB, in the framework of the INDIGO-DataCloud project, is developing a cloud Galaxy instance provider that allows each virtual instance to be fully customized through a user-friendly web interface, overcoming the limitations of other Galaxy deployment solutions. In particular, our goal is to develop a PaaS architecture to automate the creation of Galaxy-based virtualized environments, exploiting the software catalogue provided by the INDIGO-DataCloud community.

  1. All required Galaxy components are automatically deployed (INDIGO PaaS Orchestrator and the Infrastructure Manager):
    • Galaxy
    • PostgreSQL
    • NGINX
    • uWSGI
    • Proftpd
    • Galaxy tools (from ToolShed)
    • Reference Data
  2. User-friendly access, allowing a Galaxy instance to be easily configured and launched (INDIGO FutureGateway portal)
  3. Authentication (Identity and Access Management and FutureGateway)
  4. Persistent storage, data security and privacy (Onedata or IaaS block storage with filesystem encryption).
  5. Cluster support with automatic elasticity (INDIGO CLUES).
ELIXIR-IIB use case architecture (single VM)

ELIXIR-IIB use case in INDIGO architecture for single Galaxy instances deployment.

ELIXIR-IIB use case architecture (cluster)

ELIXIR-IIB use case in INDIGO architecture for Galaxy with cluster support deployment

Get Galaxy Express

The Galaxy Express section allows users to deploy a standard Galaxy production environment.

The service instantiates a CentOS 7 virtual machine with Galaxy, all its companion software and tools already embedded. Once deployed, each Galaxy instance can be further customized with tools and reference data.

See also

For a detailed description of all Web UI options see section: Laniakea options.

See also

To login into the portal see section: Authentication.

Instantiate Galaxy

  1. Enter the Galaxy Express section:

    Galaxy express main window
  2. Describe your instance using the Instance description field, which will identify your Galaxy in the job list once your request is submitted.

  3. Select the Instance flavor (virtual CPUs and RAM):

    Currently, the following pre-sets are available, but not all of them are enabled.

    Name     VCPUs  RAM    Total Disk  Root Disk
    small    1      2 GB   20 GB       20 GB
    medium   2      4 GB   20 GB       20 GB
    large    4      8 GB   20 GB       20 GB
    xlarge   8      16 GB  20 GB       20 GB
    xxlarge  16     32 GB  20 GB       20 GB
  4. Copy & paste your SSH key, to log in to the Galaxy instance:

    SSH public key injection
  5. The Storage section allows you to select the user storage volume size. The Enable encryption flag is explained here: Get encrypted instance.

    Galaxy express Storage section
  6. Select the Galaxy version, the instance administrator e-mail and your custom Galaxy brand:

    Galaxy express Galaxy configuration section

Warning

Please insert a valid mail address. No check is performed on its syntax, but entering an incorrect e-mail address will cause deployment failure if the encryption option is set.

  7. Select Galaxy tools pre-set:

    Galaxy express Tools section
  8. and reference dataset:

    Galaxy express Tools section
  9. Finally, SUBMIT your request:

    Galaxy express submit request

Galaxy login

The Galaxy administrator password is automatically generated during the instantiation procedure and is the same for each deployed instance:

User: galaxy administrator e-mail

Password: galaxy_admin_password

Warning

Anonymous login is disabled by default.

Warning

Change Galaxy password and the API key as soon as possible!

Get Galaxy

The Galaxy section allows users to deploy a full Galaxy production environment.

The service allows you to set up and launch a virtual machine configured with the CentOS 7 operating system and the auxiliary applications needed to support a Galaxy production environment, such as PostgreSQL, Nginx, uWSGI and Proftpd, and to deploy the Galaxy platform itself along with the selected Galaxy tools.

Warning

Everything is configured on the fly and, depending on the number of tools to be installed, this may take time.

See also

For a detailed description of all Web UI options see section: Laniakea options.

See also

To login into the portal see section: Authentication.

Instantiate Galaxy

  1. Enter the Galaxy section:

    Galaxy express main window
  2. Describe your instance using the Instance description field, which will identify your Galaxy in the job list once your request is submitted.

  3. Select your instance flavor (virtual CPUs and memory size):

    Currently, the following pre-sets are available, but not all of them are enabled.

    Name     VCPUs  RAM    Total Disk  Root Disk
    small    1      2 GB   20 GB       20 GB
    medium   2      4 GB   20 GB       20 GB
    large    4      8 GB   20 GB       20 GB
    xlarge   8      16 GB  20 GB       20 GB
    xxlarge  16     32 GB  20 GB       20 GB
  4. Copy & paste your SSH key, to log in to the Galaxy instance:

    SSH public key injection
  5. The Storage section allows you to select the user storage volume size. The Enable encryption flag is explained here: Get encrypted instance.

    Galaxy express Storage section
  6. Select the Galaxy version, the instance administrator e-mail and your custom Galaxy brand:

    Galaxy express Galaxy configuration section

Warning

Please insert a valid mail address. No check is performed on its syntax, but entering an incorrect e-mail address will cause deployment failure if the encryption option is set.

  7. Select Galaxy tools pre-set:

    Galaxy express Tools section
  8. and reference dataset:

    Galaxy express Tools section
  9. Finally, SUBMIT your request:

    Galaxy express submit request

Galaxy login

The Galaxy administrator password and the API key are automatically generated during the instantiation procedure and are the same for each instance:

User: your user e-mail

Password: galaxy_admin_password

API key: ADMIN_API_KEY

Warning

Anonymous login is disabled by default.

Warning

Change Galaxy password and the API key as soon as possible!

Get encrypted instance

The service provides the possibility to encrypt the storage volume associated with the virtual machine on demand.

Warning

Only the external volume, where Galaxy data are stored, is encrypted, not the Virtual Machine root disk.

To encrypt the external volume, tick the Enable encryption box.

Galaxy encryption

Cryptographic keys should never be transmitted in the clear. For this reason, user intervention is required during Galaxy deployment.

Data privacy is granted through LUKS storage encryption: users will be required to insert a password to encrypt/decrypt data directly on the virtual instance during its deployment, avoiding any interaction with the cloud administrator(s).

An e-mail is sent to the instance administrator, at the e-mail address configured in the Galaxy Configuration section.

Galaxy encryption mail configuration section

Warning

Make sure you have entered a valid mail address!

The e-mail is sent only when the system is ready to accept your password and contains all the instructions to correctly encrypt/decrypt your system. The e-mail subject is [ELIXIR-ITALY] Laniakea storage encryption, and it is sent by Laniakea@elixir-italy.org.

Warning

If you don’t receive the e-mail:

  1. Check your SPAM mail directory.
  2. Check the mail address spelling.
  3. Wait 15 more minutes.
Galaxy encryption mail

Once the e-mail has arrived, you can follow the step-by-step guide to encrypt your volume: Storage encryption procedure.

The user is only asked to insert an alphanumeric password 3 times:

  1. Set password
  2. Confirm password
  3. Open LUKS volume.

After the password injection, the logout is automatic: the encryption procedure continues in the background.

Default encryption algorithm

The default LUKS configuration is:

Cipher: aes-xts-plain64

Key size: 256 bit

Hash Algorithm for key derivation: sha256

See also

For a detailed description of all Web UI options see section: Laniakea options.

Get Galaxy cluster

Warning

Currently, this feature is in beta testing. Galaxy and its tools are installed on the fly starting from a bare CentOS 7 image. The whole process, i.e. installing Galaxy and the tools, may take time. We will soon add the possibility to exploit images with pre-installed tools to speed up the configuration.

It is possible to deploy a Galaxy cluster, using SLURM as Resource Manager, with or without automatic elasticity support.

  1. Galaxy cluster section: all worker nodes are deployed together with the Galaxy server and are always available.
  2. Galaxy elastic cluster section: automatic elasticity enables dynamic cluster resource scaling, deploying and powering on new worker nodes depending on the workload of the cluster and powering them off when no longer needed. This provides an efficient use of the resources, making them available only when really needed.

The two sections provide the same configuration options.

Warning

Each node takes 12 minutes or more to be instantiated, therefore a job needs the same amount of time to start. Conversely, if the node is already deployed, the job will start immediately.

See also

For a detailed description of the Galaxy elastic cluster see section Cluster support (SLURM).

For a detailed description of all Web UI options see section: Laniakea options.

To login into the portal see section: Authentication.

Instantiate Galaxy

  1. Enter the Galaxy cluster section:

    Galaxy express main window
  2. Describe your instance using the Instance description field, which will identify your Galaxy in the job list once your request is submitted.

  3. Select the number of virtual Worker Nodes of your cluster:

    Virtual hardware configuration
  4. Select the Instance flavor (virtual CPUs and RAM) for your front node, i.e. the Galaxy server, and for each worker node:

    Currently, the following pre-sets are available:

    Name     VCPUs  RAM    Total Disk  Root Disk
    small    1      2 GB   20 GB       20 GB
    medium   2      4 GB   20 GB       20 GB
    large    4      8 GB   20 GB       20 GB
    xlarge   8      16 GB  20 GB       20 GB
    xxlarge  16     32 GB  20 GB       20 GB
  5. Copy & paste your SSH key, to log in to the Galaxy instance:

    SSH public key injection
  6. The Storage section allows you to select the IaaS storage volume size. The Storage encryption option is explained here: Get encrypted instance.

    Galaxy express Storage section 2
  7. Select the Galaxy version, the instance administrator e-mail and your custom Galaxy brand:

    Galaxy express Galaxy configuration section

Warning

Please insert a valid mail address. No check is performed on its syntax, but entering an incorrect e-mail address will cause deployment failure if the encryption option is set.

  8. Select Galaxy tools pre-set:

    Galaxy express Tools section
  9. and reference dataset:

    Galaxy express Tools section
  10. Finally, SUBMIT your request:

    Galaxy express submit request

Galaxy login

The Galaxy administrator password and the API key are automatically generated during the instantiation procedure and are the same for each instance:

User: your user e-mail

Password: galaxy_admin_password

API key: ADMIN_API_KEY

Warning

Anonymous login is disabled by default.

Warning

Change Galaxy password and the API key as soon as possible!

Galaxy with Onedata

Warning

Galaxy with Onedata support is ongoing and is not currently available.

Virtual hardware presets

Each cloud provider enables a set of image flavors, defined in terms of virtual CPUs (VCPUs), memory, disk, etc.

Currently, the following pre-sets are available:

Name     VCPUS  RAM    Total Disk  Root Disk
small    1      2 GB   20 GB       20 GB
medium   2      4 GB   20 GB       20 GB
large    4      8 GB   20 GB       20 GB
xlarge   8      16 GB  20 GB       20 GB
xxlarge  16     32 GB  20 GB       20 GB

New flavors can be assigned to particular projects.

Laniakea options

job identifier

Type: string

Description: Custom job identifier; it will identify your Galaxy in the job list.

Mandatory: No

Virtual hardware configuration

Instance flavor

Type: selectable menu (string)

Description: Select instance flavor in terms of virtual CPUs, memory size (RAM) and disk size.

Mandatory: Yes

Sections: Galaxy Express, Galaxy. Available as Galaxy server flavor on cluster configuration tabs.

Worker node number

Type: selectable menu (int)

Description: Maximum number of worker nodes.

Mandatory: Yes

Sections: Galaxy cluster, Galaxy elastic cluster.

Worker nodes flavor

Type: selectable menu (string)

Description: Select the worker node flavor in terms of virtual CPUs, memory size (RAM) and disk size.

Mandatory: Yes

Sections: Galaxy cluster, Galaxy elastic cluster.

SSH public key

Type: string

Description: The user's personal SSH public key, required to log in to the Galaxy VM through SSH.

Mandatory: Yes

Sections: all.
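
If you do not have a key pair yet, a standard OpenSSH one can be generated as follows (a minimal sketch; the file name laniakea_key is just an example):

# generate a 4096-bit RSA key pair
ssh-keygen -t rsa -b 4096 -f ~/.ssh/laniakea_key

# print the public key, to be pasted into the SSH public key field
cat ~/.ssh/laniakea_key.pub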

Storage configuration

Storage type

Type: selectable menu (string)

Description: Select common IaaS block storage (IaaS) or encrypted storage (encryption).

Mandatory: Yes

Sections: all.

Storage size

Type: selectable menu (string)

Description: Select storage size.

Mandatory: Yes

Sections: all.

Galaxy configuration

Galaxy version

Type: selectable menu (string)

Description: Select Galaxy release version.

Mandatory: Yes

Sections: all.

Instance description

Type: string

Description: Customize Galaxy top-left main page badge.

Mandatory: No

Sections: all.

Galaxy administrator e-mail

Type: string

Description: Set the Galaxy administrator e-mail.

Mandatory: Yes

Sections: all.

Reference data

Type: selectable menu (string)

Description: Select reference data CernVM-FS provider.

Mandatory: Yes

Sections: all.

Tools configuration

Galaxy flavors

Type: selectable menu (string)

Description: Select tools presets to be installed on the server.

Mandatory: No

Sections: all.

Galaxy production environment

The system allows users to set up and launch a virtual machine (VM) configured with the operating system (CentOS 7 or Ubuntu 14.04/16.04) and the auxiliary applications needed to support a Galaxy production environment, such as PostgreSQL, Nginx, uWSGI and Proftpd, and to deploy the Galaxy platform itself. A common set of reference data is available through a CernVM-FS volume.

Once deployed each Galaxy instance can be further customized with tools and reference data.

The Galaxy production environment is deployed according to Galaxy official documentation: https://docs.galaxyproject.org/en/latest/admin/production.html.

galaxy production environment

OS support

CentOS 7 is our default distribution given its adherence to standards and the length of official support (CentOS 7 updates until June 30, 2024; see https://wiki.centos.org/FAQ/General#head-fe8a0be91ee3e7dea812e8694491e1dde5b75e6d). CentOS 7 and Ubuntu (14.04 and 16.04) are both supported.

CentOS 7 and Ubuntu Xenial 16.04 exploit systemd as init system, while Ubuntu Trusty 14.04 still uses Upstart.

Warning

SELinux is disabled by default on CentOS.

PostgreSQL

PostgreSQL packages from the official PostgreSQL repository are installed:

Distribution  Repository
CentOS        https://wiki.postgresql.org/wiki/YUM_Installation
Ubuntu        https://wiki.postgresql.org/wiki/Apt

The current stable PostgreSQL version, PostgreSQL 9.6, is installed.

On CentOS 7 the default pgdata directory is /var/lib/pgsql/9.6/data. The pg_hba.conf configuration is modified to allow password authentication. On CentOS we need to exclude the CentOS base and updates repositories for PostgreSQL, otherwise dependencies might resolve to the postgresql packages supplied by the base repository.

On Ubuntu the default pgdata directory is /var/lib/postgresql/9.6/main, while the configuration files are stored in /etc/postgresql/9.6/main. There is no need to modify the HBA configuration file since, by default, it allows password authentication.

PostgreSQL start/stop/status is entrusted to systemd on CentOS 7 and Ubuntu Xenial and to Upstart on Ubuntu Trusty.

Distribution   Command
CentOS 7       sudo systemctl start/stop/status postgresql-9.6
Ubuntu Xenial  sudo systemctl start/stop/status postgresql
Ubuntu Trusty  sudo service postgresql start/stop/status

Galaxy database configuration

Two different databases are configured, one to track data and one for Tool Shed install data, allowing a fresh Galaxy instance to be bootstrapped with pretested installs. The database passwords are randomly generated and can be retrieved in the galaxy.ini file.

The Galaxy database is named galaxy and is configured in the galaxy.ini file:

database_connection = postgresql://galaxy:gtLxNnH7DpISmI5FXeeI@localhost:5432/galaxy

The Tool Shed install database is named galaxy_tools and is configured as:

install_database_connection = postgresql://galaxy:gtLxNnH7DpISmI5FXeeI@localhost:5432/galaxy_tools
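
Since both passwords are randomly generated, they can be recovered by inspecting galaxy.ini; a minimal sketch, assuming the path reported in the uWSGI section:

# print both connection strings (and the embedded passwords)
grep -E '^(install_)?database_connection' /home/galaxy/galaxy/config/galaxy.ini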

Docker

In Docker containers PostgreSQL cannot be managed through systemd/upstart, since there is no init system in the CentOS and Ubuntu Docker images. Therefore, the system is automatically configured to run PostgreSQL using Supervisord.

NGINX

To improve Galaxy performance, NGINX is used as web server. The official Galaxy NGINX packages are used by default (built with upload module support).

Distribution  Repository
CentOS        https://depot.galaxyproject.org/yum/
Ubuntu        ppa:galaxyproject/nginx

Moreover, on Ubuntu, we need to prevent NGINX from being updated by the apt default packages. For this purpose the pin priority of the NGINX PPA packages is raised, by editing /etc/apt/preferences.d/galaxyproject-nginx-pin-700 (more on apt pinning at: https://wiki.debian.org/AptPreferences).
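
For reference, a pin file along these lines raises the PPA priority to 700 (a sketch of the assumed content of /etc/apt/preferences.d/galaxyproject-nginx-pin-700, not a verbatim copy):

Explanation: prefer the galaxyproject PPA nginx packages over the Ubuntu ones
Package: nginx*
Pin: release o=LP-PPA-galaxyproject-nginx
Pin-Priority: 700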

NGINX is configured following the official Galaxy wiki: https://galaxyproject.org/admin/config/nginx-proxy/.
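
As a minimal sketch of the Galaxy-relevant part of such a configuration (assuming the default uWSGI socket 127.0.0.1:4001 described in the uWSGI section; the full configuration in the linked wiki is more complete):

server {
    listen 80;

    # hand all requests over to the Galaxy uWSGI socket
    location / {
        uwsgi_pass 127.0.0.1:4001;
        include uwsgi_params;
    }
}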

NGINX is usually started from the command line, from /usr/sbin/nginx:

$ sudo nginx

NGINX options

NGINX options are listed here: https://www.nginx.com/resources/wiki/start/topics/tutorials/commandline/

Option         Description
-?, -h         Print help.
-v             Print version.
-V             Print NGINX version, compiler version and configure parameters.
-t             Don't run, just test the configuration file. NGINX checks the configuration for correct syntax and then tries to open the files referred to in the configuration.
-q             Suppress non-error messages during configuration testing.
-s signal      Send a signal to the master process: stop, quit, reopen, reload. (version >= 0.7.53)
-p prefix      Set the prefix path (default: /usr/local/nginx/). (version >= 0.7.53)
-c filename    Specify which configuration file NGINX should use instead of the default.
-g directives  Set global directives. (version >= 0.7.4)

The main way to start/stop/reload nginx is through the -s command line option:

Action Command
Start sudo nginx
Stop sudo nginx -s stop
Restart First stop nginx then start it: sudo nginx -s stop; sudo nginx
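
A configuration change can also be applied safely by testing it first (standard NGINX usage):

# test the configuration, then reload the running master process
sudo nginx -t && sudo nginx -s reload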

Finally, to start/stop/status NGINX with systemd:

Distribution   Command
CentOS 7       sudo systemctl start/stop/status nginx
Ubuntu Xenial  sudo systemctl start/stop/status nginx
Ubuntu Trusty  sudo service nginx start/stop/status

NGINX troubleshooting

Running NGINX on CentOS through systemd can lead to this error in /var/log/nginx/error.log, which can prevent the Galaxy web page from loading:

2017/08/24 08:22:32 [crit] 3320#0: *7 connect() to 127.0.0.1:4001 failed (13: Permission denied) while connecting to upstream, client: 192.167.91.214, server: localhost, request: "GET /galaxy HTTP/1.1", upstream: "uwsgi://127.0.0.1:4001", host: "90.147.102.159"

This is related to the SELinux policy on CentOS.

Warning

You should avoid modifying the SELinux policy, since you can still use the NGINX command line options.

Anyway, the problem is that SELinux denies socket access. This results in a generic access denied error in NGINX's log, while the important messages are actually in SELinux's audit log. To solve this issue, you can run the following commands as superuser.

# show the new rules to be generated
grep nginx /var/log/audit/audit.log | audit2allow

# show the full rules to be applied
grep nginx /var/log/audit/audit.log | audit2allow -m nginx

# generate the rules to be applied
grep nginx /var/log/audit/audit.log | audit2allow -M nginx

# apply the rules
semodule -i nginx.pp

Then restart NGINX.

You may need to generate the rules multiple times (likely four times to fix all the policies), trying to access the site after each pass, since the first SELinux error might not be the only one raised.
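
Alternatively, a single well-known SELinux boolean usually covers this nginx-to-upstream case; this is a general CentOS remedy, not part of the original setup, so verify it fits your policy:

# allow web server processes (httpd_t, under which NGINX runs) to open
# outbound network connections, e.g. to the uwsgi://127.0.0.1:4001 upstream
setsebool -P httpd_can_network_connect 1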


uWSGI

uWSGI (https://uwsgi-docs.readthedocs.io/en/latest) is used as an interface between the web server (i.e. NGINX) and the web application (i.e. Galaxy). Using uWSGI for production servers is recommended by the Galaxy team: https://galaxyproject.org/admin/config/performance/scaling/

uWSGI configuration is embedded in the galaxy.ini file ($HOME/galaxy/config/galaxy.ini), with a 4-handler configuration. By default, the number of processes (i.e. uWSGI workers) is set to number_of_virtual_cpus - 1. This configuration should be fine for most uses. Nevertheless, there is no golden rule to define the number of workers: it is up to the end user to configure it depending on their needs. The same goes for the number of job handlers (4 by default).

The uWSGI socket and stats server are, by default, listening on 127.0.0.1:4001 and 127.0.0.1:9191, respectively. More on the uWSGI stats server here: http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html.
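
The stats server returns a JSON document over a plain TCP connection, so it can be inspected, for example, with netcat or with the separate uwsgitop utility (both assumed to be installed; they are not part of the deployment):

# dump the raw JSON stats
nc 127.0.0.1 9191

# or get a top-like live view (pip install uwsgitop)
uwsgitop 127.0.0.1:9191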

UWSGI Galaxy Configuration:

[uwsgi]
master = True
processes = 1
socket = 127.0.0.1:4001
stats = 127.0.0.1:9191
pythonpath = /home/galaxy/galaxy/lib
pythonhome = /home/galaxy/galaxy/.venv
threads = 4
logto = /var/log/galaxy/uwsgi.log

# Job Handler(s)

[server:handler0]
use = egg:Paste@http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler1]
use = egg:Paste@http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler2]
use = egg:Paste@http
port = 8092
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:handler3]
use = egg:Paste@http
port = 8093
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

Proftpd

To allow users to upload files (> 2GB) through FTP, Proftpd is installed and configured on each Galaxy server, according to: https://galaxyproject.org/admin/config/upload-via-ftp/

The Proftpd configuration file is located at /etc/proftpd.conf on CentOS and /etc/proftpd/proftpd.conf on Ubuntu.

To grant Proftpd access to read e-mails and passwords from the Galaxy database, a separate user is created for the FTP server with permission to SELECT from the galaxy_user table and nothing else.
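
As an illustration, such a restricted user can be created with a couple of SQL statements; the role name and password below are hypothetical placeholders, not the ones actually used by the service:

sudo -u postgres psql galaxy <<'EOF'
-- hypothetical FTP role, allowed to read the galaxy_user table only
CREATE ROLE galaxyftp LOGIN PASSWORD 'change_me';
GRANT SELECT ON galaxy_user TO galaxyftp;
EOF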

Proftpd listens on port 21. The FTP protocol is not encrypted by default, thus usernames and passwords are sent in clear text to Galaxy.

How to use FTP through FileZilla

You need to disable passive (PASV) mode in FileZilla, since we are not going to open all the passive ports.

  1. Open FileZilla.
  2. Click on Edit | Settings.
  3. Open Connection menu on the left. Click on FTP menu.
  4. Mark the Active radio button.
  5. Click OK.

How to use FTP through command line

To install FTP command line client, type sudo yum install ftp on CentOS or sudo apt-get install ftp on Ubuntu.

To establish a connection with the Galaxy Proftpd server, you can use your Galaxy username and password, in addition to the IP address of the server you are connecting to (e.g. 90.147.102.82). To open a connection in a terminal, type the following command, replacing the IP address with your server IP address:

$ ftp 90.147.102.82
Connected to 90.147.102.82.
220 ProFTPD 1.3.5e Server (galaxy ftp server) [::ffff:90.147.102.82]
Name (90.147.102.82:marco):

Then login with your Galaxy credentials, typing your Galaxy e-mail address and password:

$ ftp 90.147.102.82
Connected to 90.147.102.82.
220 ProFTPD 1.3.5e Server (galaxy ftp server) [::ffff:90.147.102.82]
Name (90.147.102.82:marco): ma.tangaro@gmail.com
331 Password required for ma.tangaro@gmail.com
Password:

To upload a file to your Galaxy remote directory:

ftp> put Sc_IP.fastq
local: Sc_IP.fastq remote: Sc_IP.fastq
229 Entering Extended Passive Mode (|||30023|)
150 Opening BINARY mode data connection for Sc_IP.fastq
8% |******                                                                           | 12544 KiB   23.84 KiB/s  1:31:23 ETA

Then you will find it in Galaxy:

ftp fasta file copy

Here’s a list of the basic commands that you can use with the FTP client.

Command  Description
ls       to list the contents of the current directory on the remote machine.
cd       to change directory on the remote machine.
pwd      to find out the pathname of the current directory on the remote machine.
delete   to delete (remove) a file in the current remote directory (same as rm in UNIX).
mkdir    to make a new directory within the current remote directory.
rmdir    to remove (delete) a directory in the current remote directory.
get      to copy one file from the remote machine to the local machine.

get ABC DEF copies file ABC in the current remote directory to (or on top of) a file named DEF in your current local directory.

get ABC copies file ABC in the current remote directory to (or on top of) a file with the same name, ABC, in your current local directory.

mget     to copy multiple files from the remote machine to the local machine; you are prompted for a y/n answer before transferring each file.
put      to copy one file from the local machine to the remote machine.
mput     to copy multiple files from the local machine to the remote machine; you are prompted for a y/n answer before transferring each file.
quit     to exit the FTP environment (same as bye).

Supervisord

Supervisor is a process manager written in Python, which allows its users to monitor and control processes on UNIX-like operating systems. It includes:

  1. Supervisord daemon (privileged or unprivileged);
  2. Supervisorctl command line interface;
  3. INI config format;
  4. [program:x] defines a program to control.

Supervisord requires root privileges to run.

Galaxy supervisord configuration is located here: https://docs.galaxyproject.org/en/master/admin/framework_dependencies.html?highlight=uwsgi#supervisor

and here: https://galaxyproject.github.io/dagobah-training/2016-saltlakecity/002a-systemd-supervisor/systemd-supervisor.html#1

A configuration running the Galaxy server under uWSGI has been installed in /etc/supervisord.d/galaxy_web.ini on CentOS, while it is located in /etc/supervisor/conf.d/galaxy.conf on Ubuntu. The options stopasgroup = true and killasgroup = true ensure that the SIGINT signal, used to shut down Galaxy, is propagated to all uWSGI child processes (i.e. to all uWSGI workers).

PYTHONPATH is not specified in this configuration since it conflicted with Conda.

To manage Galaxy through supervisord:

Action Command
Start Galaxy sudo supervisorctl start galaxy:
Stop Galaxy sudo supervisorctl stop galaxy:
Restart Galaxy sudo supervisorctl restart galaxy:
Galaxy status sudo supervisorctl status galaxy:
$ supervisorctl help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version
$ sudo supervisorctl status galaxy:
galaxy:galaxy_web                RUNNING   pid 9030, uptime 2 days, 21:19:28
galaxy:handler0                  RUNNING   pid 9031, uptime 2 days, 21:19:28
galaxy:handler1                  RUNNING   pid 9041, uptime 2 days, 21:19:27
galaxy:handler2                  RUNNING   pid 9046, uptime 2 days, 21:19:26
galaxy:handler3                  RUNNING   pid 9055, uptime 2 days, 21:19:25

galaxy_web.ini file configuration:

[program:galaxy_web]
command         = /home/galaxy/galaxy/.venv/bin/uwsgi --virtualenv /home/galaxy/galaxy/.venv --ini-paste /home/galaxy/galaxy/config/galaxy.ini --pidfile /var/log/galaxy/uwsgi.pid
directory       = /home/galaxy/galaxy
umask           = 022
autostart       = true
autorestart     = true
startsecs       = 20
user            = galaxy
environment     = PATH="/home/galaxy/galaxy/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
numprocs        = 1
stopsignal      = INT
startretries    = 15
stopasgroup     = true
killasgroup     = true

[program:handler]
command         = /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler%(process_num)s --log-file=/var/log/galaxy/handler%(process_num)s.log
directory       = /home/galaxy/galaxy
process_name    = handler%(process_num)s
numprocs        = 4
umask           = 022
autostart       = true
autorestart     = true
startsecs       = 20
user            = galaxy
startretries    = 15

[group:galaxy]
programs = handler, galaxy_web

Finally, a systemd script has been installed in /etc/systemd/system/supervisord.service to start/stop Supervisord.

Action Command
Start sudo systemctl start supervisord.service
Stop sudo systemctl stop supervisord.service
Restart sudo systemctl restart supervisord.service
Status sudo systemctl status supervisord.service
$ sudo systemctl status supervisord.service
  ● supervisord.service - Supervisor process control system for UNIX
   Loaded: loaded (/etc/systemd/system/supervisord.service; disabled; vendor preset: disabled)
   Active: active (running) since Sat 2017-08-12 08:48:33 UTC; 9s ago
     Docs: http://supervisord.org
 Main PID: 12204 (supervisord)
   CGroup: /system.slice/supervisord.service
           ├─12204 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisord.conf
           ├─12207 /home/galaxy/galaxy/.venv/bin/uwsgi --virtualenv /home/galaxy/galaxy/.venv --ini-paste /home/galaxy/galaxy/config/galaxy.ini --pidfile /var/log/galaxy/uwsgi.pid
           ├─12208 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler0 --log-file=/var/log/galaxy/handler0.log
           ├─12209 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler1 --log-file=/var/log/galaxy/handler1.log
           ├─12210 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler2 --log-file=/var/log/galaxy/handler2.log
           └─12211 /home/galaxy/galaxy/.venv/bin/python ./lib/galaxy/main.py -c /home/galaxy/galaxy/config/galaxy.ini --server-name=handler3 --log-file=/var/log/galaxy/handler3.log

Aug 12 08:48:33 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:33,805 CRIT Supervisor running as root (no user in config file)
Aug 12 08:48:33 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:33,805 WARN Included extra file "/etc/supervisord.d/galaxy_web.ini" during parsing
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,564 INFO RPC interface 'supervisor' initialized
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,564 CRIT Server 'unix_http_server' running without any HTTP authentication checking
Aug 12 08:48:34 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:34,565 INFO supervisord started with pid 12204
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,569 INFO spawned: 'galaxy_web' with pid 12207
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,573 INFO spawned: 'handler0' with pid 12208
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,576 INFO spawned: 'handler1' with pid 12209
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,581 INFO spawned: 'handler2' with pid 12210
Aug 12 08:48:35 galaxy-indigo-test supervisord[12204]: 2017-08-12 08:48:35,584 INFO spawned: 'handler3' with pid 12211

Galaxy init scripts

Systemctl is the command line interface to systemd:

systemctl <start|stop|restart|...> <name>[.service]
systemctl <enable|disable> <name>[.service]

Since CentOS 7 and Ubuntu Xenial 16.04 exploit systemd as init system, the Galaxy init script is located in /etc/systemd/system/galaxy.service.

Action Command
Start sudo systemctl start galaxy.service
Stop sudo systemctl stop galaxy.service
Restart sudo systemctl restart galaxy.service
Status sudo systemctl status galaxy.service

Ubuntu Trusty 14.04 exploits Upstart as init system. The Galaxy init file is located in /etc/init.d/galaxy.

Action Command
Start sudo service galaxy start
Stop sudo service galaxy stop
Restart sudo service galaxy restart
Status sudo service galaxy status

Galaxy instance isolation

User data are automatically stored in the /export directory, where an external (standard block storage) volume is mounted.

All Galaxy job results are stored in this directory, as set in the galaxy.ini configuration file. For instance, the files directory is located at:

# Dataset files are stored in this directory.
file_path = /export/galaxy/database/files

while the job working directory is located at:

# Each job is given a unique empty directory as its current working directory.
# This option defines in what parent directory those directories will be
# created.
job_working_directory = /export/job_work_dir

Here is the list of Galaxy database path directories:

file_path = /export/galaxy/database/files
job_working_directory = /export/job_work_dir
new_file_path = /export/galaxy/database/tmp
template_cache_path = /export/galaxy/database/compiled_templates
citation_cache_data_dir = /export/galaxy/database/citations/data
citation_cache_lock_dir = /export/galaxy/database/citations/lock
whoosh_index_dir = /export/galaxy/database/whoosh_indexes
object_store_cache_path = /export/galaxy/database/object_store_cache
cluster_file_directory = /export/galaxy/database/pbs
ftp_upload_dir = /export/galaxy/database/ftp

The service offers the possibility to store data exploiting file system encryption or Onedata.

Using file system encryption, each Galaxy instance can be deployed as an isolated environment, i.e. data are isolated from any other instance on the same platform and from the cloud service administrators, opening to the adoption of Galaxy-based cloud solutions even within clinical environments. Data privacy is granted through LUKS file system encryption: users will be required to insert a password to encrypt/decrypt data directly on the virtual instance during its deployment, avoiding any interaction with the cloud administrator(s). The encryption procedure is described in Storage encryption procedure.

Alternatively, users' data access rights will be controlled through the Onedata INDIGO component: Onedata (beta).

Storage encryption

While the adoption of a distributed environment for data analysis makes data difficult to track and identify by a malevolent attacker, full data anonymity and isolation is still not granted.

For this reason, we have introduced the possibility for the user to isolate data through standard filesystem encryption, using the Linux device mapper encryption module dm-crypt as encryption backend, cryptsetup as command line interface to dm-crypt, and LUKS (Linux Unified Key Setup) as encryption mode.

Disk encryption ensures that files are stored on disk in an encrypted form: the files only become available to the operating system and applications in readable form when the volume is unlocked by a trusted user. The adopted block device encryption method operates below the filesystem layer and ensures that everything written to the block device (i.e. the external volume) is encrypted.

Dm-crypt is the standard device-mapper encryption functionality provided by the Linux kernel. Dm-crypt management is entrusted to the cryptsetup userspace utility, using LUKS as the default block device encryption method. LUKS is an additional layer which stores all the needed setup information for dm-crypt on the disk itself, and abstracts partition and key management in an attempt to improve ease of use and cryptographic security.

To automatically set up LUKS volumes on your Galaxy instances, a bash script named fast-luks has been created. The script has been integrated into the Galaxy instantiation procedure: if the File System Encryption option is selected through the dialogue window, users will be required to insert a password to encrypt/decrypt data on the virtual instance during its deployment, avoiding any interaction with the cloud administrator(s). For more details see: Fast-luks script.

Note

During the encryption procedure an e-mail is sent to the Galaxy instance administrator, with the virtual machine IP address and a detailed description of the procedure to inject the encryption password into the system. The Galaxy installation procedure is paused until the password is correctly set (after 5 hours the system will return an error).

Warning

The system does not store your keys on its servers and cannot access your protected data unless you provide the key. This also means that if you forget or lose your key, there is no way to recover the key or to recover any data encrypted with the lost key.

Default configuration

By default, fast-luks adopts the aes-xts-plain64 cipher with 256-bit keys and the sha256 hashing algorithm.

Once the LUKS partition is created, it is unlocked.

The unlocking process will map the partition to a new device name using the device mapper. This alerts the kernel that the device is actually an encrypted device, which should be addressed through LUKS via the /dev/mapper/<cryptdev_name> path so as not to overwrite the encrypted data. The cryptdev_name is randomly generated to avoid accidental overwriting.

The volume is mounted, by default, on /export with a standard ext4 filesystem, and Galaxy is configured to store its datasets there.

Defaults values
# Defaults
cipher_algorithm='aes-xts-plain64'
keysize='256'
hash_algorithm='sha256'
device='/dev/vdb'
cryptdev='crypt'
mountpoint='/export'
filesystem='ext4'

Password choice

During the encryption procedure (Storage encryption procedure), users are required to configure their encryption password. You must provide an alphanumeric password at least 8 characters long. A randomly generated password is shown by the script, ONLY AS AN EXAMPLE!

During the encryption procedure, the password will be required three times.

The script will ask for the password twice, to verify it. Then, you have to unlock your device by inserting your password once more.

The procedure is described here: ( Storage encryption procedure).

Luksctl

During the encryption procedure (Storage encryption procedure), fast-luks creates a configuration ini file, allowing the Galaxy administrator to easily manage LUKS devices through the luksctl script (Luksctl: LUKS volumes management).

By default the file is stored in /etc/galaxy/luks-cryptdev.ini.
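
As an illustration, given the default parameters reported in the Fast-luks script section, the file can be expected to record values along these lines (a hypothetical sketch; the actual key names and layout of luks-cryptdev.ini may differ):

# /etc/galaxy/luks-cryptdev.ini (sketch, not a verbatim copy)
[luks]
cipher_algorithm = aes-xts-plain64
keysize = 256
hash_algorithm = sha256
device = /dev/vdb
cryptdev = crypt
mountpoint = /export
filesystem = ext4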

The script requires superuser rights.

Action  Command              Description
Open    sudo luksctl open    Open the encrypted device, requiring your passphrase.
Close   sudo luksctl close   Close the encrypted device.
Status  sudo luksctl status  Check the device status using dmsetup.

Galaxyctl can be used to wrap luksctl commands:

Action             Command                     Description
Open LUKS volume   sudo galaxyctl open luks    Open the encrypted device, requiring your passphrase.
Close LUKS volume  sudo galaxyctl close luks   Close the encrypted device.
Check LUKS volume  sudo galaxyctl status luks  Check the device status using dmsetup.

Fast-luks script

The fast-luks script is located in /usr/local/bin/fast-luks.

It parses common cryptsetup parameters to encrypt the volume. For this reason, it checks for the cryptsetup and dmsetup packages and installs cryptsetup if it is not present.

Typing sudo fast-luks, the script will load the default parameters and LUKS-format the /dev/vdb device; otherwise, different parameters can be specified.

NB: Run as root.

Argument              Default          Description
-c, --cipher          aes-xts-plain64  Set cipher specification string.
-k, --keysize         256              Set key size in bits.
-a, --hash_algorithm  sha256           For the luksFormat action, specifies the hash used in the LUKS key setup scheme and volume key digest.
-d, --device          /dev/vdb         Set the device to be mounted.
-e, --cryptdev        crypt            Set up a mapping <name> after successful verification of the supplied key (via prompting).
-m, --mountpoint      /export          Set the mount point.
-f, --filesystem      ext4             Set the filesystem.
$ sudo fast-luks --help
=========================================================
                      ELIXIR-Italy
               Filesystem encryption script

A password with at least 8 alphanumeric string is needed
There's no way to recover your password.
Example (automatic random generated passphrase):
                      PcHhaWx4

You will be required to insert your password 3 times:
  1. Enter passphrase
  2. Verify passphrase
  3. Unlock your volume

The connection will be  automatically closed.

=========================================================

fast-luks: a bash script to automate LUKS file system encryption.
 usage: fast-luks [-h]

 optionals argumets:
 -h, --help           show this help text
 -c, --cipher                 set cipher algorithm [default: aes-xts-plain64]
 -k, --keysize                set key size [default: 256]
 -a, --hash_algorithm         set hash algorithm used for key derivation
 -d, --device                 set device [default: /dev/vdb]
 -e, --cryptdev               set crypt device [default: cryptdev]
 -m, --mountpoint             set mount point [default: /export]
 -f, --filesystem             set filesystem [default: ext4]
 --default                    load default values

Cryptsetup howto

The cryptsetup action to set up a new dm-crypt device in LUKS encryption mode is luksFormat:

cryptsetup -v --cipher aes-xts-plain64 --key-size 256 --hash sha256 --iter-time 2000 --use-urandom --verify-passphrase luksFormat crypt --batch-mode

where crypt is the new device, located at /dev/mapper/crypt.

To open and mount to /export an encrypted device:

cryptsetup luksOpen /dev/vdb crypt

mount /dev/mapper/crypt /export

To show LUKS device info:

dmsetup info /dev/mapper/crypt

To umount and close an encrypted device:

umount /export

cryptsetup close crypt

To force LUKS volume removal:

dmsetup remove /dev/mapper/crypt

Note

Run as root.

Change LUKS password

LUKS provides 8 slots for passwords or key files. First, check which of them are in use:

cryptsetup luksDump /dev/<device> | grep Slot

where the output, for example, looks like:

Key Slot 0: ENABLED
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

Then you can add, change or delete chosen keys:

cryptsetup luksAddKey /dev/<device> (/path/to/<additionalkeyfile>)

cryptsetup luksChangeKey /dev/<device> -S 6

As for deleting keys, you have 2 options:

  1. delete any key that matches your entered password:

    cryptsetup luksRemoveKey /dev/<device>
    
  2. delete a key in specified slot:

    cryptsetup luksKillSlot /dev/<device> 6
    

Storage encryption procedure

To encrypt the virtual machine external volume, follow this procedure.

Virtual Machine login

Log in to your machine with:

ssh -i your_private_ssh_key.key galaxy@virtual.machine.ip.address

encrypt procedure 01

Typical IP addresses are 90.147.170.xx, 90.147.102.xx or 90.147.75.xx; the address is reported in the e-mail we sent you. You can copy and paste the command from the e-mail the system sends you.

encrypt procedure 02

You may have to permanently accept the connection, typing “yes”.

encrypt procedure 03

and then enter your SSH passphrase.

encrypt procedure 04

Passphrase creation

You will be automatically prompted into the encryption script. You will be required to insert an alphanumeric password of at least 8 characters. A password is automatically generated as an example; please do not use it for production!

You have to type your password three times:

  1. Inject your password
  2. Confirm your password
  3. Unlock your encrypted volume

Insert your volume encrypt/decrypt password for the first time:

encrypt procedure 05

and confirm it:

encrypt procedure 06

If the passphrases don’t match, restart the procedure.

Unlock the volume

Unlock the encrypted volume by typing your password again:

encrypt procedure 07

The volume is now encrypted and you will be automatically logged out of the VM until Galaxy is installed.

encrypt procedure 08

File System Encryption Test

The following test was executed to verify LUKS volume encryption.

  1. Create two volumes, here named vol1, vol2.

  2. Attach each one to the instance (here listed as /dev/vdd and /dev/vde) and mount them respectively to /export and /export1.

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    ...
    /dev/vdd        976M  2.6M  907M   1% /export
    /dev/vde        976M  2.6M  907M   1% /export1
    
  3. Encrypt /export, i.e. /dev/vdd, using fast-luks (/export is the default value).

    $ df -h
    Filesystem            Size  Used Avail Use% Mounted on
    ...
    /dev/vde              976M  2.6M  907M   1% /export1
    /dev/mapper/jtedehex  990M  2.6M  921M   1% /export
    

    Ensure that /export has the same permissions as the other volume.

    drwxr-xr-x.   3 centos centos 4096 Nov  9 10:27 export
    drwxr-xr-x.   3 centos centos 4096 Nov  9 10:27 export1
    
  4. Put the same file on both volumes:

    $ echo "encryption test" > /export/test.txt
    $ echo "encryption test" > /export1/test.txt
    
  5. Umount all the volumes and luksClose the encrypted one:

    $ sudo cryptsetup luksClose /dev/mapper/jtedehex
    
  6. Create the volume binary image using dd:

    sudo dd if=/dev/vdd of=/home/centos/vdd_out
    2097152+0 records in
    2097152+0 records out
    1073741824 bytes (1.1 GB) copied, 21.809 s, 49.2 MB/s
    
    $ sudo dd if=/dev/vde of=/home/centos/vde_out
    2097152+0 records in
    2097152+0 records out
    1073741824 bytes (1.1 GB) copied, 21.3385 s, 50.3 MB/s
    
  7. HexDump the binary image with xxd:

    $ xxd vdd_out > vdd.txt
    
    $ xxd vde_out > vde.txt
    

    As output you should have:

    $ ls -ltrh
    -rw-r--r--.  1 root   root   1.0G Nov  9 11:19 vdd_out
    -rw-r--r--.  1 root   root   1.0G Nov  9 11:22 vde_out
    -rw-rw-r--.  1 centos centos 4.2G Nov  9 11:32 vdd.txt
    -rw-rw-r--.  1 centos centos 4.2G Nov  9 11:36 vde.txt
    
  8. Grep non-zero bytes and search for the test.txt file content, encryption test:

    $ grep -v "0000 0000 0000 0000 0000 0000 0000 0000" vde.txt > grep_vde.txt
    $ grep "encryption test" grep_vde.txt
    8081000: 656e 6372 7970 7469 6f6e 2074 6573 740a  encryption test.
    
    $ grep -v "0000 0000 0000 0000 0000 0000 0000 0000" vdd.txt > grep_vdd.txt
    $ grep "encryption test" grep_vdd.txt
    $
    

    Note

    It is possible to see the test.txt file content only on the unencrypted volume.

    Moreover, the output file grep_vde.txt is only 73 KB, while the encrypted one, grep_vdd.txt (138 MB), is much larger:

    -rw-rw-r--.  1 centos centos  73K Nov  9 11:46 grep_vde.txt
    -rw-rw-r--.  1 centos centos 138M Nov  9 11:58 grep_vdd.txt
    

We also tried to open the volume while active (LUKS volume opened and mounted, Galaxy running) in the virtual machine, using the cloud controller (as administrator).

Test executed on the cloud controller:

# rbd map volume-3bedc7bc-eaed-466f-9d55-f2c29b44a7b2 --pool volumes
/dev/rbd0

# lsblk -f
NAME          FSTYPE      LABEL UUID                                   MOUNTPOINT
sda
|-sda1        ext4              db06fc46-7231-4189-ba2b-0b0117049680   /boot
|-sda2
|-sda5        swap              e5b98538-8337-4e25-8f82-f97f04258716   [SWAP]
`-sda6        LVM2_member       n4SAgY-GRNy-4Fl2-ROoQ-rRIf-bdBP-QC1B6s
  `-vg00-root ext4              1e3f1ff1-8677-4236-8cb4-07d5cad32441   /
rbd0          crypto_LUKS       c4bee3b9-e0dc-438e-87ae-2a3e491081c0

# mount /dev/rbd0 /mnt/
mount: unknown filesystem type ‘crypto_LUKS’

It is not possible to mount the volume without the user password.

Onedata (beta)

Transparent access to the storage resources is provided through token management (Onedata). Each Galaxy instance uses Onedata spaces to store data.

Get Onedata space support

Galaxy ShedTools

Each Galaxy instance is customizable, through the web front-end, with different sets of pre-installed tools (e.g. SAMtools, BamTools, Bowtie, MACS, RSEM, etc.), exploiting Conda as the default dependency resolver.

New tools are automatically installed using the official GalaxyProject python library Ephemeris.
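
For reference, a tool list like the recipe shown below can also be applied by hand with Ephemeris' shed-tools command (a minimal sketch; the Galaxy URL, API key and file name are placeholders):

# install Ephemeris, then run a recipe against a running instance
pip install ephemeris
shed-tools install -g http://<galaxy_ip> -a <admin_api_key> -t tool_list.yaml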

Current possible pre-sets:

  1. galaxy-epigen: Galaxy ready for NGS analysis.
  2. galaxy-rna-workbench: Galaxy ready for RNA Sequencing analysis (original galaxy flavor here).
  3. galaxy-testing: Galaxy test recipe.

The corresponding recipe on GitHub is downloaded and processed. New flavors can easily be added by creating new recipes on GitHub.

Create and test Galaxy flavors

For each tool you want to install, you must provide the tool name and owner and one of tool_panel_section_id or tool_panel_section_label.

name (required): the name of the tool to install.

owner (required): the owner of the Tool Shed repository from which the tool is being installed.

tool_panel_section_id (required if tool_panel_section_label is not specified): ID of the tool panel section where you want the tool to be installed. The section ID can be found in Galaxy’s shed_tool_conf.xml config file. Note that the specified section must exist in this file; otherwise, the tool will be installed outside any section.

tool_panel_section_label (required if tool_panel_section_id is not specified): display label of the tool panel section where you want the tool to be installed. If it does not exist, this section will be created on the target Galaxy instance (note that this is different than when using the ID). Multi-word labels need to be placed in quotes. Each label will have a corresponding ID created; the ID will be an all-lowercase version of the label, with multiple words joined with underscores (e.g., ‘BED tools’ -> ‘bed_tools’).

tool_shed_url (default: https://toolshed.g2.bx.psu.edu): the URL of the Tool Shed from which the tool should be installed.

revisions (default: latest): a list of revisions of the tool, all of which will be installed (or attempted).

install_tool_dependencies (default: True): whether to install tool dependencies or not.

install_repository_dependencies (default: True): whether to install repository dependencies or not, using classic Tool Shed packages.

For instance, this is the galaxy-testing recipe:

---
api_key: <Admin user API key from galaxy_instance>
galaxy_instance: <Galaxy instance IP>

tools:
- name: fastqc
  owner: devteam
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False

- name: 'bowtie_wrappers'
  owner: 'devteam'
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False

Conda support

Conda is a package manager, like apt-get, yum, pip, brew or guix, and is currently the default dependency resolver in Galaxy.
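For instance, the Conda environments that Galaxy creates to resolve tool dependencies can be listed directly on the instance; this is a hypothetical check, assuming the /export/_conda prefix used elsewhere in this documentation:

$ /export/_conda/bin/conda env list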

Reference Data

Many Galaxy tools rely on the presence of reference data, such as alignment indexes or reference genome sequences, to work efficiently. A complete set of reference data, able to work with the most common tools for NGS analysis, is available for each deployed Galaxy instance.

Available reference data

Reference data (e.g. genomic sequences) are available for many species and shared among all the instances, avoiding unnecessary and costly data duplication, exploiting the CernVM-FS filesystem or Onedata.

Until now, Galaxy administrators have been responsible for downloading, building and installing these important reference data. For example, to make the UCSC hg19 build of the human reference genome available to the Burrows-Wheeler Aligner (BWA) short-read mapper (Li and Durbin, 2009), a Galaxy administrator would need to (i) download the reference genome FASTA file, (ii) make it available as a reference genome via the ‘all_fasta’ table (optional), (iii) build BWA alignment indexes via proper command-line calls, (iv) register the location and availability of the indexes within the ‘bwa_indexes’ data table (by adding an additional entry to the tool-data/bwa_index.loc file on disk) and (v) finally, restart the Galaxy server.
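For illustration, an entry of the tool-data/bwa_index.loc file (step iv) consists of four tab-separated columns: unique build ID, dbkey, display name and path to the index base. A hypothetical hg19 entry could look like:

hg19   hg19   Human (hg19)   /refdata/hg19/bwa_index/hg19.fa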

A complete list of the reference data, with download link, is available here.

Arabidopsis thaliana (TAIR9)
Arabidopsis thaliana (TAIR10)
Drosophila melanogaster (dm3)
Homo sapiens (hg18)
Homo sapiens (hg19)
Homo sapiens (hg38)
Mus musculus (mm9)
Mus musculus (mm10)
Saccharomyces cerevisiae (sacCer3)
Reference data indexes

Reference data indexes are available for the most common tools, e.g. bowtie.

Homo sapiens (hg19)

Available reference indexes:

bowtie
bowtie2
bwa
liftover
rsem: created with rsem v1.3.0, table and GTF file from UCSC (download date: 11 March 2018)
.fa
.2bit
.gtf: download date 11 March 2018

CernVM-FS reference data

The CernVM-File System (cvmfs from now on) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications.

CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). The reference data files and directories are hosted on standard web servers and mounted on the /refdata directory:

$ ls -l /refdata/elixir-italy.galaxy.refdata/
total 60
drwxr-xr-x. 5 cvmfs cvmfs 4096 May 21 20:10 at10
drwxr-xr-x. 5 cvmfs cvmfs 4096 May 21 20:10 at9
drwxr-xr-x. 3 cvmfs cvmfs 4096 May 21 20:10 dm2
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:11 dm3
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:15 hg18
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 18:36 hg19
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:18 hg38
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:22 mm10
drwxr-xr-x. 3 cvmfs cvmfs 4096 May 21 20:22 mm8
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:25 mm9
-rw-r--r--. 1 cvmfs cvmfs   57 May 21 18:31 new_repository
drwxr-xr-x. 3 cvmfs cvmfs 4096 May 21 20:25 sacCer1
drwxr-xr-x. 3 cvmfs cvmfs 4096 May 21 20:25 sacCer2
drwxr-xr-x. 7 cvmfs cvmfs 4096 May 21 20:25 sacCer3
-rw-r--r--. 1 cvmfs cvmfs    0 May 21 18:31 test-content

Cvmfs client setup

Cvmfs is installed by default on each Galaxy instance (CentOS 7 or Ubuntu 16.04) if this is the reference data configuration provided by your service provider.

The elixir-italy.galaxy.refdata.pub public key is installed in /etc/cvmfs/keys/. The /etc/cvmfs/default.local file is also already configured.

The cvmfs_config probe command mounts the cvmfs volume on /cvmfs; therefore, the mount command is used to mount the cvmfs volume on /refdata instead.

check configuration: cvmfs_config chksetup
mount volume: mount -t cvmfs elixir-italy.galaxy.refdata /refdata/elixir-italy.galaxy.refdata
umount volume: cvmfs_config umount elixir-italy.galaxy.refdata
reload repository: cvmfs_config reload elixir-italy.galaxy.refdata

Note

If mount fails, try to restart autofs with sudo service autofs restart.

Note

Cvmfs commands require root privileges.

Cvmfs mount output:

$ sudo mount -t cvmfs elixir-italy.galaxy.refdata /refdata/elixir-italy.galaxy.refdata
CernVM-FS: running with credentials 994:990
CernVM-FS: loading Fuse module... done

$ ls /refdata/elixir-italy.galaxy.refdata/
at10  at9  dm2  dm3  hg18  hg19  hg38  mm10  mm8  mm9  new_repository  sacCer1  sacCer2  sacCer3  test-content

Cvmfs server location

Current cvmfs server configuration:

cvmfs repository name: elixir-italy.galaxy.refdata
cvmfs server url: 90.147.102.186
cvmfs key file: elixir-italy.galaxy.refdata.pub
cvmfs proxy url: DIRECT
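Assembling the values above, the client configuration in /etc/cvmfs/default.local might look like the following sketch (the server URL pattern is an assumption based on the standard cvmfs layout):

CVMFS_REPOSITORIES=elixir-italy.galaxy.refdata
CVMFS_SERVER_URL=http://90.147.102.186/cvmfs/elixir-italy.galaxy.refdata
CVMFS_HTTP_PROXY=DIRECT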

Troubleshooting

Cvmfs not running, e.g. after reboot:

$ sudo mount -t cvmfs elixir-italy.galaxy.refdata /refdata/elixir-italy.galaxy.refdata
CernVM-FS: running with credentials 994:990
CernVM-FS: loading Fuse module... Failed to initialize root file catalog (16 - file catalog failure)

Reloading the configuration fixes the problem (see https://wiki.chipp.ch/twiki/bin/view/CmsTier3/IssueCvmfsFailsToMount):

$ sudo cvmfs_config reload elixir-italy.galaxy.refdata
Connecting to CernVM-FS loader... done
Entering maintenance mode
Draining out kernel caches (60s)
Blocking new file system calls
Waiting for active file system calls
Saving inode tracker
Saving chunk tables
Saving inode generation
Saving open files counter
Unloading Fuse module
Re-Loading Fuse module
Restoring inode tracker...  done
Restoring chunk tables...  done
Restoring inode generation...  done
Restoring open files counter...  done
Releasing saved glue buffer
Releasing chunk tables
Releasing saved inode generation info
Releasing open files counter
Activating Fuse module

Cvmfs server details

Cvmfs relies on OverlayFS or AUFS as its storage driver. Since Ubuntu 16.04 natively supports OverlayFS, it is used as the default choice to create and populate the cvmfs server.

A resign script is located in /usr/local/bin/Cvmfs-stratum0-resign and the corresponding weekly cron job is set in /etc/cron.d/cvmfs_server_resign.

The log file is located in /var/log/Cvmfs-stratum0-resign.log.
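A minimal sketch of what such a cron entry could look like (the weekly schedule shown here is hypothetical):

# /etc/cron.d/cvmfs_server_resign - resign the repository every Sunday at 03:00
0 3 * * 0 root /usr/local/bin/Cvmfs-stratum0-resign >> /var/log/Cvmfs-stratum0-resign.log 2>&1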

Onedata reference data (beta)

To Be Updated

Reference data local download

The reference data set can be downloaded on your machine. This option is not explicitly available, by default, on the service web interface, to avoid needless replication and costly virtual disk space consumption.

Nevertheless, it is still possible to download the reference data through ansible and automatically configure Galaxy to use them: Galaxycloud-refdata.

Cluster support (SLURM)

The service provides support for virtual clusters through a dedicated section of the web front-end, allowing users to instantiate Galaxy with SLURM (slurm.schedmd.com) as resource manager and to customize the number of virtual worker nodes and the virtual hardware of both nodes and master.

Automatic elasticity, provided using the CLUES INDIGO service component (INDIGO CLUES), enables dynamic cluster resource scaling, deploying and powering on new worker nodes depending on the cluster workload and powering them off when no longer needed. This provides an efficient use of the resources, making them available only when really needed.

Each node is configured according to the Galaxy tools installed on the VM, as selected by the user during the configuration phase. All tool pre-sets are tested to work with the Galaxy elastic cluster.

Conda packages used to solve Galaxy tool dependencies are stored in the /export/_conda directory and shared between front-end and worker nodes.

The service is scalable and both users and service providers can choose among a full range of computational capabilities: from limited ones, serving e.g. small research groups, Galaxy developers or didactic and training purposes, to instances with elastic cluster support, delivering enough computational power for any kind of bioinformatic analysis and number of users, opening the route for the migration of public Galaxy instances to this service.

Shared file system

The current cluster configuration foresees two paths shared between front-end and worker nodes:

  1. /home where galaxy is installed.
  2. /export where galaxy input and output datasets are hosted.

Note

NFS exports configuration file: /etc/exports
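A minimal sketch of the corresponding /etc/exports entries, assuming a hypothetical worker subnet:

/home    10.0.0.0/24(rw,sync,no_root_squash)
/export  10.0.0.0/24(rw,sync,no_root_squash)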

Nodes deployment

Warning

Worker node deployment takes ~10 minutes; your job will run once it completes. If the node is already deployed, the job starts immediately.

This is due to:

  1. VM configuration
  2. Tools dependencies installation
  3. CernVM-FS configuration
  4. SLURM installation and configuration

SLURM main commands

Please have a look here for a summary of SLURM main commands: https://www.rc.fas.harvard.edu/resources/documentation/convenient-slurm-commands/
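For quick reference, a few of the most common SLURM commands (standard SLURM tools, fully described at the link above):

sinfo                  # view partition and node state
squeue                 # list running and pending jobs
scontrol show node     # show detailed node information
scancel <jobid>        # cancel a job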

Authentication

Currently, the authentication system relies on INDIGO-AAI.

To login into the portal, select the Sign in section on top-right:

FGW Sign In

Registration

You need to register to the portal at first login. Register with your preferred username or using Google authentication.

Select login method

Fill the registration form using a valid e-mail address:

registration form

and accept the usage policy to complete the registration:

policy form

A confirmation e-mail is then sent to your e-mail address:

confirmation alert

You don’t need to reply to this e-mail; just follow the instructions and open the link it contains.

confirmation mail
success alert

Once confirmed, your request has to be approved by the site administrators. This usually does not require too much time.

Once your request is approved, you will be notified by mail and asked to insert your password.

success mail
change password
change password success

Finally, at the first login, you have to allow the Laniakea portal to acquire your login information:

mitre authorization

Login

To login into the portal, select the Sign in section on top-right:

FGW Sign In

Then insert your credentials, or log in using the authentication provider, such as Google, that you used during the registration procedure.

Select login method

Finally, you can access the portal sections and instantiate Galaxy:

FGW portal

Validation

Warning

The validation package for automatic tool testing is under development and is not currently available.

Testing module for Galaxy workflows: https://github.com/phnmnl/wft4galaxy

Galaxyctl libraries

Galaxyctl is a collection of python scripts for Galaxy management (first start, stop/start/restart/status). Moreover, it makes it possible to manage, through specific scripts, LUKS volumes and Onedata spaces.

Galaxyctl requires superuser privileges.

Current version: 0.0.1

galaxyctl_libs: python libraries for uWSGI socket and stats server management, LUKS volume and Onedata space management.
galaxyctl: Galaxy management script; it integrates the luksctl and onedatactl commands.
luksctl: LUKS volume management script.
onedatactl: management of Onedata spaces for user data and reference data.

Galaxyctl_libs is composed of several modules.

Dependencies

Galaxyctl_libs depends on uWSGI for Galaxy management (i.e. there is currently no run.sh support). Moreover, lsof is needed to check listening ports.

uwsgi

lsof

DetectGalaxyCommands

Parses Galaxy stop/start/restart/status commands. Currently it supports supervisord and systemd/upstart.

UwsgiSocket

Gets the uWSGI socket from the galaxy.ini config file (e.g. 127.0.0.1:4001) and, using lsof, returns the uWSGI master PID.

master_pid, stderr, status = UwsgiSocket(fname='/home/galaxy/galaxy/config/galaxy.ini').get_uwsgi_master_pid()

UwsgiStatsServer

Reads the uWSGI stats server JSON. The stats server is the last component which uWSGI runs during the Galaxy start procedure: when the stats server is ready, Galaxy is ready to accept requests. The stats server address and port can be specified, but the class is also able to read the galaxy.ini file to recover the stats information. By reading the stats JSON, the class can detect whether the uWSGI workers accept requests or not.

server: uWSGI stats server address, e.g. 127.0.0.1.
port: uWSGI stats server port, e.g. 9191.
timeout: wait time, in seconds, for the stats server to start. If Galaxy is starting, a timeout of 300 seconds is reasonable, while if Galaxy is already running 5 seconds are enough.
fname: Galaxy config file, e.g. /home/galaxy/galaxy/config/galaxy.ini.

GetUwsgiStatsServer

To connect to a running uWSGI stats server, call:

stats = UwsgiStatsServer(timeout=300, fname='/home/galaxy/galaxy/config/galaxy.ini')
socket = stats.GetUwsgiStatsServer()

GetUwsgiStatsServer

To check if at least one uWSGI worker accepts requests, call:

stats = UwsgiStatsServer(timeout=300, fname='/home/galaxy/galaxy/config/galaxy.ini')
status = stats.GetUwsgiStatsServer('/home/galaxy/galaxy/config/galaxy.ini')

GetBusyList

To get the list of busy uWSGI workers:

stats = UwsgiStatsServer(timeout=5, fname='/home/galaxy/galaxy/config/galaxy.ini')
busy_list = stats.GetBusyList()

LUKSCtl

Reads the LUKS ini file, usually stored in /etc/galaxy/luks-cryptdev.ini, for LUKS volume management. Open, close and status commands are managed through the luksctl script.

OneDataCtl

Reads the Onedata ini file, usually stored in /etc/galaxy/onedatactl.ini, for Onedata space management, covering both user data and reference data.

mount_space

To mount an Onedata space (userdata or refdata), call:

onedata = OneDataCtl('/etc/galaxy/onedatactl.ini', 'userdata')
onedata.mount_space()

umount_space

To unmount an Onedata space, call:

onedata = OneDataCtl('/etc/galaxy/onedatactl.ini', 'userdata')
onedata.umount_space()

Galaxyctl: Galaxy management

Galaxyctl is a collection of python scripts used for Galaxy management. In particular, it exploits galaxyctl_libs to query the uWSGI stats server and correctly retrieve the status of Galaxy and of the uWSGI workers. Moreover, the script allows managing, with the same command, LUKS volumes and Onedata spaces, by parsing luksctl and onedatactl commands.

Since the script parses supervisorctl or systemd commands, it needs to be run as superuser.

galaxy module:

status: show Galaxy status.

stop: stop Galaxy. --force checks the uwsgi master process: if it is still running after Galaxy stops, it is killed.

start: start Galaxy. --force forces Galaxy to start by restarting it. --retry allows specifying the number of restart attempts (default: 5). --timeout allows customizing the uWSGI stats server wait time. These options are used during Galaxy instantiation and you should not use them in production.

restart: restart Galaxy. The same --force, --retry and --timeout options described for start are available.

startup: used only to run Galaxy for the first time; you should not use it in production. --retry allows specifying the number of restart attempts (default: 5). --timeout allows customizing the uWSGI stats server wait time.

luks module:

status: show LUKS volume status.

open: luksOpen the encrypted volume.

close: luksClose the encrypted volume.

onedata module:

status: show Onedata space status. --userdata checks the user data space if mounted through oneclient; --refdata checks the reference data space if mounted through oneclient.

mount: mount an Onedata space using oneclient. Use --userdata or --refdata to select the user data or reference data volume.

umount: unmount an Onedata space using fusermount.

Galaxyctl basic usage

The script requires superuser privileges. Its basic commands are:

Action Command
Start Galaxy sudo galaxyctl start galaxy
Stop Galaxy sudo galaxyctl stop galaxy
Restart Galaxy sudo galaxyctl restart galaxy
Check Galaxy Status sudo galaxyctl status galaxy

Logging

Logs are stored in the /var/log/galaxy/galaxyctl.log file.

Advanced options

stop

To stop galaxy:

sudo galaxyctl stop galaxy

The script checks the uWSGI stats server to retrieve the workers’ PIDs and their status. If, after uWSGI stops, workers are still up and running, they are killed, allowing Galaxy to correctly start next time. The --force option allows killing the uwsgi master process if it is still alive after Galaxy stops (in case of uwsgi FATAL error or ABNORMAL TERMINATION). Please check the Galaxy logs before running with the --force option.

start

To start Galaxy:

sudo galaxyctl start galaxy

Once Galaxy is started, galaxyctl waits for and checks the uWSGI stats server. Since it is the last software component loaded, this ensures that Galaxy has correctly started. The script also checks that at least one uWSGI worker has correctly started and is accepting requests.

If no workers are available you have to restart Galaxy. Galaxyctl can automatically restart Galaxy if the --force option is specified, restarting it until the workers are correctly loaded. The number of retries is set, by default, to 5 and can be customized using the --retry option, e.g. --retry 10. These options were not designed for production, but are used only during the VM instantiation phase to ensure Galaxy can correctly start.
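For instance, a start allowing up to 10 automatic restart attempts, using the options documented above:

sudo galaxyctl start galaxy --force --retry 10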

restart

To restart Galaxy:

sudo galaxyctl restart galaxy

The options --force, --timeout and --retry are available for the restart command too.

Galaxy first start

Galaxy takes longer to start the first time. Since the uWSGI stats server is the last software component started, the script waits for it to ensure that Galaxy has correctly started. Then the uWSGI workers are checked to ensure Galaxy is accepting requests; if not, uWSGI is restarted. Currently, before raising an error, the script tries to restart Galaxy 5 times, while the waiting time is set to 600 seconds. The command used in the /usr/local/bin/galaxy-startup script is:

galaxyctl startup galaxy -c /home/galaxy/galaxy/galaxy.ini -t 600

LUKS module

By parsing the luksctl script and using galaxyctl_libs, galaxyctl is able to manage LUKS volumes:

Action Command
Open LUKS volume sudo galaxyctl open luks
Close LUKS volume sudo galaxyctl close luks
Check LUKS volume sudo galaxyctl status luks

In particular, to unlock your LUKS volume:

sudo galaxyctl open luks

You will then be asked to insert your LUKS passphrase. For instance:

(.venv) [galaxy@galaxy-indigo-test ~]$ sudo galaxyctl open luks
Enter passphrase for /dev/disk/by-uuid/42aaf979-6351-44e9-97ee-19e7f8c5e9f6:

Onedata module

By parsing the onedatactl script and using galaxyctl_libs, galaxyctl is able to manage Onedata user data and reference data spaces.

User data:

mount: sudo galaxyctl mount onedata --userdata

umount: sudo galaxyctl umount onedata --userdata

status: sudo galaxyctl status onedata --userdata

Reference data:

mount: sudo galaxyctl mount onedata --refdata

umount: sudo galaxyctl umount onedata --refdata

status: sudo galaxyctl status onedata --refdata

The options --userdata and --refdata are mutually exclusive.

Configuration files

Supervisord and systemd/upstart are supported to start/stop/restart/status Galaxy. The init system can be set using the init_system variable; two values are currently allowed: supervisord and init.

supervisord: the current default; it is mandatory for Docker containers, since there is no systemd on Docker images.

init: CentOS 7 and Ubuntu 16.04 use systemd, while Ubuntu 14.04 uses upstart.

Through the galaxyctl_libs.DetectGalaxyCommands method, the script automatically retrieves the right command to be used and is compatible with both CentOS 7 and Ubuntu (14.04 and 16.04).

If supervisord is used to manage Galaxy (our default choice), its configuration file has to be specified using the supervisord_conf_file variable. On CentOS:

supervisord_conf_file = '/etc/supervisord.conf'

while on Ubuntu:

supervisord_conf_file = '/etc/supervisor/supervisord.conf'

Galaxyctl needs galaxy.ini to retrieve uWSGI stats server information, through the variable:

galaxy_config_file = '/home/galaxy/galaxy/config/galaxy.ini'

For LUKS volume configuration, the script reads our custom luks-cryptdev.ini file (stored in /etc/galaxy) and needs the luksctl script path (usually /usr/local/bin) to load its methods:

luks_config_file = '/etc/galaxy/luks-cryptdev.ini'
luksctl_path = '/usr/local/bin'

Finally, for Onedata space management, the onedatactl.ini file (stored in /etc/galaxy) and the onedatactl path (usually /usr/local/bin) are needed:

onedatactl_config_file = '/etc/galaxy/onedatactl.ini'
onedatactl_path = '/usr/local/bin'

Luksctl: LUKS volumes management

Luksctl is a python script allowing to easily open, close and check LUKS-encrypted volumes, parsing dmsetup and cryptsetup commands.

The script requires superuser rights.

Action Command
Open sudo luksctl open
Close sudo luksctl close
Status sudo luksctl status

Dependencies

Since the script parses cryptsetup and dmsetup commands, both are required:

cryptsetup

dmsetup

Open LUKS volumes

To open a LUKS volume, call luksctl open, which will ask for your LUKS passphrase:

$ sudo luksctl open
Enter passphrase for /dev/disk/by-uuid/9bc8b7c6-dc7e-4aac-9cd7-8b7258facc75:
Name:              ribqvkjj
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      252, 1
Number of targets: 1
UUID: CRYPT-LUKS1-9bc8b7c6dc7e4aac9cd78b7258facc75-ribqvkjj


Encrypted volume: [ OK ]

Close LUKS volumes

To close a LUKS volume, call luksctl close:

$ sudo luksctl close
Encrypted volume umount: [ OK ]

LUKS volumes status

To check whether a LUKS volume is open or not, call luksctl status:

$ sudo luksctl status
Name:              ribqvkjj
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      252, 1
Number of targets: 1
UUID: CRYPT-LUKS1-9bc8b7c6dc7e4aac9cd78b7258facc75-ribqvkjj


Encrypted volume: [ OK ]

Onedatactl: Onedata spaces management

Onedatactl is a python script to mount Onedata spaces for user data and reference data. The script builds the oneclient options using a configuration file (onedatactl.ini) to retrieve the required Onedata space information (e.g. oneprovider, token and mountpoint).

Options

Parsing all oneclient options is outside the purpose of this script. Therefore, the script provides only a few options to allow users to easily mount their data hosted on Onedata:

Option Description
mount Mount user data or reference data space.
umount Umount user data or reference data space.
-t --token Set access token.
-H --provider Set onedata provider.
-m --mountpoint Set mountpoint.
-c config_file Load configuration file.
--insecure Allow insecure mount.
--nonempty Mount space on non empty space.
--version Show galaxyctl_libs version.
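For example, assuming the mount action accepts these flags together with the space type (the provider and token below are placeholders), a user data space could be mounted overriding the configuration file:

sudo onedatactl mount userdata -H oneprovider2.cloud.ba.infn.it -t <ACCESS_TOKEN> -m /export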

The onedatactl.ini file

The onedatactl.ini file is located in /etc/galaxy/onedatactl.ini. The ini file has two sections: [userdata] for user data management and [refdata] for reference data management.

mountpoint

By enabling this option, the Onedata space will be used to store user data and it will be mounted on the specified directory, e.g. /export. This argument is required.

mountpoint = /export

provider

Insert the Onedata provider for user data, e.g. oneprovider2.cloud.ba.infn.it. This argument is required.

provider = oneprovider2.cloud.ba.infn.it

token

Insert Onedata token for user data. This argument is required.

token = MDAxNWxvY2F00aW9uIG9uZXpvbmUKMDAzYmlkZW500aWZpZXIgeExqMi00xdFN3YVp1VWIxM1dFSzRoNEdkb2x3cXVwTnpSaGZONXJSN2tZUQowMDFhY2lkIHRpbWUgPCAxNTI1MzM00NzgyCjAwMmZzaWduYXR1cmUgIOzeMtypO75nZvPJdAocInNbgH9zvJi6ifgXDrFVCr00K

insecure

Allow insecure connection. This is an optional argument.

insecure = False

nonempty

Allow mounting the space on a non-empty directory. This is an optional argument.

nonempty = False
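Putting the documented options together, a complete onedatactl.ini might look like the following sketch, assuming the [refdata] section takes the same keys (providers and tokens are placeholders):

[userdata]
mountpoint = /export
provider = oneprovider2.cloud.ba.infn.it
token = <your_access_token>
insecure = False
nonempty = False

[refdata]
mountpoint = /refdata
provider = <your_refdata_oneprovider>
token = <your_refdata_access_token>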

Onedatactl options

The script allows mounting two different volumes.

mount

For user data, call:

sudo onedatactl mount userdata

For reference data, call:

sudo onedatactl mount refdata

umount

For user data, call:

sudo onedatactl umount userdata

For reference data, call:

sudo onedatactl umount refdata

status

For user data, call:

sudo onedatactl status userdata

For reference data, call:

sudo onedatactl status refdata

Oneclient

Onedatactl builds the oneclient command using the information stored in the /etc/galaxy/onedatactl.ini file. Therefore, oneclient can be used directly to perform the same actions.

The basic command line syntax to mount spaces using a specific Oneprovider is:

oneclient -H <PROVIDER_HOSTNAME> -t <ACCESS_TOKEN> <MOUNT_POINT>
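For instance, reusing the example Oneprovider appearing elsewhere in this documentation (the token is a placeholder):

oneclient -H oneprovider2.cloud.ba.infn.it -t <ACCESS_TOKEN> /export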

In order to unmount your spaces, type:

oneclient -u MOUNT_POINT

The complete Oneclient documentation is located here: https://onedata.org/docs/doc/using_onedata/oneclient.html

Troubleshooting

If you are connecting to a provider service which does not have a globally trusted certificate, you will have to use the --insecure option.

Frequently Asked Questions

Laniakea FAQs.

Recover Galaxy after Virtual Machine reboot

How to correctly restart Galaxy after a reboot of the Virtual Machine?

After the boot procedure of your VM, at least these services should already be up and running:

  1. PostgreSQL
  2. Proftpd
  3. NGINX

Trying to connect to your Galaxy instance IP with a web browser you should see:

Galaxy page

Step 1: Unlock encrypted storage

If your instance is not mounted on encrypted storage, please skip this and go to Step 2: One-command procedure.

To unlock the volume, connect via SSH and type the command sudo /usr/local/bin/luksctl open, then enter your passphrase.

$ sudo /usr/local/bin/luksctl open
Enter passphrase for /dev/disk/by-uuid/4ba96890-f914-46aa-b7a6-7e71f0846f43:

Name:              jzwoejuw
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      252, 0
Number of targets: 1
UUID: CRYPT-LUKS1-4ba96890f91446aab7a67e71f0846f43-jzwoejuw

Encrypted volume: [ OK ]

Step 2: One-command procedure

To start Galaxy you can just copy and paste this:

$ sudo su -c "wget -O - https://raw.githubusercontent.com/Laniakea-elixir-it/Scripts/master/galaxy/recover.sh | bash" root

Probing /cvmfs/elixir-italy.galaxy.refdata... OK
Contacting Galaxy (wait for 10 seconds)...
Galaxy server on-line: [ OK ]

Warning

This may require a few minutes, but should safely restart your Galaxy.

Galaxy page

Advanced Step-by-step procedure

Warning

Please follow these instructions only if you know what you are doing!

Mount reference data

After unlocking the encrypted storage (see section Step 1: Unlock encrypted storage), if you have it, you now need to mount your reference datasets:

$ sudo systemctl restart autofs
$ sudo cvmfs_config killall

Terminating cvmfs_config processes... OK
Terminating cvmfs2 processes... OK
Unmounting stale mount points... OK
Cleaning up run-time variable data... OK
Reloading autofs... OK
$ sudo cvmfs_config probe

Probing /cvmfs/elixir-italy.galaxy.refdata... OK

Start Galaxy

Finally, to correctly start all Galaxy services, run sudo /usr/local/bin/galaxy-startup:

$ sudo /usr/local/bin/galaxy-startup

Loading Galaxy environment
Exporting environment variables
Check if galaxy is up and running, to avoid unuseful restart
status: 502
Galaxy unreachable
Check is supervisord is running
Starting the Galaxy production environment
/usr/lib/python2.7/site-packages/supervisor/options.py:383: PkgResourcesDeprecationWarning: Parameters to load are deprecated.  Call .resolve and .require separately.
return pkg_resources.EntryPoint.parse("x="+spec).load(False)

Galaxy start: [ OK ]

Galaxycloud Ansible Roles

Ansible automates the installation and configuration of Galaxy and its companion services (PostgreSQL, NGINX, uWSGI, Proftpd) using YAML syntax.

These roles make extensive use of Ansible modules, which do the actual work in ansible and are what gets executed in each playbook task. Furthermore, a collection of python scripts for Galaxy advanced configuration is used (run by ansible).

All roles can be easily installed through ansible-galaxy, for instance:

ansible-galaxy install indigo-dc.galaxycloud

Ansible roles documentation

Galaxycloud

Install Galaxy Production environment. This role has been specifically developed to be used for the ELIXIR-IIB use case in the INDIGO-DataCloud project.

Requirements

This ansible role supports CentOS 7, Ubuntu 14.04 Trusty and Ubuntu 16.04 Xenial

Note

Minimum ansible version: 2.1.2.0

Role Variables
Path

galaxy_instance_description: set Galaxy brand

galaxy_user: set linux user to launch the Galaxy portal (default: galaxy).

GALAXY_UID: set user UID (default: 4001).

galaxy_FS_path: path to install Galaxy (default: /home/galaxy).

galaxy_directory: Galaxy directory (usually galaxy or galaxy-dist, default galaxy).

galaxy_install_path: Galaxy installation directory (default: /home/galaxy/galaxy).

galaxy_config_path: Galaxy config path location.

galaxy_config_file: Galaxy primary configuration file.

galaxy_venv_path: Galaxy virtual environment directory (usually located in <galaxy_install_path>/.venv).

galaxy_custom_config_path: Galaxy custom configuration files path (default: /etc/galaxy).

galaxy_custom_script_path: Galaxy custom script path (default: /usr/local/bin).

galaxy_log_path: log file directory (default: /var/log/galaxy).

galaxy_instance_url: instance url (default: http://<ipv4_address>/galaxy/).

galaxy_instance_key_pub: instance ssh public key to configure <galaxy_user> access.

galaxy_lrms: enable Galaxy virtual elastic cluster support. Currently supported: local and slurm (default: local, possible values: local, slurm).

main options

GALAXY_VERSION: set Galaxy version (e.g. master, release_17.01, release_17.05…).

create_galaxy_admin: if true the administrator user will be created (default: true).

GALAXY_ADMIN_USERNAME: Galaxy administrator username.

GALAXY_ADMIN_PASSWORD: Galaxy administrator password.

GALAXY_ADMIN_API_KEY: Galaxy administrator API_KEY. https://wiki.galaxyproject.org/Admin/API. Please note that this key acts as an alternate means to access your account, and should be treated with the same care as your login password. To be changed by the administrator (default value: GALAXY_ADMIN_API_KEY).

GALAXY_ADMIN_EMAIL: Galaxy administrator e-mail address

Galaxy configuration

export_dir: Galaxy userdata are stored here (default: /export).

tool_deps_path: change tool dependency directory (default: {{ export_dir }}/tool_deps)

use_conda: enable Conda (default: true).

job_work_dir: change job_working_dir path. Due to a current limitation in conda, the total length of the conda_prefix and the job_working_directory path should be less than 50 characters! (default: {{ export_dir }}/job_work_dir).

conda_prefix: change conda prefix directory (default: {{ export_dir }}/_conda).

conda_channels: change conda channels (default: iuc,bioconda,r,defaults,conda-forge).

update_ucsc: update the UCSC genome database (default: true). A monthly cron job is added to keep the ucsc genome db updated.

fast_update: force database update by copying cached files (default: true).

use_pbkdf2: enable pbkdf2 cryptography (default: true).

Postgres database details

postgresql_version: set postgres version to be installed (current default: 9.6).

galaxy_db_dir: change galaxy database directory to store jobs results (default: {{export_dir}}/galaxy/database).

galaxy_db_port: set postgres port (default: 5432).

galaxy_db_passwd: set the database password. By default a random 20-character password is generated.

set_pgsql_random_password: if set to false the role takes the password specified through galaxy_db_passwd variable (default: true).

NGINX

nginx_upload_store_path: set nginx upload dataset directory (default: {{galaxy_db_dir}}/tmp/nginx_upload_store).

https mode

The Galaxy portal runs through an nginx http proxy by default. The following variables enable you to set nginx in https mode:

nginx_https: true
ssl_cert: /etc/certs/cert.pem
ssl_key: /etc/certs/key.pem
ssl_dhparam: /etc/certs/dhparam.pem

If nginx_https is set to true, the other ssl variables are required. You can either request a signed trusted certificate or generate a self-signed certificate. An ansible role to generate self-signed certificates can be found at https://galaxy.ansible.com/LIP-Computing/ssl-certs/
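For testing purposes, a self-signed certificate and DH parameters matching the paths above could be generated along these lines (plain openssl commands, shown as an alternative sketch to the linked ansible role):

$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/certs/key.pem -out /etc/certs/cert.pem
$ sudo openssl dhparam -out /etc/certs/dhparam.pem 2048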

PROFTPD

proftpd_welcome: set proftpd welcome message (default: galaxy ftp server).

proftpd_conf_path: set proftpd configuration file path.

proftpd_db_user: set proftpd database user (default: galaxyftp).

proftpd_db_passwd: set the postgresql database password. By default a random 20-character password is generated.

proftpd_files_path: set proftpd upload directory (default: {{galaxy_db_dir}}/ftp).

proftpd_ftp_port: set proftpd port (default: 21).

proftpd_passive_port_low: set passive port range minimum (default: 30000).

proftpd_passive_port_high: set passive port range maximum (default: 40000).

set_proftpd_random_password: if set to false the role takes the password specified through proftpd_db_passwd variable (default: true).

Init system

Currently this role supports supervisord and systemd/upstart to start Galaxy services. init_type: if set to supervisord, supervisord is used to manage Galaxy; if set to init, systemd/upstart is used to start Galaxy.

It is possible to exploit supervisord to manage PostgreSQL, NGINX and Proftpd by setting the following variables to true (to run this role on a Docker container you have to set all of them to true).

supervisor_manage_postgres: enable supervisord postgresql management (default: false).

supervisor_manage_nginx: enable supervisord nginx management (default: false).

supervisor_manage_proftpd: enable supervisord proftpd management (default: false).

Advanced storage configuration

enable_storage_advanced_options: this option, false by default, has to be set to true only if you run the ansible role indigo-dc.galaxycloud-os for advanced path configuration, onedata and filesystem encryption support. More details here: Galaxycloud-os (default: false).

Example Playbook

An example of how to use the role, with variables passed in as parameters:

- hosts: servers
  roles:
     - role: indigo-dc.galaxycloud
       GALAXY_ADMIN_EMAIL: "admin@elixir-italy.org"
       GALAXY_ADMIN_USERNAME: "admin"
       GALAXY_VERSION: "release_17.05"
       galaxy_instance_key_pub: "your_public_key"
       galaxy_instance_description: "INDIGO-CNR test"

Install Galaxy setting postgresql passwords:

- hosts: servers
  roles:
     - role: indigo-dc.galaxycloud
       GALAXY_ADMIN_EMAIL: "admin@elixir-italy.org"
       GALAXY_ADMIN_USERNAME: "admin"
       GALAXY_VERSION: "release_17.05"
       galaxy_instance_key_pub: "your_public_key"
       galaxy_instance_description: "INDIGO-CNR test"
       set_pgsql_random_password: false
       galaxy_db_passwd: 'galaxy'
       set_proftpd_random_password: false
       proftpd_db_passwd: 'galaxy'

Setup of a Galaxy Docker container. The role, using ansible, automatically recognizes the virtual platform (virtual machine or Docker container).

- hosts: servers
  roles:
     - role: indigo-dc.galaxycloud
       GALAXY_ADMIN_EMAIL: "admin@elixir-italy.org"
       GALAXY_ADMIN_USERNAME: "admin"
       GALAXY_VERSION: "release_17.05"
       galaxy_instance_key_pub: "your_public_key"
       galaxy_instance_description: "INDIGO-CNR test"
       supervisor_manage_postgres: "True"
       supervisor_manage_nginx: "True"
       supervisor_manage_proftpd: "True"
License

Apache Licence v2

Galaxycloud-os

This role provides advanced storage options for Galaxy instances.

Warning

Run indigo-dc.galaxycloud-os before indigo-dc.galaxycloud, setting the variable enable_storage_advanced_options to true.

It is possible to select three different storage options using the os_storage ansible role variable.

IaaS: an IaaS block storage volume is attached to the instance and Galaxy is configured.

onedata: an Onedata space is mounted through oneclient and Galaxy is configured.

encryption: an IaaS block storage volume is encrypted with the aes-xts-plain64 algorithm using LUKS.

Galaxy path configuration is then correctly set, depending on the selected storage solution, replacing the indigo-dc.galaxycloud path recipe (with enable_storage_advanced_options set to true).

The role exploits galaxyctl_libs (see Galaxyctl libraries) for LUKS and Onedata volume management.

LUKS encryption

For a detailed description of LUKS encryption used and scripts, see section Storage encryption.

Dependencies

For LUKS encryption, the ansible role installs cryptsetup.

For onedata reference data provider, the role depends on indigo-dc.oneclient role, to install oneclient:

- hosts: servers
  roles:
    - role: indigo-dc.oneclient
      when: os_storage == 'onedata'
Variables

The Galaxy path variables are the same as in indigo-dc.galaxycloud.

Path

galaxy_user: set linux user to launch the Galaxy portal (default: galaxy).

GALAXY_UID: set user UID (default: 4001).

galaxy_FS_path: path to install Galaxy (default: /home/galaxy).

galaxy_directory: Galaxy directory (usually galaxy or galaxy-dist, default galaxy).

galaxy_install_path: Galaxy installation directory (default: /home/galaxy/galaxy).

export_dir: Galaxy userdata are stored here (default: /export).

galaxy_custom_config_path: Galaxy custom configuration files path (default: /etc/galaxy).

galaxy_custom_script_path: Galaxy custom script path (default: /usr/local/bin).

galaxy_log_path: log file directory (default: /var/log/galaxy).

galaxy_instance_key_pub: instance ssh public key to configure <galaxy_user> access.

galaxy_lrms: enable Galaxy virtual elastic cluster support. Currently supported: local and slurm (default: local, possible values: local, slurm).

Main options

GALAXY_ADMIN_EMAIL: Galaxy administrator e-mail address

Isolation specific vars

os_storage: takes three possible values:

  1. IaaS: standard IaaS block storage volume.
  2. onedata: an Onedata space is mounted for user data.
  3. encryption: an IaaS block storage volume encrypted with aes-xts-plain64 is mounted (see the table above).
Onedata

onedata_dir: onedata mountpoint (default: /onedata).

Note

Once the Onedata space is mounted, files existing before the mount operation will not be available until the volume is unmounted. For this reason we set the mountpoint to a different path, /onedata.

onedatactl_config_file: set onedatactl config file (default: {{ galaxy_custom_config_path }}/onedatactl.ini).

userdata_oneprovider: set onedata oneprovider.

userdata_token: set onedata access token.

userdata_space: set space name.

Encryption

luks_lock_dir: set luks lock file directory (default: /var/run/fast_luks).

luks_success_file: set success file. It signals to ansible to proceed (default: /var/run/fast-luks.success).

luks_log_path: set LUKS log path (default: /var/log/galaxy).

luks_config_file: set luksctl configuration file (default: /etc/galaxy/luks-cryptdev.ini).

wait_timeout: time to wait for the encryption password (default: 5 hours).

mail_from: set mail from field (default: GalaxyCloud@elixir-italy.org).

mail_subject: subject of the e-mail sent to the user with the instructions to access and encrypt the volume (default: [ELIXIR-ITALY] GalaxyCloud encrypt password).

LUKS specific variables

cipher_algorithm: set cipher algorithm (default: aes-xts-plain64).

keysize: set key size (default: 256).

hash_algorithm: set hash algorithm (default: sha256).

device: set the device to mount (default: /dev/vdb).

cryptdev: set the device mapper name (default: crypt).

mountpoint: set the mount point. Usually the same as export_dir (default: {{ export_dir }}).

filesystem: set file system (default: ext4).

Useful references on device encryption and drive preparation:

  1. https://wiki.archlinux.org/index.php/Dm-crypt/Device_encryption
  2. https://wiki.archlinux.org/index.php/Dm-crypt/Drive_preparation
  3. https://wiki.archlinux.org/index.php/Disk_encryption#Preparing_the_disk

Before encrypting a drive, it is recommended to perform a secure erase of the disk by overwriting the entire drive with random data. To prevent cryptographic attacks or unwanted file recovery, this data should ideally be indistinguishable from data later written by dm-crypt.
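A common way to do this manually (destructive; /dev/vdb below matches the default device variable and is shown only as an illustration):

$ sudo dd if=/dev/urandom of=/dev/vdb bs=1M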

paranoic_mode: set to true to enable low-level block storage deletion (default: false).

Example Playbook

IaaS configuration:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-os
      os_storage: 'IaaS'
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      galaxy_instance_key_pub: '<your_ssh_public_key>'

    - role: indigo-dc.galaxycloud
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      GALAXY_ADMIN_USERNAME: "admin"
      GALAXY_VERSION: "release_17.05"
      galaxy_instance_key_pub: "<your_ssh_public_key>"
      enable_storage_advanced_options: true

Onedata configuration:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-os
      os_storage: 'onedata'
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      userdata_provider: 'oneprovider2.cloud.ba.infn.it'
      userdata_token: '<your_access_token>'
      userdata_space: '<your_onedata_space>'
      galaxy_instance_key_pub: '<your_ssh_public_key>'

    - role: indigo-dc.galaxycloud
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      GALAXY_ADMIN_USERNAME: "admin"
      GALAXY_VERSION: "release_17.05"
      galaxy_instance_key_pub: "<your_ssh_public_key>"
      enable_storage_advanced_options: true

LUKS configuration:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-os
      os_storage: 'encryption'
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      galaxy_instance_key_pub: '<your_ssh_public_key>'

    - role: indigo-dc.galaxycloud
      GALAXY_ADMIN_EMAIL: "admin@server.com"
      GALAXY_ADMIN_USERNAME: "admin"
      GALAXY_VERSION: "release_17.05"
      galaxy_instance_key_pub: "<your_ssh_public_key>"
      enable_storage_advanced_options: true

Galaxycloud-tools

This Ansible role is for the automated installation of tools from a Tool Shed into Galaxy. The role uses the path scheme from the indigo-dc.galaxycloud role.

When run, this role will create a virtual environment, install ephemeris and invoke the install script to install the tools into Galaxy. The script stops Galaxy (if running), starts a local Galaxy instance on http://localhost:8080 and installs the tools.

The list of tools to install is provided in files/tool_list.yaml file, hosted on an external repository: https://github.com/indigo-dc/Galaxy-flavors-recipes.

The role automatically clones this repository to install the tools.

Requirements

This ansible role supports CentOS 7, Ubuntu 14.04 Trusty and Ubuntu 16.04 Xenial

Note

Minimum ansible version: 2.1.2.0

Role Variables
Path

galaxy_instance_description: set Galaxy brand

galaxy_user: set linux user to launch the Galaxy portal (default: galaxy).

GALAXY_UID: set user UID (default: 4001).

galaxy_FS_path: path to install Galaxy (default: /home/galaxy).

galaxy_directory: Galaxy directory (usually galaxy or galaxy-dist, default galaxy).

galaxy_install_path: Galaxy installation directory (default: /home/galaxy/galaxy).

galaxy_config_path: Galaxy config path location.

galaxy_config_file: Galaxy primary configuration file.

galaxy_venv_path: Galaxy virtual environment directory (usually located in <galaxy_install_path>/.venv).

galaxy_custom_config_path: Galaxy custom configuration files path (default: /etc/galaxy).

galaxy_custom_script_path: Galaxy custom script path (default: /usr/local/bin).

galaxy_log_path: log file directory (default: /var/log/galaxy).

galaxy_instance_url: instance url (default: http://<ipv4_address>/galaxy/).

galaxy_instance_key_pub: instance ssh public key to configure <galaxy_user> access.

Main options

GALAXY_ADMIN_API_KEY: Galaxy administrator API_KEY. https://wiki.galaxyproject.org/Admin/API. Please note that this key acts as an alternate means to access your account, and should be treated with the same care as your login password. To be changed by the administrator (default value: GALAXY_ADMIN_API_KEY).

galaxy_tools_tool_list_files: a list of yml files that list the tools to be installed.

galaxy_tools_base_dir: base directory where the installation script is placed (default: /tmp).

galaxy_flavor: galaxy flavor to install. Each flavor corresponds to a directory hosted here: https://github.com/indigo-dc/Galaxy-flavors-recipes (default: galaxy-no-tools).

lock_file_path: add a lock file to avoid re-running the role during recipe updates. To re-run the role, remove the lock file {{lock_file_path}}/indigo-dc.galaxycloud-tools.lock (default: /var/run).

install_workflows: install workflows (default: false).

install_data_libraries: install data libraries (default: false).

install_interactive_tours: enable interactive tours installation (default: false).

export_dir: Galaxy userdata are stored here (default: /export).

This role exploits a lite version of Galaxy to install data libraries. By default it installs datasets in /home/galaxy/galaxy/database/files/000. If the default Galaxy database directory is different, you have two options. By default the role reads galaxy.ini and moves the datasets to the file_path directory:

move_datasets: true
set_dataset_dest_dir: false

Otherwise, you can set the destination directory. dataset_dest_dir must exist, since the role will not create it.

move_datasets: true
set_dataset_dest_dir: true
dataset_dest_dir: '/path/to/dir'

Default values:

move_datasets: true

set_dataset_dest_dir: true

dataset_dest_dir: /path/to/dir

add_more_assets: add custom resources (i.e. visualisation plugins, custom web pages, etc.). Since there is no standard way to retrieve and install visualisation plugins, we keep these recipes external and implement a common interface to install these resources (default: false).

Create bootstrap user

If an API key is not present on Galaxy, a new user is created to install the tools and then removed. This is a very basic implementation; an advanced one is located here: https://raw.githubusercontent.com/indigo-dc/ansible-galaxy-tools/master/files/manage_bootstrap_user.py. Currently, to create it, a few pieces of information are needed: the Galaxy installation path, galaxy_database_connection and pbkdf2 enabled.

create_bootstrap_user: false

galaxy_database_connection: postgresql://galaxy:galaxy@localhost:5432/galaxy

use_pbkdf2: true

bootstrap_user_mail: admin@server.com

bootstrap_user_name: admin

bootstrap_user_password: password

By default, the API key is randomly generated, overwriting the galaxy_admin_api_key variable assignment. You can set it to a defined value by setting create_random_api_key to false (default: true).

Example Playbook

An example of how to use the role, with variables passed in as parameters:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-tools
      galaxy_flavor: "galaxy-rna-workbench"
      galaxy_admin_api_key: "ADMIN_API_KEY"
      when: galaxy_flavor != "galaxy-no-tools"
Example Sources list

The role takes a sources list as input. The sources list recipe is used to describe the Galaxy resources to install: tools, reference data, workflows, data libraries and/or visualization plugins.

The role always installs tools:

- name: "Set {{galaxy_flavor}} resources"
  set_fact:
    galaxy_tools_tool_list_files:
      - '{{galaxy_tools_base_dir}}/Galaxy-flavors-recipes/{{galaxy_flavor}}/tool-list-example.yml'

You can enable workflow installation by setting the variable install_workflows to true, then insert the directory containing the workflows:

# set path to workflow files
# ephemeris takes the workflow path to install workflows
- name: "Set {{galaxy_flavor}} workflows resources"
  set_fact:
    install_workflows: true
    galaxy_tools_workflow_list_path:
      - '{{galaxy_tools_base_dir}}/Galaxy-flavors-recipes/{{galaxy_flavor}}/workflow'

The same goes for data libraries. You have to enable the installation by setting install_data_libraries to true, then set the yaml recipe path:

# set yaml recipes
# ephemeris takes single files as argument
- name: "Set {{galaxy_flavor}} data library resources"
  set_fact:
    install_data_libraries: true
    galaxy_tools_data_library_list_files:
      - '{{galaxy_tools_base_dir}}/Galaxy-flavors-recipes/{{galaxy_flavor}}/library_data.yaml'

To enable tours set install_interactive_tours to true and the tours path:

# set galaxy tours path
# the whole dir is copied to galaxy/config/plugins/tours/
- name: "Set {{galaxy_flavor}} tours resources"
  set_fact:
    install_interactive_tours: true
    galaxy_tools_interactive_tour_list_path:
      - '{{galaxy_tools_base_dir}}/Galaxy-flavors-recipes/{{galaxy_flavor}}/tours'

Finally, it is possible to install external resources, like visualisation plugins, by setting add_more_assets to true:

# set more resources to be installed
# like visualisation plugins.
# since there is no standard way to retrieve and install
# visualisation plugins, we keep this recipe external.
- name: "Install visualisation plugins"
  set_fact:
    add_more_assets: true
    galaxy_tools_assets_recipe_list_files:
      - '{{galaxy_tools_base_dir}}/Galaxy-flavors-recipes/{{galaxy_flavor}}/visualisations.yml'
Example Tool list

For each tool you want to install, you must provide the tool name and owner and one of tool_panel_section_id or tool_panel_section_label in the yaml tool list.

---
api_key: <Admin user API key from galaxy_instance>
galaxy_instance: <Galaxy instance IP>

tools:
- name: fastqc
  owner: devteam
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False

- name: 'bowtie_wrappers'
  owner: 'devteam'
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False
License

Apache Licence v2

Galaxycloud-tooldeps

This role fixes the tool installation performed using the indigo-dc.galaxycloud-tools role, taking into account the Galaxy version.

Example Playbook

An example of how to use the role, with variables passed in as parameters:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-tooldeps
License

Apache Licence v2

Galaxycloud-refdata

Reference data ansible role for indigo-dc.galaxycloud. The role provides reference data and the corresponding galaxy configuration.

Currently, three reference data sources are supported:

cvmfs: a CernVM-FS repository is used to provide reference data, mounted on /refdata.

onedata: an Onedata space hosting the reference data is mounted on /refdata.

download: reference data are downloaded in /refdata (requires >100 GB of free space on the /refdata directory).

Moreover, this role, exploiting the python library Ephemeris, is able to check which tools have been installed through the indigo-dc.galaxy-tools ansible role, and returns:

  • the list of installed tools stored in /var/log/galaxy/galaxy-installed-tool-list.yml in yaml format
  • the list of missing tools stored in /var/log/galaxy/galaxy-missing-tool-list.yml in yaml format

Note

This option has been introduced for automatic galaxy tools deployment. If you only need to install and configure reference data, you can disable it using galaxy_flavor: "galaxy-no-tools".

Requirements

When a CernVM-FS server is used, the role runs the indigo-dc.cvmfs-client ansible role as a dependency to install and configure the cvmfs client.

If the role uses onedata to provide reference data, the onedata command line tool oneclient needs to be installed on your system. In this case, the role depends on the indigo-dc.oneclient role and will install oneclient automatically.

Finally, if the download option is selected, the role exploits a python script to download the reference data, which depends on python-pycurl (which is automatically installed).

Role Variables

galaxy_flavor: if different from ‘galaxy-no-tools’ the role will check whether all tools installed using https://github.com/indigo-dc/ansible-galaxy-tools have been correctly installed. Possible galaxy_flavor values with the corresponding recipes are reported here: Galaxy ShedTools (default: galaxy-no-tools).

get_refdata: enable reference data configuration. If set to false this variable disables reference data configuration (default: true).

refdata_provider_type: takes three possible values:

  1. cvmfs: CernVM-FS repository with reference data is mounted
  2. onedata: Onedata space with reference data is mounted
  3. download: Reference data download

refdata_repository_name: onedata space, CernVM-FS repository name or subdirectory to download local reference data.

cvmfs variables

refdata_cvmfs_server_url: set the CernVM-FS server (stratum 0 or replica) address without the ‘http://’ prefix, e.g. a plain IP address.

refdata_cvmfs_repository_name: set a different cvmfs repository name, overwriting the default option, which points to refdata_repository_name (e.g. elixir-italy.galaxy.refdata).

refdata_cvmfs_key_file: public key to mount the repository.

refdata_cvmfs_proxy_url: proxy address (default DIRECT).

refdata_cvmfs_proxy_port: proxy port (default 80).

onedata variables

refdata_provider: set reference data oneprovider (e.g. oneprovider2.cloud.ba.infn.it).

refdata_token: set reference data access token (e.g. MDAxNWxvY2F00aW9uIG9uZXpvbmUKMDAzYmlkZW500aWZpZXIgeExqMi00xdFN3YVp1VWIxM1dFSzRoNEdkb2x3cXVwTnpSaGZONXJSN2tZUQowMDFhY2lkIHRpbWUgPCAxNTI1MzM00NzgyCjAwMmZzaWduYXR1cmUgIOzeMtypO75nZvPJdAocInNbgH9zvJi6ifgXDrFVCr00K).

refdata_space: set reference data space name.

download

at10: false # A. thaliana (TAIR 10)
at9: false # A. thaliana (TAIR 9)
dm2: false # D. melanogaster (dm2)
dm3: false # D. melanogaster (dm3)
hg18: false # H. sapiens (hg18)
hg19: false # H. sapiens (hg19)
hg38: false # H. sapiens (hg38)
mm10: false # M. musculus (mm10)
mm8: false # M. musculus (mm8)
mm9: false # M. musculus (mm9)
sacCer1: false # S. cerevisiae (sacCer1)
sacCer2: false # S. cerevisiae (sacCer2)
sacCer3: true # S. cerevisiae (sacCer3)

Select which reference genomes have to be downloaded.

Dependencies

For cvmfs server reference data provider, the role depends on indigo-dc.cvmfs-client role, which takes as input parameters the CernVM-FS server location details (stratum 0 address, public key and mount point).

- hosts: servers
  roles:
    - role: indigo-dc.cvmfs-client
      server_url: '90.147.102.186'
      repository_name: 'elixir-italy.galaxy.refdata'
      cvmfs_public_key: 'elixir-italy.galaxy.refdata.pub'
      proxy_url: 'DIRECT'
      proxy_port: '80'
      cvmfs_mountpoint: '/refdata'
      when:  refdata_provider_type == 'cvmfs'

For onedata reference data provider, the role depends on indigo-dc.oneclient role:

- hosts: servers
  roles:
    - role: indigo-dc.oneclient
      when: refdata_provider_type == 'onedata'
Example Playbook
  • Configure Galaxy with CernVM-FS reference data volume.

    - hosts: servers
      roles:
        - role: indigo-dc.galaxycloud-refdata
          galaxy_flavor: 'galaxy-no-tools'
          get_refdata: true
          refdata_provider_type: 'cvmfs'
          refdata_cvmfs_server_url: '90.147.102.186'
          refdata_cvmfs_repository_name: 'elixir-italy.galaxy.refdata'
          refdata_cvmfs_key_file: 'elixir-italy.galaxy.refdata'
          refdata_cvmfs_proxy_url: 'DIRECT'
    
  • Configure Galaxy with Onedata space for reference data.

    - hosts: servers
      roles:
        - role: indigo-dc.galaxycloud-refdata
          galaxy_flavor: "galaxy-no-tools"
          get_refdata: true
          refdata_provider: 'oneprovider2.cloud.ba.infn.it'
          refdata_token: 'MDAxNWxvY2F00aW9uIG9uZXpvbmUKMDAzYmlkZW500aWZpZXIgeExqMi00xdFN3YVp1VWIxM1dFSzRoNEdkb2x3cXVwTnpSaGZONXJSN2tZUQowMDFhY2lkIHRpbWUgPCAxNTI1MzM00NzgyCjAwMmZzaWduYXR1cmUgIOzeMtypO75nZvPJdAocInNbgH9zvJi6ifgXDrFVCr00K'
          refdata_space: 'elixir-italy.galaxy.refdata'
    
  • Download (all available) reference data. You can select which ones to download.

    - hosts: servers
      roles:
        - role: indigo-dc.galaxycloud-refdata
          galaxy_flavor: 'galaxy-no-tools'
          get_refdata: true
          refdata_repository_name: 'elixir-italy.galaxy.refdata'
          refdata_provider_type: 'download'
          at10: true # A. thaliana (TAIR 10)
          at9: true # A. thaliana (TAIR 9)
          dm2: true # D. melanogaster (dm2)
          dm3: true # D. melanogaster (dm3)
          hg18: true # H. sapiens (hg18)
          hg19: true # H. sapiens (hg19)
          hg38: true # H. sapiens (hg38)
          mm10: true # M. musculus (mm10)
          mm8: true # M. musculus (mm8)
          mm9: true # M. musculus (mm9)
          sacCer1: true # S. cerevisiae (sacCer1)
          sacCer2: true # S. cerevisiae (sacCer2)
          sacCer3: true # S. cerevisiae (sacCer3)
    

Galaxycloud-fastconfig

Ansible role for fast Galaxy configuration on Virtual Machines where Galaxy has already been installed using the indigo-dc.galaxycloud role.

The current indigo-dc.galaxycloud (and hence Galaxy) configuration is the following:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud
      GALAXY_ADMIN_EMAIL: "admin@elixir-italy.org"
      GALAXY_ADMIN_USERNAME: "admin"
      GALAXY_VERSION: "release_17.05"
      galaxy_instance_key_pub: "ssh-rsa ..."
      set_pgsql_random_password: false # postgres password is fixed: galaxy
      set_proftpd_random_password: false # proftpd database password is fixed: galaxy
      galaxy_db_dir: '/home/galaxy/galaxy/database'
      tool_deps_path: '/home/galaxy/tool_deps'
      conda_prefix: '/home/galaxy/tool_deps/_conda'
      job_work_dir: 'database/jobs_directory'

The final Galaxy configuration, i.e. galaxycloud + galaxycloud-fastconfig, is the same as that of galaxycloud standalone.

Example Playbook

An example of how to use the role, with variables passed in as parameters:

- hosts: servers
  roles:
    - role: indigo-dc.galaxycloud-fastconfig
      GALAXY_ADMIN_EMAIL: "mymail@example.com"
      GALAXY_ADMIN_USERNAME: "myuser"
      galaxy_instance_description: "mygalaxy"
      galaxy_instance_key_pub: "ssh-rsa ..."
License

Apache License 2.0

Galaxy-tools (deprecated)

This ansible role has been cloned from the official galaxyproject.galaxy-tools, with small changes.

This Ansible role is for automated installation of tools from a Tool Shed into Galaxy.

When run, this role creates an execution environment, installs ephemeris and invokes the shed-install command to install the desired tools into Galaxy. The list of tools to install is provided in the files/tool_list.yaml file.

Example Playbook

To use the role, you need to create a playbook and include the role in it.

- hosts: localhost
  gather_facts: False
  connection: local
  vars:
      galaxy_tools_tool_list_files: [ "files/sample_tool_list.yaml" ]
      galaxy_tools_galaxy_instance_url: http://127.0.0.1:8080/
      # galaxy_tools_api_key: <API key for Galaxy admin user>
  roles:
    - indigo-dc.galaxy-tools
Example Tool list

For each tool you want to install, you must provide the tool name and owner, and one of tool_panel_section_id and tool_panel_section_label, in the yaml tool list.

---
api_key: <Admin user API key from galaxy_instance>
galaxy_instance: <Galaxy instance IP>

tools:
- name: fastqc
  owner: devteam
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False

- name: 'bowtie_wrappers'
  owner: 'devteam'
  tool_panel_section_label: 'Tools'
  install_resolver_dependencies: True
  install_tool_dependencies: False
Variables
Required variables

Only one of the two variables is required (if both are set, the API key takes precedence and a bootstrap user is not created):

galaxy_tools_api_key: the Galaxy API key for an admin user on the target Galaxy instance (not required if the bootstrap user is being created)

galaxy_tools_admin_user_password: a password for the Galaxy bootstrap user (required only if the galaxy_tools_create_bootstrap_user variable is set)

Optional variables

See defaults/main.yml for the available variables and their defaults.

Control flow variables

The following variables can be set to either yes or no to indicate if the given part of the role should be executed:

galaxy_tools_install_tools: (default: yes) whether or not to run the tools installation script

galaxy_tools_create_bootstrap_user: (default: no) whether or not to create a bootstrap Galaxy admin user

galaxy_tools_delete_bootstrap_user: (default: no) whether or not to delete a bootstrap Galaxy admin user

Cvmfs-server

Ansible role to install a CernVM-FS Server.

This role has been created to be general, but it is used to create the Galaxy reference data read-only repository. To populate the Stratum 0 repository with reference data see the section Build cvmfs server for reference data.

Requirements

This ansible role is compatible with both CentOS 7 and Ubuntu 16.04 Xenial.

The CernVM-FS (cvmfs) relies on OverlayFS or AUFS as its default storage driver. Ubuntu 16.04 natively supports OverlayFS, which is therefore used as the default to create and populate the cvmfs server.

Python is required on the host to run ansible: sudo apt-get install python

The apt ansible module requires the following package on the host:

  • python-apt (python 2)
Variables

repository_name: set the cvmfs repository name (default: elixir-italy.galaxy.refdata).

stratum_zero: set your domain or ip address (default: {{ ansible_default_ipv4.address }}).

repository_url: set the cvmfs repository url (default: http://{{ stratum_zero }}/cvmfs/{{ repository_name }}).

Example Playbook

The role is able to detect the server ip address automatically, through ansible. To customize your server, set your repository name.

- hosts: servers
  roles:
    - role: indigo-dc.cvmfs-server
      repository_name: '<your_repository_name>'
Development
  • S3 support and squid proxy server support is on-going.
  • Replica server support is on-going.

Cvmfs_client

Ansible role to install CernVM-FS Client.

Requirements

Python is required on the host to run ansible.

The apt ansible module requires the following package on the host:

  • python-apt (python 2)
Variables

server_url: set cvmfs server url (e.g. ip address or domain).

repository_name: set cvmfs server repository name (default: elixir-italy.galaxy.refdata).

cvmfs_server_url: set the complete cvmfs server url (default: http://{{ server_url }}/cvmfs/{{ repository_name }}).

cvmfs_public_key_path: set path for cvmfs keys (default: /etc/cvmfs/keys).

cvmfs_public_key: set the cvmfs public key, usually <repository_name>.pub (default: {{ repository_name }}.pub).

cvmfs_public_key_list_files: list of *.pub files with the key to the repository to be mounted.

public_key_src_path: set cvmfs public key temporary path (default: /tmp).

proxy_url: set proxy name (default: DIRECT).

proxy_port: set proxy port (default: 80).

cvmfs_http_proxy: set proxy complete url (default: http://{{ proxy_url }}:{{ proxy_port }}).

cvmfs_mountpoint: set cvmfs mount point (default: /cvmfs, for reference data /refdata). If set to /cvmfs the role will use cvmfs_config probe to mount the repository.

add_fstab_entry: add fstab entry to automatically mount the repository (default: true).

Example Playbook

The role takes as input parameters the CernVM-FS server location details (stratum 0 address, public key and mount point).

- hosts: servers
  roles:
    - role: indigo-dc.cvmfs-client
      server_url: '90.147.102.186'
      repository_name: 'elixir-italy.galaxy.refdata'
      cvmfs_public_key: 'elixir-italy.galaxy.refdata.pub'
      proxy_url: 'DIRECT'
      proxy_port: '80'
      cvmfs_mountpoint: '/refdata'
      when:  refdata_provider_type == 'cvmfs'

TOSCA templates

TOSCA logo

The INDIGO PaaS Orchestrator is the key software component of the INDIGO PaaS layer: it receives deployment requests from the user interface software layer and coordinates the deployment process over the IaaS platforms. The Infrastructure Manager then deploys the customized virtual infrastructure on the IaaS Cloud.

The INDIGO PaaS Orchestrator takes deployment requests written in the TOSCA YAML Simple Profile v1.0. To correctly orchestrate the Galaxy deployment the following components are needed:

  • Ansible roles: automate software installation and configuration (see section Galaxycloud Ansible Roles)
  • Custom type: define user configurable parameters, node requirements, call ansible playbooks.
  • Artifact: define what to install and how to do it, through ansible role configuration.
  • TOSCA template: the orchestrator interprets the TOSCA template and orchestrates the deployment.

Note

This section is not intended to be a complete guide to TOSCA types, but aims to describe the solutions adopted to deploy Galaxy.

Custom types

GalaxyPortal

Galaxy portal installation and configuration is entrusted to the GalaxyPortal custom type:

  1. Galaxy instance input parameters:

    properties:
      admin_email:
        type: string
        description: email of the admin user
        default: admin@admin.com
        required: false
      admin_api_key:
        type: string
        description: key to access the API with admin role
        default: not_very_secret_api_key
        required: false
      user:
        type: string
        description: username to launch the galaxy daemon
        default: galaxy
        required: false
      install_path:
        type: string
        description: path to install the galaxy tool
        default: /home/galaxy/galaxy
        required: false
      export_dir:
        type: string
        description: path to store galaxy data
        default: /export
        required: false
      version:
        type: string
        description: galaxy version to install
        default: master
        required: false
      instance_description:
        type: string
        description: galaxy instance description
        default: "INDIGO Galaxy test"
      instance_key_pub:
        type: string
        description: galaxy instance ssh public key
        default: your_ssh_public_key
    

    Note

    The export_dir property sets the Galaxy storage location. On single VMs it is set to /export, while on clusters it has to be set to /home/export, allowing for data sharing.

  2. Specify LRMS, e.g. local, torque, slurm, sge, condor, mesos:

    requirements:
      - lrms:
          capability: tosca.capabilities.indigo.LRMS
          node: tosca.nodes.indigo.LRMS.FrontEnd
          relationship: tosca.relationships.HostedOn
    
  3. Ansible role installation through ansible-galaxy:

    artifacts:
      galaxy_role:
        file: indigo-dc.galaxycloud
        type: tosca.artifacts.AnsibleGalaxy.role
    
  4. Ansible role call with input parameters:

    implementation: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/artifacts/galaxy/galaxy_install.yml
          inputs:
            galaxy_install_path: { get_property: [ SELF, install_path ] }
            galaxy_user: { get_property: [ SELF, user ] }
            galaxy_admin: { get_property: [ SELF, admin_email ] }
            galaxy_admin_api_key: { get_property: [ SELF, admin_api_key ] }
            galaxy_lrms: { get_property: [ SELF, lrms, type ] }
            galaxy_version: { get_property: [ SELF, version ] }
            galaxy_instance_description: { get_property: [ SELF, instance_description ] }
            galaxy_instance_key_pub:  { get_property: [ SELF, instance_key_pub ] }
            export_dir: { get_property: [ SELF, export_dir ] }
    
  5. The artifact is located on github tosca-types/artifacts/galaxy/galaxy_install.yml

    ---
    - hosts: localhost
      connection: local
      roles:
        - role: indigo-dc.galaxycloud
          GALAXY_VERSION: "{{ galaxy_version }}"
          GALAXY_ADMIN_EMAIL: "{{ galaxy_admin }}"
          GALAXY_ADMIN_API_KEY: "{{ galaxy_admin_api_key }}"
    

GalaxyPortalAndStorage

GalaxyPortalAndStorage custom type inherits its properties from GalaxyPortal and extends its functionalities with onedata support and file system encryption:

  1. Storage options properties:

    os_storage:
      type: string
      description: Storage type (IaaS Block Storage (default), Onedata, Filesystem encryption)
      default: "IaaS"
      required: true
    token:
      type: string
      description: Access token for onedata space
      default: "not_a_token"
      required: false
    provider:
      type: string
      description: default OneProvider
      default: "not_a_provider_url"
      required: false
    space:
      type: string
      description: Onedata space
      default: "galaxy"
      required: false
    
  2. Ansible roles: the oneclient role is needed to install oneclient, while indigo-dc.galaxycloud-os is entrusted with file system encryption:

    oneclient_role:
      file: indigo-dc.oneclient
      type: tosca.artifacts.AnsibleGalaxy.role
    galaxy_os_role:
      file: indigo-dc.galaxycloud-os
      type: tosca.artifacts.AnsibleGalaxy.role
    galaxy_role:
      file: indigo-dc.galaxycloud
      type: tosca.artifacts.AnsibleGalaxy.role
    
  3. Ansible role call with input parameters:

    implementation: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/artifacts/galaxy/galaxy_os_install.yml
    inputs:
      os_storage: { get_property: [ SELF, os_storage ] }
      userdata_token: { get_property: [ SELF, token ] }
      userdata_oneprovider: { get_property: [ SELF, provider ] }
      userdata_space: { get_property: [ SELF, space ] }
      galaxy_install_path: { get_property: [ SELF, install_path ] }
      galaxy_user: { get_property: [ SELF, user ] }
      galaxy_admin: { get_property: [ SELF, admin_email ] }
      galaxy_admin_api_key: { get_property: [ SELF, admin_api_key ] }
      galaxy_lrms: { get_property: [ SELF, lrms, type ] }
      galaxy_version: { get_property: [ SELF, version ] }
      galaxy_instance_description: { get_property: [ SELF, instance_description ] }
      galaxy_instance_key_pub:  { get_property: [ SELF, instance_key_pub ] }
      export_dir: { get_property: [ SELF, export_dir ] }
    
  4. The artifact includes the indigo-dc.galaxycloud-os and indigo-dc.galaxycloud calls.

    ---
    - hosts: localhost
      connection: local
      roles:
        - role: indigo-dc.galaxycloud-os
          GALAXY_ADMIN_EMAIL: "{{ galaxy_admin }}"
    
        - role: indigo-dc.galaxycloud
          GALAXY_VERSION: "{{ galaxy_version }}"
          GALAXY_ADMIN_EMAIL: "{{ galaxy_admin }}"
          GALAXY_ADMIN_API_KEY: "{{ galaxy_admin_api_key }}"
          enable_storage_advanced_options: true # true only with indigo-dc.galaxycloud-os
    

    Note

    The option enable_storage_advanced_options has to be set to true, leaving storage configuration to indigo-dc.galaxycloud-os.

GalaxyShedTool

  1. Ansible Galaxy tools properties:

    properties:
      flavor:
        type: string
        description: name of the Galaxy flavor
        required: true
        default: galaxy-no-tools
      admin_api_key:
        type: string
        description: key to access the API with admin role
        default: not_very_secret_api_key
        required: false
    
  2. Galaxy is required:

    requirements:
      - host:
          capability: tosca.capabilities.Container
          node: tosca.nodes.indigo.GalaxyPortal
          relationship: tosca.relationships.HostedOn
  3. Indigo-dc.galaxy-tools role installation:

    artifacts:
      galaxy_role:
        file: indigo-dc.galaxy-tools,master
        type: tosca.artifacts.AnsibleGalaxy.role
    
  4. Ansible role call. Instance IP address is needed to install tools:

    implementation: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/artifacts/galaxy/galaxy_tools_configure.yml
    inputs:
      galaxy_flavor: { get_property: [ SELF, flavor ] }
      galaxy_admin_api_key: { get_property: [ HOST, admin_api_key ] }
      instance_public_ip: { get_attribute: [ HOST, public_address, 0 ] }
    
  5. The artifact checks if Galaxy is on-line before installing tools and downloads the yaml tool list according to the galaxy_flavor variable value.

    ---
    - hosts: localhost
      connection: local
      gather_facts: False
      pre_tasks:
        - name: Wait Galaxy is up
          uri: url="http://{{ instance_public_ip }}/galaxy/"
          register: result
          until: result.status == 200
          retries: 30
          delay: 10
          when: galaxy_flavor != 'galaxy-no-tools'
        - name: Get tool-list
          get_url: url="https://raw.githubusercontent.com/indigo-dc/Galaxy-flavors-recipes/master/galaxy-flavors/{{ galaxy_flavor }}-tool-list.yml" dest="/tmp/"
          when: galaxy_flavor != 'galaxy-no-tools'
      vars:
        galaxy_tools_tool_list_files: [ "/tmp/{{ galaxy_flavor }}-tool-list.yml" ]
        galaxy_tools_galaxy_instance_url: "http://{{ instance_public_ip }}/galaxy/"
        galaxy_tools_api_key: "{{ galaxy_admin_api_key }}"
      roles:
        - { role: indigo-dc.galaxy-tools, when: galaxy_flavor != 'galaxy-no-tools' }
    

    Note

    gather_facts: False is needed to properly set ansible variables.

GalaxyReferenceData

The ReferenceData custom type supports CernVM-FS, Onedata reference data volumes and reference data downloads.

  1. ReferenceData input parameters:

    properties:
      reference_data:
        type: boolean
        description: Install Reference data
        default: true
        required: true
      flavor:
        type: string
        description: name of the Galaxy flavor
        required: true
        default: galaxy-no-tools
      refdata_repository_name:
        type: string
        description: Onedata space name, CernVM-FS repository name or download subdirectory name
        default: 'elixir-italy.galaxy.refdata'
        required: false
      refdata_provider_type:
        type: string
        description: Select Reference data provider type (Onedata, CernVM-FS or download)
        default: 'onedata'
        required: false
      refdata_provider:
        type: string
        description: Oneprovider for reference data
        default: 'not_a_provider'
        required: false
      refdata_token:
        type: string
        description: Access token for reference data
        default: 'not_a_token'
        required: false
      refdata_cvmfs_server_url:
        type: string
        description: CernVM-FS server, replica or stratum-zero
        default: 'server_url'
        required: false
      refdata_cvmfs_repository_name:
        type: string
        description: Reference data CernVM-FS repository name
        default: 'not_a_cvmfs_repository_name'
        required: false
      refdata_cvmfs_key_file:
        type: string
        description: CernVM-FS public key
        default: 'not_a_key'
        required: false
      refdata_cvmfs_proxy_url:
        type: string
        description: CernVM-FS proxy url
        default: 'DIRECT'
        required: false
      refdata_cvmfs_proxy_port:
        type: integer
        description: CernVM-FS proxy port
        default: 80
        required: false
      refdata_dir:
        type: string
        description: path to store galaxy reference data
        default: /refdata
        required: false
    
  2. Galaxy is required to install and configure reference data:

    requirements:
      - host:
          capability: tosca.capabilities.Container
          node: tosca.nodes.indigo.GalaxyPortal
          relationship: tosca.relationships.HostedOn
    
  3. The cvmfs role is used to install the cvmfs client if cvmfs is selected to provide reference data, while the oneclient role provides onedata support.

    artifacts:
      oneclient_role:
        file: indigo-dc.oneclient
        type: tosca.artifacts.AnsibleGalaxy.role
      cvmfs_role:
        file: indigo-dc.cvmfs-client
        type: tosca.artifacts.AnsibleGalaxy.role
      galaxy_role:
        file: indigo-dc.galaxycloud-refdata
        type: tosca.artifacts.AnsibleGalaxy.role

  4. Ansible role call with parameters:

    implementation: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/artifacts/galaxy/galaxy_redfata_configure.yml
    inputs:
      get_refdata: { get_property: [ SELF, reference_data ] }
      galaxy_flavor: { get_property: [ SELF, flavor ] }
      refdata_repository_name: { get_property: [ SELF, refdata_repository_name ] }
      refdata_provider_type: { get_property: [ SELF, refdata_provider_type ] }
      refdata_provider: { get_property: [ SELF, refdata_provider ] }
      refdata_token: { get_property: [ SELF, refdata_token ] }
      refdata_cvmfs_server_url: { get_property: [ SELF, refdata_cvmfs_server_url ] }
      refdata_cvmfs_repository_name: { get_property: [ SELF, refdata_cvmfs_repository_name ] }
      refdata_cvmfs_key_file: { get_property: [ SELF, refdata_cvmfs_key_file ] }
      refdata_cvmfs_proxy_url: { get_property: [ SELF, refdata_cvmfs_proxy_url ] }
      refdata_cvmfs_proxy_port: { get_property: [ SELF, refdata_cvmfs_proxy_port ] }
      refdata_dir: { get_property: [ SELF, refdata_dir ] }
    
  5. The cvmfs public key is downloaded to mount the cvmfs volume:

    ---
    - hosts: localhost
      connection: local
      pre_tasks:
        - name: Get refdata-list
          get_url:
            url: 'https://raw.githubusercontent.com/indigo-dc/Reference-data-galaxycloud-repository/master/cvmfs_server_keys/{{ refdata_cvmfs_key_file }}'
            dest: '/tmp'
      roles:
        - role: indigo-dc.galaxycloud-refdata
    

Galaxy template

The orchestrator interprets the TOSCA template and orchestrates the Galaxy deployment on the virtual machine.

Galaxy template is located here.

Input parameters are needed for each custom type used in the template:

  • Virtual hardware parameters:

    number_cpus:
      type: integer
      description: number of cpus required for the instance
      default: 1
    memory_size:
      type: string
      description: ram memory required for the instance
      default: 1 GB
    storage_size:
      type: string
      description: storage memory required for the instance
      default: 10 GB
    
  • Galaxy input parameters:

    admin_email:
      type: string
      description: email of the admin user
      default: admin@admin.com
    admin_api_key:
      type: string
      description: key to access the API with admin role
      default: not_very_secret_api_key
    user:
      type: string
      description: username to launch the galaxy daemon
      default: galaxy
    version:
      type: string
      description: galaxy version to install
      default: master
    instance_description:
      type: string
      description: galaxy instance description
      default: "INDIGO Galaxy test"
    instance_key_pub:
      type: string
      description: galaxy instance ssh public key
      default: your_ssh_public_key
    export_dir:
      type: string
      description: path to store galaxy data
      default: /export
    
  • Storage input parameters:

    galaxy_storage_type:
      type: string
      description: Storage type (IaaS Block Storage, Onedata, Filesystem encryption)
      default: "IaaS"
    userdata_provider:
      type: string
      description: default OneProvider
      default: "not_a_privder_url"
    userdata_token:
      type: string
      description: Access token for onedata space
      default: "not_a_token"
    userdata_space:
      type: string
      description: Onedata space
      default: "galaxy"
    
  • Galaxy flavor input parameters:

    flavor:
      type: string
      description: Galaxy flavor for tools installation
      default: "galaxy-no-tools"
    
  • Reference data input parameters, for all possible options (CernVM-FS, Onedata and download).

    reference_data:
      type: boolean
      description: Install Reference data
      default: true
    refdata_dir:
      type: string
      description: path to store galaxy reference data
      default: /refdata
    refdata_repository_name:
      type: string
      description: Onedata space name, CernVM-FS repository name or download subdirectory name
      default: 'elixir-italy.galaxy.refdata'
    refdata_provider_type:
      type: string
      description: Select Reference data provider type (Onedata, CernVM-FS or download)
      default: 'onedata'
    refdata_provider:
      type: string
      description: Oneprovider for reference data
      default: 'not_a_provider'
    refdata_token:
      type: string
      description: Access token for reference data
      default: 'not_a_token'
    refdata_cvmfs_server_url:
      type: string
      description: CernVM-FS server, replica or stratum-zero
      default: 'server_url'
    refdata_cvmfs_repository_name:
      type: string
      description: Reference data CernVM-FS repository name
      default: 'not_a_cvmfs_repository_name'
    refdata_cvmfs_key_file:
      type: string
      description: CernVM-FS public key
      default: 'not_a_key'
    refdata_cvmfs_proxy_url:
      type: string
      description: CernVM-FS proxy url
      default: 'DIRECT'
    refdata_cvmfs_proxy_port:
      type: integer
      description: CernVM-FS proxy port
      default: 80
    

Input parameters are passed to the corresponding ansible roles through the custom type call:

galaxy:
  type: tosca.nodes.indigo.GalaxyPortalAndStorage
  properties:
    os_storage: { get_input: galaxy_storage_type }
    token: { get_input: userdata_token }
    provider: { get_input: userdata_provider }
    space: { get_input: userdata_space }
    admin_email: { get_input: admin_email }
    admin_api_key: { get_input: admin_api_key }
    version: { get_input: version }
    instance_description: { get_input: instance_description }
    instance_key_pub: { get_input: instance_key_pub }
    export_dir: { get_input: export_dir }
  requirements:
    - lrms: local_lrms

galaxy_tools:
  type: tosca.nodes.indigo.GalaxyShedTool
  properties:
    flavor: { get_input: flavor }
    admin_api_key: { get_input: admin_api_key }
  requirements:
    - host: galaxy

galaxy_refdata:
  type: tosca.nodes.indigo.GalaxyReferenceData
  properties:
    reference_data: { get_input: reference_data }
    refdata_dir: { get_input: refdata_dir }
    flavor: { get_input: flavor }
    refdata_repository_name: { get_input: refdata_repository_name }
    refdata_provider_type: { get_input: refdata_provider_type }
    refdata_provider: { get_input: refdata_provider }
    refdata_token: { get_input: refdata_token }
    refdata_cvmfs_server_url: { get_input: refdata_cvmfs_server_url }
    refdata_cvmfs_repository_name: { get_input: refdata_cvmfs_repository_name }
    refdata_cvmfs_key_file: { get_input: refdata_cvmfs_key_file }
    refdata_cvmfs_proxy_url: { get_input: refdata_cvmfs_proxy_url }
    refdata_cvmfs_proxy_port: { get_input: refdata_cvmfs_proxy_port }
  requirements:
    - host: galaxy
    - dependency: galaxy_tools

Note

Note that the Reference data custom type needs Galaxy installed on the host (host: galaxy), but also depends on galaxy tools (dependency: galaxy_tools), since it has to check installed and missing tools.

Finally we have virtual hardware customization:

host:
 properties:
   num_cpus: { get_input: number_cpus }
   mem_size: { get_input: memory_size }

Image selection:

os:
  properties:
    type: linux
    distribution: centos
    version: 7.2
    image: indigodatacloudapps/galaxy

And the storage configuration, which takes the export_dir input for the mount point and the storage_size input, allowing for storage size customization.

- local_storage:
    # capability is provided by Compute Node Type
    node: my_block_storage
    capability: tosca.capabilities.Attachment
    relationship:
      type: tosca.relationships.AttachesTo
      properties:
        location: { get_input: export_dir }
        device: hdb
my_block_storage:
  type: tosca.nodes.BlockStorage
  properties:
    size: { get_input: storage_size }

Galaxy cluster template

The Galaxycloud role provides the possibility to instantiate Galaxy with SLURM as Resource Manager, just by setting the galaxy_lrms variable to slurm.

This allows to instantiate Galaxy with a SLURM cluster, exploiting INDIGO custom types and ansible roles and using the following INDIGO components:

  • CLUES (INDIGO solution for automatic elasticity)
  • Master node deployment with SLURM (ansible recipes + tosca types)
  • Install Galaxy + SLURM support (already in our ansible role indigo-dc.galaxycloud)
  • Worker node deployment
  • Galaxy customization for worker nodes

The related tosca template is located here.

The input parameters allow to customize the number of worker nodes and the virtual hardware of the master and worker nodes:

wn_num:
  type: integer
  description: Maximum number of WNs in the elastic cluster
  default: 5
  required: yes
fe_cpus:
  type: integer
  description: Number of CPUs for the front-end node
  default: 1
  required: yes
fe_mem:
  type: scalar-unit.size
  description: Amount of Memory for the front-end node
  default: 1 GB
  required: yes
wn_cpus:
  type: integer
  description: Number of CPUs for the WNs
  default: 1
  required: yes
wn_mem:
  type: scalar-unit.size
  description: Amount of Memory for the WNs
  default: 1 GB
  required: yes

Note

You can refer to the Galaxy template section for the galaxy input parameters.

The master node hosts Galaxy and Slurm controller:

elastic_cluster_front_end:
  type: tosca.nodes.indigo.ElasticCluster
  properties:
    deployment_id: orchestrator_deployment_id
    iam_access_token: iam_access_token
    iam_clues_client_id: iam_clues_client_id
    iam_clues_client_secret: iam_clues_client_secret
  requirements:
    - lrms: lrms_front_end
    - wn: wn_node

galaxy_portal:
  type: tosca.nodes.indigo.GalaxyPortal
  properties:
    admin_email: { get_input: admin_email }
    admin_api_key: { get_input: admin_api_key }
    version: { get_input: version }
    instance_description: { get_input: instance_description }
    instance_key_pub: { get_input: instance_key_pub }
  requirements:
    - lrms: lrms_front_end

lrms_front_end:
  type: tosca.nodes.indigo.LRMS.FrontEnd.Slurm
  properties:
    wn_ips: { get_attribute: [ lrms_wn, private_address ] }
  requirements:
    - host: lrms_server

lrms_server:
  type: tosca.nodes.indigo.Compute
  capabilities:
    endpoint:
      properties:
        dns_name: slurmserver
        network_name: PUBLIC
        ports:
          http_port:
            protocol: tcp
            source: 80
    host:
      properties:
        num_cpus: { get_input: fe_cpus }
        mem_size: { get_input: fe_mem }
    os:
      properties:
        image: linux-ubuntu-14.04-vmi

Then the worker nodes configuration (OS and virtual hardware):

wn_node:
  type: tosca.nodes.indigo.LRMS.WorkerNode.Slurm
  properties:
    front_end_ip: { get_attribute: [ lrms_server, private_address, 0 ] }
  capabilities:
    wn:
      properties:
        max_instances: { get_input: wn_num }
        min_instances: 0
  requirements:
    - host: lrms_wn

galaxy_wn:
  type: tosca.nodes.indigo.GalaxyWN
  requirements:
    - host: lrms_wn

lrms_wn:
  type: tosca.nodes.indigo.Compute
  capabilities:
    scalable:
      properties:
        count: 0
    host:
      properties:
        num_cpus: { get_input: wn_cpus }
        mem_size: { get_input: wn_mem }
    os:
      properties:
        image: linux-ubuntu-14.04-vmi

Note

Note that to orchestrate Galaxy with SLURM we do not need new TOSCA custom types or ansible roles. Everything is already built in INDIGO.

Build cvmfs server for reference data

This section gives a quick overview of the steps needed to create a new cvmfs repository to share reference data and activate it on the clients. The repository name used is elixir-italy.galaxy.refdata, but it can be replaced with the appropriate name.

All the scripts needed to deploy a Reference data CernVM-FS Stratum 0 are located here.

Create CernVM-FS Repository

The CernVM-FS (cvmfs) relies on OverlayFS or AUFS as default storage driver. Ubuntu 16.04 natively supports OverlayFS, therefore it is used as default, to create and populate the cvmfs server.

  1. Install cvmfs and cvmfs-server packages.
  2. Ensure enough disk space in /var/spool/cvmfs (>50GiB).
  3. For local storage: Ensure enough disk space in /srv/cvmfs.
  4. Create a repository with cvmfs_server mkfs.

Warning

  • /cvmfs is the repository mount point, containing read-only union file system mountpoints that become writable during repository updates.
  • /var/spool/cvmfs hosts the scratch area described here, and thus might consume notable disk space during repository updates. When you copy your files to /cvmfs/<your_repository_name>/, they are stored in /var/spool/cvmfs, therefore you have to ensure enough space for this directory.
  • /srv/cvmfs is the central repository storage location. During the cvmfs_server publish procedure, your files are moved and stored here, so this directory needs enough space to hold all your cvmfs server contents.

Note

A complete set of reference data takes 100 GB. Our cvmfs server exploits two different volumes, one 100 GB volume mounted on /var/spool/cvmfs and one 200 GB volume for /srv/cvmfs.

  • To create a new repository:

    cvmfs_server mkfs -w http://<stratum_zero>/cvmfs/elixir-italy.galaxy.refdata -o cvmfs elixir-italy.galaxy.refdata
    

    Replace <stratum_zero> with your domain or ip address.

  • Publish your contents to the cvmfs stratum zero server:

    cvmfs_server transaction elixir-italy.galaxy.refdata
    touch /cvmfs/elixir-italy.galaxy.refdata/test-content
    cvmfs_server publish elixir-italy.galaxy.refdata
    
  • Periodically resign the repository (at least every 30 days):

    cvmfs_server resign elixir-italy.galaxy.refdata
    

    A resign script is located in /usr/local/bin/Cvmfs-stratum0-resign and the corresponding weekly cron job is set to /etc/cron.d/cvmfs_server_resign.

    Log file is located in /var/log/Cvmfs-stratum0-resign.log.

  • Finally restart the apache2 daemon.

    sudo systemctl restart apache2
    

The public key of the new repository is located in /etc/cvmfs/keys/elixir-italy.galaxy.refdata.pub

Client configuration

  • Add the public key of the new repository to /etc/cvmfs/keys/elixir-italy.galaxy.refdata.pub

  • Repository configuration:

    $ cat /etc/cvmfs/config.d/elixir-italy.galaxy.refdata.conf
    CVMFS_SERVER_URL=http://90.147.102.186/cvmfs/elixir-italy.galaxy.refdata
    CVMFS_PUBLIC_KEY=/etc/cvmfs/keys/elixir-italy.galaxy.refdata.pub
    CVMFS_HTTP_PROXY=DIRECT
    

Populate a CernVM-FS Repository (with reference data)

Content Publishing

  1. cvmfs_server transaction <repository name>
  2. Install content into /cvmfs/<repository name> (see Reference data download section)
  3. cvmfs_server publish <repository name>

Note

The cvmfs_server publish command will take time to move your contents from /cvmfs to /srv/cvmfs.

Reference data download

Reference data are available on Openstack Swift for public download. The list of reference data download links is available here.

Furthermore, to automatically download our reference data set, it is possible to use the python script refdata_download.py.

The package python-pycurl is needed to satisfy refdata_download.py requirements: on Ubuntu sudo apt-get install python-pycurl

Script usage

This script takes as input the yaml files located in the Reference-data-galaxycloud-repository/lists/ directory.

Option        Description
-i, --input   Input genome list in yaml format
-o, --outdir  Destination directory (default: /refdata)
-s, --space   Subdirectory name, for cvmfs and onedata spaces (default: elixir-italy.galaxy.refdata)

For example:

/usr/bin/python refdata_download.py -i sacCer3-list.yml -o /refdata -s elixir-italy.galaxy.refdata

Available Reference data yaml file:

  • at10-list.yml
  • at9-list.yml
  • dm2-list.yml
  • dm3-list.yml
  • hg18-list.yml
  • hg19-list.yml
  • hg38-list.yml
  • mm10-list.yml
  • mm8-list.yml
  • mm9-list.yml
  • sacCer1-list.yml
  • sacCer2-list.yml
  • sacCer3-list.yml

It is possible to automatically download all reference data files using the bash script refdata_download.sh, which invokes the python script for each genome, taking as input the list file Reference-data-galaxycloud-repository/lists/list.txt

./refdata_download.sh list.txt

INDIGO PaaS Orchestrator

PaaS Orchestrator is the core component of the INDIGO PaaS layer. It collects high-level deployment requests from the software layer, and coordinates the resource or service deployment over IaaS platforms.

Web site: https://www.indigo-datacloud.eu/paas-orchestrator

Official GitBook documentation: https://www.gitbook.com/book/indigo-dc/indigo-paas-orchestrator/details

GitHub: https://github.com/indigo-dc/orchestrator

Orchent

Orchent is the INDIGO command line client, used to interact with the PaaS Orchestrator.

Orchent: https://github.com/indigo-dc/orchent

Infrastructure Manager

The Infrastructure Manager (IM) is a tool that deploys complex and customized virtual infrastructures on IaaS Cloud deployments (such as AWS, OpenStack, etc.).

Official GitBook documentation: https://www.gitbook.com/book/indigo-dc/im/details

Deployments log

Deployment logs are available in /var/tmp/.im/<im-id>/<deployment_ip>/ctxt_agent.log. For instance:

# tail -f /var/tmp/.im/1b0e064c-9a29-11e7-9c45-300000000002/90.147.102.27_0/ctxt_agent.log

Note

After each ansible role run, the log file is deleted!

INDIGO FutureGateway

Warning

FutureGateway deployment requires https and java 8. Currently the script exploits openjdk 7. See section Update to Java 8 - Appendix A.

To correctly setup the FGW portal follow the instructions in the Ubuntu LTS 14.04 Server section here, as super user:

# IP=$(ifconfig | grep  -A 2 eth0 | grep inet\ addr | awk -F':' '{ print $2 }' | awk '{ print $1 }' | xargs echo)

# echo "$IP    futuregateway" >> /etc/hosts

# adduser --disabled-password --gecos "" futuregateway

# mkdir -p /home/futuregateway/.ssh

# chown futuregateway:futuregateway /home/futuregateway/.ssh

# wget https://github.com/indigo-dc/PortalSetup/raw/master/Ubuntu_14.04/fgSetup.sh

# chmod +x fgSetup.sh

# cat /dev/zero | ssh-keygen -q -N ""

# cat /root/.ssh/id_rsa.pub >> /home/futuregateway/.ssh/authorized_keys

# echo "#FGSetup remove the following after installation" >> /etc/sudoers

# echo "ALL  ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

You can edit the fgSetup.sh script to point to a specific FGW release or branch.

We are currently using master branch.

GITBASE=https://github.com/indigo-dc                   # GitHub base repository endpoint
GITBASERAW=https://raw.githubusercontent.com/indigo-dc # GitHub base for raw content
GITPORTALSETUP_NAME="PortalSetup"                      # PortalSetup git path name
GITPORTALSETUP_CLONE="PortalSetup.git"                 # PortalSetup clone name
GITPORTALSETUP_TAG="master"                            # PortalSetup tag name
GITFGAPISERVER_NAME="fgAPIServer"                      # fgAPIServer git path name
GITFGAPISERVER_CLONE="fgAPIServer.git"                 # fgAPIServer clone name
GITFGAPISERVER_TAG="master"                            # fgAPIServer tag name
GITFGAPISERVERDAEMON_NAME="APIServerDaemon"            # APIServerDaemon git path name
GITFGAPISERVERDAEMON_CLONE="APIServerDaemon.git"       # APIServerDaemon clone name
GITFGAPISERVERDAEMON_TAG="master"                      # APIServerDaemon tag name

Then:

# ./fgSetup.sh futuregateway futuregateway <your ssh port> $(cat /root/.ssh/id_rsa.pub)

The ssh port is usually 22.

Warning

FutureGateway token authentication requires https. Here, we are going to use Let’s Encrypt certificates as an example. A quick guide is available here.

Portal configuration

Start the portal:

# /etc/init.d/futuregateway start

The portal will be available at http(s)://<your_ip_address>:8080

Note

FGW (re)start takes a while!

Login with the email configured during the wizard and test as password. Then set your new password and recovery question.

Apache configuration

https is mandatory for FutureGateway Token authentication.

Enable the http_proxy and ssl modules on apache2:

a2enmod ssl
a2enmod proxy_http

Port 443 must be opened.

In /etc/apache2/sites-available/ create your futuregateway.conf file, setting:

ServerName <your_server_name>

...

## SSL directives
SSLEngine on
SSLCertificateFile /path/to/cert.pem
SSLCertificateKeyFile /path/to/key.pem
SSLCertificateChainFile /path/to/chain.pem

...

then enable FGW:

a2ensite futuregateway.conf

and reload apache:

# service apache2 reload

Add the following lines to the FGW configuration file portal-ext.properties:

web.server.protocol=https
web.server.https.port=443

and restart FGW:

# /etc/init.d/futuregateway restart

Note

To create your signed certificate with Let’s Encrypt: https://github.com/maricaantonacci/slam/blob/master/gitbook/create-custom-keystore.md

IAM integration

IAM portlets for the FGW portal are available on github: https://github.com/mtangaro/fgw-elixir-italy/tree/master/iam-modules

Put the portlets in /home/futuregateway/FutureGateway/deploy/; FGW will load them automatically, moving them to /home/futuregateway/FutureGateway/osgi/modules/.

You can follow these instructions to set it up: https://github.com/indigo-dc/LiferayPlugIns/blob/master/doc/admin.md.

The option javascript.fast.load=false has to be set in /home/futuregateway/FutureGateway/portal-ext.properties.

Administrator portlet

The administrator portlet is here: https://github.com/mtangaro/fgw-elixir-italy/tree/master/admin-modules

Once uploaded, the Future Gateway APIs URL is https://hostname/apis/v1.0.

Future Gateway apis pop-up

The next step is the configuration of the PTV (Portal Token Validator), a service which the FG API server uses for token validation.

Specific Liferay user and role are needed to exploit PTV.

Note

This step is not mandatory, since you can still configure your portal owner for token validation: it has all required permissions and is registered to IAM.

However, if you change the IAM client you would then have to create a new user and change the PTV configuration. The dedicated user described here avoids this.

Create a new Role named External Services and give it IAM token permissions:

External Services role
External Services IAM permissions

Create a new user (not an IAM user; just register it using the Sign-in liferay module and enable it).

Then assign the new Role External Services to the new user: Users and Organizations -> User Information -> Roles -> Select External Services and save.

To configure PTV you have to modify FutureGateway/fgAPIServer/fgapiserver.conf [1] with the following options:

fgapisrv_ptvendpoint= https://hostname/api/jsonws/iam.token/get-token-info
fgapisrv_ptvuser    = [...]
fgapisrv_ptvpass    = [...]

Moreover you have to configure FutureGateway/apache-tomcat-8.0.36/webapps/APIServerDaemon/WEB-INF/classes/it/infn/ct/ToscaIDC.properties [2] with:

fgapisrv_frontend   = https://hostname/apis/v1.0
fgapisrv_ptvtokensrv= https://hostname/api/jsonws/iam.token/get-token
fgapisrv_ptvendpoint= https://hostname/api/jsonws/iam.token
fgapisrv_ptvuser    = [...]
fgapisrv_ptvpass    = [...]

ptvuser and ptvpass correspond to the email and password of a FGW user with the right permissions for token validation.

Warning

After changing [1] restart Apache (# service apache2 restart); after changing [2] restart Tomcat (# service futuregateway restart).

To validate if your PTV service is working, you can do the following:

  1. Visit https://jwt.io and copy-paste your IAM token. The token is stored in Your User Name -> Account Settings -> Miscellaneous -> Iamaccesstoken

    IAM access token

    In the decoded payload, you will find your subject:

    321f0ea3-4aab-46f7-accf-f645cd9d3629
    
  2. Use the PTV web service directly:

    $ export PTV_USER=***
    $ export PTV_PASS=***
    $ export SUBJECT=321f0ea3-4aab-46f7-accf-f645cd9d3629
    
    $ curl -u "$PTV_USER:$PTV_PASS"\
           -d "subject=$SUBJECT"\
           https://hostname/api/jsonws/iam.token/get-token
    

To test if the FGW API server is authenticating you correctly, you can do the following:

$ curl https://hostname/apis/v1.0/applications

This should show ‘401 Unauthorized’, so do the following:

export TOKEN=<IAM_token_from_FGW_portal>

$ curl -H "Authorization: Bearer $TOKEN" https://hostname.cloud.ba.infn.it/apis/v1.0/applications

If the FG API server is configured correctly, you will get a JSON description of your FG applications.

Finally, going in the FutureGateway admin portlet you should see:

admin portlet

Portlet configuration

Create build environment

To correctly build FutureGateway portlets we recommend using Ubuntu 16.04. Java 8 and gradle are needed:

# apt-get install gradle

Install Blade cli: https://dev.liferay.com/develop/tutorials/-/knowledge_base/7-0/installing-blade-cli

The linux version of the liferay portal is available here: https://sourceforge.net/projects/lportal/files/Liferay%20Workspace/1.5.0.1/LiferayWorkspace-1.5.0.1-linux-x64-installer.run

$ chmod +x LiferayWorkspace-1.5.0.1-linux-x64-installer.run

$ ./LiferayWorkspace-1.5.0.1-linux-x64-installer.run

Answer [2] Don't initialize Liferay Workspace directory

and continue the installation.

Build portlets

Next, run the following commands:

blade init liferay-workspace

cd ./liferay-workspace

git clone https://github.com/indigo-dc/LiferayPlugIns modules/

cd ./modules

git checkout remotes/origin/nonofficial # to build nonofficial portlets

blade gw clean jar

Newly created portlets are in ./modules/LIB_NAME/build/libs.

Next you need to copy the created jars to ~/FutureGateway/deploy; the portlets will then be available on your website.

Update to Java 8 - Appendix A

sudo apt-get purge openjdk*

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java8-installer

Log out and log in again to refresh environment variables.

Configure Apache for http - Appendix B

Enable the http_proxy module on apache2:

a2enmod proxy_http

In /etc/apache2/sites-available/ create your futuregateway.conf file, setting:

ServerName <your_server_name>

...

then enable FGW:

a2ensite futuregateway.conf

and reload apache:

# service apache2 reload

Import Signed CA - Appendix C

To import the SSL certificate you have to

  1. Install the ca-certificates package:

    # yum install ca-certificates
    
  2. Add the certificate as a new file:

    # cp path/to/goagent/local/CA.crt /usr/local/share/ca-certificates/cert.crt
    
  3. Use command:

    # update-ca-certificates
    

Import Signed CA in Java keystore - Appendix D

If IAM is under https but the Tomcat log ($CATALINA_HOME/webapps/APIServerDaemon/WEB-INF/logs/APIServerDaemon.log) is showing:

...
23:03:55,675 ERROR [http-nio-8080-exec-13][IAMEndPoints:69] IAM Configuration URL 'https://indigoiam.cloud.ba.infn.it/.well-known/openid-configuration' is not reachable
...

you have to import your https certificate into the java keystore [*]:

# apt-get install ca-certificates-java

# keytool -import -file /path/to/crt/file/file.crt -storepass changeit -keystore $JAVA_HOME/lib/security/cacerts -alias mycert

To list java certificates:

$ keytool -list -storepass changeit -keystore $JAVA_HOME/lib/security/cacerts

You can install it in /etc/ssl/certs to make curl work, too.

[*] http://www.thinkplexx.com/learn/howto/security/tools/understanding-java-keytool-working-with-crt-files-fixing-certificate-problems

Create https certificate - Appendix E

You can install the certbot tool on your machine (1st approach) or you can use the docker image certbot/certbot (2nd approach).

Install the certbot tool (https://certbot.eff.org/#ubuntuxenial-other):
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:certbot/certbot
$ sudo apt-get update
$ sudo apt-get install certbot
$ sudo certbot certonly --standalone -d $HOSTNAME

The certificates should be in:
$ sudo ls /etc/letsencrypt/live/$HOSTNAME
cert.pem  chain.pem  fullchain.pem  privkey.pem  README

or use the docker image:

$ sudo docker run -it --rm -p 80:80 -p 443:443 -v /etc/letsencrypt:/etc/letsencrypt/ certbot/certbot certonly --standalone -d $HOSTNAME

Again, the certificates should be in:

$ sudo ls /etc/letsencrypt/live/$HOSTNAME
cert.pem  chain.pem  fullchain.pem  privkey.pem  README

Fix ghost deployment issue

Open mysql FGW database:

mysql -h localhost -P 3306 -u fgapiserver -pfgapiserver_password fgapiserver

mysql> select f1.id, f2.id, f1.name from fg_user f1 join fg_user f2 on f1.name = f2.name where f1.id <> f2.id;
+----+----+--------------------------------------+
| id | id | name                                 |
+----+----+--------------------------------------+
|  6 |  5 | c0b907df-43a3-4c8a-952a-2b5ca56ec43e |
|  5 |  6 | c0b907df-43a3-4c8a-952a-2b5ca56ec43e |
|  8 |  7 | 9c3c7f53-7279-4008-82be-60600418c884 |
|  7 |  8 | 9c3c7f53-7279-4008-82be-60600418c884 |
+----+----+--------------------------------------+
4 rows in set (0.00 sec)

Suppose you want to delete user id = 6, which shows double entries (note that in this particular case we deleted both id=6 and id=8, which were our double entries, keeping id=5 and id=7):

mysql> delete from fg_user_group where user_id = 6;
Query OK, 1 row affected (0.04 sec)

mysql> delete from fg_user where id=6;
Query OK, 1 row affected (0.01 sec)

Then you can add a unique constraint to prevent double entries:

mysql> alter table fg_user add unique(name);
Query OK, 0 rows affected (0.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

Logs

You can easily access the logs with symlinks:

ln -s /home/futuregateway/FutureGateway/apache-tomcat-8.0.36/webapps/APIServerDaemon/WEB-INF/logs/APIServerDaemon.log logs-apiserverdaemon.log
ln -s /home/futuregateway/FutureGateway/fgAPIServer/fgapiserver.log logs-fgapiserver.log
ln -s /home/futuregateway/FutureGateway/apache-tomcat-8.0.36/logs/catalina.out logs-tomcat-catalina.out

INDIGO CLUES

CLUES is an elasticity manager system for HPC clusters and Cloud infrastructures that features the ability to power on/deploy working nodes as needed (depending on the job workload of the cluster) and to power off/terminate them when they are no longer needed.

Official GitBook documentation: https://www.gitbook.com/book/indigo-dc/clues-indigo/details

Check worker nodes status

To check worker node status:

# sudo clues status
node                          state    enabled   time stable   (cpu,mem) used   (cpu,mem) total
-----------------------------------------------------------------------------------------------
vnode-1                       powon    enabled     00h02'54"      0,0.0            1,1073741824.0
vnode-2                         off    enabled     00h41'00"      0,0.0            1,1073741824.0

CLUES commands:

# clues --help
The CLUES command line utility

Usage: clues [-h] [status|resetstate|enable|disable|poweron|poweroff|nodeinfo|shownode|req_create|req_wait|req_get]

[-h|--help] - Shows this help
* Show the status of the platform
Usage: status

* Reset the state of one or more nodes to idle
Usage: resetstate <nodes>
<nodes> - names of the nodes whose state want to be reset

* Enable one or more nodes to be considered by the platform
Usage: enable <nodes>
<nodes> - names of the nodes that want to be enabled

* Disable one or more nodes to be considered by CLUES
Usage: disable <nodes>
<nodes> - names of the nodes that want to be disabled

* Power on one or more nodes
Usage: poweron <nodes>
<nodes> - names of the nodes that want to be powered on

* Power off one or more nodes
Usage: poweroff <nodes>
<nodes> - names of the nodes that want to be powered off

* Show the information about node(s), to be processed in a programmatically mode
Usage: nodeinfo [-x] <nodes>
[-x|--xml] - shows the information in XML format
<nodes> - names of the nodes whose information is wanted to be shown

* Show the information about node(s) as human readable
Usage: shownode <nodes>
<nodes> - names of the nodes whose information is wanted to be shown

* Create one request for resources
Usage: req_create --cpu <value> --memory <value> [--request <value>] [--count <value>]
--cpu|-c <value> - Requested CPU
--memory|-m <value> - Requested Memory
[--request|-r] <value> - Requested constraints for the nodes
[--count|-n] <value> - Number of resources (default is 1)

* Wait for a request
Usage: req_wait <id> [timeout]
<id> - Identifier of the request to wait
[timeout] - Timeout to wait

* Get the requests in a platform
Usage: req_get

Check worker nodes deployment

Worker node deployment logs are available at /var/log/clues2/clues2.log

Troubleshooting

Invalid Token

Symptoms: Galaxy jobs stuck in This job is waiting to run and stay gray in the Galaxy history.

The worker nodes are not correctly instantiated, due to an Invalid Token. Check /var/log/clues2/clues2.log:

urllib3.connectionpool;DEBUG;2017-10-31 10:52:33,288;"GET /orchestrator/deployments/48126bd4-14d8-494d-970b-fb581a3e13b2/resources?size=20&page=0 HTTP/1.1" 401 None
[PLUGIN-INDIGO-ORCHESTRATOR];ERROR;2017-10-31 10:52:33,291;ERROR getting deployment info: {"code":401,"title":"Unauthorized","message":"Invalid token: eyJraWQiOiJyc2ExIiwiYWxnIjoiUlMyNTYifQ.eyJzdWIiOiI3REU4Qjg4MC1DNEQwLTQ2RkEtQjQxMS0wQTlCREI3OUYzOTYiLCJpc3MiOiJodHRwczpcL1wvaWFtLXRlc3QuaW5kaWdvLWRhdGFjbG91ZC5ldVwvIiwiZXhwIjoxNTA5NDQ0NDY2LCJpYXQiOjE1MDk0NDA4NjYsImp0aSI6IjAyZmE5YmM0LTBkMjctNGJkZi1iODVjLTJlMjM2NjNjNmY5OCJ9.QqjYzVs0h5kuqoBZQf5PPcYrsRJksTFyZO5Zpx8xPcfjruWHwwOnw9knQq8Ex3lwAXgi5qxdmqBDi4EIZAOaoFsPirlM7K6fCBE0-M_btm4nTbUvTSaUAfjki41DnPoEjLqXTTy8XLPUrCSmHVeqvSHHFipeSkP9OxKltlUadPc"}

Solution:

  1. Stop CLUES: sudo systemctl stop cluesd.
  2. Edit the file /etc/clues2/conf.d/plugin-ec3.cfg and change the value of the INDIGO_ORCHESTRATOR_AUTH_DATA parameter with the new token.
  3. Restart CLUES sudo systemctl start cluesd.
  4. You also have to open the CLUES DB with the sqlite3 command: sqlite3 /var/lib/clues2/clues.db and delete the old refreshed token: DELETE FROM orchestrator_token;. To exit from sqlite just type .exit. The whole procedure is sketched below.

Support

If you need support please contact us at: laniakea.helpdesk@gmail.com

Software glitches and bugs can occasionally be encountered. The best way to report a bug is to open an issue on our GitHub repository.

Cite

Licence

GNU General Public License v3.0+ (https://www.gnu.org/licenses/gpl-3.0.txt)

Galaxy tutorials

Galaxy training network: https://galaxyproject.org/teach/gtn/

Galaxy For Developers: https://crs4.github.io/Galaxy4Developers/
