On the CENAGIS IT Platform you will find modern technologies

IT Platform Components

The CENAGIS IT Platform is a collection of many interacting tools and elements.

Together they form a coherent system with a unified login system, universal access to working data, and supporting documentation at every step of the analytical process.

Most of the system’s functions can be reached through a dedicated Access Panel. From there you can go to, among others:
  • Viewing data in the CENAGIS Repository
  • Viewing and editing metadata in the Resource Catalog
  • Viewing and launching Geospatial Services (Geospatial Services Node)
  • Managing virtual computers
  • Using the computing environment
  • Using shared storage space
  • Using communication tools in teams
The following sections describe the most important features of these components.

Virtualization Subsystem

Working on a virtual computer resembles working on a classic desktop computer.
Unlike a classic environment, however, a virtual computer can be accessed from any computer with an Internet connection, work environments can be shared and presented to others, and several virtual machines can be used to work in different operating systems: Windows and various versions of Linux. This makes it possible, for example, to test data processing and visualization on machines with different CPU, RAM, and GPU configurations.

The virtual machines come with the Windows Server operating system or virtually any free Linux distribution. Ubuntu or CentOS is installed as standard.

A wide range of GIS-class software (e.g. from Hexagon), photogrammetric software, and software for spatial data processing and conversion is configured in the system. Some of this software can be offered to some users as part of an access package. Commercial software can also be used under the BYOL (“Bring Your Own License”) licensing model: users can install software they own on virtual computers, provided they hold the appropriate licenses. In addition to commercial software, a wide range of open-source GIS software and technologies is available, e.g. QGIS, PostGIS, GeoServer, and Cesium.

Twenty-one servers handle virtualization on the CENAGIS platform, and they can be used for both research and commercial purposes. These resources can run several hundred virtual machines, and a single machine can often exceed the performance of an ordinary workstation: the maximum parameters of a single machine are 54 vCPUs and 300 GB of RAM. Analysis results whose visualization is graphically demanding can be displayed with the support of a dedicated set of NVIDIA Tesla V100 (32 GB) and A100 (80 GB) graphics cards with CUDA technology. A separate experimental infrastructure of 12 servers, housed in a different server room, is also available.

Each virtual computer also provides easy access to the large geospatial data resources of the CENAGIS Repository. The data are connected as network drives or through a preconfigured connection to a PostGIS database in which the data are already loaded and ready to use. Software with preconfigured access to Repository data is available as well: QGIS, and Hexagon and ESRI products if a license is acquired.
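
As an illustration of working with such a PostGIS connection from Python, the sketch below builds a connection string and a bounding-box query. It is only a sketch under stated assumptions: the host, database, account, and table names are hypothetical placeholders and not actual CENAGIS settings, and running the query requires the third-party psycopg2 package and a reachable database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostGISTarget:
    """Connection details; every default is a placeholder, not a real CENAGIS value."""
    host: str = "postgis.example.internal"  # hypothetical host name
    port: int = 5432                         # PostgreSQL default port
    dbname: str = "repository"               # hypothetical database name
    user: str = "analyst"                    # hypothetical account

    def dsn(self) -> str:
        # libpq keyword/value connection string; the password is passed separately.
        return f"host={self.host} port={self.port} dbname={self.dbname} user={self.user}"


def bbox_query(table: str, geom_column: str = "geom", srid: int = 4326) -> str:
    """Return a parameterised query selecting geometries that intersect a bounding box."""
    # Identifiers cannot be bound as query parameters, so accept only simple names.
    for name in (table, geom_column):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT ST_AsText({geom_column}) FROM {table} "
        f"WHERE ST_Intersects({geom_column}, "
        f"ST_MakeEnvelope(%s, %s, %s, %s, {srid}))"
    )


def fetch_in_bbox(target: PostGISTarget, table: str, bbox: tuple, password: str) -> list:
    """Run the query and return geometries as WKT strings (needs psycopg2 and a database)."""
    import psycopg2  # imported lazily so the helpers above work without it

    with psycopg2.connect(target.dsn(), password=password) as conn:
        with conn.cursor() as cur:
            # bbox is (xmin, ymin, xmax, ymax) in the coordinate system given by srid.
            cur.execute(bbox_query(table), bbox)
            return [row[0] for row in cur.fetchall()]
```

The bounding-box values are bound as parameters rather than formatted into the SQL text, which keeps user-supplied coordinates out of the query string.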

One of the goals of CENAGIS is to facilitate the work of research teams, so convenient collaboration tools are at their disposal, such as folders shared between machines and system components, shared databases, and source code repositories. Hardware resources can be dynamically distributed among team members, which allows for cost optimisation and better-organised work.

Virtualization Subsystem

Windows : Windows accessible via remote desktop in a web browser.
Ubuntu : Linux accessible through a remote desktop in a web browser.
KVM : Virtual Machine Hypervisor.
CloudStack : Virtual machine infrastructure management.
Ceph : Secure data storage on distributed servers.

GIS software

Hexagon : Hexagon's software suite, led by the GeoMedia application.
QGIS : The most popular open and free GIS package.
FME : Software designed to process thousands of types and formats of data.
Limon : Dephos software designed to work with point clouds.
ESRI : Software from ESRI, led by ArcGIS.

BigData Analysis Subsystem

The BigData Analysis Subsystem is a set of hardware and software that provides a unified environment for Spatial Big Data research.

Special emphasis is placed on analytics based on geospatial data using containerization and distributed data storage and processing.

A JupyterLab development environment has been configured for conducting a variety of analyses. It gives a geo-data scientist a ready-made workspace for processing and analyzing data in Python, the typical working environment of a data scientist. Preinstalled libraries are available, including ones that make it easier to apply artificial intelligence methods.

Because the environment is based on containerization, containers can be used as a convenient way to test and share prototype solutions. Containers can also be run using the power of the available professional graphics cards.

The core component of this subsystem is DC/OS, an open-source distributed operating system based on Apache Mesos. DC/OS manages multiple hardware servers from a single interface, runs Docker containers and distributed services on these machines, and provides networking and resource management.

Data processing and analysis mechanisms are based on Apache Spark analytical engine technology, whose processes run in Docker containers under the control of Apache Mesos.
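
The processing pattern that an engine such as Spark distributes across containers can be illustrated in miniature with plain Python. The sketch below is not Spark code and does not use the CENAGIS environment; it only mimics, in a single process, the steps of splitting data into partitions, computing partial results per partition, and merging them. The function names, grid size, and sample coordinates are illustrative assumptions.

```python
from collections import Counter
from functools import reduce


def grid_cell(lon, lat, cell_size=1.0):
    """Map a point to the integer index of the grid cell that contains it."""
    return (int(lon // cell_size), int(lat // cell_size))


def count_partition(points, cell_size=1.0):
    """'Map' side: count points per grid cell within one partition."""
    return Counter(grid_cell(lon, lat, cell_size) for lon, lat in points)


def merge_counts(left, right):
    """'Reduce' side: combine the partial counts of two partitions."""
    return left + right


# Two partitions of (longitude, latitude) points; in a distributed engine
# each partition would be processed by a separate worker.
partitions = [
    [(21.01, 52.23), (21.05, 52.20)],
    [(19.94, 50.06), (21.00, 52.25)],
]

partial_counts = (count_partition(part) for part in partitions)
total = reduce(merge_counts, partial_counts, Counter())

for cell, count in sorted(total.items()):
    print(cell, count)
```

Because the merge step is associative, partial counts can be combined in any order, which is what allows this kind of aggregation to be spread over many workers.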

The Big Data subsystem consists of the resources of 30 servers, which provide 1,800 vCPUs and 11 TB of RAM for users.

There are 60 NVIDIA Tesla T4 cards for GPU-based computing.
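
To put these totals in perspective, the arithmetic below derives rough per-server averages. It assumes that resources are spread evenly across the 30 servers and that 1 TB equals 1000 GB; neither assumption is stated in the figures above, so the results are only indicative.

```python
# Totals quoted for the Big Data subsystem.
SERVERS = 30
TOTAL_VCPUS = 1_800
TOTAL_RAM_TB = 11
TOTAL_GPUS = 60  # NVIDIA Tesla T4 cards

# Assumption: decimal units (1 TB = 1000 GB) and an even spread over servers.
GB_PER_TB = 1000

vcpus_per_server = TOTAL_VCPUS / SERVERS
gpus_per_server = TOTAL_GPUS / SERVERS
ram_gb_per_server = TOTAL_RAM_TB * GB_PER_TB / SERVERS
ram_gb_per_vcpu = TOTAL_RAM_TB * GB_PER_TB / TOTAL_VCPUS

print(f"vCPUs per server:  {vcpus_per_server:.0f}")
print(f"GPUs per server:   {gpus_per_server:.0f}")
print(f"RAM per server:    {ram_gb_per_server:.0f} GB")
print(f"RAM per vCPU:      {ram_gb_per_vcpu:.1f} GB")
```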

This is an entirely new approach to spatial data processing in geoinformation, making it possible to carry out analyses on a scale that was previously unattainable.

BigData subsystem

Jupyter : A browser-based development environment that combines code with documentation and data visualization.
Python : The core programming language of the BigData subsystem.
GeoPandas : Handling vector spatial data, prepared for large-scale calculations.
GeoMesa : Distributed BigData analytics for vector data.
RasterFrames : Distributed BigData analytics for raster data.
Apache Spark : Distributed big data processing engine.
Docker : All BigData subsystem services are based on containerization technology.
Apache Mesos : BigData subsystem resource pool orchestrator.
DC/OS : A distributed operating system that manages all running services.

Data repository

The data made available in CENAGIS are just as important as the technical infrastructure. Selected open and commercial geospatial data resources covering the territory of Poland are provided in a consistent IT environment.

• Reference and thematic data from the geodetic and cartographic resource
• Thematic data from various institutions
• European Space Agency (ESA) satellite images
• Data from community resources
• Data from commercial companies
• and many others

For more information, see Data.

Data is stored in a distributed HDFS file system, in relational databases (e.g. PostgreSQL) and in NoSQL databases (e.g. Accumulo). The system automatically takes care of data replication and backups, so data in the repository and user data are safe and resistant to hardware failures. 

Users can browse the data available in the system’s Repository in several ways:

  • Viewing data from the CENAGIS Repository, your own data, and OGC services in the web-based CENAGIS MapViewer (including point cloud viewing, multi-temporal comparison, and 3D visualizations)
  • Viewing data from the CENAGIS Repository, your own data, and OGC services in the Hexagon Geospatial Portal (including downloading data fragments to your CENAGIS drive)
  • Browsing the CreoDIAS Repository (access to ESA satellite imagery) in two ways, including simplified access via CENAGIS SatExplorer
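
For readers unfamiliar with OGC services, the sketch below shows how a request to one such service, a WMS GetMap call (version 1.1.1), is typically assembled. The endpoint URL and layer name are hypothetical placeholders rather than actual CENAGIS addresses, and the bounding box is only an approximate extent of Poland.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                   image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one layer in EPSG:4326."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                    # empty value requests the default style
        "SRS": "EPSG:4326",              # WMS 1.1.1 uses SRS and lon/lat order
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, used only for illustration.
url = wms_getmap_url(
    "https://example.org/geoserver/wms",
    layer="demo:boundaries",
    bbox=(14.1, 49.0, 24.2, 54.9),  # approximate extent of Poland
)

# Parse the URL back to confirm which parameters were encoded.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["REQUEST"], query["BBOX"])
```

The resulting URL can be opened in a browser or added as a WMS layer in a desktop GIS client, provided the server and layer actually exist.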

An important feature of the system is the function for viewing and editing metadata in the Resource Catalog. It allows users to:

  • Search resources (data, software, working environments, projects, research teams, publications, documentation)
  • Analyse metadata
  • Edit custom metadata about your own resources added to the Repository

Data storage

PostgreSQL : Basic relational database in CENAGIS.
PostGIS : PostgreSQL extension to efficiently store spatial data.
GeoServer : A server that provides data viewing services.
HDFS : Distributed file system dedicated to the BigData Subsystem.
Parquet : A file format used to store spatial BigData.
Accumulo : NoSQL database storing large spatial vector data.

Additional services

In addition to the usual data processing tools, the CENAGIS Platform provides tools that support collaboration between users. Their main purposes are:

  • Finding partners for projects
  • Verifying ideas
  • Promoting and sharing the results of your research
  • Promoting and sharing your data
  • Promoting your technological solutions
  • Interacting to create more complex projects

The following tools are available for this:

  • CENAGIS Drive – convenient file sharing
  • CENAGIS Chat – internal communicator
  • CENAGIS Forum – contact with the entire CENAGIS community
  • Jitsi Meet – teleconferencing
  • CENAGIS GitLab – code repository and project management
  • CENAGIS Wiki – documentation of all system components

Users are supported through:

  • Technology introduction tutorials
  • Video tutorials
  • Community support through chat and forum
  • CENAGIS Helpdesk

CENAGIS Services

NextCloud : The engine behind the CENAGIS Drive file-sharing service.
Rocket Chat : Powers the CENAGIS Chat service, which allows private and group messages to be exchanged.
Flarum : The engine behind the CENAGIS Forum, a discussion venue for the geoanalyst community.
Jitsi Meet : A service for making voice and video calls to other users.
GitLab : Source code repositories and project management in CENAGIS.
Wiki.js : The engine behind the pages with documentation and tutorials on how to use the platform.

Want to start using the CENAGIS IT Platform?