PIC: Port d'Informació Científica

GONZALO MERINO


The Port d'Informació Científica (PIC) is a scientific-technological center maintained through a collaboration agreement between IFAE and CIEMAT, with the support of UAB. PIC uses distributed high-throughput computing technologies, including clusters, Grid, Cloud and Big Data, to provide services for managing large amounts of data to scientific collaborations whose researchers are spread all over the world.


2018 Highlights

In 2018, PIC’s work on producing synthetic galaxy catalogs was recognized with the Euclid STAR prize, awarded during the Euclid Consortium meeting in Bonn. Furthermore, the presentation of CosmoHub as a powerful tool for the exploration and distribution of cosmological data was awarded the prize for best poster at the European Week of Astronomy and Space Science in Liverpool. Another important accomplishment was the successful use of resources on the MareNostrum4 supercomputer at BSC-CNS for running LHC simulations, paving the way for effective integration of HPC and HTC resources to tackle future WLCG challenges.

Finally, with the support of our PAU Data Management pipeline and the integration of scientific data analysis pipelines into this framework, the first paper on the estimation of the PAU photometric redshift has been produced.

Figure 1


Data Center Infrastructure

PIC’s data center is located in UAB’s Building D and has two different server rooms with different characteristics and energy efficiency profiles:

  • 150 m² air-cooled room, which hosts the storage and computing IT equipment
  • 25 m² liquid-cooled room, which hosts only computing resources

PIC’s air-cooled room has a raised floor with 34 racks and 1400 rack units for IT equipment, and a StorageTek SL8500 tape library. PIC’s liquid-cooled room has four oil immersion tanks. It is the only liquid immersion scientific data center in Spain.

Service Catalogue

PIC supports a variety of scientific projects deploying computing and storage resources for intensive processing and analysis of extremely large amounts of data.

  • Data Processing Services
    • Access to more than 8300 CPU slots on two clusters managed by HTCondor and PBS. These resources are integrated in the world’s largest distributed scientific computing infrastructure, the Worldwide LHC Computing Grid (WLCG), and are also available to local users.
  • Data handling and storage
    • Open-source software tools to manage 35 PB of scientific data: 10 PB on high-performance disk and 25 PB on reliable tape for long-term archiving.
  • Data Transfer, data access and network connectivity
    • With a current external connectivity of 20 Gbps and annual I/O volumes close to 30 PB in and out, PIC is the largest data mover in the Spanish academic network.
    • During 2018, RedIRIS deployed a dedicated 10 Gbps network link to transfer data from the MAGIC and LST (CTA) telescopes at the Observatorio Roque de los Muchachos in La Palma to PIC.
  • Front-End services and user support
    • Customized access to data processing and analysis services.
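
As a rough sanity check on the network figures quoted in the list above, the annual I/O volume can be converted into an average link rate. The sketch below assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and that the 30 PB figure is the combined inbound and outbound total; the function name is illustrative only and the result is an average, not a measured rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def average_gbps(petabytes_per_year: float) -> float:
    """Average rate in Gbps needed to move the given volume in one year."""
    bits_per_year = petabytes_per_year * 1e15 * 8  # decimal petabytes to bits
    return bits_per_year / SECONDS_PER_YEAR / 1e9


if __name__ == "__main__":
    avg = average_gbps(30)   # annual I/O volume quoted in the text
    link = 20                # external connectivity in Gbps quoted in the text
    print(f"average rate: {avg:.2f} Gbps ({avg / link:.0%} of the {link} Gbps link)")
```

Under these assumptions the yearly volume corresponds to a sustained average well below the nominal link capacity, which leaves headroom for the bursts typical of large data transfers.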

Hybrid Cloud Infrastructure - HNSciCloud

IFAE participates in the HNSciCloud project, co-funded by the EU Horizon 2020 Work Programme, whose goal is to develop a Hybrid Cloud platform for science.

During 2018 IFAE adapted two use cases for the astrophysics projects MAGIC and CTA to run in the Cloud. As part of this activity, Cloud resources were used to transparently extend the local computing facilities and boost the data processing capacity.

The Worldwide LHC Computing Grid

The Worldwide LHC Computing Grid (WLCG) is a distributed computing infrastructure comprising resources from more than 170 centres in 42 countries. WLCG is used to analyse the hundreds of petabytes (PB) of data that the LHC generates each year, an unprecedented rate.

LHC Tier-1 Data Center

The Tier-1 centre at PIC provides services to three of the LHC experiments, ATLAS, CMS and LHCb, accounting for ~5% of their total Tier-1 capacity. Besides providing 24x7 data processing services that enable LHC science, the centre is engaged in a major R&D project that aims to federate the Tier-1 and Tier-2 resources from IFAE and CIEMAT, in line with the WLCG strategy towards HL-LHC. At the end of 2018, the Tier-1 resources at PIC comprised 6700 CPU cores, 8 PB of disk storage and 21 PB of tape storage.

2018 was a year of many R&D and operations activities at the PIC Tier-1. The local batch computing service and its Grid interface have fully transitioned to HTCondor, software that specializes in implementing High Throughput Computing services on massively distributed environments. Considerable effort also went into benchmarking and optimizing the storage system setup to boost performance and efficiency. Sustained average data rates of 600 MB/s were achieved from CERN to the tape archive at PIC.
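
To put the sustained CERN-to-tape rate in perspective, it can be converted into a daily volume. This is a minimal sketch assuming decimal units (1 TB = 10^6 MB) and a rate held constant over 24 hours; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600


def tb_per_day(mb_per_second: float) -> float:
    """Daily volume in TB for a transfer rate given in MB/s (decimal units)."""
    return mb_per_second * SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    rate = 600  # MB/s, the sustained average quoted in the text
    print(f"{rate} MB/s sustained is about {tb_per_day(rate):.2f} TB per day")
```

At this rate, writing 1 PB to tape would take roughly 19 days of uninterrupted transfer.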

WLCG is exploring the use of supercomputing facilities (HPC) for HL-LHC. HPC systems could make an important contribution to the large simulations required for analysis. ATLAS researchers from IFAE and IFIC successfully ran 650,000 hours of simulations on the MareNostrum4 supercomputer at BSC. CMS researchers from IFAE and CIEMAT are actively engaged in an R&D project to enable their distributed infrastructure to make efficient use of HPC resources like MareNostrum4.

ATLAS Tier-2 and the Tier-3 Interactive Analysis Facility

As a complement to the Tier-1 installation, the LHC group also provides resources to the Spanish ATLAS Tier-2 infrastructure, which specializes in data analysis and simulation production.

In numbers, IFAE’s contribution through PIC to the Spanish ATLAS Tier-2 in 2018 amounted to 6.5 million CPU hours and 940 TB of disk storage, delivered with measured reliabilities above 99%.

IFAE ATLAS physicists have access to a Tier-3 analysis facility also hosted at PIC. PIC is not only the largest LHC computing center in Spain, but also the only one providing the full range of data services end-to-end: from the archive of detector RAW data to the final user analysis for publication.

Astrophysics and Cosmology

Services offered include:

  • the CosmoHub web platform for analysis and distribution of massive cosmology data sets
  • the main Data Center for the Major Atmospheric Gamma Imaging Cherenkov Telescopes (MAGIC)
  • the Spanish Science Data Center for the Euclid Ground Segment and, within it, the development of image simulations for the Euclid space mission
  • the integral treatment of data from the Physics of the Accelerating Universe (PAU)
  • data quality analysis of the Dark Energy Survey (DES)
  • support for the exploitation and distribution of the simulations of the Marenostrum Institut de Ciències de l'Espai (MICE) project

MAGIC

PIC provides data transfer from the Roque de los Muchachos Observatory in La Palma, as well as computing, data management and analysis. In 2018, PIC continued its work on the use of cloud resources for MAGIC data reprocessing.

CTA/LST1

PIC provides distributed computing services for CTA simulation production. This task is carried out in collaboration with the French IN2P3 Institute.

PIC officially started working for the LST1 collaboration to provide data management and computing support for the first CTA telescope, becoming the official Data Managers for LST1. PIC provided a data distribution system to support the LST camera assembly and testing activities. An important accomplishment was the implementation of a single-sign-on mechanism to give users authenticated access to the IT Container in the observatory, which provides computing and storage resources for LST1. In terms of data analysis support, PIC has introduced container technology based on Singularity for job execution. This was presented at the first LST Analysis Bootcamp, hosted in Legnaro in November.

PAU

In 2018, PIC continued to be fully operational as the PAU Survey data center. During the 2018A and 2018B observation periods, data were automatically transferred from WHT in La Palma to PIC. Analysis pipelines developed at PIC in collaboration with ICE were run several times for optimization. As part of the optimization process, we carried out tests for the integration of the PAU pipelines with the Apache Hadoop platform available at PIC. External projects using PAUCam access PIC storage to retrieve their data. The total volume of PAU data (raw and reduced) stored on tape at PIC reached ~40 TB at the end of the year.

IFAE researchers at PIC started work on a new data analysis framework using Machine Learning algorithms running on GPUs. Preliminary results for star-galaxy separation and background estimation look promising. Further exploration of these techniques is foreseen for 2019.

Euclid

During 2018, PIC continued the activities derived from its role as the Spanish Science Data Center (SDC-ES) and as a member of the Organizational Unit for Simulations (OU-SIM), responsible for developing the Simulation Processing Function (SIM-PF).

2018 was marked by the production of specific galaxy mock catalogs for the Euclid footprint and the preparation of image simulations for the Science Challenge 456. This challenge is to be run across the Euclid Ground Segment in early 2019. Development procedures and operation protocols for the Science Ground Segment (SGS) have been significantly improved.

CosmoHub

CosmoHub is a web portal, developed at PIC, for analysis and distribution of massive cosmology and astrophysics data sets. Hosting over 3,500 catalogs generated by more than 250 users from all over the world, it is a very popular tool among researchers. Because of its success, there are plans for a new version with increased functionality. The new design will include several analysis pipelines from the separately developed SciPIC framework and make them available to the CosmoHub user community.

This activity is part of a proposal for a European Research Council Consolidator Grant. The proposal is led by our collaborators from ICE and will be under evaluation in 2019.