SLAC’s new x-ray laser data system will process one million frames per second
When upgrades to the x-ray laser at the Department of Energy’s SLAC National Accelerator Laboratory are complete, the powerful new machine will capture up to 1 terabyte of data per second; that’s a data rate equivalent to streaming about a thousand full-length movies in a single second, and analyzing every frame of every movie as they zoom past in this super-fast-forward mode.
Data experts at the lab are finding ways to manage this huge amount of information as upgrades to the Linac Coherent Light Source (LCLS) come online over the next several years.
LCLS accelerates electrons to almost the speed of light to generate extremely bright x-ray beams. These x-rays probe a sample such as a protein or quantum material, and a detector captures a series of images that reveal the atomic motion of the sample in real time. By combining these images, chemists, biologists and materials scientists can create molecular movies of events such as how plants absorb sunlight or how drugs help fight disease.
As LCLS is upgraded, scientists will go from 120 pulses per second to 1 million pulses per second. This will create an x-ray beam that is 10,000 times brighter, allowing studies of systems that could not be studied before. But it will also come with a huge data challenge: the x-ray laser will produce hundreds to thousands of times more data per given period of time than before.
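The scale of that jump can be checked with some quick arithmetic. The sketch below uses only the figures quoted in the article (120 and 1 million pulses per second, up to 1 terabyte per second). The function names are illustrative, and two assumptions are made that the article does not state: a terabyte is taken as 10^12 bytes, and every pulse is assumed to yield exactly one frame.

```python
# Figures quoted in the article.
OLD_PULSES_PER_SECOND = 120
NEW_PULSES_PER_SECOND = 1_000_000
PEAK_BYTES_PER_SECOND = 1e12  # "up to 1 terabyte per second", assuming 1 TB = 10**12 bytes


def pulse_rate_increase(old_rate, new_rate):
    """Return how many times more pulses arrive per second after the upgrade."""
    return new_rate / old_rate


def seconds_between_pulses(rate):
    """Return the time available per pulse if pulses are evenly spaced."""
    return 1.0 / rate


def average_bytes_per_frame(bytes_per_second, frames_per_second):
    """Return the average frame size, ASSUMING one frame per pulse."""
    return bytes_per_second / frames_per_second


increase = pulse_rate_increase(OLD_PULSES_PER_SECOND, NEW_PULSES_PER_SECOND)
spacing = seconds_between_pulses(NEW_PULSES_PER_SECOND)
frame_size = average_bytes_per_frame(PEAK_BYTES_PER_SECOND, NEW_PULSES_PER_SECOND)

print(f"Pulse rate increase: roughly {increase:,.0f} times")
print(f"Time between pulses: {spacing * 1e6:.1f} microseconds")
print(f"Average frame size at peak rate: {frame_size / 1e6:.1f} megabytes")
```

Under these assumptions the pulse rate rises by a factor of several thousand, and each frame has to be handled in about a microsecond, which is in line with the "hundreds to thousands of times more data" described above.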
To manage this data, a group of scientists led by Jana Thayer, director of LCLS’s data systems division, is developing new computational tools, including computer algorithms and ways to connect to supercomputers. Thayer’s group uses a combination of computer science, data analysis, and machine learning to identify patterns in x-ray images and then string together a molecular movie.
Go with the flow
At LCLS, data flows continuously. “When scientists have access to an experiment, it’s either a 12-hour day or a 12-hour night, and limited to a few shifts before the next team arrives,” says SLAC senior scientist Ryan Coffee. To make efficient use of precious experimental time, bottlenecks must be completely avoided in order to preserve the flow of data and its analysis.
Streaming and storing data pose a significant challenge to network and computing resources, and being able to monitor data quality in near real time means that data needs to be processed immediately. An essential step in making this possible is to reduce the amount of data as much as possible before storing it for further analysis.
To enable this, Thayer’s team implemented on-the-fly data reduction using multiple types of compression to reduce the size of the recorded data without affecting the quality of the scientific output. One form of compression, called a veto, rejects unwanted data, such as images where x-rays have missed their mark. Another, called feature extraction, records only scientifically important information, such as the location and brightness of a spot on an x-ray image.
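The two reduction steps described above can be illustrated with a toy example. This is a minimal sketch, not the LCLS pipeline: frames are plain nested lists of pixel intensities, the veto is assumed to be a simple threshold on total intensity, and feature extraction is assumed to mean keeping only the brightest pixel. All names and the threshold value are hypothetical.

```python
def veto(frame, min_total_intensity):
    """Return True if the frame should be rejected.

    Assumption: a frame whose summed intensity is below the threshold
    is one where the x-rays missed the sample.
    """
    total = sum(sum(row) for row in frame)
    return total < min_total_intensity


def extract_brightest_spot(frame):
    """Return (row, column, brightness) of the brightest pixel in the frame."""
    best = (0, 0, frame[0][0])
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


def reduce_stream(frames, min_total_intensity):
    """Yield one small feature record per accepted frame, dropping vetoed frames."""
    for frame in frames:
        if veto(frame, min_total_intensity):
            continue  # rejected: nothing is recorded for this frame
        yield extract_brightest_spot(frame)


# Two made-up 3x3 frames: a near-empty "miss" and a "hit" with one bright spot.
miss = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
hit = [[0, 2, 1], [3, 90, 4], [1, 2, 0]]

reduced = list(reduce_stream([miss, hit], min_total_intensity=50))
print(reduced)
```

Here the first frame is discarded entirely and the second is reduced from nine pixel values to a single three-number record, which is the general idea behind recording features instead of raw images.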
“If we were to record all the raw data, like we’ve done so far, it would cost us a quarter of a billion dollars a year,” says Thayer. “Our mission is to understand how to reduce data before we write it. One of the really interesting and innovative elements of the new data system that we have developed is the data reduction pipeline, which removes irrelevant information and reduces the data that needs to be transferred and stored.”
Coffee says, “Then you save a lot of power, but more importantly, you save on throughput. If you have to send the raw data over the network, you’re going to completely overwhelm it trying to send images every microsecond.”
The group also created an intermediate place to put data before it is stored. Thayer explains, “We can’t write directly to storage, because if there is a problem in the system, it has to pause and wait. Or if there is a network problem, you may lose data completely. So we have a small but reliable buffer that we can write to; we can then move the data to permanent storage.”
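The role of such a buffer can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of SLAC’s system: `StagingBuffer`, its capacity, and the simulated storage function are all hypothetical, and a storage outage is modeled as an `OSError`.

```python
from collections import deque


class StagingBuffer:
    """A small bounded buffer that sits between the data source and permanent storage."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def write(self, item):
        """Accept an item if there is room; return False when the buffer is full."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def drain_to(self, storage_write):
        """Move items to storage in arrival order.

        Stops at the first failed write and keeps the remaining items,
        so a storage or network problem does not lose data.
        Returns the number of items moved.
        """
        moved = 0
        while self._items:
            try:
                storage_write(self._items[0])
            except OSError:
                break  # storage unavailable: retry on a later call
            self._items.popleft()
            moved += 1
        return moved


# Simulated permanent storage that can be switched between "down" and "up".
permanent = []
state = {"up": False}


def flaky_storage_write(item):
    if not state["up"]:
        raise OSError("storage unavailable")
    permanent.append(item)


buffer = StagingBuffer(capacity=4)
for i in range(3):
    buffer.write(f"event-{i}")

moved_while_down = buffer.drain_to(flaky_storage_write)
state["up"] = True
moved_after_recovery = buffer.drain_to(flaky_storage_write)
print(moved_while_down, moved_after_recovery, permanent)
```

The point of the design is that the producer only ever talks to the buffer, so a stall in permanent storage delays the transfer rather than pausing the experiment or dropping events, as long as the buffer does not fill up.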
Thayer points out that the data system is designed to provide researchers with the results of their work as quickly as the current system, so that they get information in real time. It is also built to accommodate the expansion of LCLS science for the next 10 years. The big challenge is to keep up with the huge leap in data throughput.
“If you imagine going from scanning 120 frames per second to 1 million per second, it takes a lot more scrolling,” she says. “Computing is not magic – it always works the same way – we just increase the number of brains working on each of the images.”
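The idea of adding more "brains" amounts to running the same per-image analysis on many workers at once. The sketch below shows that pattern with Python’s standard `concurrent.futures` module. It is only an illustration: `analyze_frame` is a hypothetical stand-in for real analysis, the frames are made up, and a thread pool on one machine is used here for simplicity, whereas a real facility would spread the work over many processors or computers.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_frame(frame):
    """Hypothetical per-frame analysis: return the brightest pixel value."""
    return max(max(row) for row in frame)


def analyze_all(frames, workers):
    """Apply the same analysis to every frame using a pool of workers.

    Results are returned in the same order as the input frames.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_frame, frames))


# Eight made-up 2x2 frames.
frames = [[[i, i + 1], [i + 2, i + 3]] for i in range(8)]

serial = [analyze_frame(frame) for frame in frames]
parallel = analyze_all(frames, workers=4)
print(parallel == serial)
```

The analysis function itself is unchanged between the serial and parallel versions; only the number of workers applying it differs, which matches the point made in the quote.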
Supported by a recent DOE award and in collaboration with colleagues across the DOE national laboratory complex, the team is also looking to incorporate artificial intelligence and machine learning techniques to further reduce the amount of data to be processed and to flag interesting features in the data as they arise.
To understand the challenge of LCLS data, Coffee makes an analogy with self-driving cars: “They have to calculate in real time: they can’t analyze a batch of images that have just been recorded and then say, ‘We predict you should have turned left on frame number 10.’ SLAC’s data rate is much higher than that of any of these cars, but the problem is the same: researchers must steer their experiment to find the most exciting destinations!”
The upgrades driving this massive leap in data throughput and performance will come in two phases over the next several years, including LCLS-II and a subsequent high-energy upgrade. The work of data experts will ensure that scientists can take full advantage of both. “Ultimately this will have a dramatic effect on the kind of science we can do, opening up opportunities that are not possible today,” says Coffee.
Provided by SLAC National Accelerator Laboratory
Citation: SLAC’s new x-ray laser data system will process one million frames per second (2021, February 18) retrieved on September 18, 2021 from https://phys.org/news/2021-02-slac-x-ray-laser-million-images.html