# ATCA Fast Data Acquisition and Processing System for JET Gamma-Ray Cameras Upgrade Diagnostic

<u>R. C. Pereira<sup>1</sup></u>, A. M. Fernandes<sup>1</sup>, A. Neto<sup>1</sup>, J. Sousa<sup>1</sup>, A. J. Batista<sup>1</sup>, B. B. Carvalho<sup>1</sup>, C. M. B. A. Correia<sup>2</sup>, C. A. F. Varandas<sup>1</sup> and JET-EFDA contributors\*

Abstract- Nuclear reaction gamma-ray diagnosis is one of the important techniques used for studying confined fast ions. The Joint European Torus (JET) gamma-ray camera diagnostic provides information on the spatial distribution of fast ions. The system is currently being upgraded and should allow gamma-ray image measurements in high-power deuterium JET pulses, and eventually in deuterium-tritium discharges. To fully exploit the diagnostic capabilities, it is mandatory to develop a reliable, maintainable, multi-channel spectroscopy data acquisition and real-time processing (DAQP) system, which shares much of its development with other specific implementations such as gamma-ray spectroscopy. The DAQP system is based on the Advanced Telecommunications Computing Architecture (ATCA) and contains a 6 GFLOPS x86-based control unit and three transient recorder and processing (TRP) modules, interconnected through PCI Express (PCIe) links, to cope with the two arrays of collimators (10 horizontal + 9 vertical lines of sight). Each TRP module features 8 channels of 13-bit resolution sampling at 250 MHz, 4 GB of local memory and two field programmable gate arrays (FPGAs) able to perform complex trigger management modes and real-time analysis (pulse height analysis and pile-up discrimination), minimizing data storage and transfer issues.

The DAQP system aims at overcoming the problem of storing large amounts of data during long discharges. A raw/processed mode is being developed in which the acquired raw data follows two parallel paths: besides being directly stored in the on-board memory, it is processed and streamed in real time through PCIe links. This procedure is expected to greatly reduce the amount of data and possibly allow continuous operation of the diagnostic. During commissioning, and whenever data validation is required, the 4 GB of raw data will be processed on the x86 control unit with a well-known algorithm and the result cross-checked against the processed data.

*Index Terms*— Data Acquisition, Data Compression, Gamma-rays, Real time.

## I. INTRODUCTION

THE study of fast ions in tokamak plasmas is based on the detection and analysis of gamma-ray radiation emitted as the result of the interaction of fusion reaction products (including alpha particles), neutral beam injection (NBI) ions and ion cyclotron resonance heating (ICRH) accelerated ions with impurities such as carbon and beryllium [1, 2, 3].

The spatial distribution of the gamma-ray emission sources in the energy range above 1 MeV in JET plasmas is measured using a gamma-ray camera (GRC) consisting of two arrays of collimators, vertical and horizontal, with 9 and 10 lines of sight, respectively. Each line of sight uses a CsI(Tl) photo-diode coupled to the photomultiplier tubes (PMT). Experimental data obtained from the 19 lines of sight allow tomographic reconstruction of the local gamma-ray emissivity in a poloidal cross-section. This gamma-ray camera belongs to the fast-electron-bremsstrahlung diagnostic system incorporated into a neutron profile monitor.

Presently at JET, the count rate is limited by the obsolete electronics modules used for analog processing and data acquisition. This paper presents a DAQP system that aims at replacing the current electronics of the JET GRC with fast TRP modules having real-time pulse processing capabilities. This approach enables sophisticated analysis and data reduction in real time using pulse height analysis (PHA) and pile-up detection algorithms. Online data processing is expected to reduce data storage and transfer issues while keeping the relevant information. This improvement aims at increasing performance in energy resolution, dynamic range and count rate, the latter to more than 2 MHz.

Manuscript received May 8, 2009. This work has been carried out in the frame of the Contract of Association between the European Atomic Energy Community and Instituto Superior Técnico (IST) and of the Contract of Associated Laboratory between Fundação para a Ciência e Tecnologia (FCT) and IST. The content of the publication is the sole responsibility of the authors and it does not necessarily represent the views of the Commission of the European Union or FCT or their services.

R.C. Pereira, A.M. Fernandes, A. Neto, J. Sousa, A.J. Batista, B.B. Carvalho and C.A.F. Varandas are with the Associação EURATOM/IST Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade Técnica de Lisboa, 1049-001 Lisboa, Portugal (corresponding author phone: 00351-239-410108; fax: 00351-239-829158; e-mail: ritacp@lei.fis.uc.pt).

C.M.B.A. Correia is with the Associação EURATOM/IST Centro de Electrónica e Instrumentação, Dept. de Física, Universidade de Coimbra, 3004-516 Coimbra, Portugal (e-mail: correia@lei.fis.uc.pt).

<sup>\*</sup> See the Appendix of F. Romanelli et al., Fusion Energy Conference 2008 (Proc. 22nd Int. Conf. Geneva, 2008) IAEA (2008).

## II. SYSTEM DESCRIPTION



Figure 1: DAQP system block diagram. Three TRP boards provide the channels needed to cover the 19 lines of sight of the system. All channels are acquired simultaneously. The controller is responsible for arming the acquisition and for retrieving data to the system hard disk.

The DAQP system is composed of a set of three TRP boards based on the PICMG 3.0 ATCA standard [4,5], orchestrated by a central controller unit using PCIe x1 links. The boards are enclosed in a 14-slot ATCA shelf (sub-rack), which makes it possible, at a later stage, to share the controller and sub-rack with a similar set of boards for the fast electron bremsstrahlung (FEB) diagnostic (Fig. 1).

The PCIe interconnect architecture defines three types of devices: root complex, switch and endpoint. The CPU, system memory and graphics controller connect to a root complex, which is roughly equivalent to a PCI host bridge. Due to the point-to-point nature of PCIe, switch devices are necessary to expand the number of system functions. PCIe switch devices connect a root complex device on the upstream side to endpoints on the downstream side.

The main components of the controller are a standard x86 motherboard and the carrier board on which it is mounted, which provides the connection between the motherboard and the PCIe links in the ATCA backplane. The connection between the controller and the boards is enabled by three PCIe switches located on the carrier board [6]. This board also powers the motherboard from the ATCA shelf. The motherboard contains 2 GB of DDR RAM and a multi-core Intel Quad® processor.



Figure 2: FS node application for the gamma-ray camera diagnostic. A large set of parameters can be configured, for instance: (i) mode of acquisition (raw, segmented (pulse event), processed, and calibration mode); (ii) trigger type (internal/external); (iii) number of bytes to acquire; (iv) pulse parameters used in segmented acquisition (pulse width and number of pre-trigger samples); and (v) processing parameters.

The controller runs Linux, with kernel version 2.6.24, as its operating system. To interact with the boards, a native Linux char device driver was developed [7]. User programs send commands by issuing standard *ioctl* system calls to the driver, which forwards each command to the boards by writing on the PCIe bus. The same procedure is used to read board registers and configuration values. For data readout, the read system call is used. When the driver receives a read request, it sets a bit in the board register asking for data and waits in a thread-safe queue. Once the data has been transmitted using direct memory access (DMA), the board issues an interrupt. Finally, in the interrupt service routine, the data is copied and the queue on which the driver was waiting is signaled. An application programming interface (API) was developed on top of the driver native functions, to ease the development of codes without requiring deep knowledge of the low-level details of the hardware.
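The read handshake described above can be modelled in user space. The following is a minimal sketch in Python, not the driver itself: the class names, the `threading.Event` standing in for the request bit and the `queue.Queue` standing in for the driver wait queue are all illustrative assumptions.

```python
import queue
import threading


class SimulatedBoard:
    """Stand-in for one TRP endpoint (hypothetical, not the real hardware)."""

    def __init__(self, payload):
        self.payload = payload
        # Models the register bit the driver sets to ask for data.
        self.data_request = threading.Event()

    def run(self, isr):
        self.data_request.wait()           # board sees the request bit
        dma_buffer = bytes(self.payload)   # models the DMA transfer to host memory
        isr(dma_buffer)                    # models the interrupt raised after DMA


class SimulatedDriver:
    """Stand-in for the char device driver read path (hypothetical)."""

    def __init__(self, board):
        self.board = board
        self.wait_queue = queue.Queue(maxsize=1)  # thread-safe wait queue

    def interrupt_service_routine(self, dma_buffer):
        # Copy the data and wake up the reader blocked in read().
        self.wait_queue.put(bytes(dma_buffer))

    def read(self, timeout=1.0):
        self.board.data_request.set()             # ask the board for data
        return self.wait_queue.get(timeout=timeout)  # block until the ISR signals


def demo_read():
    board = SimulatedBoard(b"\x01\x02\x03\x04")
    driver = SimulatedDriver(board)
    worker = threading.Thread(
        target=board.run, args=(driver.interrupt_service_routine,)
    )
    worker.start()
    data = driver.read()
    worker.join()
    return data
```

The point of the model is the ordering: the reader blocks until the interrupt path has copied the DMA data, so a user program sees `read` as an ordinary blocking call.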

A FireSignal (FS) [8] node is installed on the controller to interface the diagnostic sub-systems to the JET control and data acquisition system (CODAS). The HTTP data/event transport layer was chosen to integrate the diagnostic-specific requirements, including the API, pulse height analysis (PHA) with pile-up resolution (PUR) and local control operation. As depicted in Fig. 2, an FS node prepares the diagnostic for one plasma pulse, allowing simultaneous acquisition of the 19 channels. The node uses the API referred to above to change the board parameters. All the board parameters are described in an XML file, guaranteeing live configuration validation.

### A. Module Architecture

The TRP board architecture is based on two acquisition blocks, each acting as an individual PCIe endpoint and each with its own clock distribution, memory and control, the latter provided by a Xilinx® Virtex<sup>TM</sup>-4 family (XC4VFX60) field programmable gate array (FPGA).

Each FPGA is directly connected to four free-running ADC channels and acts as: (i) a temporary data buffer, mandatory to hold data before sending it either to on-board memory or directly through the PCIe link without losing events; (ii) a real-time event manager, which detects whenever a real event has occurred and has to be stored or processed; (iii) a 2 GB DDR2 SDRAM controller; (iv) a time stamper (TS), so that in segmented or processed mode the user can cross-check data against its occurrence time and verify that no data was lost; (v) a processor for high-speed algorithms such as digital level trigger detection and PHA with pile-up resolution; and (vi) a gigabit communication interface, providing a one-lane PCIe link (2.5 Gbit/s, full duplex).

The input channels are single-ended and ac-coupled, with an input voltage range of 1.1 V into 50 Ω. Each ADC channel (ADS5444 from TI®) runs at 250 MHz with 13-bit resolution.

The firmware for this application allows a continuous acquisition mode, where data is continuously stored from an initial triggered start until memory is full or acquisition is disabled by software, after a user defined time interval. The stored data can be in a raw or processed form.

Data retrieval can be executed either upon pulse completion (offline) or in streamed mode (online). In offline mode it is possible to choose between raw, segmented, processed or calibration data, while in online mode, due to bandwidth restrictions, only processed data is available. Segmented data, or pulse event data, means that only pulse events are stored instead of all acquired data (raw mode). This slightly decreases the amount of stored data, but at the expected count rates the reduction is not significant. The user must define the pulse width in number of samples, the number of samples kept before the pulse event detection, and the threshold for pulse detection. Processed data is retrieved as time-stamped pulse event energy values. Calibration data is only implemented on channel 2 and is used to estimate the pole-zero cancellation factor needed in processed mode. A concurrent mode was implemented to validate the results of the processing algorithms inside the FPGA: the streamed energy values are cross-checked against the offline processing of the fully digitized data stored in the local memory. Offline data validation will be performed on the x86 processor.
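The bandwidth restriction on online retrieval can be checked with a back-of-the-envelope estimate. The sketch below is illustrative only and rests on assumptions that are not stated at this point in the text: each sample occupies a 16-bit word, every channel of a block runs at the 2 MHz target count rate, each event is streamed as one 64-bit energy word, and a PCIe 1.x lane carries 2.5 Gbit/s with 8b/10b encoding before any packet overhead.

```python
# All constants are assumptions for illustration, in integer units.
SAMPLE_RATE_HZ = 250_000_000      # ADC sampling rate per channel
CHANNELS_PER_BLOCK = 4            # channels handled by one FPGA / PCIe endpoint
BYTES_PER_SAMPLE = 2              # assumed 16-bit storage word per sample
LANE_RATE_BIT_S = 2_500_000_000   # PCIe x1 line rate
EVENT_RATE_HZ = 2_000_000         # assumed count rate per channel
BYTES_PER_ENERGY_WORD = 8         # assumed one 64-bit word per event

# Raw data produced by one block, in bytes per second.
raw_rate = SAMPLE_RATE_HZ * CHANNELS_PER_BLOCK * BYTES_PER_SAMPLE

# Usable lane capacity after 8b/10b encoding (8 payload bits per 10 line bits).
link_capacity = LANE_RATE_BIT_S * 8 // 10 // 8

# Processed (energy) data produced by one block, in bytes per second.
processed_rate = EVENT_RATE_HZ * CHANNELS_PER_BLOCK * BYTES_PER_ENERGY_WORD

print(f"raw:       {raw_rate / 1e6:.0f} MB/s")
print(f"PCIe x1:   {link_capacity / 1e6:.0f} MB/s")
print(f"processed: {processed_rate / 1e6:.0f} MB/s")
```

Under these assumptions the raw stream exceeds the lane capacity by a factor of eight, while the processed stream fits with a wide margin.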

### B. Master/Slave Boards

The GRC diagnostic has 19 lines of sight to be acquired. As each board provides 8 channels, 3 TRP boards are used. To allow simultaneous acquisition from the 6 blocks (each with four channels), the root complex (host) identifies each of the six blocks as an individual PCIe endpoint [9].
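The channel budget above can be written down as a small mapping. This is a sketch under an assumption: the text does not say how lines of sight are wired to channels, so the sequential assignment and the function name `locate` are purely illustrative.

```python
N_BOARDS = 3
BLOCKS_PER_BOARD = 2      # two FPGA acquisition blocks (PCIe endpoints) per board
CHANNELS_PER_BLOCK = 4
LINES_OF_SIGHT = 19       # 10 horizontal + 9 vertical


def locate(line_of_sight):
    """Map a 0-based line-of-sight index to (board, block, channel).

    Sequential wiring is assumed here; the real cabling may differ.
    """
    if not 0 <= line_of_sight < LINES_OF_SIGHT:
        raise ValueError("line of sight out of range")
    endpoint, channel = divmod(line_of_sight, CHANNELS_PER_BLOCK)
    board, block = divmod(endpoint, BLOCKS_PER_BOARD)
    return board, block, channel


total_channels = N_BOARDS * BLOCKS_PER_BOARD * CHANNELS_PER_BLOCK
spare_channels = total_channels - LINES_OF_SIGHT
```

With three boards there are 24 channels in total, which leaves 5 spare channels after the 19 lines of sight are connected.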

The start/trigger (START) source and the system clock must be synchronized across the 3 boards (6 PCIe endpoints). The START and clock signals can either be generated internally, for test purposes, or supplied externally by JET's Control Trigger and Timing System (CTTS).

The external START and clock are inputs to the master board, which distributes the two signals through the ATCA backplane both to itself and to the slave boards. The master/slave role is assigned automatically by firmware, based on the detected ATCA slot address.

### C. Time Stamp

The time stamper is a 44-bit timer that runs at the acquisition clock and is reset by the START signal.

For a 250 MHz acquisition clock, the time span is $T = 2^{44}/(250\ \mathrm{MHz}) \approx 7.0 \times 10^{4}$ s, i.e. about 19.5 h. During this period, event occurrences (e.g. triggers or pulse energies) and operational errors (e.g. buffer under/overflow) are time stamped and stored for later analysis.
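The wrap-around period of the timer follows directly from its width and the clock frequency. A short check, using only the two figures given in the text:

```python
TIMER_BITS = 44
CLOCK_HZ = 250e6  # acquisition clock

# The counter wraps after 2**44 clock ticks.
span_seconds = 2 ** TIMER_BITS / CLOCK_HZ
span_hours = span_seconds / 3600.0

print(f"time span: {span_seconds:.0f} s = {span_hours:.2f} h")
```

At 250 MHz this evaluates to roughly 70 369 s, or about 19.5 h, which is far longer than a plasma pulse.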

### D. Digital Trigger Detector Block

A digital trigger occurs when the digitized data crosses a user-defined voltage threshold, which is programmable in powers of two, from $2^0$ to $2^{12}$. Schemes for reliable triggering on noisy or slow signals, such as signal averaging, were implemented [10]; they prevent triggering on the noise level instead of on low-amplitude signals.

In pulse event mode, every time a trigger occurs, further triggers are inhibited during a user-defined time interval corresponding to the pulse width. In processed mode, if a trigger (pulse event) occurs while the energy of the previous event is still being processed, the new event is counted as an event occurrence but its energy is not calculated. The digital trigger detector block is independent for each acquisition channel.
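The trigger behaviour described in this subsection can be sketched in software. This is a simplified model, not the firmware: the moving-average length `avg_len`, the function names and the way segments are cut around a trigger are assumptions made for illustration.

```python
def detect_triggers(samples, threshold_exp, holdoff, avg_len=4):
    """Return indices where the averaged signal crosses 2**threshold_exp.

    threshold_exp: threshold exponent, 0..12 (threshold is a power of two).
    holdoff: number of samples during which further triggers are inhibited.
    avg_len: moving-average length used to reject noise (assumed value).
    """
    if not 0 <= threshold_exp <= 12:
        raise ValueError("threshold exponent must be in 0..12")
    threshold = 1 << threshold_exp
    triggers = []
    inhibit_until = 0
    window_sum = 0
    was_above = False
    for n, x in enumerate(samples):
        # Running sum over the last avg_len samples.
        window_sum += x
        if n >= avg_len:
            window_sum -= samples[n - avg_len]
        # Compare the sum with threshold * avg_len to avoid a division.
        is_above = n + 1 >= avg_len and window_sum >= threshold * avg_len
        # Trigger only on an upward crossing outside the inhibit interval.
        if is_above and not was_above and n >= inhibit_until:
            triggers.append(n)
            inhibit_until = n + holdoff
        was_above = is_above
    return triggers


def extract_segments(samples, triggers, pulse_width, pre_trigger):
    """Cut one segment of pulse_width samples around each trigger."""
    segments = []
    for t in triggers:
        start = max(0, t - pre_trigger)
        segments.append((t, samples[start:start + pulse_width]))
    return segments
```

For example, with a threshold of $2^6$ and a four-sample average, a single-sample spike of 200 counts averages to 50 and is ignored, whereas a ten-sample pulse of 100 counts triggers on its third sample.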

## III. DATA PATH

Fig. 3 shows a simplified block diagram of the data path inside the FPGA.

Data from the ADCs is buffered and distributed simultaneously to four distinct paths: raw, segmented, calibration and processed mode, with the possibility of having two parallel paths (raw and processed). In all modes, data is packed into 128-bit words for storage in the DDR2 memory. In processed mode, the result is packed into a 128-bit word if it is to be stored in DDR2 memory, or into a 64-bit word (energy values) if it is to be streamed through the PCIe link.



Figure 3: Block diagram of Data Path inside FPGA. Acquired data can follow four distinct paths (raw, segmented, calibration or processed mode), or can follow two parallel paths, processed mode streamed in real-time and the raw mode where all digitized data is stored locally.


### A. Input Buffer

To cope with other diagnostics (e.g. gamma-ray spectrometry) [5], the TRP boards are also prepared to accommodate different types of ADCs, for instance 250 MHz 13-bit ADCs or 400 MHz 14-bit double data rate (DDR) ADCs. The latter acquisition scheme consists of sending two 14-bit words at every edge of the acquisition clock. To make the module more scalable and upgradeable, an input buffer was implemented in the firmware, which places the input signal in the most significant bits (MSB) of a 16-bit word and outputs a 64-bit word at one fourth of the input rate. This procedure is mandatory for the 400 MHz ADCs, both because of the DDR ADC input interface and because of FPGA timing constraints. For instance, the performance and resource utilization of a first-in first-out (FIFO) memory depend on the configuration features selected during core customization, with limits of 285 MHz for synchronous FIFOs and 365 MHz for independent-clock FIFOs [11].
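The packing performed by the input buffer can be illustrated in a few lines. The sketch below is a software model under assumptions: the function names are hypothetical, samples are treated as raw unsigned codes, and the first sample of each group is placed in the least significant 16 bits of the 64-bit word, an ordering the text does not specify.

```python
def msb_align(sample, adc_bits):
    """Place an adc_bits-wide sample in the top bits of a 16-bit word."""
    if not 0 < adc_bits <= 16:
        raise ValueError("adc_bits must be between 1 and 16")
    sample &= (1 << adc_bits) - 1          # keep only the valid ADC bits
    return sample << (16 - adc_bits)       # unused low bits are left at zero


def pack_words(samples, adc_bits):
    """Pack groups of four samples into 64-bit words.

    One output word is produced per four input samples, i.e. the output
    runs at one fourth of the input rate. Trailing samples that do not
    fill a complete word are dropped in this model.
    """
    words = []
    usable = len(samples) - len(samples) % 4
    for i in range(0, usable, 4):
        word = 0
        for lane, sample in enumerate(samples[i:i + 4]):
            word |= msb_align(sample, adc_bits) << (16 * lane)
        words.append(word)
    return words
```

Because both 13-bit and 14-bit samples end up MSB-aligned in the same 16-bit slot, the downstream logic sees the same word format regardless of the ADC fitted, and the 64-bit word lets the internal logic run at a quarter of the sampling clock.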





Figure 4: TRP-400 operation mode block diagram. After the operation mode is selected by the user, the stream/DDR2 arbitrator of the task manager selects whether data must be streamed from internal buffers or retrieved from DDR2.

Tasks are defined as a collection of basic operations that a given block of each board can perform, as depicted in Fig. 4. Every time a START arrives at the system, a task is executed. For each block there are three main operating modes: (i) data storage in the DDR2 memory, with retrieval of data through the PCIe links once the memory is full; (ii) energy data streaming, again using the PCIe links; and (iii) a fully-digitized/energy data mode in which all the acquired data follows two parallel paths: it is directly stored in the on-board memory and, at the same time, processed and streamed in real time through the PCIe links.

The task manager has a stream/DDR2 data arbitrator that selects whether data must be streamed from internal buffers or retrieved from DDR2. This is necessary for the concurrent mode, in which data is streamed during a user-defined plasma pulse time interval. At the end of that interval, data is automatically retrieved from the DDR2 memory.

### C. DMA Engine



Figure 5: DMA engine block diagram.

To allow in-line data flow through the PCIe x1 link at 2.5 Gbit/s (although an x4 link is foreseen), a DMA engine is needed to build the DMA packets, taking into account the operation mode, supporting PCIe packet fragmentation, and sending the packets to the host memory. The PCIe packet payload comprises 32 words of 32 bits, i.e. 128 B. The payload size is limited to 128 B by the controller's motherboard; if another motherboard with a larger payload size is used in the future, the PCIe switch (PEX) will limit the payload to 256 B. The DMA engine is depicted in Fig. 5.

In the case of data streaming, the DMA packet has the same size as the PCIe packet, to avoid any timing constraint. This is particularly important when the count rate of a specific plasma pulse is low, as otherwise events could be retained for long periods of time, compromising real-time requirements. The host waits for DMA packets during the plasma pulse period. Otherwise, the DMA packet size is optimized to 4096 bytes (this value must be a multiple of the PCIe packet size), and the host waits for DMA packets until a maximum of 2 GB of data is received.
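The relation between DMA packets and PCIe packets, and the latency argument for small streaming packets, can be made concrete with a short model. It is a sketch under assumptions: the function names are hypothetical, one 64-bit energy word per event is assumed, and the 1 kHz event rate is an arbitrary example of a low count rate.

```python
PCIE_PAYLOAD_BYTES = 128   # 32 words of 32 bits
DMA_PACKET_BYTES = 4096    # DMA packet size used outside streaming
ENERGY_WORD_BYTES = 8      # assumed one 64-bit word per event


def fragment(dma_packet, payload_bytes=PCIE_PAYLOAD_BYTES):
    """Split a DMA packet into PCIe packet payloads of payload_bytes each."""
    if len(dma_packet) % payload_bytes:
        raise ValueError("DMA packet must be a multiple of the PCIe payload")
    return [
        dma_packet[i:i + payload_bytes]
        for i in range(0, len(dma_packet), payload_bytes)
    ]


def fill_latency_s(packet_bytes, event_rate_hz, word_bytes=ENERGY_WORD_BYTES):
    """Time needed to accumulate enough events to fill one packet."""
    return (packet_bytes / word_bytes) / event_rate_hz
```

A 4096-byte DMA packet splits into 32 PCIe packets. At an assumed 1 kHz event rate, a 128-byte packet (16 events) fills in 16 ms, whereas a 4096-byte packet (512 events) would hold the first event back for about half a second.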

The DMA is based on PCIe message-signaled interrupts (MSI): every time a DMA packet is built, an interrupt request is sent to the host. The host then reads the packet and services the interrupt request.

## IV. RESULTS

Data acquired directly from the diagnostic was too noisy to apply the intended algorithm, which is based on a trapezoidal shaper implemented with IIR filters [6] inside the FPGA. Therefore, a set of tests was performed using a LaBr<sub>3</sub> detector and three calibration sources, <sup>137</sup>Cs (662 keV), <sup>60</sup>Co(1) (1173 keV) and <sup>60</sup>Co(2) (1332 keV), from which a spectrum was obtained using the energy values retrieved from the on-board memory.
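The filter equations are not given in the text. As an illustration only, the sketch below uses the widely used recursive trapezoidal shaper in the Jordanov and Knoll form, which may differ from the FPGA implementation. The parameter names (`k` for the rise length, `l` for rise plus flat top, `tau` for the decay constant in samples) and the normalization are assumptions; the factor `m` plays the role of the pole-zero cancellation factor mentioned for the calibration mode.

```python
import math


def trapezoidal_shaper(v, k, l, tau):
    """Recursive trapezoidal shaper for exponentially decaying pulses.

    v:   baseline-subtracted samples
    k:   rise (and fall) length of the trapezoid, in samples
    l:   rise length plus flat-top length, in samples (l > k)
    tau: decay time constant of the input pulse, in samples
    The output is scaled so that the flat top equals the pulse amplitude.
    """
    m = 1.0 / (math.exp(1.0 / tau) - 1.0)   # pole-zero cancellation factor

    def at(i):
        return v[i] if i >= 0 else 0.0

    p = 0.0   # first accumulator
    s = 0.0   # second accumulator (unnormalized trapezoid)
    out = []
    for n in range(len(v)):
        d = at(n) - at(n - k) - at(n - l) + at(n - k - l)
        p += d
        s += p + m * d
        out.append(s / (k * (m + 1.0)))
    return out
```

For an ideal pulse starting at sample `n0`, the flat top spans samples `n0 + k - 1` to `n0 + l - 1`, and the pulse height is read from any sample within it.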

Fig. 6 depicts the spectrum built with energy values acquired with the DAQP system. The system proved to be able to perform PHA, but this version of the algorithm cannot fully cope with low-energy pulses without losing some of them. One possible improvement is to apply a high-pass filter first, followed by a different implementation of the trapezoidal filter, which remains to be studied.



Figure 6: Spectrum of <sup>137</sup>Cs (662 keV), <sup>60</sup>Co(1) (1173 keV) and <sup>60</sup>Co(2) (1332 keV) sources measured with a LaBr<sub>3</sub> detector, acquired and processed by the DAQP system.


## V. CONCLUSION

Although the system has not yet been commissioned, several plasma pulse events were acquired during JET plasma pulse operation. The processing algorithm still needs further development, but once its feasibility is proven, the ATCA system will be able to cover a whole JET plasma pulse without losing events. In raw data mode, where all digitized data is stored in memory, four channels sampling at 250 MSPS fill the 2 GB memory of each block in about 1 s. In pulse event mode, at a maximum of 2 Mevents/s, the memory lasts slightly longer (~3.7 s for a pulse width of 128 samples, and longer for smaller pulse widths). In processed mode, for an estimated event count rate of 2 MHz, the memory fills in ~15 s, which does not cover an average JET plasma pulse (~30 s). A new simultaneous mode was therefore implemented in which data is processed and streamed through PCIe in real time, during a user-programmable time, while the 2 GB memory is filled with raw data, allowing offline validation of the processed data. This mode will enable continuous operation of the diagnostic.
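The memory fill times quoted above can be approximated with a simple estimate. The sketch below is illustrative and depends on assumptions that the text does not spell out: 2 GB is taken as 2 x 10^9 bytes, each raw sample occupies a 16-bit word, and in processed mode each of the four channels is assumed to produce 2 Mevents/s with one 128-bit record per event.

```python
MEMORY_BYTES = 2e9           # per-block memory, decimal GB assumed
CHANNELS_PER_BLOCK = 4
PULSE_LENGTH_S = 30.0        # average JET plasma pulse, from the text


def fill_time_s(bytes_per_second, memory_bytes=MEMORY_BYTES):
    """Time to fill the block memory at a constant data rate."""
    return memory_bytes / bytes_per_second


# Raw mode: 250 MSPS per channel, 2 bytes per sample (assumed).
raw_rate = CHANNELS_PER_BLOCK * 250e6 * 2
raw_fill = fill_time_s(raw_rate)

# Processed mode: 2 Mevents/s per channel, 16 bytes per event (assumed).
processed_rate = CHANNELS_PER_BLOCK * 2e6 * 16
processed_fill = fill_time_s(processed_rate)

print(f"raw mode:       {raw_fill:.2f} s")
print(f"processed mode: {processed_fill:.1f} s")
print(f"covers a {PULSE_LENGTH_S:.0f} s pulse: {processed_fill >= PULSE_LENGTH_S}")
```

Under these assumptions the estimate gives 1 s for raw mode and roughly 15.6 s for processed mode, in line with the quoted figures and still short of a 30 s pulse, which is the motivation for streaming the processed data instead of storing it.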

## REFERENCES

- [1] V.G. Kiptily, F.E. Cecil and S.S. Medley, "Gamma-ray diagnostics of high temperature magnetically confined fusion plasmas", Plasma Phys. Control. Fusion 48 (2006) R59-R82.
- [2] V.G. Kiptily, D. Borba, F.E. Cecil, M. Cecconello, D. Darrow, P.C. de Vries, V. Goloborod'ko, K. Hill, T. Johnson, A. Murari, F. Nabais, S.D. Pinches, M. Reich, S.E. Sharapov, V. Yavorskij, I.N. Chugunov, D.B. Gin, G. Gorini, A.E. Shevelev and V. Zoita, "Fast ion JET diagnostics: confinement and losses", AIP 988 (2008) 283.
- [3] V.G. Kiptily et al., "γ-ray diagnostics of energetic ions in JET", Nuclear Fusion 42 (2002) 999-1007.
- [4] AdvancedTCA®, PICMG® 3.0 Revision 2.0, AdvancedTCA® Base Specification, March 18, 2005.
- [5] R.C. Pereira, J. Sousa, A.M. Fernandes, F. Patrício, B. Carvalho, A. Neto, C.A.F. Varandas, G. Gorini, M. Tardocchi, D. Gin and A. Shevelev, "ATCA data acquisition system for gamma ray spectrometry", Fusion Engineering and Design, vol. 83, no. 2-3, pp. 341-345, 2008.
- [6] A.J.N. Batista, J. Sousa and C.A.F. Varandas, "ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks", Review of Scientific Instruments, vol. 77, no. 10, Oct. 2006.
- [7] J. Corbet, A. Rubini and G. Kroah-Hartman, "Linux Device Drivers", Third Edition, O'Reilly Media Inc.
- [8] A. Neto, J. Sousa, B. Carvalho, H. Fernandes, R.C. Pereira, A.M. Fernandes, C.A.F. Varandas, G. Gorini, M. Tardocchi, D. Gin, A. Shevelev and K. Kneupner, "The control and data acquisition software for the gamma-ray spectroscopy ATCA sub-systems of the JET-EP2 enhancements", Fusion Engineering and Design, vol. 83, no. 2-3, pp. 346-349, 2008.
- [9] R. Budruk, D. Anderson and T. Shanley (MindShare, Inc.), "PCI Express System Architecture", Addison-Wesley, September 04, 2003, pp. 41-53.
- [10] A. Combo, R. Pereira, J. Sousa, N. Cruz, P. Carvalho, C.A.F. Varandas, S. Conroy, J. Källne and M. Weiszflog, "A PCI transient recorder module for the JET magnetic proton recoil neutron spectrometer", Fusion Engineering and Design, vol. 71, no. 1-4, pp. 151-157, June 2004.
- [11] FIFO Generator v4.4 User Guide, UG175, September 19, 2008.