SUMMARY This paper covers new architectures, technologies, and performance benchmarking, together with prospects for high-productivity, high-performance computing enabled by photonics. The exponential and sustained increases in computing and data center needs are driving the demand for exascale computing. Power-efficient parallel computing with a balanced system design is essential for reaching that goal; such a system should support ∼billion total concurrencies and ∼billion core interconnections with ∼exabyte/second bisection bandwidth. Photonic interconnects offer a disruptive technology solution that fundamentally changes the computing architectural design considerations. Optics provide ultra-high throughput, massive parallelism, minimal access latencies, and low power dissipation that remains independent of capacity and distance. In addition to improving energy efficiency and addressing many fundamental physical problems, optics will bring high-productivity computing in which programmers can ignore the locality between billions of processors and the memory where data resides. Repeaterless interconnection links across the entire computing system and all-to-all massively parallel interconnection switches will significantly transform not only the hardware aspects of computing but also the way people program and harness computing capability, improving the programmability and productivity of computing. Benchmarking and optimizing the configuration of the computing system are very important. Practical and scalable deployment of photonic interconnected computing systems is likely to be aided by the emergence of athermal silicon photonics and hybrid integration technologies.

key words: computing, data centers, optical interconnects, silicon photonics

1. Introduction

Ubiquitous computing and cloud computing have become an essential part of our daily lives, and we rely increasingly on these computers for everything from healthcare and climate prediction to entertainment and shopping. In the healthcare sector alone, data processing is rapidly shifting from two-dimensional images to three-dimensional (3D) and hyperspectral real-time images. However, today's data centers and computing systems have reached power limitations that prevent them from scaling further. Typical data centers consume megawatts of power, and the goal of exascale computing is seriously challenged by a projected power consumption approaching 0.5 GW for an exascale system at today's efficiency.

Figure 1 shows the rapid exponential growth of supercomputer performance according to TOP500.org, which ranks systems using the LINPACK benchmark.

Table 1 summarizes the top 10 systems on the TOP500 list. Our current aim of exascale computing by 2020 is unrealistic unless the energy consumed per operation (Energy/Rmax) falls from the current level of ∼0.5 nJ/FLOP to about 10 pJ/FLOP (a factor of ∼50), so that sustained exascale computing can be realized at below 10 MW average power consumption.

Table 1 Top 10 supercomputers on the TOP500 list [1].

<table>
<thead>
<tr>
<th>#</th>
<th>Site</th>
<th>Manufacturer</th>
<th>Computer</th>
<th>Country</th>
<th>Rmax (PFLOPS)</th>
<th>Rpeak (PFLOPS)</th>
<th>Power (MW)</th>
<th>Energy/Rmax (nJ/FLOP)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>National University of Defense Technology</td>
<td>NGC</td>
<td>Tianhe-2</td>
<td>China</td>
<td>33.9</td>
<td>54.3</td>
<td>17.8</td>
<td>0.53</td>
</tr>
<tr>
<td>2</td>
<td>Oak Ridge National Laboratory</td>
<td>Cray</td>
<td>Titan</td>
<td>USA</td>
<td>17.6</td>
<td>27.1</td>
<td>8.21</td>
<td>0.47</td>
</tr>
<tr>
<td>3</td>
<td>Lawrence Livermore National Laboratory</td>
<td>IBM</td>
<td>Sequoia</td>
<td>USA</td>
<td>17.1</td>
<td>20.1</td>
<td>7.99</td>
<td>0.46</td>
</tr>
<tr>
<td>4</td>
<td>RIKEN Advanced Institute for Computational Science</td>
<td>Fujitsu</td>
<td>K Computer</td>
<td>Japan</td>
<td>10.5</td>
<td>11.3</td>
<td>12.7</td>
<td>1.22</td>
</tr>
<tr>
<td>5</td>
<td>Argonne National Laboratory</td>
<td>IBM</td>
<td>Mira</td>
<td>USA</td>
<td>8.6</td>
<td>10.1</td>
<td>3.95</td>
<td>0.46</td>
</tr>
<tr>
<td>6</td>
<td>Swiss National Supercomputing Centre (CSCS)</td>
<td>Cray Inc.</td>
<td>Piz Daint Cray XE6</td>
<td>Switzerland</td>
<td>6.3</td>
<td>7.8</td>
<td>2.13</td>
<td>0.37</td>
</tr>
<tr>
<td>7</td>
<td>Texas Advanced Computing Center/UT</td>
<td>Dell</td>
<td>Stampede</td>
<td>USA</td>
<td>5.2</td>
<td>6.5</td>
<td>4.51</td>
<td>0.87</td>
</tr>
<tr>
<td>8</td>
<td>Forschungszentrum Jülich (FZJ)</td>
<td>IBM</td>
<td>Juqueen BlueGene/Q</td>
<td>Germany</td>
<td>5</td>
<td>5.9</td>
<td>2.3</td>
<td>0.46</td>
</tr>
<tr>
<td>9</td>
<td>Lawrence Livermore National Laboratory</td>
<td>IBM</td>
<td>Sequoia BlueGene/Q</td>
<td>USA</td>
<td>4.3</td>
<td>5</td>
<td>1.97</td>
<td>0.46</td>
</tr>
<tr>
<td>10</td>
<td>Leibniz Rechenzentrum</td>
<td>IBM</td>
<td>SuperMUC</td>
<td>Germany</td>
<td>2.9</td>
<td>3.2</td>
<td>3.52</td>
<td>1.21</td>
</tr>
</tbody>
</table>

* The Tianhe-2 power figure does not include the power of the cooling system; the actual power consumption is estimated to be 30% higher than shown. The K computer, as tested in these TOP500 data, uses electrical interconnects only.

Koomey [2], [3] noted in 2010 that the energy efficiency of computing doubles nearly every 1.5 years, but as Fig. 2 illustrates, Koomey's law is starting to break down as energy-efficiency gains flatten. At the device level, Dennard scaling [4], formulated in 1974, described MOSFET scaling rules for obtaining simultaneous improvements in transistor density, switching speed, and power dissipation in step with Moore's law [5] (a doubling of the number of transistors about every 20 months); it became obsolete in 2004. (Moore's law is often misquoted as a doubling of transistors every 18 months; the 18-month figure was given by David House, an Intel executive, for the increase in chip performance. The actual period for doubling the number of transistors was about 20 months.)
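The power figures above can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope aid, not a model from this paper: it assumes the energy column of Table 1 is Power divided by Rmax (1 MW / 1 PFLOPS = 1 nJ/FLOP), and the helper names are hypothetical. It also assumes, counterfactually given Fig. 2, that Koomey's 1.5-year doubling would continue, only to show how long closing the gap would take.

```python
import math

def nj_per_flop(power_mw: float, rmax_pflops: float) -> float:
    """Energy per FLOP in nJ: 1 MW / 1 PFLOPS = 1e6 J/s / 1e15 FLOP/s = 1e-9 J/FLOP."""
    return power_mw / rmax_pflops

def exascale_power_mw(energy_nj_per_flop: float, flops: float = 1e18) -> float:
    """Average power in MW needed to sustain `flops` at a given energy per FLOP."""
    return energy_nj_per_flop * 1e-9 * flops / 1e6

def allowed_nj_per_flop(budget_mw: float, flops: float = 1e18) -> float:
    """Energy per FLOP in nJ permitted by a power budget at a sustained rate of `flops`."""
    return budget_mw * 1e6 / flops / 1e-9

def years_to_close(factor: float, doubling_years: float = 1.5) -> float:
    """Years needed if efficiency kept doubling every `doubling_years` (assumed trend)."""
    return doubling_years * math.log2(factor)

tianhe2 = nj_per_flop(17.8, 33.9)    # Table 1, row 1: roughly 0.525 nJ/FLOP
at_today = exascale_power_mw(0.5)    # roughly 500 MW, i.e. ~0.5 GW
target = allowed_nj_per_flop(10.0)   # 0.01 nJ/FLOP = 10 pJ/FLOP for a 10 MW budget
gap = 0.5 / target                   # roughly 50x improvement required
print(f"Tianhe-2: {tianhe2:.3f} nJ/FLOP; 1 EFLOPS at 0.5 nJ/FLOP: {at_today:.0f} MW")
print(f"10 MW target: {target * 1e3:.0f} pJ/FLOP; gap {gap:.0f}x; "
      f"{years_to_close(gap):.1f} years at a 1.5-year doubling")
```

Under these assumptions, the ∼50× gap would take roughly eight and a half years to close even if the historical doubling rate held.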

Figure 3 illustrates that Moore's law has continued while Dennard scaling has failed to keep pace with it since 2004. As device dimensions such as the gate oxide thickness shrink to several atomic layers, tunneling and leakage currents become significant. The International Technology Roadmap for Semiconductors (ITRS) [7] refers to this as a ‘red brick wall’, since there is no known technology solution beyond 2016, when CMOS scaling is expected to stop (note that CMOS power density scaling already stopped in 2004).

Hence, while the number of transistors continues to increase, improving energy efficiency is no longer a matter of increasing the clock speed of a single large processor. Instead, it calls for many small processors running at moderate clock speeds, interconnected to support massive concurrency in parallel with an optimized system design. Amdahl's law [8] suggests that a parallel computing system with balanced processing, memory, and communication performs best across most applications. For an optimized exascale computing system, this implies a bisection bandwidth of 1 exabyte/second, or 8 million terabit/second. Furthermore, as Figs. 4 and 5 illustrate, the number of cores and the total concurrency are increasing exponentially, now approaching or exceeding 1 million in large computing and data systems. Unfortunately, most current computing systems are imbalanced by a factor of 10 to 100, lacking sufficient interconnection bandwidth and parallelism.
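The bisection-bandwidth figure follows from a unit conversion. The second line is illustrative only: it assumes the ∼billion cores mentioned in the SUMMARY share the bisection bandwidth evenly.

$$
\begin{aligned}
1\ \text{EB/s} &= 10^{18}\ \text{B/s} \times 8\ \text{b/B} = 8 \times 10^{18}\ \text{b/s} = 8 \times 10^{6}\ \text{Tb/s},\\
\frac{8 \times 10^{18}\ \text{b/s}}{10^{9}\ \text{cores}} &= 8 \times 10^{9}\ \text{b/s} = 8\ \text{Gb/s per core}.
\end{aligned}
$$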

It is evident that electronics alone will not be able to meet the massive bandwidth, concurrency, and connectivity requirements of future computing, which scale to ∼billion concurrencies and cores for exascale computing. Photonic interconnects offer a disruptive technology solution that fundamentally changes the computing architectural design considerations. Optics provide ultra-high throughput, massive parallelism, minimal access latencies, and low power dissipation that remains independent of capacity and distance. In addition to energy efficiency, many of the fundamental physical problems of interconnects are directly addressed within the optical technology platform, including precise clock distribution [9], bit-rate transparency, and power reduction, without concerns [10] about impedance, crosstalk, voltage isolation, pin induction, signal distortion, and repeater-induced latency. Exciting opportunities exist in wavelength routing, which can reconfigure the high-capacity connectivity of multiple wavelengths to reduce contention and increase system-wide throughput [11], [12]. Recent advances in silicon nanophotonic technologies compatible with nanoelectronics offer new possibilities for realizing future computing systems with fundamentally new architectures.

2. Optical Interconnects Everywhere

The opportunity for optical interconnects [13]–[16] in board-to-board and rack-to-rack communications is already well documented. Table 2 illustrates the interconnection hierarchy in a computing system, with typical dimensions and the energy and cost targets needed to stay competitive against electronic counterparts. The target energy was calculated for the local link in each region so as to pursue exascale computing at below 20 MW in the near future. The ITRS long-term roadmap [17], [18] indicates a similar trend.

In addition to the power requirement, what remains extremely challenging is the cost target of the new photonic technologies for practical implementation of exascale computing and for market acceptance towards ubiquitous computing.

Many works [15], [16] have already discussed the benefits of optical interconnects in detail. Figure 6 illustrates how an optical link reduces cost, power, and complexity compared with an electronic link. Figure 7 shows a generic optical transceiver, which requires additional optical components but reduces the need for clock and data recovery circuits. Because electrical communication links suffer from RF skin effects, RF losses, electromagnetic interference, and limited bandwidth, high-speed parallel electrical data communication must employ many electronic regenerators performing equalization, reshaping, reamplification, and reclocking. In contrast, optical data links can achieve multi-Tb/s communication on parallel wavelengths over a single fiber with negligible loss, crosstalk, and distortion, so repeaters become unnecessary at the scale of typical data centers or computing centers, and the equalizers can be simpler. Furthermore, optical clock distribution can potentially eliminate the need for many clocking circuits in each node.

<table>
<thead>
<tr>
<th>Table 2</th>
<th>Interconnection hierarchy.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Rack-to-Rack</td>
<td>Board-to-Board</td>
</tr>
<tr>
<td>Distance</td>
<td>1 m~200 m</td>
</tr>
<tr>
<td>Dimension</td>
<td>~1 cm</td>
</tr>
<tr>
<td>Energy local link</td>
<td>&lt;1 pJ/b</td>
</tr>
<tr>
<td>Cost (US$)</td>
<td>~$100</td>
</tr>
</tbody>
</table>
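To see why the sub-pJ/b link energy in Table 2 matters, one can multiply energy per bit by aggregate bit rate. The sketch below makes loud simplifying assumptions: every bit of a 1 exabyte/second (8 × 10^18 b/s) bisection crosses exactly one link (optimistic; a multi-stage electronic network would traverse several), and only link energy is counted. The 20 MW budget is the target quoted above; the function name is hypothetical.

```python
def link_power_mw(energy_pj_per_bit: float, bits_per_second: float) -> float:
    """Power in MW to move `bits_per_second` across one link at `energy_pj_per_bit`."""
    return energy_pj_per_bit * 1e-12 * bits_per_second / 1e6

BISECTION_BPS = 8e18   # 1 exabyte/s expressed in bit/s
BUDGET_MW = 20.0       # system power target quoted in the text

for e_pj in (10.0, 1.0, 0.1):
    p = link_power_mw(e_pj, BISECTION_BPS)
    print(f"{e_pj:5.1f} pJ/b -> {p:6.1f} MW ({p / BUDGET_MW:.0%} of {BUDGET_MW:.0f} MW)")
```

Under these assumptions, 1 pJ/b already spends 8 MW (40% of the budget) on single-hop bisection traffic alone, while 10 pJ/b would exceed the entire budget.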

In addition to point-to-point data links and data transmission, switching and routing of data with optical parallelism and reconfigurability greatly facilitate the goals of high-productivity exascale computing. The following sections discuss these aspects in more detail.

3. Optical Switches and Routing in Computing Systems and LIONS

As Fig. 8 indicates, typical computing systems and data centers interconnect electronic switches of various sizes in many cascaded stages. Because of the limited radix and bandwidth of electronic switches, the inefficiency of the cascaded switch stages compounds, especially in terms of latency, throughput, and power consumption. This is one of the key reasons behind the challenges of parallel programming, because locality constraints for data become very important. On the other hand, the optical parallelism and wavelength routing capability of arrayed waveguide grating routers (AWGRs) can collapse the entire network into a single-hop, flattened interconnection topology while supporting all-to-all interconnection, as shown in Fig. 9.

Fig. 8 Typical data center interconnection topology involving various sizes of electronic switches [19].

Fig. 9 Flattened data center interconnection topology involving an $N \times N$ AWGR-based LION switch.

Fig. 10 $N \times N$ arrayed waveguide grating router's (a) wavelength routing property ($N = 5$ example), and (b) wavelength assignment table (figures courtesy of NEL).

Fig. 11 The system diagram of the proposed rack-to-rack LIONS optical switch. LD: Label Detector; OLG: Optical Label Generator; PE: Packet Encapsulation; LE: Label Extractor; FDL: Fiber Delay Line; TX: Transmitter; RX: Receiver; PFC: Packet Format Converter; O/E: Optical-to-Electrical converter; E/O: Electrical-to-Optical Converter [22].
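How quickly cascaded stages accumulate can be sketched with a textbook multi-stage topology. The example below uses a k-ary n-fly butterfly purely as a stand-in (an assumption; the data center topology of Fig. 8 differs, and fat trees traverse stages both up and down): with radix-k switches, reaching N endpoints takes the smallest n with k^n ≥ N stages, whereas the AWGR-based flattened topology of Fig. 9 needs one hop. The function name is hypothetical.

```python
def butterfly_stages(num_endpoints: int, radix: int) -> int:
    """Switch stages in a k-ary n-fly (radix x radix switches) serving num_endpoints."""
    if radix < 2 or num_endpoints < 1:
        raise ValueError("radix must be >= 2 and num_endpoints >= 1")
    stages, reach = 1, radix
    while reach < num_endpoints:   # each extra stage multiplies reach by the radix
        reach *= radix
        stages += 1
    return stages

for n in (1_024, 65_536, 1_000_000):
    print(n, {k: butterfly_stages(n, k) for k in (32, 64)}, "vs. 1 AWGR hop")
```

Each stage adds its own queuing, serialization, and power cost, which is the compounding inefficiency described above.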

An arrayed waveguide grating router (AWGR) [20], [21] is a passive wavelength routing component that enables all-optical switching. The AWGR switch differs from an electronic crossbar switch in four main ways: (a) the $N \times N$ AWGR switch provides simultaneous, fully connected all-to-all interconnection, thus providing $N^2$ simultaneous connection links; (b) there is no switching component within the core of the AWGR switch fabric (the AWGR is a passive element); (c) the switching components scale linearly with the number of connection links supported by the switch; and (d) the AWGR supports parallel wavelength interconnection covering huge (>20 THz) optical bandwidths. Figure 10 illustrates (a) the wavelength routing property ($N = 5$ example) and (b) the wavelength assignment table for switching with an AWGR. As Fig. 10 demonstrates, the well-known wavelength routing property of the $N \times N$ AWGR supports simultaneous, non-blocking interconnection of each of the $N$ input ports with all $N$ output ports using $N$ wavelengths. For the $N = 5$ example, there are $N^2 = 25$ simultaneous interconnections. Each input port can use a wavelength (instead of an electronic header) to address the output port. Therefore, any input port can address any output port without contention by tuning a tunable transmitter to the corresponding wavelength, provided that each output port is equipped with a wavelength demultiplexer and $N$ receivers. Alternatively, the $N \times N$ AWGR can be used for contention-free all-to-all interconnection by employing $N$ transmitters and $N$ receivers at each port [22].
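The cyclic routing property can be illustrated in a few lines. This sketch assumes the common convention that wavelength index k entering input port i exits at output port (i + k) mod N; the labeling in the wavelength assignment table of Fig. 10(b) may differ (for example, in direction or index offset), but any cyclic AWGR has the same Latin-square structure. Function names are hypothetical.

```python
def awgr_output(in_port: int, wavelength: int, n: int) -> int:
    """Output port reached from `in_port` on wavelength index `wavelength` (assumed cyclic rule)."""
    return (in_port + wavelength) % n

def wavelength_for(in_port: int, out_port: int, n: int) -> int:
    """Wavelength index a tunable transmitter at `in_port` uses to address `out_port`."""
    return (out_port - in_port) % n

def all_to_all_contention_free(n: int) -> bool:
    """True if all N*N input/output pairs are reachable and no output port
    receives the same wavelength from two different inputs."""
    pairs, arrivals = set(), set()
    for i in range(n):
        for k in range(n):
            o = awgr_output(i, k, n)
            if (o, k) in arrivals:
                return False
            arrivals.add((o, k))
            pairs.add((i, o))
    return len(pairs) == n * n

N = 5
for i in range(N):
    print(f"in{i}:", " ".join(f"λ{k}->out{awgr_output(i, k, N)}" for k in range(N)))
print(all_to_all_contention_free(N), N * N)
```

Because each output sees every wavelength exactly once, a wavelength demultiplexer with N receivers at each output can accept all N² connections simultaneously.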

Figure 11 shows an AWGR-based hybrid interconnection architecture, referred to as the lightwave interconnect optical network switch (LION switch, or LIONS), which also includes optical channel adapters (OCA), tunable wavelength converters (TWC), loopback buffers, and the control plane for the switching fabric. LIONS uses label switching, with the optical label transmitted on a separate wavelength.

We compared the performance of the rack-to-rack LIONS with other state-of-the-art rack-to-rack switches, including an electrical switching network with a flattened butterfly topology, IBM-Corning's optical packet switch for supercomputers (OSMOSIS), and the Data Vortex optical switch [23]–[26]. Figures 12(a) and (b) show the effective bandwidth and the end-to-end latency of LIONS compared with a flattened butterfly network for 32- and 128-node configurations with a message size of 128 bytes. The average end-to-end latency of the flattened butterfly network increases much faster than that of LIONS under moderate network load. The flattened butterfly network saturates more easily as the network size increases, while the latency of LIONS is almost independent of network size. The effective bandwidth comparison shows that LIONS can support heavy network loads of 90% and beyond. We also simulated a LIONS with a radix of 64 and a message size of 256 bytes (a packet size of 324 bytes), so that the setting is comparable with that of the OSMOSIS demonstrator. At 90% load, the latency of LIONS is still less than 190 ns. In comparison, the minimum achievable latency of OSMOSIS is above 700 ns even when only the data path delay and the STX arbitration are considered [27]. Figure 12(c) shows the effective bandwidth comparison with OSMOSIS: LIONS achieves slightly higher effective bandwidth than OSMOSIS because the InfiniBand label is smaller than the total overhead of OSMOSIS. Figure 12(d) shows the effective bandwidth comparison with Data Vortex; note that Data Vortex saturates before the uniform network load reaches 0.5 [27]. LIONS thus outperforms both OSMOSIS and Data Vortex in terms of effective bandwidth. The results also indicate that the label rate affects the latency of LIONS: a higher label rate reduces the end-to-end latency.

Figure 12 LION switch comparison with other switches. (a) Effective bandwidth and (b) end-to-end latency of LIONS compared with a flattened butterfly network for 32- and 128-node configurations and a message size of 128 bytes; (c) effective bandwidth comparison of LIONS vs. OSMOSIS; (d) effective bandwidth comparison of LIONS vs. Data Vortex [22].

Figure 13 Working principle of the all-optical token: a saturable reflector (reflective SOA) reflects the first packet with high gain and subsequent packets with lower, saturated gain [28], [29]. (TD: token detector.)
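The OSMOSIS comparison hinges on per-packet overhead. A simple accounting sketch, under the assumption that effective bandwidth is delivered payload divided by line rate and that the switch is below saturation (the simulator in [22] may define it differently), shows how a 256-byte message carried in a 324-byte packet caps the payload fraction. The 100-byte overhead case is hypothetical, for comparison only, and the function names are illustrative.

```python
def payload_efficiency(payload_bytes: int, packet_bytes: int) -> float:
    """Fraction of transmitted bytes that carry user payload."""
    if not 0 < payload_bytes <= packet_bytes:
        raise ValueError("need 0 < payload_bytes <= packet_bytes")
    return payload_bytes / packet_bytes

def effective_bandwidth(offered_load: float, payload_bytes: int, packet_bytes: int) -> float:
    """Delivered payload as a fraction of line rate, assuming the switch is not saturated."""
    if not 0.0 <= offered_load <= 1.0:
        raise ValueError("offered_load must be in [0, 1]")
    return offered_load * payload_efficiency(payload_bytes, packet_bytes)

# LIONS setting from the text: 256-byte message in a 324-byte packet (68 bytes overhead).
print(f"{payload_efficiency(256, 324):.3f}")
print(f"{effective_bandwidth(0.9, 256, 324):.3f}")
# Hypothetical 100-byte overhead, for comparison only.
print(f"{effective_bandwidth(0.9, 256, 356):.3f}")
```

Any reduction in label or framing overhead raises this ceiling directly, which is consistent with the smaller InfiniBand label giving LIONS a slight edge.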

The control plane in Fig. 11 can be distributed using an all-optical token mechanism and accelerated by an all-optical negative-acknowledgment mechanism; the two are combined into the all-optical TONAK scheme described in [28], [29]. Figure 13 illustrates the working principle of the all-optical token: a saturable reflector (reflective SOA, R-SOA) reflects the first packet with high gain and subsequent packets with lower, saturated gain. By placing an R-SOA at each receiver, there can be as many parallel tokens in the network as there are receivers. Thus arbitration can be achieved in parallel without requiring a centralized control plane.
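At the logical level, the token behaves like independent first-come arbitration at every receiver. The sketch below abstracts the R-SOA as "the earliest arrival wins, later arrivals are negatively acknowledged"; it ignores optical power levels, guard times, and the retry policy, none of which are specified here, and breaks exact time ties by source index (an arbitrary assumption). The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def token_arbitrate(arrivals: List[Tuple[float, int, int]]) -> Tuple[Dict[int, int], List[int]]:
    """Per-receiver first-come arbitration, abstracting the R-SOA token.

    arrivals: (arrival_time_ns, source, receiver) tuples for one contention window.
    Returns (winning source per receiver, sources that get a NACK and must retry).
    """
    winners: Dict[int, int] = {}
    nacked: List[int] = []
    for _, src, rx in sorted(arrivals):
        if rx not in winners:
            winners[rx] = src    # first packet sees the unsaturated, high R-SOA gain
        else:
            nacked.append(src)   # later packets see saturated, low gain -> NACK
    return winners, nacked

print(token_arbitrate([(5.0, 2, 0), (3.0, 1, 0), (4.0, 3, 1)]))
```

Because no receiver consults any other, decisions for different receivers proceed in parallel, which is the property that removes the centralized control plane.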

The LIONS described in Fig. 11 can serve as a top-of-the-rack switch interconnecting many racks or supernodes at the top level of the hierarchy in Table 2. As Fig. 14 [30] illustrates, a passive $N \times N$ AWGR without tunable lasers can interface $N$ compute nodes or processors and support up to $N^2$ parallel interconnections. Figure 14 includes a centralized control plane for illustration, but it is to be replaced with a distributed control plane using the all-optical TOKEN technique [28]. Each node has a transmitter array that uses $k_t$ ring modulators to generate data packets and $k_r$ receivers with ring filters to receive packets. An off-chip comb generator provides the $N$ wavelengths required by the cyclic-frequency AWGR for wavelength routing. We restrict the number of rings on each bus waveguide to fixed numbers $k_t, k_r < N$. This passive LIONS can be deployed at multiple hierarchy levels, with interfaces to the same and higher levels, as Fig. 15 illustrates for the chip-to-chip and board-to-board levels.
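The role of $k_t$ and $k_r$ can be made concrete with a one-slot grant function. This is a hypothetical greedy scheduler in the spirit of the centralized control plane drawn in Fig. 14, not the scheduler used in [30]. It assumes each of a node's $k_t$ rings can be tuned to any comb line, uses the (dst − src) mod N cyclic wavelength rule as an assumed convention, and allows at most one packet per source/destination pair per slot (one wavelength path per pair). A real scheduler would also rotate priority instead of always favoring low-numbered sources.

```python
from typing import Dict, List, Tuple

def schedule_slot(requests: Dict[int, List[int]], n: int, k_t: int, k_r: int) -> List[Tuple[int, int, int]]:
    """Greedy single-slot grants for a passive N x N cyclic AWGR (hypothetical helper).

    requests: source node -> destinations with queued packets, in priority order.
    Returns (source, destination, wavelength index) grants.
    """
    tx_used = [0] * n          # ring modulators in use at each source
    rx_used = [0] * n          # receivers in use at each destination
    granted = set()            # one wavelength path per (source, destination) pair
    grants: List[Tuple[int, int, int]] = []
    for src in sorted(requests):
        for dst in requests[src]:
            if tx_used[src] == k_t:
                break          # all k_t transmitters of this source are already granted
            if dst == src or rx_used[dst] == k_r or (src, dst) in granted:
                continue       # loopback, receivers full, or pair already served this slot
            grants.append((src, dst, (dst - src) % n))   # assumed cyclic routing rule
            granted.add((src, dst))
            tx_used[src] += 1
            rx_used[dst] += 1
    return grants

demand = {0: [1, 2, 3], 1: [0], 2: [1], 3: [1]}
for k in (1, 2):
    print(k, schedule_slot(demand, n=4, k_t=k, k_r=k))
```

In this toy demand pattern, raising $k_t = k_r$ from 1 to 2 doubles the number of grants in the slot, because node 0 can feed two destinations at once and node 1 can accept two sources.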

Figure 16 shows a performance study of the proposed architecture based on a $64 \times 64$ AWGR under various configurations [30]. Even for moderately small values of $k_t$ and $k_r$ (2 and 4), performance improves substantially over $k_t = k_r = 1$, which is equivalent to the electronic counterpart. For the single transmitter/receiver pair configuration ($k_t = 1$ and $k_r = 1$), virtual output queues (VOQs) greatly improve system performance in all three aspects, but systems equipped with multiple transmitter/receiver pairs ($k_t \geq 2$ and $k_r \geq 2$) still perform better, especially in terms of end-to-end latency at high (>90%) input load. Without VOQs, multiple transmitter/receiver pairs provide a significant performance boost owing to increased statistical multiplexing and the enhanced instantaneous rate at each AWGR input and output. We observe zero packet loss and 100% throughput for all cases where $k_t \geq 2$ and $k_r \geq 2$.
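The VOQ benefit comes from avoiding head-of-line blocking. The toy model below is not the simulator of [30]: a "busy" destination stands for one whose $k_r$ receivers are all taken in the current slot, queue entries are destination indices (FIFO) or placeholder packet IDs (VOQ), and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set

def fifo_send(fifo: Deque[int], busy: Set[int], k_t: int) -> List[int]:
    """Single FIFO per node: packets leave strictly in arrival order (entries are destinations)."""
    sent: List[int] = []
    while fifo and len(sent) < k_t:
        dst = fifo[0]
        if dst in busy or dst in sent:
            break            # head-of-line blocking: the head stalls every packet behind it
        sent.append(fifo.popleft())
    return sent

def voq_send(voqs: Dict[int, Deque[str]], busy: Set[int], k_t: int) -> List[int]:
    """One queue per destination: any destination with a free receiver may be served."""
    sent: List[int] = []
    for dst in sorted(voqs):
        if len(sent) == k_t:
            break
        if voqs[dst] and dst not in busy:
            voqs[dst].popleft()
            sent.append(dst)
    return sent

# Packets queued for destinations 1, 2, 3; all receivers of destination 1 are busy this slot.
print(fifo_send(deque([1, 2, 3]), busy={1}, k_t=2))
print(voq_send({1: deque(["p0"]), 2: deque(["p1"]), 3: deque(["p2"])}, busy={1}, k_t=2))
```

With a single FIFO nothing leaves in this slot, whereas the VOQ node sends to destinations 2 and 3; the second transmitter ($k_t = 2$) is only useful when the queue structure lets it find a second eligible destination.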

Figure 17 shows a photograph of a fabricated silicon photonic $N \times N$ LIONS chip for $N=4$, $k_t=4$, and $k_r=4$ [30]. Figure 18 shows the experimental bit-error-rate measurements on this chip for (a) 4-by-1 routing and (b) 1-by-4 routing [30]. Error-free switching and interconnection were achieved.

AWGRs can scale to 2010 ports [31], and a $512 \times 512$ AWGR has been demonstrated on a silicon photonic chip [32]. LIONS is expected to achieve orders-of-magnitude improvements in switching capacity, connectivity, throughput, and latency compared with electronic counterparts, while providing all-to-all connectivity without contention.

4. Silicon Photonic Integrated Circuits and Athermalization

Silicon photonics exploits the CMOS infrastructure built on many billions of dollars of past investment. Photonic devices are quite large in physical size and feature dimensions compared with electronic devices, so the number of devices integrated on a die can be orders of magnitude lower; nevertheless, the uniform and well-established CMOS fabrication platform can contribute greatly to the manufacturability of photonic components for future computing. Initially, silicon photonics stirred speculation that future CMOS would combine photonic and electronic integration, which was expected to have a significant impact by (a) offering the intelligent data processing and storage capabilities of electronics together with the high capacity and parallelism of photonics, or (b) realizing signal processing in the photonic domain. However, the challenges of photonic-electronic integration lie in process compatibility between photonic and electronic ICs, isolation, crosstalk, yield, and heat density. Groups at Luxtera, IBM, and MIT/Micron have recently and independently demonstrated electronic-photonic integration on a silicon CMOS platform. Figure 19 shows silicon photonic transmitters and receivers realized on the same electronic platform as silicon CMOS [33].

A more practical approach to photonic-electronic integration is hybrid integration by die bonding or wafer bonding. As Fig. 20 illustrates, 3D photonic-electronic integration can build on the already active field of 3D electronic integration with through-silicon vias (TSVs).

Fig. 19 Silicon photonic transmitters and receivers realized on the same electronic platform as silicon CMOS [33].

Fig. 20 Future 3D processor consisting of a silicon photonic interconnect plane, memory plane, and processor plane together with an external optical frequency comb source.

However, silicon photonics must maintain a nearly constant optical response across the operating temperature range. The high temperature sensitivity of silicon (its thermo-optic coefficient is $1.81 \times 10^{-4}$ K$^{-1}$, about 20 times that of silica) makes this challenging.
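A first-order estimate shows why this matters for resonant devices. Assuming, for illustration only, a microring at $\lambda = 1550$ nm with group index $n_g \approx 4.2$, the mode fully confined in silicon ($\Gamma_{\text{Si}} \approx 1$, an upper bound), and neglecting thermal expansion:

$$
\frac{d\lambda_{\text{res}}}{dT} \approx \frac{\lambda}{n_g}\,\Gamma_{\text{Si}}\,\frac{dn_{\text{Si}}}{dT} \approx \frac{1550\ \text{nm}}{4.2} \times 1.81 \times 10^{-4}\ \text{K}^{-1} \approx 67\ \text{pm/K},
$$

so a 50 K temperature swing would move the resonance by roughly 3.3 nm under these assumptions.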

Recently, we have realized [34] athermalization of silicon photonic waveguides using a CMOS-compatible process that exploits titanium oxide overcladding. Figure 21 illustrates fabricated athermal microring resonator modulators, and Fig. 22 shows experimental tests of the fabricated devices with various waveguide widths, demonstrating nearly perfect athermalization as well as blue or red shifts depending on the waveguide width.
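The width dependence can be rationalized with a simple confinement-weighted model: the resonance shift is approximately $(\lambda / n_g)\,dn_{\text{eff}}/dT$, with $dn_{\text{eff}}/dT \approx \Gamma_{\text{Si}}\,dn_{\text{Si}}/dT + \Gamma_{\text{clad}}\,dn_{\text{clad}}/dT$. In the sketch below, only the silicon coefficient comes from the text; the cladding value of $-1.0 \times 10^{-4}$ K$^{-1}$, the group index of 4.2, and the confinement factors are illustrative assumptions, not measured values from [34]. A narrower core pushes more light into the negative-dn/dT cladding and flips the shift from red to blue, qualitatively consistent with the width-dependent shifts reported above.

```python
def ring_shift_pm_per_k(gamma_si: float, gamma_clad: float,
                        dn_dt_si: float = 1.81e-4, dn_dt_clad: float = -1.0e-4,
                        wavelength_nm: float = 1550.0, group_index: float = 4.2) -> float:
    """First-order ring resonance shift d(lambda)/dT in pm/K (thermal expansion ignored).

    dn_dt_clad, group_index and the gamma values are illustrative assumptions.
    """
    dneff_dt = gamma_si * dn_dt_si + gamma_clad * dn_dt_clad
    return wavelength_nm * dneff_dt / group_index * 1e3   # nm/K -> pm/K

def athermal_ratio(dn_dt_si: float = 1.81e-4, dn_dt_clad: float = -1.0e-4) -> float:
    """Ratio gamma_si / gamma_clad at which the two thermo-optic contributions cancel."""
    if dn_dt_clad >= 0:
        raise ValueError("compensation needs a cladding with negative dn/dT")
    return -dn_dt_clad / dn_dt_si

# Hypothetical confinement factors: a wider core keeps more light in silicon.
for label, g_si, g_clad in (("wide", 0.8, 0.2), ("narrow", 0.3, 0.7)):
    print(label, f"{ring_shift_pm_per_k(g_si, g_clad):+.1f} pm/K")
print(f"athermal when gamma_si / gamma_clad = {athermal_ratio():.3f}")
```

In this model, athermal operation is a matter of choosing the waveguide geometry that places the confinement ratio at the null point.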

5. Benchmarking Evaluations

New computing paradigms and architectures must be put to the test under actual workloads. Figure 23 compares the performance of an AWGR-based all-to-all fully connected network, the AWGR-based LIONS switch in three variations, and an electronic flattened butterfly network (with a large number of switches). The giga-updates per second (GUPS) benchmark shows that the AWGR-based all-to-all interconnection yields a 16-fold increase and approaches the theoretical maximum GUPS. GUPS is well suited to characterizing applications that require low latency and the processing of many small data packets, such as sorting. TOP500 uses the LINPACK benchmark, which is based on numerical linear algebra and is not particularly memory- or interconnect-intensive. GRAPH500 is based on graph traversal, a core part of most analytics workloads, and better represents data-intensive applications. Because such workloads require much more communication and memory access, the GRAPH500 ranking in Table 3 differs significantly from Table 1: computing systems with better balance and higher memory bandwidth (e.g., IBM BlueGene/Q) rank higher than the others (e.g., Tianhe-2).
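The access pattern that makes GUPS so demanding on interconnects can be sketched in a few lines. The example below is written in the spirit of the HPC Challenge RandomAccess kernel (a 64-bit shift-register generator with feedback constant 0x7 and XOR updates at random table slots), but it is a single-process illustration, not a compliant benchmark run, and its timing says nothing about the interconnects compared in Fig. 23. Function names and the seed are hypothetical.

```python
import time
from typing import List

POLY = 0x7                 # feedback constant of the RandomAccess-style generator
MASK64 = (1 << 64) - 1     # emulate unsigned 64-bit arithmetic

def next_random(ran: int) -> int:
    """Shift left by one; if the top bit was set, XOR in POLY."""
    carry = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if carry else ran

def random_access(table: List[int], n_updates: int, seed: int = 0x123456789ABCDEF) -> None:
    """Perform n_updates XOR read-modify-write updates at pseudo-random table slots."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    if seed == 0:
        raise ValueError("seed must be non-zero (zero is a fixed point of the generator)")
    mask = size - 1
    ran = seed
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & mask] ^= ran   # unpredictable address: no spatial or temporal locality

def measure_gups(log2_size: int = 16, n_updates: int = 1 << 18) -> float:
    """Giga-updates per second for one in-memory Python process (illustration only)."""
    table = list(range(1 << log2_size))
    start = time.perf_counter()
    random_access(table, n_updates)
    return n_updates / (time.perf_counter() - start) / 1e9

if __name__ == "__main__":
    print(f"{measure_gups():.6f} GUPS")
```

When the table is distributed across nodes, nearly every update becomes a tiny message to an unpredictable remote node, so throughput is governed by interconnect latency and all-to-all connectivity rather than by FLOPS. Because XOR is its own inverse, replaying the same update sequence restores the table, which makes a convenient correctness check.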

Future computing systems must be compared in the context of expected workloads and optimized for those workloads toward the most efficient design, configuration, and operation.

Fig. 21 Fabricated devices: (a) cross-section schematic of the phase tuning section in the ring resonator after filling the trench with TiO$_2$; (b) SEM image of the ring resonator with a 275 nm wide waveguide, prior to TiO$_2$ deposition; (c) magnified view of the section of the waveguide-to-ring directional coupler, illustrating the principle of trenching; (d) transition from the trenched waveguide to the waveguide clad with SiO$_2$ (not trenched) [34].

Fig. 22 (a) Evidence of blue shift with temperature increase in a 250 nm wide waveguide device; (b) summary of measured (square markers) and fitted values of resonant frequency shifts for devices with different waveguide widths. The inset marks the waveguide width [34].

Fig. 23 GUPS benchmarking for different interconnects.

6. Conclusion

The exponential and sustained increases in computing and data center needs are driving the demand for exascale computing. Power-efficient parallel computing with a balanced system design is essential for reaching that goal; such a system should support ∼billion total concurrencies and ∼billion core interconnections with ∼exabyte/second bisection bandwidth. Photonic interconnects offer a disruptive technology solution that fundamentally changes the computing architectural design considerations. Optics provide ultra-high throughput, massive parallelism, minimal access latencies, and low power dissipation that remains independent of capacity and distance. In addition to improving energy efficiency and addressing many fundamental physical problems, optics will bring high-productivity computing in which programmers can ignore the locality between billions of processors and the memory where data resides. Repeaterless interconnection links across the entire computing system and all-to-all massively parallel interconnection switches will significantly transform not only the hardware aspects of computing but also the way people program and harness computing capability, improving the programmability and productivity of computing. Benchmarking and optimizing the configuration of the computing system are very important. Practical and scalable deployment of photonic interconnected computing systems is likely to be aided by the emergence of athermal silicon photonics and hybrid integration technologies.

Acknowledgments

The author acknowledges contributions from colleagues and researchers at UC Davis and support in part from DoD ACS project W911NF-13-1-0090.

References


Table 3 GRAPH500 benchmark results, ranking the top 10 systems differently from Table 1.

<table>
<thead>
<tr>
<th>Rank</th>
<th>System</th>
<th>CPU</th>
<th>Memory</th>
<th>Bandwidth</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>LLNL Sequoia (IBM BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>183.8 GB</td>
</tr>
<tr>
<td>2</td>
<td>Argonne National Laboratory Mira (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>3</td>
<td>J. F. Cray (Rigel - Infinib)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>384 GB</td>
</tr>
<tr>
<td>4</td>
<td>Fermi (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>226 GB</td>
</tr>
<tr>
<td>5</td>
<td>Tianhe-2 (Millicom)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>6</td>
<td>Teraflop Systems</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>7</td>
<td>CIRRAK (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>8</td>
<td>Saber (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>9</td>
<td>Avoca (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
<tr>
<td>10</td>
<td>Leibniz (IBM - BlueGene/Q, Power QIC L 1.30 HZ)</td>
<td>IBM BlueGene/Q</td>
<td>16TB</td>
<td>148.1 GB</td>
</tr>
</tbody>
</table>


