Lasith Eranda Handapangoda
The invention of the first transistor led to a remarkable transformation in the technical world. In particular, it boosted computational power from thousands to millions of calculations per second, while power consumption and heat generation dropped dramatically. A few decades after the first transistor, engineers learned to fabricate micrometer-scale transistors and to package thousands of them into a single device: the microprocessor. Microprocessors were then manufactured in large volumes and became available to the general public. They quickly grew popular, and by the mid-1980s they had begun to appear in almost every type of digital device. Today, a single microchip packs billions of nanometer-scale transistors into less than one square inch, delivering billions of calculations per second at very low power consumption. More than 15 billion microprocessors are now in use around the world.
The rapid advancement of technology and digital devices has given rise to new concepts. With today's computational power available, people began capturing many kinds of data from all over the world. Experts found that analyzing these data as a collection can reveal features that could never be uncovered by examining the data points individually. For instance, supermarkets collect their customers' data, and analyzing these relational data may reveal buying trends that could never be identified from the purchasing pattern of a single customer. In that case, one analyst with a modern computer is enough to find the information. But what about analyzing buying trends across a global marketplace like eBay? eBay has more than 181 million users and collects most of their activity, including item searches, transaction records, and customer-service data. No team of people can handle such a volume of analysis without machine learning and processors designed specifically for running machine learning models. These workloads demand an enormous amount of computational power, and Cloud Tensor Processing Units (TPUs) were designed to meet it.
What is a Cloud TPU?
A Tensor Processing Unit (TPU) is an Application-Specific Integrated Circuit (ASIC) designed to handle and accelerate machine learning workloads, particularly those built with TensorFlow. A Cloud TPU is a large cluster of interconnected TPUs, called a "TPU pod," that can be accessed over the internet as infrastructure as a service. These Cloud TPUs operate at very high computational power: Google's latest generation of Cloud TPU pods can deliver up to 11.5 petaflops (11.5 quadrillion floating-point operations per second) of machine learning acceleration.
How does it work?
TPUs deliver huge computational power because of their unique architecture. The Central Processing Units (CPUs) in general-purpose computers run at clock speeds in the gigahertz range, yet they can still take a long time to execute large matrix multiplications. Machine learning depends on neural networks, and running a trained neural network essentially comes down to executing a very large number of matrix multiplications. A typical CPU performs these calculations one operation per instruction per clock cycle, which is why CPUs are called scalar processors.
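To make the connection between neural networks and matrix multiplication concrete, here is a minimal sketch in NumPy. The shapes and values are purely illustrative (a hypothetical batch of 32 flattened 28×28 images feeding a 128-unit dense layer), not taken from any real model:

```python
import numpy as np

# A dense neural-network layer is, at its core, one matrix multiplication:
# outputs = inputs @ weights + bias.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((32, 784))    # batch of 32 flattened 28x28 images
weights = rng.standard_normal((784, 128))  # layer parameters
bias = rng.standard_normal(128)

# This single line hides 32 * 784 * 128 multiply-add operations.
outputs = inputs @ weights + bias
print(outputs.shape)  # (32, 128)
```

A deep network stacks many such layers, so inference is dominated by exactly this kind of matrix arithmetic.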
TPUs contain a dedicated parallel matrix multiplier unit that accelerates matrix multiplication through vector processing, performing hundreds to thousands of operations in a single clock cycle.
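The difference between scalar and vector execution can be sketched in software. The triple loop below performs one multiply-add per step, the way a scalar core works through the problem; the vectorized `@` operator expresses the same computation in a form the hardware (or an optimized BLAS library) can execute many operations at a time. This is an illustrative sketch, not a hardware model:

```python
import numpy as np

def matmul_scalar(a, b):
    """Multiply matrices one operation at a time, scalar-processor style."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]  # one multiply-add per step
    return out

rng = np.random.default_rng(1)
a = rng.standard_normal((16, 16))
b = rng.standard_normal((16, 16))

# The vectorized path computes the same result, but the underlying
# implementation performs many multiply-adds per cycle instead of one.
assert np.allclose(matmul_scalar(a, b), a @ b)
```

Both paths produce identical results; only the number of operations completed per cycle differs.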
Systolic Array Architecture
The systolic array is a processor architecture completely different from that of typical CPUs and GPUs (Graphics Processing Units). A typical CPU accesses its registers many times to store and retrieve the intermediate results of a running calculation. Although each register access is fast, the cost accumulates over the many clock cycles the processor executes to complete the calculation. This is why large matrix multiplications take so long on typical CPUs.
In a systolic array, the processor accesses its registers only infrequently. It reads each input value once and reuses it for many different calculations without writing intermediate values back to registers.
Wires connect spatially adjacent Arithmetic Logic Units (ALUs), and these wires effectively serve as the registers: the result of one operation is passed directly to the adjacent ALU, which performs the next operation. This design saves both the time and the considerable power that would otherwise be spent accessing registers many times during an operation. For example, the matrix multiplication unit in TPU v1 is a systolic array of 256 × 256 = 65,536 ALUs, so the TPU can process 65,536 multiplications and additions on 8-bit integers in every cycle. Furthermore, since the TPU runs at 700 MHz, the matrix multiplication unit can compute 65,536 × 700,000,000 ≈ 46 × 10^12 multiply-and-add operations, or 92 tera-operations, per second. TPU v2 and TPU v3 are faster still, as they use 128 × 128 systolic arrays with two or more cores in a single processing unit.
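The dataflow described above can be sketched in Python. This is a functional simulation of a weight-stationary systolic array, not the actual TPU design: each "cell" holds one weight, activations stream through, and partial sums are handed directly from one row of cells to the next instead of being written back to registers. All names and shapes here are illustrative:

```python
import numpy as np

def systolic_matmul(x, w):
    """Simulate a weight-stationary systolic array computing x @ w.

    Cell (p, j) holds weight w[p, j]. Each input row streams through the
    array; partial sums flow from one row of cells straight into the next,
    mimicking ALU-to-ALU wires instead of register round-trips.
    """
    n, k = x.shape
    k2, m = w.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):                      # stream one input row at a time
        partial = np.zeros(m)               # sums travelling down the columns
        for p in range(k):                  # row p of cells fires in parallel
            partial += x[i, p] * w[p, :]    # result passed on, never stored back
        out[i] = partial                    # results drain out of the array
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 3))

# The systolic dataflow produces exactly an ordinary matrix product.
assert np.allclose(systolic_matmul(x, w), x @ w)
```

In hardware, each inner step is a physical ALU wired to its neighbor, so all 65,536 multiply-adds in a 256 × 256 array complete in a single cycle rather than in a software loop.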
Benefits of Cloud TPUs
Cloud TPUs are designed specifically for Artificial Intelligence (AI) workloads on cloud platforms. Their computational power makes training machine learning models a faster and more cost-effective task. Models must be retrained over and over as applications are built, and Cloud TPUs perform this retraining quickly and at low training cost. Moreover, the added compute budget makes it practical to train models to a higher level of accuracy before deploying them.
Applications of Cloud TPUs in the real world
Most Google services have been migrated to TPUs to power their machine learning features and give better results to users. Applications such as Translate, Photos, Search, Assistant, and Gmail all rely on TPUs to enhance the service offered to their users.
The modern world tends to enhance and optimize every kind of operation. Machine learning has become a leading tool for identifying trends and responding to them, and it can be applied in almost every field to optimize operations and deliver the best outcome. Cloud TPUs are a giant leap for machine learning acceleration.