To solve such systems, the Thomas algorithm is often employed. It consists of two phases: forward elimination and back substitution.
The Thomas algorithm performs far fewer operations than general Gaussian elimination, O(n) rather than O(n^3) for n unknowns, which yields significant time savings on compute clusters or GPUs.
We benchmarked GPU runtimes of the CR, PCR and PTA algorithms, as well as the classic Thomas algorithm, on lid-driven cavity flow at Re = 1000 over 1000 iterations.
Early Life and Education
The Thomas algorithm is an application of Gaussian elimination to a banded (tridiagonal) matrix and takes four arrays, a, b, c and d: a, b and c hold the sub-diagonal, main diagonal and super-diagonal elements, while d holds the right-hand side.
Implementing the Thomas algorithm on GPUs can be quite complex: if memory has to be allocated every time the solver is called, performance is suboptimal.
CR and PCR algorithms were designed to solve this system of equations without placing any restrictions on the number of computational rows, making them suitable for solving large systems of equations. Here, we compare runtimes of Thomas running on GPU with those from CR/PCR running on CPU in solving two benchmark examples: steady lid-driven cavity flow, and steady/unsteady flow over a square cylinder inside a channel at various Reynolds numbers.
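The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `thomas_solve` is hypothetical, the array layout (a = sub-diagonal, b = main diagonal, c = super-diagonal, d = right-hand side) is the usual textbook convention, and the code assumes no zero pivot appears, which holds for example when the matrix is diagonally dominant.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system (sketch; assumes no zero pivots).

    a: sub-diagonal (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    All four lists have the same length n.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Each row depends on the result of the previous one, which is why the loop is inherently sequential and why parallel variants need a different formulation.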
The word "algorithm" may sound simple to you, but not to many others. It conjures images of an obscure black box with opaque components inside that only an expert could operate. This perception has started to shift, as vision product teams often pair algorithm developers with software engineers and system engineers in order to expedite design, development and deployment, and to share accountability for performance.
On a parallel platform with efficient hardware, the PDD algorithm provides a high absolute speedup over Thomas for systems of order 1600 (see Figure 1). Its scalability, however, does not exceed that of the CR or PCR algorithms; to improve it, we propose a reduced PDD algorithm that exploits diagonal dominance to lower the computation count.
Achievement and Honors
Llewellyn Hilleth Thomas has made important contributions to atomic, molecular and solid state physics. He developed an approximate theory of N-body quantum systems as well as an algorithm for solving tridiagonal systems of equations. Llewellyn received several honorary degrees before co-founding multiple companies. Today he serves as director of Tech’s Algorithms Combinatorics Optimization program (ACO) while supervising 16 Ph.D students.
There are various methods for solving tridiagonal systems of linear algebraic equations, and the Thomas algorithm is one of the more efficient. Because it is serial, however, it still requires many sequential operations to solve systems with many unknowns; in this paper we therefore develop a parallel Thomas algorithm which is more cost-effective than serial CR and PCR algorithms.
He and his wife are parents to two school-age children who attend Naperville Christian Academy, while he serves as an elder in Wheaton’s nonconformist church congregation.
The Thomas algorithm reduces large systems of equations into more manageable ones using a number of operations that grows only linearly with the number of unknowns. This approach significantly outperforms standard Gaussian elimination, which requires on the order of n^3 operations (roughly 2n^3/3 flops) for a dense n x n matrix.
To test CR, PCR, PTA and the classic Thomas algorithm on GPUs, we solved a steady lid-driven cavity flow and a steady/unsteady flow over a square cylinder inside a channel at various Reynolds numbers. PTA provided the highest speedup compared to its counterparts.
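To make the gap concrete, the following sketch compares approximate operation counts. Both formulas are assumptions stated for illustration rather than figures from the benchmarks: the Thomas count assumes the textbook formulation (2 divisions for the first row, 6 operations per later row in the forward sweep, 2 per row in back substitution), and the dense count is the usual leading-order estimate of 2n^3/3. The function names are hypothetical.

```python
def thomas_flops(n):
    """Approximate flop count of the textbook Thomas algorithm."""
    # 2 divisions for row 0, then 6 (forward) + 2 (backward) per later row.
    return 2 + 8 * (n - 1)


def dense_gauss_flops(n):
    """Leading-order flop estimate for dense Gaussian elimination."""
    return 2 * n**3 / 3


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ratio = dense_gauss_flops(n) / thomas_flops(n)
        print(f"n={n}: Thomas ~{thomas_flops(n)} flops, "
              f"dense ~{dense_gauss_flops(n):.0f} flops, ratio ~{ratio:.0f}x")
```

The ratio grows roughly like n^2 / 12, so the advantage of exploiting the tridiagonal structure widens quickly as the system gets larger.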
Due to his immense net worth, he is considered one of the wealthiest individuals worldwide. Additionally, he serves as an investor and founder for numerous companies involved in e-commerce, software development and healthcare.
Both the CR algorithm and the Thomas algorithm use a linear, O(n), number of operations. For large systems, however, a serial solve can still take too long, so a parallel variant of the Thomas algorithm may be more appropriate for solving larger systems efficiently.
To address this problem, the PCR-Thomas algorithm divides a large system into several smaller ones before solving each in parallel with Thomas. It is faster and more effective than either CR or PCR; its implementation and programming are straightforward; its scalability outpaces the PDD and reduced PDD algorithms; and it offers superlinear speedup over Thomas.
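The text does not spell out how the splitting is done, so the sketch below shows one common way to combine the two methods, offered as an assumption rather than as the exact scheme benchmarked here. A few parallel cyclic reduction (PCR) steps decouple the system into 2^k interleaved tridiagonal subsystems, and each subsystem is then solved with Thomas. The names `pcr_step`, `thomas` and `pcr_thomas` are hypothetical, the loops run serially in plain Python where a GPU would run them in parallel, and the code assumes a[0] = 0, c[-1] = 0 and no zero pivots (for example a diagonally dominant matrix).

```python
def thomas(a, b, c, d):
    """Serial Thomas solve; a[0] and c[-1] are never used."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d, s):
    """One PCR step: row i, coupled to i-s and i+s, becomes coupled to i-2s and i+2s."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):  # rows are independent, so this loop could run in parallel
        if i - s >= 0:  # eliminate x[i-s] using row i-s
            alpha = -a[i] / b[i - s]
            na[i] = alpha * a[i - s]
            nb[i] += alpha * c[i - s]
            nd[i] += alpha * d[i - s]
        if i + s < n:   # eliminate x[i+s] using row i+s
            gamma = -c[i] / b[i + s]
            nc[i] = gamma * c[i + s]
            nb[i] += gamma * a[i + s]
            nd[i] += gamma * d[i + s]
    return na, nb, nc, nd


def pcr_thomas(a, b, c, d, steps):
    """Apply `steps` PCR steps, then solve each interleaved subsystem with Thomas."""
    s = 1
    for _ in range(steps):
        a, b, c, d = pcr_step(a, b, c, d, s)
        s *= 2
    x = [0.0] * len(b)
    for j in range(min(s, len(b))):  # subsystems are independent of each other
        x[j::s] = thomas(a[j::s], b[j::s], c[j::s], d[j::s])
    return x
```

With `steps=0` this reduces to a plain Thomas solve. Increasing `steps` trades extra arithmetic in the PCR phase for more, shorter independent subsystems, which is the tuning knob such hybrid schemes typically expose.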