Because a vector is a 1D data structure, we could simply represent a vector as a 1D texture on the GPU. The drawbacks of such a representation quickly become apparent after investigating the constraints that are imposed on textures by the GPU.
All textures on current generations of GPUs are limited in size; a 1D texture can hold no more elements than the maximum texture width, whereas a 2D texture can hold that many elements squared. Thus 1D textures can represent only relatively short vectors, which is not sufficient for the simulation of reasonable problem sizes. In addition, we will often compute a vector as the result of a rendering pass, by drawing a primitive that covers the area of the texture representing that vector.
On recent GPUs, rendering 2D textured primitives is also far faster than rendering 1D primitives that generate the same number of fragments. For these reasons, we represent a vector as a 2D texture, laying out its values over a square block of texels. To further reduce the size of the internal representation, we pack contiguous blocks of 2×2 entries into one RGBA texel.
On some GPUs, this layout also improves texture access performance. Now that we have an efficient vector representation, we can advance to a more complex linear algebra entity: the matrix. While vectors are usually treated as full vectors (vectors in which practically all elements are nonzero), matrices often have only a few nonzero entries, especially matrices derived from PDE discretizations. Therefore, we describe different representations for different types of matrices.
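Before moving on to matrices, a sketch of the index arithmetic behind this packed layout may help. The chapter does not reproduce its layout code in this excerpt, so the following is a minimal C++ sketch of one plausible mapping, assuming a row-major n × n layout; the struct and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical index mapping for the packed layout described above: a vector
// of length N is first laid out as an n x n 2D array (n * n == N), and each
// contiguous 2x2 block of that array is packed into one RGBA texel of an
// (n/2) x (n/2) texture.
struct TexelAddress {
    std::size_t texelX;   // texel column in the packed texture
    std::size_t texelY;   // texel row in the packed texture
    int         channel;  // 0 = R, 1 = G, 2 = B, 3 = A
};

TexelAddress vectorIndexToTexel(std::size_t i, std::size_t n) {
    // Position in the unpacked n x n layout (row-major).
    std::size_t x = i % n;
    std::size_t y = i / n;
    // Each texel covers a 2x2 block; the channel selects the entry within
    // that block (R, G in the top row; B, A in the bottom row).
    TexelAddress a;
    a.texelX  = x / 2;
    a.texelY  = y / 2;
    a.channel = static_cast<int>((y % 2) * 2 + (x % 2));
    return a;
}
```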
We start with the representation for full matrices, also called dense matrices, in which almost every value is nonzero. Later we take a look at alternative sparse matrix types. To represent a dense matrix, we split up the matrix into a set of column vectors and store each vector in the format described earlier, as illustrated in the figure.
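As a minimal CPU-side illustration of this split (names and row-major input convention are assumptions, not the chapter's code), each column becomes one vector that would then be packed into its own 2D RGBA texture as above:

```cpp
#include <cstddef>
#include <vector>

// Illustrative container only: a dense matrix held as its column vectors,
// each of which would be packed into one 2D RGBA texture as described above.
struct DenseMatrixAsColumns {
    std::size_t n;                            // matrix dimension
    std::vector<std::vector<float>> columns;  // columns[j][i] == A(i, j)
};

DenseMatrixAsColumns splitIntoColumns(const std::vector<float>& A, std::size_t n) {
    DenseMatrixAsColumns m{n, std::vector<std::vector<float>>(
                                  n, std::vector<float>(n))};
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            m.columns[j][i] = A[i * n + j];  // extract column j from row-major A
    return m;
}
```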
As we show later, matrix-vector operations can be performed very efficiently on this representation. In real-world applications, sparse matrices exhibiting a regular pattern of nonzero elements often arise from domain discretizations. Banded matrices occur if computations on grid points in a regular grid involve a fixed stencil of adjacent grid points; the nonzero elements are then arranged in a diagonal pattern, as in the example below.
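The chapter's referenced equation is not reproduced in this excerpt. As a stand-in, here is a generic banded matrix of the kind described, the tridiagonal matrix that a three-point finite difference stencil (a 1D Laplacian) produces on a regular grid; this is an illustrative example, not the chapter's own equation:

$$
A \;=\; \begin{pmatrix}
 2 & -1 &  0 &  0 \\
-1 &  2 & -1 &  0 \\
 0 & -1 &  2 & -1 \\
 0 &  0 & -1 &  2
\end{pmatrix}
$$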
Representing such matrices as full matrices wastes a lot of space and leads to unnecessary numerical computations, because zero elements cannot easily be discarded during computations on the GPU. Therefore, we need a special representation for this type of matrix. Due to the diagonal structure, this becomes straightforward: instead of storing the columns of the matrix, we store its diagonals in the proposed vector format.
Note that in this representation, some space for the off-middle diagonals is wasted and padded with zeroes. If the matrix has many nonzero diagonals and a significant amount of memory is wasted this way, we can combine two opposing diagonals into one vector, as the figure shows. By doing so, we can store even a full matrix in the diagonal format without wasting a single byte of texture space.
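A CPU-side sketch may make the wrap-around index arithmetic concrete. This is an assumption-laden illustration (row-major storage, hypothetical function name), not the chapter's code: element i of the vector for diagonal offset d holds A(i, (i + d) mod N), so an upper diagonal and its opposing lower diagonal share one vector with no padding.

```cpp
#include <cstddef>
#include <vector>

// Pack diagonal offset d of a row-major n x n matrix A into one vector,
// wrapping around so that diagonals d and d - n share the same storage.
std::vector<float> packWrappedDiagonal(const std::vector<float>& A,
                                       std::size_t n, std::size_t d) {
    std::vector<float> diag(n);
    for (std::size_t i = 0; i < n; ++i)
        diag[i] = A[i * n + (i + d) % n];  // wrap-around column index
    return diag;
}
```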
Now let's have a closer look at how to represent matrices in which nonzero elements are scattered randomly. In this case, a texture-based representation either leads to highly fragmented textures that waste a lot of space or else requires a complex indexing scheme that significantly reduces the performance of matrix access operations.
For this particular kind of matrix, we prefer a vertex-based representation: we generate one vertex for every set of up to four nonzero entries in a matrix row. We choose the position of this vertex so that it encodes the row index as a 2D coordinate: the vertex renders at exactly the position where the respective vector element would be rendered via the corresponding 2D texture.
Similarly, we encode columns as 2D indices in texture coordinates 1 through 4, and the matrix entries themselves are stored in the XYZW components of the first texture coordinate of each vertex. Then we store the entire set of vertices in a vertex buffer on the GPU. As long as the matrix is not modified, this internal representation does not have to be changed. The left part of the figure shows the encoding scheme; the matrix-vector product illustrated on its right side is explained in a later section. Now that we have representations for scalars, vectors, and matrices, let's describe some basic operations on them.
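The exact vertex layout is given in the chapter's figure; the following struct is a hypothetical C++ mirror of the scheme just described, with illustrative field names.

```cpp
// Hypothetical CPU-side mirror of the vertex encoding described above;
// field names and layout details are illustrative, not the chapter's code.
struct SparseMatrixVertex {
    float position[2];   // 2D position encoding the row index: the vertex
                         // renders onto the target texel of that row's element
    float values[4];     // first texture coordinate (XYZW): four matrix entries
    float columns[4][2]; // texture coordinates 1..4: 2D indices of the four
                         // columns, used to fetch the operand vector elements
};
```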
Again, we begin with the simplest case: a component-wise vector-vector operation. Using the proposed vector representation, a vector-vector operation reduces to rendering a dual-textured quad. To explain our approach, we describe the implementation of a vector-vector sum, as shown in the figure. First we set up the viewport to cover exactly as many pixels as there are elements in the 2D target vector, and we make the target vector's texture the current render target.
We then render a quad that covers the entire viewport. Vertices pass through the vertex stage to the rasterizer, which generates one fragment for every vector element.
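A host-side sketch of this setup follows. It is a hypothetical legacy-OpenGL fragment, assuming identity modelview and projection transforms and an already-bound render target; it is not the chapter's code.

```cpp
#include <GL/gl.h>

// Size the viewport to the packed target texture and draw a single quad
// covering it, so the rasterizer emits exactly one fragment per target texel.
void drawViewportQuad(int width, int height) {
    glViewport(0, 0, width, height);
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
    glEnd();
}
```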
For every fragment, a fragment program is executed. The program fetches the respective elements from both operand vectors, adds them, and writes the result into the render target. Note that the result is already in the appropriate format and is ready to be used in upcoming operations. In particular, no elements need to be rearranged, and all four RGBA-encoded components are processed simultaneously.
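A GLSL analogue of this fragment program is sketched below; the chapter's original shader is not reproduced in this excerpt, and the uniform and variable names here are illustrative.

```cpp
// GLSL fragment shader for the component-wise vector-vector sum, embedded
// as a C++ string literal. One texture fetch per operand suffices, because
// the four packed RGBA components are added in a single vector operation.
const char* kVectorAddShader = R"(
    #version 120
    uniform sampler2D vecA;  // first operand vector, RGBA-packed
    uniform sampler2D vecB;  // second operand vector, RGBA-packed
    void main() {
        vec4 a = texture2D(vecA, gl_TexCoord[0].st);
        vec4 b = texture2D(vecB, gl_TexCoord[0].st);
        gl_FragColor = a + b;  // written to the target vector's texture
    }
)";
```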
While the vector-vector sum operates component-wise on its input operands and outputs a vector of equal length, other operations reduce the contents of a vector to a single scalar value, for instance computing the sum of all vector elements. A vector represented by a texture of size n × n can be reduced to the sum of its elements in log n passes. Starting with the original 2D texture, we set up a viewport of half the size of this texture in both dimensions.
A full-screen quad is rendered into this viewport, and a fragment program fetches four adjacent values from the current source texture, adds them, and writes the result into the render target texture, which becomes the source texture of the next pass. We repeat this process log n times to produce the sum of all elements, which is written into the single-float texture described earlier.
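One reduction pass can be sketched as follows; this is an illustrative GLSL shader, not the chapter's code, and it assumes nearest-neighbor filtering on the source texture and 0..1 texture coordinates interpolated across the quad.

```cpp
// GLSL fragment shader for one reduction pass: each target texel sums the
// corresponding 2x2 block of the (twice as large) source texture.
const char* kReduceShader = R"(
    #version 120
    uniform sampler2D src;      // source texture of the current pass
    uniform vec2 srcTexelSize;  // 1.0 / source texture dimensions
    void main() {
        // Shift from the block's center to the center of its upper-left
        // source texel, then gather the remaining three neighbors.
        vec2 base = gl_TexCoord[0].st - 0.5 * srcTexelSize;
        gl_FragColor = texture2D(src, base)
                     + texture2D(src, base + vec2(srcTexelSize.x, 0.0))
                     + texture2D(src, base + vec2(0.0, srcTexelSize.y))
                     + texture2D(src, base + srcTexelSize);
    }
)";
```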
The figure gives an overview of this process. Now that we have found a way to handle component-wise operations and reduce operations, we have all we need to operate on vectors, and we can advance to the matrix operations. The way we compute a matrix-vector product depends on whether the matrix is stored in the vector stack format used for banded and full matrices or in the vertex format used for random sparse matrices. To compute a matrix-vector product, we split up the computation into a series of vector-vector products.
Again, we show the process by means of an example; the algorithms for the banded sparse matrix-vector product and the full matrix-vector product are the same. For this example, we use a matrix with two nonzero diagonals, as shown in the figure. As described in the previous section, the matrix A is encoded as two vectors stored in two 2D textures, and the upper diagonal is padded with zeroes.
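The following CPU-side reference (hypothetical names, not the chapter's code) spells out the arithmetic that the sequence of GPU passes performs: one shifted multiply-add pass per stored diagonal, where the zero padding in the diagonal textures makes out-of-range products vanish.

```cpp
#include <cstddef>
#include <vector>

// CPU reference of the banded matrix-vector product y = A * x performed on
// the GPU as a series of component-wise vector-vector products.
std::vector<float> bandedMatVec(
    const std::vector<std::vector<float>>& diags,  // one packed vector per diagonal
    const std::vector<int>& offsets,               // diagonal offsets (0 = main)
    const std::vector<float>& x)                   // operand vector
{
    const std::size_t n = x.size();
    std::vector<float> y(n, 0.0f);
    for (std::size_t d = 0; d < diags.size(); ++d) {
        for (std::size_t i = 0; i < n; ++i) {
            long j = static_cast<long>(i) + offsets[d];  // column index
            if (j >= 0 && j < static_cast<long>(n))
                y[i] += diags[d][i] * x[j];  // padding zeroes cover the rest
        }
    }
    return y;
}
```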