It’s a fairly practical algorithm and the result looks nice. I implemented the algorithm with OpenCL last year.
The purpose of the second part is to design and implement a new interpolation method, ABOS (Approximation Based On Smoothing), which would eliminate the limitations of existing methods and would be robust and flexible enough to interpolate any data set, such as a combined set of geological and seismic measurements, a temperature distribution, the height of a snow layer, the concentration of contaminants in an aquifer, or a digital terrain model.
So much of modern computer science, engineering and chip design is concerned with masking the inherent limitations that the memory hierarchy and von Neumann architecture impose on computation, Chua says, that once those limitations are removed, “every computer programmer will have to go back to school.”
The Billion-Dollar Race to Reinvent the Computer Chip – Scientific American.
In this case, a Structure-of-Arrays layout might be better than an Array-of-Structures layout on the GPU: it may work around the inefficient packing and obviate the need for padding.
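As a rough illustration of the Structure-of-Arrays idea, here is a minimal sketch in Python. It uses the standard `ctypes` module to mimic C/HLSL memory layout and assumes a hypothetical 20-byte element, `FooAoS`: a float4 `position` plus an assumed 4-byte `radius` field. The sketch splits an array of these elements into one tightly packed array per field. The names and the choice of the extra field are illustrative only and do not come from the source.

```python
import ctypes


# Hypothetical element type mirroring a tightly packed 20-byte HLSL struct:
# a float4 position plus one extra 4-byte field ("radius" is an assumed name).
class FooAoS(ctypes.Structure):
    _fields_ = [
        ("position", ctypes.c_float * 4),
        ("radius", ctypes.c_float),
    ]


def aos_to_soa(items):
    """Split an Array-of-Structures into one tightly packed array per field."""
    n = len(items)
    positions = (ctypes.c_float * 4 * n)()  # n float4 values, 16-byte stride
    radii = (ctypes.c_float * n)()          # n floats, 4-byte stride
    for i, item in enumerate(items):
        positions[i][:] = item.position[:]  # copy the four components
        radii[i] = item.radius
    return positions, radii


lights = (FooAoS * 3)(
    FooAoS((0.0, 1.0, 2.0, 1.0), 5.0),
    FooAoS((3.0, 4.0, 5.0, 1.0), 2.5),
    FooAoS((6.0, 7.0, 8.0, 1.0), 1.0),
)
positions, radii = aos_to_soa(lights)

print(ctypes.sizeof(FooAoS))        # 20: AoS element stride
print(ctypes.sizeof(positions[0]))  # 16: SoA position stride
print(ctypes.sizeof(ctypes.c_float))  # 4: SoA radius stride
```

In the split layout each float4 occupies exactly 16 bytes, so consecutive positions stay 16-byte aligned relative to the array base without any padding. The scalar field is packed separately in its own array.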
Structured Buffers are by definition tightly packed. This means that the following code generates a buffer with a stride of 20 bytes:
StructuredBuffer <Foo> FooBuf;
That may not seem terrible, but it does have some performance implications for your code that may not be immediately obvious. The fact that the structure is not naturally aligned to a 128-bit stride means that the Position element often spans cache lines, and that it can be more expensive to read from the structure. While one inefficient read from a structure is unlikely to damage your performance terribly, it can quickly explode. Something like a shader iterating over a list of complex lights, with more than 100 bytes of data per light, can be a serious pothole. In fact, we recently found prerelease code where whole-frame performance was penalized by over 5% by just such a difference.
Understanding Structured Buffer Performance.
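The following back-of-the-envelope sketch shows why a 20-byte stride hurts. It counts how many elements have their 16-byte Position field straddling a cache-line boundary, comparing the tightly packed stride with a stride padded up to a multiple of 128 bits. It rests on three assumptions:

- A 128-byte cache line. The real size varies by GPU.
- Position at offset 0 in each element.
- A buffer that starts on a line boundary.

It also ignores how the hardware actually coalesces and caches loads. The helper names are hypothetical.

```python
FLOAT4_BYTES = 16  # size of an HLSL float4 (four 32-bit floats)
CACHE_LINE = 128   # assumed cache-line size in bytes; the real value varies by GPU


def padded_stride(size, alignment=FLOAT4_BYTES):
    """Round a structure size up to the next multiple of 16 bytes (128 bits)."""
    return (size + alignment - 1) // alignment * alignment


def straddling_elements(stride, count, field_size=FLOAT4_BYTES, line=CACHE_LINE):
    """Count elements whose leading field crosses a cache-line boundary.

    Assumes the field sits at offset 0 in each element and that element 0
    starts on a cache-line boundary.
    """
    crossings = 0
    for i in range(count):
        first_byte = i * stride
        last_byte = first_byte + field_size - 1
        if first_byte // line != last_byte // line:
            crossings += 1
    return crossings


packed = 20                     # tightly packed float4 + one 4-byte field
padded = padded_stride(packed)  # 32
# lcm(20, 128) = 640 bytes, so the packed pattern repeats every 32 elements.
print(straddling_elements(packed, 32))  # 3
print(straddling_elements(padded, 32))  # 0
```

Under these assumptions, 3 out of every 32 Position reads touch two cache lines with the packed stride. Two layouts remove the straddles:

- Padding the stride to 32 bytes, at a cost of 12 bytes of padding per element.
- Storing positions in a separate array with a 16-byte stride.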