Understanding Structured Buffer Performance

In this case, a structure-of-arrays (SoA) layout might serve the GPU better than an array-of-structures (AoS) layout: it can work around the inefficient packing and obviate the need for padding.

Structured Buffers are by definition tightly packed. This means that the following code generates a buffer with a stride of 20 bytes:

struct Foo
{
    float4 Position;  // 16 bytes
    float  Radius;    //  4 bytes
};
StructuredBuffer<Foo> FooBuf;  // stride: 16 + 4 = 20 bytes

That may not seem terrible, but it has performance implications that may not be immediately obvious. Because the structure is not naturally aligned to a 128-bit (16-byte) stride, the Position element often spans cache lines, which can make it more expensive to read from the structure. While one inefficient read from a structure is unlikely to damage your performance terribly, the cost can quickly explode. Something like a shader iterating over a list of complex lights, with more than 100 bytes of data per light, can be a serious pothole. In fact, we recently found prerelease code where whole-frame performance was penalized by over 5% by just such a difference.
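As a rough sketch of the two ways around the 20-byte stride, the declarations below show a padded structure and a structure-of-arrays split. The names FooPadded, FooPaddedBuf, FooPositions, FooRadii and LoadFoo are hypothetical and do not come from the original code; whether either layout is actually faster depends on the hardware and the access pattern, so treat this as something to measure rather than a guaranteed fix.

```hlsl
// Option A (assumed names): pad the structure so that its stride is a
// multiple of 16 bytes. Every Position then starts on a 16-byte boundary.
struct FooPadded
{
    float4 Position;  // 16 bytes
    float  Radius;    //  4 bytes
    float3 Padding;   // 12 bytes, unused; brings the stride to 32 bytes
};
StructuredBuffer<FooPadded> FooPaddedBuf;

// Option B (assumed names): structure-of-arrays. Each member lives in its
// own tightly packed buffer, so no padding is needed.
StructuredBuffer<float4> FooPositions;  // stride 16 bytes
StructuredBuffer<float>  FooRadii;      // stride  4 bytes

// Hypothetical helper: reassemble one logical Foo from the split buffers.
void LoadFoo(uint index, out float4 position, out float radius)
{
    position = FooPositions[index];
    radius   = FooRadii[index];
}
```

Option A spends 12 extra bytes per element to keep reads aligned, while Option B keeps memory tight but requires the application to bind and fill two buffers instead of one.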

