
Draw Calls

Shaders and Batches

In this chapter

  • Write an HLSL shader.
  • Support the SRP batcher, GPU instancing, and dynamic batching.
  • Configure material properties per object and draw many at random.
  • Create transparent and cutout materials.

This is the second part of a tutorial series about creating a custom scriptable render pipeline. It covers the writing of shaders and drawing multiple objects efficiently.

This tutorial is based on Unity 2019.2.

Many spheres, but only a few draw calls.

1. Shaders

To draw something the CPU has to tell the GPU what to draw and how. What is drawn is usually a mesh. How it is drawn is defined by a shader, which is a set of instructions for the GPU. Besides the mesh, the shader needs additional information to do its work, including the object's transformation matrices and material properties.

Unity's LW/Universal and HD RPs allow you to design shaders with the Shader Graph package, which generates shader code for you. But our custom RP doesn't support that, so we have to write the shader code ourselves. This gives us full control over and understanding of what a shader does.

1.1 Unlit Shader

Our first shader will simply draw a mesh with a solid color, without any lighting. A shader asset can be created via one of the options in the Assets / Create / Shader menu. The Unlit Shader is most appropriate, but we're going to start fresh by deleting all the default code from the created shader file. Name the asset Unlit and put it in a new Shaders folder under Custom RP.

Unlit Shader asset

Shader code looks like C# code for the most part, but it consists of a mix of different approaches, including some archaic old bits that made sense in the past but no more.

The shader is defined like a class, but with just the Shader keyword followed by a string that is used to create an entry for it in the Shader dropdown menu of materials. Let's use Custom RP/Unlit. It's followed by a code block, which contains more blocks with keywords in front of them. There's a Properties block to define material properties, followed by a SubShader block that needs to have a Pass block inside it, which defines one way to render something. Create that structure with otherwise empty blocks.

Shader "Custom RP/Unlit" {
	
	Properties {}
	
	SubShader {
		
		Pass {}
	}
}

This defines the minimal structure of a shader. We can now create a material that uses it.

unlit material

The default shader implementation renders the mesh solid white. The material shows a default property for the render queue, which it takes from the shader automatically and is set to 2000, which is the default for opaque geometry. It also has a toggle to enable double-sided global illumination, but that's not relevant for us.

1.2 HLSL Programs

The language that we use to write shader code is the High-Level Shading Language, HLSL for short. We have to put it in the Pass block, in between HLSLPROGRAM and ENDHLSL keywords. We have to do that because it's possible to put other non-HLSL code inside the Pass block as well.

CG programs

Unity also supports CG programs, but this tutorial uses HLSL exclusively, as is officially recommended for render pipelines.

To draw a mesh the GPU has to rasterize all its triangles, converting it to pixel data. It does this by transforming the vertex coordinates from 3D space to 2D visualization space and then filling all pixels that are covered by the resulting triangle. These two steps are controlled by separate shader programs, both of which we have to define. The first is known as the vertex kernel/program/shader and the second as the fragment kernel/program/shader. A fragment corresponds to a display pixel or texture texel, although it might not represent the final result as it could be overwritten when something gets drawn on top of it later.

We have to identify both programs with a name, which is done via pragma directives. These are single-line statements beginning with #pragma and are followed by either vertex or fragment plus the relevant name. We'll use UnlitPassVertex and UnlitPassFragment.

 HLSLPROGRAM
 #pragma vertex UnlitPassVertex
 #pragma fragment UnlitPassFragment
 ENDHLSL

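pragma

The word pragma comes from Greek and refers to an action, or something that needs to be done. Many programming languages use it to issue special compiler directives.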

The shader compiler will now complain that it cannot find the declared shader kernels. We have to write HLSL functions with the same names to define their implementation. We could do this directly below the pragma directives, but we'll put all HLSL code in a separate file instead. Specifically, we'll use an UnlitPass.hlsl file in the same asset folder. We can instruct the shader compiler to insert the contents of that file by adding an #include directive with the relative path to the file.

HLSLPROGRAM
#pragma vertex UnlitPassVertex
#pragma fragment UnlitPassFragment
#include "UnlitPass.hlsl"
ENDHLSL

UnlitPass HLSL asset file.

1.3 Include Guard

HLSL files are used to group code just like C# classes, although HLSL doesn't have the concept of a class. There is only a single global scope, besides the local scopes of code blocks. So everything is accessible everywhere. Including a file is also not the same as using a namespace. It inserts the entire contents of the file at the point of the include directive, so if you include the same file more than once you'll get duplicate code, which will most likely lead to compiler errors. To prevent that we'll add an include guard to UnlitPass.hlsl.

It is possible to use the #define directive to define any identifier, which is usually done in uppercase. We'll use this to define CUSTOM_UNLIT_PASS_INCLUDED at the top of the file.

#define CUSTOM_UNLIT_PASS_INCLUDED

This is an example of a simple macro that just defines an identifier. If it exists then it means that our file has been included. So we don't want to include its contents again. Phrased differently, we only want to insert the code when it hasn't been defined yet. We can check that with the #ifndef directive. Do this before defining the macro.

#ifndef CUSTOM_UNLIT_PASS_INCLUDED 
#define CUSTOM_UNLIT_PASS_INCLUDED
#endif

Now we can be sure that all relevant code of the file will never be inserted multiple times, even if we end up including it more than once.

1.4 Shader Functions

We define our shader functions inside the scope of the include guard. They're written just like C# methods without any access modifiers. Begin with simple void functions that do nothing.

#ifndef CUSTOM_UNLIT_PASS_INCLUDED
#define CUSTOM_UNLIT_PASS_INCLUDED

void UnlitPassVertex () {}

void UnlitPassFragment () {}

#endif

This produces the following result.

Cyan sphere.

To produce valid output we have to make our fragment function return a color. The color is defined with a four-component float4 vector containing its red, green, blue, and alpha components. We can define solid black via float4(0.0, 0.0, 0.0, 0.0) but we can also write a single zero, as single values get automatically expanded to a full vector. The alpha value doesn't matter because we're creating an opaque shader, so zero is fine.

float4 UnlitPassFragment () {
	return 0.0;
}

float or half

Most mobile GPUs support both precision types, half being more efficient. So if you're optimizing for mobiles it makes sense to use half as much as possible. The rule of thumb is to use float for positions and texture coordinates only and half for everything else, provided that the results are good enough.

When not targeting mobile platforms, precision isn't an issue because the GPU always uses float, even if we write half. I'll consistently use float in this tutorial series.

There's also the fixed type, but it's only really supported by old hardware that you wouldn't target for modern apps. It's usually equivalent to half.
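
As a rough illustration of that rule of thumb, a mobile-oriented function might keep texture coordinates in float while doing its color math in half. This is only a sketch: the function and its parameters are hypothetical and are not part of this tutorial's shader, which uses float throughout.

```hlsl
// Hypothetical helper, not part of the tutorial's shader.
// Texture coordinates stay float; color math uses half.
half4 ApplyTint (float2 uv, half4 baseColor, half4 tint) {
	// The UV is only used to build a simple vertical fade factor here.
	half fade = (half)saturate(uv.y);
	// Blend between no tint (1.0) and the full tint based on the fade.
	return baseColor * lerp((half4)1.0, tint, fade);
}
```

On platforms that always use float the half types simply behave like their float counterparts, so code like this remains valid everywhere.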

At this point the shader compiler will fail because our function is missing semantics. We have to indicate what we mean with the value that we return, because we could potentially produce lots of data with different meanings. In this case we provide the default system value for the render target, indicated by writing a colon followed by SV_TARGET after the parameter list of UnlitPassFragment.

float4 UnlitPassFragment () : SV_TARGET { return 0.0; }

UnlitPassVertex is responsible for transforming vertex positions, so should return a position. That's also a float4 vector because it must be defined as a homogeneous clip space position, but we'll get to that later. Again we begin with the zero vector and in this case we have to indicate that its meaning is SV_POSITION.

float4 UnlitPassVertex () : SV_POSITION { return 0.0; }

1.5 Space Transformation

When all vertices are set to zero the mesh collapses to a point and nothing gets rendered. The main job of the vertex function is to convert the original vertex position to the correct space. When invoked, the function is provided with the available vertex data, if we ask for it. We do that by adding parameters to UnlitPassVertex. We need the vertex position, which is defined in object space, so we'll name it positionOS, using the same convention as Unity's new RPs. The position's type is float3, because it's a 3D point. Let's initially return it, adding 1 as the fourth required component via float4(positionOS, 1.0).
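
A minimal sketch of this intermediate step, assuming the same function name and semantics that are used in the rest of this section:

```hlsl
// First attempt: pass the object-space position straight through.
// The w component is set to 1 because the position is a point.
float4 UnlitPassVertex (float3 positionOS : POSITION) : SV_POSITION {
	return float4(positionOS, 1.0);
}
```

The POSITION semantic is what requests the vertex position from the mesh data.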

Object-space position.

The mesh shows up again, but incorrectly, because the position that we output is in the wrong space. Space conversion requires matrices, which are sent to the GPU when something gets drawn. We have to add these matrices to our shader, but because they're always the same we'll put the standard input provided by Unity in a separate HLSL file, both to keep code structured and to be able to include the code in other shaders. Add a UnityInput.hlsl file and put it in a ShaderLibrary folder directly under Custom RP, to mirror the folder structure of Unity's RPs.

ShaderLibrary folder.

Begin the file with a CUSTOM_UNITY_INPUT_INCLUDED include guard and then define a float4x4 matrix named unity_ObjectToWorld in the global scope. In a C# class this would define a field, but here it's known as a uniform value. It's set by the GPU once per draw, remaining constant—uniform—for all invocations of the vertex and fragment functions during that draw.

#ifndef CUSTOM_UNITY_INPUT_INCLUDED
#define CUSTOM_UNITY_INPUT_INCLUDED

float4x4 unity_ObjectToWorld;

#endif

We can use the matrix to convert from object space to world space. As this is common functionality let's create a function for it and put it in yet another file, this time Common.hlsl in the same ShaderLibrary folder. We include UnityInput there and then declare a TransformObjectToWorld function with a float3 as both input and output.

#ifndef CUSTOM_COMMON_INCLUDED
#define CUSTOM_COMMON_INCLUDED

#include "UnityInput.hlsl"

float3 TransformObjectToWorld (float3 positionOS) {
	return 0.0;
}

#endif

The space conversion is done by invoking the mul function with a matrix and a vector. In this case we do need a 4D vector, but as its fourth component is always 1 we can add it ourselves by using float4(positionOS, 1.0). The result is again a 4D vector with always 1 as its fourth component. We can extract the first three components from it by accessing the xyz property of the vector, which is known as a swizzle operation.
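
Spelled out step by step, the conversion described above could look like the sketch below. It assumes that unity_ObjectToWorld is declared as in UnityInput.hlsl; the intermediate variable is only there to make the swizzle explicit.

```hlsl
// Assumes unity_ObjectToWorld is declared as in UnityInput.hlsl.
float3 TransformObjectToWorld (float3 positionOS) {
	// Promote to a 4D point (w = 1) and transform it with the matrix.
	float4 positionWS = mul(unity_ObjectToWorld, float4(positionOS, 1.0));
	// Swizzle: keep only the first three components.
	return positionWS.xyz;
}
```

The swizzle can also be applied directly to the result of mul, which avoids the temporary variable.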

We can now convert to world space in UnlitPassVertex. First include Common.hlsl directly above the function. As it exists in a different folder we can reach it via the relative path ../ShaderLibrary/Common.hlsl. Then use TransformObjectToWorld to calculate a positionWS variable and return it instead of the object-space position.

#include "../ShaderLibrary/Common.hlsl"

float4 UnlitPassVertex (float3 positionOS : POSITION) : SV_POSITION
{
	float3 positionWS = TransformObjectToWorld(positionOS.xyz);
	return float4(positionWS, 1.0);
}

The result is still wrong because we need a position in homogeneous clip space. This space defines a cube containing everything that is in view of the camera, distorted into a trapezoid in case of a perspective camera. Transforming from world space to this space can be done by multiplying with the view-projection matrix, which accounts for the camera's position, orientation, projection, field-of-view, and near-far clipping planes. It's made available as the unity_MatrixVP matrix, so add it to UnityInput.hlsl.

float4x4 unity_ObjectToWorld;
float4x4 unity_MatrixVP;

Add a TransformWorldToHClip to Common.hlsl which works the same as TransformObjectToWorld, except its input is in world space, uses the other matrix, and produces a float4.

#include "UnityInput.hlsl"

float3 TransformObjectToWorld (float3 positionOS) {
	return mul(unity_ObjectToWorld, float4(positionOS, 1.0)).xyz;
}

float4 TransformWorldToHClip (float3 positionWS) {
	return mul(unity_MatrixVP, float4(positionWS, 1.0));
}

Finish UnlitPassVertex by using the new function to return the position in the correct space.

float4 UnlitPassVertex (float3 positionOS : POSITION) : SV_POSITION
{
	float3 positionWS = TransformObjectToWorld(positionOS.xyz);
	return TransformWorldToHClip(positionWS);
}

Correctly rendered result.

Coordinate space conversion:

object space -> world space -> homogeneous clip space

homogeneous clip space

1.6 Core Library

The two functions that we just defined are so common that they're also included in the Core RP Pipeline package. The core library defines many more useful and essential things, so let's install that package, remove our own definitions and instead include the relevant file, in this case Packages/com.unity.render-pipelines.core/ShaderLibrary/SpaceTransforms.hlsl.

Note: when using Unity 2020, the Core RP package has to be installed manually.

//float3 TransformObjectToWorld (float3 positionOS) {
// return mul(unity_ObjectToWorld, float4(positionOS, 1.0)).xyz; 
//} 
//float4 TransformWorldToHClip (float3 positionWS) { 
// return mul(unity_MatrixVP, float4(positionWS, 1.0)); 
//} 
#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/SpaceTransforms.hlsl"

That fails to compile, because the code in SpaceTransforms.hlsl doesn't assume the existence of unity_ObjectToWorld. Instead it expects that the relevant matrix is defined as UNITY_MATRIX_M by a macro, so let's do that before including the file by writing #define UNITY_MATRIX_M unity_ObjectToWorld on a separate line. After that all occurrences of UNITY_MATRIX_M will get replaced by unity_ObjectToWorld. There's a reason for this that we'll discover later.

#define UNITY_MATRIX_M unity_ObjectToWorld 

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/SpaceTransforms.hlsl"

This is also true for the inverse matrix, unity_WorldToObject, which should be defined via UNITY_MATRIX_I_M, the unity_MatrixV matrix via UNITY_MATRIX_V, and unity_MatrixVP via UNITY_MATRIX_VP. Finally, there's also the projection matrix defined via UNITY_MATRIX_P, which is made available as glstate_matrix_projection. We don't need these extra matrices but the code won't compile if we don't include them.

#define UNITY_MATRIX_M unity_ObjectToWorld 
#define UNITY_MATRIX_I_M unity_WorldToObject 
#define UNITY_MATRIX_V unity_MatrixV 
#define UNITY_MATRIX_VP unity_MatrixVP 
#define UNITY_MATRIX_P glstate_matrix_projection

Also add the corresponding matrices to UnityInput.hlsl.

float4x4 unity_ObjectToWorld;
float4x4 unity_WorldToObject;

float4x4 unity_MatrixVP;
float4x4 unity_MatrixV;
float4x4 glstate_matrix_projection;

The last thing missing is something other than a matrix. It's unity_WorldTransformParams, which contains some transform information that we again don't need here. It is a vector defined as real4, which isn't a valid type itself but instead an alias to either float4 or half4 depending on the target platform.

float4x4 unity_ObjectToWorld;
 float4x4 unity_WorldToObject;
 real4 unity_WorldTransformParams;
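
To give an idea of what such an alias amounts to, here is a simplified sketch. It assumes a REAL_IS_HALF switch that is set per target platform; treat the exact macro name as an assumption about the Core RP Library rather than a quote from it, and don't add this to the project, because the library provides its own definitions.

```hlsl
// Simplified illustration only; the real definitions cover many more types.
// REAL_IS_HALF is assumed to be set to 0 or 1 for the target platform.
#if REAL_IS_HALF
	#define real4 half4
#else
	#define real4 float4
#endif

// With the alias in place this declaration compiles on every platform.
real4 unity_WorldTransformParams;
```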

That alias and a lot of other basic macros are defined per graphics API and we can get all that by including Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl. Do so in our Common.hlsl file before including UnityInput.hlsl. You can inspect those files in the imported package if you're curious about their contents.

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl" 
#include "UnityInput.hlsl"

1.7 Color

The color of the rendered object can be changed by adjusting UnlitPassFragment. For example, we can make it yellow by returning float4(1.0, 1.0, 0.0, 1.0) instead of zero.

float4 UnlitPassFragment () : SV_TARGET { return float4(1.0, 1.0, 0.0, 1.0); }

Yellow sphere.

To make it possible to configure the color per material we have to define it as a uniform value instead. Do this below the include directive, before the UnlitPassVertex function. We need a float4 and we'll name it _BaseColor. The leading underscore is the standard way to indicate that it represents a material property. Return this value instead of a hard-coded color in UnlitPassFragment.

Then add the property to the Properties block of the Unlit shader file.
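
A minimal sketch of the HLSL side of this change is shown below. The vertex function stays as it is and is omitted. The Properties entry appears only as a comment because it belongs in the .shader file, and its display name and white default value are assumptions made for this example.

```hlsl
#include "../ShaderLibrary/Common.hlsl"

// Uniform value that Unity fills from the material's _BaseColor property.
// For it to show up in the inspector, the shader's Properties block needs
// a matching entry, for example (assumed display name and default):
//   _BaseColor("Color", Color) = (1.0, 1.0, 1.0, 1.0)
float4 _BaseColor;

float4 UnlitPassFragment () : SV_TARGET {
	// Return the configured color instead of a hard-coded one.
	return _BaseColor;
}
```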

Unlit material with red color.

The color can now be adjusted via the material property.

2 Batching

Every draw call requires communication between the CPU and GPU. If a lot of data has to be sent to the GPU then it might end up wasting time by waiting. And while the CPU is busy sending data it can't do other things. Both issues can lower the frame rate. At the moment our approach is straightforward: each object gets its own draw call. This is the worst way to do it, although we end up sending very little data so right now it's fine.

As an example, I made a scene with 76 spheres that each use one of four materials: red, green, yellow, and blue. It requires 78 draw calls to render, 76 for the spheres, one for the skybox, and one to clear the render target.

76 spheres, 78 draw calls.

If you open the Stats panel of the Game window then you can see an overview of what it takes to render the frame. The interesting fact here is that it shows 77 batches—ignoring the clear—of which zero are saved by batching.

2.1 SRP Batcher

Batching is the process of combining draw calls, reducing the time spent communicating between CPU and GPU. The simplest way to do this is to enable the SRP batcher. However, this only works for compatible shaders, which our Unlit shader isn't. You can verify this by selecting it in the inspector. There is an SRP Batcher line that indicates incompatibility, under which it gives one reason for this.

Not compatible

Rather than reducing the number of draw calls the SRP batcher makes them leaner. It caches material properties on the GPU so they don't have to be sent every draw call. This reduces both the amount of data that has to be communicated and the work that the CPU has to do per draw call. But this only works if the shader adheres to a strict structure for uniform data.

All material properties have to be defined inside a concrete memory buffer instead of at the global level. This is done by wrapping the _BaseColor declaration in a cbuffer block with the UnityPerMaterial name. This works like a struct declaration, but has to be terminated with a semicolon. It segregates _BaseColor by putting it in a specific constant memory buffer, although it remains accessible at the global level.

cbuffer UnityPerMaterial {
	float4 _BaseColor;
};

Constant buffers aren't supported on all platforms—like OpenGL ES 2.0—so instead of using cbuffer directly we can use the CBUFFER_START and CBUFFER_END macros that we included from the Core RP Library. The first takes the buffer name as an argument, as if it were a function. In this case we end up with the exact same result as before, except that the cbuffer code will not exist for platforms that don't support it.

CBUFFER_START(UnityPerMaterial)
     float4 _BaseColor;
 CBUFFER_END
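
For intuition about what these macros do, here is a simplified sketch of how they could be defined. SUPPORTS_CONSTANT_BUFFERS is a hypothetical switch invented for this example; the Core RP Library selects its definitions per graphics API in its own way, so this is not the library's actual code and shouldn't be added to the project.

```hlsl
// Simplified illustration of how the macros could be defined.
// SUPPORTS_CONSTANT_BUFFERS is a hypothetical switch for this sketch.
#if defined(SUPPORTS_CONSTANT_BUFFERS)
	#define CBUFFER_START(name) cbuffer name {
	#define CBUFFER_END };
#else
	// Without constant buffer support the values become plain globals.
	#define CBUFFER_START(name)
	#define CBUFFER_END
#endif

CBUFFER_START(UnityPerMaterial)
	float4 _BaseColor;
CBUFFER_END
```

Either way _BaseColor is declared exactly once and remains accessible at the global level, which is why the rest of the shader code doesn't have to change.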

We also have to do this for unity_ObjectToWorld, unity_WorldToObject, and unity_WorldTransformParams, except they have to be grouped in a UnityPerDraw buffer.

In this case we're required to define specific groups of values if we use one of them. For the transformation group we also need to include float4 unity_LODFade, even though we don't use it. The exact order doesn't matter, but Unity puts it directly after unity_WorldToObject so let's do that as well.

CBUFFER_START(UnityPerDraw)
	float4x4 unity_ObjectToWorld;
	float4x4 unity_WorldToObject;
	float4 unity_LODFade;
	real4 unity_WorldTransformParams;
CBUFFER_END

The shader is now reported as compatible with the SRP batcher.

With our shader compatible, the next step is to enable the SRP batcher, which is done by setting GraphicsSettings.useScriptableRenderPipelineBatching to true. We only have to do this once, so let's do it when our pipeline instance gets created, by adding a constructor method to CustomRenderPipeline.

2.2 Many Colors

We get one batch even though we use four materials. That works because all their data gets cached on the GPU and each draw call only has to contain an offset to the correct memory location. The only restriction is that the memory layout must be the same per material, which is the case because we use the same shader for all of them, only containing a single color property each. Unity doesn't compare the exact memory layout of materials, it simply only batches draw calls that use the exact same shader variant.

This works fine if we want a few different colors, but if we wanted to give each sphere its own color then we'd have to create many more materials. It would be more convenient if we could set the color per object instead. This isn't possible by default but we can support it by creating a custom component type. Name it PerObjectMaterialProperties. As it's an example I put it in an Examples folder under Custom RP.

The idea is that a game object could have one PerObjectMaterialProperties component attached to it, which has a Base Color configuration option, which will be used to set the _BaseColor material property for it. It needs to know the identifier of the shader property, which we can retrieve via Shader.PropertyToID and store in a static variable, like we do for the shader pass identifier in CameraRenderer, though in this case it's an integer.

using UnityEngine;

[DisallowMultipleComponent]
public class PerObjectMaterialProperties : MonoBehaviour
{
    static int baseColorId = Shader.PropertyToID("_BaseColor");

    [SerializeField]
    Color baseColor = Color.white;
}

PerObjectMaterialProperties component.

Declare a static MaterialPropertyBlock field named block.

 static MaterialPropertyBlock block;

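More background on material property blocks: https://www.youtube.com/watch?v=Ci6AuQq_uI4 (the concept is somewhat similar to a material parameter collection in UE4). In Unity it is used together with GPU instancing to reduce draw calls. Its main purpose is to set material parameters from scripts or UI, independently of the material itself.

In the script, first check whether the block exists and create a new one if it doesn't. Then set the color on the block and apply it via Renderer.SetPropertyBlock.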

private void OnValidate()
{
    if(block == null)
    {
        block = new MaterialPropertyBlock();
    }

    block.SetColor(baseColorId, baseColor);

    GetComponent<Renderer>().SetPropertyBlock(block);
}

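Note: OnValidate is invoked in the Unity editor whenever the component is loaded or changed. So the per-object color is applied immediately each time the scene is loaded and each time we edit the component.

Next, give each sphere its own color.

Many colors.

However, the SRP batcher doesn't support per-object material properties, so the 24 spheres each end up with their own draw call.

Also, OnValidate isn't invoked in builds, so we have to apply the colors in Awake as well, by invoking OnValidate there.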

private void Awake()
{
    OnValidate();
}

2.3 GPU Instancing

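GPU instancing means that a single draw call can draw multiple objects that use the same mesh. The CPU collects the transformation and material properties of each object and puts them in arrays that are sent to the GPU. The GPU then iterates through all entries and renders them in order.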
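
Conceptually, the per-instance data ends up in arrays that each instance indexes with its own ID. The sketch below only illustrates that idea: the buffer name, array names, and MAX_INSTANCE_COUNT are hypothetical, it assumes unity_MatrixVP is declared as in UnityInput.hlsl, and it is not how Unity's instancing macros are actually implemented.

```hlsl
// Conceptual sketch only; the buffer name, array names, and
// MAX_INSTANCE_COUNT are hypothetical.
#define MAX_INSTANCE_COUNT 128

cbuffer PerInstanceData {
	float4x4 objectToWorldArray[MAX_INSTANCE_COUNT];
	// Material data would be indexed the same way in the fragment function.
	float4 baseColorArray[MAX_INSTANCE_COUNT];
};

// Each instance looks up its own matrix with its index.
float4 InstancedVertex (
	float3 positionOS : POSITION, uint instanceID : SV_InstanceID
) : SV_POSITION {
	float4x4 objectToWorld = objectToWorldArray[instanceID];
	float3 positionWS = mul(objectToWorld, float4(positionOS, 1.0)).xyz;
	return mul(unity_MatrixVP, float4(positionWS, 1.0));
}
```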

GPU instancing requires data to be provided via arrays, which our shader currently doesn't support. The first step is to add the #pragma multi_compile_instancing directive.

        #pragma multi_compile_instancing
        #pragma vertex UnlitPassVertex
        #pragma fragment UnlitPassFragment

The material inspector now shows a GPU instancing option.

To use GPU instancing we need to include the UnityInstancing.hlsl file. The include has to come after defining UNITY_MATRIX_M and before including SpaceTransforms.hlsl.

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/UnityInstancing.hlsl"
#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/SpaceTransforms.hlsl"

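What UnityInstancing.hlsl does is redefine macros so that they access the instanced data arrays. But to render correctly it first needs to know the index of the object in those arrays. The index is provided via the vertex data. UnityInstancing.hlsl defines macros that make this easy, but they require the vertex function to take a struct parameter.

A struct can be declared much like a cbuffer and used as the input parameter of a function. Semantics can be defined inside the struct, which is more convenient than a long parameter list. So wrap the positionOS parameter of UnlitPassVertex in an Attributes struct.

struct Attributes {
	float3 positionOS : POSITION;
	UNITY_VERTEX_INPUT_INSTANCE_ID
};

float4 UnlitPassVertex (Attributes input) : SV_POSITION
{
	float3 positionWS = TransformObjectToWorld(input.positionOS);
	return TransformWorldToHClip(positionWS);
}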

When GPU instancing is used, the object index is available as a vertex attribute. We add it by putting UNITY_VERTEX_INPUT_INSTANCE_ID inside Attributes.

Next, add UNITY_SETUP_INSTANCE_ID(input); at the start of UnlitPassVertex. This extracts the index from the input and stores it in a global static variable that the other instancing macros rely on.

float4 UnlitPassVertex (Attributes input) : SV_POSITION
{
	UNITY_SETUP_INSTANCE_ID(input);
	float3 positionWS = TransformObjectToWorld(input.positionOS);
	return TransformWorldToHClip(positionWS);
}

GPU instancing now works, but the SRP batcher takes precedence, so we don't see a different result yet. We also don't support per-instance material data yet.

To add that we have to replace _BaseColor with an array reference when needed. This is done by replacing CBUFFER_START with UNITY_INSTANCING_BUFFER_START and CBUFFER_END with UNITY_INSTANCING_BUFFER_END, and by defining the property via UNITY_DEFINE_INSTANCED_PROP.

//CBUFFER_START(UnityPerMaterial)
//	float4 _BaseColor;
//CBUFFER_END

UNITY_INSTANCING_BUFFER_START(UnityPerMaterial)
	UNITY_DEFINE_INSTANCED_PROP(float4, _BaseColor)
UNITY_INSTANCING_BUFFER_END(UnityPerMaterial)

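When instancing is in use, the instance index must also be available in UnlitPassFragment. To keep this simple we'll use a struct so that UnlitPassVertex outputs both the position and the index, using UNITY_TRANSFER_INSTANCE_ID(input, output) to copy the index. We name this struct Varyings, following Unity's convention.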
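
The text stops before showing this step, so the following is a sketch of how it could look rather than the author's final code. It assumes the Attributes struct and the UnityPerMaterial instancing buffer defined above, and it uses the UNITY_ACCESS_INSTANCED_PROP macro from UnityInstancing.hlsl to read the per-instance color. The positionCS field name is an assumption.

```hlsl
// Data passed from the vertex to the fragment function.
struct Varyings {
	float4 positionCS : SV_POSITION;
	UNITY_VERTEX_INPUT_INSTANCE_ID
};

Varyings UnlitPassVertex (Attributes input) {
	Varyings output;
	UNITY_SETUP_INSTANCE_ID(input);
	// Copy the instance index so the fragment function can use it.
	UNITY_TRANSFER_INSTANCE_ID(input, output);
	float3 positionWS = TransformObjectToWorld(input.positionOS);
	output.positionCS = TransformWorldToHClip(positionWS);
	return output;
}

float4 UnlitPassFragment (Varyings input) : SV_TARGET {
	UNITY_SETUP_INSTANCE_ID(input);
	// Read this instance's color from the instancing buffer.
	return UNITY_ACCESS_INSTANCED_PROP(UnityPerMaterial, _BaseColor);
}
```

Because the vertex function now returns a struct, the SV_POSITION semantic moves from the function signature to the positionCS field.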
