Behind the Pretty Frames: Detroit Become Human
- Introduction
- Configs
- Behind the Frame
- Vulkan
- Compute
- Blue Noise
- Detroit Vertex
- Frame
- Prepare Resources
- Video Frame Decoding [Compute][Not Always]
- Copy Skinning positions [Compute][Not Always]
- SH(Spherical Harmonics)/Indirect Lighting Blending [Compute][Not Always]
- Light Clustering [Compute]
- Procedural Textures
- Clear Previous ZPrepass/Depth
- Clear Previous Motion Vectors
- Z-Prepass/Depth Part 1 (Opaque/Static Geometry)
- Motion Vectors Part 1 (Opaque/Skinned Geometry)
- Z-Prepass/Depth Part 2 (Opaque/Skinned Geometry)
- Hair Accumulation
- Z-Prepass/Depth Part 3 (Alpha-Tested Geometry)
- Depth Texture
- Depth Texture 1/2 + Low Res Depth
- HBAO [Compute][Not Always]
- Shadow passes
- Eye Shadow (not the one girls put)
- Android/Deviant Mind View [Not Always]
- Footstep Layer [Not Always]
- Cluster Forward Rendering
- 1.All Geo
- 2.Skin, Eye & Teeth
- 3.Emissive
- 4.Sky/Background
- 5.Specular Lighting (IBL)
- 6.SSSSS (Screen Space Subsurface Scattering)
- 7.Volumetric Scatter(Volumetric Light Volume) [Compute][Not Always]
- i. Depth Tile
- ii. Participating Media Properties/Material Voxelization (Scattering/Extinction)
- iii. Clear some RW buffer
- iv. Light Scattering or Froxel Light Scattering (Froxel Light Integration & Temporal Volumetric Integration)
- v. Write to some RW buffer
- vi. Scatter Radiance
- vii. Final Integration (Final Participating Media Volume. Integrate froxel scatter/extinction along view ray)
- viii. Particles Voxelization [Not Always]
- Transparency
- SSR [Compute]
- Refractivity Mask [Not Always]
- SSR Composite
- Composite Refractivity Mask [Not Always]
- Post Processing
- UI [Not Always]
- Object Picking [Compute]
- HDR/Gamma [Not Always]
- CAS
- Flip & Present
- Life of a Frame [Rendering Graph]
- Extra Stuff for Future Investigation
- Engine General Observations
- Epilogue
- Related Readings & Videos
Note: Today, the 24th of November 2024, this was the last line to be added to this article. The draft for this article & investigation initially started on the 29th of January 2022, which makes it the longest investigation in the series!
Introduction
When the “Kara” PS3 demo video showed up years ago, i was blown away by that showcase (well, everybody was), from the premise of the idea, to the visual quality, ending up with the voice acting & the outstanding performance by Valorie Curry!! And i wanted to play that game whenever it came out, regardless of how it plays! At that time i did not know what “Quantic Dream” was or if they had even made games before, it was one of those companies that was outside my radar, simply because they did not make any of the fancy action adventure titles that i used to play back then.
Wait for it…“Running in real time on a PlayStation®3”
Fast forward to the release campaign and the actual release of the game, i was kinda disappointed knowing that it is just a game where you press buttons to do QTEs, and that’s it, no deep gameplay, no fancy mechanics, no precise platforming, no coyote time, no headshots & no car hijacking! So, i decided to invest neither the time nor the money and not to watch any of its coverage, simply NOT MY TYPE! Because it is a boring game, where you just press the buttons that show up on your screen…UNTIL…
One day my wife picked up the game based on her colleagues’ recommendations, not really far from release, i would say within a 2-month window or so, and i fortunately (or unfortunately) was free during that evening & decided to sit on the couch while she was playing Detroit for the 1st time, and here came the gamer genre-shock! The game hooked me (from the Hostage chapter), especially as it was being played in Arabic (my mother tongue, and we almost never get audio localization in Arabic), and i did not hesitate to start playing it on my profile the next evening! Sadly i then had to sell that PS4 while relocating to avoid region crap from Sony, and since then, i didn’t get a chance to replay that game again till it dropped on Steam!
i would say that i was unfortunate to try this game, because it opened Pandora’s box for me, of who made that game, and hence i discovered what “Quantic Dream” is and their impressive set of previous games (Heavy Rain & Beyond, which are thankfully on Steam too) and i started playing them. Not only that, but i started to reconsider similar games, such as all the Supermassive Games titles (i played Until Dawn at launch, didn’t like it, but didn’t dislike it either! and maybe this was the game that gave me that impression of the QTE genre), and in my quest to familiarize myself with that category of games, i was always aware of the praise for TT games, which i always avoided playing because they looked like badly shaded “comics”-style games and not up to the contest of current-gen fancy-shiny graphics (graphics is not everything boys, even while we are talking here about graphics), so i looked up any TT games that were available by that time to play them too. Anyways, it eventually turned out that games falling in that genre are NOT just “boring games, where you just press the buttons that show up on your screen”! These are games deeper than that, and they compensate for the weak or lacking gameplay mechanics & button mastery with a good story which could be delivered with some nice acting, graphics, audio, performance capture and a lot more; but it remains that the writing is the solid core base for such games, and i find that makes it enough to invest the time in playing them. After beating Detroit for the 1st time, it felt to me like an IMDB 9.5/10 movie that has some interactivity elements, where i sat in the director’s chair & was able to change the course of the movie to get an ending that satisfies me! And that last unique interactivity element is worth a 0.5 in movie rating, and that makes Detroit an IMDB 10/10.
This article has been in the making for a long time, if you’re following the series you might recall that in the Death Stranding article i mentioned that i’ve been working on a Vulkan based game pretty frames article (now you know it is Detroit) and that was side by side with the Resident Evil article (not to mention that the Diablo IV article was made from scratch while Detroit’s was still an in-progress draft).
From Death Stranding’s Pretty Frames
Anyways, it is a fact now that 3 more entries in the series went live, while this Detroit one stayed a draft for a long time! But why was that?!
- Frame captures for Detroit were taking a very long time compared to other games (and even compared to projects i work on professionally in the AAA space). A Detroit capture could take several minutes, up to 10 minutes, freezing the computer!!
- The captures could be corrupt most of the time due to that previous reason; many times opening a capture won’t succeed and i’m faced with a popup telling me “File is corrupted”!
- If a capture is corrupt and i need to retake it, or if i needed an exact or specific capture, then i had to play the chapter again, usually from the beginning or the nearest checkpoint (which usually is not that near), as it is not like other games where you can save at many places & keep copies of save files to swap, and it is not an open world where you can chill around until you find a nice capture or conditions meeting the intention behind the capture (bloom, lens flares, particles, wetness,…etc.). And not only re-playing chapters from scratch, but also in some cases i had to make exact QTE choices to get to the camera view or the scene that i wanted to capture from.
- Captures are huge in terms of disk space, but this could be due to the fact that i played & captured most of the time at 4K. The average capture was between 4GB and 5GB (compressed), and some captures unexpectedly passed 6GB (small scene/map and limited space).
- Loading a single capture was enough to inflate a huge amount of data into memory, resulting in my PC sitting at ~90% memory usage. Yes Chrome eats some, the OS (Windows 11) eats some, but the big portion goes to the capture file (uncompressed) which was usually reaching 15G in memory! And to do such investigations with other games, i used to load multiple captures at the same time (4 or 5) to compare details; the worst case i ever had was when i wasn’t able to load more than 2 captures, but for Detroit, during the entire investigation, i had to load a single capture at a time.
- Of course system memory was not the only reason to work on one capture at a time, there is also GPU memory, and of course the glorious Vulkan replay would be complaining with something like VK_ERROR_OUT_OF_DEVICE_MEMORY!
- Last but not least, i wanted this one to be a 100% complete breakdown of the game with macro details for each step even if that step is not always there, and without leaving anything as a TODO that never gets done!
Me & Detroit Fun Fact
Kara’s & Connor’s music themes in this game became two of my all time favorite game OSTs, and they’ve been for a long time in the playlist i listen to during work (or while writing this article), so Kudos to Philip Sheppard and Nima Fakhrara for adding that extra joy to my ears : )
Me & Detroit Much More Fun Fact
After more than halfway into this investigation (around the post processing section), i found out about a couple of GDC talks on Detroit (links by the end of the article), and by the end of the investigation & after most of the article was written & the media made & uploaded, i found out about another French talk which in its turn led me to an AMD series of articles on the GPUOpen website discussing the porting process of Detroit to PC/Vulkan. These are very informative resources (only the GDC ones & the French talk i watched so far), but they popped up very late for me, and did not impact or affect what was already written (or what remained to be written), i like to put things from my perspective as a result of me blindly digging deeper & deeper inside the game binaries & the many captures i take. In fact, i’m glad that i did not find these resources earlier, or before choosing this game or proceeding into this journey of digging into it, i would’ve not considered this game if i had found these talks earlier!
So if you see any misalignment between what i put here & these resources, you know why, these resources were never part of the knowledge build-up process i had for this game. Also some of these resources (the GDC talks & the French talk) i think are more about the PS4 version of the game due to their release time, and of course there may be lots of technical differences either due to the API, the hardware architecture or due to some tech decisions that came after the official release.
Configs
Still using my common gaming PC, which is made of an AMD Ryzen 9 5950X 16-Core 3.40GHZ, 32G RAM, an RTX3080, and playing at 4K (no HDR for all captures except the ones in the HDR section, as the game looked nice without HDR).
Steam’s Build ID: 12158144
SKU Application Version: 01.00
Regarding the game graphics/video settings, i left them at the defaults that the Steam build initially launches with.
Despite the fact that you see the configs screen in English, my captures all include Arabic text for the game UI, as i preferred to play & re-play the game in Arabic (SO MUCH FUN!!!). i made sure to swap the language for the configs screenshot only.
More about Configs
The game also offers a GraphicsOptions.JSON
{
    "GRAPHIC_OPTIONS": {
        "VIDEO_OPTIONS": {
            "FULLSCREEN_RESOLUTION_WIDTH": 3840,
            "FULLSCREEN_RESOLUTION_HEIGHT": 2160,
            "RESOLUTION_SCALING": 1.0,
            "FRAME_RATE_LIMIT": 0,
            "VSYNC": true,
            "BRIGHTNESS": 0.0,
            "HDR": true,
            "CASSHARPEN": true
        },
        "ADVANCED_OPTIONS": {
            "TEXTURE_QUALITY": 3,
            "TEXTURE_FILTERING": 3,
            "SHADOW_QUALITY": 3,
            "MODEL_QUALITY": 3,
            "MOTION_BLUR": 3,
            "VOLUMETRIC_LIGHTING": 3,
            "SCREEN_SPACE_REFLECTION": 3,
            "SUB_SURFACE_SCATTERING": 1,
            "DEPTH_OF_FIELD": 1,
            "AMBIENT_OCCLUSION": 1,
            "BLOOM": 1
        },
        "GPU_INFO": {
            "Name": "NVIDIA GeForce RTX 3080",
            "Driver": "556.12",
            "Benchmark Score": 2900.206787109375,
            "Benchmark Timing": 0.6206454038619995
        }
    }
}
As well as a WindowInfo.JSON
{
    "WINDOW_OPTIONS": {
        "MONITOR": 0,
        "DISPLAY_MODE": 0,
        "WINDOW_POSITION_X": 0,
        "WINDOW_POSITION_Y": 0,
        "WINDOW_RESOLUTION_WIDTH": 3840,
        "WINDOW_RESOLUTION_HEIGHT": 2160
    }
}
Also the game runs a shader pre-compilation screen at start for the first boot; it took quite some time the first time, but after re-installing the game it took about 4 minutes, and in total it cached 99453 pipelines.
Behind the Frame
Buckle up, and let’s visit Detroit in the year of 2038 inside my RTX 3080!
General Note
For the very tiny image resources, if you click on the images you open the original files in new tabs, which could be as small as 1*1 pixels. But if you right click on images and “Open image in new tab”, you will open the upscaled detailed version.
Vulkan
i’m a developer who has stuck (in a good way) with Khronos & Silicon Graphics (SGI) for a long time, there is no wonder in that! Since the dawn of graphics APIs i was at the OpenGL (Open Graphics Library for short) side of the table. If you think our market is in a fight today, you didn’t see a real fight then, the fight was super super hot, whether it was C against C++ as a game developer choice, or 3dfx’s Voodoo against Nvidia’s Riva or GeForce as graphics cards with better capabilities, Sega’s Dreamcast (NEC PowerVR2) against Sony’s PlayStation 2 (Graphics Synthesizer) as powerful 3d accelerated gaming consoles, and most notably, the graphics APIs like Direct3D against OpenGL and Glide (not to mention proprietary APIs which are always out of the public competition). As a hobbyist learning on my own back then, OpenGL, and only OpenGL, made sense in the middle of all of that for me. It was hard to find resources, no way to travel to attend events, it was very very hard to get any support in that passion driven adventure, there was no home internet, no local courses or schools teaching these topics, importing tech books about these topics was a wish that was never gonna happen & waiting for local translations of these books meant getting a book with 5-10 year old info & technology, but the friendly & make-sense nature of OpenGL made it at least easier to learn on my own with lots of trial & error (of course a huge shout-out to the gaming/entertainment magazines available back then, these used to have some tech topics & sometimes tutorials on programming topics, and it happened that OpenGL tutorials were much more frequent & much more detailed than the abstract Direct3D tutorials,…at least that was my luck). So OpenGL at some point was basically the one thing that summarized my creative & fun hours spent on a computer!
Some full articles (not the ones in the gallery above) – totally recommended to sneak a peek if you’ve got the time!
Things have changed a lot year after year, the gap shrunk a lot, things died (i mean 3dfx, Glide, Dreamcast), the mindset and idea of “graphics cards” evolved to be much more modular, new players became essential in the market (AMD, Vulkan, Xbox), and it became an industry standard for a graphics person (hobbyist or professional) or a game engine to know & support all existing graphics APIs. You don’t even have to know multiple shading languages anymore; with things like SPIR-V or Microsoft’s Shader Conductor and other proprietary translators and/or shader compilers you write once in a language of your choice & these tools do the job (a clean job) for you 95% of the time, but i’ve always preferred the Khronos deal, at least in my side/hobby projects (AAA companies always had another opinion, for reasons that make sense for them : $). Till the release of Vulkan!! In the first few months, i decided to fully jump into that early Vulkan boat, because i kinda knew that it was not an experimental thing, and eventually it would be the future of Khronos, where OpenGL will fade away at some point, and this shift is just my next step with OpenGL, it just got a different name…a totally different one! i was lucky during that time of a new API’s birth to be involved in multiple AAA projects that supported a Vulkan backend, from multiple recent Assassin’s Creed games (Origins, Odyssey, Valhalla, Fenyx Rising), and of course early Assassin’s Creed games (i, ii, iii,…) ports for the Nintendo Switch… All that when Stadia was a thing i believe, each AC game or Anvil Engine based game i worked on had a Vulkan backend even if it was not really needed, who knows where the suits will decide to ship. To one of the most perf friendly games i worked on (didn’t succeed commercially unfortunately coz it was very very late to the battle royale trend), Hyper Scape, ending up with something i stuck with for a while like Skull & Bones (i’m not sure about the final shipping state of S&B as i left the project & the company several years ago, but around ~2017-2020 we always had a maintained Vulkan version of the game that compiled & ran on Windows side by side with the DX12 version).
But as you can see, although i touched multiple AAA titles, and they were all supporting a Vulkan backend, none seemed to ship with Vulkan as the main rendering backend! No surprise in a market that seemed to be dominated by (& totally saturated toward) Direct3D, choosing DX12 as the successor for all new titles!
Year after year since the new generation of rendering APIs that require more & more code, all AAA fancy titles still ship on D3D12, no wonder, it has been the case for D3D most of the time, at least till very recent years and till the day i got Detroit on Steam, launched it, and my lovely RivaTuner at the corner was saying it all out loud: “Vulkan”!!
Enough ranting about my view of OpenGL/Vulkan vs DX, or i would never stop! Let’s see some Detroit Vulkan related details.
Vulkan Extensions
For the sake of curiosity, when looking at any Vulkan based code/app, i always like to check what extensions are used to deliver it; these things are easy to miss, and you can lose track of how many there are or what they do, and nothing lets you know them better than seeing what has been used in a good game or renderer that you liked. Here are some Vulkan extensions that i was able to find traces of while digging through this game. You may find some of them “inspirational” for your work as they help in reducing latency or improving performance in one way or another!
Vulkan Version
Seems to be Vulkan 1.1 (with GLSL 4.5), as there was a code path at the initialization function complaining about this exact version if not found.
Bindless
And the last of the general Vulkan notes: the game seems to be using bindless texture resources (bad name i know), and you may’ve noticed this from the mention of VK_EXT_descriptor_indexing above. Not much to say here, except that i like the technique & was glad to see it utilized here in a Vulkan game; in the past Pretty Frames (at least the ones that made it to written articles) i don’t think any of the games had bindless resources!
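To make the idea concrete, here is a tiny GLSL sketch (not Detroit’s shaders, just a hedged illustration of what VK_EXT_descriptor_indexing enables; the set/binding numbers & the material buffer are made up): one big texture array is bound once, and each draw picks its textures by index.

#version 450
#extension GL_EXT_nonuniform_qualifier : require

//One big descriptor array bound once for the whole frame.
layout(set = 0, binding = 0) uniform sampler2D g_Textures[];

//Per-draw material data holds indices into the array instead of actual texture bindings.
layout(set = 1, binding = 0) readonly buffer MaterialData
{
    uint albedoTexIdx;
};

layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec4 out_color;

void main()
{
    //nonuniformEXT is needed whenever the index may diverge across invocations.
    out_color = texture(g_Textures[nonuniformEXT(albedoTexIdx)], in_uv0);
}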
So without further ado, and before jumping into the nitty-gritty of the game’s frames breakdown, i would like to re-phrase the name of this article (sadly can’t do this at the main header for consistency with the previous articles):
“Behind the Pretty Frames: Vulkan Become Human“
Compute
There is no wonder that such a visually appealing game running on a budget must be utilizing compute in one way or another. The game (if you don’t know yet) is mainly using a Clustered Forward rendering approach, which means compute must be utilized for light culling & for partitioning the view frustum into cells. But it goes a lot beyond that. i believe for such a game, if it was possible, they would’ve done the entire rendering pipeline in compute; as you will see later, the majority of the pipeline is actually compute based, and that includes shading.
To give you an idea about the compute usage in a typical Detroit frame (Cinematic or Gameplay, no difference as the game seems & looks exactly the same, after all Detroit is a game-long cut-scene with some inputs), i’ll leave below just the summary of the compute utilization (in dispatch order). The details of those usages i prefer to put at their correct order in the pipeline in the Draw section of the article.
Compute Dispatches Queue (in execution order)
- Video Frame Decoding (For specific surfaces such as TVs, Billboard displays, Screens, Tablets, ads,…etc.)
- Copy Skinning positions (Skinning & Blendshapes)
- SH(Spherical Harmonics)/Indirect Lighting Blending
- Light Clustering
- Rain Flow
- Procedural Caustics
- Depth Down Sample
- HBAO
- Clustered Forward Shading
- Footsteps
- Volumetric Scattering
- Averaging Luminance
- SSR
- Exposure
- TAA
- Motion Blur
- DOF
- Bloom
- Android/Deviant Mind View
- Color Grading
- Noise
- Final Frame Compositing
- Object Picking
And before leaving this section, i want to (for the first time in the series) put some more details about the workgroup sizes, the number of local workgroups as well as the final predicted invocations count. These numbers are important, and over the years i’ve seen some people put some art behind picking the numbers they choose (sometimes it is part of the optimization phase to tweak these numbers, as some number combinations would give good utilization where other numbers may not be perfect). Here are some of the most uncommon numbers for vkCmdDispatch & the workgroup size. i’ll leave you to judge the numbers!
The Workgroup Size
These ones below are not everything (indeed), but the most interesting ones.
Local Workgroup (from dispatch cmd) | Workgroup Size (within the shader) | Invocations | Usage |
---|---|---|---|
(32, 16, 1) | (32, 32, 1) | 524,288 | Video Frame Decoding |
(64, 32, 1) | (8, 8, 1) | 131,072 | Video Frame Decoding |
(8, 4, 1) | (8, 8, 1) | 2,048 | Video Frame Decoding |
(305, 1, 1) | (64, 1, 1) | 19,520 | Skinning & Blendshapes copying |
(47, 1, 1) | (64, 1, 1) | 3,008 | Skinning & Blendshapes copying |
(55, 1, 1) | (64, 1, 1) | 3,520 | Skinning & Blendshapes copying |
(644, 1, 1) | (64, 1, 1) | 41,216 | Skinning & Blendshapes copying |
(725, 1, 1) | (64, 1, 1) | 46,400 | Skinning & Blendshapes copying |
(3773, 1, 1) | (64, 1, 1) | 241,472 | Skinning & Blendshapes copying |
(15, 15, 3) | (8, 8, 1) | 43,200 | LPV |
(360, 1, 1) | (32, 1, 1) | 11,520 | LPV |
(1440, 1, 1) | (32, 1, 1) | 46,080 | LPV |
(64, 1, 1) | (32, 1, 1) | 2,048 | Light Clustering |
(1, 1, 1) | (64, 1, 1) | 64 | Footstep Extracting |
(30, 17, 64) | (8, 8, 1) | 2,088,960 | Volumetric Scattering |
(121, 68, 1) | (8, 8, 1) | 526,592 | SSR Prefilter main pass |
(240, 135, 1) | (8, 8, 1) | 2,073,600 | SSR’s TAA |
(1920, 1, 1) | (1, 544, 1) | 1,044,480 | Motion Blur |
(1, 1080, 1) | (964, 1, 1) | 1,041,120 | Motion Blur |
(1, 270, 1) | (960, 1, 1) | 259,200 | SAT |
(1, 480, 1) | (576, 1, 1) | 276,480 | SAT |
If you’re asking why “Skinning & Blendshapes copying” differs, it is because each dispatch is for a different skinned mesh (just in case you’re asking), and the ones in the table are not all of the variations, just a few of them to showcase the idea.
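For reference, the Invocations column is simply the dispatch’s group counts multiplied by the shader’s declared workgroup size. A tiny hedged GLSL sketch (the image binding is hypothetical), using the Volumetric Scattering row above as the example:

#version 450

//Workgroup size declared inside the compute shader: (8, 8, 1) = 64 threads per group.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(set = 0, binding = 0, rgba16f) uniform writeonly image3D u_Froxels;

void main()
{
    //With vkCmdDispatch(cmd, 30, 17, 64) this shader runs
    //30 * 17 * 64 groups * 64 threads = 2,088,960 invocations,
    //matching the “Volumetric Scattering” row in the table above.
    imageStore(u_Froxels, ivec3(gl_GlobalInvocationID), vec4(0.0));
}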
Possibly we need to rephrase the article name once more…
“Behind the Pretty Frames: Compute Become Human“
Blue Noise
You may notice below during the frame that Blue Noise is relied on heavily in this game, and there is no wonder in that for such beautiful and smooth frames. The game utilizes a form of Bart Wronski’s BlueNoiseGenerator to denoise & smoothen things such as the ones below.
Things that benefit from Blue Noise
- Shadows
- SSSSS (Screen Space Subsurface Scattering for short)
- HBAO/SSAO
- SSR
- Volumetric Scattering (for Volumetric Lighting)
…and maybe more!
Take it like that: anything that has a temporal element (a previous frame resources dependency) or any sort of noise in the output will be using Blue Noise in one way or another.
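The usage pattern is usually the same everywhere: fetch a tiled blue noise value per pixel and decorrelate it over time so the temporal passes can average the error away. A hedged GLSL sketch of that generic pattern (texture size, frame-index uniform & the golden-ratio animation are my assumptions, not necessarily what Detroit does):

#version 450

layout(set = 0, binding = 0) uniform sampler2D u_BlueNoise; //e.g. a 64*64 tiling blue noise texture
layout(set = 0, binding = 1) uniform FrameData { uint u_FrameIndex; };

layout(location = 0) out vec4 out_color;

float AnimatedBlueNoise(ivec2 pixel)
{
    float n = texelFetch(u_BlueNoise, pixel & ivec2(63), 0).r;
    //Golden-ratio offset per frame keeps the values well distributed over time.
    return fract(n + float(u_FrameIndex) * 0.61803398875);
}

void main()
{
    //Example use: dither a shadow/SSAO/SSR sample offset or threshold with the noise value.
    float noise = AnimatedBlueNoise(ivec2(gl_FragCoord.xy));
    out_color = vec4(vec3(noise), 1.0);
}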
And for one last time i think, because it is gonna get out of control!!! There is a new proposed name for this article…
“Behind the Pretty Frames: Blue Noise Become Human“
i promise, it was the last time to do that silly rename thing!
Detroit Vertex
Of course, like every other game & engine, vertex descriptions get unique & fancy. Below is probably not everything, but these are the most common ones i’ve been seeing during my investigation 95% of the time, so i think Detroit gets away with a handful of unique vertex shaders.
Detroit Become Human’s Vertex Description – Super Simple Opaque Meshes (Like eye shadowing, no-wind grass, opaque hair,…etc.)
in_uv0 R16G16_FLOAT 0
Detroit Become Human’s Vertex Description – Simple Mesh (Like terrain, floors, walls, wood boards,…etc.)
in_color R8G8B8A8_UNORM 0
in_uv0 R16G16_FLOAT 4
Detroit Become Human’s Vertex Description – Foliage 1 affected by wind (small like grass)
in_lodPosition R16G16B16A16_FLOAT 0
in_windBranchData R16G16B16A16_FLOAT 8
in_windExtraDataFlag R16G16B16A16_FLOAT 16
in_LeafAnchorHint R16G16B16A16_FLOAT 24
in_uv0 R16G16_FLOAT 32
Detroit Become Human’s Vertex Description – Foliage 2 affected by wind (medium like long grass, bushes & small tree)
in_color R8G8B8A8_UNORM 0
in_lodPosition R16G16B16A16_FLOAT 4
in_windBranchData R16G16B16A16_FLOAT 12
in_windExtraDataFlag R16G16B16A16_FLOAT 20
in_LeafAnchorHint R16G16B16A16_FLOAT 28
in_uv0 R16G16_FLOAT 36
Detroit Become Human’s Vertex Description – Skinned Meshes (like characters’ head, arms,…etc. flesh)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 12
in_vfSelfOcclusionColor0 R8G8B8A8_UNORM 0
in_vfSelfOcclusionColor1 R16G16_FLOAT 4
in_uv0 R16G16_FLOAT 8
Detroit Become Human’s Vertex Description – Skinned Meshes (like characters’ clothing)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 12
in_color R8G8B8A8_UNORM 0
in_vfSelfOcclusionColor0 R8G8B8A8_UNORM 4
in_vfSelfOcclusionColor1 R16G16_FLOAT 8
in_uv0 R16G16B16A16_FLOAT 12
Detroit Become Human’s Vertex Description – Hair 1 (Eyebrow)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 20
in_uv0 R16G16_FLOAT 0
Detroit Become Human’s Vertex Description – Hair 2 (Head’s hair, Beard,..etc.)
in_color R8G8B8A8_UNORM 0
in_uv0 R16G16B16A16_FLOAT 4
in_vfMsBindPosePosition R16G16B16A16_FLOAT 12
in_vfMsBindPoseNormal R16G16B16A16_FLOAT 20
You may or may not have noticed that the vertex descriptions are missing common things such as the Positions and Normals of the vertices; this is not a typo or a mistake, as this info is passed to each draw step as buffers that are part of the bindless data. Interesting…!!
Keep in mind that particles have their own unique buffer, different from any other mesh type.
General Vertex Positions & Normals Buffers
struct unaligned_float3
{
    float x;
    float y;
    float z;
}
struct g_rbVertices_Layout
{
    unaligned_float3 g_rbVertices[];
}
struct g_rbNormals_Layout
{
    int g_rbNormals[];
}
Particles Only Buffer
struct S_PARTICLE
{
    float _fX;
    float _fY;
    float _fZ;
    uint _uiSizeLife;
    uint _uiSeedFree;
}
struct dyn_g_rbParticles_Layout
{
    S_PARTICLE g_rbParticles[];
}
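Since positions & normals live in those buffers rather than in the vertex input layout, the vertex shaders presumably do “vertex pulling” with gl_VertexIndex. Here is a minimal hedged sketch of how that could look (the bindings, push constant & leaving the normal packed in that int are my assumptions; the buffer layouts mirror the structs above):

#version 450

struct unaligned_float3 { float x; float y; float z; };

layout(set = 0, binding = 0) readonly buffer g_rbVertices_Layout { unaligned_float3 g_rbVertices[]; };
layout(set = 0, binding = 1) readonly buffer g_rbNormals_Layout { int g_rbNormals[]; }; //packed normal per vertex

layout(push_constant) uniform Object { mat4 u_WorldViewProj; };

//Only UVs (and similar) come from the vertex description itself.
layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec2 out_uv0;

void main()
{
    //Pull the position of this vertex straight from the storage/bindless buffer.
    unaligned_float3 p = g_rbVertices[uint(gl_VertexIndex)];
    out_uv0 = in_uv0;
    gl_Position = u_WorldViewProj * vec4(p.x, p.y, p.z, 1.0);
}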
Frame
Heads up: despite the fact that you will be seeing render targets & backbuffers that look “normal”, they were all in fact upside down due to a choice of the developer, and the final image gets flipped before presenting to the swapchain. This “upside-down” thing could be due to multiple reasons! The Vulkan coordinate system goes top left towards bottom right, and (iirc) the PlayStation 4 & 5 APIs (let’s just call them the PlayStation APIs here🤐) are similar to Vulkan in that regard, so this is why i suggested it is a developer choice to be like that, they could’ve made it non-flipped if they wanted. It could be an indication that the game port started as OpenGL (which goes bottom left towards top right) and this is a remnant of that? Or maybe it is something related to the formats reading? Or flip copies? Or it could be as simple as a wrong Y flip in the vertex shader? A negative vs positive viewport height value for the offscreen rendering?…etc. Multiple commonly known areas can cause such a behavior (in glsl shader code or in the viewport setup), but eventually it is not a big deal (though indeed a little annoying during debugging).
So, keep in mind, that what i was seeing during the entire investigation was actually vertically flipped images, and i flipped them back to look normal before uploading them to the article for the sake of sanity!
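Just to illustrate one of those commonly known causes (a generic example, not Detroit’s actual shader): Vulkan’s clip-space Y points the opposite way to OpenGL’s, so ports often carry an explicit flip in the vertex shader (or a negative viewport height), and forgetting, doubling or inheriting such a flip is an easy way to end up rendering upside down.

#version 450

layout(push_constant) uniform Object { mat4 u_WorldViewProj; }; //hypothetical

layout(location = 0) in vec3 in_position;

void main()
{
    gl_Position = u_WorldViewProj * vec4(in_position, 1.0);
    //OpenGL-style projection matrices need this flip under Vulkan (or a negative viewport height instead).
    gl_Position.y = -gl_Position.y;
}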
Prepare Resources
Nothing really fancy during this long Copy/Clear pass at the start of a typical frame, just a handful of vkCmdCopyBufferToImage, vkCmdCopyImage and vkUpdateDescriptorSets calls for different buffers (like the skinning ones) or image resources, copying image data (and/or mips) between resources (like the previous frame’s Rain, Caustics, TAA edges, Motion Vectors,…etc.).
Video Frame Decoding [Compute][Not Always]
This compute dispatches only if there are any video displays around the frame, and boy Detroit is full of these, from holograms in the streets, to TVs in bars, houses, offices & even public transportation buses. This is probably something executed by RAD’s Bink Video (not a custom video solution) as i was able to find some traces of Bink functions within the game executable.
The purpose of this dispatch is to get the video frame playback data into a texture resource so it can be used later to texture a TV surface or such, and because the solution used affords it, this texture gets some mips during this compute dispatch (11 mips so far for all cases i’ve investigated).
This can take place for a single or multiple videos, depending on the situation (a street, for example, can process multiple videos for the billboards, compared to an indoor level/frame).
keep reading to learn why i put gifs for the texture pyramids
You may (or may not) have noticed that there is a pop at the beginning of the mips gif, and that the first mip (mip 0) is a slightly different frame (an earlier frame) than the rest of the mips in the chain (mip 1 to mip 10), and you are right if you noticed that, it is not a mistake in the gifs; it seems to possibly be a bug in the video/bink code, as it gets frame X from the video as the target frame to write to the texture’s mip 0, but then gets frame X-1 for all the lower mips. The independence of compute and the lack of barriers could sometimes result in things like this i believe, or it could be simpler than that, and it may be just a -1 or -=1 somewhere in the mips data copy & write to images! i always try to keep myself distant from bink & wwise code in productions, so i can’t confirm if this is an expected & desired bink behavior (for some anti-logical reason) or just another annoying bink bug!
Ironically, you don’t have to see a surface that displays a video in the frame in order to have this step processed, but we will come to this later @ the Engine General Observations section by the end of the article.
Copy Skinning positions [Compute][Not Always]
In most cases this compute executes for skinned characters/objects in order to keep the skinned positions from the previous frame in memory, so the game doesn’t re-do skinning from scratch. Nothing very special to show here, just a lot of floats…a lot of them!
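Conceptually it is just a GPU buffer-to-buffer copy; a minimal hedged compute sketch of the idea (the bindings, the vertex-count uniform & the assumed purpose of feeding next frame’s motion vectors are mine; the 64-wide group matches the dispatch sizes in the table earlier):

#version 450

layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

struct unaligned_float3 { float x; float y; float z; };

layout(set = 0, binding = 0) readonly buffer CurrentSkinnedPositions { unaligned_float3 src[]; };
layout(set = 0, binding = 1) writeonly buffer PreviousSkinnedPositions { unaligned_float3 dst[]; };
layout(set = 0, binding = 2) uniform Params { uint u_VertexCount; };

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= u_VertexCount) return;
    //Keep this frame’s skinned positions around so the next frame can reuse them
    //(e.g. for motion vectors) without re-running skinning.
    dst[i] = src[i];
}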
The only observation to keep in mind is that just because you don’t see any characters or possibly skinned meshes (like the few examples below), it doesn’t mean this step will be absent from the rendering graph (will discuss further in the Engine General Observations section); even without characters in view, the skinning copy still takes place in many, many cases.
And if it is, we shouldn’t copy skinning, or should we?
SH(Spherical Harmonics)/Indirect Lighting Blending [Compute][Not Always]
The game utilizes the Light Propagation Volumes (LPV) algorithm (with something similar to Crytek’s as a foundation) as the technique for achieving a good indirect light bounce. In order to achieve that, it stores the lighting information from the light sources in a 3D grid of Spherical Harmonics (SH) to get the best efficiency at runtime.
While this step is flagged as “Not Always”, this doesn’t mean indirect lighting is absent from parts or areas of the game; i believe this step only takes place when the actual SH need to be re-written into a new volume texture atlas in order to “blend” between two or more different probe grid volumes (baked in volume texture atlases), whereas other areas of the game seem to utilize right away the volume textures that were already baked beforehand (offline) without modifying or blending them through this compute step (they’re usually part of the bindless textures list).
At this step the engine will be blending the different volumes’ baked light information (color component coefficients) into the new volume texture atlas (a StorageFloat3D image of float4s, so it stores 4 coefficients per probe) with different sizes that depend on the volume size, which depends on the size of the space that the volume covers (you’ll see below), but the general rule is that it has 3 slices, one slice per light color channel. These 3d textures are of 16bit precision via the format R16G16B16A16_FLOAT (well, it is the most used format in this game as you’ll see during the course of the article!).
The best example to showcase that, as it seems to be something that happens for a very brief moment, is when Markus is drawing the curtains in order to wake up Carl. While you are in the room & before you draw the curtains, the engine will be using the dark version of the Indirect Lighting baked volume textures (this compute is not running), and once the curtains start being drawn, the compute kicks off the blend, and as soon as the blend is full and the room is lit, this compute is not running anymore, and the renderer will be using the baked bright version of the Indirect Lighting volume textures.
Of course these 3d textures are not really shown at their real size, as 54*54 would be very very tiny to show the details.
Keep in mind that while each of these gifs represents a “single” channel, each of them is RGBA & stores data in all 4 channels, this is why i did not put them in grayscale as 1 channel.
And in action, from 100% dark, to blending two volume texture sets (via this compute step), ending up with 100% bright.
This type of blend doesn’t have to be an instant thing like that previous curtain example; here is another example, where if the player (Kara here) moves to an area that is between two different volumes, then you can have that blend compute happen pretty much every frame in order to blend between both volumes, as long as she is between these two volumes & the propagation reaches her worldspace position.
And if you think that the volumes are the same ones as the previous example, no they’re not, it is common for such textures to look similar between different areas of a game. Feel free below to compare them in new tabs if interested (these are for Kara’s example).
Volume 1
Volume 2
As mentioned earlier, this compute runs when the influence of 2 or more volumes is required so the blending can take place; the several frames below are from different areas of a similar level (mid-size level), and the results differ from frame to frame based on the player & view position, but the shared thing between all these frames is that they require the same 3 volumes that cover the play area (not 2 volumes this time).
Feel free to check the 3d textures below if interested to observe the difference between the 4 frames that share the same 3 volumes. Keep in mind that the volumes used for this level are larger than the previous examples, 117*177*3 (still R16G16B16A16_FLOAT).
Example 1 (Connor looking to the level)
Example 2 (Top view of the level)
Example 3 (Crowds by the crime scene)
And for the sake of curiosity, here is the struct that defines the data stored within these probes of the volume texture/grid.
Structs
struct S_LPV_TO_STACK
{
    float4 _vColor;
    float _fIntensity;
    int _iTexSize;
    uint _uiSHCoeffTexRIdx;
    uint _uiSHCoeffTexGIdx;
    uint _uiSHCoeffTexBIdx;
    uint _uiPad0;
    uint _uiPad1;
    uint _uiPad2;
}
struct dyn_Stack1_Layout
{
    S_LPV_TO_STACK Stack1[];
}
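And since the blend itself is conceptually just a weighted mix of SH coefficient volumes into the runtime atlas, here is a hedged compute sketch of the two-volume case (the bindings, the single blend-factor uniform & the direct lerp are my assumptions; the real shader goes through the S_LPV_TO_STACK entries above and can handle more than two sources):

#version 450

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

//Two baked SH coefficient volumes (4 coefficients per probe, slices per color channel).
layout(set = 0, binding = 0) uniform sampler3D u_VolumeA;
layout(set = 0, binding = 1) uniform sampler3D u_VolumeB;
//Destination atlas, R16G16B16A16_FLOAT as observed in the captures.
layout(set = 0, binding = 2, rgba16f) uniform writeonly image3D u_BlendedAtlas;
layout(set = 0, binding = 3) uniform BlendParams { float u_BlendFactor; ivec3 u_Size; };

void main()
{
    ivec3 coord = ivec3(gl_GlobalInvocationID);
    if (any(greaterThanEqual(coord, u_Size))) return;

    vec4 shA = texelFetch(u_VolumeA, coord, 0);
    vec4 shB = texelFetch(u_VolumeB, coord, 0);
    //SH coefficients are linear, so a straight lerp between the two baked sets is valid.
    imageStore(u_BlendedAtlas, coord, mix(shA, shB, u_BlendFactor));
}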
Light Clustering [Compute]
The game utilizes a clustering technique, which means that the view frustum gets split into a 3d grid of clusters, and we use that to know which lights are going to be rendered, and which clusters these lights will influence. At this step (call it Fill Cluster), each cluster does an intersection test against each light source’s radius and fills the cluster data with valid (not culled) light source indices via some glsl atomic memory functions such as atomicAdd. The compute does multiple dispatches, one dispatch per light source type. There are 4 light source types in Detroit; dispatches usually run for types 0, 1 & 2 (Spot, Point & Box) and not for the directional light source type (type 3). Eventually, either the clusters that end up with no light source intersections, or the light sources that end up with no cluster intersections, get denied & we don’t care about them for the rest of the frame.
Something like that…
From Real-time many-light management and shadows with clustered shading
Ola Olsson, Emil Persson, Markus Billeter – Siggraph 2015
Not much visual impact at this point, but there is data output to be used later in the shading/color passes.
In Buffers
//Source Light
struct S_LIGHT
{
    float4x4 _mVsToLs;
    float4 _vfVsPos;
    float4 _vfVsDir;
    uint _uiType;
    uint _uiVolumetricData;
    uint _uiSceneZoneBits;
    float _fSpHumbraOut_PrCUTranslate;
}
//Source Lights Descs Array
struct S_LIGHT_ARRAY
{
    S_LIGHT[512] _aArray;
}
//Fill Cluster Constant Buffer
struct CLUSTERED_SHADING_ASSIGN_LIGHT_TO_CLUSTER
{
    uint _uCurrentMipWidth;
    uint _uCurrentMipHeight;
    uint _uCurrentMipDepth;
    uint _iPadding0;
    uint _uCellWidth;
    uint _uCellHeight;
    uint _uCellDepth;
    uint _uSizeBitShift;
    uint _uLightType;
    uint _uLightCount;
    uint _uLightIndicesBufferSize;
    uint _uPassIndex;
    uint _uDebugMode;
    uint _uMipIndex;
    float _fCombinedEyeAdjustX;
    float _fCombinedEyeAdjustZ;
    float4[16] _avfApproxWorldCellSize;
}
//Cluster Info Constant Buffer
struct CLUSTER_INFO
{
    float4x4 _mViewToClusterView;
    float4x4 _mClusterProjMatrix;
    float4x4 _mClusterInvProjMatrix;
    float4 _vfLeftEyeToCombinedEyeVsDeltaXZ;
    uint _uiFinestMipWidth;
    uint _uiFinestMipHeight;
    uint _uiFinestMipDepth;
    uint _uiFinestMipWidthHeight;
    float _fZSliceGeomFactor;
    float _fRcpLogZSliceGeomFactor;
    float _fDummy0;
    float _fDummy1;
    float _fNearP;
    float _fInvNearP;
    float _fFarP;
    float _fClusterCoincidentWithCamera;
}
Out/Storage Buffers
//Shared
//Cluster Cell
struct S_CLUSTER_CELL
{
    uint LightCount;
    uint LightFirstIndex;
}
//Light to Cluster Projector
struct ProjectorMatrix
{
    float4x4 mLightSpaceToClusterSpace;
}
//Source Light Indices
struct stc_SourceLightList_Layout
{
    int SourceLightList[];
}
//Source Light Cluster Data
struct dyn_SourceLightCluster_Layout
{
    S_CLUSTER_CELL SourceLightCluster[];
}
//Dest Light List Indices
struct stc_DestLightList_Layout
{
    uint DestLightList[];
}
//Cluster Grid Data
struct stc_ClusterGrid_Layout
{
    S_CLUSTER_CELL ClusterGrid[];
}
//Source Light Projector Matrices
struct dyn_SourceLightProjectorMatricesDescArray_Layout
{
    ProjectorMatrix SourceLightProjectorMatricesDescArray[];
}
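On the consuming side (the clustered forward shading later in the frame), each fragment finds its cluster cell from its tile position plus a depth slice, then walks only that cell’s lights. A hedged GLSL sketch of that lookup (the logarithmic slice formula is my assumption based on the _fZSliceGeomFactor / _fInvNearP naming, and the light evaluation itself is reduced to a placeholder):

#version 450

struct S_CLUSTER_CELL { uint LightCount; uint LightFirstIndex; };

layout(set = 0, binding = 0) readonly buffer ClusterGridBuf { S_CLUSTER_CELL clusterGrid[]; };
layout(set = 0, binding = 1) readonly buffer SourceLightListBuf { uint sourceLightList[]; };
layout(set = 0, binding = 2) uniform ClusterParams
{
    uvec3 u_GridDim;       //e.g. the finest mip width/height/depth from CLUSTER_INFO
    vec2 u_TileSizePx;     //screen pixels covered by one cluster cell in X/Y
    float u_InvNear;       //_fInvNearP
    float u_RcpLogZFactor; //_fRcpLogZSliceGeomFactor
};

layout(location = 0) in float in_viewZ; //linear view-space depth, assumed to be passed in
layout(location = 0) out vec4 out_color;

void main()
{
    uvec2 tile = uvec2(gl_FragCoord.xy / u_TileSizePx);
    //Geometric (logarithmic) depth slicing: each slice is a constant factor deeper than the previous.
    uint slice = uint(max(log(in_viewZ * u_InvNear) * u_RcpLogZFactor, 0.0));
    slice = min(slice, u_GridDim.z - 1u);
    uint cellIdx = (slice * u_GridDim.y + tile.y) * u_GridDim.x + tile.x;

    S_CLUSTER_CELL cell = clusterGrid[cellIdx];
    vec3 lighting = vec3(0.0);
    for (uint i = 0u; i < cell.LightCount; ++i)
    {
        uint lightIdx = sourceLightList[cell.LightFirstIndex + i];
        lighting += vec3(0.01) * float(lightIdx + 1u); //placeholder for the actual light evaluation
    }
    out_color = vec4(lighting, 1.0);
}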
Procedural Textures
Ahead of time, some procedural textures are created &/or updated with the combined efforts of some compute & fragment shaders. The resulting textures will be needed later in the frame to serve multiple visual enhancements (mostly to do distortion or to fake surface normals).
1.Rain
Detroit is very rainy in this game (idk irl), and pretty much all the time the rain procedural textures are processed at the start of each frame, even while inside an interior set & regardless of whether it is rainy or not outside (like inside Kamski’s house or even at the rooftop in the Hostage first mission).
How does it work?
The rain procedural textures preparation is done in a set of passes that run back to back as you’ll see below, but all in all the output of this step is about 3 textures in most cases, one used for diffuse and the other ones used for water drops & flow (like on a glass window or some character’s face & body); these are usually 256*256 of the format B8G8R8A8_UNORM with 9 mips. Let’s see how this works!
i.Rain Ripples
Around 8 different draws, split into pairs, update the rain ripples texture (which most likely starts as an empty texture at the beginning of the level); with the help of a water drop diffuse sampler, new sources of ripples (points) are drawn.
These 8 draws are split into pairs: one draw to add new ripples (i call them new ripple roots) and the other draw to update the existing ripples’ flow (scaling).
If you can’t spot the roots of the new ripples, you certainly can here: these are the exact 8 steps for drawing the ripples, and you can observe the new roots every other draw. The first draw is in fact a “new roots” draw, but you can’t tell or observe that because there is no previous draw to compare to; it uses the Diffuse Sampler texture above, which is the one used for the new roots draws.
Procedural Rain Ripples Texture Constant Buffer
struct ProcTexRain
{
    float _fTexOffsetX;
    float _fTexOffsetY;
    float _fAbsorption;
    float _fDispersion;
    float _fSharpness;
    float _fFlowScale;
    float _fFlowGain;
    float _fFlowToRipples;
}
This output ripples texture (aka the Rain Diffuse texture) might look off, but you understand it better if you look at it as a set of channels, like so:
Now you may be asking, why are B & A exactly the same?
In fact they’re not! Open them & look closer!
The A channel was used in the past few steps to accumulate or blend between the previous & next step of the animating ripples effect, and hence by the end of this step it is possibly not needed anymore, right? Let’s see below!
ii.Rain Ripples Mips [Compute]
The rain ripples texture gets mipmaps generated, ending up with a total of 9 mips, from 256*256 to 1*1.
iii.Rain Dripples
The “Rain Dripples” is not really born as a separate texture, it is stored (for now) in the alpha channel of the Ripples/Diffuse texture. By the end of the ripples step (the previous step), the info in the alpha channel of this rain diffuse texture is not needed anymore, so it gets cleared & replaced with something else here, and this something else is the Rain Dripples. So all steps in this section take place on the final RGBA texture we got from the previous step, BUT they happen in the A channel only, the RGB channels remain intact.
So, at this step drawing starts by clearing the alpha channel of the ripples texture to black. Then a few rain dripples are randomly drawn to that render target’s alpha channel using the same water drop diffuse sampler used earlier.
iv.Rain Dripples Distortion & Sharpen [Compute]
Using a distortion texture, the Dripples (Alpha) is copied to its own new (and final) texture and a slight distortion is applied to it; this texture will act later as a normal map.
Rain Distortion Constant Buffer
struct RainDistortion
{
    float _fDistortionFactor;
    float _fSharpenNormals;
    float _uiTextureWidth;
    float _uiTextureHeight;
}
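Acting “later as a normal map” for a height-like channel usually means deriving a normal from the height differences. A hedged compute sketch of the distort & sharpen-into-normal idea using the RainDistortion fields above (the exact math, bindings & output format are my assumptions):

#version 450

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(set = 0, binding = 0) uniform sampler2D u_RippleAlpha; //the dripples height living in the A channel
layout(set = 0, binding = 1) uniform sampler2D u_Distortion;  //small distortion/noise texture
layout(set = 0, binding = 2, rgba8) uniform writeonly image2D u_DrippleNormal;
layout(set = 0, binding = 3) uniform RainDistortionParams
{
    float _fDistortionFactor;
    float _fSharpenNormals;
    float _uiTextureWidth;
    float _uiTextureHeight;
};

void main()
{
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    vec2 texel = 1.0 / vec2(_uiTextureWidth, _uiTextureHeight);
    vec2 uv = (vec2(coord) + 0.5) * texel;

    //Nudge the lookup with the distortion texture.
    uv += (textureLod(u_Distortion, uv, 0.0).rg * 2.0 - 1.0) * _fDistortionFactor * texel;

    //Finite differences of the height (alpha) give a tangent-space normal; sharpening scales the slopes.
    float hL = textureLod(u_RippleAlpha, uv - vec2(texel.x, 0.0), 0.0).a;
    float hR = textureLod(u_RippleAlpha, uv + vec2(texel.x, 0.0), 0.0).a;
    float hD = textureLod(u_RippleAlpha, uv - vec2(0.0, texel.y), 0.0).a;
    float hU = textureLod(u_RippleAlpha, uv + vec2(0.0, texel.y), 0.0).a;
    vec3 n = normalize(vec3((hL - hR) * _fSharpenNormals, (hD - hU) * _fSharpenNormals, 1.0));

    imageStore(u_DrippleNormal, coord, vec4(n * 0.5 + 0.5, 1.0));
}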
v.Rain Dripples Mips [Compute]
Just generate mipmaps for that normal map-y thing! to end up with the standard for the rain textures: 9 mips from 256*256 to 1*1.
vi.Rain Flow
The “Rain Flow” is not really a separate texture either, and it is not any different from the Rain Dripples (previous step): it is stored (for now) in the alpha channel of the ripples/diffuse texture (one more time). Just as we cleared the content of the ripple texture’s Alpha channel to draw dripples, we do the same thing to draw the flow, except that this time we will copy the dripples over to the flow texture to make it much more…busy!
Keep in mind that for the rain flow too, all steps in this section still take place on the updated RGBA texture we got from the previous step, BUT they happen in the A channel only, the RGB channels remain intact here too.
So, a few lines are drawn on the Alpha channel of the ripples/diffuse texture; these lines will shortly be used as a base for the water flow. Drawing these ugly lines is done using the exact same water drop diffuse sampler used in the previous steps.
Now the lines texture data is copied together with the water dripples, so both end up on top of each other in the alpha channel of the ripples/diffuse texture; as we don’t need this channel anymore to hold anything related to ripples, from now on the alpha channel will be holding the “flow” info. Remember, we are still doing all that on the Alpha channel only, so RGB is still not touched.
Now we have a base for the Rain Flow & it is ready for the next step.
Alpha Channel Note
Just because this Alpha channel has been through a lot, here are all the steps for the alpha channel of the Rain Ripples (Rain Diffuse) texture: from holding the “Ripples” info, to holding the “Dripples” info, ending up with the “Flow” info. This channel served as a blackboard for drafting things!
vii.Rain Flow Distortion & Sharpen [Compute]
Using a distortion texture (similar to the Dripples), the Flow (Alpha) is copied to its own new (and final) texture and a slight distortion is applied to it; this texture will act later as a normal map.
Rain Distortion Constant Buffer
struct RainDistortion
{
    float _fDistortionFactor;
    float _fSharpenNormals;
    float _uiTextureWidth;
    float _uiTextureHeight;
}
viii.Rain Flow Mips [Compute]
Just generate mipmaps for that normal map-y thing! to end up with the standard for the rain textures: 9 mips from 256*256 to 1*1.
Put it all together
And to put all steps together with the final frame
It always rains, even when not needed!
As mentioned earlier, the rain procedural textures are processed at the start of each frame, even while in an interior set and regardless of whether it is rainy or not outside (like inside a house or so). So the rain procedural step happens all the time, but it doesn’t draw meaningful procedural textures all the time. And even if it does write some procedural textures, there is no guarantee that the output textures will really be used. What really defines whether the rain procedural textures are processed into valid textures (regardless of whether they are really used or not) is a flag/param _uiRainIsEnabled passed as part of the forward clustered PBR pass rain params struct (you can see it later as part of one of the structs), but regardless of whether this flag is On or Off, rain flow textures are processed & simulated most of the time at the beginning of each frame. Even at the game’s living main menu aka “Chloe”!
Here are some frames where there is no presence of rain, but which still have rain flow textures (ripples too) processed 100% as if the effect will show in the final frame.
While we ended up with nice rain flow textures for this previous set of frames, the coloring passes down the road had the value _uiRainIsEnabled of the rain parameters struct set to 1. So it is not really used visually, but that flag made them simulate & produce correct flow textures!
And here are some frames of the other case, where there is no presence of rain as well, and the process still takes place, but in this case resulting in an empty flow texture per frame. And for these frames the _uiRainIsEnabled of the rain parameters struct was set to 0.
Pseudo Rain Flow
Rain Flow textures (and sometimes Rain Dripples) are not exclusive to showcasing the rain effect/mode, but are also sometimes used for non-rain purposes, things that are not that far from being considered rain. For example, when one of the protagonists gets wet after coming out of a pool or something, it is not rain, but it is a “water flow” on top of the body after all!
Rain, Compute, Fun!
When taking a GPU capture, games (any game) will usually go into a frozen state. At least the game & render threads sleep while something like the audio thread keeps chilling in the background to run the last request/s it got from the game thread before it went to sleep for a while, and this is expected, we need that mechanism so we can record all the API calls with all their data & stuff at that exact frame (data write/transfer out of the question now) for a later re-play. Now the fun part: due to the dependency relationship between the frame drawing & the rain compute dispatches, which in their turn seem to be based on sampling the game clock time value sent by the CPU, the compute will jump in time once capturing the frame is completed, and process all the frames we missed during the capture duration into the same rain flow texture within the first unfrozen frame, and that results in “showering” or “sweating badly” distortion textures instead of rain distortion textures, which looks funny!
And it all makes sense: as mentioned earlier, the generation of these flow & dripple maps is 100% done on the Alpha channel only; when you look at all the outputs of the rain procedural textures generation, you would notice the change in the Alpha channels only, and in what is based on them (like the final normal maps), but something like the diffuse color still looks normal & unchanged, because it is not based on the same matrices & it is done in its own 100% fragment based steps (compute was involved only for mipmaps generation).
Here are the detailed outputs for that Markus showering rain example..
And here is one of them in action
Just a full video of that previous showering gif
and i truly don’t blame you if you thought about this meme!
2.Caustics
Yet another step that runs regardless of whether it is really needed to contribute to the frame or not. Here, in a much simpler process than Rain, the engine will work on simulating or generating a texture used for caustics-like effects.
i.Draw Caustics Texture
A fragment shader draws some caustics using multiple intensity samplers in addition to a distortion texture. The output texture most of the time is 256*256, but sometimes a higher resolution is required (not sure based on what metrics) and hence the output caustics texture is 512*512.
ii.Caustics Texture Mips [Compute]
Generate mips of the caustics texture, to end up with 9 mips, from 256*256 to 1*1.
iii.Draw Caustics Texture 2 [Not Always]
An updated caustics texture, a second one, and i’m not sure what it is used for, as all the time the earlier generated caustics texture is the one used. The exact same intensity samplers are used, but this time the distortion texture is quite different.
This is not always taking place, as many times only a single caustics texture is generated & used.
iv.Caustics Texture 2 Mips [Compute][Not Always]
Generate mips of the caustics texture, to end up with 9 mips, from 256*256 to 1*1.
Put it all together
And to put all steps together with the final frame
And for that, a few parameters are required for the caustics drawing, just the amount of distortion & the speed of the pattern animation.
Caustics Constant Buffer
struct ProceduralCaustics
{
    float _fDistorsionPower;
    float _fSpeed;
    int _iPadding0;
    int _iPadding1;
}
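A hedged sketch of what such a caustics draw generally looks like: scroll the distortion texture by _fSpeed over time, warp the lookups into the intensity samplers with it, and combine the layers (the time uniform, bindings & the multiply combine are my assumptions, not Detroit’s confirmed math):

#version 450

layout(set = 0, binding = 0) uniform sampler2D u_Intensity0;
layout(set = 0, binding = 1) uniform sampler2D u_Intensity1;
layout(set = 0, binding = 2) uniform sampler2D u_Distortion;
layout(set = 0, binding = 3) uniform ProceduralCausticsParams
{
    float _fDistorsionPower;
    float _fSpeed;
    int _iPadding0;
    int _iPadding1;
};
layout(set = 0, binding = 4) uniform FrameTime { float u_Time; }; //hypothetical

layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec4 out_caustics;

void main()
{
    //Animate the distortion lookup over time, then warp the intensity lookups by it.
    vec2 flow = texture(u_Distortion, in_uv0 + vec2(u_Time * _fSpeed, 0.0)).rg * 2.0 - 1.0;
    vec2 uv = in_uv0 + flow * _fDistorsionPower;

    //Two differently scrolled intensity layers multiplied together give the cell-like caustics look.
    float a = texture(u_Intensity0, uv + vec2(0.0, u_Time * _fSpeed)).r;
    float b = texture(u_Intensity1, uv - vec2(u_Time * _fSpeed, 0.0)).r;
    out_caustics = vec4(vec3(a * b), 1.0);
}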
As mentioned earlier, caustics generation is always there, regardless of whether it is needed or not. Here are a few examples of frames that had caustics textures generated while you can’t really spot any existence of caustics effect utilization!
Caustics itself is not easy to spot in the game, i agree, regardless of whether it is gameplay or a cut-scene. It works like a hidden ninja soldier, it adds a subtle effect you can feel, but you can’t really easily spot it all the time. But there are areas where it really shines, and in order to see the actual impact or contribution of a caustics texture to the final presented frame, here are some final frames before & after i did some shader modifications in order to remove the instruction block responsible for sampling & mixing the caustics texture into the affected areas (there is no graphical option in the game settings menu to toggle the effect, so i had to improvise!), and you can tell how much it adds to the final frames!
It really is adding some nice subtle touch!
Clear Previous ZPrepass/Depth
Just clear the previous frame’s ZPrepass output in order to write to it shortly for the current frame.
Clear Previous Motion Vectors
Nothing really fancy here either, just fill the motion vectors rendertarget (that contains the previous frame’s motion vectors) with solid black, so we can start drawing into it very soon after the ZPrepass.
Keep in mind, what we clear here is the main rendertarget for drawing motion vectors; the data that used to be in there was copied earlier at the start of the frame to another temporary target during the long copy/clear queue.
Z-Prepass/Depth Part 1 (Opaque/Static Geometry)
The Z-Prepass became almost an industry standard, and a ton of engines & games adopted it by heart as a for-granted shading optimization technique, and there is no wonder in finding a game such as Detroit having it; it is the type of game that strives for every precious fraction of a millisecond in order to deliver photorealistic realtime frames that are as realistic as possible.
The Z-Prepass here was literally following the formula of hitting many birds with a single stone (not a tiny stone though); it does not only benefit shading later by avoiding fragment overdraw, but is also utilized by multiple other things such as hair rendering (and other cut-out / alpha-tested geometry), SSAO/HBAO, Shadow Maps Z-partitioning, SSR, Clustered light culling, Decals,…and many more!
The Z-Prepass in Detroit is a full Z-Prepass, but it is neither fully done in one go nor in a single pass, it takes a few distinctive (and logical) steps to fully complete. In the first step, with backface culling and only the power of the vertex shader, a regular old friendly Z-Prepass is completed for all opaque geometry in the scene. The good sign here is that this pass uses a vertex description that contains the vertex positions only, which is a good sign of granular optimizations.
The largest piece of the Z-Prepass is done here, let’s say 80% of the final Z-Prepass!
✔️Vertex Shader
❌Fragment Shader
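A depth-only pass like this boils down to a position-only vertex shader with no fragment stage bound at all; a minimal hedged sketch (the push constant matrix is an assumption):

#version 450

//Position is the only attribute this pass needs; everything else stays in the full shading pass.
layout(location = 0) in vec3 in_position;

layout(push_constant) uniform Object { mat4 u_WorldViewProj; };

void main()
{
    //No fragment shader is attached: the rasterizer writes depth on its own,
    //which is exactly what the “✔️Vertex / ❌Fragment” combination above means.
    gl_Position = u_WorldViewProj * vec4(in_position, 1.0);
}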
Motion Vectors Part 1 (Opaque/Skinned Geometry)
After the 1st step of the Z-Prepass, and this time through a fragment shader, comes the process of generating the motion vectors data/texture that will be needed a lot later for a wide range of effects (TAA, SSR, Blur,…etc.). The geometry processed here is unique, and not yet processed in the previous step of the Z-Prepass. This is a dedicated render pass; the color output of this renderpass is the motion vectors texture, but the depth output is holding something else (next step).
The output here (3840*2160 of R16G16_FLOAT) is indeed not including everything that moves, it seems to be for skinned meshes only (Characters, clothing, Trees, Grass,…etc.). There are indeed some objects missing from the motion vectors texture, but these come later…quite a bit later than expected!
✔️Vertex Shader
✔️Fragment Shader
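Per-pixel motion vectors of this kind are typically the screen-space delta between the current and previous frame’s positions of the same vertex; a hedged fragment sketch (the previous-frame clip position, presumably built with last frame’s matrices & the skinned positions kept by the earlier copy step, is an assumption):

#version 450

layout(location = 0) in vec4 in_currClip; //current clip-space position, passed from the vertex shader
layout(location = 1) in vec4 in_prevClip; //same vertex transformed with the previous frame’s data

layout(location = 0) out vec2 out_motion; //written to the R16G16_FLOAT target

void main()
{
    //Perspective divide, then NDC -> [0,1] screen space; the difference is the motion vector.
    vec2 curr = (in_currClip.xy / in_currClip.w) * 0.5 + 0.5;
    vec2 prev = (in_prevClip.xy / in_prevClip.w) * 0.5 + 0.5;
    out_motion = curr - prev;
}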
Z-Prepass/Depth Part 2 (Opaque/Skinned Geometry)
This is not really an individual step, it takes place as part of the previous step, so basically it is a single pass that does the motion vectors (output to color) + the characters/skinned Z-prepass geometry (output to depth).
You can’t really consider this pass a Z-Prepass, because here it is writing via the fragment shader in addition to the vertex shader, so the output is not depth only, but i consider it part (or a step) of the Z-Prepass because it adds to the depth output of the Z-Prepass in order to include more “opaque” geometry.
A much smaller piece of the Z-Prepass is done here, let’s say 15% of the final Z-Prepass!
of course these % vary based on the screen space occupied by characters, it is just for demonstration, but in an average frame it would be near this number
✔️Vertex Shader
✔️Fragment Shader
Hair Accumulation
This is a very important step/pass for hair rendering. Once this short step is done, anything that is related to hair rendering would need the output of this step (this includes the very next step, which is the last step of the Z-Prepass steps).
With the help of this step, Detroit avoids fully transparent hair rendering, while at the same time keeping the hair rendering quality close to fully transparent hair, and at the same time avoiding fully alpha-tested hair most of the time. This is done by splitting the hair between two types, with hopefully the larger portion of the hair rendered as Opaque, and the tinier portion (that will trick you) rendered as Transparent. The idea of the hair accumulation is simple: it allows using transparency and alpha-tests while reducing the amount of overdraw to nearly none (in terms of hair cards). The end result is very recognizable hair cards, but only partially translucent, as a huge part of the hair is rendered as opaque cards.
So a single hair mesh is rendered in two passes (Opaque and Translucent), but it is the Hair Accumulation that decides which hair pixels go into which pass (Opaque or Translucent).
Shame on EA, BioWare and Veilguard graphics team!
If you worked in games or tech, you probably met at some point in your career that co-worker who comes in person and in private (not in meetings, not in public team chat) & asks you questions about their task as they can’t progress in it, and you answer them with all the details you know about the topic because you are a nice person. But you then at the next day in a big team meeting or a standup sync, when they asked about that task progress, they explain the exact same things you told them (even exact wording at some times), without crediting you or mention that you helped them through the topic, and sometimes they even go further a mile and would say that they “i figured it out” or “i did some research and found out that …”. You know that guy,…this is EA & BioWare here!
Today, right now, it is the 13th of November 2024. This Detroit Hair Accumulation section was written a long time ago, and the entire article itself is almost done, i’m just revising all sections trying to make the wording clearer, moving paragraphs of the same section where they may fit better & adding some extra footage, and getting ready to release it as soon as i can. i went to visit twitter for a break, and then an article came across my path from EA’s blog, about the hair tech in Veilguard, and while i was reading it my mouth was open in shock, it is very familiar, it is the exact same technique i broke down in this article months ago. And there is no problem in that at all, it is cool to see similar techniques in different games; the problem that shocked me was when i went through EA’s entire article, and earlier in it, they (the Veilguard team) were crediting themselves for the “development” of this “new” technique, and they never mention or reference anything, like Detroit’s GDC talk for example!!
No mention of previous works, no mention that this was done in other games before, no mention of any papers or GDC talks, no mention of anyone! Keep in mind, Detroit shipped with this technique 6 years ago, which means the tech is older than that release date (regardless of whether it was innovated by Quantic Dream or not).
Nothing can upset me more than not crediting others where they deserve crediting, or not being honest or transparent about what you do, or self-crediting as a pioneer of something you clearly mimicked from a REAL PIONEER! So, i don’t regret putting that header above! Such a mentality is bad for this industry! Shame it comes from such a big team!!
Note: this is not a hate letter or me trying a new way of burning bridges with EA. Nope! i have some friends and previous co-workers & some nice legit twitter friends who work under EA, and i still love some games from EA, but this is simply me, i don’t like this type of crappy behavior, specially when i see such a thing in a work/gaming related topic!
Not always outputting!
This step takes place all the time indeed, it is a core step of Detroit’s rendering pipeline, but it does not always fully complete and output something useful. From my observation, only some hair materials (or material flags), used by the main story characters that you see most of the time (not only playable ones, but characters such as Hank or North), end up with accumulated hair output from this step, while secondary (guest) characters (like Chloe, Daniel, Elijah, Todd, Amanda,…etc.) that don’t appear as much over the course of the ~15 hours of story end up with empty Hair Accumulation renderpass attachments, as they just get away with alpha-tested hair. For example, in all these frames below, the Hair Accumulation renderpass attachments were always solid black, despite the fact that there are all sorts & types of hair!
1.Clear previous Hair Accumulation
The first step is to take the Hair Accumulation renderpass attachments from the previous frame and clear them to solid black. Nothing fancy. Clearing the previous frame’s data from the attachments is something that happens all the time, even in cases where there is no Hair Accumulation to do for the in-progress frame (like the previous set of images), and it makes sense to clear all the time, for example just in case this frame is a totally different camera view (camera cut).
2.New Hair Accumulation [Not Always]
Remember, all the upcoming effort is only to reduce the amount of transparent hair to the minimum-est minimum amount!
i.Tweaked Alpha (Color)
In lower resolution attachments (1/4 of the target resolution), and with the use of the hair cards alpha texture/s (size & format of that texture vary per hair type/mesh), the tweaked alpha transparency for specific characters’ hair materials gets drawn. Let’s look at a capture that features some nice hair (the Kara & Alice capture that we used in the previous step uses hair materials that don’t output a Tweaked Alpha).
ii.Opaques (Depth)
Using the same hair cards alpha texture, accumulation takes place, and when too much accumulation happens (a certain coverage percentage), the pixels covered by these hair cards are considered opaque, and hence drawn to the output Depth render target (1/4 of the target resolution) as solid.
So to put it together
And here is another example that has much more motion (character & view), so you can observe the difference between the given Tweaked Alpha & Opaques & the output ones. Also you can observe that the secondary character doesn’t have any Hair Accumulation processed, only main characters, remember!
The hair alpha here is different, it is a 1024*512 – BC1_SRGB
Also keep in mind, it is not always a single hair alpha texture used for a single character, sometimes there are more textures if the case requires that. For example Hank (the funny detective) has, in addition to the John Wick hair, a long beard, and hence he uses two alpha textures to draw, which makes it 4 steps instead of two (beard tweaked alpha, hair tweaked alpha, beard opaques, and hair opaques).
And the draw steps for Hank alone are something like that:
And the more hair materials that support Hair Accumulation, the more steps take place to produce the Tweaked & Opaques images (aka more main characters in the scene = more draw commands).
Now, something that didn’t really make sense to me, and i did not investigate further: both steps (Tweaked Alpha and Opaques) use the exact same couple of vertex & fragment shaders, so why are they done in two different draw commands per hair mesh?! Why not draw Tweaked + Opaque at once per mesh (once for beard, and then once more for hair)!?
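To make the coverage idea above a bit more concrete, here is a minimal sketch of the two outputs. The game reportedly reuses the same vertex & fragment shaders for both, so splitting them into two entry points here is purely for readability; texture names, blend state and the coverage threshold are my own assumptions, not values reversed from Detroit.
Hair Accumulation Sketch (assumed, not Detroit’s shader)
Texture2D<float4> g_HairCardAlpha : register(t0);
Texture2D<float> g_AccumulatedCoverage : register(t1); // the Tweaked Alpha target from the 1st draw
SamplerState g_LinearSampler : register(s0);
struct PSInput { float4 vPos : SV_Position; float2 vUV : TEXCOORD0; };
// "Tweaked Alpha" target: additively accumulate hair card alpha per pixel (additive blend state)
float4 PSAccumulateTweakedAlpha(PSInput i) : SV_Target
{
    float fAlpha = g_HairCardAlpha.Sample(g_LinearSampler, i.vUV).r;
    return float4(fAlpha, fAlpha, fAlpha, fAlpha);
}
// "Opaques" target: once accumulated coverage passes a threshold, the pixel counts as opaque
float PSWriteOpaqueDepth(PSInput i) : SV_Depth
{
    float fCoverage = g_AccumulatedCoverage.Load(int3(i.vPos.xy, 0));
    if (fCoverage < 0.9) // assumed coverage threshold
        discard;
    return i.vPos.z; // covered enough -> write solid depth into the 1/4 res target
}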
Z-Prepass/Depth Part 3 (Alpha-Tested Geometry)
This is the final step of the Z-Prepass, to write the last few pieces of geometry that didn’t make it yet, for obvious reasons (alpha-tests). Things processed here are hair-like,…or usually it is hair only! Things such as head hair, eyebrows, beard,…etc. A simple fragment shader runs that does discards/kills based on the sampled texture (like the ones below for that Kara & Alice bedtime scene). And of course, the alpha tests here are all done with the help of the hair accumulation textures that were generated at the previous step.
And finally, the smallest piece of the Z-Prepass is done here, let’s say about 5% of the final Z-Prepass!
✔️Vertex Shader
✔️Fragment Shader
And to put this last couple of steps in perspective, from taking the previous Hair Accumulation, to generating new ones, to using them to update the Z-Prepass with hair info, it is something like that (watch out for the hair focused captures, as this Kara & Alice bed-time capture is not showing much hair accumulation detail due to the material used for Kara’s & Alice’s hair)
Previous Tweaked Alpha, Previous Opaques, Hair Cards Alpha, Current Tweaked Alpha, Current Opaques, Previous Z-Prepass part 2 (Skinned), Z-Prepass part 3 (Alpha-Tested) and finally Swapchain
By now the entire Z-Prepass is ready, and to put it all together, it is like that (3 passes altogether)
i know that some would consider this design a partial Z-Prepass and not really a full Z-Prepass as i mentioned at the beginning of that section, but i tend to consider it a full prepass due to the final output. Yes, it did involve a fragment shader at some point, and it was not done in one go in a single pass, but this doesn’t negate the fact that the final depth image was ready before anything & everything else. It’s more about the intention behind it, the time it executes & the role it plays. Think about it like deferred’s gbuffer (some call it a g-prepass): if it took place over multiple render passes, let’s say 3 passes, due to some reasons such as different shader bindings, would this change the fact that all these 3 passes eventually represent the full g-prepass?
Anyway, because producing the Z-Prepass was not done straight in one go, and there were a few other tangled steps in the middle, here are the step outputs from the Z-Prepass 1st step, ending up with the full & final Z-Prepass for the frame.
And a few more examples that have much more motion going on
the hair was either far enough away or in a specific camera view, to be considered fully Opaque
during the “Hair Accumulation” step.
Depth Texture
Taking the full & ready Z-Prepass Depth/Stencil attachment and converting it to a Depth Texture. So from D32S8 to D32, but still the same size.
Depth Texture 1/2 + Low Res Depth
Using a fragment shader to down sample depth and output two different versions of the 1/2 depth with different formats (needed at the next step for HBAO).
i wanted to leave a note here that this new Low Res depth (red guy) is made of 11 mips, as this is very important to keep in mind. We only draw to mip 0 (the 1st mip) the 1920*1080 (1/2 of the original frame res) depth in R32_FLOAT format, while the rest of the mips are still holding whatever they held since the previous frame. This texture, as mentioned, is needed for HBAO, which only needs mip 0 & mip 1; so mip 1, which we didn’t modify, is still holding data from the previous frame (which is important for the HBAO implementation as you will see below).
Because the two mips are two different sizes, you may find it hard to spot any difference, and you may think that the 540p is just a scaled down (mip) version of the 1080p, but in fact it is not; here are the differences between both
It may not be visible in such a frame (Kara & Alice) due to it being a very subtle scene in terms of movement (camera & characters), considering that at this moment there was only a very very subtle camera movement towards Kara & Alice and the characters were almost frozen, but this can be observed easily in other frames that have more action & motion, like frames captured from a chase or running sequence.
So clearly, at this point, mip 1 is still holding the old R32_FLOAT depth from the previous frame
HBAO [Compute][Not Always]
Only if AO is enabled in Video Settings (which is the case for all my captures) does the game go through this step. SSAO here seems to be one of the many unique-yet-close HBAO implementations; to me it looks a lot like Frostbite’s 2012 one called “Stable SSAO in Battlefield 3 with Selective Temporal Filtering”. Let’s see how this works!
1.Down Sample
First, we start with downsampling the given linear depth to 1/2, or better said to 1/4 of the target resolution (considering the given linear depth is already 1/2 of the target resolution from the previous step). So the depth goes from 1920*1080 (half done already) to 960*540; this new output is not stored in an individual texture, instead it is stored in the 2nd mip of the texture (mip 1), the mip that was still holding data in the previous step. The rest of the 11 mips remain solid black, and not utilized.
Depth Down Sample Constant Buffer
struct PostProcessingDownSample4xDepthMin
{
int2 _viSrcSize; //1920, 1080
int2 _viDestSize; //960, 540
float2 _vfSrcTexelSize;
int _iPadding0;
int _iPadding1;
}
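What this boils down to, based on the constant buffer above, is a 2x2 min reduction from the half-res depth into mip 1. Here is a minimal compute sketch of that idea; entry point name, register slots and the min convention (the struct name suggests min, which with reversed-Z would actually pick the farthest sample) are my assumptions.
Depth Min Down Sample Sketch (assumed, not Detroit’s shader)
Texture2D<float> g_SrcHalfResDepth : register(t0);      // 1920*1080, mip 0
RWTexture2D<float> g_DstQuarterResDepth : register(u0); // 960*540, bound to mip 1
cbuffer cbDownSample : register(b0) { int2 _viSrcSize; int2 _viDestSize; float2 _vfSrcTexelSize; int _iPadding0; int _iPadding1; }
[numthreads(8, 8, 1)]
void CSDownSample4xDepthMin(uint3 id : SV_DispatchThreadID)
{
    if (any(id.xy >= (uint2)_viDestSize)) return;
    int2 src = int2(id.xy) * 2;
    float d0 = g_SrcHalfResDepth.Load(int3(src, 0));
    float d1 = g_SrcHalfResDepth.Load(int3(src + int2(1, 0), 0));
    float d2 = g_SrcHalfResDepth.Load(int3(src + int2(0, 1), 0));
    float d3 = g_SrcHalfResDepth.Load(int3(src + int2(1, 1), 0));
    g_DstQuarterResDepth[id.xy] = min(min(d0, d1), min(d2, d3)); // keep one sample per 2x2 block
}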
2.HBAO & Classification Mask
Now with the Depth, Low Res Depth, Blue Noise as well as the previous frame’s HBAO output, the HBAO algorithm runs to output the Packed Depth & AO as well as a Classification Mask that is needed to stabilize the AO result against any possible “trailing” or “flickering” effects.
Using the full target resolution depth here is important, so later when we do the blur, we do it in full resolution, which helps avoid bleeding across edges.
In this output “Classification Mask”, black means “Unstable” pixels while white means “Stable” pixels. Unstable means the pixel is possibly going to either trail or flicker.
HBAO Constant Buffer
struct PostProcessingHBAO
{
float4 _vfProjSetup;
float4 _vfProjCoef;
float4 _vfScreenRatio; //1.00, 1.00, 0.00, 0.00
int4 _viNeoScale; //1, 1, 2, 2
float _fDefRadius; //100.00
float _fDefNegInvSqrRadius; //-4.00
float _fDefSqrRadius; //10000.00
float _fDefMaxRadiusPixel; //85.00
float _fDefAngleBias; //0.2618
float _fDefTanAngleBias; //0.26795
float _fDefExponent; //1.00
float _fDefContrast; //1.60
float2 _vfDefFocalLen; //5.44595, 9.68169
float2 _vfDefInvFocalLen; //0.18362, 0.10329
float2 _vfDefResolution; //3840.00, 2160.00
float2 _vfDefInvResolution; //0.00026, 0.00046
float2 _vfDefAOResolution; //3840.00, 2160.00
float2 _vfDefAOInvResolution; //0.00026, 0.00046
float2 _vfDefAORealResolution; //1920.00, 1080.00
float _fUpsamplingFactor; //1.00
float _fDefHQDistance; //20.00
float _fFrameIndex;
float _fMaxRadiusMip0Squared; //16.00
float _fMaxRadiusMip1Squared; //64.00
int _iPadding0;
int _iMouseX;
int _iMouseY;
int _iPadding1;
int _iPadding2;
}
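For readers who haven’t met HBAO before, here is a heavily simplified sketch of the kind of work this compute does with fields like the ones above (_fDefRadius, _fDefAngleBias, _fDefExponent): march a few screen-space directions, find the highest horizon angle, and turn it into occlusion. This is not Detroit’s shader; the direction/step counts, the view-space helpers (declared but not defined here), the blue noise jitter and the whole temporal/classification part are assumed or omitted.
HBAO Sketch (assumed, not Detroit’s shader)
float3 ViewPosFromDepth(float2 uv); // assumed helper: depth -> view-space position
float BlueNoise(float2 uv);         // assumed helper: per-pixel blue noise value
float ComputeHBAO(float2 uv, float2 invResolution, float radiusPixels, float angleBias, float exponent)
{
    float3 P = ViewPosFromDepth(uv);
    const int NUM_DIRS = 4, NUM_STEPS = 4;
    float ao = 0.0;
    for (int d = 0; d < NUM_DIRS; d++)
    {
        float angle = (d + BlueNoise(uv)) * (6.2831853 / NUM_DIRS); // assumed blue noise jitter per pixel
        float2 dir = float2(cos(angle), sin(angle));
        float maxHorizon = angleBias; // start at the tangent/angle bias
        for (int s = 1; s <= NUM_STEPS; s++)
        {
            float2 sampleUV = uv + dir * (radiusPixels * s / NUM_STEPS) * invResolution;
            float3 V = ViewPosFromDepth(sampleUV) - P;
            float horizon = atan2(-V.z, length(V.xy)); // elevation of the sample above P (sign depends on view-space convention)
            maxHorizon = max(maxHorizon, horizon);
        }
        ao += saturate(sin(maxHorizon) - sin(angleBias)); // occlusion contributed by this direction's horizon
    }
    return pow(saturate(1.0 - ao / NUM_DIRS), exponent); // exponent/contrast shaping
}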
3.Mask Dilation
Taking the “Classification Mask” from the previous step, and stabilizing it by dilating the unstable pixels.
From the values passed to this compute shader, you can see that this step works with either 0 or 255 (it also uses the viewport size + 1 to avoid any artifacts at screen edges).
Mask Dilation Constant Buffer
struct DilateTiles
{
int2 _viViewportSize; //240, 135
int2 _viOutlinedViewportSize; //241, 136
int _iDilateMask; //255
int _iPadding0;
int _iPadding1;
int _iPadding2;
}
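As a rough idea of what such a tile dilation looks like: any stable tile (255) that touches an unstable neighbour (0) gets marked unstable too, so the temporal filtering errs on the safe side. Thread layout and texture names below are my assumptions, only the 0/255 values and the tile counts come from the buffer above.
Mask Dilation Sketch (assumed, not Detroit’s shader)
Texture2D<uint> g_SrcMask : register(t0);   // 240*135 classification tiles, 0 = unstable, 255 = stable
RWTexture2D<uint> g_DstMask : register(u0);
cbuffer cbDilate : register(b0) { int2 _viViewportSize; int2 _viOutlinedViewportSize; int _iDilateMask; int _iPadding0; int _iPadding1; int _iPadding2; }
[numthreads(8, 8, 1)]
void CSDilateTiles(uint3 id : SV_DispatchThreadID)
{
    if (any(id.xy >= (uint2)_viViewportSize)) return;
    uint result = (uint)_iDilateMask; // 255 = stable until a neighbour says otherwise
    for (int y = -1; y <= 1; y++)
    for (int x = -1; x <= 1; x++)
    {
        int2 p = clamp(int2(id.xy) + int2(x, y), int2(0, 0), _viViewportSize - 1);
        if (g_SrcMask.Load(int3(p, 0)) == 0) result = 0; // an unstable neighbour dilates into this tile
    }
    g_DstMask[id.xy] = result;
}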
4.Blur & Selective Temporal Filtering
Blur (Grainy Blur) that is done in full resolution in order to avoid bleeding across edges.
Grainy Blur Constant Buffer
struct PostProcessingHBAOGrainyBlur
{
int2 _viViewportSize; //3840, 2160
float2 _vfInputTextureInvSize; //0.00026, 0.00046
float _fNoiseDelta; //0.25
int _iPadding0;
int _iPadding1;
int _iPadding2;
}
And to put it all together, the entire HBAO workflow
And here are few more examples with some variations & different scene conditions (snow blizzard, fast motion, large crowd, interior with a nice rug,..etc.)
Example 1
Example 2
Example 3
Additional AO Notes
1.Low Contribution
Don’t let the AO image fool you, they always do. The impact of the final AO image is quite low, and even lower on characters. While you may be able to spot the AO contribution with the naked eye on things like buildings or walls (still low, but noticeable when shuffling frames), it is nearly impossible to observe it on characters most of the time, even in close-ups.
Let’s take this close-up of Markus at The Stratford Tower’s lobby.
With a tiny shader change (you don’t even have to fully reverse it, just modify where the HBAO texture is fetched), you can see the difference between having AO Enabled (my config’s default), Off and Exaggerated.
Not that much impact on character visuals! It is very subtle; here is another example where it can be observed much better, look by North’s neck or by the hair at the right side of the face (your left side).
Here is the original frame data
& the 3 different AO values as follow…
Heads-up… Color Spaces, Channels, Conversions & Stuff
During this HBAO section i’ve altered some images slightly in order to look fine as a PNG. Images used to look very “Red” due to the used format, and in many cases the output itself was inverted, so i inverted them back. Here are a couple of examples:
AO Image
Dilation Mask
2.Not all Skies are made equal!
While looking through some captures, i noticed the very clear presence of the sky sphere during the AO image processing. In these cases some parts of the frame would be impacted by the far away sky sphere, usually meshes that are far as well (like the edges of a building). Being a horizon/depth based AO technique, this may be possible to an extent, but what was more interesting is the absence of the sky from the AO processing in some other parts of the game, while it was very clear that there is a sky sphere present in the final frame. Here are a few examples (4 and 4): frames where the sky sphere contributes to the HBAO, and other frames where the sky sphere was absent from HBAO but contributes to the final frame.
while the last four examples do not have a skybox during the AO processing
Sky sphere during HBAO can result in things like that, but not the end of the world indeed.
Shadow passes
A total of 8 passes at most. Half of them for clearing atlases (or parts of an atlas) and the other half to draw new shadows. They’re as follows (in order of execution):
- 1 to 2 passes (most of the time 2) for Top View Sky Visibility.
- 1 pass for Directional Cascaded Shadow map.
- 1 pass for Point Lights Shadows.
- 1 pass for Spot Lights Shadows (including Close-up Shadows).
It is not always the case that you get the full shadow feature stack in a frame, for example in a daylight exterior you probably end up with just the Directional CSM, while in an interior you either get the Sky Visibility + Local Lights (Spot and/or Point Lights), or many times the Local Lights (Spot and/or Point Lights) only. And there are things that are in-between (like a balcony, a night time exterior,…etc.). The point is, there are 3 types of outputs, but not all of them, nor the same combination of them, is always present under the different conditions.
1.Clear
Before drawing to any of the shadow outputs, they need to be cleared first. Clearing is not done at full size (explained below), but the important point here is that clearing is usually done only for the targets that the game needs to draw to. So, if we’re in a scene that has CSM only, then we clear the CSM atlas only, which means fewer passes in general.
Considering this frame
i.Clear Top View Sky Visibility Atlas/es [Not Always]
ii.Clear Directional Cascaded Shadow Map [Not Always]
iii.Clear Spot Lights & Close-up Shadow Map [Not Always]
Tight Clearing
Clearing is not always very refined, and not “literally” clearing in the way you expect the word to mean. The area or the defined viewport that will be consumed to draw shadows for the current frame is the only space that gets cleared/freed in the target shadow atlas, and over time, with frames accumulating, this results in lots of left-overs which (i assume) would make debugging shadow maps quite confusing during development time.
Lots of things not only coming from previous frames or minutes of the current level,
but also things coming from previous levels too.
You may know, or may not, but clearing in this step is not really ‘Clearing’ in the way “clear” usually means for a texture/attachment/rendertarget, as we are speaking here about a depth attachment that uses a depth format, which makes things different. If you’re not familiar with what that means, let me explain it quickly:
Clearing here is done simply by filling a black (zero depth) value into the target rect area of a given “large” target shadow atlas (depth attachment) for the existing light sources. So it is not done via something like vkCmdClearColorImage as you may initially think based on reading the word “Clearing”, instead it is done via multiple vkCmdDraw calls (one per rect/viewport area of the shadow atlas). But what, or how, to draw?!
In a case like this, nothing would be easier than just calling a clear to empty everything, but vkCmdClearColorImage is not used, simply because it “can’t” be used here due to the API specs (it’s a depth attachment), which is known & fine. And because this step is not using any fragment shaders (remember it is a shadow pass, so vertex shader only), drawing here is not really “drawing” in the usual sense of the word; it is basically triangle coordinates sent to the vertex shader, a large enough triangle that covers the target rect/viewport area of the shadow atlas, using the triangle’s vertices’ X & Y positions in addition to 0.00 as the Z position (depth). And because we are working on a specific area of the atlas as a “viewport”, set by calling vkCmdSetViewport and vkCmdSetScissor, that large triangle doesn’t draw/paint anything outside the given area. So basically, in simpler words: draw the depth of a triangle that fully covers the given rect of the atlas, with a depth of Zero!
And in action, it is something like this
Red is the draw viewport (portion of the atlas),
and Yellow is the triangle to draw depth for within the given viewport.
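Such a “clear by drawing” vertex shader can be as small as the sketch below. The classic 3-vertex fullscreen-triangle trick and the exact depth convention are my assumptions (the pass binds no fragment shader, so only depth gets written); the CPU side is then just vkCmdSetViewport + vkCmdSetScissor to the atlas rect followed by a 3-vertex vkCmdDraw per rect.
Clear-by-Draw Vertex Shader Sketch (assumed, not Detroit’s shader)
float4 VSClearShadowRect(uint vertexID : SV_VertexID) : SV_Position
{
    // 3 vertices at (-1,-1), (3,-1), (-1,3) cover the whole viewport rect in clip space
    float2 xy = float2((vertexID << 1) & 2, vertexID & 2) * 2.0 - 1.0;
    return float4(xy, 0.0, 1.0); // z = 0.0 is the "clear" depth value written into the atlas rect
}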
2.Draw
i.Top View Sky Visibility [Not Always]
This one is not really a shadow pass that gives a shadow output, but because it is done in a similar way to the shadow passes, using the same vertex shader, it sticks next to the shadow passes and behaves just like them!
This sky visibility or top-view shadow pass outputs an atlas that can tell what the “sky” can see and what it can’t see (blockers), and this comes in handy later for things like rain, snow, footsteps/walkable areas or volumetric effects. So as long as there is a sky, and it is an exterior environment, this step is most likely going to take place (most of the time).
In order to cover the entire level space, regardless of whether you can reach it or not, it is done in one or two passes (usually two), where each pass is split into two splits of 1024*1024 of D16 (each pass outputs an atlas of 1024*2048). The two splits are just like regular shadow cascades, except that they’re not serving shadows. The splits are one from very far (less detail/res), and another one that is much closer (higher detail/res).
don’t worry, we will explore it closer below at the Footsteps section.
ii.Directional Cascaded Shadow Map [Not Always]
This is the Directional light (Sun Light) shadow atlas, which is usually made of a total of 5 splits: 4 splits for the environment + 1 split for characters (usually the main character). BUT, there are very few cases where the total number of splits is either 3 or 6.
Each of the splits is 1460*1460 (in most cases), so the vertical extent of the atlas varies depending on the number of environment splits, but it is usually 1460*5840 with 16bit precision (D16 format). Regarding the total splits count change, for example with 6 splits we end up with 1450*1450 per split instead of 1460*1460, which then makes the final atlas size 1450*8700.
iii.Point Light Shadow Map [Not Always]
Due to the nature of the game, this type of light shadow is not there all the time; for example in a daylight shot you probably won’t have any of these! So the current frame we are looking at for shadows unfortunately doesn’t have any! i did choose that frame because it can demonstrate a lot of shadow types in the Detroit engine, but sadly it is super hard to find a shot/frame that can demonstrate all types of shadowing (point light shadows are usually @ night, where directional light shadows are usually @ daytime!). But anyways, here is another frame example from a night time shot, which of course demonstrates point lights.
The atlas for the point light shadows is of size 8192*8192 (of D16), split into about 32*32 shadow blocks/squares/views (pick your favorite name!), or a total of 1024 shadow blocks/views in that atlas, which makes each block/view 256*256 in size, and each point light can take up to 6 of these squares for all the pointing directions. Most of the time, i observed that more than half of this atlas is empty, even at the most point-light-y night gameplay moments!
iv.Spot Lights & Close-up Shadow Map [Not Always]
The final & most common shadow atlas, which pretty much exists all the time regardless of the scene environment or the situation; as long as there are characters, there is most likely a spot lights and/or close-up shadow atlas. The so called “Close-up Shadows” are basically just local light sources (Spot), but with very very high quality (larger slices of the atlas) shadows, which is usually used with characters when they’re up-close, and meant for delivering much more cinematic shadowing for characters.
The spot & close-up lights atlas is usually of the size 11468*11468, with 16bit precision (D16 format) too.
The large parts with Markus, these are the “Close-up Shadows”,
and these are left-overs from previous frame of a close-up few seconds ago
More Shadow Examples
You can take a look at some extra examples (the idea remains the same as described above) so you can see not only the variations of enabled passes per level/scene or frame, but also the difference in atlas size and split sizes per atlas on the different occasions.
4 Passes
3 Passes
2 Passes
Eye Shadow (not the one girls put)
This step is essential for character shading in order to add some subtle self-shadows to the eyes later during the clustered shading pass. This step always takes place, even if you don’t see eyes (for example a single character on screen, the one you play with, and you are seeing its back due to the 3rd person camera), and it includes all characters in the frame, regardless of how important they are, or how far from the camera or out of focus.
1.Clear Previous Eye Shadow
The first step is to clear the eye shadows rendertarget from the previous frame’s data, just by drawing a fullscreen black triangle to the eye shadows dedicated render target.
Nothing fancy to show here, except a solid colored black rendertarget!
2.Draw Current Eye Shadow
Using a radial gradient sampler to wrap the eye shadowing mesh (a near-round unique mesh on top of the eye mesh that is used only for shadowing eyes), the dedicated rendertarget gets all eye shadows for all characters drawn into it.
The eye shadowing mesh used is something like the one below in this case
Here are some more examples with previous & current (this mesh rendering phase has no impact on motion vectors, showing prev & current for clarity only).
As mentioned, each character gets their eyes drawn, and this draw per character takes place even if the character’s eyes are not visible in the given view, like when a character is not facing the view in any possible way, and because of the backface culling, nothing draws! Here is an example of about ~12 characters where you have zero chance of seeing their eyes, and yet all of them draw eye shadow layers that are backface culled in most cases. Ironically, the only two characters whose faces you do see, due to being very far away at the top of the building, have a LOD that seems not to include the eye shadow (makes sense), and hence they don’t process.
Impact of eye self-shadowing
Here is a quick modification to the pipeline to clear the eye shadows rendertarget before clustered shading, so you can see the impact of that on the quality of character rendering in Detroit.
Yea, it’s hard to see in that last example =D they are either out of focus or have their eyes closed! But it is there!
So from without eye self shadowing, to self shadowed eyes, it is something like that
Android/Deviant Mind View [Not Always]
There are some times during the gameplay where the game switches to what you can consider a view of what is going on inside your current protagonist’s head. i call it “Mind View” or “Android/Deviant Mind View”; it shows you what you are thinking about and gives you the opportunity to make some story twisting decisions. There is another instance of it (with Connor only) where you are rebuilding a crime scene, but it is the same thing, same idea. All in all, this step is not always there, only when you enter the mind view.
This mind view is very unique, it caught my attention the first time i played the game, and i was curious to see when or how it is made. The idea is simple: there are some meshes (usually simple ones) drawn during this step into an empty rendertarget, and that rendertarget gets composited into the final frame near the end of the pipeline.
1.Clear Previous Mind View
Just draw a solid black color to get an empty slate for the current frame’s data.
2.Draw Mind View
Step by step, draw cmds are issued to process very simple meshes and end up with a full resolution Mind View map.
Drawing progresses as follow
And this is done with the utilization of some simple meshes (usually quads) and textures. Here is a sample of what is used to draw all the quads in Kara’s Mind View.
And for pretty much each character visible in that view, there is a simplified body mesh that at first glance seems to render twice (as you may have observed in the previous gifs), but that is not the case; they are two different meshes that have the exact same topology, except one of them has the faces inverted so it can render as outlines. And finally there is a 3rd mesh that is for the inner skeleton of the character.
And this is how both simplified character meshes differ from each other
Footstep Layer [Not Always]
This is almost always there; as long as there is something “outside”, it takes place. So you don’t have to be in an exterior open area for that, because you could be in a small cabin or location where you just came in from the outside, and this is enough for the “Top View Sky Visibility” to have some output results, and hence you will have a Footstep Layer step in the pipeline. In short, as long as there is a Top View Sky Visibility output during the shadow passes, there will be a Footstep Layer step in the rendering pipeline, regardless of whether there is really a use case for it in the frame or not. But let’s say you start the level in a room inside a building, then most likely you don’t have this step at all, with all of its sub-steps. Also keep in mind that not all outdoors have a Top View Sky Visibility, so these won’t have a footstep layer either!
Don’t let the name of the step mislead you, it is not really a step towards rendering the footsteps of the playable characters (not even the secondary or NPC characters); this step is pretty much generating some data (textures and channels) that holds information about the walkable areas and their details. You can think about it like “already existing” footstep data before you even touch the ground with your player. A layer that sits on top of the ground/terrain.
1.Clear Previous Footstep Layer
By now you should be familiar with the technique Detroit is following: before drawing anything, just fill its target rendertarget with solid black. Not all games do that, but it is a good technique, as you have resources initialized & bound once and you keep updating them, either with color data or with solid black to mimic clearing.
Here the Footstep Layer 512*512 B8G8R8A8_UNORM gets filled with black color.
2.Draw Footstep Layer
Drawing the footstep layer is done with the use of some normal and mask textures (brushes) that serve different goals, and with the help of the Top View Sky Visibility to define areas that are walkable.
i believe the shader was made with 2 variations in mind, and if only one is needed, then it is just bound twice in the texture array
The footstep layer holds a lot of information, and you can tell from the weird colors that each channel represents something different, things like Snow Coverage, Wetness, Roughness and Blocked areas (blocked from holding snow or water).
And here are some guidelines so you can relate this tiny 512 to the final view.
In this example, drawing the entire thing is done in one go; this is due to the scene construction & its simplicity, despite the fact it looks busy. After all, the entire walkable area in this frame is just the terrain, which is a single mesh (the one below), and hence it draws in a single draw.
But this is not the case for every scene/frame, and with more meshes composing the walkable areas, more masks & normal textures are used during the creation of the footstep layer. Let’s take another example, this one covers an area of about the same size in space, but it is made of concrete, which is basically several batches of meshes.
And of course, the Wetness, Snow,…etc. that makes the footstep layer
And to put it in perspective
And finally the draw steps for the concrete meshes that make up the playable area
3.Footstep Extracting [Compute]
At this last step, a compute shader runs to extract some data from the generated footstep layer texture based on a given “requested” number of footsteps and a given buffer of individual footstep parameters. The output buffer here (the result buffer) is possibly used later down the road in drawing the actual unique footsteps for the playable & NPC characters.
Footstep Extracting Buffers
struct FootstepLayerExtract
{
float4x4 _mFXFoostepLayerXWsToCs;
int _iFXFootstepContextRequestCount;
}
struct FX_FOOTSTEP_CONTEXT_REQUEST_BUFFER
{
float _fWsPositionX;
float _fWsPositionY;
float _fWsPositionZ;
float _fWsFrontDirectionX;
float _fWsFrontDirectionZ;
float _fWsRadiusFront;
float _fWsRadiusSide;
uint _uiHandlerIndex;
}
struct dyn_sFXFootstepContextRequestBuffer_Layout
{
FX_FOOTSTEP_CONTEXT_REQUEST_BUFFER sFXFootstepContextRequestBuffer[];//[2048]
}
struct FX_FOOTSTEP_CONTEXT_RESULT_BUFFER
{
float _fSnowLevel;
float _fWaterLevel;
float _fWetProgress;
uint _uiHandlerIndex;
}
struct dyn_sFXFootstepContextResultBuffer_Layout
{
FX_FOOTSTEP_CONTEXT_RESULT_BUFFER sFXFootstepContextResultBuffer[];//[2048]
}
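Based on the buffers above, here is a minimal sketch of what this extraction likely boils down to: for every requested footstep context, project its world position into the footstep layer space, sample the layer, and write the per-channel values to the result buffer. The channel mapping and the matrix multiply order are my assumptions.
Footstep Extract Sketch (assumed, not Detroit’s shader)
Texture2D<float4> g_FootstepLayer : register(t0); // 512*512 B8G8R8A8_UNORM
SamplerState g_LinearClamp : register(s0);
StructuredBuffer<FX_FOOTSTEP_CONTEXT_REQUEST_BUFFER> g_Requests : register(t1);
RWStructuredBuffer<FX_FOOTSTEP_CONTEXT_RESULT_BUFFER> g_Results : register(u0);
cbuffer cbExtract : register(b0) { float4x4 _mFXFoostepLayerXWsToCs; int _iFXFootstepContextRequestCount; }
[numthreads(64, 1, 1)]
void CSFootstepLayerExtract(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= (uint)_iFXFootstepContextRequestCount) return;
    FX_FOOTSTEP_CONTEXT_REQUEST_BUFFER req = g_Requests[id.x];
    // world space position -> footstep layer clip space -> [0,1] uv
    float4 cs = mul(_mFXFoostepLayerXWsToCs, float4(req._fWsPositionX, req._fWsPositionY, req._fWsPositionZ, 1.0));
    float2 uv = cs.xy / cs.w * 0.5 + 0.5;
    float4 layer = g_FootstepLayer.SampleLevel(g_LinearClamp, uv, 0);
    FX_FOOTSTEP_CONTEXT_RESULT_BUFFER res;
    res._fSnowLevel = layer.r;    // assumed channel mapping
    res._fWaterLevel = layer.g;
    res._fWetProgress = layer.b;
    res._uiHandlerIndex = req._uiHandlerIndex;
    g_Results[id.x] = res;
}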
Before leaving this step, here are couple more for reference (Snow & Wetness)
Cluster Forward Rendering
Now comes the most exciting step in the pipeline, where everything prepared so far gets put to actual use within the lighting fragment shader. The hard and complicated part (imo) of this pass was already done early in the frame by clustering the light sources & sorting them into a list; all that is needed here is shading the pixels in accordance with these lights in a PBR fashion.
Totally useless note
A few years ago i decided to add a 4th rendering approach to Mirage. It used to have Forward, Deferred and Path-tracing with horrible monte-carlo, and at that time i saw that if i wanted to really scale larger & produce something, clustered or forward+ was going to be the way to go for my engine. Shortly after getting it to work, and before i made any material about it (only hinted at in a tweet), i decided to retire Mirage in favor of a new engine with a better design, Delusion!
This step is very long, and it is divided into some distinctive steps; even though they are not necessarily different passes or such, by looking at a variety of captures you can get a sense of the sub-step layout of this long pass. Let’s see the progress of a frame (this frame below) across all these steps that are under the “Cluster Forward” shading..
Not Always
Heads-up & keep in mind, any of the following steps is not always there, of course except the 1st step. Sometimes there is no skin or characters on screen, or it is a dark place where there is nothing emissive,…etc. So, apart from the opaque step (1st step), all the other clustered forward shading steps could be absent, and that frame above is really a good case as it has pretty much all the steps that exist in Detroit’s PBR forward shading.
1.All Geo
The first step is to draw all opaque geo (excluding SSS, Hair,…etc.), using some core inputs such as the BRDF, Blue Noise, HBAO, Shadows, Rain and/or Caustics textures, and the SH 3d textures.
And the final output of this step, for all opaque geo, is a set of rendertargets that start empty/black at the beginning of the pass. These outputs are the Color (which will eventually be the frame to present), TAA Flags (needed for TAA), GI Specular, Normal Roughness (needed for SSR & others), Zone ID (interior vs exterior mask, so not always holding info, like in the current case) as well as Depth.
These previous inputs are the core inputs needed for shading, but not everything that goes into the fragment shader; the rest are very very mesh and PBR specific and can vary per mesh or draw cmd, either in their count or in their content. Let’s take Markus’ jacket from this capture as an example to explore the different textures used to light it in the PBR workflow.
These are the unique inputs used for shading the jacket’s fragments of the frame. A big variation of normal, roughness, color & other surface property textures, most likely for multiple and/or different layers (some parts are fabric, other parts of the jacket are leather,…etc.); the complexity of the material defines the amount of textures used!
And in action, it is something like that, from un-shaded fragments, to shaded ones.
Here are some buffer info used for that magic pass
Cluster Forward Buffers
struct S_INDICES
{
uint _uiObjectIndex;
uint _uiMaterialIndex;
uint _uiMaterialInstanceIndex;
uint _uiLastFrameMaterialIndex;
uint _uiLastFrameMaterialInstanceIndex;
uint _uiPositionsBufferIndex;
uint _uiPrevPositionsBufferIndex;
uint _uiNormalsBufferIndex;
uint _uiPositionOffset;
uint _uiPrevPositionOffset;
uint _uiNormalOffset;
uint _uiMatricesIndex;
}
struct S_MAIN_DEFAULT
{
S_INDICES[32] aMain;
}
struct S_RAIN_MASK
{
float4 _vfUvTransform;
float4 _vfDetailUvTransform;
float _fThreshold;
float _fTopFactor;
float _fSideFactor;
float _fDetailTopAndSideFactor;
}
struct S_RAIN_PARAMS
{
S_RAIN_MASK _RainWetProgressMask;
S_RAIN_MASK _RainWaterLevelMask;
S_RAIN_MASK _SnowLevelMask;
float4x4 _mRainCloudCascadeMatrix;
float4x4 _mRainCloudInverseCascadeMatrix;
float4[2] _vfRainCloudLinearZProjFactorsXY;
float4 _vRainCloudTextureTexelSize;
float4 _vRainCloudSplitDepth;
float4[2] _vfRainCloudStableRadius;
float4[2] _vfRainCloudLinearZProjFactors;
float4 _vfRainCloudWsDirection;
float4 _vfRainRipplesUvScale;
float4 _vfRainDropsUvScale;
float4 _vfRainFlowsUvScale;
uint _uiRainIsEnabled;
float _fRainCloudAttenuationFactor;
float _fRainSplatDepthDampingStart;
float _fRainSplatDepthDampingLength;
float _fRainMinTAAValue;
float _fRainOrSnowFactor;
float _fRainPower;
float _fDummy;
}
struct S_PROBE_GRID_INFO
{
float4 vGridOriginAndIntensity;
float4 vTexelAndCellSize;
float4 vGridEnd;
float4 vRotation;
float4 vColor;
float fOctreeSize;
uint iOctreeHashSize;
uint iOctreeHashSizeBits;
uint iOctreeMaxDepth;
}
struct S_LPV
{
S_PROBE_GRID_INFO _GridInfo;
uint _uiOctreeHashTableIndex;
uint _uiOctreeTexCoordsIndex;
uint _uiSHCoeffTexR;
uint _uiSHCoeffTexG;
uint _uiSHCoeffTexB;
uint _uiPad0;
uint _uiPad1;
uint _uiPad2;
}
struct S_LPV_ARRAY
{
S_LPV[32] aArray;
}
struct S_PASS_MATRICES
{
float4x4 _mViewJitteredProj;
float4x4 _mViewProj;
float4x4 _mView;
float4x4 _mInvView;
float4x4 _mJitteredProj;
}
struct S_EYE_DATA
{
S_PASS_MATRICES _mCurrentFrameMatrices[2];
S_PASS_MATRICES _mPreviousFrameMatrices[2];
float4x4 _mJitteredProjection;
float4x4 _mNoJitterProjection;
float4x4 _mInvViewJitteredProj;
float4[2] _vWorldSpaceCameraVector;
float4[2] _vWorldSpaceCameraPosition;
float4 _vFrameBufferScale;
float4 _vFrameBufferViewportSize;
float4 _vFrameBufferOneOnViewportSize;
float4 _vViewportSize;
float4 _vFogSubCrop;
uint _uiAlignment0;
uint _uiAlignment1;
float _vCameraJitterDeltaX;
float _vCameraJitterDeltaY;
float4 _vfRcpFVolumetricFogDepthTileSize;
}
struct S_CASCADED_SHADOW
{
float4x4 _mCascadeMatrix;
float4x4 _mViewSpaceToStaticShadowMapSpace;
float4 _vfCascadeShadowAttenuation;
float4 _vfShadowCascadeTextureTexelSize;
float4 _vfStaticShadowBlurRadius;
float4[8] _vfCascadeLinearZProjFactors;
float4[8] _vfCascadeParams;
}
struct S_CLUSTER_LIGHT_SETUP
{
S_CASCADED_SHADOW _CascadedShadow;
float4 _vfSkinUVParamsAndNormalBias;
float4 _vfShadowTextureTexelSize;
float4 _vfDummy_SkinFresnelF0AndF90;
uint _uiCascadedUsed;
float _fSSSWidth;
float _fOffsetBias;
float _fDummy;
}
struct S_CLUSTER_INFO
{
float4x4 _mViewToClusterView;
float4x4 _mClusterProjMatrix;
float4x4 _mClusterInvProjMatrix;
float4 _vfLeftEyeToCombinedEyeVsDeltaXZ;
uint _uiFinestMipWidth;
uint _uiFinestMipHeight;
uint _uiFinestMipDepth;
uint _uiFinestMipWidthHeight;
float _fZSliceGeomFactor;
float _fRcpLogZSliceGeomFactor;
float _fDummy0;
float _fDummy1;
float _fNearP;
float _fInvNearP;
float _fFarP;
float _fClusterCoincidentWithCamera;
}
struct S_PASS
{
S_EYE_DATA _EyeData;
float4[4] _avFrustumCorners;
float4 _vTime;
float4 _cFogColor;
float4 _vFogSetup;
float4 _cSunColor;
float4 _vSunDir;
float4 _vProjectionSetup;
float4 _vProjectionCoeff;
float4 _vOutputColor;
float4[2] _vfVsClippingPlane;
S_CLUSTER_LIGHT_SETUP _LightSetup;
S_CLUSTER_INFO _ClusterInfo;
float _fIBLIntensity;
float _fGIIntensity;
float _fAmbiantOcclusionDirectIntensity;
float _fAmbiantOcclusionIndirectIntensity;
float _fNormalSign;
uint _iFogType;
float _fSelfOcclusionDebugRatio;
float _fEmissive;
uint _iEnableSkinSSSSS;
uint _iUseHairOpaquePrepass;
float _fLocalSceneExposure;
uint _uiDebugMode;
uint _sFlags;
float _fTAADropsSharpness;
uint _uNeoScaleX;
uint _uNeoScaleY;
uint _uiDirectOverdrawFilter;
float _fCoderTest0;
float _fCoderTest1;
float _fCoderTest2;
uint _uiBakeOutput;
float _fDefaultAmbientIntensity;
float _fDefaultLightingIntensity;
uint _uiOutputColor;
float _fAARotation;
float _fClampDirectSpecular;
float _fNearPlane;
float _fFarPlane;
float _fNDFFilteringScreenSpaceVariance;
float _fNDFFilteringThreshold;
float _fNDFFilteringScreenSpaceVariance_EyeShader;
float _fNDFFilteringThreshold_EyeShader;
float _fSurfaceGradientTextureFiltering;
float _fZero;
uint _uDepthClamp;
uint _uiLightDebugMode;
float4 _vfFragCoordToDepthTexFactor;
float4 _vfFragCoordToDepthTexFactor_HalfRes;
uint _uHdr;
uint _uiDrawMayaSwatch;
float _fEyeRoughnessBias;
float _fDummy1;
}
struct S_LIGHT0
{
float4x4 _mVsToLs;
float4 _vfVsPos;
float4 _vfVsDir;
uint _uiType;
uint _uiVolumetricData;
uint _uiSceneZoneBits;
float _fSpHumbraOut_PrCUTranslate;
}
struct S_LIGHT1
{
uint4 _uiLightColorIntensity_Flags;
float4 _vfAtlasScaleBias;
float4 _vfAtlasScaleBiasStatic;
float4 _vfLinearZProjFactors_Static;
float4 _vfSpPrProjectedScale;
uint _uiGoboTrans;
uint _uiGoboScale;
uint _uiGoboIndex;
float _fPenumbra;
uint _uiScatteringAndShadowAttenuation;
uint _uiBlurRadiusNormalAndStatic;
uint _uiNearClipX;
uint _uiNearClipY;
float _fShadowBias;
float _fInvRadius;
float _fSpHumbraIn_PrCUScale;
float _fDummy;
}
struct S_LIGHT_ARRAY0
{
S_LIGHT0[512] _aArray;
}
struct S_LIGHT_ARRAY1
{
S_LIGHT1[512] _aArray;
}
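To tie these buffers to the actual shading, here is a rough sketch of how a clustered forward fragment shader typically walks the light list: find the froxel the fragment falls into (logarithmic Z slicing in the spirit of _fZSliceGeomFactor / _fRcpLogZSliceGeomFactor from S_CLUSTER_INFO), fetch that froxel’s light list, and accumulate lighting. Buffer names, the list packing and the simple Lambert-only lighting are assumptions for illustration, not Detroit’s shader.
Clustered Shading Sketch (assumed, not Detroit’s shader)
struct LightCompact { float3 vsPos; float invRadius; float3 color; float pad; };
StructuredBuffer<uint2> g_ClusterOffsetCount : register(t0); // assumed: per-froxel (offset, count)
StructuredBuffer<uint> g_ClusterLightIndices : register(t1); // assumed: packed light indices
StructuredBuffer<LightCompact> g_Lights : register(t2);      // assumed: simplified S_LIGHT0/S_LIGHT1 data
float3 ShadeClustered(float2 fragCoord, float3 vsPos, float3 N, float3 albedo,
                      uint3 finestMipDims, float invNearP, float rcpLogZSliceGeomFactor)
{
    // which froxel? XY from the screen tile, Z from a logarithmic slice of view depth
    uint2 tile = (uint2)(fragCoord / float2(3840.0, 2160.0) * (float2)finestMipDims.xy);
    uint slice = min((uint)(log(max(-vsPos.z * invNearP, 1.0)) * rcpLogZSliceGeomFactor), finestMipDims.z - 1);
    uint cluster = (slice * finestMipDims.y + tile.y) * finestMipDims.x + tile.x;
    uint2 offsetCount = g_ClusterOffsetCount[cluster];
    float3 lit = 0;
    for (uint i = 0; i < offsetCount.y; i++)
    {
        LightCompact l = g_Lights[g_ClusterLightIndices[offsetCount.x + i]];
        float3 L = l.vsPos - vsPos;
        float att = saturate(1.0 - length(L) * l.invRadius);            // crude distance falloff
        lit += albedo * l.color * saturate(dot(N, normalize(L))) * att; // Lambert only, no full BRDF/shadows
    }
    return lit;
}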
2.Skin, Eye & Teeth
Here we light things such as Skin, Eyes & Teeth, but we don’t do the full SSS yet, we just light the geometry with light sources & GI data (this geometry had left empty/black fragments in the rendertargets so far from the previous step) as well as light scattering in the surface, so we basically end up with a lit head without albedo & without smooth SSS.
Shading these “special” meshes requires a set of special textures as well. Just like before, we won’t be looking into each mesh, but let’s focus on shading the head of Markus (no hair, no teeth, no eyes, nothing but the pure head flesh mesh).
these are the textures required not only to detail the skin in terms of color and normal and output a lighting pass for the head, but also to fake wrinkles and to write some SSS info that is needed soon.
Middle, Wrinkles
Bottom, Coloring (SSS, Cubemap, Base Color, Roughness,…etc.)
And of course, here is the output of this step, the same set of rendertargets from the previous step, except that we wrote to them the new info regarding skin, teeth & eyes, with the addition of a single blur (horizontal). This is why the lit/color output from this step can also be referred to as “Diffuse with Horizontal Pass”, as it includes the diffuse lighting plus the 1st blur pass for Skin/SSS.
Bottom, output of this step (Skin, Eyes & Teeth)
Now you’ve seen the rendertargets that got changed, but this is not everything, as there is new data that did not exist before. Albedo values are drawn into an albedo stencil, mask values for skin/albedo are stored as well during this step within the alpha channel of the color output, and finally a stencil value goes into the depth output. These are needed to isolate the parts of the frame that will need to be processed in the SSS pass later (to form a stencil).
Keep in mind that the Stencil here is the final depth/stencil attachment, which is in the format D32S8; this is not the one used as input for most steps, this one is output only, storing info in it, while the one used as input (for shading or screen space effects) is the one of the format D32, to be sampled as depth.
Apart from that, the last output is the Skin/Eye/Teeth specular lighting output. Don’t let its look fool you, as most of the time it looks very funny & bizarre (and you probably know why), but if we use any of the previous output masks from this step (the Color’s A channel, Stencil or Albedo) we can truly see the skin specular lighting output.
All in all, a lot of outputs in this step that will all be needed later in the SSS.
Also while here, as you may or may not have noticed, the last rendertarget (aka Zone ID) in the outputs of this step is still solid black, so one thing i want to elaborate on: previously i mentioned that the Zone ID output is used to determine interior vs exterior space, used more or less like a mask (possibly for eye adaptation, light propagation,…etc.), and yet we don’t see anything and it is solid black. This is because this frame/scene is fully an exterior, but this mask rendertarget can shine elsewhere, here are some examples.
Black is Outside, White is Inside…regardless of whether you’re actually Inside or Outside…
3.Emissive
As the name implies, a bunch of emissive meshes! Not much lighting though, these are the meshes themselves; their light was either already done during the opaque step for the affected meshes surrounding them, or comes later in the pipeline (at the fake flares step or as a result of bloom).
Bottom, output of this step (Emissive)
And here are a couple more examples before & after drawing emissive to the frame
Middle, output of this step (Emissive)
Bottom, swapchain (for reference)
Keep in mind, this step is not always running as an individual step like in the previous examples; at some other times emissives are drawn as part of the opaque step (1st step), like the cases below
Middle, output of previous step (Skin, Eye & Teeth)
Bottom, swapchain (for reference)
And there are other cases where some emissive meshes are done at the early step (with the opaques at the 1st step) and at the later independent emissive step as well, such as the ones below.
4.Sky/Background
Just draw sky box or hemisphere
Something like that in action
This step doesn’t have to be really sky only, it can include the very far distant background planes. So it can be something like city buildings or dense foliage,…etc., but the idea remains the same, these are large far away planes surrounding the scene. Here are a couple of examples of such cases
5.Specular Lighting (IBL)
Outputting a Specular Lighting Map for Image Based Lighting with the use of lots of offline cubemaps (baked at cook/edit time & none captured at runtime)
Specular Lighting/IBL
struct S_CASCADED_SHADOW
{
float4x4[2][8][2] _mCascadeMatrix;
float4x4 _mViewSpaceToStaticShadowMapSpace;
float4 _vfCascadeShadowAttenuation;
float4 _vfShadowCascadeTextureTexelSize;
float4 _vfStaticShadowBlurRadius;
float4[8] _vfCascadeLinearZProjFactors;
float4[8] _vfCascadeParams;
}
struct S_CLUSTER_LIGHT_SETUP
{
S_CASCADED_SHADOW _CascadedShadow;
float4 _vfSkinUVParamsAndNormalBias;
float4 _vfShadowTextureTexelSize;
float4 _vfDummy_SkinFresnelF0AndF90;
uint _uiCascadedUsed;
float _fSSSWidth;
float _fOffsetBias;
float _fDummy;
}
struct S_CLUSTER_INFO
{
float4x4 _mViewToClusterView;
float4x4 _mClusterProjMatrix;
float4x4 _mClusterInvProjMatrix;
float4 _vfLeftEyeToCombinedEyeVsDeltaXZ;
uint _uiFinestMipWidth;
uint _uiFinestMipHeight;
uint _uiFinestMipDepth;
uint _uiFinestMipWidthHeight;
float _fZSliceGeomFactor;
float _fRcpLogZSliceGeomFactor;
float _fDummy0;
float _fDummy1;
float _fNearP;
float _fInvNearP;
float _fFarP;
float _fClusterCoincidentWithCamera;
}
struct S_CLUSTER_LIGHT_SETUP_INFO
{
S_CLUSTER_LIGHT_SETUP;
S_CLUSTER_INFO _ClusterInfo;
}
struct S_IBL_PASS
{
float4x4 _mInvProjMatrix;
float4 _vfProjSetup;
float4 _vfViewportTranslationScaling;
int4 _viNoiseOffset;
uint _uEyeIndex;
uint _uEyeIndexAlignment;
uint _uDebugDisplay;
float _fFarPlaneZ;
}
struct S_IBL
{
float4x4 _mVsToLs;
float4x4 _mLsToWs;
float4 _vfExtentSceneZone;
float4 _vfParameters;
float4 _vfLsBoxMin;
float4 _vfLsBoxMax;
uint _uUVSpecularEnable;
float _fUVSpecularBlend;
uint _uiSpecular0TextureIndex;
uint _uiSpecular1TextureIndex;
}
struct S_IBL_ARRAY
{
S_IBL[128] _aArray;
}
6.SSSSS (Screen Space Subsurface Scattering)
In this step we will end up with the soft skin, by applying the 2nd skin blurring pass, which is the “vertical” one, but this does not take place right away; below are the steps helping in achieving that.
Keep in mind, because the implementation here seems very loyal to the original implementation, it uses a stencil buffer to define the exact SSS pixels. Hence it is no wonder to see some calls during this entire step to things such as vkCmdSetStencilReference, vkCmdSetStencilCompareMask and vkCmdSetStencilWriteMask.
i.Prepare the Color Attachment
Just clear the rendertarget that will hold the final color data of that pass (a fullscreen triangle filled with black color).
ii.Vertical Blur
Earlier while drawing the head, we went through a 1st round of blur on the lighting/scattering result, the horizontal blur; at this step we go through the vertical blur, so we end up with the full cross blur filter (results for that frame may not be that clear, we come back to this soon below).
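For the curious, here is a minimal sketch of what the vertical half of such a separable screen-space SSS blur could look like. The kernel taps/weights and the depth-based kernel shrink are simplified assumptions (a real diffusion profile has per-channel widths), and the real pass is also stencil-masked to skin pixels only.
Vertical SSS Blur Sketch (assumed, not Detroit’s shader)
Texture2D<float4> g_DiffuseWithHorizontalPass : register(t0);
Texture2D<float> g_LinearDepth : register(t1);
SamplerState g_LinearClamp : register(s0);
cbuffer cbSSS : register(b0) { float _fSSSWidth; float2 _vfInvResolution; float _fPad; }
float4 PSVerticalSSSBlur(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // simple 7-tap kernel as a stand-in for the real diffusion profile
    const float offsets[7] = { -3, -2, -1, 0, 1, 2, 3 };
    const float weights[7] = { 0.03, 0.09, 0.22, 0.32, 0.22, 0.09, 0.03 };
    float depth = g_LinearDepth.SampleLevel(g_LinearClamp, uv, 0);
    float stepY = _fSSSWidth * _vfInvResolution.y / max(depth, 0.0001); // shrink the kernel with distance
    float4 result = 0;
    for (int i = 0; i < 7; i++)
        result += weights[i] * g_DiffuseWithHorizontalPass.SampleLevel(g_LinearClamp, uv + float2(0.0, offsets[i] * stepY), 0);
    return result;
}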
iii.Compose Skin to the frame
In this last step, we will be composing the albedo, the specular, as well as the smoothed lighting output (SSSSS), with the diffuse lighting output, in order to get a final lit color frame that includes the skin meshes as well (the last time we saw the frame, at the Sky/bg step, it did not include skin meshes).
You can think about this last step literally like a photoshop layer compositing step, just putting the layers we’ve got on top of each other with the right blend operations to form a nice image, just like that:
Well, that is what realtime 3d graphics is about, realtime photoshopping at a rate of 60-120 images per second with shader code instead of a toolbox!!
The impact of SSSSS in this game is quite gentle but effective, and maybe this previous frame doesn’t really show where it shines. So here are some SSS focused captures.
You can compare 1st & 2nd columns to see changes due to blurriness
And because it is always a good case study to either disable or boost such a subtle effect’s values in order to see its impact, here is some toying with the shader values; the image table below is a comparison between the effect being 100% off and boosted to a very high value.
These two are my favorite exaggerations in this article so far =D
7.Volumetric Scatter(Volumetric Light Volume) [Compute][Not Always]
The target of this step (including all sub-steps) is to generate the “Volumetric Light Volume Texture” with some temporal component (counting on the previous frame’s scattering & extinction), and in this case it may be better to call it the “Temporal Volumetric Light Volume Texture”. The method used here is clearly inspired by Frostbite’s “Physically-based & Unified Volumetric Rendering”. Let’s go through the process…
i. Depth Tile
Using the depth attachment, this outputs a linear depth image made of 16*16 pixel tiles (256 pixels per tile), also known as a Depth Min/Max Tile image. For the given 3840*2160 target resolution, divided by the tile size of 16*16, we end up with a resolution of 240*135 for the depth tiles image.
Compute Depth Tile
struct COMPUTE_DEPTH_TILE_CONSTANT_BUFFER
{
float4 _vfProjSetup;
int4 _viViewportSize;
float _fInvTileSizeX; //0.0625
int _iTileSizeX; //16
int _iTileSizeY; //16
int _iTileTotalPixelCount; //256
}
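As a sketch of what such a depth tile pass usually does: one thread group per 16*16 tile, each thread linearizes one depth sample, then a shared-memory reduction keeps the tile’s min & max. The linearization packing of _vfProjSetup and the (min, max) output layout are my assumptions.
Depth Tile Sketch (assumed, not Detroit’s shader)
Texture2D<float> g_Depth : register(t0);         // 3840*2160 device depth
RWTexture2D<float2> g_DepthTiles : register(u0); // 240*135, (min, max) linear depth per tile
cbuffer cbTile : register(b0) { float4 _vfProjSetup; int4 _viViewportSize; float _fInvTileSizeX; int _iTileSizeX; int _iTileSizeY; int _iTileTotalPixelCount; }
groupshared float gs_Min[256];
groupshared float gs_Max[256];
float LinearizeDepth(float d) { return _vfProjSetup.z / (d + _vfProjSetup.w); } // assumed projection packing
[numthreads(16, 16, 1)]
void CSComputeDepthTile(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID, uint gi : SV_GroupIndex)
{
    int2 pixel = int2(gid.xy) * int2(_iTileSizeX, _iTileSizeY) + int2(gtid.xy);
    float d = LinearizeDepth(g_Depth.Load(int3(min(pixel, _viViewportSize.xy - 1), 0)));
    gs_Min[gi] = d; gs_Max[gi] = d;
    GroupMemoryBarrierWithGroupSync();
    for (uint s = 128; s > 0; s >>= 1) // parallel reduction over the 256 samples of the tile
    {
        if (gi < s) { gs_Min[gi] = min(gs_Min[gi], gs_Min[gi + s]); gs_Max[gi] = max(gs_Max[gi], gs_Max[gi + s]); }
        GroupMemoryBarrierWithGroupSync();
    }
    if (gi == 0) g_DepthTiles[gid.xy] = float2(gs_Min[0], gs_Max[0]);
}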
ii. Participating Media Properties/Material Voxelization (Scattering/Extinction)
iii. Clear some RW buffer
iv. Light Scattering or Froxel Light Scattering (Froxel Light Integration & Temporal Volumetric Integration)
To overcome possible volume flickering: TAA with blue noise jittering + temporal volumetric integration with reprojection of the previous frame’s Scattering and Extinction.
v. Write to some RW buffer
vi. Scatter Radiance
vii. Final Integration (Final Participating Media Volume. Integrate froxel scatter/extinction along view ray)
Integrate along the view ray of the camera frustum (gif below) and output the final volume 3d texture, a 240*135*64 (64 depth slices) of the format RGBA16_FLOAT, aka the Volumetric Light Volume (much more volume depth than a game like Death Stranding, which was 240*135*48 – RGBA16_FLOAT for 1080p, where here it is the same XY resolution for 3840*2160, so there it was /8 and here it is /16).
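A minimal sketch of this front-to-back froxel integration: walk the 64 depth slices, accumulate in-scattered light attenuated by the transmittance so far, and store the running (in-scatter, transmittance) per slice so the final 3d texture can be sampled at any depth. The per-slice depth and the texture layout are assumptions; the energy-conserving per-froxel term is the Frostbite-style one this step is inspired by.
Final Integration Sketch (assumed, not Detroit’s shader)
Texture3D<float4> g_ScatteringExtinction : register(t0); // per froxel: rgb in-scattered light, a extinction
RWTexture3D<float4> g_IntegratedVolume : register(u0);   // 240*135*64: rgb integrated in-scatter, a transmittance
[numthreads(8, 8, 1)]
void CSFinalIntegration(uint3 id : SV_DispatchThreadID)
{
    float3 scatteredLight = 0;
    float transmittance = 1.0;
    for (uint slice = 0; slice < 64; slice++) // each thread owns one XY column of froxels
    {
        float4 se = g_ScatteringExtinction[uint3(id.xy, slice)];
        float sliceLength = 1.0;                             // assumed slice depth (the real slices are not uniform)
        float sliceTransmittance = exp(-se.a * sliceLength); // Beer-Lambert through this froxel
        // energy-conserving integration of this froxel's scattering
        float3 sliceScatter = se.rgb * (1.0 - sliceTransmittance) / max(se.a, 1e-5);
        scatteredLight += transmittance * sliceScatter;
        transmittance *= sliceTransmittance;
        g_IntegratedVolume[uint3(id.xy, slice)] = float4(scatteredLight, transmittance);
    }
}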
viii. Particles Voxelization [Not Always]
This step seems to only take place wherever there are local light sources (point lights to be specific) and particles. So in a daylight area that is lit by directional lights only, or in a closed interior that probably has no particles around (regardless of daytime or nighttime), this is usually not going to take place; the best fit for this would be outdoors at night time, where there are a lot of light poles or other point light sources. In order to let particles show correctly inside a volumetric volume (and possibly impact any shadowing), they need to go through this voxelization step, using some of the shadowmaps as well as the SH…
in addition to a mix of special light or particle masks (both) such as these ones
These ones above are the ones used for that frame, but below are some more random samples from ones used in other different areas around the game.
And finally, here is the final volumetric light volume texture alongside the final frame for some more examples!
The examples above are what i would call “intense” examples, where you can clearly see the volumetric effects contribution, but pretty much all the time there are volumetric effects in a way or another that force the active scene to go through all the previous steps to get a participating media volume, even if it is so subtle that you would think there isn’t any! Here are a few examples..
Transparency
In order to render the last remaining primitives or scene data and hand off the frame to Post-Processing, the engine needs to generate a Transparency Map that includes the remaining meshes (which are all translucent). This takes place in a few distinctive & interesting steps. In short:
- Everything gets drawn into a full resolution translucency map.
- Then some billboards for fake flares are added to that map (usually nearby or high quality fake flares); you could consider this billboard fake flares step as part of the full resolution translucency step if you wish, i just distinguished it as a separate step as it is done in its own pass and uses its own unique shader.
- After that the motion vectors get updated to include some of these translucent objects, only the objects that are relevant & matter to motion vectors.
- Then the last few translucent objects get drawn into a new half resolution translucency rendertarget (usually flipbooks or far away billboard flares).
- Finally the half resolution translucency gets combined with the full resolution translucency in order to output the final Transparency Map, which is the size of the target resolution (aka full resolution).
With that said, let’s go into details of each step.
Full Resolution Translucency
With some heavy use of the volumetric output from the previous step (240*135*64 – RGBA16_FLOAT), the SH 3d textures, cubemaps, or some other familiar textures from earlier steps such as shadowmaps or hair masks, plus some particle detailing textures, all translucent objects (particles, windows, foliage cut-outs,…etc.) get drawn in full resolution, followed by hair (hair, eyelashes, eyebrows, beard,..) by the end of the pass. In the middle of all this, things that can be considered decals, such as cutouts or stamped things, are drawn too. And of course, everything is perfectly instanced, like particles or the hairs in a beard or head.
Just keep in mind, the main distinguishing feature here is that this is done in full resolution.
And in action, it is something like this:
During these draws the inputs vary; apart from the 3d textures mentioned earlier such as the SH, the volumetric volume, cubemaps, or other things such as the BRDF, noise, hair masks and shadowmaps, there are some artist authored fancy FX textures such as the ones below:
Decals
Regarding the decals that draw in the middle of this pass, nothing special, except that there is a fixed maximum amount of decals that can be drawn, as there are 3 decal arrays going around, each with a max size of 819 elements. Each entry of the array is a struct made of two elements, a 4×4 to-local-space transformation matrix (64 bytes) + float4 extents (16 bytes), for a total size of 80 bytes per decal entry. This 80 * the 819 entries = 65520 bytes as the size of each of the decal arrays.
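Laying that entry out in the same style as the other buffer dumps (member & struct names are assumed, only the sizes come from the description above):
Decal Array Layout (names assumed)
struct S_DECAL_ENTRY
{
    float4x4 _mWsToLs;  // 4x4 to-local (decal) space transform, 64 bytes
    float4 _vfExtents;  // decal extents, 16 bytes -> 80 bytes per entry
}
struct S_DECAL_ARRAY
{
    S_DECAL_ENTRY[819] _aArray; // 819 * 80 = 65520 bytes per decal array
}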
And if you did not see many decals previously, you may observe something in the following frame
And before leaving this section, here is another full translucency example, as we will need this one much more later to demonstrate the “Half Resolution Translucency”, which might not be super visible in that previous Snowy-Night frame.
Billboard Fake Flares & Emissive
i call these fake flares because most of the time they render what looks like flares, but they’re not actually lens flares. They share the look of lens flares, and they share the billboarding behavior, but they do not show up as a secondary response to some lighting condition, they’re just put in there.
Using the full resolution translucency as a slate, billboards draw into it using some flare/halo like textures.
Here is another frame that has many more things drawn, and they are all using that single R8_UNORM below. Funny enough, the previous frame used multiple flare/halo textures where you can barely see the impact, probably only around the light posts, while the frame below uses a single halo texture but draws a ton with it!
And because later we will be going through “real” flares, i’ll leave this last example here, so we can differentiate between the fake flares (here) and the real flares (they come soon, for the exact same frame below, by the end of the post-processing step).
Here is what used to produce the billboard flares & halos in that last frame
Motion Vectors Part 2 (Alpha-Tested/Translucent Geometry)
At this second part of revisiting the Motion Vectors rendertarget, it is time to draw any (motion) relevant geometry that is still missing, which at this moment means the translucent objects that have been processed during the past few steps as part of the full resolution translucency.
Keep in mind, not everything translucent is going to be drawn to motion vectors, only relevant things, which most of the time means hair/beard as well as any translucent “moving” object, such as the window of a moving car!
While there is some drawing happening, i bet you can’t spot a single difference between the input and the output! The issue here is that most of the things drawn are far away, and Markus’ hair is so so so tiny to a degree that it practically doesn’t exist! Not to mention that there isn’t much movement in that moment! Here are a couple of examples with some actual changes to the motion vectors to include translucent objects.
Format & Size of the Motion Vectors rendertarget did not change, we just draw extra stuff to it at this step.
Down Sample & Linearize Depth
A simple fragment shader that just does a linearization of the depth image (format change) into half of the target resolution. The output attachments of this step are needed not only in the next step (Half Resolution Translucency), but also very soon in the post-processing steps for things such as SSR.
except that it just gets attached here as a depth attachment
This step is similar to an earlier step that did almost the exact same thing, except that the linearized depth earlier was R32_FLOAT and this time it is R16_FLOAT; in both cases the outputs were at half of the target resolution.
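If you want a rough idea of what such a pass boils down to, here is a generic half-res depth linearization sketch in HLSL; this is not Detroit’s actual shader, and the projection parameter names & depth convention are my assumptions:

// Minimal sketch of a half-res depth linearization pixel shader (assumed, not the game's shader).
Texture2D<float> DeviceDepth : register(t0); // full res, non-linear depth
SamplerState     PointClamp  : register(s0);

cbuffer Params : register(b0)
{
    float NearPlane; // assumed projection parameters
    float FarPlane;
};

float PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target // half res, R16_FLOAT target
{
    float d = DeviceDepth.SampleLevel(PointClamp, uv, 0);

    // Standard perspective depth linearization (the real shader's convention may differ,
    // e.g. if a reversed-Z depth buffer is used).
    return (NearPlane * FarPlane) / (FarPlane - d * (FarPlane - NearPlane));
}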
Half Resolution Translucency [Not Always]
There are some remaining translucent objects that were not yet considered. These are usually either far away flares that are only needed in low quality, or flipbook effects such as fire or smoke. These things are rendered in half resolution (for some reason). But because these can be considered individual cases, this step is not always present, only when there are some relevant objects or flipbook based particle systems.
And for that example, if you notice, there are far away billboard fake flares (red & blue) that use the 1st & 2nd texture below. There is also a tiny bit of smoke on the ground, possibly coming from a gutter, and this one is using the 3rd texture below (the flipbook). And finally there is a large but far away billboard fake flare hiding behind a building, and this one is using the last texture in the textures below.
Because this half resolution step shines more when you see something larger than the few things in the previous example, here is another example where you can clearly see a dust effect being rendered that takes up a good enough portion of the frame.
This one is using a flipbook too, which is as follows.
Combine translucency
At this final step, the half resolution translucency image gets upscaled and combined into the high resolution translucency in order to output the final full resolution translucency, or better call it the Transparency Map.
i find this interesting and bizarre, to not render everything at once into a full resolution translucency map, especially where you can clearly see the degradation in quality for the effects that are rendered in half resolution (such as the dust in the last example). i don’t judge or question the developers, they definitely got their own reasons for that, but it is quite interesting. i personally would not hesitate to adopt such a technique, i see some benefits in there, but only if i’m sure that nothing from the half resolution translucency will be rendering anytime close to the view (just to not be at the cost of the quality).
Here is another example with a flipbook!
The half res particles in this “Kara & Alice” frame look much better & smoother than the previous examples, except that they are much more translucent in the final frame, but at least you can tell the quality & smoothness from the Half Res Translucency rendertarget itself (2nd image).
Ironically, the flipbook used with this “Kara & Alice” frame is much smaller (1024*512) than the one used in the earlier Markus frames (which was 2048*2048)!!
It was mentioned earlier that we don’t always get a “Half Resolution Translucency” step, and regardless of whether we end up with a half res translucency or not, the combine step will still take place, except that the half res translucency image will be empty. So, all in all, the combine step has to take place even if there is nothing to combine.
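In spirit, that combine is just a bilinear fetch of the half res image (the upscale) laid over the full res one; here is a generic sketch of the idea, where the blend itself is an assumption on my side (the real shader may weight things differently):

// Assumed sketch of combining the half-res translucency over the full-res one.
Texture2D<float4> FullResTranslucency : register(t0);
Texture2D<float4> HalfResTranslucency : register(t1);
SamplerState      PointClamp          : register(s0);
SamplerState      BilinearClamp       : register(s1);

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target // the Transparency Map
{
    float4 fullRes = FullResTranslucency.SampleLevel(PointClamp, uv, 0);
    float4 halfRes = HalfResTranslucency.SampleLevel(BilinearClamp, uv, 0); // bilinear upscale

    // Assumed "over" composite: half-res effects land on top of the full-res map.
    return halfRes + fullRes * (1.0 - halfRes.a);
}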
SSR [Compute]
SSR is entirely compute based (Stochastic Screen-Space Reflection) and it is done in half of the target resolution. Let’s see how this works!
1.Prepare & Tile Classification
While the main target for this step is to prepare for the upcoming SSR steps, there will be multiple outputs that are not SSR-only related and are general frame related. First, the shader will composite the transparency map into the colored frame in order to get the near-final colored frame that is ready for any post processes (lit opaque + translucency). Then, generate the TAA Transparency Threshold Tile (or the Classification Mask that will be used to decide where to put the heavy work for reflection ray marching); you may recall it from earlier from the HBAO step, but earlier the version used was the one that came from the previous frame, and now we update it to the current one so it can be used in the next frame. And finally, generate the classification mask for SSR raymarching. All that is made with the use of the different specular images, as well as the Transparency, Color, Depth and of course Depth tiles Min/Max.
Regarding the TAA Transparency Threshold Tile (or the Classification Mask), here are both side by side, the one used at the start of the frame (HBAO) and the current one that is output by the end of the frame (Post Processing’s SSR Prepare step).
2.Pre-Filter
At first, multiple dispatches execute to prefilter (you can tell from the blurriness) & output the mip chain for the given SSR Classification image. To be exact, 10 dispatches to end up with 11 mips, from 1920*1080 down to 1*1.
This image is the one that tells what is important & what is not, and based on it we will be shooting rays. This is why mipmapping comes in very handy, because if we counted on mip 0 alone, there would be a lot of work that possibly doesn’t contribute at all to the final reflections; going down the mips makes things less “noisy” or less “busy” and hence makes the ray allocation much more conservative.
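To make that mip chain generation concrete, here is what a single one of those downsample/prefilter dispatches could look like, written as a plain 2x2 average; the real filter kernel used by the game is an assumption on my side:

// Sketch of one prefilter dispatch of the classification mip chain (assumed 2x2 average).
Texture2D<float4>   SrcMip : register(t0); // mip N
RWTexture2D<float4> DstMip : register(u0); // mip N+1, half the size

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint2 src = id.xy * 2;
    float4 sum = SrcMip[src + uint2(0, 0)]
               + SrcMip[src + uint2(1, 0)]
               + SrcMip[src + uint2(0, 1)]
               + SrcMip[src + uint2(1, 1)];

    // 10 dispatches like this one, each feeding the next, give the 11 mips mentioned above.
    DstMip[id.xy] = sum * 0.25;
}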
3.Trace & Resolve Reflections (SSR Image)
The goal here is to raymarch & get the base for the reflection image that we will be refining in the upcoming remaining steps. As you will notice there is heavy usage of noise below, that is for the GGX ray distribution (importance sampling) as well as for the jittering to reduce banding artifacts
The Reflected Color’s alpha channel is holding a mask for SSR/Reflectivity.
The output Reflection Motion Vectors is going to be used to prevent moving objects from generating smears in the reflection image. Motion vectors in this yellowish image are basically holding the motion vectors from the input Motion Vectors image (the green one) only at the ray intersection points.
4.Temporal Reprojection
As you already noticed, the jittering (for banding artifact reduction) generated a little (well, a lot of) noise in the reflected color image. But this is fine, not the end of the world. At this step it will be taken care of by temporally reprojecting against the previous frame.
As a secondary output to the temporally reprojected current frame’s SSR Color image, there is the SSR Tiles (or Clamp Event Tile Texture); this is made of 16*16 texel tiles (a 240*135 tiles image that covers the target resolution 3840*2160). This one is needed in the next remaining few steps, so we can have a mask we can count on that won’t result in some “flickering” output reflections.
5.Dilate Tiles (SSR Tiles)
Just stabilizing the image by dilating some unstable pixels (done against the value of -65).
Not sure why exactly -65; there could be other values that worked a little better, such as -20, but that is out of the question, since what matters is not that a value works a little better in that one frame, but that it works better across the board!
6.Temporal Filter
A Temporal AA for the SSR color using the previously dilated tiles mask
7.Bilateral Upsampling
Just a Bilateral Upsampling on the frame’s color, so we end up with the SSR diffuse mask (it will very soon be used to composite SSR into the frame).
By reaching this point, we can consider the SSR complete, as we have the SSR image out. Compositing it into the frame happens later, but not too much later, as there are other things stuck in the middle, and this is why you don’t see compositing the SSR image listed here as part of the SSR section.
Refractivity Mask [Not Always]
This step is not always present in the pipeline, it only takes place when there are some objects that are meant to demonstrate some refractivity. This is not necessarily glass, as most of the glass in the game (windows for example) is not a refractive surface and is handled in the transparency step, but it can be something like car head/rear lights, a glass bottle/cup with some liquid inside, a water surface such as a pool, or even some rain or ice drops off a roof.
Non-Refractive Glass (100% on Transparency)
Here is an example of a frame full of glass, but none of it is handled as “refractive” and hence there is no “Refractivity Mask”, it is all done just in transparency!
You can not only observe the absence of all glass surfaces from the input Color image (1st image), but also
you can observe their absence from the Depth attachment!
Note: i put the SSR output here for clarity, as some surfaces in the “Color + Transparency” image are black due to missing SSR
Let’s now see how the Refractivity gets added to the frames that need it.
1.Clear the Mask
Drawing a large solid black color triangle, to prepare the mask for the next step where we draw stuff into it.
2.Draw Refractive Surfaces
Just process refractive surfaces one by one into the mask! For example, here there are 3 refractive surfaces in that frame
A 3840*2160 – R8_UNORM
And here are some more examples for refractive masks across the game
Bottom: Swapchain
The “Android/Deviant Mind View” always has a large plane that is refractive, and usually most of the frame is behind it (except the protagonist)
SSR Composite
Put the SSR on the Color we have (Color + Transparency) to end up with reflections in the current frame’s color, using the diffuse mask.
As Detroit streets are full of rain, here are some more example breakdowns that show the beauty of SSR in this game!
SSR Composite Examples Breakdown (Unfold if you dare!)
Well, not all about Rain…there is one about Lake!
If you want to compare before & after, then check the 1st and 5th columns.
Composite Refractivity Mask [Not Always]
Only if a Refractivity Mask was generated earlier, it is time to run the refractivity shader on the pixels that are covered by that mask!
The interesting observation, which you may (or may not) have made, is that so far if the frame contains something that should be refractive (which is not the case in many of the previous frames), it was not drawn to the frame yet and there is an “empty” space left for it.
With SSR and Refractivity added to the colored frame, the frame is ready to jump into the post-processing queue.
Post Processing
Post processing is quite long, interesting & full of cool ideas, and the main distinctive feature here is that it is almost fully done in compute…almost!
TAA [Compute]
Detroit counts a lot on Temporal Anti-Aliasing passes; here is the main TAA pass that is going to resolve the entire frame, but earlier you may’ve noticed multiple utilizations of TAA passes, such as very near at the top by the end of the SSR step, or much earlier in the frame in things such as SSSSS, HBAO or Volumetric fog/lighting.
TAA in Detroit is almost very straightforward, just jittering & accumulating frames and using motion vectors to help in identifying pixels between previous & current frames. Except that there is one unique input/output that you don’t see in most or standard TAA implementations (and it is why i liked the TAA results here more than some other games), which is the edge detection element or “Contours” that is used to clean up as much ghosting as possible from previously accumulated frames.
Pretty much each input from the previous frame (Depth, Contours or TAA Color) will be replaced with an output that represents the current frame, so it can be used with the next frame if applicable. And of course, the contours here are used as a guide to reject previous frame pixels in order to reduce ghosting. So it is a neat way to mitigate the ghosting problem, but not every ghosting problem!
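To make the idea more tangible, here is a very rough sketch of how a contour mask could be folded into a TAA resolve; this is my own illustration of the concept, NOT Detroit’s actual shader, and the blend weights are pure assumptions:

// Assumed illustration: trust the history less wherever the "Contours" mask is strong,
// trading a bit of edge stability for less ghosting.
Texture2D<float4>   CurrentColor  : register(t0);
Texture2D<float4>   HistoryColor  : register(t1); // previous TAA color, reprojected via motion vectors
Texture2D<float>    Contours      : register(t2); // edge mask, 1 = strong contour
RWTexture2D<float4> ResolvedColor : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float4 curr = CurrentColor[id.xy];
    float4 hist = HistoryColor[id.xy];
    float  edge = Contours[id.xy];

    // A typical TAA keeps ~90% of the history; near contours we keep less of it.
    float historyWeight = lerp(0.9, 0.5, saturate(edge)); // assumed weights
    ResolvedColor[id.xy] = lerp(curr, hist, historyWeight);
}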
While it sounds good on paper, and while i like the output most of the time, when disabling the responsible code lines in that compute shader you can see that sometimes the ghosting actually looks better, smoother and less “fuzzy” around the cleaned-up edges. But unfortunately this is not something that we can judge 100% from a nice still image, it’s better in motion, which i can’t test.
Regarding the amount of work TAA does, you can look at these images below, which are an R32_UINT version of the input current (not TAA) & previous color (with TAA) images as well as the output Color of the TAA step (has TAA). You can easily spot the impact of TAA in the amount of noise in this type of view.
Back to the outputs of the pass, now you may ask, why do we have two different contours as outputs?
The first one (the full resolution one) is going to be used in the TAA step of the next frame, as the “Previous Contours”, whereas the Half Res Contours is needed during this current in-pipe frame, to be exact in the next major post-processing step, which is the Motion Blur (done in 1/2 res), in order to prevent/reduce the TAA bleeding streaks/haze on the pixels that are near depth discontinuities (aka Depth Discontinuities Rejection). So, one for the next frame (for TAA) & one for the current frame (Motion Blur & DoF maybe, we’ll see).
Totally useless personal opinion on TAA
i hate the idea of TAA still being called the “industry standard” anti-aliasing technique! As a gamer-first, i don’t mind playing a game with aliased footage rather than playing it with washed-out/blurry details or a ton of ghostly moving objects (not all TAA games suffer from severe levels of this of course).
Detroit has one of the most interesting & best TAA implementations with some nice improvements as it seems, and yet you can still see loss of detail & ghosting at ‘some’ places under specific conditions. i’ll refer to screenshots from the TAA On/Off comparison made by Quantic Dream at their GDC18 talk, and let you judge the quality, just look at Connor’s suit!
What do you think?!
And you can watch this slowed down video, to see the level of ghosting in that section of the game. Connor’s hands are the moving objects, i’m sure you won’t miss it!
Of course such surface/texture detail is a big challenge to most TAA techniques!
Blurry & ghostly, but it still looks a lot better and much more stable than most other games (like Unreal 4 or 5 games for example)! Seeing some ghosting here doesn’t mean that you see it quite often in Detroit, nope, you don’t, but it has not 100% vanished.
So please, try as much as possible to avoid TAA if you can’t invest the time & budget to make it right or near perfect (such as in Detroit); it was a good technique for a few years when it showed up, but then it became silly, and i can’t believe that we are still stuck with that horrible technique till today! Temporal techniques in general are good (or even great) for many use cases, but AA is not one of these favorite use cases!
If you agree on that, if you’re anti-TAA, please help yourself & enjoy that random community on reddit r/F*$#TAA, posts & opinions there will help you in being a better person who hates it even more than ever!!
If you’re wondering why i dislike/hate TAA, and/or if you did not notice already from the frame whose details we looked into, here is a closer look at before & after in that frame (you can always go back and compare the image above). I did this comparison using the ImageComparator FREE tool, have a look..
It is indeed better to look at the frames themselves in fullscreen (original) size, instead of these gifs (due to gif size & compression)
- Loss of detail in the marble surface (from a marble fountain to an iron one) and the reflections, 100% killing the look intended by the artists!
- Skid marks or wet ground reflections get a downgrade & don’t look as good as they were before the TAA magic!
- Overall loss of detail & quality, from visible characters’ faces & tree details, to a kinda blurry image, full of “blobs” of colors!
Looking at TAA’s output vs input most of the time makes me feel like Peter Parker on his first morning of being Spider-Man!
Exposure/Scene Exposure [Compute]
Also you can call it many names like Scene Zone Exposure, Averaging Luminance, Histogram-ing or Eye Adaptation.
1.Half Color
The first step is to get a half color output of the format R11G11B10_FLOAT (same as the format of the upcoming Pre-Exposure).
2.Clear Intermediate Luminance
Now clear the Pre-Exposure image from the previous frame’s data, it is a 120*68 of the format R11G11B10_FLOAT. Clearing here is done as it has been done across the pipeline, just drawing solid black color values.
3.Pre-Exposure (Intermediate Luminance)
The Pre-Exposure or Intermediate Luminance step is important to bring the scene color’s range into a range similar to or around the previous frame’s (which is after exposure). This is done by using the previous frame’s scene exposure to get an intermediate luminance or pre-exposure image (120*68 – R11G11B10_FLOAT).
4.Current Scene Exposure
Just averaging luminance & write the new exposure value for the current frame!
While previous & current exposure values may look identical in this last step (maybe the image exporters failed me in storing these RGBA16_FLOAT into regular PNGs, as a photoshop color picker or such would show that both are exactly the same color value), in reality the debug values were fairly different between the current frame’s exposure & the previous one.
Previous: {0.10602, 1.00586, 0.00017, 0.00}
Current : {0.10608, 1.00488, 0.00017, 0.00}
One note to keep in mind, each area gets a different pre-defined scene exposure value in EV100 (possibly based on some artistic test & taste); this street frame above for example is using the value of 12.3f, but here are some exaggerated values for it, just to learn its impact on the final frame.
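For reference, the usual way an EV100 value like that 12.3f maps to a linear exposure multiplier (the well known formula from the physically based exposure literature, not something pulled from Detroit’s shaders) is:

// Standard EV100 -> exposure conversion (see e.g. the Frostbite PBR course notes).
// maxLuminance = 1.2 * 2^EV100, exposure = 1 / maxLuminance.
float ConvertEV100ToExposure(float ev100)
{
    float maxLuminance = 1.2 * exp2(ev100);
    return 1.0 / maxLuminance;
}

// The street frame's 12.3f would give an exposure of roughly 1.6e-4, which scene colors
// would be scaled by; the exaggerated values below simply push that multiplier way up or down.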
Motion Blur [Compute]
Motion Blur is done entirely at 1/2 of the target resolution (so here @ 1920*1080 for the 3840*2160 target). Let’s look at a frame that demonstrates some Motion Blur…
This is a Swapchain final frame, so don’t let things like DOF confuse you!! Focus on hands, legs or grass!
1.Motion Blur Map
At this first step, the main goal is to get the Motion Blur Map & Depth Velocities out of this dispatch using the TAA (smoothed) depth and Motion Vectors. The given motion vectors output a smaller version; this 1/2 velocity output can also be called the “Motion Blur Map”, as basically wherever there are motion velocities above a certain delta, there is motion that we need to “Blur”!
2.Velocities Tile Map
At this next step, the target is to generate a Velocities Tile Map from the Motion Vectors.
The target single tile size in the Tile Map for this game is 32*32, so the output Tile Map by the end of this step for this target resolution of 3840*2160 would be 120*68 (120*67.5 to be exact). This takes place in a few steps to get correct coverage (there is a small sketch of the idea after the tile map images below).
i.Base Tile Map
From the full resolution Motion Vectors, downscale to get the Max Tiles out.
ii.Cover Horizontally (U)
Horizontal max (scale on X)
iii.Cover Vertically (V)
Vertical max (scale on Y)
So putting the Tile Map progression next to each other…
but, if you right click on images and “Open image in new tab”, you will open the upscaled detailed version.
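And to make the three tile map passes above more concrete, here is a small sketch of the base tile max pass; a naive loop written by me for illustration, not the game’s actual shader:

// Assumed sketch of the "Base Tile Map": each 32*32 tile keeps the largest velocity inside it.
// The two follow-up passes then spread that max to neighbor tiles on U and then on V.
Texture2D<float2>   MotionVectors : register(t0);
RWTexture2D<float2> TileMax       : register(u0); // 120*68 for a 3840*2160 target

[numthreads(8, 8, 1)]
void CSMain(uint3 tileId : SV_DispatchThreadID)
{
    float2 maxVelocity = float2(0.0, 0.0);
    float  maxLengthSq = 0.0;

    // Walk the 32*32 texels that belong to this tile (naive, for clarity).
    for (uint y = 0; y < 32; ++y)
    for (uint x = 0; x < 32; ++x)
    {
        float2 v  = MotionVectors[tileId.xy * 32 + uint2(x, y)];
        float  l2 = dot(v, v);
        if (l2 > maxLengthSq)
        {
            maxLengthSq = l2;
            maxVelocity = v;
        }
    }

    TileMax[tileId.xy] = maxVelocity;
}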
3.Alpha Map
While the Tile Map and Blur Map are great representations of the moving areas & deltas in the frame, or can give an approximation of them, in reality they could end up causing lots of needless work when blurring pixels. Hence comes the importance of using the ones mentioned earlier to generate an Alpha Map that can be used not only to apply the Motion Blur, but also to enhance the work of the contours image in rejecting the pixels with smears coming from TAA (next step).
4.Rejection (Depth Discontinuities Rejection)
Use mainly the TAA contours to reject any TAA bleeds near depth discontinuities.
5.Apply
Apply Motion Blur to the frame.
It doesn’t happen all at once, but it is hard to isolate steps of a single dispatch from a replay!
Motion Blur Constant Buffer
struct POST_PROCESSING_MOTION_BLUR
{
float4 _vfProjSetup;
int2 _viWorkSizes;
int2 _viWorkHalfSizes;
float2 _vfFbSizes;
float2 _vfFbHalfSizes;
float4 _vfMbParams;
float2 _vfNeoScale;
int2 _viNeoScale;
int2 _vi2xNeoScale;
int2 _viTileSizes;
float _fUpsamplingFactor;
float _fInvUpsamplingFactor;
float _fPadding0;
float _fPadding1;
}
DoF [Compute][Not Always]
Depth of Field in Detroit is not the best in the world, but it is not the worst either, i would say it is just in the sweet spot where a ton of other games are falling.
The Depth of Field step is not always present in the pipeline; as you may have guessed, yes, cinematics always have this step, but also some parts of the gameplay demonstrate Depth of Field, so it is not totally absent from gameplay, perhaps it is the parts that you can freely call “cinematic gameplay”.
Because DOF is one of the very few post-processors that i adore, let’s have a look this time at two different frames simultaneously! Here are the frames..
1.Prepass CoC
The CoC prepass is all about preparing the CoC tiles that will be used to create the actual CoC images later. This takes place in two distinct steps.
i.Generate Near & Far Tiles (Min/max)
Create tiles min/max from the given “Anti-Aliased” depth. Pay attention to that “Anti-Aliased”, as it is a main reason to do the erode later.
The output looks fancy, this is because each channel is holding a piece of information, either Min or Max and for Near or Far. Here is the channels breakdown!
R & B is Near Min/Max
G & A is Far Min/Max
ii.Tile Neighbors
This will output the final CoC Min/Max tiles.
And of course, as in the previous step, this holds similar info per channel, here is the channels breakdown!
2.Near & Far CoC
Use the generated CoC tiles Min/Max to isolate the Near & Far color from the frame’s colored image (these will get blurred and will later be called DOF Near and DOF Far), and the CoC Near & Far from the Anti-Aliased Depth.
3.Erode CoC
This is an important step to avoid some artifacts such as leaking, which shows up mostly around edges near depth discontinuities. Here we Erode the Circle of Confusion images (Near & Far, generated in the previous step) to avoid/decrease the leaking that comes as a result of applying TAA on depth directly (earlier during TAA, which gave us the TAA depth) that is used mainly to compute the CoC in the previous 2 steps of DOF. Other games get away with applying the TAA on the CoC itself, but that is not the case here (there is a small sketch of the erode idea right below).
2nd row, Outs (Eroded)
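Since an “erode” here is essentially a small min-filter, here is a tiny sketch of that idea reusing the POST_PROCESS_ERODE_COC constant buffer listed further down; the kernel size & shape are my assumptions, not something pulled from the game’s shader:

// Assumed sketch of the CoC erode: shrink the CoC with a 3x3 min so the
// anti-aliased depth edges don't leak into the Near/Far blur later.
Texture2D<float>   CoCIn  : register(t0); // Near or Far CoC
RWTexture2D<float> CoCOut : register(u0);

cbuffer ErodeParams : register(b0)
{
    int2   _viViewportSizeMinusOne; // same members as POST_PROCESS_ERODE_COC below
    float2 _vfErodeCoCAmount;
};

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float eroded = CoCIn[id.xy];

    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
    {
        int2 coord = clamp(int2(id.xy) + int2(x, y), int2(0, 0), _viViewportSizeMinusOne);
        eroded = min(eroded, CoCIn[coord]);
    }

    CoCOut[id.xy] = eroded;
}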
4.Blurring Near & Far
Now that we have a Near color and a Far color isolated, which are the things that will be “out of focus” of the lens, we need to blur these images. This takes place in two steps of convolution blur.
i.Convolution Blur 1
Here is the first example:
And the second example:
ii.Convolution Blur 2
Here is the first example:
(Conv. Blur 1)
960*540 – RGBA16_FLOAT
And the second example:
(Conv. Blur 1)
960*540 – RGBA16_FLOAT
Now the Color Near & Color Far can be freely called DOF Near and DOF Far, as they’ve already been blurred successfully and are ready to be merged into the final frame.
5.Composite
Just put everything together to get the final DOF-ed frame out!
3rd row: the final outputs of DOF
This DOF pipeline is the perfect example of the term “Realtime Photoshopping”!
In that Markus & North example, you may claim not to see any DOF impact on the near area (foreground), but in fact there is; it is just very slight, due to the fact that the distance between the lens and the focused objects is larger than the distance between the lens & the foreground target objects/pixels (character arms in this case).
And for the love of DOF & for the curious ones, here are the major stops for a few more examples…
Depth of Field Constant Buffers
struct POST_PROCESS_DOF
{
float4 _vfProjSetup;
float4 _vfDofParam;
float4 _vfLensParam;
float4 _vfFull_DOFSize;
float4 _vCoCScaleFactor;
int2 _viPrepassTexSize;
int2 _viDOFViewportSize;
int2 _viViewportSize;
float2 _vScaleFactor;
float4[169] _avfKernel;
}
struct POST_PROCESS_ERODE_COC
{
int2 _viViewportSizeMinusOne;
float2 _vfErodeCoCAmount;
}
Bloom [Compute][Not Always]
Bloom in Detroit is not the best in the world, it gets the job done and it looks fine, BUT i assume it is one of the smartest & one of the fastest techniques ever used in games. It is not only compute based, but also based on the robust algorithm of the Summed Area Table (SAT for short).
What is SAT?
A Summed Area Table is a way to do box filtering (filtering a rectangular shaped area) of an image within a constant amount of time. The simplicity of the algorithm (at least on current decade hardware) makes it a great candidate for post-processing, yet one we rarely see. The applications for SAT are many & not only rendering related, but the goal here is to use the output table as the values for convolution blurring the image.
The way it works is very simple, given a grid of values (aka pixels in an image here), we can generate a table as follows
Simon Green, NVIDIA
And this is how a SAT image would look most of the time, a “gradient” of values running horizontally & vertically.
SAT..It is a matter of “Direction”
Looking at Summed Area Tables and their resources can be a little confusing at first if you’re not familiar with the topic. In that 2003 GDC talk by Simon, you can look at the table and make sense of it based on the instruction given earlier in the slide that says:
“Each texel is the sum of all texels below and to the left of it”
You can then make sense of the example grid (4*4) & table
But then you can see some other tables elsewhere, try to apply the same thing, and it doesn’t work out! Such as this (6*6) one from the Wikipedia page on that same topic:
And then you think this Wikipedia one is flipped or messed up somehow! But you’re wrong, because @ the wiki page they stated that:
“the summed-area table is the sum of all the pixels above and to the left”
The idea remains the same but the direction of traversal is different, and i believe as long as you understand how you build the table and then how to use it, it is not a big deal. But to know which one is the very correct one, it is better to revisit the original paper from Siggraph 1984
And looking at the calculation diagrams side by side… (Original paper vs Wikipedia)
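Whichever direction convention you pick, the payoff is the same: once the table exists, any box filter is a constant 4 fetches. Here is the classic lookup (using the “everything above & to the left” convention), just as a reference sketch:

// The classic constant-time box filter out of a Summed Area Table:
// whatever the box size, it is always 4 fetches & 3 adds/subs.
Texture2D<float4> SAT : register(t0);

float4 SATBoxFilter(int2 lo, int2 hi) // inclusive corners of the box
{
    // Out-of-bounds loads return 0, which conveniently stands in for the "-1" row/column.
    float4 A = SAT[lo - int2(1, 1)];      // above-left of the box
    float4 B = SAT[int2(hi.x, lo.y - 1)]; // above-right
    float4 C = SAT[int2(lo.x - 1, hi.y)]; // below-left
    float4 D = SAT[hi];                   // below-right, the box's own corner

    float4 boxSum = D - B - C + A;
    float  area   = (hi.x - lo.x + 1) * (hi.y - lo.y + 1);
    return boxSum / area; // the box-filtered (averaged) value
}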
Bloom in Detroit needs to go through 3 SAT steps, where by the end of each step the generated SAT is used to blur the Bloom image; this makes a total of 3 convolution blur passes in a cheap & quick way (not to mention it is in compute).
1.Bloom Map base
At this first step, a 1/8 of the target resolution image (480*270 in this case) is generated from the given frame; with the help of luminance, the new image can include only the bright areas. This is called the Bloom Map Base or Bloom Image Base.
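As a rough sketch of what such a bright-pass downsample usually looks like (the luminance weights are the standard Rec.709 ones, the threshold logic is my assumption and not Detroit’s exact shader):

// Assumed sketch of building the 1/8-res Bloom Map Base: downsample & keep the bright areas.
Texture2D<float4>   SceneColor  : register(t0); // full res frame
SamplerState        LinearClamp : register(s0);
RWTexture2D<float4> BloomBase   : register(u0); // 480*270

cbuffer BloomParams : register(b0)
{
    float2 InvBloomSize; // 1.0 / (480, 270)
    float  Threshold;    // luminance above which pixels start to bloom (assumed)
};

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float2 uv    = (float2(id.xy) + 0.5) * InvBloomSize;
    float3 color = SceneColor.SampleLevel(LinearClamp, uv, 0).rgb;

    // Rec.709 luminance, then keep only what exceeds the threshold.
    float luma   = dot(color, float3(0.2126, 0.7152, 0.0722));
    float factor = saturate(luma - Threshold) / max(luma, 1e-4);

    BloomBase[id.xy] = float4(color * factor, 1.0);
}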
2.SAT 1
Generate the 1st Summed Area Table.
i.Horizontal Area
First, do a horizontal pass to sum in all the pixels of the given Bloom Image Base.
ii.Vertical Area
Then, do a 2nd, vertical pass to sum in all the pixels of the already horizontally summed values from the previous step.
iii.Apply
Now use the 1st SAT as input to blur the Bloom Image Base (not yet blurred at all).
And to put that 1st SAT generation & its utilization to apply the 1st blur pass all together, it is like:
3.SAT 2
Generate the 2nd Summed Area Table.
i.Horizontal Area
First, do a horizontal pass to sum in all the pixels of the given output of the SAT 1 step.
ii.Vertical Area
Then, do a 2nd, vertical pass to sum in all the pixels of the already horizontally summed values from the previous step.
iii.Apply
Now use the 2nd SAT as input to blur the Bloom Image Base (it was already blurred once before, using the 1st SAT).
And to put that 2nd SAT generation & its utilization to apply the 2nd blur pass all together, it is like:
4.SAT 3
Generate the 3rd Summed Area Table.
i.Horizontal Area
First, do a horizontal pass to sum in all the pixels of the given output of the SAT 2 step.
ii.Vertical Area
Then, do a 2nd, vertical pass to sum in all the pixels of the already horizontally summed values from the previous step.
iii.Apply
Now use the 3rd SAT as input to blur the Bloom Image Base (it was already blurred twice before, using the 1st and 2nd SATs).
And to put that 3rd SAT generation & its utilization to apply the 3rd blur pass all together, it is like:
5.Bloom Composite
Just composite the bloom image to the full size frame!
And to give it a summary view, here are the major steps for blooming the given frame
And just in case you think there is not much difference between the inputs & output of the Bloom step, as the input frame already came with some sort of “bloom” feel from the emissive, the early flares (billboards) as well as the clustered point lights, in fact there is a big Bloom impact in that frame. But in general, Bloom has always been (at least in my captured frames) very subtle in that game!!
Here are a few more bloom examples
Lens Flare [Not Always]
A not-always step, only present when there are things & view angles that should generate flares, or better to call them “Artistic Lens Flares”, and it is done entirely between the vertex & fragment shaders.
1.Lens Flare Occlusion (against Depth)
This first part is a vertex shader only part; it is a sequence of depth-only passes to check depth against a given matrix for each lens flares pass that we are going to draw, and this is also helpful in sorting the flares in the depth of the frame. So, if the total flares in the current frame end up being in 10 draw passes, then at this current step there will be 10 depth occlusion check passes too. Unfortunately there is nothing to show in this step except the Depth that will be checked against. Here are the depths for the 3 frames that we will be investigating for Lens Flares
Depths are in 3840*2160 of the format D32S8
2.Draw Flares
This second part is the one that takes place mostly in the fragment shader, and it is worth mentioning that no parameters are passed to the fragment shader directly, they are only passed to the vertex shader (a transform and a color), and the color is exported from the vertex shader to the fragment shader.
This part takes place as a sequence of draw indirect commands (vkCmdDrawIndirect) one after another, properly sorted, and the amount of draws/flares can vary based on the scene complexity, but the main thing to keep in mind is that flares are split into passes, roughly a single pass per very bright light source/area in most cases.
Here is an example of a very simple scene, with only 1 pass, the most minimal i could find.
Note: when i say 1 pass, i mean 1 depth check pass + 1 flares drawing pass.
These are the 3 samplers used to draw the flares in that frame…very simple!
And here it is progressing in action
And here is a more flare-busy scene example, with 3 passes drawing flares..
These are the 7 samplers used to draw the flares in that frame.
And here it is progressing in action
And here is a much much more complicated example of a very complex & busy scene, with about 17 passes drawing flares only!
And this is the army of samplers used to draw the flares in that super pretty neon-y night frame.
And here it is progressing in action
i l💘ve lens flares!!!
Android/Deviant Mind View Composite [Not Always]
Only if the current frame is presenting a Mind View and there is a map for it generated earlier in the frame.
1.Downscale
The main focus when there is a Mind View is not actually the game, but the view itself, and hence the game works on resizing the frame to 1/4th of the target resolution, and this new scaled frame will be the background of the Mind View.
Wasted Opportunity?
Now the interesting takeaway here: if it is known from the beginning that the frame will be a “Mind View” frame, then why wasn’t the frame rendered targeting 1/4th of the target resolution during all those previous steps, as eventually the entire frame that has been carefully carved during all this time will end up being 1/4th of the target resolution?
2.Convolution Blur
Take that new 1/4th resolution frame, and do a horizontal and a vertical blur.
i.Horizontal
First, blur the given frame horizontally
ii.Vertical
Then blur the horizontally blurred frame once more, but vertically
Convolution Kernel
float4[8] _vfWeights;
_vfWeights[0] = { 0.10221, 0.10221, 0.10221, 0.10221 };
_vfWeights[1] = { 0.12064, 0.12064, 0.12064, 0.12064 };
_vfWeights[2] = { 0.13474, 0.13474, 0.13474, 0.13474 };
_vfWeights[3] = { 0.1424, 0.1424, 0.1424, 0.1424 };
_vfWeights[4] = { 0.1424, 0.1424, 0.1424, 0.1424 };
_vfWeights[5] = { 0.13474, 0.13474, 0.13474, 0.13474 };
_vfWeights[6] = { 0.12064, 0.12064, 0.12064, 0.12064 };
_vfWeights[7] = { 0.10221, 0.10221, 0.10221, 0.10221 };
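To put those weights in context, here is what the horizontal half of such a separable 8-tap blur would look like; a sketch on my side, the actual tap offsets/addressing in the game’s shader are not something i verified:

// Assumed sketch of the horizontal pass of the 8-tap separable blur using the
// _vfWeights above (they sum to ~1.0); the vertical pass does the same along Y.
Texture2D<float4> Src         : register(t0); // the 1/4 res frame
SamplerState      LinearClamp : register(s0);

cbuffer BlurParams : register(b0)
{
    float2 InvSize; // 1.0 / (960, 540)
};

static const float Weights[8] =
{
    0.10221, 0.12064, 0.13474, 0.1424,
    0.1424,  0.13474, 0.12064, 0.10221
};

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 sum = float4(0.0, 0.0, 0.0, 0.0);

    // 8 taps around the pixel along X (exact offsets are an assumption).
    for (int i = 0; i < 8; ++i)
    {
        float2 offset = float2((i - 3.5) * InvSize.x, 0.0);
        sum += Src.SampleLevel(LinearClamp, uv + offset, 0) * Weights[i];
    }

    return sum;
}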
3.Composite [Compute]
The last step is to compose the Mind View map that was generated earlier in the pipeline into the downsized & blurred frame, but do all that at the final target resolution.
Still a Wasted Opportunity?
As you can see the frame itself is barely visible, and even the visible parts have already run through a convolution blur, this is why i was thinking about rendering that entire frame from the beginning at 960*540!
Also you may notice that the Mind View composited into the frame includes some extra “red things” that were not part of the original Mind View, and you are right, as this dispatch gets to draw some extra stuff into the rendertarget during composition; these draws are done with the help of lots of textures that include circles & text-y like textures.
Tone Mapping, Color Grading, Noise & Composite [Compute]
In a single dispatch & with the same shader, all things take place at once, so it is hard to show distinct outputs for each step, only in & out.
The shader always uses some LUTs (a separate texture per channel) as well as a noise sampler.
Color Grading & Composite Constant Buffers
struct POSTPROCESSING_COLOR_GRADING
{
float4 Lift;
float4 Gamma;
float4 Gain;
float4 Shadow;
float4 Midtone;
float4 Highlight;
float4 Offset;
float LowRange;
float HighRange;
float Contrast;
float Pivot;
float Saturation;
float Hue;
int ColorSpace;
float Dummy0;
}
struct POSTPROCESSING_VIEWPORT_COMPOSITING
{
int4 _viScaleNeo4k;
float4 _vfInvViewportSize;
int4 _viViewportOffsetSize;
int4 _viMireSrgb;
float4 _vfGrainSize;
float4 _vfGrainIrregularityU;
float4 _vfGrainIrregularityV;
float4 _vfGrainIntensity;
float4 _vfGrainBlack;
}
UI [Not Always]
Detroit’s UI seems like it wanted to be entirely in vector graphics powered by Autodesk’s Scaleform, but it didn’t 100% achieve that. A lot of the UI elements are processed as very fine and dense triangles, except “normal” text & images. For example, things like big title text, icons or symbols, such as a controller’s Thumbstick icon or a “Y” gamepad button icon, would be fully drawn in polygons, whereas something like a text or a screenshot of a level will be just bitmaps rasterized into quads.
Personally i’m not in deep love with vector based UI/Texts, but i don’t hate it either, i find it cool to have a sharp & crisp interface that can scale where needed (which is usually not the case in most games), but i don’t like the special way content needs to be authored, or the overhead that may come with rendering it, specially if it is handled the same way as Detroit! What Detroit has is what i would call Hybrid-Vector-Graphics!
When there is any normal sized text in the UI (which is the case most of the time), there is a runtime atlas generated with the characters on screen, with all sizes, so for example if a number ‘1’ shows on screen 5 times in 5 different sizes, then the atlas would include five ‘1’s, a ‘1’ for each size (imagine bold, italic, underline, strikethrough…etc.), then text is done as the usual bitmap atlas based font rasterization, so it is not 100% vector graphics! Imagine you need to have many more size variations in the UI for some different reasons (let’s say an RPG busy menu full of all sizes of texts, descriptions & tooltips), how many versions of the atlas would you end up with? And this is where i dislike this approach. Other methods of vector graphics can be done in a different way, but the one used here is not my favorite.
What was mentioned previously is not the case with larger texts, such as the Chapter names or the choices that pop up during gameplay for example, they’re processed fully into polygons.
On the other hand, things like the lines connecting the “Story Graph”, or icons of buttons, or maybe a UI button highlight, or even the entire UI background, all these things are fully 100% in fine-detailed polygons and they keep crisp details.
Look at those atlases for these videos on the left, and let’s see how many duplicates of the same glyphs are present : D a debugging nightmare (at least when the UI system is still getting off the ground). And of course, all these atlases are auto-generated, and they differ each time.
All font atlases are in 1024*1024 of R8_UNORM
But to be fair, the final look is good & it serves the game’s overall identity, so i believe it was a good choice regardless of the technique or technology powering it.
Anyways, it seems that Quantic Dream is diverging away from Scaleform in their upcoming games, but still…vector graphics too! But maybe 100% vector graphics this time!! Let’s wait & see..
Object Picking [Compute]
Read picking from the full resolution D32 depth; this is possibly a remnant of the game’s editor features (Maya as editor)
And because it is always cool, here is a snippet from Detroit’s level editor (aka altered Maya), i’ll leave a talk from “Game Camp France 2019” that goes into some details about that (⚠️warning, it is in French, but you can try with auto-gen YouTube subtitles, it may be good)
HDR/Gamma [Not Always]
A Camera or OETF (Opto-Electronic Transfer Function for short) gamma is applied if the game is running in HDR (so it seems to not be a common Display Gamma). This step always exists even in SDR, but when in SDR the value used for gamma is just 1.
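And just to frame the idea (the curve direction & the exact HDR values are not something i verified, so take this as an assumed illustration):

// Assumed illustration: apply an output/OETF-style gamma that is a pass-through in SDR.
// With gamma = 1.0 (the SDR case mentioned above), pow(x, 1/1) == x and nothing changes.
float3 ApplyOutputGamma(float3 color, float gamma)
{
    return pow(max(color, 0.0), 1.0 / gamma);
}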
CAS
AMD’s FidelityFX™ Contrast Adaptive Sharpening (CAS)
Flip & Present
If you still recall, at the start of the article it was mentioned that everything renders upside-down; at this step this gets adjusted, and the frame is flipped vertically.
There is also a format conversion from R16G16B16A16_FLOAT to B8G8R8A8_UNORM in case of SDR, or to a R10G10B10A2_UNORM in case of playing on HDR, either way, the frame is served on the windowing surface with that format conversion.
And that’s about it!
See all of what you’ve been reading to reach this line…you probably spent a few hours reading, if not went through the article over the course of a few days, but in reality this is all entirely done for 4k (about 8.3 million pixels) in just 33.33ms, and it is going to be restarted again entirely, almost from scratch, for a new frame just a tiny bit of nanoseconds from now! With all that in mind, it is the right moment to quote Sir David Attenborough:
Life of a Frame [Rendering Graph]
Since the first time i wrote in this series, i learned about the fancy tool “miro” & have been using it quite often either at work or in my personal projects, and for this time’s “Life of a Frame”, i won’t be doing it in a video format as i used to, let’s try something new…something much more graphics programming friendly…Graphs!
And here is an image of that graph, just in case the miro service gets discontinued someday in the future or something! But navigating that graph above is the intended way, don’t check the image below unless you can’t open the graph.
Extra Stuff for Future Investigation
These few things below are topics or stuff that interested me while playing the game, but unfortunately i didn’t have enough time or capacity to dig into them further than the first iteration, and i don’t have enough time (or power) to write about them in detail step by step. For now i’ll just leave some random key images or media that i stored during my 1st iteration of pipeline digging (this is where i put together a rough layout of the game’s pipeline), and these images may give you a lead about what’s going on.
1.Water Mesh Deformation
2.Skin Dissolving
3.Carpets/POM
4.Mirrors
Engine General Observations
Culling & LODs
The game seems to be suffering from a lack of good culling; while looking through a lot of frames, i used to see objects processed behind other objects, like something small far away outside a blocked window, or cars down the street or behind another large object in the street, or even an entire interior of a house with the tiniest details while being outside & unable to see anything.
I mentioned earlier @ the Video Frame Decoding section that you don’t have to see a surface that displays a video in the frame in order to have this step processed. You may’ve thought that this could be due to some loading & streaming engine decisions (load any videos that will be needed in that level ahead of time), but i don’t think so, it is deeper than that, and it is possibly a big culling issue (occlusion culling for this video surface example to be specific). An example: in that shot below outside Todd’s house, you can see the video frame of the hockey match get processed at the start of each frame while being outside the house, as long as the house is in the view frustum!
The culling issue seems to be not only about occluded surfaces, as it seems that the game suffers from missing occlusion culling as well as missing distance culling (in addition to the lack of some levels of detail). I’ll leave you with this video of draws for the “Outside Todd’s House” frame.
Not only the entire interior of the house draws, but also the backyard, the neighbor’s house backyard,
Alice (the little girl) with all her details draws inside the house in an ‘A’ pose
& not to mention the teeny tiny details of the crane on the bridge!
Another example with some stuff in the street, outside the building, behind walls & behind doors. Things that you have no chance of seeing during this frame. And yes you can see them at some point, but only when the view changes!
You need to go near the window or change camera to look behind the door
until then, lots of these things can be safely culled!
These things that needlessly draw, they draw all the time! What you see above is the forward clustered pass, but the same things draw in the shadow pass, the z-prepass, and on any other occasion.
FXAA
At some point there seemed to be FXAA support in the game, as there were some entries for it in the debug menu as well as some fragment shaders & some traces in code.
Neo
This PC port seemingly didn’t invest enough time to strip everything PlayStation related; lots of the time structs include variables that are meant to work exclusively for the PlayStation 4 Pro, codenamed Neo. Stripping these things either manually in that PC port (if in a separate branch) or through some preprocessors would’ve been great and would’ve made many structs smaller in size, if data “weight” matters!
Naughty Dog’s ICE collab
There seem to be some references in code to code files under a large ICE folder (if you don’t know, ICE stands for Initiative for a Common Engine). No wonder, afaik many of Sony’s exclusives are done under the Maya-as-editor mindset & they get some support from ICE in one way or another. There were some headers for AI, Scripting Interface, Animation Lib, and Core Primitive Types as well as Math headers (that’s a lot of core stuff!). Take it like that, as i won’t (and don’t like to) put any screenshots of this content for the very obvious reasons (but i’m sure with very minimal effort, you can access these things on your own if you’re curious about them).
Shader Compilation & Cache Issues
As you may’ve observed by now, this game has been installed on my PC and i’ve been going through it for more than 2 years; during that time, every few weeks (or few months) when opening the game i get prompted with the slow “Shader Compilation” screen. i believe this is due to me updating my nvidia drivers quite often & never missing an update, hence after the game detects the difference, it has to recompile again under the most recent drivers.
There is no problem in that, and we (as gamers) kinda got used to that by now. But there was one issue that came once after one of these compilations, and i didn’t miss the opportunity to record it.
It is well played Chloe!!! i found it very very funny that Chloe said that exact voice line during the first boot with the broken shader!!
The game remained like that for me during this entire play session; after closing the game it did not go away, but it was not consistent, sometimes when i open the game i get it and other times i don’t. But after the next shader compilation screen (aka the next nvidia drivers), i think i did not see that issue again!
If you are interested in watching more of this bug, i’ve recorded a lot, here is a part of 20 min playthrough under this condition.
Pretty much everything on the characters except clothing: Skin, Eye & Hair shaders, not only skin, if you look closer!
Finally we Know what Engine Quantic Dream is using!!!!!
Quantic Dream got the most innovative name for their engine, it is called “Quantic Dream Engine”
Epilogue
There is no doubt that Detroit looks beautiful & its visuals will always be remembered as one of the most outstanding qualities on gaming consoles, but i believe that the game could have reached a solid 60fps (at least on PC) with some optimizations & minimal sacrifices, and maybe (and i can’t believe i’m recommending that now) some upscaling crap.
After all, ~100hrs were not bad for a 15hr game! (keep in mind i played already on PS4 Pro)
And as you’re used to over the course of this series, we just set a new record for the data size on disk for this article!
-m
Related Readings & Videos
Vulkan Tutorial – Compute Shader
Learn OpenGL – Compute Shaders Introduction
Wikipedia – Lens flare
minutephysics – But What IS A Lens Flare?
Simon’s utak – LENS FLARES. The bad and the beautiful. Why do lenses flare and what lenses are ‘best’ at flaring?
Cambridge in Colour – UNDERSTANDING CAMERA LENS FLARE
Cluster Forward Rendering and Anti-Aliasing in ‘Detroit: Become Human’
The Lighting Technology of ‘Detroit: Become Human’
ACM – Real-time many-light management and shadows with clustered shading
Emil Persson – Practical Clustered Shading
Art & Code, la synergie de Detroit : Become Human | A.Coltel & A.Montouchet | Game Camp France 2019
HDR Rendering – Gaussian Filter using Compute Shader
Wikipedia – Summed-area table
Siggraph’84 – Summed-Area Tables for Texture Mapping-1984
GDC2003 – Summed Area Tables using Graphics Hardware
GPU-Efficient Recursive Filtering and Summed-Area Tables
Wikipedia – Box blur
Wikipedia – Gaussian blur
Sascha Willems – Flipping the Vulkan viewport
FidelityFX™ Contrast Adaptive Sharpening (CAS)
pixelmager – bluenoise.md
Wikipedia – Colors of noise
Screen Space Reflections in The Surge
Light Propagation Volumes in CryEngine 3 [Anton Kaplanyan]
DETROIT: A VULKAN IN THE ENGINE
AMD – Porting Detroit: Become Human from PlayStation® 4 to PC – Part 1
AMD – Porting Detroit: Become Human from PlayStation® 4 to PC – Part 2
AMD – Porting Detroit: Become Human from PlayStation® 4 to PC – Part 3
ACM – Image-space horizon-based ambient occlusion
Frostbite – Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Frostbite – Physically-based & Unified Volumetric Rendering
Frostbite – Stochastic Screen-Space Reflections