Behind the Pretty Frames: Detroit Become Human
- Introduction
- Configs
- Behind the Frame
- Vulkan
- Compute
- Blue Noise
- Detroit Vertex
- Frame
- Prepare Resources
- Video Frame Decoding [Compute][Not Always]
- Copy Skinning positions [Compute][Not Always]
- SH(Spherical Harmonics)/Indirect Lighting Blending [Compute][Not Always]
- Light Clustering [Compute]
- Procedural Textures
- Clear Previous ZPrepass/Depth
- Clear Previous Motion Vectors
- Z-Prepass/Depth Part 1 (Opaque/Static Geometry)
- Motion Vectors Part 1 (Opaque/Skinned Geometry)
- Z-Prepass/Depth Part 2 (Opaque/Skinned Geometry)
- Hair Accumulation
- Z-Prepass/Depth Part 3 (Alpha-Tested Geometry)
- Depth Texture
- Depth Texture 1/2 + Low Res Depth
- HBAO [Compute][Not Always]
- Shadow passes
- Eye Shadow (not the one girls put)
- Android/Deviant Mind View [Not Always]
- Footstep Layer [Not Always]
- Cluster Forward Rendering
- 1.All Geo
- 2.Skin, Eye & Teeth
- 3.Emissive
- 4.Sky/Background
- 5.Specular Lighting (IBL)
- 6.SSSSS (Screen Space Subsurface Scattering)
- 7.Volumetric Scatter(Volumetric Light Volume) [Compute][Not Always]
- i. Depth Tile
- ii. Participating Media Properties/Material Voxelization (Scattering/Extinction)
- iii. Clear some RW buffer
- iv. Light Scattering or Froxel Light Scattering (Froxel Light Integration & Temporal Volumetric Integration)
- v. Write to some RW buffer
- vi. Scatter Radiance
- vii. Final Integration (Final Participating Media Volume. Integrate froxel scatter/extinction along view ray)
- viii. Particles Voxelization [Not Always]
- Transparency
- SSR [Compute]
- Refractivity Mask [Not Always]
- SSR Composite
- Composite Refractivity Mask [Not Always]
- Post Processing
- UI [Not Always]
- Object Picking [Compute]
- HDR/Gamma [Not Always]
- CAS
- Flip & Present
- Life of a Frame [Rendering Graph]
- Extra Stuff for Future Investigation
- Engine General Observations
- Epilogue
- Related Readings & Videos
Note: Today, the 24th of November 2024, this was the last line to be added to this article. The draft for this article & investigation initially started on the 29th of January 2022, which makes it the longest investigation in the series!
Introduction
When the “Kara” PS3 demo video showed up years ago, i was blown away by that showcase (well, everybody was), from the premise of the idea, to the visual quality, ending up with the voice acting & the outstanding performance by Valorie Curry!! And i wanted to play that game whenever it came out, regardless of how it plays! At that time i did not know what “Quantic Dream” was or if they had even made games before, it was one of those companies that was outside my radar, simply because they did not make any of the fancy action adventure titles that i used to play back then.
Wait for it…”Running in real time on a PlayStation®3″
Fast forward to the release campaign and the actual release of the game, i was kinda disappointed knowing that it is just a game where you press buttons to do QTEs, and that’s it, no deep gameplay, no fancy mechanics, no precise platforming, no coyote time, no headshots & no car hijacking! So, i decided to invest neither the time nor the money, and not to watch any of its coverage, simply NOT MY TYPE! Because it is a boring game, where you just press the buttons that show up on your screen…UNTIL…
One day my wife picked up the game based on her colleagues’ recommendations, not really far from release, i would say within a 2-month window or so, and i fortunately (or unfortunately) was free during that evening & decided to sit on the couch while she was playing Detroit for the 1st time, and here came the gamer genre-shock! The game hooked me (from the Hostage chapter), especially since it was being played in Arabic (my mother tongue, and we almost never get audio localization in Arabic), and i did not hesitate to start playing it on my own profile the next evening! Sadly i then had to sell that PS4 while relocating to avoid region crap from Sony, and since then, i didn’t get a chance to replay the game again till it dropped on Steam!
i would say that i was unfortunate to try this game, because it opened Pandora’s box for me, of who made that game, and hence i discovered what “Quantic Dream” is and their impressive set of previous games (Heavy Rain & Beyond, which are thankfully on Steam too) and i started playing them. Not only that, but i started to reconsider similar games, such as all the Supermassive Games titles (i played Until Dawn at launch, didn’t like it, but didn’t dislike it either! and maybe this was the game that gave me that impression of the QTE genre), and in my quest to familiarize myself with that category of games, i was always aware of the praise for TT games, which i always avoided playing because they looked like badly shaded “comics” style games and not up to the contest of current gen fancy-shiny graphics (graphics is not everything boys, even while we are talking here about graphics), so i looked up any TT games that were available at that time to play them too. Anyways, it eventually turned out that games falling in that genre are NOT just “boring games, where you just press the buttons that show up on your screen”! These games are deeper than that, and they compensate for the weak or missing gameplay mechanics & buttons mastery with a good story, which could be carried with some nice acting, graphics, audio, performance capture and a lot more; but it remains that the writing is the solid core of such games, and i find that makes it enough to invest the time in playing them. After beating Detroit for the 1st time, it felt to me like an IMDB 9.5/10 movie that has some interactivity elements where i sat in the director’s chair & was able to change the course of the movie to get an ending that satisfies me! And that last unique interactivity element is worth a 0.5 in movie rating, and that makes Detroit an IMDB 10/10.
This article has been in the making for a long time; if you’re following the series you might recall that in the Death Stranding article i mentioned i’ve been working on a Vulkan based pretty frames article (now you know it is Detroit), and that was side by side with the Resident Evil article (not to mention that the Diablo IV article was made from scratch while Detroit’s was still an in-progress draft).
From Death Stranding’s Pretty Frames
Anyways, it is a fact now that 3 other entries in the series went live while this Detroit one stayed a draft for a long time! But why was that?!
- Frame captures for Detroit were taking a very long time compared to other games (and even compared to projects i work on professionally in the AAA space). A Detroit capture could take several minutes, up to 10 minutes, freezing the computer!!
- Captures could be corrupt most of the time due to that previous reason; many times opening a capture won’t succeed and i’m faced with a popup telling me “File is corrupted”!
- If a capture is corrupt and i need to retake it, or if i needed an exact or specific capture, then i had to play the chapter again, usually from the beginning or the nearest checkpoint (which usually is not that near), as it is not like other games where you can save at many places & keep copies of save files to swap, and it is not an open world where you can chill around until you find a nice capture or the conditions meeting the intention behind the capture (bloom, lens flares, particles, wetness,…etc.). And not only re-playing chapters from scratch, but in some cases i also had to make exact QTE choices to get to the camera view or the scene that i wanted to capture from.
- Captures are huge in terms of disk space, but this could be due to the fact i played & captured most of the time at 4K. An average capture was between 4GB and 5GB (compressed), and some captures unexpectedly passed 6GB (small scene/map and limited space).
- Loading a single capture was enough to inflate a huge amount of data into memory, leaving my PC at ~90% memory usage. Yes, Chrome eating some, the OS (Windows 11) eating some, but the big portion goes to the capture file (uncompressed), which was usually reaching 15GB in memory! And to do such investigations, with other games i used to load multiple captures at the same time (4 or 5) to compare details, the worst case i ever had being when i wasn’t able to load more than 2 captures; but for Detroit, during the entire investigation, i had to load a single capture at a time.
- And as if system memory was not reason enough to work on one capture at a time, there is the GPU memory, and of course the glorious Vulkan replay would be complaining with something like VK_ERROR_OUT_OF_DEVICE_MEMORY!
- Last but not least, i wanted this one to be a 100% complete breakdown of the game with macro details for each step, even if that step is not always there, and without leaving anything as a TODO that never gets done!
Me & Detroit Fun Fact
Kara’s & Connor’s music themes in this game became two of my all time favorite game OSTs, and they’ve been in the playlist i listen to during work (or while writing this article) for a long time, so kudos to Philip Sheppard and Nima Fakhrara for putting that extra joy into my ears : )
Me & Detroit Much More Fun Fact
After more than halfway into this investigation (around the post processing section), i found out about a couple of GDC talks on Detroit (links by the end of the article), and by the end of the investigation, after most of the article was written & the media made & uploaded, i found out about another French talk which in its turn led me to an AMD series of articles on the GPUOpen website discussing the porting process of Detroit to PC/Vulkan. These are very informative resources (only the GDC talks & the French talk so far, which i watched), but they popped up very late for me, and did not impact or affect what was already written (or what remained to be written); i like to put things from my perspective as a result of me blindly digging deeper & deeper inside the game binaries & the many captures i take. In fact, i’m glad that i did not find these resources earlier, or before choosing this game or proceeding into this journey of digging into it; i would’ve not considered this game if i had found these talks earlier!
So if you see any misalignment between what i put here & these resources, you know why: these resources were never part of the knowledge build-up process i had for this game. Also, some of these resources (the GDC talks & the French talk) are, i think, more about the PS4 version of the game due to their release time, and of course there may be lots of technical differences, either due to the API, the hardware architecture or some tech decisions that came after the official release.
Configs
Still using my common gaming PC, which is made of an AMD Ryzen 9 5950X 16-Core 3.40GHZ, 32GB RAM and an RTX 3080, and playing at 4K (no HDR for all captures except the ones in the HDR section, as the game looked nice without HDR).
Steam’s Build ID: 12158144
SKU Application Version: 01.00
Regarding the game graphics/video settings, i left them at the defaults that the Steam build initially launches with.
Despite the fact that you see the configs screen in English, my captures were all including Arabic text for the game UI,
as i preferred to play & re-play the game in Arabic (SO MUCH FUN!!!).
i made sure to swap the language for the configs screenshot only.
More about Configs
The game also offers a GraphicsOptions.JSON
{
    "GRAPHIC_OPTIONS": {
        "VIDEO_OPTIONS": {
            "FULLSCREEN_RESOLUTION_WIDTH": 3840,
            "FULLSCREEN_RESOLUTION_HEIGHT": 2160,
            "RESOLUTION_SCALING": 1.0,
            "FRAME_RATE_LIMIT": 0,
            "VSYNC": true,
            "BRIGHTNESS": 0.0,
            "HDR": true,
            "CASSHARPEN": true
        },
        "ADVANCED_OPTIONS": {
            "TEXTURE_QUALITY": 3,
            "TEXTURE_FILTERING": 3,
            "SHADOW_QUALITY": 3,
            "MODEL_QUALITY": 3,
            "MOTION_BLUR": 3,
            "VOLUMETRIC_LIGHTING": 3,
            "SCREEN_SPACE_REFLECTION": 3,
            "SUB_SURFACE_SCATTERING": 1,
            "DEPTH_OF_FIELD": 1,
            "AMBIENT_OCCLUSION": 1,
            "BLOOM": 1
        },
        "GPU_INFO": {
            "Name": "NVIDIA GeForce RTX 3080",
            "Driver": "556.12",
            "Benchmark Score": 2900.206787109375,
            "Benchmark Timing": 0.6206454038619995
        }
    }
}
As well as a WindowInfo.JSON
{
    "WINDOW_OPTIONS": {
        "MONITOR": 0,
        "DISPLAY_MODE": 0,
        "WINDOW_POSITION_X": 0,
        "WINDOW_POSITION_Y": 0,
        "WINDOW_RESOLUTION_WIDTH": 3840,
        "WINDOW_RESOLUTION_HEIGHT": 2160
    }
}
Also, the game runs a shader pre-compilation screen at start for the first boot; it took quite some time the first time, but after re-installing the game it took about 4 minutes, and in total it cached 99453 pipelines.
Behind the Frame
Buckle up, and let’s visit Detroit in the year of 2038 inside my RTX 3080!
General Note
For the very tiny image resources, if you click on the images you open the original files in new tabs, which could be as small as 1*1 pixels. But if you right click on the images and “Open image in new tab”, you will open the upscaled, detailed version.
Vulkan
i’m a developer who has stuck (in a good way) with Khronos & Silicon Graphics (SGI) for a long time, there is no wonder in that! Since the dawn of graphics APIs i was at the OpenGL (Open Graphics Library for short) side of the table. If you think our market is in a fight today, you didn’t see a real fight then; the fight was super super hot, whether it was C against C++ as a game developer choice, or 3dfx’s Voodoo against Nvidia’s Riva or GeForce as graphics cards with better capabilities, Sega’s Dreamcast (NEC PowerVR2) against Sony’s PlayStation 2 (Graphics Synthesizer) as powerful 3d accelerated gaming consoles, and most notably, the graphics APIs, Direct3D against OpenGL and Glide (not to mention proprietary APIs, which are always out of the public competition). As a hobbyist learning on my own back then, OpenGL, and only OpenGL, made sense in the middle of all of that for me. It was hard to find resources, there was no way to travel to attend events, it was very very hard to get any support in that passion driven adventure, there was no home internet, no local courses or schools teaching these topics, importing tech books about these topics was a wish that was never gonna happen & waiting for local translations of these books meant getting a book with 5-10 years old info & technology. But the friendly & make-sense nature of OpenGL made it at least easier to learn on my own with lots of trial & error (of course a huge shout-out to the gaming/entertainment magazines available back then, these used to have some tech topics & sometimes tutorials on programming topics, and it happened that OpenGL tutorials were much more frequent & much more detailed than the abstract Direct3D tutorials,…at least that was my luck). So OpenGL at some point was basically the one thing that summarized my creative & fun hours spent on a computer!
Some full articles (not the ones in the gallery above) – Totally recommended to sneak a peek if you’ve got the time!
Things have changed a lot year after year, the gap shrunk a lot, things died (i mean 3dfx, Glide, Dreamcast), the mindset and idea of “graphics cards” evolved to be much more modular, new players became essential in the market (AMD, Vulkan, Xbox), and it became an industry standard for a graphics person (hobbyist or professional) or a game engine to know & support all existing graphics APIs. You don’t even have to know multiple shading languages anymore; with things like SPIR-V or Microsoft’s Shader Conductor and other proprietary translators and/or shader compilers, you write once in a language of your choice & these tools do the job (a clean job) for you 95% of the time. But i’ve always preferred the Khronos deal, at least in my side/hobby projects (AAA companies always had another opinion, for the reasons that make sense for them : $). Till the release of Vulkan!! In the first few months, i decided to fully jump into that early Vulkan boat, because i kinda knew that it was not an experimental thing, and eventually it would be the future of Khronos, where OpenGL will fade away at some point, and this shift is my next step with OpenGL, it just got a different name…a totally different one! i was lucky during that time of a new API’s birth to be involved in multiple AAA projects that supported a Vulkan backend, from multiple recent Assassin’s Creed games (Origins, Odyssey, Valhalla, Fenyx Rising), and of course early Assassin’s Creed games (i, ii, iii,…) ports for the Nintendo Switch (all that when Stadia was a thing i believe; each AC game or Anvil Engine based game i worked on had a Vulkan backend even if it was not really needed, who knows where the suits will decide to ship), to one of the most perf friendly games i worked on, Hyper Scape (which unfortunately didn’t succeed commercially because it was very very late to the battle royale trend), ending up with something i stuck with for a while like Skull & Bones (i’m not sure about the final shipping state of S&B as i left the project & the company several years ago, but around ~2017-2020 we always had a maintained Vulkan version of the game that compiles & runs on Windows side by side with the DX12 version).
But as you can see, while i touched multiple AAA titles that all supported a Vulkan backend, none of them seemed to ship with Vulkan as the main rendering backend! No surprise in a market that seemed to be dominated by (& totally saturated toward) Direct3D, choosing DX12 as the successor for all new titles!
Year after year since the new generation of rendering APIs that require more & more code, all the fancy AAA titles kept shipping on D3D12; no wonder, it has been the case for D3D most of the time. At least till very recent years, and till the day i got Detroit on Steam, launched it, and my lovely RivaTuner at the corner was saying it all out loud: “Vulkan”!!
Enough ranting about my view of OpenGL/Vulkan vs DX, i would never stop! Let’s see some Detroit Vulkan related details.
Vulkan Extensions
For the sake of curiosity, when looking at any Vulkan based code/app, i always like to check what extensions are used to deliver it. These things are easy to miss, you can lose track of how many there are or what they do, and nothing lets you know them better than seeing what has been used in a good game or renderer that you liked. Here are some Vulkan extensions that i was able to find traces of while digging through this game. You may find some of them “inspirational” for your work, as they help in reducing latency or improving performance in one way or another!
Vulkan Version
Seems to be Vulkan 1.1 (with GLSL 4.5), as there were some code paths at the initialization function complaining about this exact version if not found.
Bindless
And the last of the general Vulkan notes: the game seems to be using bindless texture resources (bad name, i know), and you may’ve noticed this from the mention of VK_EXT_descriptor_indexing above. Not much to say here, except that i like the technique & was glad to see it utilized here in a Vulkan game; in the past Pretty Frames (at least the ones that made it to written articles) i don’t think any of the games had bindless resources!
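Just to make the idea concrete, here is a minimal GLSL sketch of what descriptor-indexing based bindless texturing looks like in general; the set/binding numbers, the constant buffer and the member names are hypothetical and not taken from the game.
#version 450
#extension GL_EXT_nonuniform_qualifier : require
//A large (unsized) array of textures bound once; per-material data only carries
//integer indices into it. Everything here is illustrative, not Detroit's layout.
layout(set = 0, binding = 0) uniform sampler2D g_Textures[];
layout(set = 1, binding = 0, std140) uniform MaterialConstants
{
    uint uAlbedoTexIdx;   //hypothetical index into g_Textures
    uint uNormalTexIdx;
} mat;
layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec4 out_color;
void main()
{
    //nonuniformEXT is needed when the index may diverge within a wave
    vec4 albedo = texture(g_Textures[nonuniformEXT(mat.uAlbedoTexIdx)], in_uv0);
    out_color = albedo;
}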
So without further ado, and before jumping into the nitty-gritty of the game’s frames breakdown, i would like to re-phrase the name of this article (sadly can’t do this at the main header for consistency with the previous articles):
“Behind the Pretty Frames: Vulkan Become Human“
Compute
There is no wonder that such a visually appealing game running on a budget must be utilizing compute in one way or another. The game (if you don’t know yet) is mainly using a Clustered Forward rendering approach, which means compute must be utilized for light culling & partitioning the view frustum into cells. But it goes a lot beyond that. i believe for such a game, if it was possible, they would’ve done the entire rendering pipeline in compute; as you will see later, the majority of the pipeline is actually compute based, and that includes shading.
To give you an idea about the compute usage in a typical Detroit frame (Cinematic or Gameplay, no difference, as the game seems & looks exactly the same; after all, Detroit is a game-long cut-scene with some inputs), i’ll leave below just the summary of the compute utilization (in dispatch order). The details of those usages i prefer to put at their correct order in the pipeline in the Draw section of the article.
Compute Dispatches Queue (in execution order)
- Video Frame Decoding (For specific surfaces such as TVs, Billboard displays, Screens, Tablets, ads,…etc.)
- Copy Skinning positions (Skinning & Blendshapes)
- SH(Spherical Harmonics)/Indirect Lighting Blending
- Light Clustering
- Rain Flow
- Procedural Caustics
- Depth Down Sample
- HBAO
- Clustered Forward Shading
- Footsteps
- Volumetric Scattering
- Averaging Luminance
- SSR
- Exposure
- TAA
- Motion Blur
- DOF
- Bloom
- Android/Deviant Mind View
- Color Grading
- Noise
- Final Frame Compositing
- Object Picking
And before leaving this section, i want to (for the first time in the series) put some more details about the workgroup sizes, the number of workgroups per dispatch, as well as the final predicted invocation count. These numbers are important, and over the years i’ve seen some people put some art behind picking the numbers they choose (sometimes it is part of an optimization phase to tweak these numbers, as some number combinations give good utilization where others may not be perfect). Here are some of the most uncommon numbers passed to vkCmdDispatch & the workgroup sizes. i’ll leave you to judge the numbers!
The Workgroup Size
These ones below are not everything (indeed), but the most interesting ones.
Workgroup Count (from dispatch cmd) | Workgroup Size (within the shader) | Invocations | Usage |
---|---|---|---|
(32, 16, 1) | (32, 32, 1) | 524,288 | Video Frame Decoding |
(64, 32, 1) | (8, 8, 1) | 131,072 | Video Frame Decoding |
(8, 4, 1) | (8, 8, 1) | 2,048 | Video Frame Decoding |
(305, 1, 1) | (64, 1, 1) | 19,520 | Skinning & Blendshapes copying |
(47, 1, 1) | (64, 1, 1) | 3,008 | Skinning & Blendshapes copying |
(55, 1, 1) | (64, 1, 1) | 3,520 | Skinning & Blendshapes copying |
(644, 1, 1) | (64, 1, 1) | 41,216 | Skinning & Blendshapes copying |
(725, 1, 1) | (64, 1, 1) | 46,400 | Skinning & Blendshapes copying |
(3773, 1, 1) | (64, 1, 1) | 241,472 | Skinning & Blendshapes copying |
(15, 15, 3) | (8, 8, 1) | 43,200 | LPV |
(360, 1, 1) | (32, 1, 1) | 11,520 | LPV |
(1440, 1, 1) | (32, 1, 1) | 46,080 | LPV |
(64, 1, 1) | (32, 1, 1) | 2,048 | Light Clustering |
(1, 1, 1) | (64, 1, 1) | 64 | Footstep Extracting |
(30, 17, 64) | (8, 8, 1) | 2,088,960 | Volumetric Scattering |
(121, 68, 1) | (8, 8, 1) | 526,592 | SSR Prefilter main pass |
(240, 135, 1) | (8, 8, 1) | 2,073,600 | SSR’s TAA |
(1920, 1, 1) | (1, 544, 1) | 1,044,480 | Motion Blur |
(1, 1080, 1) | (964, 1, 1) | 1,041,120 | Motion Blur |
(1, 270, 1) | (960, 1, 1) | 259,200 | SAT |
(1, 480, 1) | (576, 1, 1) | 276,480 | SAT |
If you are asking why “Skinning & Blendshapes copying” differs, it is because each dispatch is for a different skinned mesh (just in case you’re asking), and the ones in the table are not all of the variations, just a few of them to showcase the idea.
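To make the table concrete, here is a tiny compute skeleton (the usage & bindings are made up, not from the game) showing where the two numbers come from: the workgroup size is declared inside the shader, the workgroup count is what goes into vkCmdDispatch, and the invocation count is simply the product of the two.
#version 450
//The "Workgroup Size" column: declared inside the shader.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
layout(set = 0, binding = 0, rgba16f) uniform image2D g_Output;   //hypothetical
void main()
{
    //Total invocations = workgroup count * workgroup size, e.g. the volumetric
    //scattering row above: (30 * 17 * 64) groups * (8 * 8 * 1) = 2,088,960.
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    imageStore(g_Output, coord, vec4(0.0));
}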
Possibly we need to rephrase the article name once more…
“Behind the Pretty Frames: Compute Become Human“
Blue Noise
You may notice below during the frame that Blue Noise is relied on heavily in this game, and there is no wonder in that for such beautiful and smooth frames. The game utilizes a form of Bart Wronski’s BlueNoiseGenerator to denoise & smoothen things such as the ones below.
Things that benefit from Blue Noise
- Shadows
- SSSSS (Screen Space Subsurface Scattering for short)
- HBAO/SSAO
- SSR
- Volumetric Scattering (for Volumetric Lighting)
..and maybe more!
Take it like that: anything that has a temporal element (a previous frame resources dependency) or any sort of noise in the output will be using Blue Noise in one way or another.
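As a rough illustration of the general pattern (not the game’s actual shader code, all names & bindings here are hypothetical), a temporal effect typically tiles a small blue noise texture over the screen and shifts it every frame, then uses the value to jitter its samples or dither a threshold, letting TAA resolve the remaining error.
#version 450
layout(set = 0, binding = 0) uniform sampler2D g_BlueNoise;   //e.g. a small tiling texture
layout(set = 0, binding = 1, std140) uniform FrameConstants
{
    uint uFrameIndex;   //advances every frame for temporal rotation
} frameCB;
layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec4 out_color;
void main()
{
    //Tile the noise over the screen and shift it every frame so the error
    //pattern changes temporally.
    ivec2 pixel = ivec2(gl_FragCoord.xy);
    ivec2 noiseSize = textureSize(g_BlueNoise, 0);
    ivec2 offset = ivec2(frameCB.uFrameIndex * 7u, frameCB.uFrameIndex * 13u);
    float noise = texelFetch(g_BlueNoise, (pixel + offset) % noiseSize, 0).r;
    //The value can then drive a rotated sample disk for shadows/AO, a jittered
    //ray start for SSR/volumetrics, or a dithered alpha threshold.
    out_color = vec4(noise);
}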
And for one last time i think, because it is gonna get out of control!!! There is a new proposed name for this article…
“Behind the Pretty Frames: Blue Noise Become Human“
i promise, it was the last time to do that silly rename thing!
Detroit Vertex
Of course, like every other game & engine, vertex descriptions get unique & fancy. Below is probably not everything, but these are the most common ones i’ve been seeing during my investigation, around 95% of the time, so i think Detroit gets away with a handful of unique vertex shaders.
Detroit Become Human’s Vertex Description – Super Simple Opaque Meshes (Like eye shadowing, no-wind grass, opaque hair,…etc.)
in_uv0 R16G16_FLOAT 0
Detroit Become Human’s Vertex Description – Simple Mesh (Like terrain, floors, walls, wood boards,…etc.)
in_color R8G8B8A8_UNORM 0
in_uv0 R16G16_FLOAT 4
Detroit Become Human’s Vertex Description – Foliage 1 affected by wind (small like grass)
in_lodPosition R16G16B16A16_FLOAT 0
in_windBranchData R16G16B16A16_FLOAT 8
in_windExtraDataFlag R16G16B16A16_FLOAT 16
in_LeafAnchorHint R16G16B16A16_FLOAT 24
in_uv0 R16G16_FLOAT 32
Detroit Become Human’s Vertex Description – Foliage 2 affected by wind (medium like long grass, bushes & small tree)
in_color R8G8B8A8_UNORM 0
in_lodPosition R16G16B16A16_FLOAT 4
in_windBranchData R16G16B16A16_FLOAT 12
in_windExtraDataFlag R16G16B16A16_FLOAT 20
in_LeafAnchorHint R16G16B16A16_FLOAT 28
in_uv0 R16G16_FLOAT 36
Detroit Become Human’s Vertex Description – Skinned Meshes (like characters’ head, arms,…etc. flesh)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 12
in_vfSelfOcclusionColor0 R8G8B8A8_UNORM 0
in_vfSelfOcclusionColor1 R16G16_FLOAT 4
in_uv0 R16G16_FLOAT 8
Detroit Become Human’s Vertex Description – Skinned Meshes (like characters’ clothing)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 12
in_color R8G8B8A8_UNORM 0
in_vfSelfOcclusionColor0 R8G8B8A8_UNORM 4
in_vfSelfOcclusionColor1 R16G16_FLOAT 8
in_uv0 R16G16B16A16_FLOAT 12
Detroit Become Human’s Vertex Description – Hair 1 (Eyebrow)
in_vfMsBindPosePosition R32G32B32_FLOAT 0
in_vfMsBindPoseNormal R10G10B10A2_SNORM 20
in_uv0 R16G16_FLOAT 0
Detroit Become Human’s Vertex Description – Hair 2 (Head’s hair, Beard,..etc.)
in_color R8G8B8A8_UNORM 0
in_uv0 R16G16B16A16_FLOAT 4
in_vfMsBindPosePosition R16G16B16A16_FLOAT 12
in_vfMsBindPoseNormal R16G16B16A16_FLOAT 20
You may or may not have noticed that the vertex descriptions are missing common things such as the Positions and Normals of the vertices; this is not a typo or a mistake, as this info is passed to each draw step as buffers, part of the bindless data. Interesting…!!
Keep in mind that particles have their own unique buffer, different from any other mesh type.
General Vertex Positions & Normals Buffers
struct unaligned_float3
{
    float x;
    float y;
    float z;
}
struct g_rbVertices_Layout
{
    unaligned_float3 g_rbVertices[];
}
struct g_rbNormals_Layout
{
    int g_rbNormals[];
}
Particles Only Buffer
struct S_PARTICLE
{
    float _fX;
    float _fY;
    float _fZ;
    uint _uiSizeLife;
    uint _uiSeedFree;
}
struct dyn_g_rbParticles_Layout
{
    S_PARTICLE g_rbParticles[];
}
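Since the positions live in buffers rather than in the vertex description, the vertex shaders most likely do some form of “vertex pulling” with gl_VertexIndex. Here is a minimal sketch of that idea, assuming hypothetical set/binding numbers and a hypothetical object constant buffer (only the g_rbVertices layout above comes from the game).
#version 450
//Positions live in a raw storage buffer instead of a vertex attribute; the
//float array is indexed manually to respect the unaligned float3 packing.
layout(set = 0, binding = 0, std430) readonly buffer g_rbVertices_Layout
{
    float g_rbVertices[];   //3 floats per vertex (x, y, z)
};
layout(set = 0, binding = 1, std140) uniform ObjectConstants
{
    mat4 mWorldViewProj;    //hypothetical name
};
layout(location = 0) in vec2 in_uv0;   //the attributes that do exist
layout(location = 0) out vec2 out_uv0;
void main()
{
    uint base = uint(gl_VertexIndex) * 3u;
    vec3 position = vec3(g_rbVertices[base + 0u],
                         g_rbVertices[base + 1u],
                         g_rbVertices[base + 2u]);
    out_uv0 = in_uv0;
    gl_Position = mWorldViewProj * vec4(position, 1.0);
}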
Frame
Heads up: despite the fact that you will be seeing render targets & backbuffers that look “normal”, they were all in fact upside down due to a choice of the developer, and the final image gets flipped before presenting to the swapchain. This “upside-down” thing could be due to multiple reasons! The Vulkan coordinate system is top left towards bottom right, and (iirc) the PlayStation 4 & 5 APIs (let’s just call them the PlayStation APIs here🤐) are similar to Vulkan in that regard, so this is why i suggested it is a developer choice to be like that, they could’ve made it non-flipped if they wanted. It could be an indication that the game port started as OpenGL (which is bottom left towards top right) and this is a remnant of that? Or maybe it is something related to the format reading? Or flip copies? Or it could be as simple as a wrong Y flip in the vertex shader? A negative vs positive viewport Height value for the offscreen rendering?…etc. Multiple commonly known areas can cause such a behavior (in GLSL shader code or in the viewport setup), but eventually it is not a big deal (though indeed a little annoying during debugging).
So, keep in mind that what i was seeing during the entire investigation were actually vertically flipped images, and i flipped them back to look normal before uploading to the article, for the sake of sanity!
Prepare Resources
Nothing really fancy during this long Copy/Clear pass at the start of a typical frame, just a handful of vkCmdCopyBufferToImage, vkCmdCopyImage and vkUpdateDescriptorSets calls for different buffers (like the skinning ones) or image resources, to copy image data (and/or mips) between resources (like the previous frame’s Rain, Caustics, TAA edges, Motion Vectors,…etc.).
Video Frame Decoding [Compute][Not Always]
This compute dispatches only if there are any video displays around the frame, and boy, Detroit is full of these, from holograms in streets, to TVs in bars, houses, offices & even public transportation buses. This is probably something executed by RAD’s Bink Video (not a custom video solution) as i was able to find some traces of Bink functions within the game executable.
The purpose of this dispatch is to get the video frame playback data into a texture resource so it can be used later to texture a TV surface or such, and because the solution used affords it, this texture gets some mips during this compute dispatch (11 mips so far for all cases i’ve investigated).
This can take place for a single video or multiple videos, depending on the situation (a street, for example, can process multiple videos for the billboards, compared to an indoor level/frame).
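For illustration only (this is not Bink’s or the game’s actual code, and all names/bindings are hypothetical): video decoders like Bink typically output separate luma/chroma planes, so a conversion dispatch along these lines would be what ends up writing the displayable RGBA texture that later gets its 11 mips.
#version 450
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
layout(set = 0, binding = 0) uniform sampler2D g_PlaneY;
layout(set = 0, binding = 1) uniform sampler2D g_PlaneCr;   //usually half resolution
layout(set = 0, binding = 2) uniform sampler2D g_PlaneCb;
layout(set = 0, binding = 3, rgba8) uniform writeonly image2D g_VideoFrame;
void main()
{
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    ivec2 size = imageSize(g_VideoFrame);
    if (any(greaterThanEqual(pixel, size)))
        return;
    vec2 uv = (vec2(pixel) + 0.5) / vec2(size);
    float y  = textureLod(g_PlaneY,  uv, 0.0).r;
    float cr = textureLod(g_PlaneCr, uv, 0.0).r - 0.5;
    float cb = textureLod(g_PlaneCb, uv, 0.0).r - 0.5;
    //Standard BT.601-ish YCbCr -> RGB conversion (coefficients illustrative).
    vec3 rgb = vec3(y + 1.402 * cr,
                    y - 0.344 * cb - 0.714 * cr,
                    y + 1.772 * cb);
    imageStore(g_VideoFrame, pixel, vec4(rgb, 1.0));
}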
keep reading to learn why i put gifs for the texture pyramids
You may (or may not) have noticed that there is a pop at the beginning of the mips gif, and that the first mip (mip 0) is a slightly different frame (an earlier frame) than the rest of the mips in the chain (mip 1 to mip 10), and you are right if you noticed that, it is not a mistake in the gifs. It seems there is possibly a bug in the video/bink code, as it gets frame X from the video to be the target frame to write to the texture’s mip 0, but then gets frame X-1 for all the lower mips. The independence of compute and the lack of barriers could sometimes result in similar things i believe, or it could be simpler than that, and it may be just a -1 or -=1 somewhere in the mips data copy & write to images! i always try to keep myself distant from Bink & Wwise code in productions, so i can’t confirm if this is an expected & desired Bink behavior (for some anti-logical reason) or just another annoying Bink bug!
Ironically, you don’t have to see a surface that displays a video in the frame in order to have this step processed, but we will come to this later @ the Engine General Observations section by the end of the article.
Copy Skinning positions [Compute][Not Always]
In most cases this compute executes for skinned characters/objects in order to keep the skin positions from the previous frame in memory, so the game doesn’t re-do skinning from scratch. Nothing very special to show here, just a lot of floats…a lot of them!
The only observation to keep in mind: just because you don’t see any characters or possibly skinned meshes (like the few examples below), it doesn’t mean this step will be absent from the rendering graph (will discuss further in the Engine General Observations section); even without characters seen, the skinning copy still takes place in many, many cases.
And if it is, we shouldn’t copy skinning, or should we?
SH(Spherical Harmonics)/Indirect Lighting Blending [Compute][Not Always]
The game utilizes the Light Propagation Volumes (LPV) algorithm (similar to Crytek’s as a foundation) as a technique for achieving a good indirect light bounce. In order to achieve that, it stores the lighting information from the light sources in a 3D grid of Spherical Harmonics (SH) to get the best efficiency at runtime.
While this step is flagged as “Not Always”, this doesn’t mean indirect lighting is absent from parts or areas of the game; i believe this step only takes place when the actual SH need to be re-written into a new volume texture atlas in order to “blend” between two or more different probe grid volumes (baked in volume texture atlases), where other areas of the game seem to right away utilize the volume textures that were already baked beforehand (offline) without modifying or blending them through this compute step (they’re usually part of the bindless textures list).
At this step the engine will be blending the different volumes’ baked light information (color component coefficients) into the new volume texture atlas (a StorageFloat3D Image of float4s, so it stores 4 coefficients per probe) with different sizes that depend on the volume size, which depends on the size of the space that the volume covers (you’ll see below), but the general rule is that it has 3 slices, one slice per light color channel. These 3d textures are of 16bit precision via the format R16G16B16A16_FLOAT (well, it is the most used format in this game, as you’ll see during the course of the article!).
The best example to showcase that, as it seems to be something that happens for a very brief moment, is when Markus is drawing the curtains in order to wake up Carl. While you are in the room & before you draw the curtains, the engine will be using the dark version of the baked Indirect Lighting volume textures (this compute is not running), and once the curtains start being drawn, the compute kicks off the blend, and as soon as the blend is complete, and the room is lit, this compute is not running anymore, and the renderer will be using the bright version of the baked Indirect Lighting volume textures.
Of course these 3d textures are not shown at their real size, as 54*54 would be very very tiny to show the details.
Keep in mind that while each of these gifs represents a “single” channel, each of them is RGBA & stores data in all 4 channels, this is why i did not put them in grayscale 1 channel.
And in action, from 100% dark, to blending two volume texture sets (via this compute step), ending up with 100% bright.
This type of blend doesn’t have to be an instant thing like that previous curtain example; here is another example, where if the player (Kara here) moved to an area that is between two different volumes, then you can have that blend compute happen pretty much every frame in order to blend between both volumes, as long as she is between these two volumes & the propagation reaches her worldspace position.
And if you think the volumes are the same ones as the previous example, no they’re not, it is common for such textures to look similar between different areas of a game. Feel free below to compare them in new tabs if interested (these are for Kara’s example).
Volume 1
Volume 2
As mentioned earlier, this compute runs when the influence of 2 or more volumes is required so the blending can take place. The several frames below are from different areas of the same (mid-size) level, and the results differ from frame to frame based on the player & view position, but the shared thing between all these frames is that they require the same 3 volumes that cover the play area (not 2 volumes this time).
Feel free to check the 3d textures below if interested, to observe the difference between the 4 frames that share the same 3 volumes. Keep in mind that the volumes used for this level are larger than the previous examples, 117*177*3 (still R16G16B16A16_FLOAT).
Example 1 (Connor looking to the level)
Example 2 (Top view of the level)
Example 3 (Crowds by the crime scene)
And for the sake of curiosity, here is the struct that defines the data stored within these probes of the volume texture/grid.
Structs
struct S_LPV_TO_STACK
{
    float4 _vColor;
    float _fIntensity;
    int _iTexSize;
    uint _uiSHCoeffTexRIdx;
    uint _uiSHCoeffTexGIdx;
    uint _uiSHCoeffTexBIdx;
    uint _uiPad0;
    uint _uiPad1;
    uint _uiPad2;
}
struct dyn_Stack1_Layout
{
    S_LPV_TO_STACK Stack1[];
}
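And to make the blending idea concrete, here is a minimal compute sketch of mixing two baked SH volume atlases into the runtime one; it is simplified (the real shader would presumably fetch the atlases through the bindless indices of S_LPV_TO_STACK above), and all bindings plus the blend factor constant are hypothetical.
#version 450
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
//Two baked SH coefficient atlases (e.g. the "dark" and "lit" room volumes) and
//the runtime atlas the rest of the frame will sample.
layout(set = 0, binding = 0) uniform sampler3D g_VolumeA;
layout(set = 0, binding = 1) uniform sampler3D g_VolumeB;
layout(set = 0, binding = 2, rgba16f) uniform writeonly image3D g_BlendedVolume;
layout(set = 0, binding = 3, std140) uniform BlendConstants
{
    float fBlendFactor;   //0 = volume A only, 1 = volume B only (hypothetical)
} cb;
void main()
{
    ivec3 texel = ivec3(gl_GlobalInvocationID);
    if (any(greaterThanEqual(texel, imageSize(g_BlendedVolume))))
        return;
    //Each texel holds 4 SH coefficients for one color channel of one probe
    //(the atlas carries 3 slices per grid slice: R, G, B).
    vec4 shA = texelFetch(g_VolumeA, texel, 0);
    vec4 shB = texelFetch(g_VolumeB, texel, 0);
    imageStore(g_BlendedVolume, texel, mix(shA, shB, cb.fBlendFactor));
}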
Light Clustering [Compute]
The game utilizes a clustering technique, which means that the view frustum gets split into a 3d grid of clusters, and we use that to know which lights are going to be rendered, and which clusters these lights will influence. At this step (call it Fill Cluster), each cluster does an intersection test against each light source’s radius and fills the cluster data with valid (not culled) light source indices via some GLSL atomic memory functions such as atomicAdd. The compute does multiple dispatches, one dispatch per light source type. There are 4 light source types in Detroit; dispatches usually ran for types 0, 1 & 2 (Spot, Point & Box) and not for the directional light source type (type 3). Eventually, either clusters that end up with no light source intersections, or light sources that end up with no cluster intersections, get denied & we don’t care about them for the rest of the frame.
Something like that…
From Real-time many-light management and shadows with clustered shading
Ola Olsson, Emil Persson, Markus Billeter – Siggraph 2015
Not much visual impact at this point, but there is data output to be used later in the shading color passes.
In Buffers
//Source Light
struct S_LIGHT
{
    float4x4 _mVsToLs;
    float4 _vfVsPos;
    float4 _vfVsDir;
    uint _uiType;
    uint _uiVolumetricData;
    uint _uiSceneZoneBits;
    float _fSpHumbraOut_PrCUTranslate;
}
//Source Lights Descs Array
struct S_LIGHT_ARRAY
{
    S_LIGHT[512] _aArray;
}
//Fill Cluster Constant Buffer
struct CLUSTERED_SHADING_ASSIGN_LIGHT_TO_CLUSTER
{
    uint _uCurrentMipWidth;
    uint _uCurrentMipHeight;
    uint _uCurrentMipDepth;
    uint _iPadding0;
    uint _uCellWidth;
    uint _uCellHeight;
    uint _uCellDepth;
    uint _uSizeBitShift;
    uint _uLightType;
    uint _uLightCount;
    uint _uLightIndicesBufferSize;
    uint _uPassIndex;
    uint _uDebugMode;
    uint _uMipIndex;
    float _fCombinedEyeAdjustX;
    float _fCombinedEyeAdjustZ;
    float4[16] _avfApproxWorldCellSize;
}
//Cluster Info Constant Buffer
struct CLUSTER_INFO
{
    float4x4 _mViewToClusterView;
    float4x4 _mClusterProjMatrix;
    float4x4 _mClusterInvProjMatrix;
    float4 _vfLeftEyeToCombinedEyeVsDeltaXZ;
    uint _uiFinestMipWidth;
    uint _uiFinestMipHeight;
    uint _uiFinestMipDepth;
    uint _uiFinestMipWidthHeight;
    float _fZSliceGeomFactor;
    float _fRcpLogZSliceGeomFactor;
    float _fDummy0;
    float _fDummy1;
    float _fNearP;
    float _fInvNearP;
    float _fFarP;
    float _fClusterCoincidentWithCamera;
}
Out/Storage Buffers
//Shared
//Cluster Cell
struct S_CLUSTER_CELL
{
    uint LightCount;
    uint LightFirstIndex;
}
//Light to Cluster Projector
struct ProjectorMatrix
{
    float4x4 mLightSpaceToClusterSpace;
}
//Source Light Indices
struct stc_SourceLightList_Layout
{
    int SourceLightList[];
}
//Source Light Cluster Data
struct dyn_SourceLightCluster_Layout
{
    S_CLUSTER_CELL SourceLightCluster[];
}
//Dest Light List Indices
struct stc_DestLightList_Layout
{
    uint DestLightList[];
}
//Cluster Grid Data
struct stc_ClusterGrid_Layout
{
    S_CLUSTER_CELL ClusterGrid[];
}
//Source Light Projector Matrices
struct dyn_SourceLightProjectorMatricesDescArray_Layout
{
    ProjectorMatrix SourceLightProjectorMatricesDescArray[];
}
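To tie these structs together, here is a hedged sketch of what a “Fill Cluster” dispatch could look like: one thread per light source (of the current type), a sphere vs cluster bounds test, and atomicAdd to append the light index into the cell’s slot range. The cluster bounds buffer, the simplified light struct and all bindings are hypothetical; only S_CLUSTER_CELL, DestLightList & ClusterGrid mirror the dumps above, and the real shader tests the actual light shapes (spot/point/box) against bounds derived from CLUSTER_INFO.
#version 450
layout(local_size_x = 32, local_size_y = 1, local_size_z = 1) in;
struct S_CLUSTER_CELL
{
    uint LightCount;
    uint LightFirstIndex;
};
struct S_CLUSTER_BOUNDS { vec4 vMinVs; vec4 vMaxVs; };   //hypothetical view-space AABB per cell
struct S_LIGHT_SIMPLE { vec4 vfVsPosRadius; };            //xyz = view-space pos, w = radius (simplified)
layout(set = 0, binding = 0, std430) buffer ClusterGridBuf { S_CLUSTER_CELL ClusterGrid[]; };
layout(set = 0, binding = 1, std430) buffer DestLightListBuf { uint DestLightList[]; };
layout(set = 0, binding = 2, std430) readonly buffer ClusterBoundsBuf { S_CLUSTER_BOUNDS ClusterBounds[]; };
layout(set = 0, binding = 3, std430) readonly buffer LightBuf { S_LIGHT_SIMPLE Lights[]; };
layout(set = 0, binding = 4, std140) uniform FillConstants
{
    uint uClusterCount;
    uint uLightCount;
    uint uMaxLightsPerCluster;
} cb;
bool LightIntersectsCluster(uint lightIdx, uint clusterIdx)
{
    //Classic sphere vs AABB distance test in view space.
    vec3 p = Lights[lightIdx].vfVsPosRadius.xyz;
    vec3 c = clamp(p, ClusterBounds[clusterIdx].vMinVs.xyz, ClusterBounds[clusterIdx].vMaxVs.xyz);
    vec3 d = p - c;
    float r = Lights[lightIdx].vfVsPosRadius.w;
    return dot(d, d) <= r * r;
}
void main()
{
    uint lightIdx = gl_GlobalInvocationID.x;   //one thread per light of the current type
    if (lightIdx >= cb.uLightCount)
        return;
    for (uint clusterIdx = 0u; clusterIdx < cb.uClusterCount; ++clusterIdx)
    {
        if (!LightIntersectsCluster(lightIdx, clusterIdx))
            continue;
        //Atomically reserve a slot inside this cell's fixed-capacity range;
        //LightFirstIndex is then simply clusterIdx * uMaxLightsPerCluster.
        uint slot = atomicAdd(ClusterGrid[clusterIdx].LightCount, 1u);
        if (slot < cb.uMaxLightsPerCluster)
            DestLightList[clusterIdx * cb.uMaxLightsPerCluster + slot] = lightIdx;
    }
}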
Procedural Textures
Ahead of time, some procedural textures are created &/or updated through a collaboration between some compute & fragment shaders. The resulting textures will be needed later in the frame to serve multiple visual enhancements (mostly to do distortion or to fake surface normals).
1.Rain
Detroit is very rainy in this game (idk about irl), and pretty much all the time the rain procedural textures are processed at the start of each frame, even while inside an interior set & regardless of whether it is rainy or not outside (like inside Kamski’s house or even at the rooftop in the Hostage first mission).
How does it work?
The rain procedural texture preparation is done in a set of passes that run back to back, as you’ll see below, but all in all the output of this step is about 3 textures in most cases, one used for diffuse and the other ones used for water drops & flow (like on a glass window or on some character’s face & body); these are usually 256*256 of the format B8G8R8A8_UNORM with 9 mips. Let’s see how this works!
i.Rain Ripples
Around 8 different draws, organized in pairs, update the rain ripples texture (which most likely starts as an empty texture at the beginning of the level); with the help of a water drop diffuse sampler, new sources of ripples (points) are drawn.
These 8 draws alternate between 2 kinds, one to draw new ripples (i call them new ripple roots) and the other to update the existing ripples’ flow (scaling).
If you can’t spot the roots for the new ripples, you indeed can here: these are the exact 8 steps for drawing the ripples, and you can observe the new roots every other draw. The first draw is in fact a “new roots” draw, but you can’t tell or observe that, because there is no previous draw to compare to; it does use the Diffuse Sampler texture above though, which is what the new roots draws use.
Procedural Rain Ripples Texture Constant Buffer
struct ProcTexRain
{
    float _fTexOffsetX;
    float _fTexOffsetY;
    float _fAbsorption;
    float _fDispersion;
    float _fSharpness;
    float _fFlowScale;
    float _fFlowGain;
    float _fFlowToRipples;
}
This output ripples texture (aka Rain Diffuse texture) might look off, but you understand it better if you look at it as a set of channels, like so:
Now you may be asking, why are B & A exactly the same?
In fact they’re not! Open them & look closer!
The A channel was used in the past few steps to accumulate or blend between the previous & next step of the animating ripples effect, and hence by the end of this step, it is possibly not needed anymore, right? Let’s see below!
ii.Rain Ripples Mips [Compute]
The rain ripples texture gets mipmaps generated, ending up with a total of 9 mips, from 256*256 down to 1*1.
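These mip chains are simple enough that a basic box-filter reduction covers the idea; here is a minimal sketch of such a per-mip compute dispatch (bindings & the image format qualifier are assumptions, not taken from the game).
#version 450
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
//One dispatch per mip level: read mip N-1, write mip N.
layout(set = 0, binding = 0) uniform sampler2D g_SourceMip;
layout(set = 0, binding = 1, rgba8) uniform writeonly image2D g_DestMip;
void main()
{
    ivec2 dst = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(dst, imageSize(g_DestMip))))
        return;
    //Simple 2x2 box filter: average the four source texels under this one.
    ivec2 src = dst * 2;
    vec4 c = texelFetch(g_SourceMip, src + ivec2(0, 0), 0)
           + texelFetch(g_SourceMip, src + ivec2(1, 0), 0)
           + texelFetch(g_SourceMip, src + ivec2(0, 1), 0)
           + texelFetch(g_SourceMip, src + ivec2(1, 1), 0);
    imageStore(g_DestMip, dst, c * 0.25);
}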
iii.Rain Dripples
The “Rain Dripples” is not really born as a separate texture, it is stored (for now) in the alpha channel of the Ripples/Diffuse texture. By the end of the ripples step (the previous step), the info in the alpha channel of this rain diffuse texture is not needed anymore, so it gets cleared & replaced with something else here, and this something else is the Rain Dripples. So all steps in this section take place on the final RGBA texture we got from the previous step, BUT they happen in the A channel only, the RGB channels remain intact.
So, at this step drawing starts by clearing the alpha channel of the ripples texture to black. Then a few rain dripples are randomly drawn to that render target’s alpha channel using the same water drop diffuse sampler used earlier.
iv.Rain Dripples Distortion & Sharpen [Compute]
Using a distortion texture, the Dripples (Alpha) channel is copied to its own new (and final) texture and a slight distortion is applied to it; this texture will act later as a normal map.
Rain Distortion Constant Buffer
struct RainDistortion
{
    float _fDistortionFactor;
    float _fSharpenNormals;
    float _uiTextureWidth;
    float _uiTextureHeight;
}
v.Rain Dripples Mips [Compute]
Just generate mipmaps for that normal map-y thing! Ending up with the standard for the rain textures, 9 mips from 256*256 to 1*1.
vi.Rain Flow
The “Rain Flow” is not really a separate texture either, and it is not any different from the Rain Dripples (previous step): it is stored (for now) in the alpha channel of the ripples/diffuse texture (one more time). Just as we cleared the content of the ripples texture’s Alpha channel to draw dripples, we do the same thing to draw the flow, except that this time, we will copy the dripples over into the flow texture to make it much more…busy!
Keep in mind that for the rain flow too, all steps in this section still take place on the updated RGBA texture we got from the previous step, BUT they happen in the A channel only, the RGB channels remain intact here too.
So, a few lines are drawn on the Alpha channel of the ripples/diffuse texture; these lines will shortly be used as a base for the water flow. Drawing these ugly lines is done using the exact same water drop diffuse sampler used in the previous steps.
Then the lines texture data is combined with the water dripples, so both end up on top of each other in the alpha channel of the ripples/diffuse texture; as we don’t need this channel anymore to hold anything related to ripples, from now on the alpha channel will be holding the “flow” info. Remember, we are still doing all that on the Alpha channel only, so RGB is still not touched.
Now we’ve got a base for the Rain Flow & it is ready for the next step.
Alpha Channel Note
Just because this Alpha channel has been through a lot, here are all the steps of the alpha channel of the Rain Ripples (Rain Diffuse) texture: from holding the “Ripples” info, to holding the “Dripples” info, ending up with the “Flow” info. This channel served as a blackboard at the back for drafting things!
vii.Rain Flow Distortion & Sharpen [Compute]
Using a distortion texture (similar to the Dripples), the Flow (Alpha) channel is copied to its own new (and final) texture and a slight distortion is applied to it; this texture will act later as a normal map.
Rain Distortion Constant Buffer
struct RainDistortion
{
    float _fDistortionFactor;
    float _fSharpenNormals;
    float _uiTextureWidth;
    float _uiTextureHeight;
}
viii.Rain Flow Mips [Compute]
Just generate mipmaps for that normal map-y thing! Ending up with the standard for the rain textures, 9 mips from 256*256 to 1*1.
Put it all together
And to put all steps together with the final frame
It always rains, even when not needed!
As mentioned earlier, the rain procedural textures are processed at the start of each frame, even while in an interior set and regardless of whether it is rainy or not outside (like inside a house or so). So the rain procedural step happens all the time, but it doesn’t always draw meaningful procedural textures. And even if it does write some procedural textures, there is no guarantee that the output textures will really be used. What really defines whether the rain procedural textures are processed into valid textures (regardless of being really used or not) is a flag/param _uiRainIsEnabled passed as part of the forward clustered PBR pass rain params struct (you can see it later as part of one of the structs), but regardless of this flag being On or Off, rain flow textures are processed & simulated most of the time at the beginning of each frame. Even at the game’s living main menu, aka “Chloe”!
Here are some frames where there is no presence of rain, but which still have rain flow textures (ripples too) processed 100% as if the effect will show up in the final frame.
While we ended up with nice rain flow textures for this previous set of frames, the coloring passes down the road had the value _uiRainIsEnabled of the rain parameters struct set to 1. So it is not really used visually, but that flag made them simulate & produce correct flow textures!
And here are some frames of the other case, where there is no presence of rain as well, and the process still takes place, but this time resulting in an empty flow texture per frame. For these frames the _uiRainIsEnabled of the rain parameters struct was set to 0.
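Purely as an illustration of how such a flag could be consumed down the road (this is not the game’s shader, and every name besides _uiRainIsEnabled is hypothetical), a shading pass would simply gate the procedural flow normal on it:
#version 450
layout(set = 0, binding = 0) uniform sampler2D g_RainFlowNormal;
layout(set = 0, binding = 1, std140) uniform RainParams
{
    uint _uiRainIsEnabled;
    float _fRainIntensity;   //hypothetical
} rain;
layout(location = 0) in vec2 in_uv0;
layout(location = 1) in vec3 in_normalWs;
layout(location = 0) out vec4 out_color;
void main()
{
    vec3 normalWs = normalize(in_normalWs);
    if (rain._uiRainIsEnabled != 0u)
    {
        //Perturb the surface normal with the procedural flow "normal map";
        //the textures themselves are generated every frame either way.
        vec3 flow = texture(g_RainFlowNormal, in_uv0).xyz * 2.0 - 1.0;
        normalWs = normalize(normalWs + flow * rain._fRainIntensity);
    }
    out_color = vec4(normalWs * 0.5 + 0.5, 1.0);
}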
Pseudo Rain Flow
Rain Flow textures (and sometimes Rain Dripples) are not exclusively there to showcase the rain effect/mode, they are also used sometimes for non-rain purposes, things that are not that far away from being considered rain. For example, when one of the protagonists gets wet after coming out of a pool or something, it is not rain, but it is a “water flow” on top of the body after all!
Rain, Compute, Fun!
When taking a GPU capture, games (any game) will usually go into a frozen state. At least the game & render threads sleep, while something like the audio thread keeps chilling in the background to run the last request(s) it got from the game thread before it went to sleep for a while, and this is expected, we need that mechanism so we can record all the API calls with all their data & stuff at that exact frame (data write/transfer out of the question now) for a later replay. Now the fun part: due to the dependency relationship between the frame drawing & the rain compute dispatches, which in their turn seem to be based on sampling the game clock time value sent by the CPU, the compute will jump in time once capturing the frame is completed, and process all the frames we missed during the capture duration into the same rain flow texture within the first unfrozen frame, and that results in “showering” or “sweating badly” distortion textures instead of rain distortion textures, which looks funny!
And it all makes sense; as mentioned earlier, the generation of these flow & dripple maps is 100% done on the Alpha channel only. When you look at all the outputs of the rain procedural texture generation, you would notice the change in the Alpha channels only, and in what is based on them (like the final normal maps), but something like the diffuse color still looks normal & unchanged, because it is not based on the same matrices & it is done in its own 100% fragment based steps (compute is involved only in the mipmaps generation).
Here are the detailed outputs for that Markus showering rain example..
And here is one of them in action
Just a full video of that previous showering gif
and i truly don’t blame you if you thought about this meme!
2.Caustics
Yet another step that runs regardless of whether it is really needed to contribute to the frame or not. Here, in a much simpler process than Rain, the engine will work on simulating or generating a texture used for caustics-like effects.
i.Draw Caustics Texture
A fragment shader draws some caustics using multiple intensity samplers in addition to a distortion texture. The output texture most of the time is 256*256, but sometimes a higher resolution is required (not sure based on what metrics) and then the output caustics texture is 512*512.
ii.Caustics Texture Mips [Compute]
Generate mips of the caustics texture, to end up with 9 mips, from 256*256 to 1*1.
iii.Draw Caustics Texture 2 [Not Always]
An updated caustics texture, a second one, and i’m not sure what it is used for, as all the time the earlier generated caustics texture is the one used. The exact same intensity samplers are used, but this time the distortion texture is quite different.
This does not always take place, as many times only a single caustics texture is generated & used.
iv.Caustics Texture 2 Mips [Compute][Not Always]
Generate mips of the caustics texture, to end up with 9 mips, from 256*256 to 1*1.
Put it all together
And to put all steps together with the final frame
And for that, a few parameters are required for the caustics drawing, just the amount of distortion & the speed of the pattern animation.
Caustics Constant Buffer
struct ProceduralCaustics
{
    float _fDistorsionPower;
    float _fSpeed;
    int _iPadding0;
    int _iPadding1;
}
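And as a hedged sketch of how such a pass could look (this is not the game’s shader; besides the ProceduralCaustics members, every name & binding here is hypothetical), the intensity patterns get offset by a scrolling distortion and multiplied together:
#version 450
layout(set = 0, binding = 0) uniform sampler2D g_Intensity0;
layout(set = 0, binding = 1) uniform sampler2D g_Intensity1;
layout(set = 0, binding = 2) uniform sampler2D g_Distortion;
layout(set = 0, binding = 3, std140) uniform ProceduralCaustics
{
    float _fDistorsionPower;
    float _fSpeed;
    int _iPadding0;
    int _iPadding1;
} cb;
layout(set = 0, binding = 4, std140) uniform FrameTime { float fTime; };   //hypothetical
layout(location = 0) in vec2 in_uv0;
layout(location = 0) out vec4 out_caustics;
void main()
{
    //Scroll the distortion over time and offset the intensity lookups with it.
    vec2 scroll = vec2(fTime * cb._fSpeed);
    vec2 offset = (texture(g_Distortion, in_uv0 + scroll).rg * 2.0 - 1.0) * cb._fDistorsionPower;
    float a = texture(g_Intensity0, in_uv0 + offset).r;
    float b = texture(g_Intensity1, in_uv0 - offset + scroll * 0.5).r;
    //Multiplying two shifted patterns gives the classic sharp caustic ridges.
    out_caustics = vec4(vec3(a * b), 1.0);
}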
As mentioned earlier, caustics generation is always there, regardless of whether it is needed or not. Here are a few examples of frames that had caustics textures generated while you can’t really spot any caustics effect being utilized!
Caustics itself is not easy to spot in the game, i agree, regardless of whether it is gameplay or a cut-scene. It works like a hidden ninja soldier: it adds a subtle effect you can feel, but you can’t really easily spot it all the time. But there are areas where it really shines, and in order to see the actual impact or contribution of a caustics texture to the final presented frame, here are some final frames before & after i did some shader modifications to remove the instruction block responsible for sampling & mixing the caustics texture into the affected areas (there is no graphical option in the game settings menu to toggle the effect, so i had to improvise!). You can tell how much it adds to the final frames!
It really adds a nice subtle touch!
Clear Previous ZPrepass/Depth
Just clear the previous frame’s ZPrepass output in order to write to it shortly for the current frame.
Clear Previous Motion Vectors
Nothing really fancy here either, just fill the motion vectors rendertarget (the one that contains the previous frame’s motion vectors) with a solid black color, so we can start drawing into it very soon after the ZPrepass.
Keep in mind, this is the main rendertarget for drawing motion vectors that we clear here; the data that used to be in it was copied earlier at the start of the frame to another temporary target during the long copy/clear queue.
Z-Prepass/Depth Part 1 (Opaque/Static Geometry)
Z-Prepass became almost an industry standard, and a ton of engines & games adopted it by heart as a for-granted shading optimization technique, and there is no wonder to find a game such as Detroit having it; it is the type of game that strives for every precious fraction of a millisecond in order to deliver photorealistic realtime frames that are as realistic as possible.
The Z-Prepass here was literally following the formula of hitting many birds with a single stone (not a tiny stone though): it does not only benefit shading later by avoiding fragment overdraw, but is also utilized by multiple other things such as hair rendering (and other cut-out / alpha-tested geometry), SSAO/HBAO, Shadow Maps Z-partitioning, SSR, Clustered light culling, Decals,…and many more!
The Z-Prepass in Detroit is a full Z-Prepass, but it is neither fully done in one go nor in a single pass; it takes a few distinctive (and logical) steps to fully complete. In this first step, with backface culling, and only the power of the vertex shader, a regular old friendly Z-Prepass is completed for all opaque geometry in the scene. The good sign here is that this pass used a vertex description that contains the vertex positions only, which is a sign of granular optimizations.
The largest piece of the Z-Prepass is done here, let’s say 80% of the final Z-Prepass!
✔️Vertex Shader
❌Fragment Shader
Motion Vectors Part 1 (Opaque/Skinned Geometry)
After the 1st step of the Z-Prepass, and through a fragment shader this time, the motion vectors data/texture gets generated; it will be needed a lot later for a wide range of effects (TAA, SSR, Blur,…etc.). The geometry processed here is unique, and not yet processed in the previous step of the Z-Prepass. This is a dedicated render pass, the color output of this renderpass is the motion vectors texture, but the depth output is holding something else (next step).
The output here (3840*2160 of R16G16_FLOAT) does not include everything that moves, it seems to be for skinned meshes only (Characters, clothing, Trees, Grass,…etc.). There are indeed some objects missing from the motion vectors texture, but these come later…quite a bit later than expected!
✔️Vertex Shader
✔️Fragment Shader
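For reference, this is the general shape of such a motion vectors pass, a minimal sketch and not the game’s actual shader: the vertex shader is assumed to output both the current and the previous-frame clip-space positions (the latter coming from the skinning positions copied earlier), and the fragment shader writes the NDC delta into the R16G16_FLOAT target.
#version 450
layout(location = 0) in vec4 in_currClipPos;
layout(location = 1) in vec4 in_prevClipPos;
layout(location = 0) out vec2 out_motion;
void main()
{
    //Perspective divide to NDC, then the screen-space delta is the velocity.
    vec2 currNdc = in_currClipPos.xy / in_currClipPos.w;
    vec2 prevNdc = in_prevClipPos.xy / in_prevClipPos.w;
    //Stored as an NDC-space delta; consumers (TAA, motion blur, SSR) rescale
    //it to pixels/UVs as needed.
    out_motion = currNdc - prevNdc;
}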
Z-Prepass/Depth Part 2 (Opaque/Skinned Geometry)
This is not a fully individual step, it takes place as part of the previous step, so basically it is a single pass that does the motion vectors (output to color) + characters/skinned Z-prepass geometry (output to depth).
You can’t really consider this pass a Z-Prepass, because here we are writing via a fragment shader in addition to the vertex shader, so the output is not depth only, but i consider it part (or a step) of the Z-Prepass because it adds to the depth output of the Z-Prepass in order to include more “opaque” geometry.
A lot smaller piece of the Z-Prepass is done here, let’s say 15% of the final Z-Prepass!
Of course these % vary based on the screen space occupied by characters, it is just for demonstration, but in an average frame it would be near this number.
✔️Vertex Shader
✔️Fragment Shader
Hair Accumulation
This is a very important step/pass for hair rendering. Once this short step is done, anything that is related to hair rendering would need the output of this step (this includes the very next step, which is the last step of the Z-Prepass steps).
With the help of this step, Detroit avoids fully transparent hair rendering while keeping the hair rendering quality close to that of fully transparent hair, and at the same time it avoids fully alpha-tested hair most of the time. This is done by splitting the hair between two types: hopefully the larger portion of the hair gets rendered as Opaque, while the tinier portion (the one that will trick you) gets rendered as Transparent. The idea behind hair accumulation is simple: allow using transparency and alpha-tests in order to reduce the amount of overdraw to almost none (in terms of hair cards). The end result is still the very recognizable hair cards, but only partially translucent, as a huge part of them is rendered as opaque cards.
So a single hair mesh is rendered in two passes (Opaque and Translucent), but it is the Hair Accumulation that decides which hair pixels go into which pass (Opaque or Translucent).
Shame on EA, BioWare and Veilguard graphics team!
If you worked in games or tech, you probably met at some point in your career that co-worker who comes in person and in private (not in meetings, not in public team chat) & asks you questions about their task as they can’t progress in it, and you answer them with all the details you know about the topic because you are a nice person. But you then at the next day in a big team meeting or a standup sync, when they asked about that task progress, they explain the exact same things you told them (even exact wording at some times), without crediting you or mention that you helped them through the topic, and sometimes they even go further a mile and would say that they “i figured it out” or “i did some research and found out that …”. You know that guy,…this is EA & BioWare here!
Today, rn, it is the 13th of November 2024. This Detroit Hair Accumulation section was written a long time ago, and the entire article itself is almost done; i’m just revising all sections trying to make the wording clearer, moving paragraphs within the same section where they may fit better, adding some extra footage, and getting ready to release it as soon as i can. i went to visit twitter for a break, and then an article from EA’s blog came across my path, about the Hair tech in Veilguard. While i was reading it, my mouth dropped open in shock: it is very familiar, it is the exact same technique i broke down in this article months ago. And there is no problem in that at all, it is cool to see similar techniques in different games; the problem that shocked me was that when i went through EA’s entire article, earlier in it they (the Veilguard team) were crediting themselves for the “development” of this “new” technique, and they never mention or reference anything, like Detroit’s GDC talk for example!!
No mention of previous works, no mention that this was made in other games before, no mention of any papers or GDC talks, no mention of anyone! Keep in mind, Detroit shipped with this technique 6 years ago, which means the tech is even older than that release date (regardless of whether it was innovated by Quantic Dream or someone else).
Nothing can upset me more than not crediting others where they deserve crediting, or not being honest or transparent about what you do, or self-crediting as a pioneer of something you clearly mimicked from a REAL PIONEER! So, i don’t regret putting that header above! Such a mentality is bad for this industry! Shame it comes from such a big team!!
Note: this is not a hate letter or me trying a new way of burning bridges with EA. Nope! i have some friends and previous co-workers & some legit nice twitter friends who work under EA, i still love some games from EA, but this is simply me, i don’t like this type of crappy behavior, especially when i see such a thing in a work/gaming related topic!
Not always outputting!
This step takes place all the time indeed, it is a core step of Detroit’s rendering pipeline, but it does not always fully complete and output something useful. From my observation, i can tell that only some hair materials or material flags, used for the hair of the story’s main characters that you see most of the story time (not only playable ones, but characters such as Hank or North), would have accumulated hair output from this step, whereas other secondary (guest) characters (like Chloe, Daniel, Elijah, Todd, Amanda,..etc.) that don’t appear much over the course of the 15 hours of game story would end up with empty Hair Accumulation renderpass attachments, as they just get away with alpha-tested hair. For example, in all the frames below, the Hair Accumulation renderpass attachments were always solid black, despite the fact that there are all sorts & types of hair!
1.Clear previous Hair Accumulation
First step is to take the Hair Accumulation renderpass attachments from the previous frame and clear them to solid black. Nothing fancy. Clearing the attachments from the previous frame’s data is something that happens all the time, even in cases where there is no Hair Accumulation to do for the in-progress frame (like the previous set of images), and it makes sense to clear all the time, just in case, for example, this frame is a totally different camera view (camera cut).
2.New Hair Accumulation [Not Always]
Remember, all the upcoming effort is only to reduce the amount of transparent hair to the minimum-est minimum amount!
i.Tweaked Alpha (Color)
In lower resolution attachments (1/4 of the target resolution), and with the use of the hair cards alpha texture/s (the size & format of that texture vary per hair type/mesh), the tweaked alpha transparency for specific characters’ hair materials is drawn. Let’s look at a capture that features some nice hair (the Kara & Alice capture that we used in the previous step uses hair materials that don’t output a Tweaked Alpha).
ii.Opaques (Depth)
Using the same hair cards alpha texture, accumulation takes place, and when too much accumulation happens (a certain coverage percentage), the pixels covered by these hair cards are considered opaque, and hence drawn to the output Depth render target (1/4 of the target resolution) as solid.
So to put it together
And here is another example that has much more motion (character & view), so you can observe the difference between the given Tweaked Alpha & Opaques & the output ones. Also you can observe that the secondary character doesn’t have any Hair Accumulation processed, only main characters, remember!
The hair alpha here is different, it is 1024*512 – BC1_SRGB
Also keep in mind, it is not always a single hair alpha texture used for a single character, sometimes there are more textures if the case requires that; for example Hank (the funny detective) has, in addition to the John Wick hair, a long beard, and hence he uses two alpha textures to draw, which makes it 4 steps instead of two (beard tweaked alpha, hair tweaked alpha, beard opaques, and hair opaques).
And the draw steps for Hank alone are something like that:
And the more valid hair materials that support Hair Accumulation, the more steps will take place to produce the Tweaked & Opaques images (aka more main characters in the scene = more draw commands).
Now, what didn’t really make sense (and i did not investigate further) is that both steps (Tweaked Alpha and Opaques) are using the exact same couple of vertex & fragment shaders, so why are they done in two different draw commands per hair mesh?! Why not draw Tweaked + Opaque at once per mesh (once for the beard, and then once more for the hair)!?
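To make the two outputs a bit more concrete, here is a minimal sketch of the idea as i understand it (my own reconstruction with made-up names & thresholds, not the game’s shaders, which, as just mentioned, seem to reuse one shader pair for both draws):

Texture2D    g_tHairCardAlpha : register(t0); // the hair cards alpha texture
SamplerState g_sLinear        : register(s0);

// Draw 1 - "Tweaked Alpha" (Color): the per-card alpha gets written/accumulated
// into the 1/4 res color attachment (the blend state does the accumulation).
float4 HairTweakedAlphaPS(float4 vPos : SV_Position, float2 vUV : TEXCOORD0) : SV_Target
{
    float fAlpha = g_tHairCardAlpha.Sample(g_sLinear, vUV).a;
    return fAlpha.xxxx;
}

// Draw 2 - "Opaques" (Depth): only pixels with enough coverage survive, and the
// fixed-function depth write marks them as solid hair in the 1/4 res depth target.
void HairOpaquesPS(float4 vPos : SV_Position, float2 vUV : TEXCOORD0)
{
    float fAlpha = g_tHairCardAlpha.Sample(g_sLinear, vUV).a;
    if (fAlpha < 0.9f) // assumed coverage cutoff, the real threshold is unknown to me
        discard;
}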
Z-Prepass/Depth Part 3 (Alpha-Tested Geometry)
This is the final step of the Z-Prepass, to write the last few geometries that didn’t make it yet for the obvious reason (alpha-tests). Things processed here are like hair,…or usually it is hair only! Things such as head hair, eyebrows, beard,…etc. A simple fragment shader runs that does discards/kills based on the given texture sampling (like these ones below for that Kara & Alice bed-time scene), and there is a minimal sketch of such a kill shader right after the shader checklist below. And of course, the alpha tests here are all done with the help of the hair accumulation textures that were generated at the previous step.
And finally the smallest piece of the Z-Prepass is done here, let’s say about 5% of the final Z-Prepass!
✔️Vertex Shader
✔️Fragment Shader
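As promised, a minimal sketch of what such an alpha-test/kill fragment shader could look like (the names, the screen-space lookup & the threshold are all my own assumptions, not the game’s code):

Texture2D    g_tHairCardAlpha : register(t0); // the hair cards alpha texture
Texture2D    g_tHairAccum     : register(t1); // Hair Accumulation output from the previous step
SamplerState g_sLinear        : register(s0);

// Depth-only pass: the fragment shader exists only to kill pixels that are not
// solid enough, everything that survives gets written to the Z-Prepass depth.
void AlphaTestedDepthPS(float4 vPos : SV_Position, float2 vUV : TEXCOORD0, float2 vScreenUV : TEXCOORD1)
{
    float fCardAlpha = g_tHairCardAlpha.Sample(g_sLinear, vUV).a;
    float fAccum     = g_tHairAccum.Sample(g_sLinear, vScreenUV).r;

    // Hypothetical combination of card alpha & accumulated coverage; pixels
    // below the cutoff stay out of the depth and are left for transparency later
    if (fCardAlpha * fAccum < 0.5f)
        discard;
}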
And to put this last couple of steps in perspective, from taking the previous Hair Accumulation, to generating new ones, to using them to update the Z-Prepass with hair info, it is something like that (watch out for very hair-focused captures, as this Kara & Alice bed-time capture is not showing much hair accumulation detail due to the material used for Kara’s & Alice’s hair)
Previous Tweaked Alpha, Previous Opaques, Hair Cards Alpha, Current Tweaked Alpha, Current Opaques, Previous Z-Prepass part 2 (Skinned), Z-Prepass part 3 (Alpha-Tested) and finally Swapchain
By now the entire Z-Prepass is ready, and to put it all together, it is like that (3 passes altogether)
i know that some would consider this design a partial Z-Prepass and not really a Full Z-Prepass as i mentioned at the beginning of that section, but i tend to consider it a full prepass due to the final output. Yes, it did involve a fragment shader at some point, and it was not done in one go in a single pass, but this doesn’t negate the fact that the final Depth image was ready before any & every thing else. It’s more about the intention behind it, the time it executes & the role it plays. Think about it like deferred’s gbuffer (some call it a g-prepass): if it took place over multiple render passes, let’s say 3 passes, due to some reasons such as different shader bindings, would this change the fact that all these 3 passes eventually represent the full g-prepass?
Anyway, because producing the Z-Prepass was not done just straight in one go, and there were some other few tangled steps in the middle, here are the step outputs from the Z-Prepass 1st step, ending up with the full & final Z-Prepass for the frame.
And a few more examples that have much more motion going on
the hair was either far enough or in a specific camera view, to be considered as fully Opaque
during the “Hair Accumulation” step.
Depth Texture
Taking the full & ready Z-Prepass Depth/Stencil attachment and converting it to a Depth Texture. So from D32S8 to D32, but still the same size.
Depth Texture 1/2 + Low Res Depth
Using a fragment shader to downsample the depth and output two different versions of the 1/2 depth with different formats (needed at the next step for HBAO).
i wanted to leave a note here that this new Low Res depth (the red guy) is made of 11 mips, as this is very important to keep in mind. For now we draw only to mip 0 (the 1st mip) the 1920*1080 (1/2 the original frame res) depth in R32_FLOAT format, while the rest of the mips are still holding whatever they held since the previous frame. This texture, as mentioned, is needed for HBAO, and HBAO only needs mip 0 & mip 1, so the mip 1 that we didn’t modify yet is still holding data from the previous frame (which is important for the HBAO implementation as you will see below).
Because the two mips are two different sizes, you may have a hard time finding any difference, and you may think that the 540p is just a scaled-down (mip) version of the 1080p, but in fact it is not; here is the difference between both
It may not be visible in such a frame (Kara & Alice) due to it being a very subtle scene in terms of movement (camera & characters), considering that during this moment there was only a very very subtle camera movement towards Kara & Alice and the characters were almost frozen, but this can be observed easily in other frames that have some more action & motion, like frames captured from a chase or running sequence.
So it is clear that, at this point, mip 1 is still holding the old R32_FLOAT depth from the previous frame
HBAO [Compute][Not Always]
Only if AO is enabled in the Video Settings (which is the case for all my captures) will the game go through this step. SSAO here seems to be one of the many unique-yet-close HBAO implementations; to me it looks a lot like Frostbite’s 2012 one called “Stable SSAO in Battlefield 3 with Selective Temporal Filtering”. Let’s see how this works!
1.Down Sample
First, we start with downsampling the given linear depth to 1/2, or better said, to 1/4 of the target resolution (considering the given linear depth is already 1/2 the target resolution from the previous step). So the depth is taken from 1920*1080 (half done already) to 960*540. This new output is not stored in an individual texture, instead it is stored in the 2nd mip of the texture (mip 1), the mip that was still holding data in the previous step. The rest of the 11 mips remain solid black, and are not utilized.
Depth Down Sample Constant Buffer
struct PostProcessingDownSample4xDepthMin
{
int2 _viSrcSize; //1920, 1080
int2 _viDestSize; //960, 540
float2 _vfSrcTexelSize;
int _iPadding0;
int _iPadding1;
}
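Judging by the constant buffer name (DownSample4xDepthMin) & the sizes above, this looks like a min-of-4-taps depth downsample. Here is a minimal compute sketch of that idea (a guess on my side based on the names, not the game’s shader):

Texture2D<float>   g_tSrcDepth  : register(t0); // 1920*1080 linear depth (mip 0)
RWTexture2D<float> g_tDestDepth : register(u0); // 960*540 destination (mip 1)

[numthreads(8, 8, 1)]
void DownSample4xDepthMinCS(uint3 vDTid : SV_DispatchThreadID)
{
    // _viDestSize would be 960*540 here, as in the constant buffer above
    if (any(vDTid.xy >= uint2(960, 540)))
        return;

    // Each destination texel covers a 2x2 block of the source depth
    int2 vSrc = int2(vDTid.xy) * 2;
    float fD0 = g_tSrcDepth.Load(int3(vSrc, 0));
    float fD1 = g_tSrcDepth.Load(int3(vSrc + int2(1, 0), 0));
    float fD2 = g_tSrcDepth.Load(int3(vSrc + int2(0, 1), 0));
    float fD3 = g_tSrcDepth.Load(int3(vSrc + int2(1, 1), 0));

    // Keep the min of the 4 taps, as the buffer name suggests (whether that is
    // the nearest or the farthest depth depends on the depth convention used)
    g_tDestDepth[vDTid.xy] = min(min(fD0, fD1), min(fD2, fD3));
}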
2.HBAO & Classification Mask
Now with the Depth, the Low Res Depth, the Blue Noise as well as the previous frame’s HBAO output, the HBAO algorithm runs to output the Packed Depth & AO as well as a Classification Mask that is needed to stabilize the AO result against any possible “trailing” or “flickering” effects.
Using the full target resolution depth here is important, so that later when we do the blur, we do it in full resolution, which will help avoid bleeding across edges.
In this output “Classification Mask”, the black color means “Unstable” pixels, whereas the white color means “Stable” pixels. Unstable means the pixel is possibly either going to trail or flicker.
HBAO Constant Buffer
struct PostProcessingHBAO
{
float4 _vfProjSetup;
float4 _vfProjCoef;
float4 _vfScreenRatio; //1.00, 1.00, 0.00, 0.00
int4 _viNeoScale; //1, 1, 2, 2
float _fDefRadius; //100.00
float _fDefNegInvSqrRadius; //-4.00
float _fDefSqrRadius; //10000.00
float _fDefMaxRadiusPixel; //85.00
float _fDefAngleBias; //0.2618
float _fDefTanAngleBias; //0.26795
float _fDefExponent; //1.00
float _fDefContrast; //1.60
float2 _vfDefFocalLen; //5.44595, 9.68169
float2 _vfDefInvFocalLen; //0.18362, 0.10329
float2 _vfDefResolution; //3840.00, 2160.00
float2 _vfDefInvResolution; //0.00026, 0.00046
float2 _vfDefAOResolution; //3840.00, 2160.00
float2 _vfDefAOInvResolution; //0.00026, 0.00046
float2 _vfDefAORealResolution; //1920.00, 1080.00
float _fUpsamplingFactor; //1.00
float _fDefHQDistance; //20.00
float _fFrameIndex;
float _fMaxRadiusMip0Squared; //16.00
float _fMaxRadiusMip1Squared; //64.00
int _iPadding0;
int _iMouseX;
int _iMouseY;
int _iPadding1;
int _iPadding2;
}
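The referenced Battlefield 3 talk classifies pixels by comparing the current depth against the previous frame’s depth, and marks the ones that changed (disocclusions, motion) as unstable so they can be treated differently by the temporal filter. A minimal sketch of that classification idea (all names & the threshold are my own assumptions, and Detroit’s exact criteria may well differ):

Texture2D<float>   g_tCurrDepth     : register(t0);
Texture2D<float>   g_tPrevDepth     : register(t1);
RWTexture2D<float> g_tStabilityMask : register(u0); // white = stable, black = unstable

[numthreads(8, 8, 1)]
void ClassifyStabilityCS(uint3 vDTid : SV_DispatchThreadID)
{
    float fCurr = g_tCurrDepth.Load(int3(vDTid.xy, 0));
    float fPrev = g_tPrevDepth.Load(int3(vDTid.xy, 0));

    // If the depth moved more than a small relative threshold, the pixel was
    // likely disoccluded or in motion, so its temporally filtered AO can't be
    // trusted and it gets marked as "Unstable" (black)
    bool bStable = abs(fCurr - fPrev) < 0.01f * max(fCurr, 1e-4f);
    g_tStabilityMask[vDTid.xy] = bStable ? 1.0f : 0.0f;
}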
3.Mask Dilation
Taking the “Classification Mask” from the previous step, and stabilizing it by dilating the unstable pixels.
From the values passed to this compute shader, you can see that the mask steps between either 0 or 255 (it also uses the viewport size + 1 to avoid any artifacts at the screen edges).
Mask Dilation Constant Buffer
struct DilateTiles
{
int2 _viViewportSize; //240, 135
int2 _viOutlinedViewportSize; //241, 136
int _iDilateMask; //255
int _iPadding0;
int _iPadding1;
int _iPadding2;
}
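A minimal sketch of what such a tile-mask dilation could look like (my own guess at the logic, using the values from the constant buffer above; the actual kernel & neighborhood may differ):

Texture2D<uint>   g_tTileMaskIn  : register(t0); // 240*135 tiles, 255 = stable, 0 = unstable
RWTexture2D<uint> g_tTileMaskOut : register(u0);

[numthreads(8, 8, 1)]
void DilateTilesCS(uint3 vDTid : SV_DispatchThreadID)
{
    // _viViewportSize would be 240*135 here, as in the constant buffer above
    int2 viGridSize = int2(240, 135);
    if (any(int2(vDTid.xy) >= viGridSize))
        return;

    // A single unstable (black) neighbor is enough to flag this tile as well,
    // which effectively grows the unstable regions outward by one tile
    uint uMin = 255u;
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
    {
        int2 vTile = clamp(int2(vDTid.xy) + int2(x, y), int2(0, 0), viGridSize - int2(1, 1));
        uMin = min(uMin, g_tTileMaskIn.Load(int3(vTile, 0)));
    }
    g_tTileMaskOut[vDTid.xy] = (uMin == 0u) ? 0u : 255u; // _iDilateMask = 255
}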
4.Blur & Selective Temporal Filtering
A Blur (Grainy Blur) that is done in full resolution in order to avoid bleeding across edges.
Grainy Blur Constant Buffer
struct PostProcessingHBAOGrainyBlur
{
int2 _viViewportSize; //3840, 2160
float2 _vfInputTextureInvSize; //0.00026, 0.00046
float _fNoiseDelta; //0.25
int _iPadding0;
int _iPadding1;
int _iPadding2;
}
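i didn’t dig into the exact kernel, but a “grainy” blur usually means a small number of taps whose offsets are jittered/rotated per pixel by a noise value, trading banding for high-frequency grain. A rough sketch of that idea only (purely an assumption, not the game’s shader):

Texture2D<float>   g_tAO        : register(t0);
Texture2D<float>   g_tBlueNoise : register(t1);
RWTexture2D<float> g_tAOBlurred : register(u0);
SamplerState       g_sLinear    : register(s0);

[numthreads(8, 8, 1)]
void GrainyBlurCS(uint3 vDTid : SV_DispatchThreadID)
{
    // Sizes from the constant buffer above (3840*2160 viewport)
    float2 vInvSize = float2(1.0f / 3840.0f, 1.0f / 2160.0f);
    float2 vUV = (float2(vDTid.xy) + 0.5f) * vInvSize;

    // Per-pixel noise picks the rotation of a tiny 4-tap cross
    float fNoise = g_tBlueNoise.Load(int3(vDTid.xy & 63u, 0)) + 0.25f; // _fNoiseDelta
    float fS, fC;
    sincos(fNoise * 6.2831853f, fS, fC);
    float2 vOffsets[4] = { float2(fC, fS), float2(-fS, fC), float2(-fC, -fS), float2(fS, -fC) };

    // Average the jittered taps; the randomness hides the low tap count as grain
    float fSum = 0.0f;
    for (int i = 0; i < 4; ++i)
        fSum += g_tAO.SampleLevel(g_sLinear, vUV + vOffsets[i] * vInvSize, 0);

    g_tAOBlurred[vDTid.xy] = fSum * 0.25f;
}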
And to put it all together, the entire HBAO workflow
And here are a few more examples with some variations & different scene conditions (snow blizzard, fast motion, large crowd, interior with a nice rug,..etc.)
Example 1
Example 2
Example 3
Additional AO Notes
1.Low Contribution
Don’t let the AO image fool you, they always do. The impact of the final AO image is quite low, and even much lower on characters. While you may be able to spot the AO contribution with the naked eye for things like buildings or walls (still low but noticeable when shuffling frames), it is nearly impossible to observe it on characters most of the time, even in close-ups.
Let’s take this close-up of Markus at The Stratford Tower’s lobby.
With a tiny shader change (you don’t even have to fully reverse it, just modify where the HBAO texture is fetched), you can see the difference between having AO Enabled (my config’s default), Off and Exaggerated.
Not that much impact on the characters’ visuals! It is very subtle; here is another example where it can be observed much better, look by North’s neck or by the hair at the right side of the face (your left side).
Here is the original frame data
& the 3 different AO values as follows…
Heads-up… Color Spaces, Channels, Conversions & Stuff
During this HBAO section i’ve altered some images slightly in order to make them look fine as a PNG. Images used to look very “Red” due to the format used, and in many cases the output itself was inverted, so i inverted them back. Here are a couple of examples:
AO Image
Dilation Mask
2.Not all Skies are made equal!
While looking through some captures, i did notice the very clear existence of the sky sphere during the AO image processing. In these cases some parts of the frame would be impacted by the far away sky sphere, usually meshes that are far as well (like the edges of a building). Being a horizon/depth based AO technique, this may be possible to an extent, but what was more interesting was the absence of the sky from the AO processing in some other parts of the game, while it was very clear that there is a sky sphere present in the final frame. Here are a few examples (4 and 4): frames where the sky sphere contributes to the HBAO, and other frames where the sky sphere was absent from HBAO but still contributes to the final frame.
while the last four examples do not have a skybox during the AO processing
A sky sphere during HBAO can result in things like that, but it is not the end of the world indeed.
Shadow passes
A total of 8 passes at most. Half of them are for clearing atlases (or parts of an atlas) and the other half is for drawing new shadows. They’re as follows (in order of execution):
- 1 to 2 passes (most of the time 2) for Top View Sky Visibility.
- 1 pass for Directional Cascaded Shadow map.
- 1 pass for Point Lights Shadows.
- 1 pass for Spot Lights Shadows (including Close-up Shadows).
It is not always the case that you get the full shadow feature stack in a frame. For example, in a daylight exterior you probably end up with just the Directional CSM; in an interior you either get the Sky Visibility + Local Lights (Spot and/or Point Lights), or many times the Local Lights (Spot and/or Point Lights) only. And there are things that are in-between (like a balcony, a night time exterior,…etc.). The point is, there are 3 types of outputs, but not always are all of them, or the same combination of them, present under the different conditions.
1.Clear
Before drawing to any of the shadow outputs, we need to clear them first. Clearing is not done at full size (explained below), but the important point here is that clearing is usually done only for the targets that the game needs to draw to. So, if we’re in a scene that needs the CSM only, then we clear the CSM atlas only, which means fewer passes in general.
Considering this frame
i.Clear Top View Sky Visibility Atlas/es [Not Always]
ii.Clear Directional Cascaded Shadow Map [Not Always]
iii.Clear Spot Lights & Close-up Shadow Map [Not Always]
Tight Clearing
Clearing is not always very refined, and it is not “literally” clearing in the way you expect the word to mean. The area or the defined viewport that we will be consuming to draw the shadows of the current frame is the only space that gets cleared/freed from the target shadow atlas, and over time, with frames accumulating, this results in lots of left-overs which (i assume) would make debugging shadow maps quite confusing during development time.
Lots of things not only coming from previous frames or minutes of the current level,
but also things coming from previous levels too.
You may know, or may not, but clearing in this step is not really ‘Clearing’ in the sense of what “clear” usually means for a texture/attachment/rendertarget, as we are speaking here about a depth attachment that uses a depth format, which makes things different. If you’re not familiar with what that means, let me explain it quickly:
Clearing here is done simply by filling a black color into the target rect area of a given “large” target shadow atlas (depth attachment) for the existing light sources. So it is not done via something like vkCmdClearColorImage as you may initially think based on reading the word “Clearing”; instead it is done via multiple vkCmdDraw (one per rect/viewport area of the shadow atlas). But what or how to draw?!
In a case like this, nothing is easier than just calling a clear to empty everything, but vkCmdClearColorImage is not used, simply because it “can’t” be used here due to the API specs (it is a depth format image), and this is known & fine. And because this step does not use any fragment shaders (remember it is a shadow pass, so vertex shader only), drawing here is not really done in the sense of what the word “drawing” usually means. It is basically triangle coordinates sent to the vertex shader, a triangle large enough to cover the target rect/viewport area of the shadow atlas, using this triangle’s vertices X & Y positions in addition to 0.00 as the Z position (depth). And because we are working on a specific area of the atlas as a “viewport” that is set by calling vkCmdSetViewport and vkCmdSetScissor, that large triangle is not drawing/painting anything outside the given area. So basically, in simpler words: draw the depth of a triangle that fully covers the given rect of the atlas, with a depth of Zero!
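On the shader side, a minimal sketch of what that depth-“clear” draw could look like (my own reconstruction & naming, assuming a simple passthrough of the triangle’s X & Y from a vertex buffer):

// Depth-only pipeline, no fragment shader bound; one triangle big enough to
// cover the current viewport rect of the shadow atlas gets drawn at depth 0
struct ClearVSInput
{
    float2 _vPositionXY : POSITION; // clip-space X & Y of the covering triangle
};

float4 ClearDepthVS(ClearVSInput IN) : SV_Position
{
    // X/Y come from the vertex buffer, Z is forced to 0.00, so the rasterizer
    // writes depth zero everywhere the triangle covers. vkCmdSetViewport &
    // vkCmdSetScissor keep the write inside the target rect of the atlas.
    return float4(IN._vPositionXY, 0.0f, 1.0f);
}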
And in action, it is something like this
Red is the draw viewport (portion of the atlas),
and Yellow is the triangle to draw depth for within the given viewport.