Behind the Pretty Frames: Resident Evil

Introduction

Resident Evil…what a franchise…regardless of whether you’re an old gamer or a new one, you can’t have a gaming life without trying at least a single RE game. Horror games have not been my favorite recently, but back in the day Resident Evil, Silent Hill & Alone in the Dark were at the top of my list. Can’t tell you how many times i screamed while playing those low-poly-covered-with-linearly-interpolated-and-very-pixelated-textures games with their SUPER REALISTIC (at the time) level backgrounds & their amazingly brilliant Tank Control design that made me afraid to go out of the view…Oh boy, good old games! I can’t recall how many times i had to cut pages from the (very expensive to ship to my country) gaming magazines to decorate my room & my reading desk!

When RE7 was released a few years ago, it was a good opportunity to revisit the franchise after quite some time of not playing any of it. It was a great game, though i did not complete it, because at that time it was everywhere around me: everyone at the office was playing it all the time, during lunch, early in the morning and after work. Anytime i walked around, i found at least 2 people playing it, even on the PSVR, so it was totally spoiled for me. BUT, at that time a Chinese friend who was learning Japanese in order to go to Japan and work for Capcom (he did it eventually & joined Capcom’s teams a couple of years after that) sent me an amazing article that is fully in Japanese. He knows that i’m always into any info about in-house engines, and he was hoping this article would help me learn something new. And oh boy!! Once i saw the article, went through the images & tried to Google Translate the text, i wanted to learn more about that RE Engine (with fingers crossed, i hoped to ship something with it). i wanted to learn more about that thing that looks like a perfect balance between the simple-to-pick-up engines with a heavy dependence on Python, C# and VM scripting (such as Unity3d or Godot) on one hand, and on the other hand the mastery of visual fidelity, tooling and technology (like Frostbite, Snowdrop, Anvil, Northlight, Glacier, RAGE…etc.). i wanted to learn about that engine that has many rad ideas, things such as not packaging for the target platform, or modifying gameplay C# code while the game is already running and seeing the changes right away; that was interesting enough in 2017. And hence the decision was taken to try to hunt for further info about the engine, and to try to break down any RE Engine based game…But as it’s easy to get hyped, it’s even easier to lose interest due to the lack of time or resources…

Years passed, games were released, and RE Engine became more known to the public, at least by name, but still i did not go further than the Japanese article. Until one day i was browsing Twitter and saw an announcement for a new video from the folks at Digital Foundry about the latest patch for Resident Evil 2 Remake. i went through the video and then back to the tweet replies to go through the interesting comments in there.

Just watching the video and going through the replies to that tweet was enough to put some alcohol on the old hidden wound called “dig more into RE Engine”…and hence, i decided to dig into some RE games (mainly RE2, but RE3, RE8 & RE Resistance are good candidates for some comparisons).

As you might’ve noticed, the article name is just “Resident Evil”, and i did not pick a number. This is not a mistake; it was on purpose. The driving force of the article was to check something in RE2 as mentioned above, but i got RE3 as well as RE Village alongside it, so i can look at all of them at the same time. When i started with RE2 i knew it was a D3D11 game, but it seems the latest update was not only adding raytracing, but full D3D12 support, and it also seems to be “migrating” the entire game to a newer version of the engine. And because of that (and it’s a bit of a no-brainer), when you have 3 games using the same engine, all of them from the same company, all of them in the same franchise, and all of them updated in roughly the same timeframe, this means one thing: all of them are using the “exact” same technology, or more precisely the same engine version. There are definitely some minor differences, but i believe that, starting with RE7 and ending with the latest updates for RE2 & RE3, pretty much the exact same engine version is used, and it would be no wonder if RE7 gets patched in the future with a new D3D12 patch. All in all, i do anticipate a patch for RE Village and RE Resistance someday soon.

Not only that, i do have a gut feeling that the RE4 announcement from Capcom during the Summer Game Fest contributed hugely to that. If they’re working on the new RE4 for the current gen, with some new features, possibly raytracing and other fancy stuff, then why not migrate the current games (RE2, RE3 & Village), which are running on an older fork of the same engine, to RE4’s newer version of the engine, keep all of them in sync, and at the same time test the latest version of the engine in already released games instead of risking releasing RE4 with a broken engine….it’s just an assumption.

Anyways, you could say that if i were not binding myself to a naming format for those behind the frames articles, i would have named this one Behind the Pretty Frames: RE Engine.

Configs

I still capture from only one of my PCs. It is already challenging to capture from multiple games to compare at the same time, and i didn’t want to install all the 4 games and capture from them on 2 PCs..that would never end this study! The PC i’m using is still the same one from the Elden Ring study, which has the RTX 3080, Ryzen 5950X and 32GB RAM. And the graphics settings are set as in the screenshot below

i went with 4K 3840*2160 (UHD-1) not because i love and enjoy big resolutions, but because the game defaulted to it initially and i found it a good idea to test upscaling features (FSR or Interlaced). Also, as you might notice, Raytracing is disabled; the settings in the screenshot are for the general frame study in the default (most common) configs. Unfortunately when raytracing is enabled, capturing from the GPU fails. So there won’t be many details about Ray Tracing yet. Later on in the article, there are dedicated sections to discuss things that use slightly different graphics settings. So if there is any change made in the configs, it will be mentioned in its dedicated section of the article. Otherwise, everything was captured with the settings in the previous image.

Resolution Note

Because we’re on 4K and upscaling, don’t expect to see the native final resolution everywhere. There are “multiple” Render Resolutions, but there is a single Output Resolution. So there will be variations along the frame’s lifetime: 4K is the final delivery, but most of the time render targets are in one of those two resolutions:

2953 * 1661 – GBuffer, Velocity,…etc.
1476 * 830 – DOF, SSR, Bloom,…etc.

and of course:

3840 * 2160 – Final output
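
As a quick sanity check on those numbers, here is a minimal sketch. The captured sizes are consistent with a 1.3x per-axis upscale factor (the factor FSR 1.0 uses for its Ultra Quality mode) followed by a half-resolution chain. Both the scale factor and the rounding down are assumptions, chosen only because they reproduce the sizes seen in the captures.

```python
# Sketch: derive the render target sizes from the output size.
# Assumptions (not confirmed by the captures): a 1.3x per-axis scale factor
# and rounding down to whole pixels.
OUTPUT = (3840, 2160)
SCALE = 1.3

def render_size(output, scale):
    """Size of the main render targets (GBuffer, Velocity, ...)."""
    return tuple(int(d / scale) for d in output)

def half_size(size):
    """Size of the half-resolution targets (DOF, SSR, Bloom, ...)."""
    return tuple(d // 2 for d in size)

full = render_size(OUTPUT, SCALE)
half = half_size(full)
print(full, half)
```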

Still on my old habit, i take multiple captures of multiple areas to cover, figure out and understand topics better, as one capture could hold clearer insight about something than another capture. Also i take multiple captures of the same area/view, just in case.

There are gameplay as well as cinematic captures, and as you might’ve noticed from earlier articles, i’ll be more biased to refer to the captures from the cinematic sequences rather than captures from actual gameplay most of the time, for no reason other than that game engines usually push the bar during runtime cinematics. Still, both are at runtime, both are in the same engine, and both have pretty similar render queues.

All captures are taken from the Build ID 8814181 which was released mid-late June 2022 (around the DF video). i wanted to store captures before the game gets any patches or updates, and thankfully i found a magical Steam option that delays any auto-updates until i launch the game (sadly there is no option to disable auto-updates). i enabled that for the game, so at least when it updates i get a heads up and know the captures are from a different config. BUT, i then decided to take as many captures as i want to cover as many things as i wish, and can’t tell you how glad i am for doing that. Because by the 5th of July when i tried to launch the game to take a small capture, i found it telling me that it needs to install an update first!

Behind the Frame

D3D12

At the far opposite of the last D3D12 game i dug into, RE seems to utilize the API well. Starting from hardware raytracing, to using a lot of compute, the intention to use shading rates, indirect drawing (while indirect draw isn’t exclusive to D3D12, a game like Elden Ring for example didn’t use it!), multiple AMD FidelityFX features, bindless,… and many more D3D12 specific API commands that are not commonly used (at least in the past few games i’ve been checking, including Elden Ring).

And a heads-up, all shaders are compiled under SM6.0 .

Compute

Just a heads up, apart from compute being utilized among the frame draws for post processing and some effects (check the Draw section), compute is heavily utilized in many, many other areas that prepare for the frame or just contribute from behind the scenes without any direct draws. i love compute, and i love when i see a game/engine heavily utilizing it. So, i guess it’s time to use that meme from the God of War study.

RE Engine, you well deserved my compute meme!

To give you an idea about the compute usage in a typical RE Engine frame (Cinematic or Gameplay, no difference), i’ll leave below just the summary of the utilization in dispatch order. The details of those usages i prefer to put in the correct order in the Draw Section.

Compute Dispatches Queue (in execution order)

  • Fast Clear/Copy
  • Skinning
  • GPU Particles
  • Light Culling
  • Histogram Adjustment
  • GPU Culling
  • Instancing
  • SSSSS
  • Velocity
  • HI-Z Tracing
  • Compress Depth Normal
  • AO
  • Particle Collision
  • Indirect Illumination
  • SSR
  • Deinterlace
  • Motion Blur
  • DOF
  • FSR
  • CAS

Frame

Resident Vertex

RE has its own share of vertex description variations. It’s not just a few, it’s more than that, which is not something i’m a big fan of (in general, things that can get error prone), but it is what it is. Below are the only ones i was able to spot. There might be others that slipped by me, but those are the ones i’ve been seeing for days across all my captures.

Resident Evil’s Vertex Description – Most Meshes (Walls, Cars, Rocks, Buildings,…etc.)

Resident Evil’s Vertex Description – Skinned Meshes (Heads,…etc.)

Resident Evil’s Vertex Description – Skinned Meshes (Cables, Wires, Bottles,…etc.)

Resident Evil’s Vertex Description – Skinned Meshes (Hair, Eyelashes,…etc.)

Resident Evil’s Vertex Description – Skinned Meshes (Banners,…etc.)

Resident Evil’s Vertex Description – Characters/GPU-Skinned (Eyes, Hands, Shoes, Pants, Jacket, Shirt, Shirt Buttons,…etc.)

Resident Evil’s Vertex Description – Volumes (Screen Quad, Cube volumes, Decals Projector, Triggers/Hit Boxes,…etc.)

Copy & Clear

This is a similar architecture to what was discussed earlier in the Elden Ring article, except here it doesn’t occupy much of the frame time & it issues far fewer commands, with a maximum of around 3.5k total commands of CopyDescriptorsSimple, CreateUnorderedAccessView and CreateConstantBufferView to prepare.

Fast Clear/Copy [Compute]

An early compute pass with a handful of dispatches that does copy & clear from a StructuredBuffer to a RWStructuredBuffer… From a D3D12_RESOURCE_STATE_GENERIC_READ to a D3D12_RESOURCE_STATE_UNORDERED_ACCESS. The cool thing here is that during all dispatches of the fast copy, the source is always the same buffer, but the target differs. That buffer (with its ID/Index) i can see used pretty much everywhere across the frame, as if it is used as a main buffer to keep all data needed between frames, and then the data is copied over when needed (like now at the start of a new frame).

The size of that big boy storing everything is [8388608] integer elements (*4 bytes per int = 33554432 bytes, a ~33.5MB or exactly 32MiB buffer)…just saying…
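
The arithmetic behind that size, as a tiny sketch. The only assumption is the 4 bytes per element, which is what an integer element in a structured buffer normally takes.

```python
# Sketch: size of the big source buffer used by the fast copy dispatches.
ELEMENTS = 8_388_608        # element count seen in the capture (2**23)
BYTES_PER_INT = 4           # assumption: 32-bit integer elements

size_bytes = ELEMENTS * BYTES_PER_INT
size_mb = size_bytes / 1_000_000        # decimal megabytes
size_mib = size_bytes / (1024 * 1024)   # binary mebibytes
print(size_bytes, round(size_mb, 1), size_mib)
```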

This is not the only “Fast Copy” compute pass across a RE frame; there are quite a few here & there. i won’t bother mentioning them. i only wanted to mention this one because it is the 1st of that pass type, and because it is the longest and the most “impactful” of them.

Skinning [Compute]

GPU skinning in compute is not something new, and has been known for quite some time. But despite that simple fact, we still don’t see it utilized heavily in games (i’m talking of course about the AAA space, where performance vs “realism” quality is always a fight). It is an amazing surprise for me to find that RE Engine is doing that. i can’t even remember the last game i dug into that used such a technique. The technique is simple and straightforward, but it is very, very beneficial in many aspects (well, not every aspect). i won’t go into details about the advantages of this technique, but i would recommend checking the article written by János of Wicked Engine quite some time ago about the implementation of a similar technique in Wicked Engine.

This compute pass starts with a copy of buffers. i believe it’s the data from the previous frame, so it can be utilized with the current frame to (shortly after that pass) calculate velocities that will be needed later for many things such as post processes like TAA or Motion Blur.

With a good large list of buffers (Skinning Matrices, Instance World Info, Bindless Redirect Table, Input Byte Buffer, Weight List) the compute takes the params below to kick off the job (the buffers used vary based on the active sub-step of the skinning pass).

CB CS Skinning

CB CS Blend Shape

CB CS Calculate Normal
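
To make the idea of the skinning step concrete, below is a minimal sketch of plain linear blend skinning, which is the textbook form of what a skinning compute shader does per vertex (one thread per vertex, output written to a UAV). This is not RE Engine’s shader: the function names, the 3x4 row-major matrix layout and the two-bone example are all assumptions made for illustration.

```python
# Sketch of linear blend skinning for a single vertex.
# Assumptions: 3x4 row-major affine skinning matrices, weights summing to 1.

def transform_point(m, p):
    """Apply a 3x4 row-major affine matrix to a 3D point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def skin_vertex(position, bone_indices, bone_weights, skinning_matrices):
    """Blend the position transformed by each influencing bone."""
    out = [0.0, 0.0, 0.0]
    for idx, w in zip(bone_indices, bone_weights):
        tp = transform_point(skinning_matrices[idx], position)
        for i in range(3):
            out[i] += w * tp[i]
    return tuple(out)

IDENTITY = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0))
MOVE_X2 = ((1, 0, 0, 2), (0, 1, 0, 0), (0, 0, 1, 0))  # translate +2 on X

# A vertex influenced half by a static bone and half by a moved bone.
skinned = skin_vertex((1.0, 0.0, 0.0), (0, 1), (0.5, 0.5), [IDENTITY, MOVE_X2])
print(skinned)
```

Keeping the previous frame’s skinned positions around (the buffer copy at the start of the pass) is what would later allow per-vertex velocity to be computed as the difference between the two results.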

Seems we’re off to a good start with that engine already!

GPU Particles [Compute]

Right after GPU skinning (always), the GPU particles simulation update kicks in. This is only to simulate movement & updates: it covers only one type of particles in the game, and it’s sim only, no rendering. Rendering those GPU particles (as expected) happens very late, right before post processing.

This happens in 3 different phases
– Calculate Node Stretch Billboard
– Single Emit[s]
– SingleUpdate[s]

For the purpose of sim-ing those GPU particles, a good respectable amount of buffers (Constant and UAVs) is passed to all the dispatches of that (fairly long) compute pass.

Node Billboard Particles

Node Billboard Constant

Emitter Instances

Emitter

Extend Particle Vertex Constant

That is the particle related stuff, but there are also forces and environment related forces

CB Vortexel Turbulence

Vector Field

Global Vector Field Data

Global Vector Field

Field Attribute

And of course there are some 3d textures used for the Particle Flow, Velocity map, and Global Vector Field values (can say it’s a representation of a 3d volume in the 3d world).

Particle Flow 3DTexture – 64*64*64 – BC1_UNORM

Among other UAVs such as ParticleStructures, ParticleProperties, GPUParticleBuffer, which i was not able to fetch further info about.

Light Culling [Compute]

RE Engine is based on a “Clustered Deferred” rendering system in order to serve the level design requirements and the “Metroidvania” play style of going back and forth between the rooms. Hence, light culling became important to the developers, and that is what the “Clustered” part is there for. Not only that, but adopting such a system also keeps the quality bar high & consistent across all the platforms (previous gen & current gen), while allowing the game to run at a stable (and good) framerate on things such as PSVR.

Light Info

Light Parameter SRV

Light Frustum UAV

Light Culling Param SRV

Scene Info

That’s not everything; there are also buffers that were hard to fetch anything beyond their names, such as LightSphereSRV, LightCullingListUAV, LightCullingListCountUAV.

The final output of that compute pass is stored for later use in a light culling volume (UAV), a 3DTexture that is 32*80 with 32 slices in the R32_UINT format. Below is how it looks in atlas view (8 slices per row, 4 rows total, each 80 pixels high) or animated view.
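
For readers new to the idea, here is a minimal sketch of what “culling lights into clusters” means. It is a deliberately simplified stand-in, not the engine’s pass: real clustered shading slices the view frustum, while this sketch uses a uniform axis-aligned grid, point lights as spheres, and hypothetical grid dimensions. The per-cluster lists it produces play the same role as the light culling volume above.

```python
# Sketch: assign sphere lights to the cells of a uniform 3D grid.
# Assumptions: axis-aligned grid (not frustum-shaped clusters), lights given
# as (center, radius); all names and sizes are made up for illustration.

def sphere_intersects_aabb(center, radius, box_min, box_max):
    """Squared distance from the sphere center to the box vs. radius^2."""
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        nearest = min(max(c, lo), hi)
        d2 += (c - nearest) ** 2
    return d2 <= radius * radius

def cull_lights(lights, grid_dims, world_min, world_max):
    """Return {(x, y, z) cluster coordinate: [light indices]}."""
    nx, ny, nz = grid_dims
    size = [(world_max[i] - world_min[i]) / grid_dims[i] for i in range(3)]
    lists = {}
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                lo = [world_min[i] + (x, y, z)[i] * size[i] for i in range(3)]
                hi = [lo[i] + size[i] for i in range(3)]
                hits = [n for n, (c, r) in enumerate(lights)
                        if sphere_intersects_aabb(c, r, lo, hi)]
                if hits:
                    lists[(x, y, z)] = hits
    return lists

lights = [((0.5, 0.5, 0.5), 0.25), ((3.5, 0.5, 0.5), 0.25)]
clusters = cull_lights(lights, (4, 1, 1), (0, 0, 0), (4, 1, 1))
print(clusters)
```

At shading time a pixel only loops over the lights listed for its own cluster, which is where the saving comes from.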

Histogram Adjustment [Compute] [Not always]

For Auto Exposure purposes, RE Engine seems to be using Matrix or Multizone Metering. In that method, exposure gets prioritized for the “defined” most important parts of the frame. For that purpose a mask is used. And because of the dark mode applied to the majority of the game, you would notice that the mask is quite similar in the majority of the frames (if not exactly the same one) and it would look like a “Vignette”.
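
A minimal sketch of how a metering mask can bias the measured scene brightness. It computes a weighted geometric mean of luminance, a common choice for auto exposure; whether RE Engine uses exactly this average is an assumption, and the function name and the tiny 3x3 inputs are made up for illustration.

```python
import math

# Sketch: mask-weighted average luminance for auto exposure.
# Assumptions: weighted geometric mean (log-average); the mask holds the
# per-pixel metering weights, e.g. high in the center, low at the edges.

def metered_luminance(luminance, weights, eps=1e-4):
    acc = 0.0
    total_w = 0.0
    for row_l, row_w in zip(luminance, weights):
        for lum, w in zip(row_l, row_w):
            acc += w * math.log(max(lum, eps))  # clamp to avoid log(0)
            total_w += w
    return math.exp(acc / total_w)

# A bright spot in the middle of an otherwise dark frame.
frame = [[0.01, 0.01, 0.01],
         [0.01, 1.00, 0.01],
         [0.01, 0.01, 0.01]]
center_mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
flat_mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

center_metered = metered_luminance(frame, center_mask)
flat_metered = metered_luminance(frame, flat_mask)
print(center_metered, flat_metered)
```

With the center-weighted mask the exposure is driven by the bright spot alone, while the flat (pure white) mask lets the dark surroundings dominate.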

I’ll leave below in the readings section some good reads that explain that method (and other auto exposure methods) in detail, and explain why it works perfectly for some games. Anyways, for that purpose, the params passed to the shader are as follows

Histogram Adjustment

Haze Compositor Parameter

As you might’ve noticed, the majority of the captures (and the game) rely on a similar MultiZoneMetering mask, except the last frame in the previous images, which uses pure white. Anyways, to give you a sense of the difference between those frames, the table below includes the shader values for each of the frames.

Values Table

The values in the table might make no sense right now, but those are the values used later (bright_rate, dark_rate & whtie_range) during the Tone Mapping step.

GPU Cluster Culling [Compute]

Yet another important compute pass with nothing to show..As the name implies, it’s GPU cluster culling for the world content by doing frustum tests…a lot of them (or as the engine refers to it, MiniClusterFrustumTest)

Culling World

Instance Bounding Buffer

GPU Volume
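
A frustum test of the kind this pass runs many times can be sketched as below. This is the generic AABB-vs-planes test, not the engine’s MiniClusterFrustumTest code; the plane convention (inside when n·p + d >= 0) and the box-shaped “frustum” used in the example are assumptions to keep the numbers easy to follow.

```python
# Sketch: conservative AABB vs. frustum test using the "positive vertex".
# Assumption: each plane is (nx, ny, nz, d) with the inside on n.p + d >= 0.

def aabb_outside_plane(box_min, box_max, plane):
    nx, ny, nz, d = plane
    # Corner of the box that lies furthest along the plane normal.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0

def aabb_in_frustum(box_min, box_max, planes):
    """False only when the box is fully outside at least one plane."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)

# Six planes bounding the cube [-1, 1]^3, standing in for a real frustum.
PLANES = [(1, 0, 0, 1), (-1, 0, 0, 1),
          (0, 1, 0, 1), (0, -1, 0, 1),
          (0, 0, 1, 1), (0, 0, -1, 1)]

inside = aabb_in_frustum((-0.5, -0.5, -0.5), (0.5, 0.5, 0.5), PLANES)
straddling = aabb_in_frustum((0.5, 0.0, 0.0), (1.5, 0.5, 0.5), PLANES)
outside = aabb_in_frustum((2, 2, 2), (3, 3, 3), PLANES)
print(inside, straddling, outside)
```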

Occlusion Culling (NDC)

A quite useful and long depth pass of draws that does a form of depth based culling (it seems to be a hybrid between Coverage Buffer Occlusion Culling, Depth Tile Culling & Depth Pyramid “HiZ” Culling); it seems to take inspiration from each type in one aspect. It takes place in two phases. The 1st and longest one is the depth testing, where the engine basically renders occluders to the zbuffer; the renderer refers to this step as MiniClusterOcclusionTest. Then the 2nd step is where the engine tests the AABBs of entities around the map (or what is left after the cluster culling, to be more accurate) against the zbuffer for occlusion culling, which is possibly referred to as the CullVolumeOccluder step.
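
The second phase can be sketched as a simple rectangle-vs-depth test. This is a generic illustration under stated assumptions, not the engine’s CullVolumeOccluder code: the AABB is assumed to be already projected to an inclusive texel rectangle plus its nearest depth, and depth is assumed to grow with distance (0 = near, 1 = far).

```python
# Sketch: test a projected AABB against a small occluder depth buffer.
# Assumptions: smaller depth = closer to the camera; rect bounds inclusive.

def is_occluded(depth_buffer, rect, nearest_depth):
    """rect = (x0, y0, x1, y1) texel bounds of the projected AABB."""
    x0, y0, x1, y1 = rect
    farthest_occluder = max(depth_buffer[y][x]
                            for y in range(y0, y1 + 1)
                            for x in range(x0, x1 + 1))
    # Hidden only if every covered texel has an occluder in front of it.
    return nearest_depth > farthest_occluder

# 4x4 occluder depth: a wall at depth 0.2 on the left, nothing on the right.
occluders = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]

behind_wall = is_occluded(occluders, (0, 0, 1, 3), nearest_depth=0.5)
peeking_out = is_occluded(occluders, (1, 0, 2, 3), nearest_depth=0.5)
print(behind_wall, peeking_out)
```

The test is conservative: an object that straddles the wall’s edge is kept, because at least one of its texels has no occluder in front of it.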

Testing & drawing occluders of course does not happen all at once; it takes quite some time to do them one by one.

The reason i did not lean more towards calling it “Depth Pyramid Culling” is the fact that the occluders depth target does not have any mips, as is typical for that type of depth occlusion; instead it stores different values in the different samples! But given the existence of those “accumulation” samples, i couldn’t say it is 100% “Coverage Buffer Culling” or “Depth Reprojection” either.

So, it is 4 samples instead of 4 mips!

Also keep in mind that the Occluder rendertarget above is “upscaled” for the sake of demonstration. In reality it is fairly small; the version at the side is the real one. i prefer to always keep a linearly upscaled version, because it makes more sense to me when i think viewport “tile” wise.

If those occluder rendertargets make no sense for a moment, just compare them with the final swapchain, and then they make perfect sense. Or perhaps the gifs below make it clearer.

Anyways, that’s not the first or the last inspiration from Crytek & CryEngine you would see today!

Instancing [Compute]

Instancing is a process of two distinct phases: the heavy part comes first in this compute pass, followed by the drawing part in the fragment shader as part of the next pass. The entire process can be referred to as “Compaction”, which takes place in 3 sequential steps (compute shaders)
– Instancing Compaction
– Mini Cluster Compaction
– DrawIndirect Argument Fill

This just prepares the instancing data/buffers through instance draw & multi draw structs. The interesting take here is that it takes place for the deferred light sources as well, in order to prepare for the deferred pass coming shortly.

IDI

MDI

Light Parameter
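
The compaction idea above can be sketched on the CPU as follows. It is an illustration under assumptions, not the engine’s shaders: visibility flags (from the culling passes) are compacted into a dense instance list via an exclusive prefix sum, and the surviving count is written into a draw-arguments record whose field names mirror D3D12_DRAW_INDEXED_ARGUMENTS. The helper names and the example mesh sizes are hypothetical.

```python
# Sketch: stream compaction of visible instances + indirect draw args fill.

def compact_instances(visibility):
    """Keep the indices of visible instances, order preserved."""
    # The exclusive prefix sum gives every visible instance its output slot,
    # which is how a parallel (GPU) version avoids write conflicts.
    offsets = []
    running = 0
    for v in visibility:
        offsets.append(running)
        running += 1 if v else 0
    compacted = [0] * running
    for i, v in enumerate(visibility):
        if v:
            compacted[offsets[i]] = i
    return compacted

def fill_draw_args(index_count, instance_count,
                   start_index=0, base_vertex=0, start_instance=0):
    # Field names mirror D3D12_DRAW_INDEXED_ARGUMENTS.
    return {
        "IndexCountPerInstance": index_count,
        "InstanceCount": instance_count,
        "StartIndexLocation": start_index,
        "BaseVertexLocation": base_vertex,
        "StartInstanceLocation": start_instance,
    }

visible = [True, False, False, True, True]   # e.g. 5 cereal boxes, 3 survive
compacted = compact_instances(visible)
args = fill_draw_args(index_count=36, instance_count=len(compacted))
print(compacted, args)
```

Because the arguments are produced on the GPU, the CPU never needs to know how many instances survived culling; it just issues the ExecuteIndirect.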

Indirect Drawing

This drawing pass is not dedicated to instances only, it is a drawing pass for everything, but the 2nd part of the instancing process (which is drawing instances/batches) also takes place here between other non-instancing draws.

Nothing is better than the supermarket level to demonstrate instancing. The supermarket, while chaotic, is full of many instances of objects, from lighters, soda, medicines, chocolate and gum, to milk bottles, to cereal boxes,….and so on. Those draws are batched in several groups of ExecuteIndirect, each a combination of several IndirectDrawIndexed.

Instancing: every time a group of the “same” object shows up, this is an ExecuteIndirect draw

While this pass could seem to be dedicated to “drawing instances”, as it follows right after the instancing compute, drawing instances does not happen all at once in a dedicated pass; the drawing process is a mixed bag of instanced and normal unique object draws. So don’t get confused by the gif. Instances are the ones that show up in batches, whereas single objects are usual normal draws (this includes Leon as a GPU skinned object).

GBuffer/Deferred

Geometry

When i started “casually” looking into RE games frame captures, i was not very sure whether i’d proceed with it or not. It happens many times that i lose interest. In the last few games i checked (2 of them made it to articles already) i was not a big fan of the GBuffer utilization. It’s either a big-fat GBuffer, or a manageable amount of targets, but with most of them having solid black unused channels. Either way, those GBuffers did not result in WOW visual quality for the final frames. BUT…and it’s a big BUT, as soon as i saw the GBuffer for RE games, i decided to proceed with the captures till the end.

GBuffers in RE Engine based games are well utilized: a very small amount of targets, well planned, and every channel is used. Not only that, with those 3 targets in the GBuffer (which are in good formats as well) the engine is able to deliver a very impressive visual quality for the final frames…

The one output i called VxyAoSss is basically VelocityXY (R & G channels) + AO (B channel) + SSS mask (A channel). Because the frame above doesn’t have much movement (in fact it has almost none), here is the breakdown of that rendertarget from a more dynamic frame
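
The channel layout of that target can be written down as a tiny sketch. The field names are my own labels for the four channels, and the `write_ao` helper is hypothetical; it only illustrates that a later pass (such as AO) can fill its own channel while leaving the velocity and SSS mask untouched.

```python
from collections import namedtuple

# Sketch: one texel of the VxyAoSss target, R/G/B/A in order.
# Field names are illustrative labels, not engine identifiers.
VxyAoSss = namedtuple("VxyAoSss", ["velocity_x", "velocity_y", "ao", "sss_mask"])

def write_ao(texel, ao):
    """Write only the B channel, as if using a channel write mask."""
    return texel._replace(ao=ao)

texel = VxyAoSss(velocity_x=0.1, velocity_y=-0.2, ao=1.0, sss_mask=1.0)
texel_after_ao = write_ao(texel, 0.35)
print(texel_after_ao)
```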

The Modified GBuffer render target is just brilliant! It tracks between the prev & current frame, and possibly ignores lighting for some pixels..possibly.

The good utilization of rendertargets is not limited to the GBuffer, just keep that note in mind…Pretty much every single target and every channel is well occupied! You’ll notice that frequently!

Drawing for the deferred GBuffer takes very few inputs per object, only color, normal & attributes/properties, in addition to a BlueNoise16 most of the time, to be used for dithering when needed.

And of course, pretty much everything drawn into the GBuffer is drawn with “wetness” in consideration.

That latter normal map (and its alpha) is animated through texture slices, so 32 slices make 32 frames of animation, so wetness can be dynamic when needed.

Hair

Hair drawing usually comes as the last step of the deferred pass (usually, but not always; in fact in the previous Leon frame, his hair was drawn at the very start of the frame before most of the supermarket items, whereas his body was drawn at the far end of the frame). And to be honest, i’m having a hard time believing that this EPIC BEARD is just made of a couple of layers of hair cards… the amount of detail and the individual strands look outstanding & unique!!!

Decals

Decals are the last thing to draw in the deferred GBuffer color pass, step by step using a quad per decal (if not complex). Some of those decal quads are already combined together during the instancing step, so it’s quite common to find a whole bunch of decals showing up in one draw. Anyways, decals come in two different fashions

GBuffer Decals

GBuffer decals come first, all of them; those are drawn to the GBuffer in the form of flat planes (all instanced already)

From those few draws of dirt on walls (perhaps hard to spot from this camera view), to graffiti, and even the street gutter. The PBR ready textures below are used (color, normal & properties).

And of course, for better way to spot the deferred decals painting, a step by step is always better!

Volume Decals

Let’s take another example, that frame below

This example frame actually includes the previous type of decals (GBuffer Decals), as well as the “Volume Decals”. The Volume Decals are projected using a volume as the name implies, which is a cube most of the time. So, to better differentiate between the GBuffer Decals and the Volume Decals in that frame, it’s enough to see the meshes being projected

And of course, for such a frame, each type of decals uses its own texture set

So, as an example containing both types of decals, this is how decals go for the frame. GBuffer first, then the Volume Decals

And if you didn’t notice how instancing & batching are doing a great job for decals in RE Engine, perhaps more blood and footsteps will be good evidence. It’s just 2 draws for everything! (much better than something like Elden Ring for example)

Far Cubemap

A cubemap drawn on a plane behind everything on the horizon (bizarre); that phase is referred to as “CubemapFarPlane2D”. It’s a single draw that gets an IBL2D texture. And because it is a cubemap texture drawn at the far horizon, you won’t see any presence of it in frames taking place in an interior area. On the other hand, for exterior area frames the difference is very obvious.

Cubemap Setting

Fill Velocity [Compute]

This is a very, very important step, needed not only for Anti-Aliasing, but also for anything that requires velocity down the road. Yet apart from the input & output of this stage, there isn’t a clear indication of the process steps or the used params. We just end up with the X & Y velocity values in the R & G channels of the so-called VelocityXYAoSss rendertarget of the GBuffer.
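
Since the capture does not reveal the actual steps, the sketch below only shows the usual way a screen-space velocity value is obtained: project the same point with the current and the previous frame’s matrices and take the difference. The function names, the row-major 4x4 layout, and the sign and scale of the result are all assumptions, not something read from the engine.

```python
# Sketch: screen-space velocity of a point between two frames.
# Assumptions: row-major 4x4 view-projection matrices, velocity expressed
# in NDC units as (current - previous).

def project(m, p):
    """Multiply a 4x4 matrix by (x, y, z, 1) and divide by w -> NDC xy."""
    x, y, z = p
    cx, cy, cz, cw = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)
    return (cx / cw, cy / cw)

def velocity_xy(world_pos, prev_world_pos, view_proj, prev_view_proj):
    cur = project(view_proj, world_pos)
    prev = project(prev_view_proj, prev_world_pos)
    return (cur[0] - prev[0], cur[1] - prev[1])

IDENTITY4 = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))

# A point that moved a quarter of a unit to the right, camera unchanged.
velocity = velocity_xy((0.5, 0.0, 1.0), (0.25, 0.0, 1.0), IDENTITY4, IDENTITY4)
print(velocity)
```

Using the previous frame’s object position and the previous frame’s camera together is what lets one value cover both object motion and camera motion.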

SSSSS [Compute]

Using the Depth + the SSS attribute in materials/shaders, the engine can write to the GBuffer a “mask” representing the SSS applicable areas (that mask is needed later at the deferred pass) through a compute of “FastScreenSpaceSubsurfaceScattering”.

HI-Z Tracing [Compute] [Not always]

HI-Z Tracing is the method used for the Screen Space Reflections (SSR). Here the engine generates the HI-Z (Hierarchical-Z buffer) which is needed later down in this page for the SSR view generation. This does not always take place, as SSR itself is not always part of a frame; as explained later, SSR only takes place in some areas, (possibly) controlled by SSR volumes or such that define when to trace & render SSR and when to skip.

7 dispatches for that compute, each against one of the mips of the “Linear Depth” that came out of the “Fill Velocity” compute step above. Starting from the highest mip level – 1, take the min or max of the 4 neighboring values from the original Z-buffer, and then keep it in a smaller buffer at 1/2 size

Hi Z Generate Info
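
The mip generation described above can be sketched in a few lines. This is the generic Hi-Z construction (each level reduces 2x2 texels of the level above it), written under assumptions: even dimensions at every level, and `max` as the default reduction, since the choice between min and max depends on the depth convention and is not something the capture confirms.

```python
# Sketch: build a Hi-Z chain by reducing 2x2 blocks per level.
# Assumptions: dimensions are even at every level; reduction is max or min.

def downsample(depth, reduce=max):
    h, w = len(depth), len(depth[0])
    return [[reduce(depth[y][x], depth[y][x + 1],
                    depth[y + 1][x], depth[y + 1][x + 1])
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]

def build_hiz(depth, reduce=max):
    """Return [mip0, mip1, ...] down to a single texel."""
    chain = [depth]
    while len(chain[-1]) > 1 and len(chain[-1][0]) > 1:
        chain.append(downsample(chain[-1], reduce))
    return chain

depth = [[0.1, 0.2, 0.5, 0.6],
         [0.3, 0.4, 0.7, 0.8],
         [0.9, 0.9, 0.2, 0.1],
         [0.9, 0.9, 0.3, 0.4]]

hiz = build_hiz(depth)
print(hiz[1], hiz[2])
```

On the GPU each level is one dispatch reading the previous level, which matches the “one dispatch per mip” pattern seen in the capture.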

AO

There are multiple ways to deliver AO in RE Engine, something to fit every user! By default there is SSAO (Screen Space Ambient Occlusion), but there is also a limited version of the SSAO called SSAO (Set Areas Only), where the effect is not everywhere all the time. There is also HBAO+ (NVIDIA Horizon Based Ambient Occlusion) as well as CACAO (AMD FidelityFX Combined Adaptive Compute Ambient Occlusion), and of course there is the option for no AO at all (just disabled). i decided to go further than a single configuration (which is usually SSAO), and took some captures to take a closer look at each of the 3 main types available in the game in macro details. So we can check the final quality of the different AO methods and their impact on a frame, as well as check some of the workload behind them.

HBAO+ (CoarseAO) [Not always]

Only when the AO method is set to HBAO+ (aka CoarseAO).

HBAO+ takes place entirely in fragment shader invocations, and it happens in a few distinct steps (6 to be exact)

1 – HBAO Plus Linear Depth

The first step is to convert the depth into linear depth (format change)

2 – Deinterleave Depth

Take that linear depth from the previous step, and make a 16 slices/cascades version of it

And of course, because the new linear depth is 1/4 the size on each axis and in 16 slices, if put all together it makes a big image (atlas) that is the same size as the original linear depth.
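
Deinterleaving is easier to see in code than in words. The sketch below splits an image into 16 quarter-resolution slices, where each slice collects the texels that share the same position inside every 4x4 block, and then puts them back together (the job of the later Reinterleave step). The slice ordering used here is an assumption; the capture only shows that there are 16 slices.

```python
# Sketch: 4x4 deinterleave into 16 slices, and the inverse reinterleave.
# Assumptions: width/height divisible by 4; slice index = (y % 4) * 4 + (x % 4).

def deinterleave(image, n=4):
    h, w = len(image), len(image[0])
    slices = [[[0] * (w // n) for _ in range(h // n)] for _ in range(n * n)]
    for y in range(h):
        for x in range(w):
            slices[(y % n) * n + (x % n)][y // n][x // n] = image[y][x]
    return slices

def reinterleave(slices, n=4):
    sh, sw = len(slices[0]), len(slices[0][0])
    image = [[0] * (sw * n) for _ in range(sh * n)]
    for s, sl in enumerate(slices):
        oy, ox = divmod(s, n)   # offset of this slice inside each 4x4 block
        for y in range(sh):
            for x in range(sw):
                image[y * n + oy][x * n + ox] = sl[y][x]
    return image

image = [[y * 8 + x for x in range(8)] for y in range(8)]
slices = deinterleave(image)
restored = reinterleave(slices)
print(len(slices), slices[0])
```

The point of the split is cache friendliness: each AO pass then samples one small, coherent slice instead of jumping around the full-resolution depth.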

HBAO Plus Per Pass Constants

3 – CoarseAO

AO generation in multiple passes (16 sequential draws) using the normals and the 1/4 res deinterleaved depth (with its 16 slices), in order to draw the output HBAO jitters in the form of slices of the texture. Generation happens with a unique jitter value per pass.

Because the output is 16 slices, so in an atlas view of 4 slices per row, it would look like the atlas below.

CB HBAO

4 – Reinterleave AO

Using the Linear Depth we had originally at the first step of the HBAO+, with the help of the AO output we got from the previous step, we end up with the so-called “AO Depth Texture” rendertarget.

The AO occupies the R channel only.

Also, as you might’ve noticed, the output here is a rendertarget made of 16 slices, but only the first slice is occupied, and the other 15 slices are filled with black. Which is not good, and is an area for improvement.

5 – Blur X

Taking the AO Depth Texture, a blurring pass on the X axis takes place

Same case here, not only regarding making use of a single channel, but also the rendertarget resulting from Blur X is made of 16 slices, with only the first slice occupied and the other 15 slices filled with black. Which is not good either.

6 – Blur Y

In the last step, blurring takes place on the Y axis, and finally the AO (R channel) is added to one of the accompanying rendertargets of the GBuffer, which i referred to earlier as VelocityXYAoSss (in its B channel)

So basically, to make it clear what the last few steps are exactly doing without the needless fancy colors, it’s something like this

SSAO [Compute] [Not always]

Only when the AO method is set to SSAO.

SSAO is RE’s lightest AO method in terms of computing and steps; it is done through 3 distinct steps that all take place in compute.

1 – Interleave Normal Depth

The 1st step takes the Depth + the NormalXY & generates the “Compressed Depth Normal” or “Normal-Depth Map” rendertarget (names could vary depending on where you learned it!). This is basically the “view space” normals & depth values for each pixel, drawn to that new rendertarget.

2 – Interleave SSAO

The SSAO shader itself. It calculates an “ambient access value” for every pixel (AO Image), so it can be used later for the lighting calculation.

SAO AO

3 – Interleave Resolve

Same as HBAO+, we do scaling & blurring at this step, because as you noticed we used several samples (the 4*4 from the 1st step), which results in a slightly noisy AO Image. This could be solved earlier by taking more samples, but that’s not a wise choice; it’s cheaper to take fewer samples and then blur than to take twice the sample count.

And of course, after doing so, composite that AO Image into the B channel of a rendertarget that already accompanies the GBuffer…the VelocityXYAoSss.

So basically, also without the fancy colors, this is the in & out of that last step (blurring the ugly AO Image)

CACAO [Compute] [Not always]

Only when the AO method is set to CACAO.

CACAO is quite a process, it takes a fairly large number of steps compared to the previous 2 methods. And it is entirely taking place in compute.

1 – Prepare Downsampled Depths And Mips

2 – Prepare Downsampled Normals From Input Normals

3 – Generate Q3 Base

This step runs 4 times, once per jitter slice, in order to generate the AO importance.

4 – Generate Importance Map

5 – Postprocess Importance Map A

6 – Postprocess Importance Map B

7 – Edge Sensitive Blur 1 (Generate Q3)

This step runs 4 times, once per jitter slice, in order to generate the Edge Sensitive Blur.

8 – Edge Sensitive Blur 2

Just as in the previous step, this step runs 4 times as well to do almost the exact same thing.

9 – Upscale Bilateral 5×5 Smart

i like how consistent it is…RE always ends up storing the AO in the VelocityXYAoSss’s B channel regardless of the AO method used!! Tidy, Consistent & Cool!

There is only one buffer/struct of params passed around to all dispatches across the entire process. Ironically, CACAO’s constant buffer is referred to as SSAOConstantBuffer, whereas the struct name itself is CACAOConsts!
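About those “runs 4 times, once per jitter slice” steps: my understanding (an assumption based on how the publicly available CACAO works, not something i can read from the capture alone) is that the half-resolution depth is de-interleaved into 4 slices, one per 2x2 pixel offset, and each slice gets its own dispatch before everything is woven back together. A minimal sketch of that split, with hypothetical function names:

```python
def deinterleave_2x2(image):
    """Split an image into 4 half-size slices, one per offset inside a 2x2 block."""
    slices = []
    for offset_y in (0, 1):
        for offset_x in (0, 1):
            slices.append([row[offset_x::2] for row in image[offset_y::2]])
    return slices


def reinterleave_2x2(slices):
    """Inverse of deinterleave_2x2: weave the 4 slices back into one image."""
    half_height, half_width = len(slices[0]), len(slices[0][0])
    out = [[None] * (half_width * 2) for _ in range(half_height * 2)]
    for index, part in enumerate(slices):
        offset_y, offset_x = divmod(index, 2)
        for y, row in enumerate(part):
            for x, value in enumerate(row):
                out[y * 2 + offset_y][x * 2 + offset_x] = value
    return out


# 4x4 image whose texel values are simply their linear index.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
slices = deinterleave_2x2(image)  # one entry per "jitter slice"
restored = reinterleave_2x2(slices)
```

Each slice only contains texels that share the same offset inside their 2x2 block, so a dispatch working on one slice can use one sampling pattern for all of its texels.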

SSAO Constants Buffer

Particle Depth Emitters [Compute]

Depth based colliding particles. Nothing visual to see here except the Depth image, which has been seen a thousand times already. But there are some interesting buffers & structured buffers going around the dispatches, such as EmitterInstances, ParticleStructures, ParticleProperites, and ItmesBuffer.

Emitter Instances

Indirect Illumination [Compute]

Indirect Illumination, aka Global Illumination, aka GI. Nothing fancy to highlight, except the cubemaps!

Below are a few examples of the GI phase

Totally Not Important Note

The cubemaps leave some questions.
First, why does an exterior frame have a totally black cubemap, whereas an interior frame has the usual sky cubemap?
Second, why not use one Cubemap input?
Third, let’s assume there is a strong reason behind two different cubemaps for IBL/GI/Lighting; why on earth make a full 128 slices/entries, when at most about 20 are needed?

Apart from that, here are most of the params passed to the dispatches during all the GI phases

Environment Info

Light Info

Checker Board Info

Tetra Coordinate

Sparse Light Probes List

Shadow Pass[es]

A depth only shadow pass for direct light sources. In most cases this is a single pass, but there are cases (such as “some” cinematics) where it is more than a single depth pass (usually 2, but it can go higher). The one distinctive thing in this pass (which i like) is that it uses Indirect Draw for the entire pass, which was not the case in other games (let’s say Elden Ring, which was using DrawIndexedInstanced || DrawInstanced for the shadow passes). The other interesting thing here is that, where many games tend to store shadow maps in a big atlas, RE tends to store them in a 3DTexture of 32 Slices with a fixed size of 2048*2048 in the format R32_TYPELESS…FAN!!!!!
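To put some numbers on the slices-versus-atlas choice, here is a small back-of-the-envelope sketch. The slice size, slice count and the 32-bit format come from the capture; the assumption that every slice is fully allocated without mips, and the 4-tiles-per-row atlas layout used for comparison, are mine.

```python
SLICE_SIZE = 2048      # texels per side, from the capture
SLICE_COUNT = 32       # slices in the shadow texture, from the capture
BYTES_PER_TEXEL = 4    # R32 = 32 bits per texel


def slices_footprint_bytes(size=SLICE_SIZE, count=SLICE_COUNT, texel_bytes=BYTES_PER_TEXEL):
    """Memory needed if every slice is fully allocated (assumption: no mips)."""
    return size * size * texel_bytes * count


def atlas_rect(slice_index, size=SLICE_SIZE, tiles_per_row=4):
    """Where the same shadow map would sit in a hypothetical tiled atlas.

    Returns (x, y, width, height) in texels.
    """
    tile_y, tile_x = divmod(slice_index, tiles_per_row)
    return (tile_x * size, tile_y * size, size, size)


footprint_mib = slices_footprint_bytes() / (1024 * 1024)

# With slices, shadow map 5 is simply "slice 5".
# With the hypothetical atlas, it becomes a sub-rectangle that every
# shader and every debug view has to know about.
rect_for_5 = atlas_rect(5)
rect_for_last = atlas_rect(SLICE_COUNT - 1)
```

Both layouts hold the same number of texels (the hypothetical atlas would be 8192*16384), so the difference is mostly about addressing: an index per shadow map versus offsets and scales per shadow map.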

Here are some examples for the different shadow passes cases

The reason i like it this way is that it is very organized, very friendly and very easy to debug & to optimize if needed. i agree that “all roads lead to Rome”, but why take a bumpy road if you can take a good smooth one!
While i do like what we have in hand here, at the same time i see that the >1 passes in some cinematics are not necessary, and could be a good area for improvement. You can barely notice the difference between the different outputs.

Lighting Billboard Particles [Compute]

A quite long compute pass that is dedicated to “PreCalculateLighting”, and then “CalculateNodeStretchBillboardLighting” and “CalculateGpuBillboardLighting” for the particles. No fancy inputs here; all the inputs in this phase are ones we are already familiar with (the same outputs & inputs from the GI phase).

Light Parameter SRV

Shadow Parameter SRV

Tetrahedron Transform

Among other structs that were already mentioned previously with particles (Emitter, EmitterInstance, ParticleStructure) and GI (BSP, BVH, SpareLightPropesList,…etc.).

Deferred Pass

1.Deferred Lighting

Just deferred fragment shader that does what a deferred frag shader does!

2.Deferred Projection Spotlight [Not always]

This is not always present; it depends on the current map/level and the existence of spot lights. There is no difference, it just makes the shader invoke a couple more times to process the spotlights’ contribution.

3.Fast SSSSS Apply

This step in reality is not actually a part of the Deferred pass, but i wanted to list it here as part of the deferred pass for 2 simple reasons. 1st, it takes place right after the deferred invocations. 2nd, without the Fast-sssss, the output of the deferred pass is not actually a complete output, and some pixels (skin & such) are still missing color info, or perhaps better to say “still missing some light calculations”, so for me the completion of this Fast-sssss means the completion of the deferred pass.

Step A – Apply SSSSS

Step B – Composite With Deferred Output

SSR [Compute] [Not always]

RE Engine is utilizing (what i believe is) Hi-Z Screen-Space Cone-Traced Reflections, which is quite an interesting technique. If you want to go beyond the sneak peek below, feel free to check Yasin Uludag’s paper here, or in the reading section at the end of this article.

SSR steps are done only in some areas, as if there are “SSR volumes” or per “Room/Map” settings that control that feature. So the rendering graph won’t always have this SSR series of dispatches. Those dispatches vary between regular Dispatches and Indirect Dispatches. The SSR pass goes through the following steps

1 – SSR Indirect Clear

Clear buffers. Nothing interesting, but it’s essential for a clean start.

2 – Small Diffuse Default

Using the GBuffer’s attachments (Color, Metallic, Normals, Roughness) in addition to the GI Specular from the Indirect Lighting phase, the Hi-Z output from earlier and the Diffuse Image from the Deferred pass. The shader will sometimes keep other rendertargets around, but they do nothing. But this is only “sometimes”. The output is a new version of the diffuse, the Filtered Diffuse (8 mips).
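For a feel of what a filtered image with 8 mips looks like, and why a cone-traced SSR would want one, here is a minimal sketch. Everything in it is an assumption for illustration: a 2x2 box filter (the game's filter is likely something smoother), power-of-two dimensions, and a cone footprint to mip mapping that is only the generic idea from cone tracing, not RE Engine's actual formula.

```python
import math


def downsample_2x2(image):
    """Halve an image by averaging each 2x2 block (even dimensions assumed)."""
    height, width = len(image), len(image[0])
    return [
        [(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
         for x in range(0, width, 2)]
        for y in range(0, height, 2)
    ]


def build_mip_chain(image, levels=8):
    """Return [mip0, mip1, ...] with up to `levels` entries (power-of-two input assumed)."""
    chain = [image]
    while len(chain) < levels and len(chain[-1]) > 1 and len(chain[-1][0]) > 1:
        chain.append(downsample_2x2(chain[-1]))
    return chain


def mip_for_footprint(footprint_pixels, levels=8):
    """Pick a mip whose texel size roughly matches the cone footprint (in pixels)."""
    if footprint_pixels <= 1.0:
        return 0
    return min(int(math.log2(footprint_pixels)), levels - 1)


# A constant 128x128 "diffuse" image gives exactly 8 mips: 128, 64, ..., 1.
diffuse = [[0.25] * 128 for _ in range(128)]
mips = build_mip_chain(diffuse)
sizes = [(len(mip[0]), len(mip)) for mip in mips]

# Sharper reflections (small footprint) read mip 0,
# rougher ones (wide footprint) read blurrier mips.
sharp_mip = mip_for_footprint(1.0)
rough_mip = mip_for_footprint(4.0)
```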