Diagnosing a level's performance problems and tuning it to run more smoothly
Is the game running very slowly when you try to play your level? Are your players or playtesters complaining about lag, low framerate, or long load times? If so, then maybe it's time to optimize your level to try to make it run more efficiently.
Optimization is the process of diagnosing problems and streamlining performance across a game or level. It is often very complicated and specific, and is far out of the scope of this level design book. Much like mapping, the only real way to get better at optimizing is to optimize a lot of games, and learn from your experience.
When working on a team, many of these performance considerations will stretch far beyond the level designer's responsibilities.
For example, if pathfinding for two NPCs brings the game down to 5 FPS, then that optimization is the engine / AI programmer's job. However, if the level designer is attempting to create a massive battle with dozens of NPCs, and the game was never supposed to support that type of encounter, then the level designer is probably more at fault.
Large commercial studios hire tech artists and tech level designers, whose primary responsibilities may involve optimizing art assets and levels. But on a smaller dev team, optimization is basically everyone's responsibility.
"Not too early, and not too late."
Optimizing too early is "premature optimization" and it is a waste of time because your project will change drastically. Why optimize something the player may never even see?
But how will you know if something will be in the final version? Probably only when it works well... and sometimes you can make that judgment call only after you try to optimize it.
Ideally, optimization is something you plan from the beginning. As early as pre-production, you should be able to answer these questions about your project's technical scope:
- What is the target platform? For example, making an open world game for a phone is much more complicated than for desktop gaming rigs.
- How big is the level? A large open world city or RPG continent will need to be built in smaller chunks that are carefully loaded / unloaded during the game, and the blockout should be planned accordingly.
- How dense is the level? A realistic outdoor world might require a lot of expensive overlapping details, while a stylized handpainted world might be more sparse and simplified.
- How open is the level? A narrow indoor level with frequent gates can make better use of occlusion culling, while a big open outdoor landscape is difficult to divide into segments.
- What is the camera system? For example, a first person camera can see long sightlines, so dense levels will likely need some kind of occlusion culling. A third person fixed camera angle means you have strong control with frustum culling, and if a particular camera angle is laggy, then you can simply reposition the camera or turn the camera away.
Video games run on computers with limited resources. Laggy low framerates happen when your project is overusing its budget for one or more of these resource types:
- Games are made of files that are loaded into memory. When you start the game, it moves core files into a faster type of short term memory called RAM. If there's too many files to load into RAM, then load times will be very long.
- If your level uses lots of different detailed models, textures, sounds, or animations, then it might be overrunning its memory budget.
- The CPU is the main part of your computer that runs the file system, controls, game logic, AI, and audio. If you have too many NPCs who are navigating a big complicated city, then maybe the CPU has to work too hard to simulate all their AI. Or imagine you had thousands of insects in the virtual city, and you were constantly creating / deleting insects every second.
- If your level features many different simulated objects, then it might be CPU-bound.
- The GPU (Graphics Processing Unit) is responsible for rendering an image on the screen. It must gather all the visible 2D sprites and 3D models and process them with a special GPU program called a shader. When there's too many objects to draw or when the shader is very complicated, then the GPU takes longer to render the camera view.
- If your level features visually dense objects, or even thousands of simple objects, then it might be GPU-bound.
When talking about budgets, game developers extend this financial metaphor:
- Expensive = consumes a lot of memory or processing time
- example: "Wow, that shader makes our game run at 10 FPS, it's so expensive!"
- Cheap = uses very little memory or processing time
- example: "When the player is 10 meters away, we swap to a cheap water shader."
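To make the budget metaphor concrete, here is a minimal sketch of the arithmetic behind a frame time budget. The system names and millisecond costs are made-up numbers for illustration only, and the sketch assumes these CPU-side costs run one after another, which is a simplification.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available to produce one frame at the target framerate."""
    return 1000.0 / target_fps


# Hypothetical CPU-side costs per frame, in milliseconds (illustrative values).
costs_ms = {"game logic": 6.0, "AI pathfinding": 9.0, "audio": 3.0}

budget = frame_budget_ms(60)          # about 16.7 ms per frame at 60 FPS
total = sum(costs_ms.values())        # total time spent by these systems
over_budget = total > budget          # True means the frame takes too long
most_expensive = max(costs_ms, key=costs_ms.get)

print(f"budget: {budget:.1f} ms, spent: {total:.1f} ms, over budget: {over_budget}")
print(f"most expensive system: {most_expensive}")
```

By the same arithmetic, a game running at 10 FPS is spending 100 ms on every frame, which is why a single "expensive" shader or system can dominate the whole budget.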
A game can run slowly for any number of reasons: memory, CPU, GPU, etc. To figure out what part of a game is responsible for the slowdown, we must measure what the game engine is doing -- this is called profiling.
Never try to optimize on faith or a hunch, because you could be wrong about what's making the game run slowly. Profiling should be the first step of any optimization pass! Fortunately, modern game engines feature a robust suite of easy-to-use profiling tools.
- if you're CPU bound, use the Profiler and look for which code functions are running very often or seem to be time-consuming; if you're not a programmer, you should talk to one
- if you're GPU bound, open the GPU Profiler or even the Frame Debugger, and try to figure out if too much is rendered at the wrong time or place, or if culling / batching / instancing is breaking somewhere
- in Unreal, go to Project Settings → General Settings → Framerate and disable Smooth Framerate, which snaps framerate to certain increments and will mislead you
- while playtesting, in the console type stat unitgraph to show basic CPU and GPU times
- if you're CPU bound, you may need to use Unreal Insights, maybe you have too many actors or some of your Blueprint calls are expensive
- if you're GPU bound, in the console type stat scenerendering and see where your draw calls and framerate are going
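Once a profiler reports per-frame CPU and GPU times, the first triage step is to see which one is the slowest. The sketch below is a rough illustration of that reasoning, not a feature of any engine: the function name, the three timing categories, and the sample numbers are all assumptions.

```python
def find_bottleneck(game_ms, render_ms, gpu_ms, target_fps=60.0):
    """Return the slowest part of the frame, or None if everything fits the budget.

    game_ms:   CPU time spent on gameplay, AI, scripting (hypothetical reading)
    render_ms: CPU time spent preparing rendering commands (hypothetical reading)
    gpu_ms:    GPU time spent drawing the frame (hypothetical reading)
    """
    budget_ms = 1000.0 / target_fps
    timings = {"game thread": game_ms, "render thread": render_ms, "GPU": gpu_ms}
    slowest = max(timings, key=timings.get)
    if timings[slowest] <= budget_ms:
        return None  # nothing exceeds the frame budget
    return slowest


# Illustrative readings, not real measurements.
print(find_bottleneck(5.0, 6.0, 25.0))   # GPU takes longest: likely GPU-bound
print(find_bottleneck(30.0, 6.0, 10.0))  # game thread takes longest: likely CPU-bound
```

Because these parts of the frame mostly run in parallel, the slowest one tends to set the framerate, so optimizing anything else first is unlikely to help.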
(TODO: Unreal image)
Unless there's a lot of complicated gameplay systems or scripting happening, most game levels are usually GPU-bound: there's too much work for the graphics card to perform.
When profiling GPU performance, the main stat to watch is the number of draw calls. A draw call is a rendering command sent to the GPU. More draw calls is usually worse, so you want this number to be smaller.
Approximate draw call budget
Old mobile phones, Mobile VR (Oculus Quest)
Modern mobile phones
Low spec desktop, VR desktop
PS4 / XB1 / Switch
Mid spec desktop
PS5 / Xbox Series X
High spec gamer desktop
You need to optimize now
To reduce draw calls, engines implement some form of (a) batching or (b) instancing.
Batching is when the CPU combines multiple draw calls into a single draw call.
Imagine a level with 10 different trees; we can combine them all into one mega tree, and send it to the GPU as one draw call. This method usually requires some pre-calculation and mindful construction, but it works across multiple different meshes and objects.
Instancing is when the GPU repeats the same draw call multiple times.
Imagine a level with 10 identical trees; we can tell the GPU to repeat the same rendering work for each tree. This method requires little level design work or prep, but it only works for exact copies of the same mesh. Great for a modular workflow, which uses many copies of the same modular kit tiles.
In game dev, "instance" also means "copy" or "clone" of any game object or data. Don't confuse this general usage of instance vs. GPU instancing.
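As a rough way to compare the two approaches, the sketch below counts draw calls for a small hypothetical scene. It assumes an idealized engine where batching merges everything that shares a material and instancing merges everything that shares both mesh and material. Real engines add limits (for example on batch size or on moving objects), so treat the numbers as a best case.

```python
# Each object is a (mesh, material) pair. All names are hypothetical.
objects = (
    [("oak_tree", "bark")] * 6
    + [("pine_tree", "bark")] * 3
    + [("boulder", "stone")] * 1
)


def naive_draw_calls(objs):
    """No optimization: one draw call per object."""
    return len(objs)


def batched_draw_calls(objs):
    """Idealized batching: different meshes merge if they share a material."""
    return len({material for _mesh, material in objs})


def instanced_draw_calls(objs):
    """Idealized instancing: only exact copies (same mesh and material) merge."""
    return len(set(objs))


print("naive:", naive_draw_calls(objects))
print("batched:", batched_draw_calls(objects))
print("instanced:", instanced_draw_calls(objects))
```

In this toy scene, batching wins on raw draw call count because the oak and pine trees share a material, while instancing needs no pre-combined meshes and still collapses the repeated copies.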
lights, turn off shadows... why shadows are so expensive / extra draw calls
Batching shared materials: a texture atlas is a large composite texture made of many smaller textures, usually arranged in a square grid. Much like a 2D spritesheet made of many parts, "packing" many textures into a single larger texture is a useful optimization to reduce draw calls and efficiently use memory. The drawback is that the resulting assets are less flexible to use (UVs correspond to a small area, textures cannot be tiled easily) or might even degrade performance if this manual optimization
distance culling, frustum culling, LODs... but if you have lots of objects, overhead of LOD isn't worth it (Naughty Dog)
vis portals, manual culling
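The culling and LOD ideas in these notes can be sketched in a few lines. This is a simplified 2D, top-down illustration with hypothetical function names and distance thresholds. Real engines test 3D bounding volumes against the camera frustum and handle LOD switching internally.

```python
import math


def is_visible(camera_pos, camera_yaw_deg, fov_deg, max_distance, obj_pos):
    """Top-down visibility test combining distance culling and frustum culling."""
    dx = obj_pos[0] - camera_pos[0]
    dy = obj_pos[1] - camera_pos[1]
    distance = math.hypot(dx, dy)
    if distance > max_distance:
        return False  # distance culling: too far away to bother drawing
    if distance == 0:
        return True  # object is at the camera position
    angle_to_obj = math.degrees(math.atan2(dy, dx))
    # Smallest signed angle between the camera direction and the object.
    delta = (angle_to_obj - camera_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(delta) <= fov_deg / 2.0  # frustum culling: outside the view cone


def pick_lod(distance, lod_distances=(10.0, 30.0)):
    """Return 0 for the most detailed mesh, higher numbers for cheaper ones."""
    for level, limit in enumerate(lod_distances):
        if distance < limit:
            return level
    return len(lod_distances)


camera = (0.0, 0.0)  # camera looks along +X with a 90 degree field of view
print(is_visible(camera, 0.0, 90.0, 100.0, (10.0, 5.0)))
print(is_visible(camera, 0.0, 90.0, 100.0, (-10.0, 0.0)))
print(pick_lod(5.0), pick_lod(20.0), pick_lod(50.0))
```

Note that every one of these checks costs some CPU time per object, which is one reason the per-object overhead of LODs may not pay off in scenes with very many objects.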
- opaque shaders are almost always better than alpha transparency... modeling out a complex shape is better than alpha mask transparency, the extra polycount is negligible for the draw call or batching anyway
- what is alpha test + on most hardware, alpha test is much faster than alpha blend... smooth jagged edges with alpha to coverage
- on some mobile devices, alpha blend is faster than alpha test
- for foliage, fit geo to the opaque alpha + merge multiple meshes + minimize overlap + use opaque core + avoid using generic object data structure (don't hand place each grass clump)
spawn less AI
TODO: particles and transparency
Forward rendering means the graphics card (GPU) processes each object in a straightforward manner, rendering and lighting every 3D object separately. Here you should usually avoid overlapping lights, and use as few light sources as possible.
Deferred renderers delay lighting calculations until after it collects all the objects together, and then lights all visible pixels at once. Here you can use many overlapping lights and sources, but transparent objects might look bad.
Engines often use both techniques at the same time, or mix and match different parts.
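To see why overlapping lights hurt forward rendering more than deferred rendering, here is a deliberately crude cost model. It only counts abstract "work items" and ignores pixel counts, memory bandwidth, and transparency, so it is an assumption-laden sketch rather than a description of how any particular engine behaves.

```python
def forward_lighting_work(lights_touching_each_object):
    """Forward: every object is shaded once for each light that reaches it."""
    return sum(lights_touching_each_object)


def deferred_lighting_work(object_count, light_count):
    """Deferred: draw every object once, then apply each light once."""
    return object_count + light_count


# Hypothetical scene: 100 objects, each reached by all 8 overlapping lights.
objects_in_scene = 100
lights_in_scene = 8
forward = forward_lighting_work([lights_in_scene] * objects_in_scene)
deferred = deferred_lighting_work(objects_in_scene, lights_in_scene)

print("forward work items:", forward)
print("deferred work items:", deferred)
```

In this model, forward work grows with objects multiplied by lights, while deferred work grows with objects plus lights. With only one light the two are nearly equal, which matches the advice to keep light counts low in a forward renderer.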
(image: forward vs deferred)
dev view of spherical harmonic (SH) light probes for "New York City" in Overwatch 2, image by Bruce Wilkie (via PlayOverwatch.com)
Optimizing games is a complex process that involves every discipline and developer. It is far outside of the scope of this book. Nevertheless, we've tried to give a basic idea of optimizing 3D levels in common engines like Unity and Unreal:
- 1. look at performance stats / profilers to see what's slow, DO NOT JUST GUESS
- 2. if it's CPU bound, maybe there's too many active objects or NPCs, or the level scripting is misbehaving, or the level is just too big
  - reduce the number of active objects or NPCs; redesign encounters
  - debug your level scripting; ask others for help
  - this also might be a general game code problem, and nothing to do with you
- 3. if it's GPU bound, maybe there's too many lights or env art objects, or the materials / shaders are too complicated, or the level is just too big
  - reduce lights, use simpler lighting modes, turn off shadow casting
  - combine multiple objects with the same material using batching
  - combine multiple objects with the same material and mesh using GPU instancing
  - consider using some sort of occlusion culling / visibility system
  - simplify materials, use fewer texture samplers and rendering commands
  - this also might be a general game code problem, and nothing to do with you
- "Optimizing and Profiling Games with Unreal Engine 4" (2016) by Vincent Loignon is an old article but still a useful intro to the general optimization process.
- "UE4 - Overview of Static Mesh Optimization Options" (2017) by Bob Cober is also a bit old but very useful for environment artists and level designers in particular.
- "Unreal Art Optimization: Measuring Performance" is part of a work-in-progress tech artist book specifically for Unreal devs.
- "UE4 Graphics Profiling: Measuring Performance" (YouTube) by Tech Art Aid is maybe the most useful intro video tutorial about profiling in Unreal.
- GDC 2019: "Porting Your Title to Oculus Quest" (YouTube) by Alex Silkin seems like a VR dev talk, but actually it's more like an Unreal performance optimization talk. Silkin discusses specific workflows and bottlenecks applicable for any Unreal game.