Optimization

Diagnosing a level's performance problems and tuning it to run more smoothly

Is the game running very slowly when you try to play your level? Are your players or playtesters complaining about lag, low framerate, or long load times? If so, then maybe it's time to optimize your level to try to make it run more efficiently.

What is optimization?

Optimization is the process of diagnosing problems and streamlining performance across a game or level. It is often very complicated and specific, and is far outside the scope of this level design book. Much like mapping, the only real way to get better at optimizing is to optimize a lot of games, and learn from your experience.

Whose job is it?

When working on a team, many of these performance considerations will stretch far beyond the level designer's responsibilities.

For example, if pathfinding for just two NPCs brings the game down to 5 FPS, then that optimization is the engine / AI programmer's job. However, if the level designer is attempting to create a massive battle with dozens of NPCs, and the game was never supposed to support that type of encounter, then the level designer is probably more at fault.

Large commercial studios hire tech artists and tech level designers, whose primary responsibilities may involve optimizing art assets and levels. But on a smaller dev team, optimization is basically everyone's responsibility.

When to optimize

"Not too early, and not too late."

Optimizing too early is "premature optimization" and it is a waste of time because your project will change drastically. Why optimize something the player may never even see?

But how will you know if something will be in the final version? Probably only when it works well... and sometimes you can make that judgment call only after you try to optimize it.

Planning for optimization

Ideally, optimization is something you plan from the beginning. As early as pre-production, you should be able to answer these questions about your project's technical scope:

  • What is the target platform? For example, making an open world game for a phone is much more complicated than for desktop gaming rigs.

  • How big is the level? A large open world city or RPG continent will need to be built in smaller chunks that are carefully loaded / unloaded during the game, and you should plan your blockout accordingly.

  • How dense is the level? A realistic outdoor world might require a lot of expensive overlapping details, while a stylized handpainted world might be more sparse and simplified.

  • How open is the level? A narrow indoor level with frequent gates can make better use of occlusion culling, while a big open outdoor landscape is difficult to divide into segments.

  • What is the camera system? For example, a first person camera can see long sightlines, so dense levels will likely need some kind of occlusion culling. A third person fixed camera angle means you have strong control with frustum culling, and if a particular camera angle is laggy, then you can simply reposition the camera or turn the camera away.
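As a rough illustration of the "smaller chunks that are loaded / unloaded" idea, here is a minimal sketch in Python. It assumes the world is split into a square grid of chunks; the function names, the chunk size, and the load radius are all hypothetical, and real engines use their own streaming systems.

```python
def chunks_to_load(player_x, player_z, chunk_size, radius):
    """Return the grid coordinates of every chunk within `radius` chunks of the player."""
    # Floor division maps a world position to the chunk that contains it.
    center_x = int(player_x // chunk_size)
    center_z = int(player_z // chunk_size)
    return {
        (center_x + dx, center_z + dz)
        for dx in range(-radius, radius + 1)
        for dz in range(-radius, radius + 1)
    }


def update_streaming(loaded, wanted):
    """Compare what is loaded with what we want; return (to_load, to_unload)."""
    return wanted - loaded, loaded - wanted


# Hypothetical numbers: 64 meter chunks, keep a 3x3 block around the player.
loaded = chunks_to_load(10, 10, chunk_size=64, radius=1)
wanted = chunks_to_load(130, 10, chunk_size=64, radius=1)
to_load, to_unload = update_streaming(loaded, wanted)
print(sorted(to_load))
print(sorted(to_unload))
```

A smaller radius keeps fewer chunks in memory, but the player is more likely to see chunks pop in, so the right values depend on the target platform and sightlines.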

Performance budget

Video games run on computers with limited resources. Laggy low framerates happen when your project is overusing its budget for one or more of these resource types:

  • Games are made of files that are loaded into memory. When you start the game, it moves core files into a faster type of short-term memory called RAM. If there are too many files to load into RAM, then load times will be very long.

    • If your level uses lots of different detailed models, textures, sounds, or animations, then it might be overrunning its memory budget.

  • The CPU is the main part of your computer that runs the file system, controls, game logic, AI, and audio. If you have too many NPCs who are navigating a big complicated city, then maybe the CPU has to work too hard to simulate all their AI. Or imagine you had thousands of insects in the virtual city, and you were constantly creating / deleting insects every second.

    • If your level features many different simulated objects, then it might be CPU-bound.

  • The GPU (Graphics Processing Unit) is responsible for rendering an image on the screen. It must gather all the visible 2D sprites and 3D models and process them with a special GPU program called a shader. When there are too many objects to draw, or when the shader is very complicated, the GPU takes longer to render the camera view.

    • If your level features visually dense objects, or even thousands of simple objects, then it might be GPU-bound.

When talking about budgets, game developers extend this financial metaphor:

  • Expensive = consumes a lot of memory or processing time

    • example: "Wow, that shader makes our game run at 10 FPS, it's so expensive!"

  • Cheap = uses very little memory or processing time

    • example: "When the player is 10 meters away, we swap to a cheap water shader."
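One common way to put a number on the processing-time part of this budget is to convert a target framerate into milliseconds per frame. The sketch below is only arithmetic; the function names are made up for illustration.

```python
def frame_budget_ms(target_fps):
    """Milliseconds available to finish one frame at the target framerate."""
    return 1000.0 / target_fps


def over_budget_ms(frame_time_ms, target_fps):
    """How many milliseconds a measured frame exceeds the budget (0.0 if it fits)."""
    return max(0.0, frame_time_ms - frame_budget_ms(target_fps))


for fps in (30, 60, 90):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
```

By this measure, the "10 FPS" shader in the example above is spending about 100 ms on every frame, roughly six times the budget of a 60 FPS game.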

Profiling

A game can run slowly for any number of reasons: memory, CPU, GPU, etc. To figure out what part of a game is responsible for the slowdown, we must measure what the game engine is doing -- this is called profiling.

Never try to optimize on faith or a hunch, because you could be wrong about what's making the game run slowly. Profiling should be the first step of any optimization pass! Fortunately, modern game engines feature a robust suite of easy-to-use profiling tools.
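To make "measure, don't guess" concrete, here is a toy profiler sketched in Python. It is not how any engine's profiler is implemented; the class, the section names, and the fake workloads are all assumptions for illustration. The idea is simply to time each part of a frame and see which one dominates.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class FrameProfiler:
    """Accumulates wall-clock time spent in named sections."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Name of the section that has used the most time so far."""
        return max(self.totals, key=self.totals.get)


def fake_ai_update():
    time.sleep(0.02)  # stand-in for expensive pathfinding


def fake_audio_update():
    sum(range(100))  # stand-in for cheap work


profiler = FrameProfiler()
for _ in range(3):  # simulate three frames
    with profiler.section("ai"):
        fake_ai_update()
    with profiler.section("audio"):
        fake_audio_update()

print("slowest section:", profiler.slowest())
```

Real profilers report far more detail, but the workflow is the same: measure first, then optimize whatever the measurements point to.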

How to profile a level in Unity

  • playtest with the Stats panel in the Game tab for a basic direction for where to look

  • if you're CPU bound, use the Profiler and look for which code functions are running very often or seem to be time-consuming; if you're not a programmer, you should talk to one

  • if you're GPU bound, open the GPU Profiler or even the Frame Debugger, and try to figure out if too much is rendered at the wrong time or place, or if culling / batching / instancing is breaking somewhere

How to profile a level in Unreal

  • in Unreal, go to Project Settings → General Settings → Framerate and disable Smooth Framerate, which snaps framerate to certain increments and will mislead you

  • while playtesting, in the console type stat unitgraph to show basic CPU and GPU times

  • if you're CPU bound, you may need to use Unreal Insights; maybe you have too many actors, or some of your Blueprint calls are expensive

  • if you're GPU bound, in the console type stat scenerendering and see where your draw calls and framerate are going

(TODO: Unreal image)
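Both engines' stats views boil down to comparing a few timings per frame. As a minimal sketch, assuming you have read off separate millisecond timings for the game thread, the render thread, and the GPU (the function name and the sample numbers below are hypothetical), the largest one is usually where to look first:

```python
def find_bottleneck(game_ms, draw_ms, gpu_ms):
    """Return the label of the slowest of three per-frame timings."""
    timings = {
        "CPU (game thread)": game_ms,
        "CPU (render thread)": draw_ms,
        "GPU": gpu_ms,
    }
    return max(timings, key=timings.get)


# Hypothetical readings, in milliseconds per frame.
print(find_bottleneck(game_ms=8.0, draw_ms=6.0, gpu_ms=21.5))
```

This is only a first guess about where to dig; the detailed profilers listed above tell you why that part is slow.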

Optimizing graphics

Draw calls

Unless there's a lot of complicated gameplay systems or scripting happening, most game levels are usually GPU-bound: there's too much work for the graphics card to perform.

When profiling GPU performance, the main stat to watch is the draw call, a rendering command sent to the GPU. More draw calls is usually worse. You want this number smaller.

| Device class | Approximate draw call budget |
| --- | --- |
| Old mobile phones, Mobile VR (Oculus Quest) | 50-100 |
| Modern mobile phones | 100-400 |
| Low spec desktop, VR desktop | 500-1000 |
| PS4 / XB1 / Switch | ~1000 |
| Mid spec desktop | ~2000 |
| PS5 / Xbox Series X | ~3000 |
| High spec gamer desktop | ~5000 |
| You need to optimize now | ~10000+ |
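If you want to check a profiler reading against these numbers, a small lookup is enough. The sketch below copies the upper end of each range from the table above; the dictionary keys and function names are made up, and the budgets are rough guidance rather than hard limits.

```python
# Upper end of each approximate budget from the table above.
DRAW_CALL_BUDGETS = {
    "old mobile / mobile VR": 100,
    "modern mobile": 400,
    "low spec desktop / VR desktop": 1000,
    "PS4 / XB1 / Switch": 1000,
    "mid spec desktop": 2000,
    "PS5 / Xbox Series X": 3000,
    "high spec desktop": 5000,
}


def budget_usage(device_class, draw_calls):
    """Draw calls as a fraction of the approximate budget (1.0 means at budget)."""
    return draw_calls / DRAW_CALL_BUDGETS[device_class]


def is_over_budget(device_class, draw_calls):
    return draw_calls > DRAW_CALL_BUDGETS[device_class]


# Hypothetical reading: 600 draw calls on a modern phone.
print(budget_usage("modern mobile", 600))    # 1.5
print(is_over_budget("modern mobile", 600))  # True
```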

To reduce draw calls, engines implement some form of (a) batching or (b) instancing.

Batching is when the CPU combines multiple draw calls into a single draw call.

Imagine a level with 10 different trees; we can combine them all into one mega tree, and send it to the GPU as one draw call. This method usually requires some pre-calculation and mindful construction, but it works across multiple different meshes and objects.

see Unity: Static Batching, Unreal: Hierarchical Level Of Detail, Godot: Batching
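The "mega tree" idea can be sketched in a few lines of Python. This is a simplified illustration of combining meshes, not any engine's actual batching code: it assumes every mesh uses the same material, only applies a position offset (no rotation or scale), and the Mesh class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list   # list of (x, y, z) positions
    triangles: list  # list of (i, j, k) indices into `vertices`


def combine_meshes(placed_meshes):
    """Merge (mesh, (ox, oy, oz)) pairs into one mesh that needs one draw call."""
    vertices, triangles = [], []
    for mesh, (ox, oy, oz) in placed_meshes:
        base = len(vertices)  # index offset for this mesh's triangles
        # Bake each object's world position into the vertex data.
        vertices.extend((x + ox, y + oy, z + oz) for x, y, z in mesh.vertices)
        triangles.extend((a + base, b + base, c + base) for a, b, c in mesh.triangles)
    return Mesh(vertices, triangles)


tree_a = Mesh([(0, 0, 0), (1, 0, 0), (0, 2, 0)], [(0, 1, 2)])
tree_b = Mesh([(0, 0, 0), (2, 0, 0), (0, 3, 0)], [(0, 1, 2)])
forest = combine_meshes([(tree_a, (0, 0, 0)), (tree_b, (10, 0, 5))])
print(len(forest.vertices), forest.triangles)
```

Because positions are baked into the combined mesh, the individual trees can no longer move on their own, which is why this approach suits static level geometry.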

Instancing is when the GPU draws many copies of the same mesh from a single draw call.

Imagine a level with 10 identical trees; we can tell the GPU to repeat the same rendering work for each tree. This method requires little level design work or prep, but it only works for exact copies of the same mesh. Great for a modular workflow, which uses many copies of the same modular kit tiles.

see Unity: GPU Instancing, Unreal: Hierarchical Instanced Static Mesh
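A rough way to picture instancing is as a grouping step: every object that shares the same mesh and material collapses into one draw with a list of positions. The Python sketch below only counts and groups; the function name and the tuple layout are assumptions, and the actual GPU work is handled by the engine.

```python
from collections import defaultdict


def build_instanced_draws(objects):
    """Group (mesh, material, position) tuples into one draw per mesh + material pair."""
    groups = defaultdict(list)
    for mesh, material, position in objects:
        groups[(mesh, material)].append(position)
    return dict(groups)


# Ten identical trees and one rock: 11 objects, but only 2 instanced draws.
level = [("tree", "bark", (x * 4, 0, 0)) for x in range(10)]
level.append(("rock", "stone", (3, 0, 7)))

draws = build_instanced_draws(level)
print(len(level), "objects ->", len(draws), "draws")
```

This is why a modular kit pays off: the more often the same tile is reused with the same material, the fewer groups there are.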

In game dev, "instance" also means "copy" or "clone" of any game object or data. Don't confuse this general usage of instance vs. GPU instancing.

HOW TO OPTIMIZE DRAW CALLS

lights, turn off shadows... why shadows are so expensive / extra draw calls

Batching shared materials: a texture atlas is a large composite texture made of many smaller textures, usually arranged in a square grid. Much like a 2D spritesheet made of many parts, "packing" many textures into a single larger texture is a useful optimization to reduce draw calls and use memory efficiently. The drawback is that the resulting assets are less flexible to use (UVs correspond to a small area, textures cannot be tiled easily) or might even degrade performance if this manual optimization
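To show what "UVs correspond to a small area" means, here is a small Python sketch that remaps a texture's own 0-1 UV coordinates into its cell of a square atlas grid. The function name, the row-major tile numbering, and the choice of origin are assumptions; atlas tools and engines differ on these conventions.

```python
def atlas_uv(u, v, tile_index, tiles_per_row):
    """Map a UV inside one tile (0-1 range) to a UV inside the whole atlas."""
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    tile_size = 1.0 / tiles_per_row  # each tile covers this fraction of the atlas
    return ((col + u) * tile_size, (row + v) * tile_size)


# A 2x2 atlas: tile 3 is the last cell, so its center lands at (0.75, 0.75).
print(atlas_uv(0.5, 0.5, tile_index=3, tiles_per_row=2))
```

The sketch also hints at the tiling problem: a UV above 1.0, which would normally repeat the texture, would instead spill into a neighboring tile of the atlas.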

distance culling, frustum culling, LODs... but if you have lots of objects, overhead of LOD isn't worth it (Naughty Dog)
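Distance culling and LOD selection can both be sketched as one distance check. The Python below is a simplified illustration with hypothetical names and thresholds; engines typically base LOD on screen size rather than raw distance, and add smoothing so objects don't flicker between levels.

```python
import math


def select_lod(camera_pos, object_pos, lod_distances, cull_distance):
    """Return the LOD index to draw (0 = most detailed), or None if culled."""
    distance = math.dist(camera_pos, object_pos)
    if distance > cull_distance:
        return None  # too far away: skip drawing entirely
    for lod, max_distance in enumerate(lod_distances):
        if distance <= max_distance:
            return lod
    return len(lod_distances)  # beyond the last threshold: lowest detail


camera = (0, 0, 0)
for z in (5, 20, 50, 200):
    print(z, select_lod(camera, (0, 0, z), lod_distances=[10, 30], cull_distance=100))
```

Note that this check runs for every object, every frame, which is the overhead the note above refers to when there are very many objects.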

occlusion culling

vis portals, manual culling

shader optimization:

  • opaque shaders are almost always better than alpha transparency... modeling out a complex shape is better than alpha mask transparency, the extra polycount is negligible for the draw call or batching anyway

  • what is alpha test + on most hardware, alpha test is much faster than alpha blend... smooth jagged edges with alpha to coverage

  • on some mobile devices, alpha blend is faster than alpha test

  • for foliage, fit geo to the opaque alpha + merge multiple meshes + minimize overlap + use opaque core + avoid using generic object data structure (don't hand place each grass clump)

spawn less AI

Fill rate

TODO: particles and transparency

Lighting

Forward rendering means the graphics card (GPU) processes each object in a straightforward manner, rendering and lighting every 3D object separately. Here you should usually avoid overlapping lights, and use as few light sources as possible.

A deferred renderer delays lighting calculations until after it collects all the objects together, and then lights all visible pixels at once. Here you can use many overlapping lights and sources, but transparent objects might look bad.

Engines often use both techniques at the same time, or mix and match different parts.

(image: forward vs deferred)
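A toy cost model can show why overlapping lights hurt more in forward rendering. The Python sketch below assumes a classic multi-pass forward renderer (each object is drawn once per light that touches it) and a basic deferred renderer (each object is drawn once, then each light is applied once). Both are heavy simplifications with made-up function names; real engines mix techniques and the true cost also depends on pixels covered.

```python
def forward_passes(lights_per_object):
    """One pass per object per light touching it (at least one pass each)."""
    return sum(max(1, light_count) for light_count in lights_per_object)


def deferred_passes(object_count, light_count):
    """One geometry pass per object, plus one lighting pass per light."""
    return object_count + light_count


# Hypothetical scene: 100 objects, each overlapped by the same 4 lights.
print("forward: ", forward_passes([4] * 100))  # 400
print("deferred:", deferred_passes(100, 4))    # 104
```

With a single light the two counts are nearly the same, which matches the advice above: in forward rendering, fewer and less overlapping lights keep the cost down.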

To review...

Optimizing games is a complex process that involves every discipline and developer. It is far outside of the scope of this book. Nevertheless, we've tried to give a basic idea of optimizing 3D levels in common engines like Unity and Unreal:

  1. look at performance stats / profilers to see what's slow, DO NOT JUST GUESS

  2. if it's CPU bound, maybe there are too many active objects or NPCs, or the level scripting is misbehaving, or the level is just too big

    • reduce the number of active objects or NPCs; redesign encounters

    • debug your level scripting; ask others for help

    • this also might be a general game code problem, and nothing to do with you

  3. if it's GPU bound, maybe there are too many lights or env art objects, or the materials / shaders are too complicated, or the level is just too big

    • reduce lights, use simpler lighting modes, turn off shadow casting

    • combine multiple objects with the same material using batching

    • combine multiple objects with the same material and mesh using GPU instancing

    • consider using some sort of occlusion culling / visibility system

    • simplify materials, use fewer texture samplers and rendering commands

    • this also might be a general game code problem, and nothing to do with you

Further reading: Unity optimization

Further reading: Unreal optimization
