a good class - part 3
note: this is part 3 in a series of posts about oop. part 1, part 2.
ok, so the first step in data-oriented design is this one:
from array import array
with this, we will use tightly packed c structures to store our data.
next is this array cheatsheet:
| Code | Range |
|---|---|
| b | [-127, 127] |
| B | [0, 255] |
| H | [0, 65535] |
| L | [0, 4 BILLIONS] |
| Q | you don’t even |
some things worth noting here:
- lowercase letter for signed, uppercase for unsigned (so signed short is ‘h’).
- ‘b’ takes the same space as ‘B’ (1 byte).
- i like to use mostly uppercase.
that’s because one of the strengths of dod is that you can store the index of an entity as a reference to it. and i don’t want negative indexes ever.
some absences worth noting:
- ‘i’ for int has platform-dependent size, which kinda breaks dod’s predictability, so i don’t use them.
- floating points calculations are ‘f’ and ‘d’. try not to use them if you can (if you do physics, you might have to, else just scale and normalize everything).
- python also has ‘u’ and ‘w’ for text characters, but try not to use strings if a human isn’t gonna read it.
now, some things you should really note:
- ‘B’ gives you 256 values. do you realize how big that is? you can probably encode all types of your game monsters in one byte.
- no, really, do you see how insane this is?
- with two bytes, you raise to 65535. you’re mostly covered.
- okok, ‘Q’? [0, 18446744073709551615]. 18 billion billion, motherfucker. eight bytes.
- the cpu cache line is 64 bytes. this means an array of 16 longs can be brought to the cpu in one trip.
so, for example, if you have 10 players in a game, you can encode every one of their properties on 4 billion values, and treat each transformation in one cpu roundtrip.
if you think about it, it’s insane.
ok, now let’s see how i tried to do dod on a real-worl example.
this is from everybody codes 2025 day 10 so go check it out.
i’ll just show the commented code:
from array import array
data = '''...SSS.......
.S......S.SS.
..S....S...S.
..........SS.
..SSSS...S...
.....SS..S..S
SS....D.S....
S.S..S..S....
....S.......S
.SSS..SS.....
.........S...
.......S....S
SS.....S..S..'''
data = data.splitlines() # formatting the data
h, w = len(data), len(data[0])
deepl = {'.': 0, 'D': 1, 'S': 2} # changing chars to int values
# prepare encoding, one array per property
grid = {
'x': array('B', [0] * h * w),
'y': array('B', [0] * h * w),
'cell': array('B', [0] * h * w),
'moves': array('H'), # we need more than 255 for the moves (21*21) so 65535 it is!
'count': h * w # so we don't have to call len() each time
}
data.reverse()
# actual encoding
for i, row in enumerate(data):
for j, cell in enumerate(row):
grid['x'][w*i + j] = j
grid['y'][w*i + j] = i
grid['cell'][w*i + j] = deepl[cell]
if cell == 'D':
grid['moves'].append(w*i + j)
knight_moves = {
# small b cause its signed
'x': array("b", [-2, -2, -1, -1, 1, 1, 2, 2]),
'y': array("b", [1, -1, 2, -2, 2, -2, 1, -1]),
'count': 8
}
dragon_moves = 4
for _ in range(dragon_moves):
# we'll be modifying the array during the loop but we don't even care lololol
start_len = len(grid['moves'])
for i in range(start_len):
index = grid['moves'][i] # the index **is** the entity
x = grid['x'][index]
y = grid['y'][index]
for knight_index in range(knight_moves['count']):
dx = knight_moves['x'][knight_index]
dy = knight_moves['y'][knight_index]
nx, ny = x + dx, y + dy
if 0 <= nx < w and 0 <= ny < h:
test_index = w * ny + nx
if test_index not in grid['moves']:
grid['moves'].append(test_index)
eaten = 0
# i did some tests
# sorting is not that costly
# but can greatly improve performance
# maybe the cpu optimization prefers to read arrays in order
grid['moves'] = sorted(grid['moves'])
for index in grid['moves']:
if grid['cell'][index] == 2:
eaten += 1
print(eaten)
so the key point here is that the index is the entity. this means you can store indexes in an array as a very effective way to reference entity.
in short, you need to become an array master.
some more tips that were revealed to me by prophecy:
- never use booleans, use existence-based arrays instead
- (basically, don’t use an “alive” boolean but make two arrays alive_entities and dead_entities)
- learn how minecraft works
- use generational indexes
- use sparse sets
- (which are swapback with generational indexes)
- learn how minecraft works
- track and optimize your sparse sets
- read the first three chapters of existential programming
- learn how age of empire works
- advanced: learn how to do pathfinding dod-style
- learn b trees and quad trees
- advanced: learn the disruptor pattern
- (which is an ephemeral buffer with a fixed size)
- sparse set + disruptor = very fast
- games are all about reusing buffers
- relations are not a problem with generational indexes
- learn how minecraft works
- advanced: learn profile guided optimization
i should report back on my findings, but you do the homework too.
finally, do remind that everything is not a nail to hammer dod on.
for example, in the above code,
i should have use grid['moves] = set()
instead of an O(n) check every time
(but i’m sure you noticed).
another example is if you have billions of entities (like particles), maybe use seeded shaders over time.
optimize for the right part of the computer.
now that we saw arrays, what if we want to persist the data?
how do you do proper dod with sqlite?
should you use orms?
head on to part 4.