The dataset file format
How a dataset .csv is laid out: four reserved header rows, then the data.
How the file is structured
A dataset is a plain .csv file: a grid of comma-separated values. Each column is a single variable — a design input you control, or a result a simulation produced — and each row beneath the headers is one candidate building.
The first four rows are reserved: they describe each column rather than holding data. The actual data begins on row 5 and continues to the end of the file.
The four reserved rows must appear in this exact order, and every column must supply a value for all four. Think of them as the column's declaration: its name, whether it is an input or an output, its data type, and how its values should be interpreted.
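The layout above can be sketched in a few lines of Python. This is only an illustration, not the toolkit's own loader: the `Column` class and the `split_dataset` function are hypothetical names, and the sketch assumes the file has already been read into a string.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Column:
    # One column's declaration, taken from the four reserved rows.
    name: str
    usage: str
    type: str
    unit: str


def split_dataset(text):
    """Split CSV text into column declarations (rows 1-4) and data rows (row 5 onward)."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 4:
        raise ValueError("the four reserved header rows are missing")
    headers, data = rows[:4], rows[4:]
    width = len(headers[0])
    for row in headers:
        # Every column must supply a value in each of the four header rows.
        if len(row) != width or any(not cell.strip() for cell in row):
            raise ValueError("every column must supply all four header values")
    # Transpose the header rows so each tuple is (name, usage, type, unit).
    columns = [Column(*decl) for decl in zip(*headers)]
    return columns, data
```

Calling `split_dataset` on a file's contents returns one `Column` per column, in file order, together with the untouched data rows.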
The four reserved header rows
| Row | Header | Allowed values | Purpose |
|---|---|---|---|
| 1 | Name | Short descriptive text | The display name of the column. |
| 2 | Usage | in or out | Whether the column is an input or an output. |
| 3 | Type | NUMBER or STRING | The underlying data type of the column's values. |
| 4 | Unit | Depends on Type (see below) | How the column's values are interpreted and presented. |
- Row 1 (Name): A short, descriptive name for the column. It should not consist of numbers only, and most non-punctuation symbols are not allowed. Each column must have a name, and a clear one pays off later: the name is what you will see on every axis and filter in the viewer.
- Row 2 (Usage): Either `in` or `out` (case-sensitive). Use `in` for design inputs you control and `out` for the results produced by a simulation. The distinction lets the toolkit treat the levers you set apart from the outcomes they drive.
- Row 3 (Type): Either `NUMBER` or `STRING` (case-sensitive). This decides whether the column's values are read as numbers or as text, and it constrains which units row 4 may declare.
- Row 4 (Unit): The Unit value depends on the Type declared in row 3, and it must follow the rules for a URL slug. See "Choosing a unit" below for the full breakdown.
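As a rough illustration of the first three rows, the checks could look like the sketch below. The function name `check_declaration` is hypothetical, a digits-only test stands in for the "numbers only" rule, and the symbol restrictions on names are not checked because their exact list is not given here.

```python
ALLOWED_USAGE = {"in", "out"}          # row 2, case-sensitive
ALLOWED_TYPE = {"NUMBER", "STRING"}    # row 3, case-sensitive


def check_declaration(name, usage, type_):
    """Return a list of problems found in a column's Name, Usage, and Type."""
    problems = []
    stripped = name.strip()
    if not stripped:
        problems.append("name is missing")
    elif stripped.isdigit():
        # Assumption: "numbers only" is read as "digits only".
        problems.append("name must not consist of numbers only")
    # Set membership is an exact match, so "In" or "number" are rejected.
    if usage not in ALLOWED_USAGE:
        problems.append(f"usage must be 'in' or 'out', got {usage!r}")
    if type_ not in ALLOWED_TYPE:
        problems.append(f"type must be 'NUMBER' or 'STRING', got {type_!r}")
    return problems
```

An empty list means the three values passed these checks; the Unit in row 4 needs the Type to be valid first, so it is best checked afterwards.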
Choosing a unit
The unit in row 4 tells the toolkit how to read and present a column's values. Which units are allowed depends entirely on the Type declared in row 3.
Number units
When the column is a NUMBER, the Unit must be the slug of a known unit in the database. Available number units include: fraction, fraction-low-p, fraction-medium-p, fraction-high-p, fraction-max-p, fraction-integer, ratio, u-value, r-value, inch-f, integer, iteration, eci, ghgi, kwh, kg-co2, degrees-c.
String units
When the column is a STRING, the Unit must be exactly one of three reserved values. These are reserved for STRING columns and cannot be used on a NUMBER column:
- `enum`: a small, fixed set of labels (for example a quality rating).
- `graph`: a profile encoded as breakpoints, such as a daily heating schedule.
- `image`: a reference to a stored asset.
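The dependency between Type and Unit can be sketched as a simple lookup. The set of number units below copies the list given above, which may not be exhaustive since the real list lives in the database; the name `unit_matches_type` is hypothetical.

```python
# Assumption: this mirrors the number units listed on this page; the
# database may know others.
NUMBER_UNITS = {
    "fraction", "fraction-low-p", "fraction-medium-p", "fraction-high-p",
    "fraction-max-p", "fraction-integer", "ratio", "u-value", "r-value",
    "inch-f", "integer", "iteration", "eci", "ghgi", "kwh", "kg-co2",
    "degrees-c",
}
# The three values reserved for STRING columns.
STRING_UNITS = {"enum", "graph", "image"}


def unit_matches_type(type_, unit):
    """Return True if the unit is allowed for the declared type."""
    if type_ == "NUMBER":
        return unit in NUMBER_UNITS
    if type_ == "STRING":
        return unit in STRING_UNITS
    return False  # unknown type, so no unit can be valid
```

Because the two sets do not overlap, a reserved string unit such as `enum` is automatically rejected on a `NUMBER` column.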
Slug rules
Every unit value is a slug — the same lower-case, URL-safe form used elsewhere in the toolkit. A slug must satisfy all of the following:
- Only lower-case letters (`a-z`), digits (`0-9`), and the hyphen (`-`).
- A hyphen may not be the first or last character (`e-x` is fine; `-ex` and `ex-` are not).
- Hyphens may not be directly adjacent (`ex-a-b` is fine; `ex--ab` is not).
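The three slug rules fit into one regular expression: one or more runs of lower-case letters or digits, joined by single hyphens. The sketch below is one way to express that; `is_slug` is a hypothetical helper name, not part of the toolkit.

```python
import re

# Runs of [a-z0-9] separated by single hyphens: this rules out a leading
# hyphen, a trailing hyphen, and two hyphens in a row.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_slug(value):
    """Return True if value satisfies all three slug rules."""
    return SLUG_PATTERN.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` ensures the whole value is tested, so a valid prefix followed by stray characters does not pass.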
Example
A valid file with four columns. The first four rows are the reserved headers; the data starts on row 5.
| # | Col 1 | Col 2 | Col 3 | Col 4 |
|---|---|---|---|---|
| 1 | HP Fraction | Window Quality | TEDI | Heating Profile |
| 2 | in | in | out | out |
| 3 | NUMBER | STRING | NUMBER | STRING |
| 4 | fraction | enum | eci | graph |
| 5 | 0.25 | Low | 42.1 | `[0\|12\|28]` `[290\|269\|248]` |
| 6 | 0.50 | High | 31.7 | `[0\|14\|30]` `[285\|260\|240]` |
Reading the columns left to right: a numeric heat-pump fraction, a string quality rating drawn from a fixed set, a numeric energy result, and a heating profile that uses the `graph` unit to encode a daily profile. Two of them are inputs (`in`) and two are outputs (`out`).
Rules for the data rows
From row 5 onward, every value must satisfy the column it belongs to:
- No value may be empty.
- A value in a `NUMBER` column must be convertible to a number.
- A value in a `STRING` column must not be convertible to a number.
- An `enum` column may only contain a limited number of distinct values.
- Each column should contain more than one distinct value: a column where every value is identical carries no information to analyze.
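The rules above could be checked per column along the following lines. This is a sketch under stated assumptions: Python's `float()` stands in for "convertible to a number", the `max_enum_values` default of 10 is a placeholder because the actual enum limit is not stated here, and `check_column` is a hypothetical name.

```python
def is_number(value):
    """Assumption: a value counts as a number if float() accepts it."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def check_column(type_, unit, values, max_enum_values=10):
    """Return a list of problems for one column's data values (strings)."""
    problems = []
    if any(not v.strip() for v in values):
        problems.append("column contains an empty value")
    filled = [v for v in values if v.strip()]
    if type_ == "NUMBER" and not all(is_number(v) for v in filled):
        problems.append("NUMBER column contains a non-numeric value")
    if type_ == "STRING" and any(is_number(v) for v in filled):
        problems.append("STRING column contains a numeric value")
    distinct = set(filled)
    # max_enum_values is a placeholder; the real limit is not documented here.
    if unit == "enum" and len(distinct) > max_enum_values:
        problems.append("enum column has too many distinct values")
    if len(distinct) < 2:
        problems.append("column has fewer than two distinct values")
    return problems
```

The last check reflects a recommendation rather than a hard rule, so a caller might prefer to report it as a warning.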
File size and shape
The file as a whole also has size and shape limits:
- It must meet a minimum number of data rows.
- It must meet a minimum number of columns, and stay under the maximum number of columns.
- It must be no larger than 10 MB on disk.
These keep a dataset large enough to be worth analyzing and small enough to stay responsive in the viewer.
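A pre-upload check of these limits might look like the sketch below. Only the 10 MB ceiling comes from this page; the row and column limits are passed in as parameters because their values are not given here. Reading 10 MB as 10 × 1024 × 1024 bytes and treating the column maximum as inclusive are both assumptions, and `check_shape` is a hypothetical name.

```python
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024


def check_shape(path, n_data_rows, n_columns, min_rows, min_columns, max_columns):
    """Return a list of size and shape problems for a dataset file."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 10 MB")
    if n_data_rows < min_rows:
        problems.append(f"needs at least {min_rows} data rows")
    if n_columns < min_columns:
        problems.append(f"needs at least {min_columns} columns")
    elif n_columns > max_columns:
        # Assumption: the maximum itself is still allowed.
        problems.append(f"allows at most {max_columns} columns")
    return problems
```

The row and column counts would come from parsing the file first, with the four reserved header rows excluded from the data row count.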