The dataset file format

How a dataset .csv is laid out: four reserved header rows, then the data.



How the file is structured

A dataset is a plain .csv file: a grid of comma-separated values. Each column is a single variable — a design input you control, or a result a simulation produced — and each row beneath the headers is one candidate building.

The first four rows are reserved: they describe each column rather than holding data. The actual data begins on row 5 and continues to the end of the file.

The four reserved rows must appear in this exact order, and every column must supply a value for all four. Think of them as the column's declaration: its name, whether it is an input or an output, its data type, and how its values should be interpreted.
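The layout above can be sketched in a few lines of Python. This is only an illustration, not the toolkit's own loader: the function name split_dataset and the dictionary keys are hypothetical, and the file is assumed to be ordinary comma-separated text that the standard csv module can read.

```python
import csv
import io

RESERVED_ROWS = 4  # name, usage, type, unit (in that order)


def split_dataset(text):
    """Split CSV text into per-column declarations and the data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) <= RESERVED_ROWS:
        raise ValueError("expected four header rows followed by data")

    names, usages, types, units = rows[:RESERVED_ROWS]
    if not (len(names) == len(usages) == len(types) == len(units)):
        raise ValueError("header rows have different lengths")

    columns = []
    for name, usage, type_, unit in zip(names, usages, types, units):
        # Every column must supply a value for all four reserved rows.
        if "" in (name.strip(), usage.strip(), type_.strip(), unit.strip()):
            raise ValueError("a column is missing one of its four header values")
        columns.append({"name": name, "usage": usage, "type": type_, "unit": unit})

    # Data begins on row 5, i.e. index 4.
    return columns, rows[RESERVED_ROWS:]
```

Calling split_dataset on the text of a file returns the column declarations and the data rows separately, so later checks can treat them independently.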


The four reserved header rows

Row  Header  Allowed values               Purpose
1    Name    Short descriptive text       The display name of the column.
2    Usage   in or out                    Whether the column is an input or an output.
3    Type    NUMBER or STRING             The underlying data type of the column's values.
4    Unit    Depends on Type (see below)  How the column's values are interpreted and presented.

Row 1: Name
A short, descriptive name for the column. It should not consist of numbers only, and most non-punctuation symbols are not allowed. Each column must have a name, and a clear one pays off later: the name is what you will see on every axis and filter in the viewer.

Row 2: Usage
Either in or out (case-sensitive). Use in for design inputs you control and out for the results produced by a simulation. The distinction lets the toolkit treat the levers you set apart from the outcomes they drive.

Row 3: Type
Either NUMBER or STRING (case-sensitive). This decides whether the column's values are read as numbers or as text, and it constrains which units row 4 may declare.

Row 4: Unit
The Unit value depends on the Type declared in row 3, and it must follow the rules for a URL slug. See choosing a unit below for the full breakdown.
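As a rough illustration of the rules for rows 1 to 3, the sketch below checks one column's name, usage, and type. The function name check_declaration is hypothetical. The rule about disallowed symbols in names is deliberately left out, because the exact set of permitted characters is not listed here.

```python
ALLOWED_USAGE = {"in", "out"}          # case-sensitive
ALLOWED_TYPES = {"NUMBER", "STRING"}   # case-sensitive


def check_declaration(name, usage, type_):
    """Return a list of problems with one column's first three header values."""
    problems = []
    stripped = name.strip()
    if not stripped:
        problems.append("name is missing")
    elif stripped.replace(" ", "").isdigit():
        problems.append("name consists of numbers only")
    if usage not in ALLOWED_USAGE:
        problems.append("usage must be exactly 'in' or 'out'")
    if type_ not in ALLOWED_TYPES:
        problems.append("type must be exactly 'NUMBER' or 'STRING'")
    return problems
```

An empty list means the three values passed these checks. Because both Usage and Type are case-sensitive, values such as In or number are reported as problems.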

Choosing a unit

The unit in row 4 tells the toolkit how to read and present a column's values. Which units are allowed depends entirely on the Type declared in row 3.

Number units

When the column is a NUMBER, the Unit must be the slug of a known unit in the database. Available number units include: fraction, fraction-low-p, fraction-medium-p, fraction-high-p, fraction-max-p, fraction-integer, ratio, u-value, r-value, inch-f, integer, iteration, eci, ghgi, kwh, kg-co2, degrees-c.

String units

When the column is a STRING, the Unit must be exactly one of the three values below. They are reserved for STRING columns and cannot be used on a NUMBER column:

  • enum — a small, fixed set of labels (for example a quality rating).
  • graph — a profile encoded as breakpoints, such as a daily heating schedule.
  • image — a reference to a stored asset.
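The pairing of Type and Unit can be expressed as a simple lookup. The sketch below is illustrative only: the function name unit_allowed is hypothetical, and the number-unit set is copied from the list in this section, which may not be the complete set held in the database.

```python
# Number units listed in this section; the database may define others.
NUMBER_UNITS = {
    "fraction", "fraction-low-p", "fraction-medium-p", "fraction-high-p",
    "fraction-max-p", "fraction-integer", "ratio", "u-value", "r-value",
    "inch-f", "integer", "iteration", "eci", "ghgi", "kwh", "kg-co2",
    "degrees-c",
}

# Units reserved for STRING columns.
STRING_UNITS = {"enum", "graph", "image"}


def unit_allowed(type_, unit):
    """Return True if the unit may be declared for a column of this type."""
    if type_ == "NUMBER":
        return unit in NUMBER_UNITS
    if type_ == "STRING":
        return unit in STRING_UNITS
    return False  # unknown type: nothing is allowed
```

For example, unit_allowed("STRING", "graph") is true, while unit_allowed("NUMBER", "enum") is false because enum is reserved for STRING columns.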

Slug rules

Every unit value is a slug — the same lower-case, URL-safe form used elsewhere in the toolkit. A slug must satisfy all of the following:

  • Only lower-case letters (a-z), digits (0-9), and the hyphen (-).
  • A hyphen may not be the first or last character (e-x is fine; -ex and ex- are not).
  • Hyphens may not be directly adjacent (ex-a-b is fine; ex--ab is not).
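The three slug rules can be captured in a single regular expression. This is a minimal sketch using Python's standard re module. The name is_slug is hypothetical, and the toolkit may implement the check differently.

```python
import re

# One or more runs of [a-z0-9], joined by single hyphens.
# This rules out leading, trailing, and doubled hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_slug(value):
    """Return True if value satisfies the slug rules listed above."""
    return SLUG_PATTERN.fullmatch(value) is not None
```

With this pattern, e-x and ex-a-b are accepted, while -ex, ex-, and ex--ab are rejected, matching the examples in the list.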

Example

A valid file with four columns. The first four rows are the reserved headers; the data starts on row 5.

#  Col 1        Col 2           Col 3   Col 4
1  HP Fraction  Window Quality  TEDI    Heating Profile
2  in           in              out     out
3  NUMBER       STRING          NUMBER  STRING
4  fraction     enum            eci     graph
5  0.25         Low             42.1    [0|12|28] | [290|269|248]
6  0.50         High            31.7    [0|14|30] | [285|260|240]

A four-column dataset. Rows 1–4 are the reserved headers; rows 5–6 hold the data. The fourth column is a graph unit encoding a daily profile.

Reading the columns left to right: a numeric heat-pump fraction, a string quality rating drawn from a fixed set, a numeric energy result, and a graph-encoded heating profile. Two of them are inputs (in) and two are outputs (out).
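For reference, here is a sketch of how the same example might look as raw file text, read with Python's standard csv module. It assumes that the # column and the Col 1 to Col 4 labels in the example are presentation only and do not appear in the file itself.

```python
import csv
import io

# The example dataset as raw CSV text (assumed layout, see note above).
EXAMPLE = """\
HP Fraction,Window Quality,TEDI,Heating Profile
in,in,out,out
NUMBER,STRING,NUMBER,STRING
fraction,enum,eci,graph
0.25,Low,42.1,[0|12|28] | [290|269|248]
0.50,High,31.7,[0|14|30] | [285|260|240]
"""

rows = list(csv.reader(io.StringIO(EXAMPLE)))
header_rows, data_rows = rows[:4], rows[4:]

# Transpose the four header rows so each tuple describes one column:
# (name, usage, type, unit)
declarations = list(zip(*header_rows))

for name, usage, type_, unit in declarations:
    print(f"{name}: {usage}, {type_}, {unit}")
```

Transposing the header rows makes each column's declaration readable as one unit, which mirrors how the four reserved rows are described above.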


Rules for the data rows

From row 5 onward, every value must satisfy the column it belongs to:

  • No value may be empty.
  • A value in a NUMBER column must be convertible to a number.
  • A value in a STRING column must not be convertible to a number.
  • An enum column may only contain a limited number of distinct values.
  • Each column should contain more than one distinct value — a column where every value is identical carries no information to analyze.
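The rules above can be sketched as a per-column check. This is an illustration under stated assumptions, not the toolkit's validator: "convertible to a number" is approximated with Python's float(), which may be more or less permissive than the real check; the names is_number and check_column are hypothetical; and max_enum_values is a placeholder, because the actual enum limit is not stated here.

```python
def is_number(value):
    """Approximate 'convertible to a number' with float() (an assumption)."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def check_column(type_, unit, values, max_enum_values=10):
    """Return a list of problems for one column's data values.

    max_enum_values is a hypothetical placeholder limit.
    """
    problems = []
    if any(v.strip() == "" for v in values):
        problems.append("empty value")
    filled = [v for v in values if v.strip() != ""]

    if type_ == "NUMBER" and not all(is_number(v) for v in filled):
        problems.append("non-numeric value in NUMBER column")
    if type_ == "STRING" and any(is_number(v) for v in filled):
        problems.append("numeric value in STRING column")

    distinct = set(filled)
    if unit == "enum" and len(distinct) > max_enum_values:
        problems.append("too many distinct enum values")
    if len(distinct) < 2:
        problems.append("column has fewer than two distinct values")
    return problems
```

An empty list means the column passed every check in this sketch. Distinct values are compared as text, so 0.5 and 0.50 count as two different values here.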

File size and shape

The file as a whole also has size and shape limits:

  • It must meet a minimum number of data rows.
  • It must meet a minimum number of columns, and stay under the maximum number of columns.
  • It must be no larger than 10 MB on disk.

These keep a dataset large enough to be worth analyzing and small enough to stay responsive in the viewer.
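A minimal sketch of these limits follows. Only the 10 MB cap comes from this page. The row and column limits are not stated, so they are passed in as hypothetical parameters rather than guessed. The sketch also assumes 10 MB means 10 × 1024 × 1024 bytes and that the column maximum is inclusive. The name check_shape is hypothetical.

```python
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024


def check_shape(size_bytes, n_data_rows, n_columns, *,
                min_rows, min_columns, max_columns):
    """Return a list of size and shape problems.

    min_rows, min_columns and max_columns are hypothetical parameters;
    the real limits are not given in this document.
    """
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 10 MB")
    if n_data_rows < min_rows:
        problems.append("too few data rows")
    if n_columns < min_columns:
        problems.append("too few columns")
    if n_columns > max_columns:  # assumption: the maximum is inclusive
        problems.append("too many columns")
    return problems
```

In practice the byte count could come from os.path.getsize on the file path, and the row and column counts from the parsed file.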

Maintained by Aaron Clausen.
Questions or corrections: aclausen@dialogdesign.ca

© DIALOG · Green Toolkit documentation