CSV Files and Conventions

Many of the data tables exposed by this Python module are read from CSV files contained in vc2_data_tables/csv/*.csv. You are free to use these files directly rather than via the vc2_data_tables module.

In general, the CSV files mimic the format which appears in the VC-2 standards documents and can be opened in common spreadsheet packages and are intended to be relatively human-readable.

Tip

After opening the CSV files included with this module with a spreadsheet package it is often helpful to reset the width of all columns to a fixed size. This is because may files will contain comments which cause certain columns to be assigned very large sizes (see section about comments below).

Warning

Take care when modifying these CSV files some spreadsheet packages contain unhelpful ‘smart’ features which can mangle cells containing values matching certain formats (e.g. anything which looks like a date). Also watch out for trailing spaces in cells.

The sections below define the specific conventions used by the CSV files in this package.

CSV Dialect

The CSV provided use the Microsoft Excel CSV dialect and are encoded using UTF-8. See Python’s csv module for more background on CSV formatting dialects.

Headings

In all CSV files, the first non-empty/comment row contains headings for the values beneath. The meanings of each column should be self-explantory, but further hints may be given in the vc2_data_tables API documentation associated with each table.

Note

The vc2_data_tables software expects certain column names to be used but is insensitive to column order and will ignore any unexpected extra columns.

Comments

To aid human consumption of the CSV data files, human-readable comments are used. All rows in a CSV file containing only empty cells or cells whose values start with a hash (#) character should be considered to be comments and omitted.

The following illustrates how comments may be used in a CSV.

Not

a

comment!

# A comment

# Another comment

Not

a

comment!

123

abc

# Not a comment

Warning

A cell starting with a hash (#) on a row which contains data is not a comment (see final row of the example above). Comments may only occur on rows which otherwise do not contain data.

Ditto

In some tables, values from previous rows may be repeated using ‘ditto’, written as " or ''. For example:

Number

Above 3?

1

No

2

3

4

Yes

5

6

Warning

Excel and other spreadsheet packages often replace ‘straight’ quote characters with their ‘curly’ unicode variants. Take care when manually parsing CSV files containing ditto to also handle these variants.

Lists

Occasionally table cells may contain a comma separated list of values. These commas are escaped according to the Excel CSV convention of enclosing the whole cell value in double quotes.