CSV Files and Conventions¶
Many of the data tables exposed by this Python module are read from CSV files
contained in vc2_data_tables/csv/*.csv
. You are free to use these files
directly rather than via the vc2_data_tables
module.
In general, the CSV files mimic the format which appears in the VC-2 standards documents and can be opened in common spreadsheet packages and are intended to be relatively human-readable.
Tip
After opening the CSV files included with this module with a spreadsheet package it is often helpful to reset the width of all columns to a fixed size. This is because may files will contain comments which cause certain columns to be assigned very large sizes (see section about comments below).
Warning
Take care when modifying these CSV files some spreadsheet packages contain unhelpful ‘smart’ features which can mangle cells containing values matching certain formats (e.g. anything which looks like a date). Also watch out for trailing spaces in cells.
The sections below define the specific conventions used by the CSV files in this package.
CSV Dialect¶
The CSV provided use the Microsoft Excel CSV dialect and are encoded using
UTF-8. See Python’s csv
module for more background on CSV formatting
dialects.
Headings¶
In all CSV files, the first non-empty/comment row contains headings for the
values beneath. The meanings of each column should be self-explantory, but
further hints may be given in the vc2_data_tables
API documentation
associated with each table.
Note
The vc2_data_tables
software expects certain column names to be
used but is insensitive to column order and will ignore any unexpected
extra columns.
Ditto¶
In some tables, values from previous rows may be repeated using ‘ditto’,
written as "
or ''
. For example:
Number |
Above 3? |
---|---|
1 |
No |
2 |
“ |
3 |
“ |
4 |
Yes |
5 |
“ |
6 |
“ |
Warning
Excel and other spreadsheet packages often replace ‘straight’ quote characters with their ‘curly’ unicode variants. Take care when manually parsing CSV files containing ditto to also handle these variants.
Lists¶
Occasionally table cells may contain a comma separated list of values. These commas are escaped according to the Excel CSV convention of enclosing the whole cell value in double quotes.
Comments¶
To aid human consumption of the CSV data files, human-readable comments are used. All rows in a CSV file containing only empty cells or cells whose values start with a hash (
#
) character should be considered to be comments and omitted.The following illustrates how comments may be used in a CSV.
Not
a
comment!
# A comment
# Another comment
Not
a
comment!
123
abc
# Not a comment
Warning
A cell starting with a hash (
#
) on a row which contains data is not a comment (see final row of the example above). Comments may only occur on rows which otherwise do not contain data.