How ttml-validator works internally
Desired characteristics of the validator
Key goals for the ttml-validator include:
Be as exhaustive as possible in finding issues, i.e. do not stop at the first problem found.
Log all the faults found, and where checks were completed without problem, log success too.
Allow for onward processing of the results so that patterns across large datasets can be identified.
Be as permissive as possible with the input: attempt to derive the likely intent and document it before moving on.
Be extensible to accommodate new document types.
Check for potential TTML errors that are suggested or implied but not explicitly defined, for example IDREFS attributes referencing elements for which no behaviour is defined.
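As an illustration of the first, second and last goals, the following is a minimal sketch of a check that inspects every style and region reference, records a result for each one (successes included), and never stops at the first fault. It is not the validator's own code: the function name check_idrefs, the result dictionary layout and the sample document are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_idrefs(root, expected=(("style", "style"), ("region", "region"))):
    """Hypothetical check: one result per reference, never stopping early.

    `expected` pairs an attribute name with the element type it is
    assumed to reference; both the pairs and the result layout are
    illustrative only.
    """
    # Map every xml:id in the document to the type of element carrying it.
    targets = {
        el.get(XML_ID): local_name(el.tag)
        for el in root.iter()
        if el.get(XML_ID) is not None
    }
    results = []
    for el in root.iter():
        for attr, wanted in expected:
            for ref in (el.get(attr) or "").split():
                found = targets.get(ref)
                if found == wanted:
                    status, detail = "pass", f"resolves to <{found}>"
                elif found is None:
                    status, detail = "fail", "does not resolve"
                else:
                    status, detail = "fail", f"resolves to <{found}>, not <{wanted}>"
                results.append({
                    "status": status,
                    "element": local_name(el.tag),
                    "attribute": attr,
                    "message": f"'{ref}' {detail}",
                })
    return results


SAMPLE = b"""<tt xmlns="http://www.w3.org/ns/ttml">
  <head>
    <styling><style xml:id="s1"/></styling>
    <layout><region xml:id="r1"/></layout>
  </head>
  <body><div><p style="s1 r1 s2">Hello</p></div></body>
</tt>"""

results = check_idrefs(ET.fromstring(SAMPLE))
# JSON output keeps the results available for onward processing.
print(json.dumps(results, indent=2))
```

Emitting the results as structured data rather than free text is one way to support analysis of patterns across large datasets.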
Validation algorithm
For each validation run, the validator does the following:
Load an appropriate constraintSet for the type of document being validated. This contains a list of pre-parsing checks and a list of post-XML-parsing checks that will be used later, as well as a method for summarising the results.
Initiate a validationLogger to capture the results of the validation run.
Initiate a context dictionary to allow checks to pass information down the line.
Load the input bytes.
Iterate through the preParseChecks in the constraint set, running each against those bytes. This process can modify the input bytes prior to passing them forward to the next check, for example to ensure that they have the expected encoding, or to strip out illegal bytes.
Attempt to parse the processed byte stream as an XML document, using the Python ElementTree library. This approach is generic for all XML documents, as opposed to a bindings-based approach, which generally stops immediately if the input document cannot be mapped to the binding.
Iterate through the xmlChecks in the constraint set, running each against the parsed document objects. These checks typically do not modify the input element tree.
Write out the validation log to the output file.
Summarise the overall document validity using the constraintSet.
Exit with an appropriate code representing whether the document was valid or not.
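The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the validator's actual implementation: the constraint set is modelled as a plain dictionary, the log as a list of tuples, and the check functions (strip_bom, strip_null_bytes, root_is_tt) and the rule that only "fail" entries make a document invalid are all hypothetical.

```python
import xml.etree.ElementTree as ET

TT_ROOT = "{http://www.w3.org/ns/ttml}tt"


def strip_bom(data, context, log):
    """Hypothetical pre-parse check: drop a UTF-8 byte order mark."""
    bom = b"\xef\xbb\xbf"
    if data.startswith(bom):
        log.append(("warn", "strip_bom", "removed UTF-8 BOM"))
        return data[len(bom):]
    log.append(("pass", "strip_bom", "no BOM present"))
    return data


def strip_null_bytes(data, context, log):
    """Hypothetical pre-parse check: remove NUL bytes, illegal in XML."""
    cleaned = data.replace(b"\x00", b"")
    removed = len(data) - len(cleaned)
    status = "warn" if removed else "pass"
    log.append((status, "strip_null_bytes", f"removed {removed} NUL byte(s)"))
    return cleaned


def root_is_tt(root, context, log):
    """Hypothetical XML check: the root element should be TTML's tt."""
    status = "pass" if root.tag == TT_ROOT else "fail"
    log.append((status, "root_is_tt", f"root element is {root.tag}"))
    context["root_tag"] = root.tag  # made available to later checks


# Assumed shape of a constraint set: two check lists plus a summariser.
CONSTRAINT_SET = {
    "preParseChecks": [strip_bom, strip_null_bytes],
    "xmlChecks": [root_is_tt],
    "summarise": lambda log: all(status != "fail" for status, _, _ in log),
}


def validate(data, constraint_set):
    log, context = [], {}
    # Pre-parse checks may return modified bytes for the next check.
    for check in constraint_set["preParseChecks"]:
        data = check(data, context, log)
    try:
        root = ET.fromstring(data)
        log.append(("pass", "xml_parse", "document is well-formed XML"))
    except ET.ParseError as err:
        log.append(("fail", "xml_parse", str(err)))
        root = None
    # XML checks only run if there is a parsed tree to inspect.
    if root is not None:
        for check in constraint_set["xmlChecks"]:
            check(root, context, log)
    return constraint_set["summarise"](log), log


good = b'\xef\xbb\xbf<tt xmlns="http://www.w3.org/ns/ttml"/>\x00'
valid, log = validate(good, CONSTRAINT_SET)
for entry in log:
    print(entry)
exit_code = 0 if valid else 1  # a command-line tool could pass this to sys.exit
```

In this sketch the repairs made by the pre-parse checks are logged as warnings, so the document can still be parsed and examined by the later checks instead of being rejected outright.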
Supported profiles of TTML
The ttml-validator currently validates two profiles of TTML2:
EBU-TT-D and IMSC Text profile, including constraints of the BBC Subtitle Guidelines