class UCSchema

#include "ucschema.h"

Overview

The static methods of the UCSchema class provide a means to evaluate a given UniversalContainer and determine whether it meets a predefined set of criteria. These criteria specify the type of the data held by the container and the values it may take. Two key ideas are helpful in understanding these functions. The first is the notion of a contract, which defines the constraints that a particular UniversalContainer must satisfy. The second is the notion of a schema, which is a contract that has been bound to an identifying name and stored in the global library. To use the UCSchema functions, you define a set of schemas using a simple language and add them to this global library. You may then check UniversalContainers against these schemas by name.

An Example Schema

car ==> {
    "model" : string
    "plate" : string("\w\w\w\d\d\d\d")
    "year" : integer(1900:)
    "milage" : real(0.0:250000.0)
    "used" : boolean
    "smogcode" : character
}

boat ==> {
    "length" : real(5.0:),
    "displacement" : real,
    "plate" : string("WV \d\d\d\d\d")
}

owner ==> {
    "name" : string("[A-Z][a-z]* [A-Z][a-z]*")
    "age" : integer(16:75)
}

dmvrecord ==> {
    "vehicle" : #group car boat #endgroup
    "owners" : [ #type : owner #size : integer(1:) ]
}

Describing a Schema

A schema is described in a block of formatted plain text, which is then passed to one of the add_contract_to_library methods to add the described schema to the global library. A schema always has the form:

schemaname ==> contract
where schemaname is an identifier and contract is a valid contract. Valid identifiers are formed by an alphabetic character followed by one or more alphanumeric characters. Contracts are a combination of type information and constraints that must be met by a UniversalContainer. A contract can specify scalars, maps, or arrays. Contracts may also describe a group of alternative contracts; satisfying any one of them satisfies the group. Additionally, each contract may carry an optional tag, identifying data that must match a provided symbol table.

Schema Contracts

When a valid schema identifier appears in place of a contract, it is interpreted to mean the contract associated with that schema at the time the contract is checked. This allows schemas to be referenced from within other schemas or contracts. When the contract parser encounters such an identifier, it does not look the identifier up; instead, it creates a contract that holds a reference to the identifier. Only when that contract is later checked against a UniversalContainer is the global library searched for a matching contract. Consequently, schema names do not have to be forward declared.
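
The late lookup described above can be illustrated with a small stand-alone sketch. It does not use the UCSchema API: the Library map, the Contract alias, and reference_to are hypothetical names, and a plain int stands in for a UniversalContainer. The only point is that a reference created before its target exists still works once the target has been registered.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins: an int plays the role of a UniversalContainer,
// and a contract is simply a predicate over it.
using Contract = std::function<bool(int)>;
using Library = std::map<std::string, Contract>;

// Create a contract that refers to a schema by name. The name is looked up
// only when the contract is checked, never when it is created.
Contract reference_to(const Library& library, const std::string& name) {
    return [&library, name](int value) {
        auto it = library.find(name);
        if (it == library.end()) return false;  // no such schema at check time
        return it->second(value);
    };
}

// Usage: the reference is created before "positive" is registered.
void demo_late_binding() {
    Library library;
    Contract ref = reference_to(library, "positive");
    assert(!ref(5));                                    // not registered yet
    library["positive"] = [](int v) { return v > 0; };
    assert(ref(5));                                     // resolved at check time
    assert(!ref(-1));
}
```

The sketch reports an unresolved name as a plain failure; the library itself presumably signals this case through the ucc_NO_SUCH_TYPE flag listed with the compare functions.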

Scalar Contracts

A scalar contract describes a contract on a scalar data type or a string. These contracts consist of a keyword that declares the type and, optionally, a restriction in parentheses on the values that the data may take.

Contract    Notes

integer(MIN:MAX)
    The integer keyword specifies that the data under consideration must be of type Integer. If parentheses are present, then one or both of MIN and MAX must be specified. If only one value is specified, it is interpreted as the minimum value for the data, unless it is preceded by a colon (:), in which case it is interpreted as the maximum value.

real(MIN:MAX)
    The real keyword specifies that the data under consideration must be of type Real. If parentheses are present, then one or both of MIN and MAX must be specified. If only one value is specified, it is interpreted as the minimum value for the data, unless it is preceded by a colon (:), in which case it is interpreted as the maximum value.

character(MIN:MAX)
    The character keyword specifies that the data under consideration must be of type Character. If parentheses are present, then one or both of MIN and MAX must be specified. If only one value is specified, it is interpreted as the minimum value for the data, unless it is preceded by a colon (:), in which case it is interpreted as the maximum value. For a character, valid minimum and maximum values are unescaped characters.

boolean(true | false)
    The boolean keyword specifies that the data under consideration must be of type Boolean. If parentheses are present, then the literal "true" or "false" must also appear, constraining the value to that literal. This feature is intended to be used with group contracts (explained below), which allow a schema to be satisfied if a particular data item conforms to one of several different sub-schemas.

string(regex)
    The string keyword specifies that the data under consideration must be a string. If parentheses are present, they must contain a regular expression that the string must match in order to satisfy the contract. The underlying check uses the system regex routines declared in regex.h; see man regex and re_format for a description of the dialect.
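
The MIN:MAX rules can be made concrete with a short sketch. This is not the library's parser: IntRange, parse_range, and in_range are hypothetical helpers, whitespace and malformed input are not handled, and the sketch assumes that both bounds are inclusive, which the text above does not state.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical model of a MIN:MAX restriction on an integer.
struct IntRange {
    bool has_min = false;
    bool has_max = false;
    long min = 0;
    long max = 0;
};

// Parse the text between the parentheses: "16:75", "1900:", ":75" or "16".
// A lone value without a colon is treated as a minimum.
IntRange parse_range(const std::string& text) {
    IntRange r;
    const std::string::size_type colon = text.find(':');
    const std::string lo = text.substr(0, colon);  // whole text when no colon
    const std::string hi =
        (colon == std::string::npos) ? std::string() : text.substr(colon + 1);
    if (!lo.empty()) { r.has_min = true; r.min = std::strtol(lo.c_str(), nullptr, 10); }
    if (!hi.empty()) { r.has_max = true; r.max = std::strtol(hi.c_str(), nullptr, 10); }
    return r;
}

// Assumption: both bounds are inclusive.
bool in_range(const IntRange& r, long value) {
    if (r.has_min && value < r.min) return false;
    if (r.has_max && value > r.max) return false;
    return true;
}

void demo_ranges() {
    assert(in_range(parse_range("1900:"), 1900));   // minimum only
    assert(!in_range(parse_range("1900:"), 1899));
    assert(in_range(parse_range(":75"), -3));       // maximum only
    assert(!in_range(parse_range(":75"), 76));
    assert(!in_range(parse_range("16:75"), 15));    // both bounds
    assert(parse_range("16").has_min && !parse_range("16").has_max);
}
```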

Map Contracts

A map contract specifies constraints on a map. Fields where the key is separated from the contract by a colon are required and must appear in order for the contract to be satisfied. Fields where the key is separated from the contract by a question mark are optional. They do not have to appear, but if they do, they must satisfy the contract. The order in which fields appear does not matter. Keys are double-quoted strings. A map contract takes the form:

{ "required_key" : contract, "optional_key" ? contract ... }
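
The required and optional key rules can be sketched as follows. The code is a stand-alone model, not the library's implementation: check_keys is a hypothetical helper, only the presence of keys is examined (the per-field contracts are ignored), and the two flag constants carry placeholder values that merely mirror the ucc_EXTRA_MAP_ELEMENT and ucc_MISSING_REQUIRED_MAP_ELEMENT flags described with the compare functions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Placeholder bit values for this sketch; real code should use the ucc_*
// constants from the library headers.
const unsigned kExtraMapElement = 1u << 0;
const unsigned kMissingRequiredMapElement = 1u << 1;

// fields maps each key to true if it is required (":") or false if it is
// optional ("?"). present is the set of keys found in the map being checked.
unsigned check_keys(const std::map<std::string, bool>& fields,
                    const std::set<std::string>& present) {
    unsigned result = 0;
    for (const auto& field : fields) {
        if (field.second && present.count(field.first) == 0)
            result |= kMissingRequiredMapElement;
    }
    for (const auto& key : present) {
        if (fields.count(key) == 0) result |= kExtraMapElement;
    }
    return result;
}

void demo_check_keys() {
    // Hypothetical contract: { "name" : string, "nickname" ? string }
    const std::map<std::string, bool> fields = {{"name", true}, {"nickname", false}};
    const std::set<std::string> only_required = {"name"};
    const std::set<std::string> only_optional = {"nickname"};
    const std::set<std::string> with_extra = {"name", "height"};
    assert(check_keys(fields, only_required) == 0);
    assert(check_keys(fields, only_optional) == kMissingRequiredMapElement);
    assert(check_keys(fields, with_extra) == kExtraMapElement);
}
```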

Array Contracts

Array contracts allow a variety of constraints to be placed on an array and its contents, expressed as a series of sub-contracts. The #type contract specifies a contract that all elements must satisfy. The #exists contract specifies a contract that must be satisfied by at least one element of the array. The #size contract must be an integer contract, and it is applied to the number of elements in the array rather than to the contents. It is also possible to place constraints on particular elements of an array. Instead of one of the preceding keywords, a contract may be associated with an integer. The contract is then applied to that zero-indexed position in the array.

An array contract takes the form:

[ #type: contract, #size: integer contract, #exists: contract, integer : contract ]
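
A compact model of the four kinds of array sub-contract is sketched below. It is independent of the library: ArrayContract and satisfies are hypothetical names, elements are plain ints, and each sub-contract is an optional predicate. The treatment of an indexed contract whose position lies beyond the end of the array is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

using Contract = std::function<bool(int)>;

struct ArrayContract {
    Contract type;                          // #type: every element must satisfy it
    Contract exists;                        // #exists: at least one element must
    std::function<bool(std::size_t)> size;  // #size: applied to the element count
    std::map<std::size_t, Contract> at;     // integer : contract, zero-indexed
};

bool satisfies(const ArrayContract& c, const std::vector<int>& values) {
    if (c.type && !std::all_of(values.begin(), values.end(), c.type)) return false;
    if (c.exists && !std::any_of(values.begin(), values.end(), c.exists)) return false;
    if (c.size && !c.size(values.size())) return false;
    for (const auto& entry : c.at) {
        // Assumption: a constrained position that is absent counts as a failure.
        if (entry.first >= values.size()) return false;
        if (!entry.second(values[entry.first])) return false;
    }
    return true;
}

void demo_array_contract() {
    ArrayContract c;
    c.type = [](int v) { return v >= 0; };           // all elements non-negative
    c.exists = [](int v) { return v == 7; };         // some element equals 7
    c.size = [](std::size_t n) { return n >= 1; };   // like integer(1:)
    c.at[0] = [](int v) { return v < 10; };          // element 0 is below 10
    const std::vector<int> good = {3, 7};
    const std::vector<int> no_seven = {3, 4};
    const std::vector<int> negative = {3, 7, -1};
    assert(satisfies(c, good));
    assert(!satisfies(c, no_seven));
    assert(!satisfies(c, negative));
}
```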

Group Contracts

Group contracts provide a mechanism for dealing with UniversalContainers that may satisfy any one of several possible sub-contracts. They can be used to specify a constrained set of types that will satisfy the constraints. Alternatively, if there are interdependencies between data values in a satisfying object, group contracts can be used to specify the possible variations.

Group contracts take the form:
#group contract contract ... #endgroup
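
The "any one of several" behaviour, including the use of a boolean value to select between interdependent variations, can be sketched like this. The Shipment type, its fields, and satisfies_group are hypothetical and unrelated to the library; each alternative plays the role of one contract between #group and #endgroup.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical data: the allowed range of days depends on the express flag.
struct Shipment {
    bool express;
    int days;
};

using Contract = std::function<bool(const Shipment&)>;

// A group is satisfied as soon as any one alternative is satisfied.
bool satisfies_group(const std::vector<Contract>& alternatives, const Shipment& s) {
    return std::any_of(alternatives.begin(), alternatives.end(),
                       [&s](const Contract& c) { return c(s); });
}

void demo_group() {
    // Roughly: #group
    //            { "express" : boolean(true)  "days" : integer(1:2) }
    //            { "express" : boolean(false) "days" : integer(3:30) }
    //          #endgroup
    const std::vector<Contract> alternatives = {
        [](const Shipment& s) { return s.express && s.days >= 1 && s.days <= 2; },
        [](const Shipment& s) { return !s.express && s.days >= 3 && s.days <= 30; },
    };
    const Shipment fast = {true, 2};
    const Shipment slow = {false, 10};
    const Shipment mismatched = {true, 10};
    assert(satisfies_group(alternatives, fast));
    assert(satisfies_group(alternatives, slow));
    assert(!satisfies_group(alternatives, mismatched));
}
```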

Tagged Contracts

Any valid contract may be tagged by prefixing it with a name enclosed in angle brackets. If the name of the tag is not found in the symbol table, the value of the satisfying object is cloned into the symbol table. If the name is found in the symbol table, the object is instead checked for equality with the UniversalContainer that previously satisfied that tag. By using the versions of compare and compare_and_throw that allow a symbol table to be passed in, it is possible to preload the symbol table with the desired values, or to preserve the values across multiple calls to the compare routines. When the compare routines are used without a symbol table, an internal symbol table is still maintained, filled, and checked for the duration of the call.
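
The store-then-compare behaviour of tags can be modelled in a few lines. This sketch is not the library's code: SymbolTable and check_tagged are hypothetical, values are plain strings rather than UniversalContainers, and the check of the underlying contract that would normally accompany a tag is left out.

```cpp
#include <cassert>
#include <map>
#include <string>

using SymbolTable = std::map<std::string, std::string>;

// The first value seen for a tag is recorded; later values must equal it.
bool check_tagged(SymbolTable& symbols, const std::string& tag,
                  const std::string& value) {
    auto it = symbols.find(tag);
    if (it == symbols.end()) {
        symbols[tag] = value;    // unseen tag: remember a copy of the value
        return true;
    }
    return it->second == value;  // seen tag: must equal the stored value
}

void demo_tags() {
    SymbolTable symbols;
    assert(check_tagged(symbols, "plate", "ABC1234"));   // stored
    assert(check_tagged(symbols, "plate", "ABC1234"));   // equal to stored value
    assert(!check_tagged(symbols, "plate", "XYZ0000"));  // differs

    // Preloading the table forces a specific value from the start.
    SymbolTable preloaded;
    preloaded["plate"] = "WV 12345";
    assert(!check_tagged(preloaded, "plate", "ABC1234"));
}
```

Passing the same table to several calls corresponds to preserving values across calls to the compare routines.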

UCSchema Static Methods

static void UCSchema::add_contract_to_library(Buffer* buf)

static void UCSchema::add_contract_to_library(char* str)

These functions take a block of plain text that describes one or more schemas and parse it, adding the results to the global library.
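
One way to keep schema text in C++ source is a raw string literal, so that the backslashes in regular expressions need no escaping. The sketch below assumes a C++11 compiler. The names kSchemaText, register_schemas, and fake_add are hypothetical, and the library call is replaced by a function pointer so that the example compiles on its own.

```cpp
#include <cassert>
#include <string>

// Schema text taken from the boat example; the raw string keeps "\d" intact.
const char* const kSchemaText = R"schema(
boat ==> {
    "length" : real(5.0:),
    "displacement" : real,
    "plate" : string("WV \d\d\d\d\d")
}
)schema";

// Hand the text to any function shaped like add_contract_to_library(char*).
// That overload takes a non-const char*, so a mutable copy is passed rather
// than the literal itself.
void register_schemas(void (*add)(char*)) {
    assert(add != nullptr);
    std::string text(kSchemaText);
    add(&text[0]);
}

// Stand-in for the library call, used only to keep the sketch self-contained.
static std::string g_last_text;
void fake_add(char* str) { g_last_text = str; }

void demo_register() {
    register_schemas(&fake_add);
    assert(g_last_text.find("boat ==>") != std::string::npos);
}
```

With the library linked, the call would presumably be register_schemas(&UCSchema::add_contract_to_library), where the parameter type selects the char* overload.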

static unsigned compare(std::string const& contract, const UniversalContainer& uc)

static unsigned compare(std::string const& contract, const UniversalContainer& uc, UniversalContainer& symbol_table)

Compares the container uc against the named contract and returns 0 if uc matches the contract. A non-zero return value indicates one or more violations of the contract, identified by which bits of the return code are set. Note that in the case of arrays and maps, there is no indication of which element caused a particular flag to be set, and there may be multiple violations of the same type. The table below details the masks that can be ANDed with the return code to determine which errors occurred.

If symbol_table is provided, it must be a map or null. If any values of the map are set, tagged contracts within the schema are checked against those values. Other tagged values are inserted into symbol_table, which is turned into a map if it comes in as null.

ucc_NO_SUCH_TYPE
    No type with the name given by contract has been registered with the library.

ucc_IMPROPER_TYPE
    The container or one or more of its elements has the wrong data type.

ucc_CONSTRAINT_VIOLATION
    The value of the container violates the constraints specified in the contract. This is also the flag that signals an array that has violated its size constraint, or for which some element has violated the exists constraint.

ucc_EXTRA_MAP_ELEMENT
    A map contains one or more elements that are not specified as required or optional.

ucc_MISSING_REQUIRED_MAP_ELEMENT
    A map is missing an element that is marked as required.

ucc_MISSING_REQUIRED_ARRAY_ELEMENT
    An array does not contain an element that is specified in the exists constraint.

ucc_STRING_DOES_NOT_MATCH
    Some string does not match the given regular expression.
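
Decoding a return code by ANDing it with each mask might look like the sketch below. The numeric values of the ucc_* flags are not given here, so the sketch defines its own placeholder constants with different names; describe is a hypothetical helper. Real code would test against the library's constants instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Placeholder bit values for this sketch only; real code should use the
// ucc_* constants from the library headers.
const unsigned kNoSuchType = 1u << 0;
const unsigned kImproperType = 1u << 1;
const unsigned kConstraintViolation = 1u << 2;
const unsigned kExtraMapElement = 1u << 3;
const unsigned kMissingRequiredMapElement = 1u << 4;
const unsigned kMissingRequiredArrayElement = 1u << 5;
const unsigned kStringDoesNotMatch = 1u << 6;

// Turn a compare-style return code into a list of readable descriptions.
std::vector<std::string> describe(unsigned result) {
    std::vector<std::string> found;
    if (result & kNoSuchType) found.push_back("no such type");
    if (result & kImproperType) found.push_back("improper type");
    if (result & kConstraintViolation) found.push_back("constraint violation");
    if (result & kExtraMapElement) found.push_back("extra map element");
    if (result & kMissingRequiredMapElement) found.push_back("missing required map element");
    if (result & kMissingRequiredArrayElement) found.push_back("missing required array element");
    if (result & kStringDoesNotMatch) found.push_back("string does not match");
    return found;
}

void demo_describe() {
    assert(describe(0).empty());  // 0 means the contract was satisfied
    const std::vector<std::string> found =
        describe(kImproperType | kStringDoesNotMatch);
    assert(found.size() == 2);
    assert(found[0] == "improper type");
    assert(found[1] == "string does not match");
}
```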

static void compare_and_throw(std::string const& contract, const UniversalContainer& uc, unsigned conditions = 0XFFFFFFFF)

static void compare_and_throw(std::string const& contract, const UniversalContainer& uc, UniversalContainer& symbol_table, unsigned conditions = 0XFFFFFFFF)

These functions operate similarly to the corresponding compare functions, but instead of returning a value they throw an exception. This exception is a UniversalContainer with a map type, similar to other exceptions thrown by the library. Two additional keys are present. The key compare_results contains the value that would have been returned by compare. The key violations contains an array of strings explaining the various errors.

The conditions parameter is a mask that is ANDed with the result of the compare. If the masked result is non-zero, the exception is thrown. Clearing a bit in conditions therefore allows the corresponding condition, such as the presence of extra map elements, to pass without throwing an exception.
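
The masking step can be sketched in isolation. The sketch assumes that the mask is combined with the result by a bitwise AND, which is the reading under which the default of 0xFFFFFFFF throws on any violation and a cleared bit suppresses one. The flag values are placeholders, throw_if_selected and would_throw are hypothetical, and std::runtime_error stands in for the UniversalContainer that the library throws.

```cpp
#include <cassert>
#include <stdexcept>

// Placeholder bit values for this sketch; real code should use the ucc_*
// constants from the library headers.
const unsigned kImproperType = 1u << 1;
const unsigned kExtraMapElement = 1u << 3;

// Throw only when at least one violation survives the mask.
void throw_if_selected(unsigned result, unsigned conditions = 0xFFFFFFFFu) {
    if ((result & conditions) != 0) {
        throw std::runtime_error("contract violated");
    }
}

// Helper that reports whether a given result and mask would throw.
bool would_throw(unsigned result, unsigned conditions) {
    try {
        throw_if_selected(result, conditions);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

void demo_conditions() {
    const unsigned ignore_extras = ~kExtraMapElement;
    assert(would_throw(kExtraMapElement, 0xFFFFFFFFu));     // default mask
    assert(!would_throw(kExtraMapElement, ignore_extras));  // tolerated
    assert(would_throw(kExtraMapElement | kImproperType, ignore_extras));
    assert(!would_throw(0, 0xFFFFFFFFu));                   // no violations
}
```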

Public, non-static methods of UCContract

UCContract contains a number of public, non-static methods. These methods are not documented or supported, but are unlikely to undergo much further evolution. If a new way to specify schemas is required, such as JSON Schema, these methods would be a good place to start.