Type Checking Edit

Using schemas to enhance the Rego type checker

You can provide one or more input schema files and/or data schema files to opa eval to improve static type checking and get more precise error reports as you develop Rego code.

The -s flag can be used to upload schemas for input and data documents in JSON Schema format. You can either load a single JSON schema file for the input document or directory of schema files.

-s, --schema string set schema file path or directory path

Passing a single file with -s

When a single file is passed, it is a schema file associated with the input document globally. This means that for all rules in all packages, the input has a type derived from that schema. There is no constraint on the name of the file, it could be anything.

Example:

opa eval data.envoy.authz.allow -i opa-schema-examples/envoy/input.json -d opa-schema-examples/envoy/policy.rego -s opa-schema-examples/envoy/schemas/my-schema.json

Passing a directory with -s

When a directory path is passed, annotations will be used in the code to indicate what expressions map to what schemas (see below). Both input schema files and data schema files can be provided in the same directory, with different names. The directory of schemas may have any sub-directories. Notice that when a directory is passed the input document does not have a schema associated with it globally. This must also be indicated via an annotation.

Example:

opa eval data.kubernetes.admission -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego -s opa-schema-examples/kubernetes/schemas

Schemas can also be provided for policy and data files loaded via opa eval --bundle

Example:

opa eval data.kubernetes.admission -i opa-schema-examples/kubernetes/input.json -b opa-schema-examples/bundle.tar.gz -s opa-schema-examples/kubernetes/schemas

Samples provided at: https://github.com/aavarghese/opa-schema-examples/

Usage scenario with a single schema file

Consider the following Rego code, which assumes as input a Kubernetes admission review. For resources that are Pods, it checks that the image name starts with a specific prefix.

pod.rego

package kubernetes.admission

deny[msg] {
  input.request.kind.kinds == "Pod"
  image := input.request.object.spec.containers[_].image
  not startswith(image, "hooli.com/")
  msg := sprintf("image '%v' comes from untrusted registry", [image])
}

Notice that this code has a typo in it: input.request.kind.kinds is undefined and should have been input.request.kind.kind.

Consider the following input document:

input.json

{
    "kind": "AdmissionReview",
    "request": {
      "kind": {
        "kind": "Pod",
        "version": "v1"
      },
      "object": {
        "metadata": {
          "name": "myapp"
        },
        "spec": {
          "containers": [
            {
              "image": "nginx",
              "name": "nginx-frontend"
            },
            {
              "image": "mysql",
              "name": "mysql-backend"
            }
          ]
        }
      }
    }
  }

Clearly there are 2 image names that are in violation of the policy. However, when we evaluate the erroneous Rego code against this input we obtain:

% opa eval --format pretty -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego
[]

The empty value returned is indistinguishable from a situation where the input did not violate the policy. This error is therefore causing the policy not to catch violating inputs appropriately.

If we fix the Rego code and change input.request.kind.kinds to input.request.kind.kind, then we obtain the expected result:

[
"image 'nginx' comes from untrusted registry",
"image 'mysql' comes from untrusted registry"
]

With this feature, it is possible to pass a schema to opa eval, written in JSON Schema. Consider the admission review schema provided at: https://github.com/aavarghese/opa-schema-examples/blob/main/kubernetes/schemas/input.json

We can pass this schema to the evaluator as follows:

% opa eval --format pretty -i opa-schema-examples/kubernetes/input.json -d opa-schema-examples/kubernetes/policy.rego -s opa-schema-examples/kubernetes/schemas/input.json

With the erroneous Rego code, we now obtain the following type error:

1 error occurred: ../../aavarghese/opa-schema-examples/kubernetes/policy.rego:5: rego_type_error: undefined ref: input.request.kind.kinds
  input.request.kind.kinds
                     ^
                     have: "kinds"
                     want (one of): ["kind" "version"]

This indicates the error to the Rego developer right away, without having the need to observe the results of runs on actual data, thereby improving productivity.

Schema annotations

When passing a directory of schemas to opa eval, schema annotations become handy to associate a Rego expression with a corresponding schema within a given scope:

# METADATA
# schemas:
#   - <path-to-value>:<path-to-schema>
#   ...
#   - <path-to-value>:<path-to-schema>
allow {
  ...
}

The annotation must be specified as YAML within a comment block that must start with # METADATA. Also, every line in the comment block containing the annotation must start at Column 1 in the module/file, or otherwise, they will be ignored.

🚨 OPA will attempt to parse the YAML document in comments following the initial # METADATA comment. If the YAML document cannot be parsed, OPA will return an error. If you need to include additional comments between the comment block and the next statement, include a blank line immediately after the comment block containing the YAML document. This tells OPA that the comment block containing the YAML document is finished

The schemas field specifies an array associating schemas to data values. Paths must start with input or data (i.e., they must be fully-qualified.)

The type checker derives a Rego Object type for the schema and an appropriate entry is added to the type environment before type checking the rule. This entry is removed upon exit from the rule.

Example:

Consider the following Rego code which checks if an operation is allowed by a user, given an ACL data document:

package policy

import data.acl

default allow = false

# METADATA
# schemas:
#   - input: schema.input
#   - data.acl: schema["acl-schema"]
allow {
        access = data.acl["alice"]
        access[_] == input.operation
}

allow {
        access = data.acl["bob"]
        access[_] == input.operation
}

Consider a directory named mySchemasDir with the following structure, provided via opa eval --schema opa-schema-examples/mySchemasDir

mySchemasDir/
├── input.json
└── acl-schema.json

For actual code samples, see https://github.com/aavarghese/opa-schema-examples/tree/main/acl.

In the first allow rule above, the input document has the schema input.json, and data.acl has the schema acl-schema.json. Note that we use the relative path inside the mySchemasDir directory to identify a schema, omit the .json suffix, and use the global variable schema to stand for the top-level of the directory. Schemas in annotations are proper Rego references. So schema.input is also valid, but schema.acl-schema is not.

If we had the expression data.acl.foo in this rule, it would result in a type error because the schema contained in acl-schema.json only defines object properties "alice" and "bob" in the ACL data document.

On the other hand, this annotation does not constrain other paths under data. What it says is that we know the type of data.acl statically, but not that of other paths. So for example, data.foo is not a type error and gets assigned the type Any.

Note that the second allow rule doesn’t have a METADATA comment block attached to it, and hence will not be type checked with any schemas.

On a different note, schema annotations can also be added to policy files part of a bundle package loaded via opa eval --bundle alongwith the --schema parameter for type checking a set of *.rego policy files.

Annotation Scopes

Annotations can be defined at the rule or package level. The scope field on the annotation determines how the schema annotation will be applied. If the scope field is omitted, it defaults to the scope for the statement that immediately follows the annotation. The scope values that are currently supported are:

  • rule - applies to the individual rule statement
  • document - applies to all of the rules with the same name in the same package
  • package - applies to all of the rules in the package
  • subpackages - applies to all of the rules in the package and all subpackages (recursively)

In case of overlap, schema annotations override each other as follows:

rule overrides document
document overrides package
package overrides subpackages

The following sections explain how the different scopes work.

Rule and Document Scopes

In the example above, the second rule does not include an annotation so type checking of the second rule would not take schemas into account. To enable type checking on the second (or other rules in the same file) we could specify the annotation multiple times:

# METADATA
# scope: rule
# schemas:
#   - input: schema.input
#   - data.acl: schema["acl-schema"]
allow {
        access = data.acl["alice"]
        access[_] == input.operation
}

# METADATA
# scope: rule
# schemas:
#   - input: schema.input
#   - data.acl: schema["acl-schema"]
allow {
        access = data.acl["bob"]
        access[_] == input.operation
}

This is obviously redundant and error prone. To avoid this problem, we can define the annotation once on a rule with scope document:

# METADATA
# scope: document
# schemas:
#   - input: schema.input
#   - data.acl: schema["acl-schema"]
allow {
        access = data.acl["alice"]
        access[_] == input.operation
}

allow {
        access = data.acl["bob"]
        access[_] == input.operation
}

In this example, the annotation with document scope has the same affect as the two rule scoped annotations in the previous example.

Since the document scope annotation applies to all rules with the same name in the same package (which can span multiple files) and there is no ordering across files in the same package, document scope annotations can only be specified once per rule set. The document scope annotation can be applied to any rule in the set (i.e., ordering does not matter.)

Package and Subpackage Scopes

Annotations can be defined at the package level and then applied to all rules within the package:

# METADATA
# scope: package
# schemas:
#   - input: schema.input
#   - data.acl: schema["acl-schema"]
package example

allow {
        access = data.acl["alice"]
        access[_] == input.operation
}

allow {
        access = data.acl["bob"]
        access[_] == input.operation
}

package scoped schema annotations are useful when all rules in the same package operate on the same input structure. In some cases, when policies are organized into many sub-packages, it is useful to declare schemas recursively for them using the subpackages scope. For example:

# METADTA
# scope: subpackages
# schemas:
# - input: schema.input
package kubernetes.admission

This snippet would declare the top-level schema for input for the kubernetes.admission package as well as all subpackages. If admission control rules were defined inside packages like kubernetes.admission.workloads.pods, they would be able to pickup that one schema declaration.

Overriding

JSON Schemas are often incomplete specifications of the format of data. For example, a Kubernetes Admission Review resource has a field object which can contain any other Kubernetes resource. A schema for Admission Review has a generic type object for that field that has no further specification. To allow more precise type checking in such cases, we support overriding existing schemas.

Consider the following example:

package kubernetes.admission

# METADATA
# scope: rule
# schemas:
# - input: schema.input
# - input.request.object: schema.kubernetes.pod
deny[msg] {
        input.request.kind.kind == "Pod"
        image := input.request.object.spec.containers[_].image
        not startswith(image, "hooli.com/")
        msg := sprintf("image '%v' comes from untrusted registry", [image])
}

In this example, the input is associated with an Admission Review schema, and furthermore input.request.object is set to have the schema of a Kubernetes Pod. In effect, the second schema annotation overrides the first one. Overriding is a schema transformation feature and combines existing schemas. In this case, we are combining the Admission Review schema with that of a Pod.

Notice that the order of schema annotations matter for overriding to work correctly.

Given a schema annotation, if a prefix of the path already has a type in the environment, then the annotation has the effect of merging and overriding the existing type with the type derived from the schema. In the example above, the prefix input already has a type in the type environment, so the second annotation overrides this existing type. Overriding affects the type of the longest prefix that already has a type. If no such prefix exists, the new path and type are added to the type environment for the scope of the rule.

In general, consider the existing Rego type:

object{a: object{b: object{c: C, d: D, e: E}}}

If we override this type with the following type (derived from a schema annotation of the form a.b.e: schema-for-E1):

object{a: object{b: object{e: E1}}}

It results in the following type:

object{a: object{b: object{c: C, d: D, e: E1}}}

Notice that b still has its fields c and d, so overriding has a merging effect as well. Moreover, the type of expression a.b.e is now E1 instead of E.

We can also use overriding to add new paths to an existing type, so if we override the initial type with the following:

object{a: object{b: object{f: F}}}

we obtain the following type:

object{a: object{b: object{c: C, d: D, e: E, f: F}}}

We use schemas to enhance the type checking capability of OPA, and not to validate the input and data documents against desired schemas. This burden is still on the user and care must be taken when using overriding to ensure that the input and data provided are sensible and validated against the transformed schemas.

Multiple input schemas

It is sometimes useful to have different input schemas for different rules in the same package. This can be achieved as illustrated by the following example:

package policy

import data.acl

default allow = false

# METADATA
# scope: rule
# schemas:
#  - input: schema["input"]
#  - data.acl: schema["acl-schema"]
allow {
        access = data.acl[input.user]
        access[_] == input.operation
}

# METADATA for whocan rule
# scope: rule
# schemas:
#   - input: schema["whocan-input-schema"]
#   - data.acl: schema["acl-schema"]
whocan[user] {
        access = acl[user]
        access[_] == input.operation
}

The directory that is passed to opa eval is the following:

mySchemasDir/
├── input.json
└── acl-schema.json
└── whocan-input-schema.json

In this example, we associate the schema input.json with the input document in the rule allow, and the schema whocan-input-schema.json with the input document for the rule whocan.

Translating schemas to Rego types and dynamicity

Rego has a gradual type system meaning that types can be partially known statically. For example, an object could have certain fields whose types are known and others that are unknown statically. OPA type checks what it knows statically and leaves the unknown parts to be type checked at runtime. An OPA object type has two parts: the static part with the type information known statically, and a dynamic part, which can be nil (meaning everything is known statically) or non-nil and indicating what is unknown.

When we derive a type from a schema, we try to match what is known and unknown in the schema. For example, an object that has no specified fields becomes the Rego type Object{Any: Any}. However, currently additionalProperties and additionalItems are ignored. When a schema is fully specified, we derive a type with its dynamic part set to nil, meaning that we take a strict interpretation in order to get the most out of static type checking. This is the case even if additionalProperties is set to true in the schema. In the future, we will take this feature into account when deriving Rego types.

When overriding existing types, the dynamicity of the overridden prefix is preserved.

Limitations

Currently this feature admits schemas written in JSON Schema but does not support every feature available in this format. In particular the following features are not yet supported:

  • additional properties for objects
  • pattern properties for objects
  • additional items for arrays
  • contains for arrays
  • allOf, anyOf, oneOf, not
  • enum
  • if/then/else

A note of caution: overriding is a powerful capability that must be used carefully. For example, the user is allowed to write:

# METADATA
# scope: rule
# schema:
#  - data: schema["some-schema"]

In this case, we are overriding the root of all documents to have some schema. Since all Rego code lives under data as virtual documents, this in practice renders all of them inaccessible (resulting in type errors). Similarly, assigning a schema to a package name is not a good idea and can cause problems. Care must also be taken when defining overrides so that the transformation of schemas is sensible and data can be validated against the transformed schema.

References

For more examples, please see https://github.com/aavarghese/opa-schema-examples

This contains samples for Envoy, Kubernetes, and Terraform including corresponding JSON Schemas.

For a reference on JSON Schema please see: http://json-schema.org/understanding-json-schema/reference/index.html

For a tool that generates JSON Schema from JSON samples, please see: https://jsonschema.net/home