Introduction

The Open Policy Agent (OPA, pronounced “oh-pa”) is an open source, general-purpose policy engine that unifies policy enforcement across the stack. OPA provides a high-level declarative language that lets you specify policy as code, along with simple APIs to offload policy decision-making from your software. You can use OPA to enforce policies in microservices, Kubernetes, CI/CD pipelines, API gateways, and more.

Read this page to learn about the core concepts in OPA’s policy language (Rego) as well as how to download, run, and integrate OPA.

Overview

OPA decouples policy decision-making from policy enforcement. When your software needs to make policy decisions it queries OPA and supplies structured data (e.g., JSON) as input. OPA accepts arbitrary structured data as input.

[Figure: Policy Decoupling]

OPA generates policy decisions by evaluating the query input against policies and data. OPA and Rego are domain-agnostic so you can describe almost any kind of invariant in your policies. For example:

  • Which users can access which resources.
  • Which subnets egress traffic is allowed to.
  • Which clusters a workload must be deployed to.
  • Which registries binaries can be downloaded from.
  • Which OS capabilities a container can execute with.
  • Which times of day the system can be accessed at.

Policy decisions are not limited to simple yes/no or allow/deny answers. Like query inputs, your policies can generate arbitrary structured data as output.

Let’s look at an example.

Example

Imagine you work for an organization with the following system:

[Figure: Example System]

There are three kinds of components in the system:

  • Servers expose zero or more protocols (e.g., http, ssh).
  • Networks connect servers and can be public or private. Public networks are connected to the Internet.
  • Ports attach servers to networks.

All of the servers, networks, and ports are provisioned by a script. The script receives a JSON representation of the system as input:

{
    "servers": [
        {"id": "app", "protocols": ["https", "ssh"], "ports": ["p1", "p2", "p3"]},
        {"id": "db", "protocols": ["mysql"], "ports": ["p3"]},
        {"id": "cache", "protocols": ["memcache"], "ports": ["p3"]},
        {"id": "ci", "protocols": ["http"], "ports": ["p1", "p2"]},
        {"id": "busybox", "protocols": ["telnet"], "ports": ["p1"]}
    ],
    "networks": [
        {"id": "net1", "public": false},
        {"id": "net2", "public": false},
        {"id": "net3", "public": true},
        {"id": "net4", "public": true}
    ],
    "ports": [
        {"id": "p1", "network": "net1"},
        {"id": "p2", "network": "net3"},
        {"id": "p3", "network": "net2"}
    ]
}

Earlier in the day your boss told you about a new security policy that has to be implemented:

1. Servers reachable from the Internet must not expose the insecure 'http' protocol.
2. Servers are not allowed to expose the 'telnet' protocol.

The policy must be enforced when servers, networks, and ports are provisioned, and the compliance team wants to audit the system periodically to find servers that violate it.

Your boss has asked you to determine if OPA would be a good fit for implementing the policy.

Rego

OPA policies are expressed in a high-level declarative language called Rego. Rego is purpose-built for expressing policies over complex hierarchical data structures. For detailed information on Rego see the Policy Language documentation.

References

When OPA evaluates policies it binds data provided in the query to a global variable called input. You can refer to data in the input using the . (dot) operator.

input.servers
[
  {
    "id": "app",
    "ports": [
      "p1",
      "p2",
      "p3"
    ],
    "protocols": [
      "https",
      "ssh"
    ]
  },
  {
    "id": "db",
    "ports": [
      "p3"
    ],
    "protocols": [
      "mysql"
    ]
  },
  {
    "id": "cache",
    "ports": [
      "p3"
    ],
    "protocols": [
      "memcache"
    ]
  },
  {
    "id": "ci",
    "ports": [
      "p1",
      "p2"
    ],
    "protocols": [
      "http"
    ]
  },
  {
    "id": "busybox",
    "ports": [
      "p1"
    ],
    "protocols": [
      "telnet"
    ]
  }
]

To refer to array elements you can use the familiar square-bracket syntax:

input.servers[0].protocols[0]
"https"

💡 You can use the same square-bracket syntax if keys contain characters other than [a-zA-Z0-9_]. E.g., input["foo~bar"].

If you refer to a value that does not exist, OPA returns undefined. Undefined means that OPA was not able to find any results.

input.deadbeef
undefined decision
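
For readers who think in imperative terms, the reference behaviour above can be approximated in Go. The sketch below is only an analogy and is not how OPA is implemented: the helper names lookup and decode are hypothetical, and the boolean return value stands in for "undefined".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup walks a decoded JSON document along path, where each step is either
// a string (object key) or an int (array index). The second return value is
// false when any step is missing, which loosely mirrors Rego's "undefined".
func lookup(doc interface{}, path ...interface{}) (interface{}, bool) {
	cur := doc
	for _, step := range path {
		switch key := step.(type) {
		case string:
			obj, ok := cur.(map[string]interface{})
			if !ok {
				return nil, false
			}
			if cur, ok = obj[key]; !ok {
				return nil, false
			}
		case int:
			arr, ok := cur.([]interface{})
			if !ok || key < 0 || key >= len(arr) {
				return nil, false
			}
			cur = arr[key]
		default:
			return nil, false
		}
	}
	return cur, true
}

// decode parses a JSON string into generic Go values (hypothetical helper).
func decode(s string) interface{} {
	var v interface{}
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		panic(err)
	}
	return v
}

func main() {
	input := decode(`{"servers": [{"id": "app", "protocols": ["https", "ssh"]}]}`)

	// Analogous to input.servers[0].protocols[0].
	v, ok := lookup(input, "servers", 0, "protocols", 0)
	fmt.Println(v, ok) // https true

	// Analogous to input.deadbeef, which is undefined.
	_, ok = lookup(input, "deadbeef")
	fmt.Println(ok) // false
}
```

The point of the analogy is that a missing value is a distinct outcome from a false one, which matters later when rules are undefined.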

Expressions (Logical AND)

To produce policy decisions in Rego you write expressions against input and other data.

input.servers[0].id == "app"
true

OPA includes a set of built-in functions you can use to perform common operations like string manipulation, regular expression matching, arithmetic, aggregation, and more.

count(input.servers[0].protocols) >= 1
true

For a complete list of built-in functions supported in OPA out-of-the-box see the Policy Reference page.

Multiple expressions are joined together with the ; (AND) operator. For queries to produce results, all of the expressions in the query must be true or defined. The order of expressions does not matter.

input.servers[0].id == "app"; input.servers[0].protocols[0] == "https"
true

You can omit the ; (AND) operator by splitting expressions across multiple lines. The following query has the same meaning as the previous one:

input.servers[0].id == "app"
input.servers[0].protocols[0] == "https"
true

If any of the expressions in the query are not true (or defined) the result is undefined. In the example below, the second expression is false:

input.servers[0].id == "app"
input.servers[0].protocols[0] == "telnet"
undefined decision

Variables

You can store values in intermediate variables using the := (assignment) operator. Variables can be referenced just like input.

s := input.servers[0]
s.id == "app"
p := s.protocols[0]
p == "https"
+---------+-------------------------------------------------------------------+
|    p    |                                 s                                 |
+---------+-------------------------------------------------------------------+
| "https" | {"id":"app","ports":["p1","p2","p3"],"protocols":["https","ssh"]} |
+---------+-------------------------------------------------------------------+

When OPA evaluates expressions, it finds values for the variables that make all of the expressions true. If there are no variable assignments that make all of the expressions true, the result is undefined.

s := input.servers[0]
s.id == "app"
s.protocols[1] == "telnet"
undefined decision

Variables are immutable. OPA reports an error if you try to assign the same variable twice.

s := input.servers[0]
s := input.servers[1]
1 error occurred: 2:1: rego_compile_error: var s assigned above

OPA must be able to enumerate the values for all variables in all expressions. If OPA cannot enumerate the values of a variable in any expression, OPA will report an error.

x := 1
x != y  # y has not been assigned a value
2 errors occurred:
2:1: rego_unsafe_var_error: var y is unsafe
2:1: rego_unsafe_var_error: var _ is unsafe

Iteration

Like other declarative languages (e.g., SQL), Rego does not have an explicit loop or iteration construct. Instead, iteration happens implicitly when you inject variables into expressions.

To understand how iteration works in Rego, imagine you need to check if any networks are public. Recall that the networks are supplied inside an array:

input.networks
[
  {
    "id": "net1",
    "public": false
  },
  {
    "id": "net2",
    "public": false
  },
  {
    "id": "net3",
    "public": true
  },
  {
    "id": "net4",
    "public": true
  }
]

One option would be to test each network in the input:

input.networks[0].public == true
false
input.networks[1].public == true
false
input.networks[2].public == true
true

This approach is problematic because there may be too many networks to list statically, or more importantly, the number of networks may not be known in advance.

In Rego, the solution is to substitute the array index with a variable.

some i; input.networks[i].public == true
+---+
| i |
+---+
| 2 |
| 3 |
+---+

Now the query asks for values of i that make the overall expression true. When you substitute variables in references, OPA automatically finds variable assignments that satisfy all of the expressions in the query. Just like intermediate variables, OPA returns the values of the variables.
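
As a rough imperative comparison, the query above behaves like a loop that collects every index satisfying the condition. The Go sketch below is an illustration only; the network type and the publicIndexes function are hypothetical names, not part of OPA.

```go
package main

import "fmt"

// network mirrors one element of input.networks.
type network struct {
	ID     string
	Public bool
}

// publicIndexes returns every index i for which networks[i].Public is true,
// which corresponds to the bindings the Rego query reports for i.
func publicIndexes(networks []network) []int {
	var out []int
	for i, n := range networks {
		if n.Public {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	networks := []network{
		{"net1", false}, {"net2", false}, {"net3", true}, {"net4", true},
	}
	fmt.Println(publicIndexes(networks)) // [2 3]
}
```

In Rego the loop is implicit: you state the condition and OPA searches for the bindings.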

You can substitute as many variables as you want. For example, to find out if any servers expose the insecure "http" protocol you could write:

some i, j; input.servers[i].protocols[j] == "http"
+---+---+
| i | j |
+---+---+
| 3 | 0 |
+---+---+

If variables appear multiple times the assignments satisfy all of the expressions. For example, to find the ids of ports connected to public networks, you could write:

some i, j
id := input.ports[i].id
input.ports[i].network == input.networks[j].id
input.networks[j].public
+---+------+---+
| i |  id  | j |
+---+------+---+
| 1 | "p2" | 2 |
+---+------+---+

Providing good names for variables can be hard. If you only refer to the variable once, you can replace it with the special _ (wildcard variable) operator. Conceptually, each instance of _ is a unique variable.

input.servers[_].protocols[_] == "http"
true

Just like references that refer to non-existent fields or expressions that fail to match, if OPA is unable to find any variable assignments that satisfy all of the expressions, the result is undefined.

some i; input.servers[i].protocols[i] == "ssh"  # there is no assignment of i that satisfies the expression
undefined decision

Rules

Rego lets you encapsulate and reuse logic with rules. Rules are just if-then logic statements. Rules can either be complete or incremental.

Complete Rules

Complete rules are if-then statements that assign a single value to a variable. For example:

package example.rules

any_public_networks = true {  # is true if...
    net := input.networks[_]  # some network exists and...
    net.public                # it is public.
}

Every rule consists of a head and a body. In Rego we say the rule head is true if the rule body is true for some set of variable assignments. In the example above any_public_networks = true is the head and net := input.networks[_]; net.public is the body.

You can query for the value generated by rules just like any other value:

any_public_networks
true

All values generated by rules can be queried via the global data variable.

data.example.rules.any_public_networks
true

💡 You can query the value of any rule loaded into OPA by referring to it with an absolute path. The path of a rule is always: data.<package-path>.<rule-name>.

If you omit the = <value> part of the rule head the value defaults to true. You could rewrite the example above as follows without changing the meaning:

package example.rules

any_public_networks {
    net := input.networks[_]
    net.public
}

To define constants, omit the rule body. When you omit the rule body it defaults to true. Since the rule body is true, the rule head is always true/defined.

package example.constants

pi = 3.14

Constants defined like this can be queried just like any other values:

pi > 3
true

If OPA cannot find variable assignments that satisfy the rule body, we say that the rule is undefined. For example, if the input provided to OPA does not include a public network then any_public_networks will be undefined (which is not the same as false). Below, OPA is given a different set of input networks (none of which are public):

{
    "networks": [
        {"id": "n1", "public": false},
        {"id": "n2", "public": false}
    ]
}

any_public_networks
undefined decision

Incremental Rules

Incremental rules are if-then statements that generate a set of values and assign that set to a variable. For example:

package example.rules

public_network[net.id] {      # net.id is in the public_network set if...
    net := input.networks[_]  # some network exists and...
    net.public                # it is public.
}

In the example above public_network[net.id] is the rule head and net := input.networks[_]; net.public is the rule body. You can query for the entire set of values just like any other value:

public_network
[
  "net3",
  "net4"
]

You can iterate over the set of values by referencing the set elements with a variable:

some n; public_network[n]
+--------+-------------------+
|   n    | public_network[n] |
+--------+-------------------+
| "net3" | "net3"            |
| "net4" | "net4"            |
+--------+-------------------+

Lastly, you can check if a value exists in the set using the same syntax:

public_network["net3"]
"net3"

Logical OR

When you join multiple expressions together in a query you are expressing logical AND. To express logical OR in Rego you define multiple rules with the same name. Let’s look at an example.

Imagine you wanted to know if any servers expose protocols that give clients shell access. To determine this you could define a complete rule that declares shell_accessible to be true if any servers expose the "telnet" or "ssh" protocols:

package example.logical_or

default shell_accessible = false

shell_accessible = true {
    input.servers[_].protocols[_] == "telnet"
}

shell_accessible = true {
    input.servers[_].protocols[_] == "ssh"
}

{
    "servers": [
        {
            "id": "busybox",
            "protocols": ["http", "telnet"]
        },
        {
            "id": "web",
            "protocols": ["https"]
        }
    ]
}

shell_accessible
true

💡 The default keyword tells OPA to assign a value to the variable if all of the other rules with the same name are undefined.

When you use logical OR with incremental rules, each rule definition contributes to the set of values assigned to the variable. For instance, the example above could be modified to generate a set of servers that expose "telnet" or "ssh".

package example.logical_or

shell_accessible[server.id] {
    server := input.servers[_]
    server.protocols[_] == "telnet"
}

shell_accessible[server.id] {
    server := input.servers[_]
    server.protocols[_] == "ssh"
}

{
    "servers": [
        {
            "id": "busybox",
            "protocols": ["http", "telnet"]
        },
        {
            "id": "db",
            "protocols": ["mysql", "ssh"]
        },
        {
            "id": "web",
            "protocols": ["https"]
        }
    ]
}

shell_accessible
[
  "busybox",
  "db"
]
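
To make the "each rule contributes to the set" idea concrete, here is a minimal Go sketch of the same union. It is an analogy under stated assumptions: the server type and the exposes and shellAccessible functions are hypothetical, and the result is sorted only so that the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// server mirrors one element of input.servers.
type server struct {
	ID        string
	Protocols []string
}

// exposes reports whether s lists the given protocol.
func exposes(s server, protocol string) bool {
	for _, p := range s.Protocols {
		if p == protocol {
			return true
		}
	}
	return false
}

// shellAccessible returns the sorted IDs of servers exposing "telnet" or "ssh".
// Each loop plays the role of one rule body; both add to the same set.
func shellAccessible(servers []server) []string {
	set := map[string]struct{}{}
	for _, s := range servers { // first rule body: telnet
		if exposes(s, "telnet") {
			set[s.ID] = struct{}{}
		}
	}
	for _, s := range servers { // second rule body: ssh
		if exposes(s, "ssh") {
			set[s.ID] = struct{}{}
		}
	}
	ids := make([]string, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	servers := []server{
		{"busybox", []string{"http", "telnet"}},
		{"db", []string{"mysql", "ssh"}},
		{"web", []string{"https"}},
	}
	fmt.Println(shellAccessible(servers)) // [busybox db]
}
```

A server that exposed both protocols would still appear once, because the results form a set.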

Putting It Together

The sections above explain the core concepts in Rego. To put it all together let’s review the desired policy (in English):

1. Servers reachable from the Internet must not expose the insecure 'http' protocol.
2. Servers are not allowed to expose the 'telnet' protocol.

At a high level the policy needs to identify servers that violate some conditions. To implement this policy we could define rules called violation that generate a set of servers that are in violation.

For example:

package example

allow = true {                                      # allow is true if...
    count(violation) == 0                           # there are zero violations.
}

violation[server.id] {                              # a server is in the violation set if...
    some server
    public_server[server]                           # it exists in the 'public_server' set and...
    server.protocols[_] == "http"                   # it contains the insecure "http" protocol.
}

violation[server.id] {                              # a server is in the violation set if...
    server := input.servers[_]                      # it exists in the input.servers collection and...
    server.protocols[_] == "telnet"                 # it contains the "telnet" protocol.
}

public_server[server] {                             # a server exists in the public_server set if...
    some i, j
    server := input.servers[_]                      # it exists in the input.servers collection and...
    server.ports[_] == input.ports[i].id            # it references a port in the input.ports collection and...
    input.ports[i].network == input.networks[j].id  # the port references a network in the input.networks collection and...
    input.networks[j].public                        # the network is public.
}

some x; violation[x]
+-----------+--------------+
|     x     | violation[x] |
+-----------+--------------+
| "ci"      | "ci"         |
| "busybox" | "busybox"    |
+-----------+--------------+
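
As a cross-check on the result above, the same policy can be written out by hand in plain Go. This is a sketch for comparison only, not a substitute for OPA: the system type and the parse, contains and violations functions are hypothetical names, and the IDs are sorted so the output is deterministic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// system mirrors the JSON input used throughout this example.
type system struct {
	Servers []struct {
		ID        string   `json:"id"`
		Protocols []string `json:"protocols"`
		Ports     []string `json:"ports"`
	} `json:"servers"`
	Networks []struct {
		ID     string `json:"id"`
		Public bool   `json:"public"`
	} `json:"networks"`
	Ports []struct {
		ID      string `json:"id"`
		Network string `json:"network"`
	} `json:"ports"`
}

const exampleInput = `{
  "servers": [
    {"id": "app", "protocols": ["https", "ssh"], "ports": ["p1", "p2", "p3"]},
    {"id": "db", "protocols": ["mysql"], "ports": ["p3"]},
    {"id": "cache", "protocols": ["memcache"], "ports": ["p3"]},
    {"id": "ci", "protocols": ["http"], "ports": ["p1", "p2"]},
    {"id": "busybox", "protocols": ["telnet"], "ports": ["p1"]}
  ],
  "networks": [
    {"id": "net1", "public": false}, {"id": "net2", "public": false},
    {"id": "net3", "public": true}, {"id": "net4", "public": true}
  ],
  "ports": [
    {"id": "p1", "network": "net1"}, {"id": "p2", "network": "net3"},
    {"id": "p3", "network": "net2"}
  ]
}`

// parse decodes a JSON document into a system value.
func parse(doc string) system {
	var sys system
	if err := json.Unmarshal([]byte(doc), &sys); err != nil {
		panic(err)
	}
	return sys
}

// contains reports whether xs includes x.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// violations returns the sorted IDs of servers that expose "telnet", or that
// expose "http" while attached to a public network through one of their ports.
func violations(sys system) []string {
	publicNet := map[string]bool{}
	for _, n := range sys.Networks {
		publicNet[n.ID] = n.Public
	}
	publicPort := map[string]bool{}
	for _, p := range sys.Ports {
		publicPort[p.ID] = publicNet[p.Network]
	}
	var out []string
	for _, s := range sys.Servers {
		public := false
		for _, p := range s.Ports {
			public = public || publicPort[p]
		}
		if (public && contains(s.Protocols, "http")) || contains(s.Protocols, "telnet") {
			out = append(out, s.ID)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(violations(parse(exampleInput))) // [busybox ci]
}
```

The hand-written version needs explicit lookup tables and loops for the joins that the Rego rules express declaratively.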

Running OPA

This section explains how you can query OPA directly and interact with it on your own machine.

1. Download OPA

To get started download an OPA binary for your platform from GitHub releases:

On macOS (64-bit):

curl -L -o opa https://openpolicyagent.org/downloads/latest/opa_darwin_amd64

On Linux (64-bit):

curl -L -o opa https://openpolicyagent.org/downloads/latest/opa_linux_amd64

Windows users can obtain the OPA executable from GitHub Releases. The steps below are the same for Windows users except the executable name will be different.

Set permissions on the OPA executable:

chmod 755 ./opa

You can also download and run OPA via Docker. The latest stable image tag is openpolicyagent/opa:latest.

2. Try opa eval

The simplest way to interact with OPA is via the command line using the opa eval sub-command. opa eval is a Swiss Army knife that you can use to evaluate arbitrary Rego expressions and policies. opa eval supports a large number of options for controlling evaluation. Commonly used flags include:

Flag            Short  Description
--bundle        -b     Load a bundle file or directory into OPA. This flag can be repeated.
--data          -d     Load policy or data files into OPA. This flag can be repeated.
--input         -i     Load a data file and use it as input. This flag cannot be repeated.
--format        -f     Set the output format to use. The default is json and is intended for programmatic use. The pretty format emits more human-readable output.
--fail          n/a    Exit with a non-zero exit code if the query is undefined.
--fail-defined  n/a    Exit with a non-zero exit code if the query is not undefined.

For example:

input.json:

{
    "servers": [
        {"id": "app", "protocols": ["https", "ssh"], "ports": ["p1", "p2", "p3"]},
        {"id": "db", "protocols": ["mysql"], "ports": ["p3"]},
        {"id": "cache", "protocols": ["memcache"], "ports": ["p3"]},
        {"id": "ci", "protocols": ["http"], "ports": ["p1", "p2"]},
        {"id": "busybox", "protocols": ["telnet"], "ports": ["p1"]}
    ],
    "networks": [
        {"id": "net1", "public": false},
        {"id": "net2", "public": false},
        {"id": "net3", "public": true},
        {"id": "net4", "public": true}
    ],
    "ports": [
        {"id": "p1", "network": "net1"},
        {"id": "p2", "network": "net3"},
        {"id": "p3", "network": "net2"}
    ]
}

example.rego:

package example

allow = true {                                      # allow is true if...
    count(violation) == 0                           # there are zero violations.
}

violation[server.id] {                              # a server is in the violation set if...
    some server
    public_server[server]                           # it exists in the 'public_server' set and...
    server.protocols[_] == "http"                   # it contains the insecure "http" protocol.
}

violation[server.id] {                              # a server is in the violation set if...
    server := input.servers[_]                      # it exists in the input.servers collection and...
    server.protocols[_] == "telnet"                 # it contains the "telnet" protocol.
}

public_server[server] {                             # a server exists in the public_server set if...
    some i, j
    server := input.servers[_]                      # it exists in the input.servers collection and...
    server.ports[_] == input.ports[i].id            # it references a port in the input.ports collection and...
    input.ports[i].network == input.networks[j].id  # the port references a network in the input.networks collection and...
    input.networks[j].public                        # the network is public.
}

# Evaluate a trivial expression.
./opa eval '1*2+3'

# Evaluate a policy on the command line.
./opa eval -i input.json -d example.rego 'data.example.violation[x]'

# Evaluate a policy on the command line and use the exit code.
./opa eval --fail-defined -i input.json -d example.rego 'data.example.violation[x]'
echo $?
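
If you want to act on that exit code from a program rather than a shell, one option is to run the command with Go's os/exec package. The sketch below is an assumption-laden illustration: the exitCode helper is hypothetical, the ./opa path and file names are the ones used above, and a non-zero status is treated as "violations found" even though it could also indicate an evaluation error.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitCode runs the command and returns its exit status. An error is returned
// only when the command could not be started at all.
func exitCode(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return 0, err
}

func main() {
	code, err := exitCode("./opa", "eval", "--fail-defined",
		"-i", "input.json", "-d", "example.rego", "data.example.violation[x]")
	if err != nil {
		fmt.Println("could not run opa:", err)
		return
	}
	if code != 0 {
		// Assumption: with --fail-defined, non-zero means the query was defined.
		fmt.Println("violations found (or evaluation failed)")
	} else {
		fmt.Println("no violations")
	}
}
```

A CI job could use the same check to fail a build when the violation set is not empty.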

3. Try opa run (interactive)

OPA includes an interactive shell or REPL (Read-Eval-Print-Loop). You can use the REPL to experiment with policies and prototype new ones.

To start the REPL, run:

./opa run

When you enter statements in the REPL, OPA evaluates them and prints the result.

> true
true
> 3.14
3.14
> ["hello", "world"]
[
  "hello",
  "world"
]

Most REPLs let you define variables that you can reference later on. OPA allows you to do something similar. For example, you can define a pi constant as follows:

> pi := 3.14

Once “pi” is defined, you can query for the value and write expressions in terms of it:

> pi
3.14
> pi > 3
true

Quit out of the REPL by pressing Control-D or typing exit:

> exit

You can load policy and data files into the REPL by passing them on the command line. By default, JSON and YAML files are rooted under data.

./opa run input.json

Run a few queries to poke around the data:

> data.servers[0].protocols[1]
> data.servers[i].protocols[j]
> net := data.networks[_]; net.public

To set a data file as the input document in the REPL prefix the file path:

./opa run example.rego repl.input:input.json
> data.example.public_server[s]

💡 Prefixing file paths with a reference controls where the file is loaded under data. By convention, the REPL sets the input document that queries see by reading data.repl.input each time a statement is evaluated. See help input for details in the REPL.

Quit out of the REPL by pressing Control-D or typing exit:

> exit

4. Try opa run (server)

To integrate with OPA you can run it as a server and execute queries over HTTP. You can start OPA as a server with -s or --server:

./opa run --server

By default OPA listens for HTTP connections on 0.0.0.0:8181. See opa run --help for a list of options to change the listening address, enable TLS, and more.

Inside of another terminal use curl (or a similar tool) to access OPA’s HTTP API. When you query the /v1/data HTTP API you must wrap input data inside of a JSON object:

{
    "input": <value>
}

Create a copy of the input file for sending via curl:

cat <<EOF > v1-data-input.json
{
    "input": $(cat input.json)
}
EOF

Execute a few curl requests and inspect the output:

curl localhost:8181/v1/data/example/violation -d @v1-data-input.json -H 'Content-Type: application/json'
curl localhost:8181/v1/data/example/allow -d @v1-data-input.json -H 'Content-Type: application/json'

By default data.system.main is used to serve policy queries without a path. When you execute queries without providing a path, you do not have to wrap the input. If the data.system.main decision is undefined it is treated as an error:

curl localhost:8181 -i -d @input.json -H 'Content-Type: application/json'

You can restart OPA and configure it to use any decision as the default decision:

./opa run --server --set=default_decision=example/allow

Re-run the last curl command from above:

curl localhost:8181 -i -d @input.json -H 'Content-Type: application/json'

5. Try OPA as a Go library

OPA can be embedded inside Go programs as a library. The simplest way to embed OPA as a library is to import the github.com/open-policy-agent/opa/rego package.

import "github.com/open-policy-agent/opa/rego"

Call the rego.New function to create an object that can be prepared or evaluated:

r := rego.New(
    rego.Query("x = data.example.allow"),
    rego.Load([]string{"./example.rego"}, nil))

The rego.Rego type supports several options that let you customize evaluation. See the GoDoc page for details. After constructing a new rego.Rego object you can call PrepareForEval() to obtain an executable query. If PrepareForEval() fails, one of the options passed to the rego.New() call was invalid (e.g., a parse error or a compile error).

ctx := context.Background()
query, err := r.PrepareForEval(ctx)
if err != nil {
    // handle error
}

The prepared query object can be cached in-memory, shared across multiple goroutines, and invoked repeatedly with different inputs. Call Eval() to execute the prepared query.

bs, err := os.ReadFile("./input.json")
if err != nil {
    // handle error
}

var input interface{}

if err := json.Unmarshal(bs, &input); err != nil {
    // handle error
}

rs, err := query.Eval(ctx, rego.EvalInput(input))
if err != nil {
    // handle error
}

The policy decision is contained in the results returned by the Eval() call. You can inspect the decision and handle it accordingly:

// In this example we expect a single result (stored in the variable 'x').
fmt.Println("Result:", rs[0].Bindings["x"])

You can combine the steps above into a simple command-line program that evaluates policies and outputs the result:

main.go:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/open-policy-agent/opa/rego"
)

func main() {

	ctx := context.Background()

	// Construct a Rego object that can be prepared or evaluated.
	r := rego.New(
		rego.Query(os.Args[2]),
		rego.Load([]string{os.Args[1]}, nil))

	// Create a prepared query that can be evaluated.
	query, err := r.PrepareForEval(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Load the input document from stdin.
	var input interface{}
	dec := json.NewDecoder(os.Stdin)
	dec.UseNumber()
	if err := dec.Decode(&input); err != nil {
		log.Fatal(err)
	}

	// Execute the prepared query.
	rs, err := query.Eval(ctx, rego.EvalInput(input))
	if err != nil {
		log.Fatal(err)
	}

	// Do something with the result.
	fmt.Println(rs)
}

Run the code above as follows:

go run main.go example.rego 'data.example.violation' < input.json

Next Steps

Congratulations on making it through the introduction to OPA. If you made it this far you have learned the core concepts behind OPA’s policy language as well as how to get OPA and run it on your own.

If you have more questions about how to write policies in Rego, check out the Policy Language documentation and the Policy Reference page.

If you want to try OPA for a specific use case check out:

  • The Kubernetes page for how to use OPA as an admission controller in Kubernetes.
  • The Envoy page for how to use OPA as an external authorizer with Envoy.
  • The Terraform page for how to use OPA to validate Terraform plans.