Ringer Technical Documentation

Overview

Ringer is a database anonymisation and cloning service. It can automatically fetch a copy of your production database and fully anonymise it. From there, you can create unlimited independent copies of the database for development, testing and more.

Ringer runs as a single binary, typically on a dedicated server, with access to a ZFS disk large enough to fit your production database.

Getting started

To start, click Register at the top of this page to sign up for the beta. We will contact you to discuss your requirements (e.g. database size, identity provider, infrastructure platform). We'll give you all the resources you need to get Ringer set up in your infrastructure. We also offer support services and we can even host Ringer for you if you prefer.

Once you have Ringer running in your infrastructure, it's time to import your database. Visit the Ringer dashboard in your browser, log in and click Bases. Click Create new, enter a name to identify your database, and click Save. Once the database is created, click Edit to configure it for import.

Ringer currently supports automatic imports from Google Cloud SQL and Amazon RDS. We will be adding support for more providers soon. If your database is hosted on a different provider, get in touch! We can build support for your use case quickly. If you have a custom database setup (e.g. you are running your database cluster yourself), you can also manually push a copy of your database to Ringer.

For instructions on importing from an Amazon RDS or Aurora database, see here.

Once you've entered the details of your source database, click Save and then click Start import. Ringer will start importing your database in the background. When it is done, the status of your database in the dashboard will change to Ready.

At this point you can create clones of your database. Click Clones in the sidebar and then click Create new. Once the clone is created, you can use the configuration details shown or the Ringer CLI to connect to the clone. This clone is a full-size, independent, anonymised copy of your production database.

You can create as many clones of your database as you want. When you're done with them, just delete them.

Generating an initial anonymisation config

When Ringer imports a database it automatically anonymises all columns of all tables, with the exception of primary keys. Often there are columns that don't need anonymisation because they don't contain sensitive data. You will also probably want to customise the generation of data for columns, for example to generate realistic names and addresses. This is all done by editing the anonymisation configuration for the database.

You can see the default configuration for your database in the dashboard. Ringer automatically generates a basic configuration which skips foreign key columns and sets a seed for the random generator. It also resets all passwords for database roles. In the next section, we'll look at how to customise this config.
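
As a rough illustration, a generated configuration of this kind might look something like the sketch below. The table, column and role names (orders, user_id, app_user) and all values are hypothetical, and the file Ringer generates for your database will differ.

```toml
# Hypothetical sketch of a generated config; names and values are made up.

# Seed for the random generator.
seed = 184362742

# Foreign key columns are skipped.
[rules.main.orders]
user_id.skip = true

# Passwords for database roles are reset.
[roles.app_user]
password = "a-new-password"
```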

Skipping a column

Let's say you have a database "main", containing a table "users" with a "created_at" column that you don't need to anonymise. You can configure this to be skipped with the following config:

rules.main.users.created_at.skip = true

Customising anonymisation

By default, Ringer will anonymise columns based on their type only. For a text column, this means Ringer will generate random characters. Let's say your "users" table has a "name" column. To generate more realistic anonymised values, you can specify the following:

rules.main.users.name.string.generator = "person/name"

Anonymisation rules are keyed by their data type. In this case, that's string. We then specify the generator to use when anonymising values in this column. Ringer uses Polygen generators, which provide a powerful way of generating practically any form of data.

As well as generators, Ringer has config options for common use cases. For example, let's say your "users" table has an "access_token" column with a specific format such as "AT<8 digits>". You can generate values satisfying this format by specifying a regular expression:

rules.main.users.access_token.string.regex = 'AT\d{8}'
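
A note on quoting, which comes from TOML itself rather than from Ringer: single-quoted strings are literal, so backslashes in a regular expression can be written as-is. In a double-quoted string the backslash has to be escaped. The sketch below shows the same rule written both ways, with the second form commented out so that the key is only defined once.

```toml
[rules.main.users.access_token.string]
# Literal string (single quotes): the backslash is taken as-is.
regex = 'AT\d{8}'
# Equivalent basic string (double quotes): the backslash must be doubled.
# regex = "AT\\d{8}"
```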

As a final example, let's configure the "registered_at" and "metadata" columns. The former should be a timestamp between 2010 and 2030, and the latter should be a JSON object with the structure { "signup_source": "organic", "search_bucket": "default" }. At the same time, we'll show how to consolidate multiple rules together (just using standard TOML syntax).

[rules.main.users]
registered_at.timestamp = { after = "2010-01-01", before = "2030-01-01" }
metadata.json.object.keys = { signup_source = "organic", search_bucket = "default" }
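
Since this is plain TOML, the consolidated table above should parse to the same data as the fully dotted keys below. This is a sketch of the equivalence only. Use one form or the other in a real file, because TOML does not allow the same key to be defined twice.

```toml
# Expanded form of the [rules.main.users] table, one dotted key per value.
rules.main.users.registered_at.timestamp.after = "2010-01-01"
rules.main.users.registered_at.timestamp.before = "2030-01-01"
rules.main.users.metadata.json.object.keys.signup_source = "organic"
rules.main.users.metadata.json.object.keys.search_bucket = "default"
```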

For the full list of anonymisation rules, see the "Anonymisation configuration reference" section below.

Running your own Ringer server instance

The easiest way to use Ringer is with our hosted service. But if you want to run Ringer yourself, that's no problem. We will provide detailed instructions and plenty of help to get you set up. The Ringer server is a single binary with minimal runtime dependencies. It only needs access to a ZFS file system with enough space to store your databases.

Anonymisation configuration reference

Ringer's anonymisation configuration file is written in TOML and is made up of the following sections and fields:

`[rules]`

This section defines the anonymisation rules on a per-column basis. Each rule is keyed by the database name, table name, and finally the column name. For example, the rule for column "my_column" of table "my_table" in database "my_db" can be written as:

[rules.my_db.my_table.my_column]
# config here

Or alternatively:

rules.my_db.my_table.my_column.config_here = ...

As most anonymisation rules are nested underneath their type, it is common to use the following pattern:

[rules.my_db.my_table]
# for longer config
my_column_1.{type} = { ... }
# for shorter config
my_column_2.{type}.short_config = ...
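
To make the placeholder pattern concrete, here is a small sketch with {type} filled in using the date and int types from this reference. The column names signup_date and age are hypothetical. The inline table is kept on a single line, which is what TOML 1.0 requires.

```toml
[rules.my_db.my_table]
# Longer config: an inline table, written on one line.
signup_date.date = { after = "2020-01-01", before = "2030-01-01" }
# Shorter config: one dotted key per setting.
age.int.min = 18
age.int.max = 99
```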

`rules.{database}.{table}.{column}.skip`

If set to true, this column will be skipped during anonymisation. This is typically used for columns that don't contain sensitive data. The default value is false, meaning the column will be anonymised.

[rules.my_db.my_table.my_column]
skip = true

`[rules.{database}.{table}.{column}.string]`

This section defines anonymisation rules for string columns. All fields are optional.

[rules.my_db.my_table.my_column.string]
prefix = "some-prefix"
suffix = "some-suffix"
regex = "[a-z]+"
min_length = 5
max_length = 10

prefix and suffix specify strings to add to the start and end of the generated value. regex specifies a regular expression that the generated value must match. min_length and max_length specify the minimum and maximum length of the generated value, not counting the prefix and suffix.
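
As an illustration of how these fields might be combined, the sketch below targets a hypothetical order_ref column whose values should look like "ORD-" followed by eight generated characters. It assumes that setting min_length and max_length to the same value is accepted and yields a fixed length. Because the lengths exclude the prefix, the full value would be twelve characters long under that assumption.

```toml
# Hypothetical column; assumes equal min_length and max_length fix the length.
[rules.my_db.my_table.order_ref.string]
prefix = "ORD-"
min_length = 8
max_length = 8
```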

To generate domain-specific strings, you can use a generator. This is configured with the generator field, which should be the name of a Polygen generator. For example, to generate a random email address, you can write:

[rules.my_db.my_table.my_column.string]
generator = "internet/email"

For a full list of supported generators, see the Polygen docs.

`[rules.{database}.{table}.{column}.int]`

This section defines anonymisation rules for integer columns. All fields are optional.

[rules.my_db.my_table.my_column.int]
min = -50
max = 9999

min and max specify the minimum and maximum value of the generated integer.

`[rules.{database}.{table}.{column}.float]`

This section defines anonymisation rules for float columns. All fields are optional.

[rules.my_db.my_table.my_column.float]
min = -1.23
max = 4.56

min and max specify the minimum and maximum value of the generated float.

`[rules.{database}.{table}.{column}.numeric]`

This section defines anonymisation rules for numeric columns. All fields are optional.

[rules.my_db.my_table.my_column.numeric]
min = -1.23
max = 4.56

min and max specify the minimum and maximum value of the generated numeric value.

`[rules.{database}.{table}.{column}.bytes]`

This section defines anonymisation rules for byte columns. All fields are optional.

[rules.my_db.my_table.my_column.bytes]
min_length = 20
max_length = 75

min_length and max_length specify the minimum and maximum length of the generated value.

`[rules.{database}.{table}.{column}.timestamp]`

This section defines anonymisation rules for timestamp columns. All fields are optional.

[rules.my_db.my_table.my_column.timestamp]
before = "2030-01-01T00:00:00Z"
after = "2020-01-01T00:00:00Z"

before and after specify the maximum and minimum values of the generated timestamp.

`[rules.{database}.{table}.{column}.date]`

This section defines anonymisation rules for date columns. All fields are optional.

[rules.my_db.my_table.my_column.date]
before = "2030-01-01"
after = "2020-01-01"

before and after specify the maximum and minimum values of the generated date.

`[rules.{database}.{table}.{column}.time]`

Anonymisation for time columns is not currently configurable.

`[rules.{database}.{table}.{column}.interval]`

Anonymisation for interval columns is not currently configurable.

Literal values

To specify a literal value for a column, just set the column to that value. For example:

[rules.my_db.my_table]
my_column = "some-value"
# or
my_column = true
# or
my_column = 123

`[rules.{database}.{table}.{column}.json]`

This section defines anonymisation rules for JSON columns. The object field is mandatory. It has one mandatory field of its own, keys, which is a map of attribute names to literal values.

Nested JSON objects are not yet supported. Neither are randomly generated attribute names or values.

[rules.my_db.my_table.my_column.json]
object.keys = { my_field_1 = "some-value", my_field_2 = 123 }

`rules.{database}.{table}.{column}.enum`

This section defines anonymisation rules for enumerations. This is useful when a column can contain one of several specific values, for example if it stores the "state" of an entity in your domain. The enum field should be set to an array of literal values. Generated values will be picked randomly from this array.

[rules.my_db.my_table.my_column]
enum = ["red", "green", "blue"]

`seed`

This field specifies the seed for the random number generator. We recommend setting this value so that anonymisation is deterministic. This can enable Ringer to optimise database imports in some circumstances.

seed = 184362742

`[roles]`

This section specifies settings for database roles (also known as users). Role settings are keyed by the role name.

`roles.{role name}.password`

This field specifies the password for the role. Ringer will set this password during anonymisation. This ensures that any existing password is overwritten, and allows you to log in as that role when working with a clone.

[roles.my_user]
password = "the-password"
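
Putting the pieces of this reference together, a complete configuration file might look something like the sketch below. The table and column names are hypothetical, the generator names are the two used earlier in this document, and the combination is illustrative rather than a recommended setup.

```toml
# Top-level seed so that anonymisation is deterministic.
seed = 184362742

[rules.my_db.users]
# Not sensitive, so leave it as-is.
created_at.skip = true
# Realistic values from generators.
name.string.generator = "person/name"
email.string.generator = "internet/email"
# Typed rules with ranges.
age.int = { min = 18, max = 99 }
balance.numeric = { min = 0.0, max = 5000.0 }
signup_date.date = { after = "2020-01-01", before = "2030-01-01" }
# One value picked at random from the array.
status.enum = ["active", "suspended", "closed"]
# Literal value applied to the column.
country = "GB"
# Fixed JSON object.
metadata.json.object.keys = { signup_source = "organic" }

[roles.my_user]
password = "the-password"
```

The seed has to appear before the first table header. In TOML, any key that follows a header belongs to that table.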