Manage Expectations
An Expectation is a verifiable assertion about your data. Expectations make implicit assumptions about your data explicit, and they provide a flexible, declarative language for describing expected behavior. They can help you better understand your data and help you improve data quality.
Prerequisites
- Python version 3.8 to 3.11.
- An installation of GX 1.0.
- Recommended. A preconfigured Data Source and Data Asset connected to your data.
Create an Expectation
- Import the expectations module from the GX Core library:
import great_expectations.expectations as gxe
- Initialize an Expectation class with the required parameters for that Expectation:
expectation = gxe.ExpectColumnValuesToBeInSet(
    column="passenger_count", value_set=[1, 2, 3, 4, 5]
)
The expectations module contains the core Expectation classes available in GX. The parameters you provide when initializing an Expectation are determined by its Expectation class.
You can view available Expectations and the parameters they take in the Expectation Gallery.
Sample code:
import great_expectations.expectations as gxe

expectation = gxe.ExpectColumnValuesToBeInSet(
    column="passenger_count", value_set=[1, 2, 3, 4, 5]
)
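To make concrete what this Expectation asserts, the sketch below applies the same in-set condition to a plain Python list. It only illustrates the rule: it is not GX code and does not show how GX evaluates a Batch. The function name `values_outside_set` and the sample values are hypothetical.

```python
from typing import Iterable, List, Set


def values_outside_set(values: Iterable[int], value_set: Set[int]) -> List[int]:
    """Return, in order, the values that are not members of value_set."""
    return [value for value in values if value not in value_set]


# Hypothetical column contents, not taken from any real dataset.
passenger_count = [1, 2, 5, 6, 3, 0]
allowed = {1, 2, 3, 4, 5}

outside = values_outside_set(passenger_count, allowed)
assertion_holds = not outside  # True only when every value is allowed

print(assertion_holds, outside)
```

Written this way, the assumption "passenger counts are between 1 and 5" becomes an explicit check that either holds or reports the offending values.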
Test an Expectation
- Retrieve a Batch of data to test the Expectation against.
In this procedure the variable batch is your Batch of data.
- Get the Expectation to test. This could be a newly created Expectation, an Expectation retrieved from an Expectation Suite, or a pre-existing Expectation from your code.
In this procedure the variable expectation is your Expectation to test.
- Validate the Expectation against the Batch:
validation_result = batch.validate(expectation)
- Optional. Modify the Expectation and test it again.
- Optional. Add the Expectation to an Expectation Suite.
Expectations do not persist between Python sessions unless they are saved as part of an Expectation Suite.
Sample code:
import great_expectations as gx
import great_expectations.expectations as gxe

context = gx.get_context()
data_asset = context.get_datasource("my_datasource").get_asset("my_asset")
batch = ...  # retrieve a Batch of data from data_asset here

expectation = gxe.ExpectColumnValuesToBeInSet(
    column="passenger_count", value_set=[1, 2, 3, 4, 5]
)
validation_result = batch.validate(expectation)
Modify an Expectation
- Get the Expectation to modify. This could be a newly created Expectation that you wish to adjust, an Expectation retrieved from an Expectation Suite, or a pre-existing Expectation from your code.
In this procedure the variable expectation is the Expectation you're modifying.
- Modify the Expectation's attributes:
expectation.value_set = [1, 2, 3, 4, 5]
The specific attributes that can be modified correspond to the parameters used to initialize the Expectation. You can view available Expectations and the parameters they take in the Expectation Gallery.
- Optional. Test the modified Expectation against a Batch of data.
Repeat this step until the results from testing the Expectation correspond to the desired results for your specific use case and data.
- Optional. If the Expectation belongs to an Expectation Suite, save the changes to the Expectation Suite:
expectation.save()
expectation.save() explicitly updates the configuration of the Expectation in its Expectation Suite.
An Expectation Suite continues to use the Expectation's original values unless you save your modifications. However, you can test your modified Expectation without saving any changes to its Expectation Suite. This allows you to explicitly decide if you want to keep or discard your changes after testing.
The command expectation.save() fails if the Expectation is not part of an Expectation Suite.
Sample code:
import great_expectations as gx
import great_expectations.expectations as gxe
from great_expectations.core.expectation_suite import ExpectationSuite

context = gx.get_context()
suite = context.suites.add(ExpectationSuite(name="my_expectation_suite"))
expectation = suite.add_expectation(
    gxe.ExpectColumnValuesToBeInSet(column="passenger_count", value_set=[1, 2])
)
expectation.value_set = [1, 2, 3, 4, 5]
expectation.save()
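The relationship between an in-memory Expectation and the copy held by its Expectation Suite can be pictured with a small stand-in model. The classes below (`ToySuite`, `ToyExpectation`) are hypothetical and use only the standard library. They mimic the behavior described in this section (edits stay local until `save()`, and `save()` fails without a suite) and make no claim about how GX implements it or which exception GX raises.

```python
from copy import deepcopy
from typing import Dict, List, Optional


class ToySuite:
    """Hypothetical stand-in for an Expectation Suite: holds saved configurations."""

    def __init__(self) -> None:
        self.saved: Dict[str, List[int]] = {}


class ToyExpectation:
    """Hypothetical stand-in for an Expectation with one editable attribute."""

    def __init__(
        self, name: str, value_set: List[int], suite: Optional[ToySuite] = None
    ) -> None:
        self.name = name
        self.value_set = value_set
        self.suite = suite
        if suite is not None:
            # The suite keeps its own copy of the configuration.
            suite.saved[name] = deepcopy(value_set)

    def save(self) -> None:
        if self.suite is None:
            # Exception type chosen for this sketch only.
            raise RuntimeError("Expectation is not part of a suite")
        self.suite.saved[self.name] = deepcopy(self.value_set)


suite = ToySuite()
expectation = ToyExpectation("passenger_count_in_set", [1, 2], suite)

# Local change: available for testing immediately, not yet in the suite.
expectation.value_set = [1, 2, 3, 4, 5]
before_save = deepcopy(suite.saved["passenger_count_in_set"])

# Committing the change updates the suite's copy.
expectation.save()
after_save = suite.saved["passenger_count_in_set"]

print(before_save, after_save)
```

Separating the edit from the commit is what lets you test a change and then decide whether to keep or discard it.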
Customize an Expectation Class
- Choose and import a base Expectation class:
from great_expectations.expectations import ExpectColumnValuesToBeBetween
You can customize any of the core Expectation classes in GX. You can view the available Expectations and their functionality in the Expectation Gallery.
- Create a new Expectation class that inherits from the base Expectation class.
The core Expectations in GX have names that describe their functionality. When you create a customized Expectation class, you can give it a name that better reflects your specific use case:
class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
    ...
- Override the Expectation's attributes with new default values:
class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
    column: str = "passenger_count"
    min_value: int = 0
    max_value: int = 6
The attributes that can be overridden correspond to the parameters required by the base Expectation. You can look these up in the Expectation Gallery.
- Customize the rendering of the new Expectation when displayed in Data Docs:
class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
    column: str = "passenger_count"
    min_value: int = 0
    max_value: int = 6
    render_text: str = "There should be between **0** and **6** passengers."
The render_text attribute contains the text describing the customized Expectation when your results are rendered into Data Docs. You can format the text with Markdown syntax.
Sample code:
from great_expectations.expectations import ExpectColumnValuesToBeBetween


class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
    column: str = "passenger_count"
    min_value: int = 0
    max_value: int = 6
    render_text: str = "There should be between **0** and **6** passengers."
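The customization above is ordinary class inheritance in which the subclass overrides field defaults. The standard-library sketch below reproduces that pattern with dataclasses so it can be run without GX. `BetweenCheck` and `ValidPassengerCount` are hypothetical stand-ins, not GX classes, and the inclusive bounds in `accepts` are an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetweenCheck:
    """Hypothetical base class: the column must be given, bounds are optional."""

    column: str
    min_value: Optional[int] = None
    max_value: Optional[int] = None

    def accepts(self, value: int) -> bool:
        """Return True when value lies within the (inclusive) bounds."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


@dataclass
class ValidPassengerCount(BetweenCheck):
    """Subclass that only overrides defaults, mirroring the steps above."""

    column: str = "passenger_count"
    min_value: int = 0
    max_value: int = 6


# No arguments needed: the overridden defaults apply.
default_check = ValidPassengerCount()

# A default can still be replaced for a single instance.
other_column = ValidPassengerCount(column="occupied_seats")

print(default_check)
print(other_column.column, other_column.accepts(7))
```

Because only defaults change, the subclass can be created with no arguments while still accepting explicit values when one use case differs.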
Next steps
- Create Custom SQL Expectations
- Manage Expectation Suites