Create an Expectation
An Expectation is a verifiable assertion about your data. Expectations make implicit assumptions about your data explicit, and they provide a flexible, declarative language for describing expected behavior. They can help you better understand your data and help you improve data quality.
Prerequisites
- Procedure
- Sample code
-
Choose an Expectation to create.
GX comes with many built in Expectations to cover your data quality needs. You can find a catalog of these Expectations in the Expectation Gallery. When browsing the Expectation Gallery you can filter the available Expectations by the data quality issue they address and by the Data Sources they support. There is also a search bar that will let you filter Expectations by matching text in their name or docstring.
In your code, you will find the classes for Expectations in the
expectations
module:Pythonfrom great_expectations import expectations as gxe
-
Determine the Expectation's parameters
To determine the parameters your Expectation uses to evaluate data, reference the Expectation's entry in the Expectation Gallery. Under the Args section you will find a list of parameters that are necessary for the Expectation to be evaluated, along with the a description of the value that should be provided.
Parameters that indicate a column, list of columns, or a table must be provided when the Expectation is created. The value in these parameters is used to differentiate instances of the same Expectation class. All other parameters can be set when the Expectation is created or be assigned a dictionary lookup that will allow them to be set at runtime.
-
Optional. Determine the Expectation's other parameters
In addition to the parameters that are required for an Expectation to evaluate data all Expectations also support some standard parameters that determine how strictly Expectations are evaluated and permit the addition of metadata. In the Expectations Gallery these are found under each Expectation's Other Parameters section.
These parameters are:
Parameter Purpose meta
A dictionary of user-supplied meta-data to store with an Expectation. This dictionary can be used to add notes about the purpose and intended use of an Expectation. mostly
A special argument that allows for fuzzy validation of ColumnMapExpectation
s andMultiColumnMapExpectation
s based on a percentage of successfully validated rows. If the percentage is high enough, the Expectation will return asuccess
value oftrue
. -
Create the Expectation.
Using the Expectation class you picked and the parameters you determined when referencing the Expectation Gallery, you can create your Expectation.
- Preset parameters
- Runtime parameters
In this example the
ExpectColumnMaxToBeBetween
Expectation is created and all of its parameters are defined in advance while leavingstrict_min
andstrict_max
as their default values:Pythonpreset_expectation = gxe.ExpectColumnMaxToBeBetween(
column="passenger_count",
min_value=4,
max_value=6
)Runtime parameters are provided by passing a dictionary to the
expectation_parameters
argument of a Checkpoint'srun()
method.To indicate which key in the
expectation_parameters
dictionary corresponds to a given parameter in an Expectation you define a lookup as the value of the parameter when the Expectation is created. This is done by passing in a dictionary with the key$PARAMETER
when the Expectation is created. The value associated with the$PARAMETER
key is the lookup used to find the parameter in the runtime dictionary.In this example,
ExpectColumnMaxToBeBetween
is created for both thepassenger_count
and thefare
fields, and the values formin_value
andmax_value
in each Expectation will be passed in at runtime. To differentiate between the parameters for each Expectation a more specific key is set for finding each parameter in the runtimeexpectation_parameters
dictionary:Pythonpassenger_expectation = gxe.ExpectColumnMaxToBeBetween(
column="passenger_count",
min_value={"$PARAMETER": "expect_passenger_max_to_be_above"},
max_value={"$PARAMETER": "expect_passenger_max_to_be_below"}
)
fare_expectation = gxe.ExpectColumnMaxToBeBetween(
column="fare",
min_value={"$PARAMETER": "expect_fare_max_to_be_above"},
max_value={"$PARAMETER": "expect_fare_max_to_be_below"}
)The runtime
expectation_parameters
dictionary for the above example would look like:Pythonruntime_expectation_parameters = {
"expect_passenger_max_to_be_above": 4,
"expect_passenger_max_to_be_below": 6,
"expect_fare_max_to_be_above": 10.00,
"expect_fare_max_to_be_below": 500.00
}
# All Expectations are found in the `expectations` module:
from great_expectations import expectations as gxe
# This Expectation has all values set in advance:
preset_expectation = gxe.ExpectColumnMaxToBeBetween(
column="passenger_count",
min_value=1,
max_value=6
)
# In this case, two Expectations are created that will be passed
# parameters at runtime, and unique lookups are defined for each
# Expectations' parameters.
passenger_expectation = gxe.ExpectColumnMaxToBeBetween(
column="passenger_count",
min_value={"$PARAMETER": "expect_passenger_max_to_be_above"},
max_value={"$PARAMETER": "expect_passenger_max_to_be_below"}
)
fare_expectation = gxe.ExpectColumnMaxToBeBetween(
column="fare",
min_value={"$PARAMETER": "expect_fare_max_to_be_above"},
max_value={"$PARAMETER": "expect_fare_max_to_be_below"}
)
# A dictionary containing the parameters for both of the above
# Expectations would look like:
runtime_expectation_parameters = {
"expect_passenger_max_to_be_above": 4,
"expect_passenger_max_to_be_below": 6,
"expect_fare_max_to_be_above": 10.00,
"expect_fare_max_to_be_below": 500.00
}