Data Validation using Pydantic Models
Validate data, prevent script failures.
In the realm of automation, scripts often thrive on the variables they receive. These variables determine the actions the script will perform. However, if a script encounters a variable in a format or data type it doesn't expect, it might throw an error with a message that's about as clear as mud. This is where data validation comes into play.
Validating the data passed to a script is like giving it a road map to success. It ensures that the script knows what to expect and how to handle it. Whether the data is coming from another script or an end device, validation helps prevent those cryptic error messages and keeps your automation journey smooth sailing.
What is Data Validation?
Data validation is like the gatekeeper of your data world—it's all about ensuring that the data you're dealing with is accurate, reliable, and fits the requirements of whatever you're trying to do with it. Think of it as quality control for your data before you start using it in your programs or analyses. There are various ways to validate data depending on what you need it for and what rules it needs to follow. And that's where pydantic swoops in to save the day!
In this post, we'll dive into how pydantic can be your trusty sidekick in the world of data validation. We'll explore how it works and why it's such a handy tool to have in your toolkit.
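Before diving into the firewall example, here's the smallest possible taste of what that looks like in practice (the model and values below are purely illustrative):
from pydantic import BaseModel, ValidationError

class Interface(BaseModel):
    name: str
    mtu: int  # must be (coercible to) an integer

try:
    Interface(name="Ethernet1/1", mtu="jumbo")
except ValidationError as exc:
    # pydantic tells us exactly which field failed and why
    print(exc)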
Example
Imagine you're tasked with automating the process of adding network objects to a firewall, specifically a Palo Alto Networks firewall. These network objects could represent things like IP addresses, subnets, or ranges of addresses.
Here's a snippet of what that might look like:
from rich import print
import requests
import json
data = [
    {"name": "test1", "ip": "1.1.1.1/32", "type": "ip-netmask"},
    {"name": "test2", "ip": "google.com", "type": "fqdn"},
    {"name": "test3", "ip": "1.1.1.30-1.1.1.20", "type": "ip-range"},
]

for obj in data:
    # One REST call per network object
    url = f"https://192.168.1.41:443/restapi/v10.1/Objects/Addresses?location=vsys&vsys=vsys1&name={obj['name']}"
    payload = json.dumps({
        "entry": [
            {
                "@name": obj["name"],
                obj["type"]: obj["ip"],
            }
        ]
    })
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'X-PAN-KEY': 'LUFRPT05SDExaFNseXkwZDZtUk9kNmRxYnhhWFAySUk9Vm8yQThKYVdNYzhzdGNMTkxzZlQxSC85SDhEUEkwWVBrajdKTStYUGZrQ3hpYkUrRnFBN3JtT1BWdnRKQjhxMA==',
        'Cookie': 'PHPSESSID=db0278ee49c9ace2f10e9cdd667aaa36'
    }
    # verify=False skips TLS certificate verification -- acceptable for a lab firewall only
    response = requests.request("POST", url, headers=headers, data=payload, verify=False)
    print(response.text)
Executing the above script results in a partial failure. The network objects test1 and test2 are created successfully, but when it comes to test3, things take a turn for the worse. The firewall refuses to create the network object and throws an error message that looks something like this:
{"code":3,"message":"Invalid
Object","details":[{"@type":"CauseInfo","causes":[{"code":12,"module":"panui_mgmt","description":"Invalid Object: test3
-> ip-range 1.1.1.30-1.1.1.20 range start IP is higher than range end IP. test3 -> ip-range is invalid."}]}]}
This error message is the firewall's way of saying, "Hey, I can't work with this! The start of an IP range has to be lower than the end." It's a clear indication that the data being passed to the firewall doesn't meet its requirements.
So, while our script may have partially succeeded in creating some network objects, it ultimately falls short due to the invalid data.
The Solution
To steer clear of those pesky partial failures, it's crucial to validate our dataset before it even gets near our script—especially when we're handing it off to the Palo Alto Networks API to create network objects.
Enter pydantic, our trusty ally in the world of data validation. We can craft a pydantic model to ensure our data makes the grade before it ever interacts with the script. Pydantic isn't just limited to defining types; it's also adept at performing conditional checks on our data. This means we can set up rules and conditions that our data must meet in order to pass validation. It's like having a built-in guardrail to ensure our data stays on the right track. Let's explore how we can harness this powerful feature to further enhance our data validation process.
Let's delve into the process of defining a model and setting up checks for the same dataset we examined in the previous example. This hands-on approach will give us a clearer understanding of how we can leverage pydantic's capabilities to ensure our data meets our criteria.
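Here's a minimal sketch of what such a model might look like, assuming Pydantic v1 syntax (the FQDN regex, validator name, and exact error wording are illustrative choices rather than the only way to do it). Each Address entry checks that its ip value actually matches the type it claims to be, and a thin NetworkAddresses wrapper validates the whole list. If you're on Pydantic v2, model_validator is the equivalent hook and the error output will look slightly different.
import ipaddress
import re
from typing import List

from pydantic import BaseModel, ValidationError, root_validator


class Address(BaseModel):
    name: str
    ip: str
    type: str

    @root_validator
    def check_value_matches_type(cls, values):
        ip, kind = values.get("ip"), values.get("type")
        if kind == "ip-netmask":
            # Raises ValueError with a descriptive message for bad networks
            ipaddress.ip_network(ip, strict=False)
        elif kind == "fqdn":
            # Deliberately simple FQDN check: letters, digits, dots and hyphens only
            if not re.fullmatch(r"[A-Za-z0-9.-]+", ip):
                raise ValueError(f"{ip} Invalid FQDN")
        elif kind == "ip-range":
            start, end = ip.split("-")
            if ipaddress.ip_address(start) >= ipaddress.ip_address(end):
                raise ValueError(f"Start ip - {start} must be less than end ip - {end}.")
        return values


class NetworkAddresses(BaseModel):
    addresses: List[Address]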
To put our data through the validation wringer, we'll attempt to initialize the class we defined with our dataset. Brace yourself, though—I've purposely sprinkled a few more errors into the dataset below for demonstration purposes. Let's see how our pydantic model handles the challenge!
data = [
    {"name": "test1", "ip": "1.1.1.300/32", "type": "ip-netmask"},
    {"name": "test2", "ip": "*.paloaltonetworks.com", "type": "fqdn"},
    {"name": "test3", "ip": "1.1.1.30-1.1.1.20", "type": "ip-range"},
]

try:
    output = NetworkAddresses(addresses=data)
    print(output)
except ValidationError as e:
    print(e)
Running the validation now produces the error messages below.
ValidationError(
model='NetworkAddresses',
errors=[
{'loc': ('addresses', 0, '__root__'), 'msg': "'1.1.1.300/32' does not appear to be an IPv4 or IPv6 network", 'type': 'value_error'},
{'loc': ('addresses', 1, '__root__'), 'msg': ' *.paloaltonetworks.com Invalid FQDN', 'type': 'value_error'},
{'loc': ('addresses', 2, '__root__'), 'msg': 'Start ip - 1.1.1.30 must be less than end ip - 1.1.1.20.', 'type': 'value_error'}
]
)
Looking closely at the error messages, each one clearly indicates the location of the error (0 indicating the first object in our list of data) along with the same descriptive error message we defined in our class.
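To tie this back to the original script, one way to use the model is as a gate in front of the API calls: validate the whole dataset first, and only loop over the firewall requests if everything passed. The sketch below reuses the data, headers, json, and requests pieces from the first snippet, so those details are assumptions carried over from there:
# data, headers, json and requests are the same as in the first snippet
try:
    validated = NetworkAddresses(addresses=data)
except ValidationError as e:
    print(e)
    raise SystemExit("Dataset failed validation - nothing was sent to the firewall")

# Only reached when every entry passed validation
for obj in validated.addresses:
    url = f"https://192.168.1.41:443/restapi/v10.1/Objects/Addresses?location=vsys&vsys=vsys1&name={obj.name}"
    payload = json.dumps({"entry": [{"@name": obj.name, obj.type: obj.ip}]})
    response = requests.request("POST", url, headers=headers, data=payload, verify=False)
    print(response.status_code, response.text)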
Conclusion
The importance of data validation cannot be overstated. By ensuring our data is thoroughly validated, we greatly reduce the risk of encountering partial failures in our scripts. Furthermore, catching errors in our data early on allows us to address them proactively, preventing potential headaches down the line. So remember, when it comes to scripting success, thorough data validation is your best friend.