Spheres of (Coding) Validation

Aug 27, 2024


Determining Who is Responsible for Validation in a Secure, Scalable, Multi-Tenant SaaS Application

From a 30,000-foot view, there are three layers to any SaaS application:

Client Application
Middleware
Data Store

When I started working on SaaS applications, around 2005, the frameworks behind many of the easier implementations available to architects and developers today didn’t really exist. The website had a lot of JavaScript code that handled all sorts of things that would never be done on the client today, including deciding what a user was able to see and do. With MVC frameworks that let the middleware handle all the logic for what a user can and cannot see, applications can be far more secure. A bad actor can no longer use developer tools to modify a value and grant themselves more rights than they should have.

Broadly, here are the best practices for what each layer should be responsible for, and why:

Client Application

The client application should have no logic in it apart from data validation. Got a form that needs to be filled out? The application should have enough logic to validate the data:

length of strings
format of email addresses, phone numbers, postal codes, etc.

That should be all the client application does. No business logic. It’s not needed.
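In a browser these checks would normally be written in JavaScript or TypeScript. To keep every example here in one language, the sketch below expresses the same kind of rules in Python. The field names, length limits, and regular expressions are illustrative assumptions (real email and phone formats are messier than any short pattern). The point is that the client only checks shape and length, never business rules.

```python
import re

# Hypothetical, deliberately simple format patterns (assumptions for illustration).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
US_ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
PHONE_RE = re.compile(r"\+?[0-9 ()-]{7,20}")

# Hypothetical form rules: field name -> (max length, optional format pattern).
FORM_RULES = {
    "name": (100, None),
    "email": (254, EMAIL_RE),
    "phone": (20, PHONE_RE),
    "postal_code": (10, US_ZIP_RE),
}


def validate_form(form: dict[str, str]) -> dict[str, str]:
    """Return a map of field -> error message; an empty map means the form is valid."""
    errors: dict[str, str] = {}
    for field, (max_len, pattern) in FORM_RULES.items():
        value = form.get(field, "").strip()
        if not value:
            errors[field] = "required"
        elif len(value) > max_len:
            errors[field] = f"must be at most {max_len} characters"
        elif pattern is not None and not pattern.fullmatch(value):
            errors[field] = "invalid format"
    return errors
```

For example, a form whose email is `ada@example` (no domain suffix) comes back with a single `invalid format` error on `email`, and nothing about whether this user is *allowed* to submit the form is decided here.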

Middleware

The code between the UI that users interact with and the backend data store is where the most complex validation should happen.

Data Validation

Wait, wasn’t this done in the UI? Yes. However, the best practice is to revalidate the data before sending it to the data store. It’s possible that something other than the UI you have developed is sending the data. Always validate the data. Is the email in the correct format? Is the GUID valid?
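A minimal sketch of that revalidation in the middleware, assuming a hypothetical "update contact" endpoint whose payload carries a `contact_id` GUID and an `email`. It uses Python's standard `uuid.UUID` to check the GUID, and it also rejects fields the endpoint does not expect, since a request that did not come from your UI may carry extras such as a forged role.

```python
import re
import uuid

# Same simple email shape as the client check; an assumption, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


class ValidationError(ValueError):
    """Raised when a request payload fails server-side validation."""


def parse_guid(value: object) -> uuid.UUID:
    """Return the value as a UUID, or raise ValidationError if it is not one."""
    try:
        return uuid.UUID(value)  # raises ValueError/TypeError/AttributeError on bad input
    except (ValueError, TypeError, AttributeError) as exc:
        raise ValidationError(f"invalid GUID: {value!r}") from exc


def revalidate_update_contact(payload: dict) -> dict:
    """Re-check what the UI was supposed to check, plus server-only rules.

    `contact_id` and `email` are hypothetical field names for this sketch.
    Returns a cleaned copy of the payload that is safe to pass to the data store.
    """
    contact_id = parse_guid(payload.get("contact_id"))
    email = str(payload.get("email", "")).strip()
    if len(email) > 254 or not EMAIL_RE.fullmatch(email):
        raise ValidationError("invalid email")
    # Reject anything the endpoint does not expect, e.g. a crafted "role": "admin".
    unexpected = set(payload) - {"contact_id", "email"}
    if unexpected:
        raise ValidationError(f"unexpected fields: {sorted(unexpected)}")
    return {"contact_id": contact_id, "email": email}
```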

Validate the Auths

“Auths” covers both authentication and authorization. Is the user’s authentication still valid? If so, is the user authorized? That last question requires ticking several boxes:

Action Authorization

It is a basic tenet of security that user actions must be validated at every step. The first question to answer is, “Does the user have the authorization to do this action?” Can they look at this object? Can they manipulate this object? However, best practices dictate that we go beyond the basics. If a user can look at an object, are they authorized to look at every property of the object? Is there data on the object that they are not allowed to see? If so, then it should never be delivered to the client for that user.
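A sketch of these checks as a single middleware guard, under several assumptions: the `User` shape, the `manager` and `tenant` roles, the `PERMISSIONS` table, and the `org_id` used to keep one SaaS customer’s data away from another’s are all hypothetical. It answers the questions in order: is the session still valid, does the object belong to the user’s organization, and may this role perform this action?

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    org_id: str              # hypothetical: the SaaS customer the user belongs to
    role: str                # hypothetical roles: "manager" or "tenant" (apartment renter)
    session_expires: datetime


class NotAuthenticated(Exception):
    pass


class Forbidden(Exception):
    pass


# Hypothetical role -> permitted actions; a real system would load this from configuration.
PERMISSIONS = {
    "manager": {"view", "edit", "delete"},
    "tenant": {"view"},
}


def authorize(user: User, action: str, obj_org_id: str, now: Optional[datetime] = None) -> None:
    """Raise unless the user is authenticated and allowed to perform `action` on the object."""
    now = now or datetime.now(timezone.utc)
    # Authentication: is the session still valid?
    if now >= user.session_expires:
        raise NotAuthenticated("session expired")
    # Multi-tenant isolation: never touch another organization's objects.
    if obj_org_id != user.org_id:
        raise Forbidden("object belongs to another organization")
    # Action authorization: does this role allow this action?
    if action not in PERMISSIONS.get(user.role, set()):
        raise Forbidden(f"{user.role} may not {action}")
```

Property-level checks (which fields of an allowed object this user may see) are a separate step, applied when the response is built.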

Authorized Data

Take an application that handles all the administration of an apartment complex. Within that application there would be Tenants, and the Tenant record might include a field indicating how happy management is with that tenant. That information should be available only to the apartment’s management, not to the Tenant. If the GetTenantAPI returned that information and the UI simply didn’t display it to the tenant, a savvy tenant could still use DevTools to see everything the call to the GetTenantAPI returns, whether it is displayed in the UI or not. By having the GetTenantAPI determine whether the user initiating the call is a tenant or a manager, the middleware can decide whether to include that data. It would be better still to have separate APIs for managers and tenants that return separate models. However, real life means that many applications add data that is not screened in this manner until something forces the change.
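One way to follow that advice is to build each response from an explicit per-role model rather than from the stored record. In the sketch below, `TenantRecord`, `TenantSelfView`, the `satisfaction_note` field, and the `get_tenant` handler are hypothetical stand-ins for the GetTenantAPI described above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TenantRecord:
    """Full row as stored; never serialized directly to a tenant."""
    tenant_id: str
    name: str
    unit: str
    lease_end: str
    satisfaction_note: str  # hypothetical field: management's view of the tenant, managers only


@dataclass(frozen=True)
class TenantSelfView:
    """Allow-list of what a tenant may see about themselves."""
    tenant_id: str
    name: str
    unit: str
    lease_end: str


def get_tenant(record: TenantRecord, caller_role: str, caller_id: str) -> dict:
    """Hypothetical GetTenantAPI handler: the response model depends on who is asking."""
    if caller_role == "manager":
        return asdict(record)
    if caller_role == "tenant" and caller_id == record.tenant_id:
        view = TenantSelfView(record.tenant_id, record.name, record.unit, record.lease_end)
        return asdict(view)
    raise PermissionError("caller may not view this tenant")
```

Because the tenant’s view is an allow-list, a column added to `TenantRecord` later stays hidden from tenants until someone deliberately adds it to `TenantSelfView`.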

Data Store

The data store should have the least amount of validation code, but it still should have some. There are two types of validation that the data store should handle:

Data Integrity

If the data store is forgoing foreign keys as a means of improving scalability, then it is up to the data store itself to verify that the relationships in the data being manipulated are valid. For example, if an update is being made to a table that is three levels deep, then an identifier for all three levels should be passed in and the relationship chain verified before the change is allowed. If the chain is not valid, the data must not be changed and an error should be thrown.
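SQLite has no stored procedures, so the sketch below plays that role with a Python function over a `sqlite3` database. The `property` / `building` / `unit` hierarchy is a hypothetical three-level example with no FOREIGN KEY constraints; the function verifies the full chain of identifiers in one query and refuses the update if any link is wrong.

```python
import sqlite3


class IntegrityViolation(Exception):
    pass


def create_hierarchy_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical three-level hierarchy, intentionally without FOREIGN KEY constraints.
    conn.executescript(
        """
        CREATE TABLE property (property_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE building (building_id INTEGER PRIMARY KEY, property_id INTEGER, name TEXT);
        CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, building_id INTEGER, label TEXT);
        """
    )


def update_unit_label(conn: sqlite3.Connection, property_id: int, building_id: int,
                      unit_id: int, label: str) -> None:
    """Change a unit's label only if property -> building -> unit is a real chain."""
    with conn:  # one transaction: committed on success, rolled back on exception
        row = conn.execute(
            """
            SELECT 1
            FROM unit u
            JOIN building b ON b.building_id = u.building_id
            JOIN property p ON p.property_id = b.property_id
            WHERE u.unit_id = ? AND b.building_id = ? AND p.property_id = ?
            """,
            (unit_id, building_id, property_id),
        ).fetchone()
        if row is None:
            raise IntegrityViolation("unit/building/property chain is not valid")
        conn.execute("UPDATE unit SET label = ? WHERE unit_id = ?", (label, unit_id))
```

In a production database the check and the update would live in one stored procedure, and under concurrent writes you may also need locking, or a single conditional UPDATE, so the chain cannot change between the check and the write.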

User Authorization

To be clear, this type of authorization is not related to the action the user is taking, only to whether the user has access to the object at all. The middleware is responsible for determining if the user can do what they are attempting to do. The data store is simply responsible for determining whether the user has access to the object or not. If they do not, then an error should be thrown.

Yes, I know that the middleware has already done this, and it seems redundant. Having said that, I also know that there will sometimes be a need to call stored procedures through something other than the API. In those instances, the stored procedures should verify that the user information passed in is valid and refuse to proceed if the authorization check fails.
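Again using Python over `sqlite3` as a stand-in for a stored procedure, here is a sketch of the object-level access check. The `lease` and `lease_access` tables and the `set_rent` procedure are hypothetical. The check only asks whether this user has any access to this lease; whether they may change the rent remains the middleware’s job.

```python
import sqlite3


class AccessDenied(Exception):
    pass


def create_lease_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: which users may touch which leases at all.
    conn.executescript(
        """
        CREATE TABLE lease (lease_id INTEGER PRIMARY KEY, rent_cents INTEGER);
        CREATE TABLE lease_access (
            user_id TEXT, lease_id INTEGER, PRIMARY KEY (user_id, lease_id)
        );
        """
    )


def _require_access(conn: sqlite3.Connection, user_id: str, lease_id: int) -> None:
    """Data-store check: does this user have any access to this object? (Not: may they do X.)"""
    row = conn.execute(
        "SELECT 1 FROM lease_access WHERE user_id = ? AND lease_id = ?", (user_id, lease_id)
    ).fetchone()
    if row is None:
        raise AccessDenied(f"user {user_id} has no access to lease {lease_id}")


def set_rent(conn: sqlite3.Connection, user_id: str, lease_id: int, rent_cents: int) -> None:
    """Stand-in for a stored procedure; every caller must pass the acting user's id."""
    with conn:
        _require_access(conn, user_id, lease_id)
        conn.execute("UPDATE lease SET rent_cents = ? WHERE lease_id = ?", (rent_cents, lease_id))
```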

Beyond Data Integrity

It is my firm belief that, especially with important data, every change should be logged in a way that makes it easy to see who did what to it. At the simplest level, I create a history table for each object that contains all the columns in the source table, along with a HistoryId, the date the data was archived, and information about the user who made the change.
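A minimal sketch of that history-table pattern, assuming a hypothetical `tenant` table and a `tenant_history` table with the same columns plus `history_id`, `archived_at`, and `changed_by`. The current row is copied into history and then updated inside one transaction, so there is never a change without its archive record.

```python
import sqlite3
from datetime import datetime, timezone


def create_tenant_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE tenant (tenant_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
        -- Same columns as tenant, plus a history id, archive date, and who made the change.
        CREATE TABLE tenant_history (
            history_id INTEGER PRIMARY KEY AUTOINCREMENT,
            tenant_id INTEGER, name TEXT, unit TEXT,
            archived_at TEXT NOT NULL,
            changed_by TEXT NOT NULL
        );
        """
    )


def update_tenant_unit(conn: sqlite3.Connection, tenant_id: int, new_unit: str,
                       changed_by: str) -> None:
    """Archive the current row, then apply the change, in a single transaction."""
    archived_at = datetime.now(timezone.utc).isoformat()
    with conn:
        cur = conn.execute(
            """
            INSERT INTO tenant_history (tenant_id, name, unit, archived_at, changed_by)
            SELECT tenant_id, name, unit, ?, ? FROM tenant WHERE tenant_id = ?
            """,
            (archived_at, changed_by, tenant_id),
        )
        if cur.rowcount == 0:  # nothing was archived, so the tenant does not exist
            raise LookupError(f"tenant {tenant_id} not found")
        conn.execute("UPDATE tenant SET unit = ? WHERE tenant_id = ?", (new_unit, tenant_id))
```

A trigger could do the copy automatically, but in SQLite a trigger has no notion of the acting application user unless that user is stored on the row itself, which is why this sketch passes `changed_by` explicitly.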

In some cases where it is necessary to track such changes for legal reasons, I might go as far as to store the data in a table with two additional fields:

A JSON field that contains the data from the source table before the change was made
A field containing a SHA256 hash of the JSON data and information about the user making the change, with the date/time stamp of the change included in the hashed input.
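A sketch of the hashed audit entry, assuming the change timestamp and the user information are stored next to the JSON (as in the history table described earlier) so the hash can be recomputed later. The field names are hypothetical. The canonical JSON encoding (`sort_keys=True`, compact separators) matters: the same data must always serialize to the same bytes, or verification will fail for reasons that have nothing to do with tampering.

```python
import hashlib
import json
from datetime import datetime, timezone


def _canonical(obj: dict) -> str:
    # Stable serialization: sorted keys, no incidental whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def make_audit_entry(before: dict, user: dict, changed_at: datetime) -> dict:
    """Build an audit row: pre-change data as JSON plus SHA-256 over time, data, and user."""
    before_json = _canonical(before)
    user_json = _canonical(user)
    stamp = changed_at.astimezone(timezone.utc).isoformat()
    digest = hashlib.sha256(f"{stamp}|{before_json}|{user_json}".encode("utf-8")).hexdigest()
    return {"before_json": before_json, "changed_by": user_json,
            "changed_at": stamp, "row_hash": digest}


def verify_audit_entry(entry: dict) -> bool:
    """Recompute the hash from the stored columns; False means the row was altered."""
    payload = f"{entry['changed_at']}|{entry['before_json']}|{entry['changed_by']}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest() == entry["row_hash"]
```

A plain hash detects careless or naive edits, but anyone with write access to the table could recompute it. Keying it with an HMAC (`hmac.new(key, msg, hashlib.sha256)`) whose key lives outside the database, or chaining each row’s hash to the previous row’s, makes deliberate tampering detectable as well.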