VCDX Design Factors - Gathering Information, Defining Requirements and Identifying Risks, Constraints and Assumptions

In my previous blog post I talked about the different design factors that influence the design decisions made throughout an architectural design.

This article is part of my VCDX blog article series that can be found here.

In this article I am going to talk about how to determine these design factors, starting with gathering the information needed to define the requirements. Once the requirements are defined, we use them to identify risks and constraints, and we make assumptions wherever a requirement leaves room for interpretation.

Before we continue we need to know what the following terms mean:

Recovery Time Objective (RTO) - is a measure of the maximum acceptable time to recover an environment after a disaster / failure. This is typically measured in hours or days. The shorter the RTO, the faster the services must be returned from a failed to a recovered state.

Recovery Point Objective (RPO) - is a measure of how much data loss is allowed in case of a disaster / failure. The RPO should dictate how often data is replicated or backed up. If the data is backed up every 2 days, the achievable RPO is 48 hours. If the data is replicated every 8 hours, the achievable RPO is 8 hours.

These two together (RTO & RPO) define how quickly a failure is recovered and how much data may be lost in case of a disaster / failure.

A short RPO means less data loss, but it is possibly more expensive because backups / replications need to happen more often, which consumes more storage and bandwidth.

A short RTO means that recovery is fast, and this may be expensive because of the way the solution has to be designed.
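
The relationship between the backup or replication interval and the RPO can be sketched in a few lines of Python. This is only a minimal illustration, not a sizing tool: the function names and the 24 hour target are hypothetical, and the sketch assumes that a failure can happen just before the next protection cycle runs.

```python
from datetime import timedelta


def worst_case_data_loss(protection_interval: timedelta) -> timedelta:
    """Worst-case data loss equals the time between two protection points.

    Assumption: the failure happens just before the next backup or
    replication cycle, and the cycle itself completes instantly.
    """
    return protection_interval


def meets_rpo(protection_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case data loss stays within the agreed RPO."""
    return worst_case_data_loss(protection_interval) <= rpo


def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    """True when the estimated recovery time stays within the agreed RTO."""
    return estimated_recovery_time <= rto


# Backups every 2 days: up to 48 hours of data can be lost.
backup_every_two_days = timedelta(days=2)
# Replication every 8 hours: up to 8 hours of data can be lost.
replication_every_8h = timedelta(hours=8)

# Hypothetical business target of 24 hours.
print(meets_rpo(backup_every_two_days, rpo=timedelta(hours=24)))  # False
print(meets_rpo(replication_every_8h, rpo=timedelta(hours=24)))   # True
```

A check like this is mainly useful later in the process, when a proposed backup or replication schedule has to be validated against the RPO that the business agreed to.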

Service Level Agreement (SLA) - is an agreement between the IT department and the Business Units (BU). This agreement covers the availability, performance, responsiveness, RTO/RPO and resiliency (Fault Tolerance (FT)) of the services IT provides (in a given design).

You as the architect will use this information to determine the services offered and the rules around these services. When determining the requirements, risks, constraints and assumptions, you design services with these SLAs in mind to see what is achievable or not.

Service Level Objective (SLO) - is the goal or objective that needs to be met in order to achieve the SLA.
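
To make the link between an SLA and an SLO more concrete, an availability percentage can be translated into a downtime budget. The sketch below is a simple illustration under stated assumptions: a year is counted as 365 days, the 99.9% figure is only an example, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


def slo_met(measured_downtime_hours: float,
            availability_percent: float,
            period_hours: float = HOURS_PER_YEAR) -> bool:
    """True when the measured downtime stays within the downtime budget."""
    budget = allowed_downtime_hours(availability_percent, period_hours)
    return measured_downtime_hours <= budget


# Example: an SLA that promises 99.9% availability per year.
print(round(allowed_downtime_hours(99.9), 2))  # 8.76
print(slo_met(measured_downtime_hours=5.0, availability_percent=99.9))  # True
```

Expressing the objective as a number of hours makes it easier to discuss with stakeholders whether a target is realistic for a given design.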

So how are the design factors determined?

To recap: the design factors (requirements, risks, assumptions and constraints) influence the design decisions made throughout an architectural design.

Step one in the design process: Design factors need to be Gathered, Defined and Identified.

The workflow for determining the design factors is shown below:

This process of getting the design factors together usually happens at the beginning, but may also partly be done throughout the design process (logical and physical).

GATHER - information to determine the requirements

This information can come from many sources:

Different lines of business - projected growth requirements, RTO/RPO/SLA/SLO (performance / redundancy)
Application Owners - dependencies, specs needed to run the application(s)
Other IT Stakeholders (CEO, CIO, CISO)
Other Operational teams (Storage, Network, etc.)
Project Managers - documentation that may be part of this project

Extracting information can happen by:

Having meetings and interviews with the people responsible for the items I have mentioned above.
Reviewing existing documentation and designs of the current environment, of IT solutions that may touch your new design, and of future projects that may have an impact. This includes documentation about existing RTO/RPO/SLA/SLO procedures, who is responsible for what, etc.
Performing an assessment of the current environment.

Tools that are available to perform an assessment can be:

VMware Capacity Planner
Operating System Specific Tools - Performance Monitor in Windows or top in Unix-based OSes
Existing server/network/storage monitoring/management solutions
Third party virtualization assessment tools - like RVTools or Cirba, for example
Custom written scripts - using PowerCLI, for example, or other scripting languages that are used as API wrappers
Other virtualization management tools - like vCenter Server, vRealize Operations

Be thorough in your assessment, but it's important to stay in scope!

Things that we REALLY NEED TO gather are:

RTO/RPO/SLA/SLO for each application or service (including current storage, compute and network)
Application dependencies
Resource needs in terms of RAM, CPU, Storage speeds and capacity and network speeds
Performance expectations (could also be part of SLA/SLO)
Operational factors - Determine who is doing what
Current environment data about hardware/products/services/3rd party products/solutions
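
One way to keep track of whether everything on this list has been gathered is to record it per application and check for gaps. The sketch below is just one possible structure: the class name, the field names and the example values are all hypothetical and would need to be adapted to the project.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ApplicationProfile:
    """Information gathered for one application; None means 'not gathered yet'."""
    name: str
    rto_hours: Optional[float] = None
    rpo_hours: Optional[float] = None
    sla_availability_percent: Optional[float] = None
    dependencies: Optional[List[str]] = None  # an empty list means 'none found'
    ram_gb: Optional[float] = None
    vcpus: Optional[int] = None
    storage_gb: Optional[float] = None
    storage_iops: Optional[int] = None
    network_mbps: Optional[float] = None
    operational_owner: Optional[str] = None


def missing_items(profile: ApplicationProfile) -> List[str]:
    """Return the names of the fields that still have to be gathered."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]


# Hypothetical, partially completed profile.
mail = ApplicationProfile(name="mail", rto_hours=4, rpo_hours=1,
                          ram_gb=64, vcpus=8, dependencies=[])
print(missing_items(mail))
```

The list of missing items can then drive the next round of interviews or assessments.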

DEFINE - the actual requirements

When defining the requirements it is good to use language that is simple for anyone to understand. You need to be clear and very specific, and not leave room for interpretation. If a requirement still raises questions, these need to be "fixed" with an assumption. Define every requirement separately and don't try to combine multiple requirements into one big one.

What is also important to know is that some requirements may change during the design process.

Sample requirement: "The design should be resilient and highly available."

This requirement sounds obvious but leaves room for a lot of questions. What part of the design? Resilient up to what level? What do they mean by "highly available"?

The sample requirement can be rewritten as: "Wherever possible, no component in the design should be a single point of failure."

Now we know that all elements in the design (compute, storage and networking) should be redundant so that each can endure at least one failure.
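
A requirement written this way can be checked against a component inventory. The following sketch assumes a hypothetical inventory that simply counts instances per component. Having at least two instances is a necessary condition for surviving one failure, not a sufficient one, because the remaining instance must also be able to carry the load.

```python
from typing import Dict, List


def single_points_of_failure(instances_per_component: Dict[str, int]) -> List[str]:
    """Return the components that cannot survive the loss of one instance."""
    return sorted(
        name for name, count in instances_per_component.items() if count < 2
    )


# Hypothetical inventory of a small design.
design = {
    "ESXi hosts": 3,
    "storage controllers": 2,
    "core switches": 1,
    "uplinks per host": 2,
}

print(single_points_of_failure(design))  # ['core switches']
```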

Sample requirement: "A separate management VLAN and network links must be used for management traffic."

This is a very clear requirement.

Once you have defined the requirements, you need to separate the functional requirements from the non-functional requirements (constraints). Non-functional requirements can be modified along the way. For example, when a non-functional requirement states that a specific vendor should be used for the compute hardware, and it turns out that combining this vendor with other technology in the design introduces a risk, the compute vendor may need to change.

Functional requirements, however, are not something that is normally changed throughout the design process. There may be cases where the project team collectively decides that some requirements need to change, but this is rare.

IDENTIFY - Risks, constraints and Assumptions

Once the requirements are defined with the help of all the information coming from the sources described above, we can start analysing these requirements and identify the risks, constraints and assumptions.

The constraints are found by separating the functional from the non-functional requirements: the non-functional requirements are your constraints. This is because these non-functional requirements constrain the options that you have in satisfying the functional requirements.

You then take the requirements and the constraints and analyse them to see if they contain any risk. If there are risks, they need to be discussed, the trade-offs verified and, if possible, the risks mitigated. Risks could eventually lead to a change in the requirements or constraints.

If the requirements are vague, leave room for interpretation or leave you with more questions than answers, they need to be backfilled with assumptions, and these assumptions need to be verified with the customer/stakeholders.

After all this is done we NEED approval from the project team on the design factors before we even THINK about creating the logical design.

Example constraint (non-functional requirement) (to determine a risk): "QNAP NAS offering has been pre-selected as a vendor to use for storage."

This constraint on its own does not tell us anything until you view it in the light of the following example functional requirement:

"The environment must be capable of supporting virtualized instances of Microsoft Exchange Server 2013."

We know that Microsoft Exchange only supports block-level protocols and not NAS protocols when running in a VMware vSphere environment. So combined with the previous constraint, this requirement introduces a risk.

The risk (result): "The pre-selected storage solution will not support the applications that are described in the functional requirements."

This is a conflict that needs to be resolved by modifying the functional requirement or the constraint. It is more likely that the constraint is going to be changed.

We can change the constraint to something like this: "QNAP storage platform has been pre-selected as a vendor to use for storage."

We have now changed it from a protocol-level to a vendor-level constraint. This gives us the flexibility to let applications that need block-level storage use it, while other applications can still go with NAS, for cost reasons for example.
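
The conflict between the constraint and the functional requirement, and the effect of relaxing the constraint, can be expressed as a small check. This is a sketch under stated assumptions: the class and function names are hypothetical, storage access is reduced to just "block" versus "file", and it is assumed (not confirmed here) that the vendor's platform can offer both once the constraint is set at vendor level.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

BLOCK = "block"  # block-level access, e.g. iSCSI or Fibre Channel
FILE = "file"    # NAS protocols, e.g. NFS


@dataclass(frozen=True)
class StorageConstraint:
    description: str
    offered_access: FrozenSet[str]


@dataclass(frozen=True)
class FunctionalRequirement:
    description: str
    supported_access: FrozenSet[str]


def identify_risks(constraint: StorageConstraint,
                   requirements: List[FunctionalRequirement]) -> List[str]:
    """Report every requirement that the constraint cannot satisfy."""
    risks = []
    for req in requirements:
        # No overlap between what is offered and what is supported: a risk.
        if not constraint.offered_access & req.supported_access:
            risks.append(
                f"'{constraint.description}' cannot satisfy '{req.description}'"
            )
    return risks


exchange = FunctionalRequirement("Virtualized Exchange Server 2013",
                                 frozenset({BLOCK}))
nas_only = StorageConstraint("Pre-selected NAS offering", frozenset({FILE}))
# Assumption: the vendor-level constraint allows both access types.
vendor_level = StorageConstraint("Pre-selected storage vendor",
                                 frozenset({FILE, BLOCK}))

print(len(identify_risks(nas_only, [exchange])))      # 1
print(len(identify_risks(vendor_level, [exchange])))  # 0
```

The same pattern works for other constraint types; the important part is that every constraint is evaluated against every functional requirement it can affect.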

Once the constraint has been changed, it still needs to be approved by the project team before we proceed with the logical design. I cannot stress this enough.

You can imagine that this process below (again) is very important:
