In information technology, data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.
Responsibilities of a Data Architect
Like a conventional architect who designs homes or buildings, a data architect creates a blueprint of a data environment that aligns with an organization's short- and long-term goals and its unique cultural and contextual requirements.
The data architect is typically responsible for defining the target state, aligning development with it, and then following up to ensure that enhancements are made in the spirit of the original blueprint.
During the definition of the target state, the data architect breaks a subject down to the atomic level and then builds it back up to the desired form, working through three traditional architectural processes:
- Conceptual – represents all business entities.
- Logical – represents the logic of how entities are related.
- Physical – the realization of the data mechanisms for a specific type of functionality.
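To make the three levels concrete, here is a minimal sketch for a hypothetical customer/order subject area. The conceptual level is just the list of entities, the logical level adds attributes and the relationship between them, and the physical level realizes that model as tables in one specific engine (SQLite, via Python's standard `sqlite3` module). The entity names, column names, and naming conventions are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Conceptual level: only the business entities, no attributes or storage details.
CONCEPTUAL = ["Customer", "Order"]


# Logical level: attributes and relationships, still independent of any database engine.
@dataclass
class Attribute:
    name: str
    logical_type: str                 # "id", "text" or "money" in this sketch
    references: Optional[str] = None  # entity this attribute points to, if any


LOGICAL = {
    "Customer": [Attribute("customer_id", "id"), Attribute("name", "text")],
    "Order": [
        Attribute("order_id", "id"),
        Attribute("customer_id", "id", references="Customer"),
        Attribute("total", "money"),
    ],
}

# Physical level: logical types mapped onto one concrete engine (SQLite here).
SQLITE_TYPES = {"id": "INTEGER", "text": "TEXT", "money": "NUMERIC"}


def physical_ddl(entity: str, attrs: list) -> str:
    """Render one entity as a CREATE TABLE statement.

    Conventions assumed by this sketch: table name = entity name + "s",
    the first attribute is the primary key, and a foreign key column has the
    same name as the key it references.
    """
    table = entity.lower() + "s"
    cols = []
    for i, attr in enumerate(attrs):
        col = f"{attr.name} {SQLITE_TYPES[attr.logical_type]}"
        if i == 0:
            col += " PRIMARY KEY"
        if attr.references:
            col += f" REFERENCES {attr.references.lower()}s({attr.name})"
        cols.append(col)
    return f"CREATE TABLE {table} ({', '.join(cols)})"


conn = sqlite3.connect(":memory:")
for entity in CONCEPTUAL:
    conn.execute(physical_ddl(entity, LOGICAL[entity]))

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customers', 'orders']
```

Swapping SQLite for another engine would only change the physical step; the conceptual and logical definitions stay the same.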
What is Data Architecture
Data architecture defines a standard set of products and tools an organization uses to manage data. But it is much more than that. A data architecture defines the processes to capture, transform, and deliver usable data to business users. Most importantly, it identifies the people who will consume that data and their unique requirements. A good data architecture is designed right to left: it starts from the data consumers and works back to the data sources.
A data architecture should set data standards for all of an organization's data systems, serving as a vision or model of the eventual interactions between those systems. Data integration, for example, should depend on data architecture standards, because integration requires data interactions between two or more data systems. A data architecture describes the data structures used by a business and its application software. It addresses data in storage, data in use, and data in motion; descriptions of data stores, data groups, and data items; and mappings of those data artifacts to data qualities, applications, locations, and so on. It also provides criteria for data processing operations, making it possible to design data flows and to control the flow of data through the system.
Data architecture is also the practice of standardizing how organizations collect, store, transform, distribute, and use data. The goal is to deliver relevant data to the people who need it, when they need it, and to help them make sense of it.
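Designing right to left can be sketched as a simple tracing exercise: start from what each consumer needs and work back to the source systems that must feed it, surfacing any requirement no source covers yet. The consumers, fields, and source systems below are hypothetical.

```python
# Hypothetical consumer requirements: the fields each data consumer needs.
CONSUMER_NEEDS = {
    "finance_dashboard": {"revenue", "region", "forecast"},
    "churn_model": {"customer_id", "plan", "last_login"},
}

# Hypothetical catalog: which source system currently holds each field.
FIELD_SOURCES = {
    "revenue": "erp",
    "region": "crm",
    "customer_id": "crm",
    "plan": "billing",
    "last_login": "web_logs",
}


def plan_sources(needs, catalog):
    """Work right to left: from each consumer's fields back to the sources that must feed them.

    Returns the sources required per consumer and any fields no source provides yet.
    """
    plan, gaps = {}, set()
    for consumer, fields in needs.items():
        sources = set()
        for field in fields:
            if field in catalog:
                sources.add(catalog[field])
            else:
                gaps.add(field)
        plan[consumer] = sorted(sources)
    return plan, sorted(gaps)


plan, gaps = plan_sources(CONSUMER_NEEDS, FIELD_SOURCES)
print(plan["churn_model"])  # ['billing', 'crm', 'web_logs']
print(gaps)                 # ['forecast']
```

Starting from the sources instead would tell you what data you have, but not that the finance dashboard is still missing a forecast.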
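As a small illustration of integration depending on a shared standard, the sketch below defines one hypothetical standard for customer records and a per-system mapping onto it. Once both systems translate into the standard, their records can be compared and combined directly. The system names, field names, and formatting rules are assumptions made for the example.

```python
# Hypothetical shared standard for customer records.
STANDARD_FIELDS = ("customer_id", "email", "country")

# Per-system mappings from local field names to the standard (system names are hypothetical).
MAPPINGS = {
    "crm": {"CustID": "customer_id", "EmailAddr": "email", "Ctry": "country"},
    "shop": {"id": "customer_id", "mail": "email", "country_code": "country"},
}


def to_standard(system, record):
    """Translate one system's record into the shared standard and apply its formatting rules."""
    mapping = MAPPINGS[system]
    out = {std: record[local] for local, std in mapping.items()}
    missing = [f for f in STANDARD_FIELDS if f not in out]
    if missing:
        raise ValueError(f"mapping for {system!r} does not cover {missing}")
    # Formatting rules that are part of the standard in this sketch.
    out["email"] = out["email"].strip().lower()
    out["country"] = out["country"].strip().upper()
    return out


a = to_standard("crm", {"CustID": 7, "EmailAddr": " Ada@Example.com", "Ctry": "gb"})
b = to_standard("shop", {"id": 7, "mail": "ada@example.com", "country_code": "GB"})
print(a == b)  # True
```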
Data architecture describes the structure of an organization’s logical and physical data assets and data management resources. The goal of data architecture is to translate business needs into data and system requirements and to manage data and its flow through the enterprise.
Data architecture components
Data architecture can be broken down into three overall components:
- Data architecture outcomes. These are the models, definitions, and data flows often referred to as data architecture artifacts.
- Data architecture activities. These are the actions that form, deploy, and fulfill data architecture intentions.
- Data architecture behaviors. These are the collaborations, mindsets, and skills of the various roles that affect an enterprise’s data architecture.
Data architecture characteristics
Data architecture is built around certain characteristics:
Automation removes the friction that made legacy data systems tedious to configure. Processes that took months to build can now be completed in hours or days using cloud-based tools. If a user wants access to different data, automation enables the architect to quickly design a pipeline to deliver it. As new data is sourced, data architects can quickly integrate it into the architecture. And to create an adaptable architecture in which data flows continuously, data architects automate everything.
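One common way to get this speed is to describe pipelines declaratively, so that a generic builder turns configuration into running steps. The minimal sketch below assumes a hypothetical config format with only two operations (renaming columns and dropping rows with missing required values); onboarding a new source is then one more config entry rather than new hand-written code.

```python
# Hypothetical declarative registry: each source is described by configuration, not hand-written code.
PIPELINES = [
    {"source": "orders", "rename": {"amt": "amount"}, "required": ["amount"]},
]


def build_pipeline(spec):
    """Turn one config entry into a runnable step that renames columns and drops incomplete rows."""
    rename = spec.get("rename", {})
    required = spec.get("required", [])

    def run(rows):
        out = []
        for row in rows:
            row = {rename.get(col, col): val for col, val in row.items()}
            if all(row.get(col) is not None for col in required):
                out.append(row)
        return out

    return run


# Onboarding a new source is one more config entry, not a new hand-built pipeline.
PIPELINES.append({"source": "refunds", "rename": {"ref_amt": "amount"}, "required": ["amount"]})
runners = {spec["source"]: build_pipeline(spec) for spec in PIPELINES}

print(runners["refunds"]([{"ref_amt": 5}, {"ref_amt": None}]))  # [{'amount': 5}]
```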
Security is built into modern data architecture, ensuring that data is available on a need-to-know basis as defined by the business. Good data architecture also recognizes existing and emerging threats to data security and helps ensure compliance with regulations such as HIPAA and GDPR.
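A need-to-know policy can be as simple as a per-role list of columns that may be shown in clear text, with everything else masked and unknown roles denied by default. The roles, columns, and mask value below are hypothetical; real platforms usually enforce this in the database or access layer rather than in application code.

```python
# Hypothetical policy: the columns each role may see in clear text.
POLICY = {
    "analyst": {"order_id", "amount", "region"},
    "support": {"order_id", "customer_email"},
}


def apply_policy(role, rows):
    """Mask every column the role is not entitled to; unknown roles see nothing (deny by default)."""
    allowed = POLICY.get(role, set())
    return [
        {col: (val if col in allowed else "***") for col, val in row.items()}
        for row in rows
    ]


rows = [{"order_id": 1, "amount": 9.5, "customer_email": "ada@example.com"}]
print(apply_policy("analyst", rows))
# [{'order_id': 1, 'amount': 9.5, 'customer_email': '***'}]
```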
In the past, data was static and access was limited. Decision makers didn’t necessarily get what they wanted or needed, but what was available. In modern data architecture, business users can confidently define the requirements, because data architects can pool data and create solutions to access it in ways that meet business objectives.
A good data architecture continuously evolves to meet new and changing customer information needs.
Any data architecture must be resilient, with high availability, disaster recovery, and backup/restore capabilities.
- Scalable data pipelines
To take advantage of emerging technologies, data architectures support real-time data streaming and micro-batch data bursts.
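Micro-batching sits between record-at-a-time streaming and large scheduled batches: events are grouped into small batches that close when they are full or when a time window has elapsed. The sketch below is a simplified version of that idea. It assumes events arrive as `(timestamp, payload)` pairs in time order, and it only checks the window when the next event arrives, whereas a real streaming engine would also flush on a timer.

```python
def micro_batches(events, max_size, max_wait):
    """Group (timestamp, payload) events into micro-batches.

    A batch is closed when it holds max_size events or when the next event arrives
    at least max_wait time units after the batch's first event.
    """
    batch, window_start = [], None
    for ts, payload in events:
        if batch and (len(batch) >= max_size or ts - window_start >= max_wait):
            yield batch
            batch, window_start = [], None
        if window_start is None:
            window_start = ts
        batch.append(payload)
    if batch:  # flush whatever is left when the stream ends
        yield batch


events = [(0, "a"), (1, "b"), (2, "c"), (10, "d"), (11, "e")]
print(list(micro_batches(events, max_size=2, max_wait=5)))
# [['a', 'b'], ['c'], ['d', 'e']]
```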
Effective data architecture is built on data structures that encourage collaboration. Good data architecture eliminates silos by combining data from all parts of the organization, along with external sources as needed, in one place, so that there are no competing versions of the same data. In this environment, data is not bartered among business units or hoarded but is treated as a shared, companywide asset.
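Combining data into one version usually needs an explicit survivorship rule that decides which value wins when systems disagree. The minimal sketch below assumes one simple rule (the most recently updated non-null value wins, per field) and hypothetical CRM and shop records; real master-data tools support many other rules.

```python
def consolidate(*sources):
    """Merge records from several systems into one version per customer_id.

    Survivorship rule assumed for this sketch: for each field, the non-null value
    with the latest updated_at wins; ties keep the value seen first.
    """
    golden = {}  # customer_id -> merged record
    stamps = {}  # (customer_id, field) -> updated_at of the value currently kept
    for records in sources:
        for rec in records:
            cid, ts = rec["customer_id"], rec["updated_at"]
            merged = golden.setdefault(cid, {"customer_id": cid})
            for field, value in rec.items():
                if field in ("customer_id", "updated_at") or value is None:
                    continue
                prev = stamps.get((cid, field))
                if prev is None or ts > prev:
                    merged[field] = value
                    stamps[(cid, field)] = ts
    return golden


crm = [{"customer_id": 1, "email": "old@example.com", "phone": "555-0100", "updated_at": 1}]
shop = [{"customer_id": 1, "email": "new@example.com", "phone": None, "updated_at": 5}]
print(consolidate(crm, shop))
# {1: {'customer_id': 1, 'email': 'new@example.com', 'phone': '555-0100'}}
```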
- Driven by AI
Modern data architecture uses machine learning (ML) and artificial intelligence (AI) to build the data objects, tables, views, and models that keep data flowing. Smart data architecture takes automation to a new level, using ML and AI to adjust, alert, and recommend solutions to new conditions. ML and AI can identify data types, identify and fix data quality errors, create structures for incoming data, identify relationships for fresh insights, and recommend related data sets and analytics.
Elasticity allows companies to scale up or down as needed, and the cloud makes on-demand scaling quick and affordable. Elasticity lets administrators focus on troubleshooting and problem solving instead of calibrating capacity exactly, throttling usage when necessary, and continually overbuying hardware. It also enables many types of applications and use cases, such as on-demand development and test environments, analytic sandboxes, and prototyping playgrounds.
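Type detection is one of the tasks mentioned above. The sketch below is deliberately *not* machine learning: it is a rule-based stand-in that shows the kind of output such a component produces (a type label per incoming column) by trying progressively looser parsers. A trained model would replace these hand-written rules.

```python
from datetime import date


def infer_type(values):
    """Rule-based stand-in for ML type detection.

    Returns the narrowest type that every non-empty value parses as:
    integer, then decimal, then ISO-8601 date, falling back to text.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"

    def all_parse(parser):
        try:
            for v in cleaned:
                parser(v)
        except ValueError:
            return False
        return True

    for name, parser in (("integer", int), ("decimal", float), ("date", date.fromisoformat)):
        if all_parse(parser):
            return name
    return "text"


print(infer_type(["42", " 7", ""]))              # integer
print(infer_type(["2024-01-05", "2023-12-31"]))  # date
```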
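Elastic scaling typically boils down to a rule that sizes capacity to the current load within fixed bounds. The sketch below assumes a simple target-tracking rule based on queued jobs; the parameter names and limits are hypothetical and do not correspond to any particular cloud provider's API.

```python
import math


def desired_nodes(queued_jobs, jobs_per_node, min_nodes=1, max_nodes=20):
    """Target-tracking rule: enough nodes for the current load, clamped to [min_nodes, max_nodes]."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


for load in (0, 45, 1000):
    print(load, desired_nodes(load, jobs_per_node=10))
# 0 1
# 45 5
# 1000 20
```

The floor keeps a minimum of capacity warm, and the ceiling caps spending, which is what frees administrators from calibrating capacity by hand.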
Simplicity trumps complexity in efficient data architecture: the simplest architecture is the best architecture. Strive for simplicity in data movement, data platforms, data assembly frameworks, and analytic platforms. To reduce complexity, organizations should limit data movement and data duplication and advocate for a uniform database platform, data assembly framework, and analytic platform, despite the howls of best-of-breed proponents.
A modern data architecture needs to be flexible enough to support a multiplicity of business needs. It needs to support multiple types of business users, load operations and refresh rates, query operations, deployments, data processing engines and pipelines.
Governance is the key to self-service. A modern data architecture defines access points for each type of user to meet their information requirements. Data scientists need to be given access to raw data in the landing area or, better yet, a purpose-built sandbox where they can mix raw corporate data with their own data.
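Access points per user type can be written down as a small map from user type to zone and permission level. The sketch below assumes hypothetical zone names (`landing`, `sandbox`, `curated`) and two permission levels, following the example above of giving data scientists the landing area and a sandbox they can write to.

```python
# Hypothetical zones and the access point each user type gets.
ACCESS_POINTS = {
    "data_scientist": {"landing": "read", "sandbox": "read_write"},
    "business_analyst": {"curated": "read"},
    "data_engineer": {"landing": "read_write", "curated": "read_write"},
}

# Which actions each permission level allows.
GRANTS = {"read": {"read"}, "read_write": {"read", "write"}}


def can(user_type, zone, action):
    """True if this user type's access point for the zone permits the action ('read' or 'write')."""
    level = ACCESS_POINTS.get(user_type, {}).get(zone)
    return action in GRANTS.get(level, set())


print(can("data_scientist", "sandbox", "write"))   # True
print(can("business_analyst", "landing", "read"))  # False
```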
Modern data architectures are designed to support elastic scaling, high availability, end-to-end security for data in motion and data at rest, and cost and performance scalability.
- Seamless data integration
Data architectures integrate with legacy applications using standard APIs. They are optimized for sharing data across systems, geographies, and organizations.
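A common way to put a legacy application behind a standard interface is the adapter pattern: consumers code against one interface, and a thin adapter translates the legacy format. In the sketch below, an in-process Python `Protocol` stands in for what would normally be a network API. The interface, the legacy system, and its fixed-width record layout are all hypothetical.

```python
from typing import Protocol


class CustomerAPI(Protocol):
    """Hypothetical standard interface that every system exposes to the rest of the architecture."""

    def get_customer(self, customer_id: int) -> dict: ...


class LegacyMainframe:
    """Stand-in for a legacy system that only returns fixed-width text records."""

    def fetch(self, key: str) -> str:
        # Record layout (assumed): 6-digit id, 20-character name, 2-character country code.
        return f"{key}{'ADA LOVELACE':<20}GB"


class LegacyCustomerAdapter:
    """Wraps the legacy call so consumers only ever see the standard interface."""

    def __init__(self, legacy: LegacyMainframe):
        self._legacy = legacy

    def get_customer(self, customer_id: int) -> dict:
        raw = self._legacy.fetch(f"{customer_id:06d}")
        return {
            "customer_id": int(raw[0:6]),
            "name": raw[6:26].strip().title(),
            "country": raw[26:28],
        }


def describe(api: CustomerAPI, customer_id: int) -> str:
    """A consumer written against the standard interface, unaware of the legacy format."""
    customer = api.get_customer(customer_id)
    return f"{customer['name']} ({customer['country']})"


print(describe(LegacyCustomerAdapter(LegacyMainframe()), 42))  # Ada Lovelace (GB)
```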
- Real-time data enablement
Modern data architectures support automated, active data validation, classification, management, and governance.
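Automated validation and classification can start with declarative rules applied to every incoming row, plus a simple scan that tags sensitive columns. The sketch below assumes hypothetical rules for an `amount` and an `email` column, and uses a crude regex heuristic (not a trained classifier) to flag email-like columns as PII.

```python
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical validation rules: column -> check the value must pass.
RULES = {
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    "email": lambda v: isinstance(v, str) and EMAIL.match(v) is not None,
}


def validate(rows):
    """Split rows into those passing every rule and a list of (row_index, column) failures."""
    good, failures = [], []
    for i, row in enumerate(rows):
        bad = [col for col, check in RULES.items() if col in row and not check(row[col])]
        if bad:
            failures.extend((i, col) for col in bad)
        else:
            good.append(row)
    return good, failures


def classify_column(values):
    """Crude rule, not a trained classifier: tag a column as PII when most string values look like emails."""
    strings = [v for v in values if isinstance(v, str)]
    if not strings:
        return "general"
    hits = sum(1 for v in strings if EMAIL.match(v))
    return "pii" if hits / len(strings) > 0.5 else "general"


rows = [{"amount": 10, "email": "ada@example.com"}, {"amount": -1, "email": "not-an-email"}]
good, failures = validate(rows)
print(failures)  # [(1, 'amount'), (1, 'email')]
print(classify_column(["ada@example.com", "bob@example.org", "n/a"]))  # pii
```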
- Decoupled and extensible
Modern data architectures are designed to be loosely coupled, enabling services to perform minimal tasks independent of other services.
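Loose coupling is often achieved with publish/subscribe messaging: services react to named events instead of calling each other directly, so one can be added or removed without touching the others. The sketch below uses a minimal in-process event bus in place of a real message broker; the event name and both services are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus; services know event names, not each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, handler):
        self._subscribers[event].append(handler)

    def publish(self, event, payload):
        for handler in self._subscribers[event]:
            handler(payload)


# Two hypothetical services, each performing one small task independently.
audit_log = []
totals = defaultdict(float)


def audit_service(order):
    audit_log.append(order["order_id"])


def revenue_service(order):
    totals[order["region"]] += order["amount"]


bus = EventBus()
bus.subscribe("order_created", audit_service)
bus.subscribe("order_created", revenue_service)
bus.publish("order_created", {"order_id": 1, "region": "EU", "amount": 20.0})
print(audit_log, dict(totals))  # [1] {'EU': 20.0}
```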
Data architecture principles
According to Joshua Klahr, vice president at AtScale, there are six principles that form the foundation of modern data architecture:
- Data is a shared asset. A modern data architecture needs to eliminate departmental data silos and give all stakeholders a complete view of the company.
- Users require adequate access to data. Modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs.
- Security is essential. Modern data architectures must be designed for security and they must support data policies and access controls directly on the raw data.
- Common vocabularies ensure common understanding. Shared data assets, such as product catalogs, fiscal calendar dimensions, and KPI definitions, require a common vocabulary to help avoid disputes during analysis.
- Data should be curated. Invest in core functions that perform data curation.
- Data flows should be optimized for agility. Reduce the number of times data must be moved to reduce cost, increase data freshness, and optimize enterprise agility.
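The "common vocabularies" principle is easiest to see in code: each shared definition lives in exactly one place and every team imports it rather than re-deriving it. The sketch below shows two such definitions, a fiscal calendar and a gross-margin KPI. The fiscal year start month and the naming convention are assumptions for illustration.

```python
from datetime import date

# Shared fiscal calendar. Assumption for this sketch: the fiscal year starts on
# 1 February and is named after the calendar year in which it ends
# (this naming rule only holds for start months after January).
FISCAL_YEAR_START_MONTH = 2


def fiscal_quarter(d: date) -> str:
    offset = (d.month - FISCAL_YEAR_START_MONTH) % 12
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return f"FY{fiscal_year}-Q{offset // 3 + 1}"


# Shared KPI definition, so every team computes gross margin the same way.
def gross_margin(revenue: float, cost_of_goods_sold: float) -> float:
    if revenue == 0:
        raise ValueError("gross margin is undefined when revenue is zero")
    return (revenue - cost_of_goods_sold) / revenue


print(fiscal_quarter(date(2024, 2, 1)))   # FY2025-Q1
print(fiscal_quarter(date(2025, 1, 15)))  # FY2025-Q4
print(gross_margin(200.0, 150.0))         # 0.25
```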