Software Architecture

Architecture is about the important stuff…whatever that is.

-- Ralph Johnson

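Responsibilities of an architect
  • Make architecture decisions
  • Continually analyze the architecture
  • Keep current with the latest trends
  • Ensure compliance with decisions
  • Diverse exposure and experience
  • Have business domain knowledge
  • Possess interpersonal skills
  • Understand and navigate politics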

MindMap of software architect roles

Structure vs Architecture

Software architecture structure

Architecture Characteristics

Software architecture characteristics

Architecture Decisions

Architecture decisions

Guide rather than Specify

Design Principles

Design Principles

Continually Analyze the Architecture

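Analyzing an architecture is not a one-day job. It means continually evaluating the architecture and proposing improvements.

Vitality: is the architecture that was decided three years ago still viable today?

Don't be blindly confident; look at the current architecture with a critical eye.

The decisions an architect makes tend to be long-lived and hard to change. Keeping up with current trends and the latest technology helps you judge the future better, and therefore make better decisions.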

Ensure Compliance with Decisions

Have enough authority and control that developers both can and must follow the decisions.

Diverse Exposure and Experience

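At a minimum you need some exposure to every area, with a basic awareness and understanding of each. This is actually fairly painful, because you have to step outside your own field and learn about many things that are completely unfamiliar. Relatively speaking, breadth matters more than depth. Knowing exactly how to tune every Hadoop parameter only makes you a Hadoop expert; it does not make you a good big-data architect. Understanding stream processing and how frameworks such as Hadoop, Spark, and Flink developed and evolved, along with their respective strengths and weaknesses, is far more valuable knowledge for an architect.

I want to add some personal thoughts here, since I have learned quite a lot over this period. Much of the time there is really no need to agonize over details: even if you read them while studying, you will forget them after a while without use. On top of that, as you get older you cannot keep up high-intensity learning forever, and memory fades faster. Pick the parts that help you most, such as the architecture, the data flow, and the logic behind a design, then reflect on them and summarize them. That helps more than mastering "how to use it, how to install it".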

Have Business Domain Knowledge

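This places fairly high demands on communication skills. The fastest way to learn the business is to ask. Listening to how people describe it is many times faster than reading documents. That in turn requires you to ask good questions: at the very least, not questions nobody can answer, but purposeful, valuable ones. Frankly, don't expect decent documentation at the big tech companies these days; with agile development nobody has time to write docs, and knowledge is mostly passed from person to person. You have to understand the business to understand the problem. That sounds like a truism, but given all the real-world obstacles it is a genuine problem.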

Possess Interpersonal Skills, Understand and Navigate Politics

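Architecture problems are basically people problems, which is a rather philosophical point. Ultimately, if everyone were a human compiler, they could maintain even a mountain of garbage code without any trouble. The reason we go to the effort of architecture and split things into modules is that people make mistakes, and every company has both stars and mediocre engineers. So these designs are not just technology; they are also a form of fault tolerance for people.

If an architectural dispute breaks out in a company, it at least shows that someone is unwilling to compromise. That is when you need someone with leadership who can make the call and move the whole team forward. Sometimes no option is clearly the best, but you still have to decide.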

Unknown unknowns

Knowledge Pyramid1

Former US Secretary of Defense Donald Rumsfeld said:

because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know.

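This is why doing a Big Design before the project starts can never settle things once and for all. Architecture has to keep evolving along with the business and with things that are hard to foresee (technology turnover, security, political factors, and so on); you can never predict the unknown. I don't know what tomorrow's stock price will be: that is a known unknown, because the stock can rise, fall, or be suspended. The possibilities are known; I just don't know which one reality will converge on. But I cannot predict how something I don't even know about will happen: that is an unknown unknown, what many people call a "black swan". This is also why engineering practice must go hand in hand with architecture: even the best architecture is doomed to be replaced if it lacks a good iteration model.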

Laws

Everything in software architecture is a trade-off.

If an architect thinks they have discovered something that isn’t a trade-off, more likely they just haven’t identified the trade-off yet.

Why is more important than how.

Every design is a compromise. If you think you have found a perfect design, more likely you just haven't found its drawbacks yet.

Bidirectional Communication

Collaboration View

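I think this point is also very important. An architect is not a boss who says: "I've decided on the architecture, now you devs go and build it." It doesn't work that way, because the developers will not necessarily listen to the architect, and even if they do, they may not be able to deliver. The architect also doesn't know how the developers are doing, and without feedback things drift. That makes a good platform or channel for communication especially important.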

Trade-offs

Every architecture question has one universal answer:

It depends.

There are no right answers and no wrong answers, only trade-offs and compromises.

Cohesion, Coupling and Connascence

Cohesion

Attempting to divide a cohesive module would only result in increased coupling and decreased readability.

-- Larry Constantine

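This takes us back to software engineering basics: the various types of cohesion.

Worst:

Temporal Cohesion

Logical Cohesion

Coincidental Cohesion

Next:

Communicational Cohesion

Procedural Cohesion

Best:

Sequential Cohesion

Functional Cohesion

On sequential versus procedural cohesion: with sequential cohesion the modules are linked by data, with one module's output serving as another's input. With procedural cohesion the modules simply have to execute in a particular order.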

LCOM Modularity

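X has the best cohesion and Y the worst, because Y's methods could just as well be split into several independent classes. Z has mixed cohesion and may need refactoring.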
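
To make the X/Y comparison concrete, here is a minimal sketch of one common LCOM variant (Chidamber and Kemerer's: the number of method pairs that share no field, minus the number of pairs that share at least one, floored at zero). The class name `Lcom` and the method-to-fields map are illustrative assumptions, not something taken from a particular tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes the Chidamber-Kemerer style LCOM score.
final class Lcom {
    private Lcom() {}

    // fieldsByMethod maps each method name to the set of fields it touches.
    static int score(Map<String, Set<String>> fieldsByMethod) {
        List<Set<String>> usages = new ArrayList<>(fieldsByMethod.values());
        int disjointPairs = 0; // method pairs with no field in common
        int sharingPairs = 0;  // method pairs with at least one field in common
        for (int i = 0; i < usages.size(); i++) {
            for (int j = i + 1; j < usages.size(); j++) {
                if (Collections.disjoint(usages.get(i), usages.get(j))) {
                    disjointPairs++;
                } else {
                    sharingPairs++;
                }
            }
        }
        return Math.max(0, disjointPairs - sharingPairs);
    }
}
```

A class whose methods all touch the same fields scores 0, while a class in which every method uses its own private field scores high, which is the signal that it could be split.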

Coupling

Abstractness

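$$A = \frac{\sum m_a}{\sum m_c}$$

Here $m_a$ counts the abstract elements and $m_c$ the non-abstract elements. For example, a single 5,000-line main function gives A = 1/5000, which shows that its abstractness is very low.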

Instability

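$$I = \frac{C_e}{C_e + C_a}$$

$C_e$ is efferent (outgoing) coupling and $C_a$ is afferent (incoming) coupling.

This equation expresses how likely it is that a change in upstream or downstream code will affect this code. If the function you are modifying calls into many classes, then a change in any one of them (a version upgrade, say) is likely to force a change in your code too: that is high efferent coupling. The opposite case, where many other classes call into your code, is high afferent coupling.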

Distance from the main sequence

$$D = |A + I - 1|$$

A is abstractness and I is instability.
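
As a small worked sketch, the two metrics can be computed as follows, assuming the usual definitions I = Ce / (Ce + Ca) and D = |A + I - 1|. The class and method names are hypothetical, and treating a module with no coupling at all as I = 0 is my own assumption.

```java
// Hypothetical helper for the coupling metrics; names are illustrative only.
final class CouplingMetrics {
    private CouplingMetrics() {}

    // I = Ce / (Ce + Ca). Defined as 0 when there is no coupling at all (an assumption).
    static double instability(int efferent, int afferent) {
        int total = efferent + afferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    // D = |A + I - 1|: 0 means the module sits on the main sequence.
    static double distanceFromMainSequence(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

A module with three outgoing and one incoming dependency has I = 0.75; paired with A = 0.25 it lies exactly on the main sequence, whereas a fully concrete module that everything depends on (A = 0, I = 0) is as far from it as possible.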

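This is a hard balance to strike. The more abstract the code is, the less it actually does and the less useful it becomes. Conversely, if the abstraction is too low, the maintenance cost is too high.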

Connascence

Two components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system.

-- Meilir Page-Jones

Static

Connascence of Name

Connascence of Type

Connascence of Meaning (Convention)

Connascence of Position

Connascence of Algorithm

Dynamic

Connascence of Execution

Connascence of Timing

Connascence of Values

Connascence of Identity

Strength

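The simplest refactoring is eliminating magic values (connascence of meaning): for strings and integers that recur throughout the code, define a named constant so that they become connascence of name instead.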
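
A minimal sketch of that refactoring, using a made-up order-status example (the class name, the constant, and the value 2 are all hypothetical):

```java
// Hypothetical order-status example; the names and the value 2 are made up for illustration.
final class OrderStatus {
    // Before: callers compared against the bare literal 2 and all had to agree on what it
    // meant (connascence of meaning). After: they only have to agree on the constant's name.
    static final int SHIPPED = 2;

    private OrderStatus() {}

    static boolean isShipped(int statusCode) {
        return statusCode == SHIPPED;
    }
}
```

If the encoding ever changes, only the constant's definition has to be edited, rather than every place that used to hard-code the literal.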

Locality

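How close together the connascent elements are. Connascence between two classes in the same component, for example, is better than connascence between two components.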

Degree

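The size of the code base that is affected. Even dynamic connascence does little harm if the code base is small, but as the project grows the problem becomes steadily more serious.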

Rule of thumb
  1. Minimize overall connascence by breaking the system into encapsulated elements
  2. Minimize any remaining connascence that crosses encapsulation boundaries
  3. Maximize the connascence within encapsulation boundaries

Common Rules

Rule of Degree: convert strong forms of connascence into weaker forms of connascence

Rule of Locality: as the distance between software elements increases, use weaker forms of connascence

Architecture Characteristics

Specifies a nondomain design consideration

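Every application has requirements, which describe the functionality you must implement. Architecture characteristics instead pin down the criteria for success: how that functionality should be implemented. For example, the performance target for a given module is usually not set in the requirements, but it is included in the architecture characteristics.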

Influences some structural aspect of the design

Some special cases affect the architectural considerations and the design.

Operational Architecture Characteristics

| Term | Definition |
| --- | --- |
| Availability | How long the system will need to be available (if 24/7, steps need to be in place to allow the system to be up and running quickly in case of any failure). |
| Continuity | Disaster recovery capability. |
| Performance | Includes stress testing, peak analysis, analysis of the frequency of functions used, capacity required, and response times. Performance acceptance sometimes requires an exercise of its own, taking months to complete. |
| Recoverability | Business continuity requirements (e.g., in case of a disaster, how quickly is the system required to be on-line again?). This will affect the backup strategy and requirements for duplicated hardware. |
| Reliability/safety | Assess if the system needs to be fail-safe, or if it is mission critical in a way that affects lives. If it fails, will it cost the company large sums of money? |
| Robustness | Ability to handle error and boundary conditions while running if the internet connection goes down or if there’s a power outage or hardware failure. |
| Scalability | Ability for the system to perform and operate as the number of users or requests increases. |

Structural Architecture Characteristics

| Term | Definition |
| --- | --- |
| Configurability | Ability for the end users to easily change aspects of the software’s configuration (through usable interfaces). |
| Extensibility | How important it is to plug new pieces of functionality in. |
| Installability | Ease of system installation on all necessary platforms. |
| Leverageability/reuse | Ability to leverage common components across multiple products. |
| Localization | Support for multiple languages on entry/query screens in data fields; on reports, multibyte character requirements and units of measure or currencies. |
| Maintainability | How easy it is to apply changes and enhance the system? |
| Portability | Does the system need to run on more than one platform? (For example, does the frontend need to run against Oracle as well as SAP DB?) |
| Upgradeability | Ability to easily/quickly upgrade from a previous version of this application/solution to a newer version on servers and clients. |

Cross-Cutting Architecture Characteristics

Characteristics that are hard to categorize.

| Term | Definition |
| --- | --- |
| Accessibility | Access to all your users, including those with disabilities like colorblindness or hearing loss. |
| Archivability | Will the data need to be archived or deleted after a period of time? (For example, customer accounts are to be deleted after three months or marked as obsolete and archived to a secondary database for future access.) |
| Authentication | Security requirements to ensure users are who they say they are. |
| Authorization | Security requirements to ensure users can access only certain functions within the application (by use case, subsystem, webpage, business rule, field level, etc.). |
| Legal | What legislative constraints is the system operating in (data protection, Sarbanes Oxley, GDPR, etc.)? What reservation rights does the company require? Any regulations regarding the way the application is to be built or deployed? |
| Privacy | Ability to hide transactions from internal company employees (encrypted transactions so even DBAs and network architects cannot see them). |
| Security | Does the data need to be encrypted in the database? Encrypted for network communication between internal systems? What type of authentication needs to be in place for remote user access? |
| Supportability | What level of technical support is needed by the application? What level of logging and other facilities are required to debug errors in the system? |
| Usability/achievability | Level of training required for users to achieve their goals with the application/solution. Usability requirements need to be treated as seriously as any other architectural issue. |

Domain Concerns

| Domain concern | Architecture characteristics |
| --- | --- |
| Mergers and acquisitions | Interoperability, scalability, adaptability, extensibility |
| Time to market | Agility, testability, deployability |
| User satisfaction | Performance, availability, fault tolerance, testability, deployability, agility, security |
| Competitive advantage | Agility, testability, deployability, scalability, availability, fault tolerance |
| Time and budget | Simplicity, feasibility |

Operational Measures

1%

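Besides average performance, you should also be able to measure boundary conditions: what if performance is terrible 1% of the time? In many cases these outliers barely move the average, but they are still a real problem.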
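
One hedged way to make such tail cases visible is to report a high percentile next to the mean. The sketch below uses the nearest-rank percentile definition; `LatencyStats` and its method names are hypothetical, and real monitoring systems often use other estimators.

```java
import java.util.Arrays;

// Hypothetical helper contrasting the mean with a tail percentile.
final class LatencyStats {
    private LatencyStats() {}

    static double mean(long[] samplesMillis) {
        double sum = 0;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return sum / samplesMillis.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // `percent` percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double percent) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percent * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With 98 requests at 10 ms and 2 requests at 2,000 ms, the median stays at 10 ms and the mean is only about 50 ms, while the 99th percentile jumps to 2,000 ms.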

Structural Measures

Cyclomatic Complexity (CC)

CC = E - N + 2, where E is the number of edges and N is the number of nodes in the code's control-flow graph.

    // Two decision points (the if and the else-if) give three possible paths.
    public int decision(int c1, int c2) {
        if (c1 < 100)
            return 0;
        else if (c1 + c2 > 500)
            return 1;
        else
            return -1;
    }

As a rule of thumb, CC should stay below 10.
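
As a rough illustration of CC = E - N + 2, the sketch below counts edges and nodes in a control-flow graph given as adjacency lists. The graph for the `decision` function is modeled by hand with a single shared exit node, which is my own assumption about how to draw it; the class name `Cyclomatic` is hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: cyclomatic complexity of a connected control-flow graph.
final class Cyclomatic {
    private Cyclomatic() {}

    // Every node must appear as a key; its value lists the nodes it can jump to.
    static int complexity(Map<String, List<String>> successors) {
        int nodes = successors.size();
        int edges = 0;
        for (List<String> targets : successors.values()) {
            edges += targets.size();
        }
        return edges - nodes + 2;
    }

    // Hand-built graph for decision(c1, c2): two conditions, three returns, one exit.
    static Map<String, List<String>> decisionGraph() {
        return Map.of(
            "c1 < 100", List.of("return 0", "c1 + c2 > 500"),
            "c1 + c2 > 500", List.of("return 1", "return -1"),
            "return 0", List.of("exit"),
            "return 1", List.of("exit"),
            "return -1", List.of("exit"),
            "exit", List.<String>of());
    }
}
```

This graph has 7 edges and 6 nodes, so the formula gives 7 - 6 + 2 = 3, matching the three paths through the function.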

Process Measures

Test coverage, deployment time, and so on.

Governance

Fitness functions

Conway’s law

Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.

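In a company, people's departments and responsibilities are usually divided along business modules. But this also leads to artificial splits of problems that are shared across a project: people habitually carry the communication patterns of their workplace into their designs.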

Architecture style

Big Ball of Mud

A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated.

The overall structure of the system may never have been well defined.

If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.

-- Brian Foote and Joseph Yoder

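A true mountain of garbage code: no clear architecture and countless internal cross-calls. Do your best to avoid it, though in practice it still happens.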

Unitary Architecture

Single-machine architecture.

Client/Server

Desktop + database

Browser + webserver

Three tier

Fallacies of distributed architecture

The Network Is Reliable

Fallacy 1

Latency Is Zero

Fallacy 2

Bandwidth Is Infinite

Fallacy 3

The Network Is Secure

Fallacy 4

The Topology Never Changes

Fallacy 5

There Is Only One Administrator

Fallacy 6

Transport Cost Is Zero

Fallacy 7

The Network Is Homogeneous

Fallacy 8

Layered Architecture Style

Standard logical layers

Different ways of partitioning at the physical level:

Physical topology variants

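Layered architecture is a typical technically partitioned architecture: it is layered by technical concern rather than divided by business module. Each layer's responsibility is defined by its technical role.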

Architecture sinkhole antipattern

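This is actually very common. Say a request comes in and the real processing happens in the fourth layer. Because of the layered design, the request cannot reach the fourth layer directly, so the first three layers simply pass it through. Some of them are trivial functions that just tail-call the next layer without doing any processing. Anyone familiar with React will recognize this: it is a bit like passing a prop down layer by layer from the top-level component just to hand it to one child component, which adds a lot of unnecessary overhead.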
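
A minimal sketch of what such a sinkhole looks like in code, assuming four made-up layers in which only the lowest one does any real work (all class and method names here are hypothetical):

```java
// Hypothetical four-layer stack; only DatabaseLayer does real work.
final class DatabaseLayer {
    String findCustomerName(int id) {
        return "customer-" + id; // stand-in for a real lookup
    }
}

final class PersistenceLayer {
    private final DatabaseLayer database = new DatabaseLayer();

    // Pure pass-through: no mapping, caching, or validation happens here.
    String findCustomerName(int id) {
        return database.findCustomerName(id);
    }
}

final class BusinessLayer {
    private final PersistenceLayer persistence = new PersistenceLayer();

    // Pure pass-through again.
    String findCustomerName(int id) {
        return persistence.findCustomerName(id);
    }
}

final class PresentationLayer {
    private final BusinessLayer business = new BusinessLayer();

    // The request has to cross every layer to reach the one that does the work.
    String findCustomerName(int id) {
        return business.findCustomerName(id);
    }
}
```

Each intermediate method adds a call and a class to maintain without adding behavior, which is exactly the overhead described above.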

Rating

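Layered architecture is usually a good fit only for simple prototypes or for applications whose architecture has not been settled yet. On the other hand, it is simple and easy to implement.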

Layered Ratings

Pipeline

MapReduce, for example, follows this pattern.

Many monitoring systems use this architecture as well.

Pipeline Example

Ratings

Pipeline Ratings

Much like layered architecture, both elasticity and scalability are poor.

Microkernel Architecture

Microkernel architecture core system variants

IntelliJ IDEA is one example, and many IDEs are built this way. It avoids code like the following:

    public void assessDevice(String deviceID) {
        if (deviceID.equals("iPhone6s")) {
            assessiPhone6s();
        } else if (deviceID.equals("iPad1")) {
            assessiPad1();
        } else if (deviceID.equals("Galaxy5")) {
            assessGalaxy5();
        } else ...
            ...
        }
    }

The core system in the middle can itself be a layered monolith, a set of domain components, and so on.
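
A minimal sketch of how a plug-in registry can replace that if/else chain. `DevicePlugin` and `DeviceAssessmentCore` are hypothetical names chosen for illustration; a real microkernel would typically also discover and load plug-ins at runtime, which is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in contract; the core system only knows this interface.
interface DevicePlugin {
    String assess();
}

// Hypothetical core system: it routes to plug-ins and contains no device-specific logic.
final class DeviceAssessmentCore {
    private final Map<String, DevicePlugin> registry = new HashMap<>();

    void register(String deviceId, DevicePlugin plugin) {
        registry.put(deviceId, plugin);
    }

    String assessDevice(String deviceId) {
        DevicePlugin plugin = registry.get(deviceId);
        if (plugin == null) {
            throw new IllegalArgumentException("No plug-in registered for " + deviceId);
        }
        return plugin.assess();
    }
}
```

Supporting a new device then means registering one more plug-in, for example `core.register("Galaxy5", () -> "assessed Galaxy5")`, instead of editing the core's branching logic.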

Microkernel Ratings

Service Based Architecture

Service-based architecture basic topology

Docker is a typical example.

Service-based architecture API variant

Adding a gateway/proxy layer improves integration and security. The trade-off, of course, is extra overhead such as additional parsing.

Service-Based Ratings

Event-Driven Architecture

Request-based Model

Broker Topology

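There is no central mediator; messages are distributed to the individual components for processing. This works very well for simple messages.

The broker is usually a message queue.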
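
To illustrate the idea, here is a small in-memory sketch. It delivers events synchronously inside one process, which is a simplifying assumption (a real broker would be a message queue); `EventBroker` and the topic names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory broker: topics map to subscribers, with no central workflow logic.
final class EventBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> processor) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(processor);
    }

    // Fire-and-forget: the publisher gets no result back and does not know who reacts.
    void publish(String topic, String payload) {
        for (Consumer<String> processor : subscribers.getOrDefault(topic, List.of())) {
            processor.accept(payload);
        }
    }
}
```

A payment processor might subscribe to `order-placed` and publish `payment-applied` when it is done, and a fulfillment processor might subscribe to that in turn. The workflow emerges from the chain of events, and nothing in the broker knows about it, which is also why error handling has no natural home in this topology.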

| Advantages | Disadvantages |
| --- | --- |
| Highly decoupled event processors | Workflow control |
| High scalability | Error handling |
| High responsiveness | Recoverability |
| High performance | Restart capabilities |
| High fault tolerance | Data inconsistency |

Mediator Topology

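When handling complex events, leaving all the error handling to the components is not really appropriate, because components are designed to be decoupled and reusable. In that case you need a centralized mediator to coordinate the events.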

Mediator Topology

Mediators are usually split by domain or by group of events, to avoid a single point of failure.

Examples of mediators include Apache Camel and the now-retired Apache ODE.
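
The sketch below shows the mediator's role in plain Java rather than with any of the products named above: the mediator owns the order of the steps and the error handling, so the processors themselves stay simple. `OrderMediator` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mediator: it owns the workflow order and the error handling.
final class OrderMediator {
    private final List<Consumer<String>> steps = new ArrayList<>();
    private final List<String> failedOrders = new ArrayList<>();

    OrderMediator addStep(Consumer<String> step) {
        steps.add(step);
        return this;
    }

    // Runs the steps in order. On the first failure it stops and records the order
    // so that it can be retried or repaired later.
    boolean process(String orderId) {
        for (Consumer<String> step : steps) {
            try {
                step.accept(orderId);
            } catch (RuntimeException e) {
                failedOrders.add(orderId);
                return false;
            }
        }
        return true;
    }

    List<String> failedOrders() {
        return List.copyOf(failedOrders);
    }
}
```

Compared with the broker topology, the workflow is now visible in one place and a failure halts the remaining steps, at the cost of coupling every processor to the mediator.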

Mediator Example 1

| Advantages | Disadvantages |
| --- | --- |
| Workflow control | More coupling of event processors |
| Error handling | Lower scalability |
| Recoverability | Lower performance |
| Restart capabilities | Lower fault tolerance |
| Better data consistency | Modeling complex workflows |

fire-and-forget processing (no response required)

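Extra handling (a FIFO queue, for example) may be needed to preserve message order. Consider operations 1 and 2: operation 1 fails, but operation 2 must still run only after operation 1.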

Workflow Event Pattern Example

Request-Based v Event-Based

| Advantages over request-based | Trade-offs |
| --- | --- |
| Better response to dynamic user content | Only supports eventual consistency |
| Better scalability and elasticity | Less control over processing flow |
| Better agility and change management | Less certainty over outcome of event flow |
| Better adaptability and extensibility | Difficult to test and debug |
| Better responsiveness and performance | |
| Better real-time decision making | |
| Better reaction to situational awareness | |

Ratings

Event-Driven Ratings

Space-Based Architecture

Space-based architecture Topology

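This was probably the pre-microservices answer to dynamic elasticity and scalability. It adds a virtualized middleware layer in the middle, which in effect virtualizes the cluster's resources into a pool; applications then claim and allocate resources according to actual demand.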

| Decision criteria | Replicated cache | Distributed cache |
| --- | --- | --- |
| Optimization | Performance | Consistency |
| Cache size | Small (<100 MB) | Large (>500 MB) |
| Type of data | Relatively static | Highly dynamic |
| Update frequency | Relatively low | High update rate |
| Fault tolerance | High | Low |

Space-based Ratings

Orchestration-Driven Service-Oriented Architecture

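Suppose an application has a certain message flow and, for the sake of component reuse, we end up with a structure in which many services depend on a shared Customer Service.

Is such a structure really good? First of all, Customer Service is under enormous pressure: changing even a tiny detail is hard, because it needs sign-off from a huge number of downstream services, and the coordination inevitably involves arguing and buck-passing. I expect most programmers find this kind of thing more troublesome than writing code. So partitioning and reuse done on purely technical grounds can turn out very badly in practice.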

Microservices Architecture

The root of all evil: "Microservices," the blog post published by Martin Fowler and James Lewis in 2014.

Bounded Context

Each service represents one piece of the business or one workflow.

Granularity

The term “microservice” is a label, not a description.

-- Martin Fowler

An overly fine-grained design will definitely hurt the business as a whole. "Microservice" is just a label, a name, not a commandment.

Sidecar

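Some things we do want tightly coupled to each service, such as monitoring, logging, and instrumentation. The sidecars of the different services can be managed uniformly through a service mesh.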

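Choreography

This closely resembles the broker topology in event-driven architecture: there is no central coordinator, which is why error handling and similar concerns become troublesome.

For the same reason, it is best not to run transactions across multiple services.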

Don’t do transactions in microservices—fix granularity instead!

Saga Pattern

When a failure occurs, perform compensating operations.
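
A minimal in-process sketch of the compensation idea, assuming each step is paired with an action that undoes it. `Saga` and `SagaStep` are hypothetical names; in a real system the steps would be calls to separate services, and the compensations themselves could fail, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical saga step: an action plus the compensation that undoes it.
final class SagaStep {
    final Runnable action;
    final Runnable compensation;

    SagaStep(Runnable action, Runnable compensation) {
        this.action = action;
        this.compensation = compensation;
    }
}

final class Saga {
    private final List<SagaStep> steps = new ArrayList<>();

    Saga step(Runnable action, Runnable compensation) {
        steps.add(new SagaStep(action, compensation));
        return this;
    }

    // Returns true if every step succeeded. Otherwise the already completed
    // steps are compensated in reverse order and false is returned.
    boolean run() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

If reserving stock and charging the card succeed but shipping fails, the saga refunds the card and then releases the stock, leaving the system consistent without a distributed transaction.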

Soft Skills

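I skimmed this part and it is actually quite interesting. If I am ever lucky enough to become an architect, I can try putting it into practice.

First, don't be afraid to make decisions, and communicate with the team often to keep information flowing.

Second, notify only the people who are affected; don't bother those who are not.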

Architecture Decision Records

For example, record the decisions in Markdown.

Basic ADR Structure

Analyzing Risk

Risk Matrix

Risk Assessment

Assessment Direction 2

Team Boundaries

Boundaries

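Relationships within the team matter a lot. Being too close and being too distant are both bad, and striking the balance is hard.

A leader should be neither a hands-off "armchair president" nor a control freak.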

Advice

How do we get great designers? Great designers design, of course.

-- Fred Brooks

So how are we supposed to get great architects, if they only get the chance to architect fewer than a half-dozen times in their career?

-- Ted Neward

There are not right or wrong answers in architecture, only trade-offs.