【JS语法】赋值运算符执行过程经典面试题+v8源码AssignmentExpression Parser和字节码生成过程浅析
赋值运算符执行过程经典面试题
本文juejin:juejin.cn/post/724066…
let o1 = {x: 100}
let o2 = o1
o1.y = o1 = {x: 200}
console.log('??', o1, o1.y) // o1 = {x: 200}
console.log('??', o2) // o2 = {x: 100, y: {x: 200}}
做这题只需要背下一个结论:JS引擎是怎样计算一般的赋值表达式A = B
的呢?简单地说,按如下步骤:
- 计算表达式A,得到一个引用
refA
。 - 计算表达式B,得到一个值
valueB
。 - 将
valueB
赋给refA
指向的名称绑定。 - 返回
valueB
。
这个结论来自参考链接2。原文:
The production AssignmentExpression : LeftHandSideExpression
**=**
AssignmentExpression is evaluated as follows:
- Let lref be the result of evaluating LeftHandSideExpression.
- Let rref be the result of evaluating AssignmentExpression.
- Let rval be GetValue(rref).
- Throw a SyntaxError exception if the following conditions are all true:
- Type(lref) is Reference is true
- IsStrictReference(lref) is true
- Type(GetBase(lref)) is Environment Record
- GetReferencedName(lref) is either
**"eval"**
or**"arguments"**
- Call PutValue(lref, rval).
- Return rval.
NOTE When an assignment occurs within strict mode code, its LeftHandSide must not evaluate to an unresolvable reference. If it does a ReferenceError exception is thrown upon assignment. The LeftHandSide also may not be a reference to a data property with the attribute value {[[Writable]]:false}, to an accessor property with the attribute value {[[Set]]:undefined}, nor to a non-existent property of an object whose [[Extensible]] internal property has the value false. In these cases a TypeError exception is thrown.
所以第一步就是解析o1.y
,将最后一个赋值语句o1.y = o1
认定为在{x: 100}
这个对象上添加属性。第二步是o1 = {x: 200}
,o1
指向新对象。第三步是({x: 100}).y = o1的当前值
,即{x: 100}
被修改了。
知道这个结论后,立刻有推论:
o2.y = o1 = {x: 200}
这句话作用和o1.y = o1 = {x: 200}
完全一样。- 在JS里,连续赋值和拆为多个语句一般是不等价的:
let o1 = {x: 100}
let o2 = o1
o1 = {x: 200}
o1.y = o1
console.log('??', o1) // o1 变成一个自指的对象
console.log('??', o2) // o2 = {x: 100}
从AST和字节码的角度来看连续赋值
这一节只是我的口胡。其实还可以更进一步地考虑这个问题:这是编译器实现策略的问题。我们看下o1.y = o1 = {x: 200}
这句话的AST结构,传送门:
AssignmentExpression:
left: MemberExpression
object: Identifier
property: Identifier
right: AssignmentExpression
left:
Identifier: o1
right:
ObjectExpression
正常人dfs遍历AST的时候会先遍历左边,把左边的变量当时指向的堆地址先解析出来。但如果我们先遍历右边,那么执行o1.y = o1
的o1.y
就会取到{x: 200}
这个对象了:
- Let rref be the result of evaluating AssignmentExpression.
- Let rval be GetValue(rref).
- Let lref be the result of evaluating LeftHandSideExpression.
- Throw a SyntaxError exception if the following conditions are all true:
- Type(lref) is Reference is true
- IsStrictReference(lref) is true
- Type(GetBase(lref)) is Environment Record
- GetReferencedName(lref) is either
**"eval"**
or**"arguments"**
- Call PutValue(lref, rval).
- Return rval.
接下来从字节码的角度来看上述经典面试题。node --print-bytecode assign_demo.js
可以输出JS字节码。输出量很大,为了方便定位,我们可以用一个函数包裹:
function assign_demo() {
let o1 = {x: 100}
let o2 = o1
o1.y = o1 = {x: 200}
console.log('??', o1, o1.y) // o1 = {x: 200}
console.log('??', o2) // o2 = {x: 100, y: {x: 200}}
}
assign_demo()
定位到的字节码:
[generated bytecode for function: assign_demo (0x0067c62f8531 <SharedFunctionInfo assign_demo>)]
Bytecode length: 60
Parameter count 1
Register count 7
Frame size 56
OSR nesting level: 0
Bytecode Age: 0
37 S> 00000067C62F8F66 @ 0 : 7b 00 00 29 CreateObjectLiteral [0], [0], #41 ; 创建对象{x: 100},存在常量池下标0
00000067C62F8F6A @ 4 : c3 Star0 ; r0寄存器指向常量池下标0的对象
58 S> 00000067C62F8F6B @ 5 : c2 Star1 ; r1寄存器也指向常量池下标0的对象
64 S> 00000067C62F8F6C @ 6 : 7b 01 01 29 CreateObjectLiteral [1], [1], #41 ; 创建对象{x: 200},存在常量池下标1
00000067C62F8F70 @ 10 : c3 Star0 ; r0寄存器现在指向常量池下标1的对象,{x: 200}
69 E> 00000067C62F8F71 @ 11 : 32 f9 02 02 StaNamedProperty r1, [2], [2] ; 常量池下标2是“y”,r1现在指向常量池下标0的对象,所以相当于给常量池下标0的对象({x: 100}).y赋值
88 S> 00000067C62F8F75 @ 15 : 21 03 04 LdaGlobal [3], [4] ; 常量池下标3是console,“[4]”和反馈向量有关,本文可以不管
00000067C62F8F78 @ 18 : c0 Star3 ; r3指向console
96 E> 00000067C62F8F79 @ 19 : 2d f7 04 06 LdaNamedProperty r3, [4], [6] ; 加载r3指向的对象的属性到累加寄存器,属性名是常量池下标4,即log
00000067C62F8F7D @ 23 : c1 Star2 ; r2指向console.log这个对象
00000067C62F8F7E @ 24 : 13 05 LdaConstant [5] ; 累加寄存器现在是'??'
00000067C62F8F80 @ 26 : bf Star4 ; 累加寄存器的值写入r4,r4是'??'
113 E> 00000067C62F8F81 @ 27 : 2d fa 02 08 LdaNamedProperty r0, [2], [8] ; 读取({x: 200}).y,存到累加寄存器
00000067C62F8F85 @ 31 : bd Star6 ; ({x: 200}).y的值写入r6
00000067C62F8F86 @ 32 : 19 fa f5 Mov r0, r5 ; AT&T格式-_-。把r0寄存器赋值给r5寄存器,r5指向常量池下标1的对象{x: 200}
96 E> 00000067C62F8F89 @ 35 : 5b f8 f7 04 0a CallProperty r2, r3-r6, [10] ; r2, r3负责执行输出,没看懂这句,但理应用到r0和r6,理应输出 “?? {x: 200} undefined”
136 S> 00000067C62F8F8E @ 40 : 21 03 04 LdaGlobal [3], [4] ; 同上
00000067C62F8F91 @ 43 : c0 Star3 ; r3指向console
144 E> 00000067C62F8F92 @ 44 : 2d f7 04 06 LdaNamedProperty r3, [4], [6] ; 同上
00000067C62F8F96 @ 48 : c1 Star2 ; 同上
00000067C62F8F97 @ 49 : 13 05 LdaConstant [5] ; 累加寄存器现在是'??'
00000067C62F8F99 @ 51 : bf Star4 ; 同上,r4是'??'
144 E> 00000067C62F8F9A @ 52 : 5e f8 f7 f6 f9 0c CallProperty2 r2, r3, r4, r1, [12] ; r2, r3负责执行输出;r1没动过,所以输出了常量池下标0的对象。
00000067C62F8FA0 @ 58 : 0e LdaUndefined
189 S> 00000067C62F8FA1 @ 59 : a8 Return
Constant pool (size = 6)
00000067C62F8EF1: [FixedArray] in OldSpace
- map: 0x0198dd8012c1 <Map>
- length: 6
0: 0x0067c62f8ea1 <ObjectBoilerplateDescription[3]>
1: 0x0067c62f8ec9 <ObjectBoilerplateDescription[3]>
2: 0x01e31fdd5b49 <String[1]: #y>
3: 0x014f3febaf11 <String[7]: #console>
4: 0x014f3fe8caa9 <String[3]: #log>
5: 0x0067c62f8e41 <String[2]: #??>
Handler Table (size = 0)
Source Position Table (size = 28)
0x0067c62f8fa9 <ByteArray[28]>
看上去是优化过的-_-,并没有体现“先获取lref
再获取rval
最后赋值的事实”。所以我会希望在源码中找到参考链接2表述所对应的代码,需要读AssignmentExpression Parser建AST和遍历Assignment AstNode生成字节码这两块代码。
V8源码AssignmentExpression Parser建AST过程浅析
AstNode
表示AST节点的类,比如Statement, Expression, Assignment, CompoundAssignment
都继承自AstNode
,src\ast\ast.h
。并且有继承关系链CompoundAssignment -> Assignment -> Expression -> AstNode
。
接下来在V8源码中找下AssignmentExpression
的实现。
src\parsing\parser-base.h
和src\parsing\parser.h
主要实现了ParserBase, Parser
类。AssignmentExpression
相关代码主要在src\parsing\parser-base.h
。传送门
// ParseAssignmentExpression调用ParseAssignmentExpressionCoverGrammar,ParseAssignmentExpressionCoverGrammar递归调用ParseAssignmentExpression
// 返回值就是创建的AST节点。
template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseAssignmentExpression() {
// 可以理解为,使用impl()就是为了让父类能够直接调用子类函数。
ExpressionParsingScope expression_scope(impl());
ExpressionT result = ParseAssignmentExpressionCoverGrammar();
expression_scope.ValidateExpression(); // TODO:不知道是什么作用
return result;
}
template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseAssignmentExpressionCoverGrammar() {
// 上下文无关文法。代码中的这类注释描述的是非终结符(Non-Terminal)AssignmentExpression的产生式(Productions)。
// AssignmentExpression ::
// ConditionalExpression
// ArrowFunction
// YieldExpression
// LeftHandSideExpression AssignmentOperator AssignmentExpression
int lhs_beg_pos = peek_position();
if (peek() == Token::YIELD && is_generator()) {
return ParseYieldExpression();
}
FuncNameInferrerState fni_state(&fni_);
DCHECK_IMPLIES(!has_error(), next_arrow_function_info_.HasInitialState());
// ParseConditionalExpression 就是 a ? b : c,ParseConditionalExpression在右侧符号不是“?”的时候会直接返回,所以相当于解析了左侧表达式,获得LHS的AST子树。
ExpressionT expression = ParseConditionalExpression();
Token::Value op = peek(); // 下一个token
if (!Token::IsArrowOrAssignmentOp(op)) return expression;
// Arrow functions. Token::ARROW就是“=>”。这段代码就是在解析箭头函数。
// V8_UNLIKELY是一个宏,在后文有解释。读代码时认为这个宏没有作用即可。
if (V8_UNLIKELY(op == Token::ARROW)) {
Scanner::Location loc(lhs_beg_pos, end_position());
// 箭头函数参数列表只能是标识符或者由小括号括起来。
if (!impl()->IsIdentifier(expression) && !expression->is_parenthesized()) {
impl()->ReportMessageAt(
Scanner::Location(expression->position(), position()),
MessageTemplate::kMalformedArrowFunParamList); // Malformed arrow function parameter list
return impl()->FailureExpression();
}
DeclarationScope* scope = next_arrow_function_info_.scope;
scope->set_start_position(lhs_beg_pos);
FormalParametersT parameters(scope);
parameters.set_strict_parameter_error(
next_arrow_function_info_.strict_parameter_error_location,
next_arrow_function_info_.strict_parameter_error_message);
parameters.is_simple = scope->has_simple_parameters();
next_arrow_function_info_.Reset();
impl()->DeclareArrowFunctionFormalParameters(¶meters, expression, loc); // 在后文解析
expression = ParseArrowFunctionLiteral(parameters); // TODO:很复杂,不看了,猜测是解析函数体的,AST节点的创建是由这个函数完成的,然后return给调用者
return expression;
}
// IsAssignableIdentifier(expression) 表示左侧表达式是“可以赋值的标识符”
if (V8_LIKELY(impl()->IsAssignableIdentifier(expression))) {
// TODO:猜测是处理 let (x) = 0; 这种情况?但似乎无法简单复现,因为这句话执行的报错是Reference Error: let is not defined。
if (expression->is_parenthesized()) {
// 消息模板 "Invalid destructuring assignment target"。注意这里没有return也没有异常,应该要继续执行到 Consume(op); 这一句。TODO:确认我的结论是否正确。
expression_scope()->RecordDeclarationError(
Scanner::Location(lhs_beg_pos, end_position()),
MessageTemplate::kInvalidDestructuringTarget);
}
expression_scope()->MarkIdentifierAsAssigned();
} else if (expression->IsProperty()) {
expression_scope()->RecordDeclarationError(
Scanner::Location(lhs_beg_pos, end_position()),
MessageTemplate::kInvalidPropertyBindingPattern);
expression_scope()->ValidateAsExpression();
} else if (expression->IsPattern() && op == Token::ASSIGN) {
// Destructuring assignmment. 看下IsPattern实现就明白。
if (expression->is_parenthesized()) {
Scanner::Location loc(lhs_beg_pos, end_position());
if (expression_scope()->IsCertainlyDeclaration()) {
// 猜测是处理 let ({x}) = {x: 100} 这种情况?报错也是Reference Error: let is not defined,无法简单复现。
impl()->ReportMessageAt(loc,
MessageTemplate::kInvalidDestructuringTarget);
} else {
// Syntax Error if LHS is neither object literal nor an array literal
// (Parenthesized literals are
// CoverParenthesizedExpressionAndArrowParameterList).
// #sec-assignment-operators-static-semantics-early-errors
// CoverParenthesizedExpressionAndArrowParameterList 见下文介绍,简单来说,这里处理的是 (x, y) = 1、(function foo() { }) = 1 这类语法错误,这些例子中的LHS可以是括号表达式或箭头函数参数,且整个表达式不是变量声明语句。
impl()->ReportMessageAt(loc, MessageTemplate::kInvalidLhsInAssignment);
}
}
expression_scope()->ValidateAsPattern(expression, lhs_beg_pos,
end_position());
} else {
DCHECK(!IsValidReferenceExpression(expression));
// For web compatibility reasons, throw early errors only for logical
// assignment, not for regular assignment.
// LogicalAssignmentOp 包括 ||=, &&=, and ??=
const bool early_error = Token::IsLogicalAssignmentOp(op);
// TODO:不清楚这段代码的作用
expression = RewriteInvalidReferenceExpression(
expression, lhs_beg_pos, end_position(),
MessageTemplate::kInvalidLhsInAssignment, early_error);
}
Consume(op); // TODO:这里猜测是有副作用的,执行后op就“加1”,指向了当前的赋值运算符。
int op_position = position();
ExpressionT right = ParseAssignmentExpression(); // 递归调用ParseAssignmentExpression,获取RHS的AST子树。
// Anonymous function name inference applies to =, ||=, &&=, and ??=. 下文讲解。
if (op == Token::ASSIGN || Token::IsLogicalAssignmentOp(op)) {
// 顾名思义,如果是 xxobj.yy = function(){...} 的赋值,则针对函数字面量做一个优化。
impl()->CheckAssigningFunctionLiteralToProperty(expression, right);
// Check if the right hand side is a call to avoid inferring a
// name if we're dealing with "a = function(){...}();"-like
// expression.
// 检查RHS是否为call,以避免在处理“a = function(){…}();”之类的表达式时推断出名称。TODO:没看懂-_-
if (right->IsCall() || right->IsCallNew()) {
fni_.RemoveLastFunction();
} else {
fni_.Infer();
}
impl()->SetFunctionNameFromIdentifierRef(right, expression);
} else {
fni_.RemoveLastFunction();
}
if (op == Token::ASSIGN) {
// We try to estimate the set of properties set by constructors. We define a
// new property whenever there is an assignment to a property of 'this'. We
// should probably only add properties if we haven't seen them before.
// Otherwise we'll probably overestimate the number of properties.
// 从注释来看,这是这个函数头一次针对 this.xx = yy 进行处理。但我没找到判断“之前没见过的属性”的地方……
if (impl()->IsThisProperty(expression)) function_state_->AddProperty();
} else {
// Only initializers (i.e. no compound assignments) are allowed in patterns.
// TODO:猜测是针对这种类型的无效语句:let {x: 114514} = { x: 100 } 的报错,这里114514是一个无效的解构赋值target。
expression_scope()->RecordPatternError(
Scanner::Location(lhs_beg_pos, end_position()),
MessageTemplate::kInvalidDestructuringTarget);
}
// factory()->NewXXX() 创建AST节点,在后文解析
return factory()->NewAssignment(op, expression, right, op_position);
}
一些额外的补充:
一、随处可见的impl()
是为了让父类能够直接调用子类函数。
template <typename Impl>
class ParserBase {
// All implementation-specific methods must be called through this.
Impl* impl() { return static_cast<Impl*>(this); }
const Impl* impl() const { return static_cast<const Impl*>(this); }
}
二、V8_UNLIKELY
和V8_LIKELY
可以认为是给编译器提供的分支预测信息,用来提升性能的。V8_UNLIKELY
表示condition
大概率是false。阅读源码时认为这个宏没有影响就行。
// A macro to provide the compiler with branch prediction information.
#if V8_HAS_BUILTIN_EXPECT
# define V8_UNLIKELY(condition) (__builtin_expect(!!(condition), 0))
# define V8_LIKELY(condition) (__builtin_expect(!!(condition), 1))
#else
# define V8_UNLIKELY(condition) (condition)
# define V8_LIKELY(condition) (condition)
#endif
三、ParseConditionalExpression
在发现表达式的下一个Token不是“?”的时候会直接返回表达式,所以ParseAssignmentExpressionCoverGrammar
可以放心地调用它。
// Precedence = 3
template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseConditionalExpression() {
// ConditionalExpression ::
// LogicalExpression
// LogicalExpression '?' AssignmentExpression ':' AssignmentExpression
//
int pos = peek_position();
ExpressionT expression = ParseLogicalExpression();
return peek() == Token::CONDITIONAL
? ParseConditionalContinuation(expression, pos)
: expression;
}
四、IsAssignableIdentifier(expression)
表示左侧表达式是“可以赋值的运算符”。可见在严格模式下eval
和arguments
是不可重新赋值的标识符。实际上在严格模式下给这两个变量赋值会报错Unexpected eval or arguments in strict mode
。
bool IsAssignableIdentifier(ExpressionT expression) {
if (!impl()->IsIdentifier(expression)) return false;
if (is_strict(language_mode()) &&
impl()->IsEvalOrArguments(impl()->AsIdentifier(expression))) {
return false;
}
return true;
}
五、消息模板,比如MessageTemplate::kInvalidDestructuringTarget
,定义在src\common\message-template.h
。搜MessageTemplate::k
后面的字符串就能搜到:
#define MESSAGE_TEMPLATES(T) \
/* Error */ \
T(None, "") \
T(ConflictingPrivateName, \
"Operation is ambiguous because there are more than one private name" \
"'%' on the object") \
T(CyclicProto, "Cyclic __proto__ value") \
T(Debugger, "Debugger: %") \
// ...
T(InvalidDestructuringTarget, "Invalid destructuring assignment target") \
// ...
六、expression->IsPattern()
,判断是否是对象解构或数组解构符号。ExpressionT expression = ParseConditionalExpression();
并且using ExpressionT = typename Types::Expression;
,所以expression->IsPattern()
走的是src\ast\ast.h
的实现。不过下面还是顺便看了一下PreParserExpression
的实现。
// src\ast\ast.h
// 宏 AST_NODE_LIST 为入口,“调用”到 LITERAL_NODE_LIST ,“调用”到V(ObjectLiteral)和V(ArrayLiteral),于是产生 kObjectLiteral 和 kArrayLiteral
class Expression : public AstNode {
bool IsPattern() {
static_assert(kObjectLiteral + 1 == kArrayLiteral);
return base::IsInRange(node_type(), kObjectLiteral, kArrayLiteral);
}
}
// src\parsing\preparser.h
class PreParserExpression {
bool IsPattern() const {
static_assert(kObjectLiteralExpression + 1 == kArrayLiteralExpression);
return base::IsInRange(TypeField::decode(code_), kObjectLiteralExpression,
kArrayLiteralExpression);
}
}
kObjectLiteralExpression, kArrayLiteralExpression
是枚举值,kArrayLiteralExpression = kObjectLiteralExpression + 1
。
enum Type {
kNull,
kFailure,
kExpression,
kIdentifierExpression,
kStringLiteralExpression,
kSpreadExpression,
kObjectLiteralExpression,
kArrayLiteralExpression
};
顺便看下IsInRange
(在src\base\bounds.h
,namespacev8::base
)的实现:
template <typename T, typename U>
inline constexpr bool IsInRange(T value, U lower_limit, U higher_limit) {
DCHECK_LE(lower_limit, higher_limit);
static_assert(sizeof(U) <= sizeof(T));
using unsigned_T = typename std::make_unsigned<T>::type;
// Use static_cast to support enum classes.
return static_cast<unsigned_T>(static_cast<unsigned_T>(value) -
static_cast<unsigned_T>(lower_limit)) <=
static_cast<unsigned_T>(static_cast<unsigned_T>(higher_limit) -
static_cast<unsigned_T>(lower_limit));
}
七、op == Token::ASSIGN || Token::IsLogicalAssignmentOp(op)
的注释Anonymous function name inference applies to =, ||=, &&=, and ??=
是正确且必要的,但也可以看下IsLogicalAssignmentOp
的实现来理解为什么注释是正确的:
static bool IsLogicalAssignmentOp(Value token) {
return base::IsInRange(token, ASSIGN_NULLISH, ASSIGN_AND);
}
// BINARY_OP_TOKEN_LIST即入口,“调用”了EXPAND_BINOP_ASSIGN_TOKEN(T, NULLISH),于是"调用"了T(ASSIGN_NULLISH, "??" "=", 2),最终产生 ASSIGN_NULLISH。所以IsInRange就取到了 ASSIGN_NULLISH、ASSIGN_OR和ASSIGN_AND。相关代码:
#define EXPAND_BINOP_ASSIGN_TOKEN(T, name, string, precedence) \
T(ASSIGN_##name, string "=", 2)
/* Binary operators */
/* ADD and SUB are at the end since they are UnaryOp */
#define BINARY_OP_TOKEN_LIST(T, E) \
E(T, NULLISH, "??", 3) \
E(T, OR, "||", 4) \
E(T, AND, "&&", 5) \
E(T, BIT_OR, "|", 6) \
E(T, BIT_XOR, "^", 7) \
E(T, BIT_AND, "&", 8) \
E(T, SHL, "<<", 11) \
E(T, SAR, ">>", 11) \
E(T, SHR, ">>>", 11) \
E(T, MUL, "*", 13) \
E(T, DIV, "/", 13) \
E(T, MOD, "%", 13) \
E(T, EXP, "**", 14) \
E(T, ADD, "+", 12) \
E(T, SUB, "-", 12)
BINARY_OP_TOKEN_LIST(T, EXPAND_BINOP_ASSIGN_TOKEN) \
八、CoverParenthesizedExpressionAndArrowParameterList
(简写为CPEAAPL
)是ES2017引入的上下文无关文法(根据参考链接3,和cover grammar有关)符号。引入这个符号是为了解决一个文法二义性问题:考虑let x = (a, b);
和let x = (a, b) => { return a + b };
,我们确定这是一个AssignmentExpression
,并读到一个左括号后,如何确定是去解析箭头函数还是解析括号表达式?
这个符号可以直译为“括号表达式和箭头函数参数列表的并集的超集”,(a, b = 1)
(有效的ParenthesizedExpression和ArrowParameterList)、(function foo() { })
(有效的ParenthesizedExpression)、(a = 1, ...b)
(有效的ArrowParameterList)、(1, ...b)
(无效的ParenthesizedExpression和ArrowParameterList)都是有效的CPEAAPL
。
九、DeclareArrowFunctionFormalParameters
很复杂。看上去是进行作用域数据结构的维护,没有添加AST节点。
// src\parsing\parser.cc
void Parser::DeclareArrowFunctionFormalParameters(
ParserFormalParameters* parameters, Expression* expr,
const Scanner::Location& params_loc) {
if (expr->IsEmptyParentheses() || has_error()) return;
// 猜测主要是分配了内存空间,记录作用域信息还得下面的 DeclareFormalParameters 完成?
AddArrowFunctionFormalParameters(parameters, expr, params_loc.end_pos);
if (parameters->arity > Code::kMaxArguments) { // 箭头函数允许的最大参数个数 (1 << 16) - 2
ReportMessageAt(params_loc, MessageTemplate::kMalformedArrowFunParamList);
return;
}
DeclareFormalParameters(parameters);
DCHECK_IMPLIES(parameters->is_simple,
parameters->scope->has_simple_parameters());
}
// src\parsing\parser.h
V8_INLINE void DeclareFormalParameters(ParserFormalParameters* parameters) {
bool is_simple = parameters->is_simple;
DeclarationScope* scope = parameters->scope;
if (!is_simple) scope->MakeParametersNonSimple();
for (auto parameter : parameters->params) {
bool is_optional = parameter->initializer() != nullptr;
// If the parameter list is simple, declare the parameters normally with
// their names. If the parameter list is not simple, declare a temporary
// for each parameter - the corresponding named variable is declared by
// BuildParamerterInitializationBlock.
scope->DeclareParameter(
is_simple ? parameter->name() : ast_value_factory()->empty_string(),
is_simple ? VariableMode::kVar : VariableMode::kTemporary,
is_optional, parameter->is_rest(), ast_value_factory(),
parameter->position);
}
}
// AddArrowFunctionFormalParameters代码比较长就不贴了。其为递归调用,但最终会体现为调用AddFormalParameter,所以我们继续查看 AddFormalParameter 实现。src\parsing\parser.h
V8_INLINE void AddFormalParameter(ParserFormalParameters* parameters,
Expression* pattern,
Expression* initializer,
int initializer_end_position,
bool is_rest) {
parameters->UpdateArityAndFunctionLength(initializer != nullptr, is_rest);
auto parameter =
parameters->scope->zone()->New<ParserFormalParameters::Parameter>(
pattern, initializer, scanner()->location().beg_pos,
initializer_end_position, is_rest);
parameters->params.Add(parameter);
}
parameters->scope->zone()
是Zone*
(src\zone\zone.h
),用来做堆内存分配:
template <typename T, typename... Args>
T* New(Args&&... args) {
static_assert(alignof(T) <= kAlignmentInBytes);
void* memory = Allocate<T>(sizeof(T));
return new (memory) T(std::forward<Args>(args)...);
}
十、ParseAssignmentExpressionCoverGrammar
的最后一句话是创建Assignment
节点:return factory()->NewAssignment(op, expression, right, op_position);
,其实现:
Assignment* NewAssignment(Token::Value op,
Expression* target,
Expression* value,
int pos) {
DCHECK(Token::IsAssignmentOp(op));
DCHECK_NOT_NULL(target);
DCHECK_NOT_NULL(value);
if (op != Token::INIT && target->IsVariableProxy()) {
target->AsVariableProxy()->set_is_assigned();
}
if (op == Token::ASSIGN || op == Token::INIT) {
return zone_->New<Assignment>(AstNode::kAssignment, op, target, value,
pos);
} else {
return zone_->New<CompoundAssignment>(
op, target, value, pos,
NewBinaryOperation(Token::BinaryOpForAssignment(op), target, value,
pos + 1));
}
}
上文说过zone_->New
仅仅是做内存分配,而zone_->New<Assignment>
就可以认为是调用Assignment
的构造函数:
class Assignment : public Expression {
public:
Token::Value op() const { return TokenField::decode(bit_field_); }
Expression* target() const { return target_; }
Expression* value() const { return value_; }
// The assignment was generated as part of block-scoped sloppy-mode
// function hoisting, see
// ES#sec-block-level-function-declarations-web-legacy-compatibility-semantics
LookupHoistingMode lookup_hoisting_mode() const {
return static_cast<LookupHoistingMode>(
LookupHoistingModeField::decode(bit_field_));
}
void set_lookup_hoisting_mode(LookupHoistingMode mode) {
bit_field_ =
LookupHoistingModeField::update(bit_field_, static_cast<bool>(mode));
}
protected:
Assignment(NodeType type, Token::Value op, Expression* target,
Expression* value, int pos);
private:
friend class AstNodeFactory;
friend Zone;
using TokenField = Expression::NextBitField<Token::Value, 7>;
using LookupHoistingModeField = TokenField::Next<bool, 1>;
Expression* target_;
Expression* value_;
};
V8源码遍历Assignment AstNode生成字节码过程浅析
为了找到参考链接2表述对应的代码,我们不仅需要看Parser部分的源码,还需要看字节码生成部分的源码。
字节码是机器码的抽象表示,采用和物理CPU相同的计算模型进行设计。
字节码的定义在src/interpreter/bytecodes.h
中:
#define BYTECODE_LIST_WITH_UNIQUE_HANDLERS(V) \
/* Extended width operands */ \
V(Wide, ImplicitRegisterUse::kNone) \
V(ExtraWide, ImplicitRegisterUse::kNone) \
\
/* Debug Breakpoints - one for each possible size of unscaled bytecodes */ \
/* and one for each operand widening prefix bytecode */ \
V(DebugBreakWide, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreakExtraWide, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreak0, ImplicitRegisterUse::kReadWriteAccumulator) \
V(DebugBreak1, ImplicitRegisterUse::kReadWriteAccumulator, \
OperandType::kReg) \
// ...
/* - [Register Loads ] */ \
V(Star, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(Mov, ImplicitRegisterUse::kNone, OperandType::kReg, OperandType::kRegOut) \
V(PushContext, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(PopContext, ImplicitRegisterUse::kNone, OperandType::kReg) \
/* Cast operators */ \
V(ToName, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(ToNumber, ImplicitRegisterUse::kReadWriteAccumulator, OperandType::kIdx) \
V(ToNumeric, ImplicitRegisterUse::kReadWriteAccumulator, OperandType::kIdx) \
V(ToObject, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut) \
V(ToString, ImplicitRegisterUse::kReadWriteAccumulator) \
V(ToBoolean, ImplicitRegisterUse::kReadWriteAccumulator) \
\
/* Literals */ \
V(CreateRegExpLiteral, ImplicitRegisterUse::kWriteAccumulator, \
OperandType::kIdx, OperandType::kIdx, OperandType::kFlag16) \
// ...
/* Control Flow -- carefully ordered for efficient checks */ \
/* - [Unconditional jumps] */ \
V(JumpLoop, ImplicitRegisterUse::kNone, OperandType::kUImm, \
OperandType::kImm, OperandType::kIdx) \
/* - [Forward jumps] */ \
V(Jump, ImplicitRegisterUse::kNone, OperandType::kUImm) \
/* - [Start constant jumps] */ \
V(JumpConstant, ImplicitRegisterUse::kNone, OperandType::kIdx) \
// The list of bytecodes which are interpreted by the interpreter.
// Format is V(<bytecode>, <implicit_register_use>, <operands>).
#define BYTECODE_LIST(V) \
BYTECODE_LIST_WITH_UNIQUE_HANDLERS(V) \
\
/* Special-case Star for common register numbers, to save space */ \
SHORT_STAR_BYTECODE_LIST(V) \
\
/* Illegal bytecode */ \
V(Illegal, ImplicitRegisterUse::kNone)
// “调用”入口
#define DECLARE_BYTECODE(Name, ...) k##Name,
BYTECODE_LIST(DECLARE_BYTECODE)
#undef DECLARE_BYTECODE
于是外部可以这么用上面这些宏定义出的常量:interpreter::Bytecode::kMov
。
数据结构上,有寄存器r0,r1,r2,…
和累加寄存器(accumulator register)。累加器是和其它寄存器一样的常规寄存器,但不同的是累加器的操作没有显式给出指令,具体来说,Add r1
将寄存器r1
中的值和累加器中的值进行加法运算。
部分字节码的含义:
LdaSmi [1]
,load accumulator small int
,这里的[1]
是小整型(small int,Smi)常量,加载到累加器中。Star r1
,store accumulator (to) register
,把累加器中的当前值写入到r1
寄存器。
显然字节码需要通过遍历AST生成。但字节码生成的时机怎么找呢?借助参考链接4得到结论:v8源码里,字节码生成的函数是由解释器调用的,src\interpreter\interpreter.cc
。
InterpreterCompilationJob::Status InterpreterCompilationJob::ExecuteJobImpl() {
RCS_SCOPE(parse_info()->runtime_call_stats(),
RuntimeCallCounterId::kCompileIgnition,
RuntimeCallStats::kThreadSpecific);
// TODO(lpy): add support for background compilation RCS trace.
TRACE_EVENT0(TRACE_DISABLED_BY_DEFAULT("v8.compile"), "V8.CompileIgnition");
// Print AST if flag is enabled. Note, if compiling on a background thread
// then ASTs from different functions may be intersperse when printed.
{
DisallowGarbageCollection no_heap_access;
MaybePrintAst(parse_info(), compilation_info());
}
ParkedScopeIfOnBackground parked_scope(local_isolate_);
generator()->GenerateBytecode(stack_limit()); // 就是这一行
if (generator()->HasStackOverflow()) {
return FAILED;
}
return SUCCEEDED;
}
src\interpreter\bytecode-generator.cc
void BytecodeGenerator::GenerateBytecode(uintptr_t stack_limit) {
InitializeAstVisitor(stack_limit);
if (v8_flags.stress_lazy_compilation && local_isolate_->is_main_thread()) {
// Trigger stack overflow with 1/stress_lazy_compilation probability.
// Do this only for the main thread compilations because querying random
// numbers from background threads will make the random values dependent
// on the thread scheduling and thus non-deterministic.
stack_overflow_ = local_isolate_->fuzzer_rng()->NextInt(
v8_flags.stress_lazy_compilation) == 0;
}
// Initialize the incoming context.
ContextScope incoming_context(this, closure_scope());
// Initialize control scope.
ControlScopeForTopLevel control(this);
RegisterAllocationScope register_scope(this);
AllocateTopLevelRegisters();
builder()->EmitFunctionStartSourcePosition(
info()->literal()->start_position());
if (info()->literal()->CanSuspend()) {
BuildGeneratorPrologue();
}
if (NeedsContextInitialization(closure_scope())) {
// Push a new inner context scope for the function.
BuildNewLocalActivationContext();
ContextScope local_function_context(this, closure_scope());
BuildLocalActivationContextInitialization();
GenerateBytecodeBody();
} else {
GenerateBytecodeBody();
}
// Reset variables with hole check bitmap indices for subsequent compilations
// in the same parsing zone.
for (Variable* var : vars_in_hole_check_bitmap_) {
var->ResetHoleCheckBitmapIndex();
}
// Check that we are not falling off the end.
DCHECK(builder()->RemainderOfBlockIsDead());
}
void BytecodeGenerator::GenerateBytecodeBody() {
// Build the arguments object if it is used.
VisitArgumentsObject(closure_scope()->arguments());
// Build rest arguments array if it is used.
Variable* rest_parameter = closure_scope()->rest_parameter();
VisitRestArgumentsArray(rest_parameter);
// Build assignment to the function name or {.this_function}
// variables if used.
VisitThisFunctionVariable(closure_scope()->function_var());
VisitThisFunctionVariable(closure_scope()->this_function_var());
// Build assignment to {new.target} variable if it is used.
VisitNewTargetVariable(closure_scope()->new_target_var());
// Create a generator object if necessary and initialize the
// {.generator_object} variable.
FunctionLiteral* literal = info()->literal();
if (IsResumableFunction(literal->kind())) {
BuildGeneratorObjectVariableInitialization();
}
// Emit tracing call if requested to do so.
if (v8_flags.trace) builder()->CallRuntime(Runtime::kTraceEnter);
// Increment the function-scope block coverage counter.
BuildIncrementBlockCoverageCounterIfEnabled(literal, SourceRangeKind::kBody);
// Visit declarations within the function scope.
if (closure_scope()->is_script_scope()) {
VisitGlobalDeclarations(closure_scope()->declarations());
} else if (closure_scope()->is_module_scope()) {
VisitModuleDeclarations(closure_scope()->declarations());
} else {
VisitDeclarations(closure_scope()->declarations());
}
// Emit initializing assignments for module namespace imports (if any).
VisitModuleNamespaceImports();
// The derived constructor case is handled in VisitCallSuper.
if (IsBaseConstructor(function_kind())) {
if (literal->class_scope_has_private_brand()) {
ClassScope* scope = info()->scope()->outer_scope()->AsClassScope();
DCHECK_NOT_NULL(scope->brand());
BuildPrivateBrandInitialization(builder()->Receiver(), scope->brand());
}
if (literal->requires_instance_members_initializer()) {
BuildInstanceMemberInitialization(Register::function_closure(),
builder()->Receiver());
}
}
// Visit statements in the function body.
VisitStatements(literal->body());
// Emit an implicit return instruction in case control flow can fall off the
// end of the function without an explicit return being present on all paths.
if (!builder()->RemainderOfBlockIsDead()) {
builder()->LoadUndefined();
BuildReturn(literal->return_position());
}
}
我们主要关注VisitStatements(literal->body());
。
void BytecodeGenerator::VisitStatements(
const ZonePtrList<Statement>* statements) {
for (int i = 0; i < statements->length(); i++) {
// Allocate an outer register allocations scope for the statement.
RegisterAllocationScope allocation_scope(this);
Statement* stmt = statements->at(i);
Visit(stmt);
if (builder()->RemainderOfBlockIsDead()) break;
}
}
到这里线索似乎断了,因为Visit
的实现不好找,BytecodeGenerator
和其父类AstVisitor
都没有找到有意义的实现。根据参考链接4和动态调试才知道,BytecodeGenerator
调用了一个叫DEFINE_AST_VISITOR_SUBCLASS_MEMBERS
的宏,这个宏定义在src\ast\ast.h
,里面就有Visit
方法。
void Visit(AstNode* node) { \
if (CheckStackOverflow()) return; \
VisitNoStackOverflowCheck(node); \
} \
这就是BytecodeGenerator.Visit
的实现。继续跟进VisitNoStackOverflowCheck
:
#define GENERATE_VISIT_CASE(NodeType) \
case AstNode::k##NodeType: \
return this->impl()->Visit##NodeType(static_cast<NodeType*>(node));
#define GENERATE_AST_VISITOR_SWITCH() \
switch (node->node_type()) { \
AST_NODE_LIST(GENERATE_VISIT_CASE) \
FAILURE_NODE_LIST(GENERATE_FAILURE_CASE) \
}
#define DEFINE_AST_VISITOR_SUBCLASS_MEMBERS() \
public: \
void VisitNoStackOverflowCheck(AstNode* node) { \
GENERATE_AST_VISITOR_SWITCH() \
} \
至此我们可以知道,Assignment类型的AstNode对应的遍历函数就是VisitAssignment
。
void BytecodeGenerator::VisitAssignment(Assignment* expr) {
AssignmentLhsData lhs_data = PrepareAssignmentLhs(expr->target());
VisitForAccumulatorValue(expr->value());
builder()->SetExpressionPosition(expr);
BuildAssignment(lhs_data, expr->op(), expr->lookup_hoisting_mode());
}
根据上一节的分析结果,expr->target(), expr->op(), expr->value()
分别相当于Babel AST AssignmentExpression
中的left, op, right
属性。
TODO
参考资料
- 由ES规范学JavaScript(二):深入理解“连等赋值”问题:segmentfault.com/a/119000000…
- es5.github.io/#x11.13.1
- 理解ECMAScript规范(4):zhuanlan.zhihu.com/p/373857729
- 连载《Chrome V8 原理讲解》第六篇 bytecode字节码生成:zhuanlan.zhihu.com/p/427676313
转载自:https://juejin.cn/post/7240666488336465976