likes
comments
collection
share

【JS语法】赋值运算符执行过程经典面试题+v8源码AssignmentExpression Parser和字节码生成过程浅析

作者站长头像
站长
· 阅读数 41

赋值运算符执行过程经典面试题

作者:hans774882968

本文juejin:juejin.cn/post/724066…

let o1 = {x: 100}
let o2 = o1
o1.y = o1 = {x: 200}
console.log('??', o1, o1.y) // o1 = {x: 200}
console.log('??', o2) // o2 = {x: 100, y: {x: 200}}

做这题只需要背下一个结论:JS引擎是怎样计算一般的赋值表达式A = B的呢?简单地说,按如下步骤:

  1. 计算表达式A,得到一个引用refA
  2. 计算表达式B,得到一个值valueB
  3. valueB赋给refA指向的名称绑定。
  4. 返回valueB

这个结论来自参考链接2。原文:

The production AssignmentExpression : LeftHandSideExpression **=** AssignmentExpression is evaluated as follows:

  1. Let lref be the result of evaluating LeftHandSideExpression.
  2. Let rref be the result of evaluating AssignmentExpression.
  3. Let rval be GetValue(rref).
  4. Throw a SyntaxError exception if the following conditions are all true:
  5. Call PutValue(lref, rval).
  6. Return rval.

NOTE When an assignment occurs within strict mode code, its LeftHandSide must not evaluate to an unresolvable reference. If it does a ReferenceError exception is thrown upon assignment. The LeftHandSide also may not be a reference to a data property with the attribute value {[[Writable]]:false}, to an accessor property with the attribute value {[[Set]]:undefined}, nor to a non-existent property of an object whose [[Extensible]] internal property has the value false. In these cases a TypeError exception is thrown.

所以第一步就是解析o1.y,将最后一个赋值语句o1.y = o1认定为在{x: 100}这个对象上添加属性。第二步是o1 = {x: 200}o1指向新对象。第三步是({x: 100}).y = o1的当前值,即{x: 100}被修改了。 知道这个结论后,立刻有推论:

  1. o2.y = o1 = {x: 200}这句话作用和o1.y = o1 = {x: 200}完全一样。
  2. 在JS里,连续赋值和拆为多个语句一般是不等价的:
let o1 = {x: 100}
let o2 = o1
o1 = {x: 200}
o1.y = o1
console.log('??', o1) // o1 变成一个自指的对象
console.log('??', o2) // o2 = {x: 100}

从AST和字节码的角度来看连续赋值

这一节只是我的口胡。其实还可以更进一步地考虑这个问题:这是编译器实现策略的问题。我们看下o1.y = o1 = {x: 200}这句话的AST结构,传送门

AssignmentExpression:
  left: MemberExpression
    object: Identifier
    property: Identifier
  right: AssignmentExpression
    left:
      Identifier: o1
    right:
      ObjectExpression

正常人dfs遍历AST的时候会先遍历左边,把左边的变量当时指向的堆地址先解析出来。但如果我们先遍历右边,那么执行o1.y = o1o1.y就会取到{x: 200}这个对象了:

  1. Let rref be the result of evaluating AssignmentExpression.
  2. Let rval be GetValue(rref).
  3. Let lref be the result of evaluating LeftHandSideExpression.
  4. Throw a SyntaxError exception if the following conditions are all true:
  5. Call PutValue(lref, rval).
  6. Return rval.

接下来从字节码的角度来看上述经典面试题。node --print-bytecode assign_demo.js可以输出JS字节码。输出量很大,为了方便定位,我们可以用一个函数包裹:

function assign_demo() {
  let o1 = {x: 100}
  let o2 = o1
  o1.y = o1 = {x: 200}
  console.log('??', o1, o1.y) // o1 = {x: 200}
  console.log('??', o2) // o2 = {x: 100, y: {x: 200}}
}
assign_demo()

定位到的字节码:

[generated bytecode for function: assign_demo (0x0067c62f8531 <SharedFunctionInfo assign_demo>)]
Bytecode length: 60
Parameter count 1
Register count 7
Frame size 56
OSR nesting level: 0
Bytecode Age: 0
   37 S> 00000067C62F8F66 @    0 : 7b 00 00 29       CreateObjectLiteral [0], [0], #41 ; 创建对象{x: 100},存在常量池下标0
         00000067C62F8F6A @    4 : c3                Star0 ; r0寄存器指向常量池下标0的对象
   58 S> 00000067C62F8F6B @    5 : c2                Star1 ; r1寄存器也指向常量池下标0的对象
   64 S> 00000067C62F8F6C @    6 : 7b 01 01 29       CreateObjectLiteral [1], [1], #41 ; 创建对象{x: 200},存在常量池下标1
         00000067C62F8F70 @   10 : c3                Star0 ; r0寄存器现在指向常量池下标1的对象,{x: 200}
   69 E> 00000067C62F8F71 @   11 : 32 f9 02 02       StaNamedProperty r1, [2], [2] ; 常量池下标2是“y”,r1现在指向常量池下标0的对象,所以相当于给常量池下标0的对象({x: 100}).y赋值
   88 S> 00000067C62F8F75 @   15 : 21 03 04          LdaGlobal [3], [4] ; 常量池下标3是console,“[4]”和反馈向量有关,本文可以不管
         00000067C62F8F78 @   18 : c0                Star3 ; r3指向console
   96 E> 00000067C62F8F79 @   19 : 2d f7 04 06       LdaNamedProperty r3, [4], [6] ; 加载r3指向的对象的属性到累加寄存器,属性名是常量池下标4,即log
         00000067C62F8F7D @   23 : c1                Star2 ; r2指向console.log这个对象
         00000067C62F8F7E @   24 : 13 05             LdaConstant [5] ; 累加寄存器现在是'??'
         00000067C62F8F80 @   26 : bf                Star4 ; 累加寄存器的值写入r4,r4是'??'
  113 E> 00000067C62F8F81 @   27 : 2d fa 02 08       LdaNamedProperty r0, [2], [8] ; 读取({x: 200}).y,存到累加寄存器
         00000067C62F8F85 @   31 : bd                Star6 ; ({x: 200}).y的值写入r6
         00000067C62F8F86 @   32 : 19 fa f5          Mov r0, r5 ; AT&T格式-_-。把r0寄存器赋值给r5寄存器,r5指向常量池下标1的对象{x: 200}
   96 E> 00000067C62F8F89 @   35 : 5b f8 f7 04 0a    CallProperty r2, r3-r6, [10] ; r2, r3负责执行输出,没看懂这句,但理应用到r0和r6,理应输出 “?? {x: 200} undefined”
  136 S> 00000067C62F8F8E @   40 : 21 03 04          LdaGlobal [3], [4] ; 同上
         00000067C62F8F91 @   43 : c0                Star3 ; r3指向console
  144 E> 00000067C62F8F92 @   44 : 2d f7 04 06       LdaNamedProperty r3, [4], [6] ; 同上
         00000067C62F8F96 @   48 : c1                Star2 ; 同上
         00000067C62F8F97 @   49 : 13 05             LdaConstant [5] ; 累加寄存器现在是'??'
         00000067C62F8F99 @   51 : bf                Star4 ; 同上,r4是'??'
  144 E> 00000067C62F8F9A @   52 : 5e f8 f7 f6 f9 0c CallProperty2 r2, r3, r4, r1, [12] ; r2, r3负责执行输出;r1没动过,所以输出了常量池下标0的对象。
         00000067C62F8FA0 @   58 : 0e                LdaUndefined 
  189 S> 00000067C62F8FA1 @   59 : a8                Return 
Constant pool (size = 6)
00000067C62F8EF1: [FixedArray] in OldSpace
 - map: 0x0198dd8012c1 <Map>
 - length: 6
           0: 0x0067c62f8ea1 <ObjectBoilerplateDescription[3]>
           1: 0x0067c62f8ec9 <ObjectBoilerplateDescription[3]>
           2: 0x01e31fdd5b49 <String[1]: #y>
           3: 0x014f3febaf11 <String[7]: #console>
           4: 0x014f3fe8caa9 <String[3]: #log>
           5: 0x0067c62f8e41 <String[2]: #??>
Handler Table (size = 0)
Source Position Table (size = 28)
0x0067c62f8fa9 <ByteArray[28]>

看上去是优化过的-_-,并没有体现“先获取lref再获取rval最后赋值的事实”。所以我会希望在源码中找到参考链接2表述所对应的代码,需要读AssignmentExpression Parser建AST和遍历Assignment AstNode生成字节码这两块代码。

V8源码AssignmentExpression Parser建AST过程浅析

AstNode表示AST节点的类,比如Statement, Expression, Assignment, CompoundAssignment都继承自AstNodesrc\ast\ast.h。并且有继承关系链CompoundAssignment -> Assignment -> Expression -> AstNode

接下来在V8源码中找下AssignmentExpression的实现。

src\parsing\parser-base.hsrc\parsing\parser.h主要实现了ParserBase, Parser类。AssignmentExpression相关代码主要在src\parsing\parser-base.h传送门

// ParseAssignmentExpression调用ParseAssignmentExpressionCoverGrammar,ParseAssignmentExpressionCoverGrammar递归调用ParseAssignmentExpression
// 返回值就是创建的AST节点。
template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseAssignmentExpression() {
  // 可以理解为,使用impl()就是为了让父类能够直接调用子类函数。
  ExpressionParsingScope expression_scope(impl());
  ExpressionT result = ParseAssignmentExpressionCoverGrammar();
  expression_scope.ValidateExpression(); // TODO:不知道是什么作用
  return result;
}

template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseAssignmentExpressionCoverGrammar() {
  // 上下文无关文法。代码中的这类注释描述的是非终结符(Non-Terminal)AssignmentExpression的产生式(Productions)。
  // AssignmentExpression ::
  //   ConditionalExpression
  //   ArrowFunction
  //   YieldExpression
  //   LeftHandSideExpression AssignmentOperator AssignmentExpression
  int lhs_beg_pos = peek_position();

  if (peek() == Token::YIELD && is_generator()) {
    return ParseYieldExpression();
  }

  FuncNameInferrerState fni_state(&fni_);

  DCHECK_IMPLIES(!has_error(), next_arrow_function_info_.HasInitialState());
  // ParseConditionalExpression 就是 a ? b : c,ParseConditionalExpression在右侧符号不是“?”的时候会直接返回,所以相当于解析了左侧表达式,获得LHS的AST子树。
  ExpressionT expression = ParseConditionalExpression();

  Token::Value op = peek(); // 下一个token

  if (!Token::IsArrowOrAssignmentOp(op)) return expression;

  // Arrow functions. Token::ARROW就是“=>”。这段代码就是在解析箭头函数。
  // V8_UNLIKELY是一个宏,在后文有解释。读代码时认为这个宏没有作用即可。
  if (V8_UNLIKELY(op == Token::ARROW)) {
    Scanner::Location loc(lhs_beg_pos, end_position());
    // 箭头函数参数列表只能是标识符或者由小括号括起来。
    if (!impl()->IsIdentifier(expression) && !expression->is_parenthesized()) {
      impl()->ReportMessageAt(
          Scanner::Location(expression->position(), position()),
          MessageTemplate::kMalformedArrowFunParamList); // Malformed arrow function parameter list
      return impl()->FailureExpression();
    }

    DeclarationScope* scope = next_arrow_function_info_.scope;
    scope->set_start_position(lhs_beg_pos);

    FormalParametersT parameters(scope);
    parameters.set_strict_parameter_error(
        next_arrow_function_info_.strict_parameter_error_location,
        next_arrow_function_info_.strict_parameter_error_message);
    parameters.is_simple = scope->has_simple_parameters();
    next_arrow_function_info_.Reset();

    impl()->DeclareArrowFunctionFormalParameters(&parameters, expression, loc); // 在后文解析

    expression = ParseArrowFunctionLiteral(parameters); // TODO:很复杂,不看了,猜测是解析函数体的,AST节点的创建是由这个函数完成的,然后return给调用者

    return expression;
  }
  // IsAssignableIdentifier(expression) 表示左侧表达式是“可以赋值的标识符”
  if (V8_LIKELY(impl()->IsAssignableIdentifier(expression))) {
    // TODO:猜测是处理 let (x) = 0; 这种情况?但似乎无法简单复现,因为这句话执行的报错是Reference Error: let is not defined。
    if (expression->is_parenthesized()) {
      // 消息模板 "Invalid destructuring assignment target"。注意这里没有return也没有异常,应该要继续执行到 Consume(op); 这一句。TODO:确认我的结论是否正确。
      expression_scope()->RecordDeclarationError(
          Scanner::Location(lhs_beg_pos, end_position()),
          MessageTemplate::kInvalidDestructuringTarget);
    }
    expression_scope()->MarkIdentifierAsAssigned();
  } else if (expression->IsProperty()) {
    expression_scope()->RecordDeclarationError(
        Scanner::Location(lhs_beg_pos, end_position()),
        MessageTemplate::kInvalidPropertyBindingPattern);
    expression_scope()->ValidateAsExpression();
  } else if (expression->IsPattern() && op == Token::ASSIGN) {
    // Destructuring assignmment. 看下IsPattern实现就明白。
    if (expression->is_parenthesized()) {
      Scanner::Location loc(lhs_beg_pos, end_position());
      if (expression_scope()->IsCertainlyDeclaration()) {
        // 猜测是处理 let ({x}) = {x: 100} 这种情况?报错也是Reference Error: let is not defined,无法简单复现。
        impl()->ReportMessageAt(loc,
                                MessageTemplate::kInvalidDestructuringTarget);
      } else {
        // Syntax Error if LHS is neither object literal nor an array literal
        // (Parenthesized literals are
        // CoverParenthesizedExpressionAndArrowParameterList).
        // #sec-assignment-operators-static-semantics-early-errors
        // CoverParenthesizedExpressionAndArrowParameterList 见下文介绍,简单来说,这里处理的是 (x, y) = 1、(function foo() { }) = 1 这类语法错误,这些例子中的LHS可以是括号表达式或箭头函数参数,且整个表达式不是变量声明语句。
        impl()->ReportMessageAt(loc, MessageTemplate::kInvalidLhsInAssignment);
      }
    }
    expression_scope()->ValidateAsPattern(expression, lhs_beg_pos,
                                          end_position());
  } else {
    DCHECK(!IsValidReferenceExpression(expression));
    // For web compatibility reasons, throw early errors only for logical
    // assignment, not for regular assignment.
    // LogicalAssignmentOp 包括 ||=, &&=, and ??=
    const bool early_error = Token::IsLogicalAssignmentOp(op);
    // TODO:不清楚这段代码的作用
    expression = RewriteInvalidReferenceExpression(
        expression, lhs_beg_pos, end_position(),
        MessageTemplate::kInvalidLhsInAssignment, early_error);
  }

  Consume(op); // TODO:这里猜测是有副作用的,执行后op就“加1”,指向了当前的赋值运算符。
  int op_position = position();

  ExpressionT right = ParseAssignmentExpression(); // 递归调用ParseAssignmentExpression,获取RHS的AST子树。

  // Anonymous function name inference applies to =, ||=, &&=, and ??=. 下文讲解。
  if (op == Token::ASSIGN || Token::IsLogicalAssignmentOp(op)) {
    // 顾名思义,如果是 xxobj.yy = function(){...} 的赋值,则针对函数字面量做一个优化。
    impl()->CheckAssigningFunctionLiteralToProperty(expression, right);

    // Check if the right hand side is a call to avoid inferring a
    // name if we're dealing with "a = function(){...}();"-like
    // expression.
    // 检查RHS是否为call,以避免在处理“a = function(){…}();”之类的表达式时推断出名称。TODO:没看懂-_-
    if (right->IsCall() || right->IsCallNew()) {
      fni_.RemoveLastFunction();
    } else {
      fni_.Infer();
    }

    impl()->SetFunctionNameFromIdentifierRef(right, expression);
  } else {
    fni_.RemoveLastFunction();
  }

  if (op == Token::ASSIGN) {
    // We try to estimate the set of properties set by constructors. We define a
    // new property whenever there is an assignment to a property of 'this'. We
    // should probably only add properties if we haven't seen them before.
    // Otherwise we'll probably overestimate the number of properties.
    // 从注释来看,这是这个函数头一次针对 this.xx = yy 进行处理。但我没找到判断“之前没见过的属性”的地方……
    if (impl()->IsThisProperty(expression)) function_state_->AddProperty();
  } else {
    // Only initializers (i.e. no compound assignments) are allowed in patterns.
    // TODO:猜测是针对这种类型的无效语句:let {x: 114514} = { x: 100 } 的报错,这里114514是一个无效的解构赋值target。
    expression_scope()->RecordPatternError(
        Scanner::Location(lhs_beg_pos, end_position()),
        MessageTemplate::kInvalidDestructuringTarget);
  }
  // factory()->NewXXX() 创建AST节点,在后文解析
  return factory()->NewAssignment(op, expression, right, op_position);
}

一些额外的补充:

一、随处可见的impl()是为了让父类能够直接调用子类函数。

template <typename Impl>
class ParserBase {
  // All implementation-specific methods must be called through this.
  Impl* impl() { return static_cast<Impl*>(this); }
  const Impl* impl() const { return static_cast<const Impl*>(this); }
}

二、V8_UNLIKELYV8_LIKELY可以认为是给编译器提供的分支预测信息,用来提升性能的。V8_UNLIKELY表示condition大概率是false。阅读源码时认为这个宏没有影响就行。

// A macro to provide the compiler with branch prediction information.
#if V8_HAS_BUILTIN_EXPECT
# define V8_UNLIKELY(condition) (__builtin_expect(!!(condition), 0))
# define V8_LIKELY(condition) (__builtin_expect(!!(condition), 1))
#else
# define V8_UNLIKELY(condition) (condition)
# define V8_LIKELY(condition) (condition)
#endif

三、ParseConditionalExpression在发现表达式的下一个Token不是“?”的时候会直接返回表达式,所以ParseAssignmentExpressionCoverGrammar可以放心地调用它。

// Precedence = 3
template <typename Impl>
typename ParserBase<Impl>::ExpressionT
ParserBase<Impl>::ParseConditionalExpression() {
  // ConditionalExpression ::
  //   LogicalExpression
  //   LogicalExpression '?' AssignmentExpression ':' AssignmentExpression
  //
  int pos = peek_position();
  ExpressionT expression = ParseLogicalExpression();
  return peek() == Token::CONDITIONAL
             ? ParseConditionalContinuation(expression, pos)
             : expression;
}

四、IsAssignableIdentifier(expression)表示左侧表达式是“可以赋值的运算符”。可见在严格模式下evalarguments是不可重新赋值的标识符。实际上在严格模式下给这两个变量赋值会报错Unexpected eval or arguments in strict mode

  bool IsAssignableIdentifier(ExpressionT expression) {
    if (!impl()->IsIdentifier(expression)) return false;
    if (is_strict(language_mode()) &&
        impl()->IsEvalOrArguments(impl()->AsIdentifier(expression))) {
      return false;
    }
    return true;
  }

五、消息模板,比如MessageTemplate::kInvalidDestructuringTarget,定义在src\common\message-template.h。搜MessageTemplate::k后面的字符串就能搜到:

#define MESSAGE_TEMPLATES(T)                                                   \
  /* Error */                                                                  \
  T(None, "")                                                                  \
  T(ConflictingPrivateName,                                                    \
    "Operation is ambiguous because there are more than one private name"      \
    "'%' on the object")                                                       \
  T(CyclicProto, "Cyclic __proto__ value")                                     \
  T(Debugger, "Debugger: %")                                                   \
  // ...
  T(InvalidDestructuringTarget, "Invalid destructuring assignment target")     \
  // ...

六、expression->IsPattern(),判断是否是对象解构或数组解构符号。ExpressionT expression = ParseConditionalExpression();并且using ExpressionT = typename Types::Expression;,所以expression->IsPattern()走的是src\ast\ast.h的实现。不过下面还是顺便看了一下PreParserExpression的实现。

// src\ast\ast.h
// 宏 AST_NODE_LIST 为入口,“调用”到 LITERAL_NODE_LIST ,“调用”到V(ObjectLiteral)和V(ArrayLiteral),于是产生 kObjectLiteral 和 kArrayLiteral
class Expression : public AstNode {
  bool IsPattern() {
    static_assert(kObjectLiteral + 1 == kArrayLiteral);
    return base::IsInRange(node_type(), kObjectLiteral, kArrayLiteral);
  }
}

// src\parsing\preparser.h
class PreParserExpression {
  bool IsPattern() const {
    static_assert(kObjectLiteralExpression + 1 == kArrayLiteralExpression);
    return base::IsInRange(TypeField::decode(code_), kObjectLiteralExpression,
                           kArrayLiteralExpression);
  }
}

kObjectLiteralExpression, kArrayLiteralExpression是枚举值,kArrayLiteralExpression = kObjectLiteralExpression + 1

  enum Type {
    kNull,
    kFailure,
    kExpression,
    kIdentifierExpression,
    kStringLiteralExpression,
    kSpreadExpression,
    kObjectLiteralExpression,
    kArrayLiteralExpression
  };

顺便看下IsInRange(在src\base\bounds.h,namespacev8::base)的实现:

template <typename T, typename U>
inline constexpr bool IsInRange(T value, U lower_limit, U higher_limit) {
  DCHECK_LE(lower_limit, higher_limit);
  static_assert(sizeof(U) <= sizeof(T));
  using unsigned_T = typename std::make_unsigned<T>::type;
  // Use static_cast to support enum classes.
  return static_cast<unsigned_T>(static_cast<unsigned_T>(value) -
                                 static_cast<unsigned_T>(lower_limit)) <=
         static_cast<unsigned_T>(static_cast<unsigned_T>(higher_limit) -
                                 static_cast<unsigned_T>(lower_limit));
}

七、op == Token::ASSIGN || Token::IsLogicalAssignmentOp(op)的注释Anonymous function name inference applies to =, ||=, &&=, and ??=是正确且必要的,但也可以看下IsLogicalAssignmentOp的实现来理解为什么注释是正确的:

  static bool IsLogicalAssignmentOp(Value token) {
    return base::IsInRange(token, ASSIGN_NULLISH, ASSIGN_AND);
  }
// BINARY_OP_TOKEN_LIST即入口,“调用”了EXPAND_BINOP_ASSIGN_TOKEN(T, NULLISH),于是"调用"了T(ASSIGN_NULLISH, "??" "=", 2),最终产生 ASSIGN_NULLISH。所以IsInRange就取到了 ASSIGN_NULLISH、ASSIGN_OR和ASSIGN_AND。相关代码:
#define EXPAND_BINOP_ASSIGN_TOKEN(T, name, string, precedence) \
  T(ASSIGN_##name, string "=", 2)
/* Binary operators */
/* ADD and SUB are at the end since they are UnaryOp */
#define BINARY_OP_TOKEN_LIST(T, E) \
  E(T, NULLISH, "??", 3)           \
  E(T, OR, "||", 4)                \
  E(T, AND, "&&", 5)               \
  E(T, BIT_OR, "|", 6)             \
  E(T, BIT_XOR, "^", 7)            \
  E(T, BIT_AND, "&", 8)            \
  E(T, SHL, "<<", 11)              \
  E(T, SAR, ">>", 11)              \
  E(T, SHR, ">>>", 11)             \
  E(T, MUL, "*", 13)               \
  E(T, DIV, "/", 13)               \
  E(T, MOD, "%", 13)               \
  E(T, EXP, "**", 14)              \
  E(T, ADD, "+", 12)               \
  E(T, SUB, "-", 12)
  BINARY_OP_TOKEN_LIST(T, EXPAND_BINOP_ASSIGN_TOKEN)               \

八、CoverParenthesizedExpressionAndArrowParameterList(简写为CPEAAPL)是ES2017引入的上下文无关文法(根据参考链接3,和cover grammar有关)符号。引入这个符号是为了解决一个文法二义性问题:考虑let x = (a, b);let x = (a, b) => { return a + b };,我们确定这是一个AssignmentExpression,并读到一个左括号后,如何确定是去解析箭头函数还是解析括号表达式?

这个符号可以直译为“括号表达式和箭头函数参数列表的并集的超集”,(a, b = 1)(有效的ParenthesizedExpression和ArrowParameterList)、(function foo() { })(有效的ParenthesizedExpression)、(a = 1, ...b)(有效的ArrowParameterList)、(1, ...b)(无效的ParenthesizedExpression和ArrowParameterList)都是有效的CPEAAPL

九、DeclareArrowFunctionFormalParameters很复杂。看上去是进行作用域数据结构的维护,没有添加AST节点。

// src\parsing\parser.cc
void Parser::DeclareArrowFunctionFormalParameters(
    ParserFormalParameters* parameters, Expression* expr,
    const Scanner::Location& params_loc) {
  if (expr->IsEmptyParentheses() || has_error()) return;
  // 猜测主要是分配了内存空间,记录作用域信息还得下面的 DeclareFormalParameters 完成?
  AddArrowFunctionFormalParameters(parameters, expr, params_loc.end_pos);

  if (parameters->arity > Code::kMaxArguments) { // 箭头函数允许的最大参数个数 (1 << 16) - 2
    ReportMessageAt(params_loc, MessageTemplate::kMalformedArrowFunParamList);
    return;
  }

  DeclareFormalParameters(parameters);
  DCHECK_IMPLIES(parameters->is_simple,
                 parameters->scope->has_simple_parameters());
}
  // src\parsing\parser.h
  V8_INLINE void DeclareFormalParameters(ParserFormalParameters* parameters) {
    bool is_simple = parameters->is_simple;
    DeclarationScope* scope = parameters->scope;
    if (!is_simple) scope->MakeParametersNonSimple();
    for (auto parameter : parameters->params) {
      bool is_optional = parameter->initializer() != nullptr;
      // If the parameter list is simple, declare the parameters normally with
      // their names. If the parameter list is not simple, declare a temporary
      // for each parameter - the corresponding named variable is declared by
      // BuildParamerterInitializationBlock.
      scope->DeclareParameter(
          is_simple ? parameter->name() : ast_value_factory()->empty_string(),
          is_simple ? VariableMode::kVar : VariableMode::kTemporary,
          is_optional, parameter->is_rest(), ast_value_factory(),
          parameter->position);
    }
  }
  // AddArrowFunctionFormalParameters代码比较长就不贴了。其为递归调用,但最终会体现为调用AddFormalParameter,所以我们继续查看 AddFormalParameter 实现。src\parsing\parser.h
  V8_INLINE void AddFormalParameter(ParserFormalParameters* parameters,
                                    Expression* pattern,
                                    Expression* initializer,
                                    int initializer_end_position,
                                    bool is_rest) {
    parameters->UpdateArityAndFunctionLength(initializer != nullptr, is_rest);
    auto parameter =
        parameters->scope->zone()->New<ParserFormalParameters::Parameter>(
            pattern, initializer, scanner()->location().beg_pos,
            initializer_end_position, is_rest);

    parameters->params.Add(parameter);
  }

parameters->scope->zone()Zone*src\zone\zone.h),用来做堆内存分配:

  template <typename T, typename... Args>
  T* New(Args&&... args) {
    static_assert(alignof(T) <= kAlignmentInBytes);
    void* memory = Allocate<T>(sizeof(T));
    return new (memory) T(std::forward<Args>(args)...);
  }

十、ParseAssignmentExpressionCoverGrammar的最后一句话是创建Assignment节点:return factory()->NewAssignment(op, expression, right, op_position);,其实现:

  Assignment* NewAssignment(Token::Value op,
                            Expression* target,
                            Expression* value,
                            int pos) {
    DCHECK(Token::IsAssignmentOp(op));
    DCHECK_NOT_NULL(target);
    DCHECK_NOT_NULL(value);

    if (op != Token::INIT && target->IsVariableProxy()) {
      target->AsVariableProxy()->set_is_assigned();
    }

    if (op == Token::ASSIGN || op == Token::INIT) {
      return zone_->New<Assignment>(AstNode::kAssignment, op, target, value,
                                    pos);
    } else {
      return zone_->New<CompoundAssignment>(
          op, target, value, pos,
          NewBinaryOperation(Token::BinaryOpForAssignment(op), target, value,
                             pos + 1));
    }
  }

上文说过zone_->New仅仅是做内存分配,而zone_->New<Assignment>就可以认为是调用Assignment的构造函数:

class Assignment : public Expression {
 public:
  Token::Value op() const { return TokenField::decode(bit_field_); }
  Expression* target() const { return target_; }
  Expression* value() const { return value_; }

  // The assignment was generated as part of block-scoped sloppy-mode
  // function hoisting, see
  // ES#sec-block-level-function-declarations-web-legacy-compatibility-semantics
  LookupHoistingMode lookup_hoisting_mode() const {
    return static_cast<LookupHoistingMode>(
        LookupHoistingModeField::decode(bit_field_));
  }
  void set_lookup_hoisting_mode(LookupHoistingMode mode) {
    bit_field_ =
        LookupHoistingModeField::update(bit_field_, static_cast<bool>(mode));
  }

 protected:
  Assignment(NodeType type, Token::Value op, Expression* target,
             Expression* value, int pos);

 private:
  friend class AstNodeFactory;
  friend Zone;

  using TokenField = Expression::NextBitField<Token::Value, 7>;
  using LookupHoistingModeField = TokenField::Next<bool, 1>;

  Expression* target_;
  Expression* value_;
};

V8源码遍历Assignment AstNode生成字节码过程浅析

为了找到参考链接2表述对应的代码,我们不仅需要看Parser部分的源码,还需要看字节码生成部分的源码。

字节码是机器码的抽象表示,采用和物理CPU相同的计算模型进行设计。

字节码的定义在src/interpreter/bytecodes.h中:

#define BYTECODE_LIST_WITH_UNIQUE_HANDLERS(V)                                  \
  /* Extended width operands */                                                \
  V(Wide, ImplicitRegisterUse::kNone)                                          \
  V(ExtraWide, ImplicitRegisterUse::kNone)                                     \
                                                                               \
  /* Debug Breakpoints - one for each possible size of unscaled bytecodes */   \
  /* and one for each operand widening prefix bytecode                    */   \
  V(DebugBreakWide, ImplicitRegisterUse::kReadWriteAccumulator)                \
  V(DebugBreakExtraWide, ImplicitRegisterUse::kReadWriteAccumulator)           \
  V(DebugBreak0, ImplicitRegisterUse::kReadWriteAccumulator)                   \
  V(DebugBreak1, ImplicitRegisterUse::kReadWriteAccumulator,                   \
    OperandType::kReg)                                                         \
  // ...
  /* - [Register Loads ] */                                                    \
  V(Star, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut)         \
  V(Mov, ImplicitRegisterUse::kNone, OperandType::kReg, OperandType::kRegOut)  \
  V(PushContext, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut)  \
  V(PopContext, ImplicitRegisterUse::kNone, OperandType::kReg)                 \
  /* Cast operators */                                                         \
  V(ToName, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut)       \
  V(ToNumber, ImplicitRegisterUse::kReadWriteAccumulator, OperandType::kIdx)   \
  V(ToNumeric, ImplicitRegisterUse::kReadWriteAccumulator, OperandType::kIdx)  \
  V(ToObject, ImplicitRegisterUse::kReadAccumulator, OperandType::kRegOut)     \
  V(ToString, ImplicitRegisterUse::kReadWriteAccumulator)                      \
  V(ToBoolean, ImplicitRegisterUse::kReadWriteAccumulator)                     \
                                                                               \
  /* Literals */                                                               \
  V(CreateRegExpLiteral, ImplicitRegisterUse::kWriteAccumulator,               \
    OperandType::kIdx, OperandType::kIdx, OperandType::kFlag16)                \
  // ...
    /* Control Flow -- carefully ordered for efficient checks */                 \
  /* - [Unconditional jumps] */                                                \
  V(JumpLoop, ImplicitRegisterUse::kNone, OperandType::kUImm,                  \
    OperandType::kImm, OperandType::kIdx)                                      \
  /* - [Forward jumps] */                                                      \
  V(Jump, ImplicitRegisterUse::kNone, OperandType::kUImm)                      \
  /* - [Start constant jumps] */                                               \
  V(JumpConstant, ImplicitRegisterUse::kNone, OperandType::kIdx)               \
// The list of bytecodes which are interpreted by the interpreter.
// Format is V(<bytecode>, <implicit_register_use>, <operands>).
#define BYTECODE_LIST(V)                                             \
  BYTECODE_LIST_WITH_UNIQUE_HANDLERS(V)                              \
                                                                     \
  /* Special-case Star for common register numbers, to save space */ \
  SHORT_STAR_BYTECODE_LIST(V)                                        \
                                                                     \
  /* Illegal bytecode  */                                            \
  V(Illegal, ImplicitRegisterUse::kNone)
// “调用”入口
#define DECLARE_BYTECODE(Name, ...) k##Name,
  BYTECODE_LIST(DECLARE_BYTECODE)
#undef DECLARE_BYTECODE

于是外部可以这么用上面这些宏定义出的常量:interpreter::Bytecode::kMov

数据结构上,有寄存器r0,r1,r2,… 和累加寄存器(accumulator register)。累加器是和其它寄存器一样的常规寄存器,但不同的是累加器的操作没有显式给出指令,具体来说,Add r1将寄存器r1中的值和累加器中的值进行加法运算。

部分字节码的含义:

  1. LdaSmi [1]load accumulator small int,这里的[1]是小整型(small int,Smi)常量,加载到累加器中。
  2. Star r1store accumulator (to) register,把累加器中的当前值写入到r1寄存器。

显然字节码需要通过遍历AST生成。但字节码生成的时机怎么找呢?借助参考链接4得到结论:v8源码里,字节码生成的函数是由解释器调用的,src\interpreter\interpreter.cc

InterpreterCompilationJob::Status InterpreterCompilationJob::ExecuteJobImpl() {
  RCS_SCOPE(parse_info()->runtime_call_stats(),
            RuntimeCallCounterId::kCompileIgnition,
            RuntimeCallStats::kThreadSpecific);
  // TODO(lpy): add support for background compilation RCS trace.
  TRACE_EVENT0(TRACE_DISABLED_BY_DEFAULT("v8.compile"), "V8.CompileIgnition");

  // Print AST if flag is enabled. Note, if compiling on a background thread
  // then ASTs from different functions may be intersperse when printed.
  {
    DisallowGarbageCollection no_heap_access;
    MaybePrintAst(parse_info(), compilation_info());
  }

  ParkedScopeIfOnBackground parked_scope(local_isolate_);

  generator()->GenerateBytecode(stack_limit()); // 就是这一行

  if (generator()->HasStackOverflow()) {
    return FAILED;
  }
  return SUCCEEDED;
}

src\interpreter\bytecode-generator.cc

void BytecodeGenerator::GenerateBytecode(uintptr_t stack_limit) {
  InitializeAstVisitor(stack_limit);
  if (v8_flags.stress_lazy_compilation && local_isolate_->is_main_thread()) {
    // Trigger stack overflow with 1/stress_lazy_compilation probability.
    // Do this only for the main thread compilations because querying random
    // numbers from background threads will make the random values dependent
    // on the thread scheduling and thus non-deterministic.
    stack_overflow_ = local_isolate_->fuzzer_rng()->NextInt(
                          v8_flags.stress_lazy_compilation) == 0;
  }

  // Initialize the incoming context.
  ContextScope incoming_context(this, closure_scope());

  // Initialize control scope.
  ControlScopeForTopLevel control(this);

  RegisterAllocationScope register_scope(this);

  AllocateTopLevelRegisters();

  builder()->EmitFunctionStartSourcePosition(
      info()->literal()->start_position());

  if (info()->literal()->CanSuspend()) {
    BuildGeneratorPrologue();
  }

  if (NeedsContextInitialization(closure_scope())) {
    // Push a new inner context scope for the function.
    BuildNewLocalActivationContext();
    ContextScope local_function_context(this, closure_scope());
    BuildLocalActivationContextInitialization();
    GenerateBytecodeBody();
  } else {
    GenerateBytecodeBody();
  }

  // Reset variables with hole check bitmap indices for subsequent compilations
  // in the same parsing zone.
  for (Variable* var : vars_in_hole_check_bitmap_) {
    var->ResetHoleCheckBitmapIndex();
  }

  // Check that we are not falling off the end.
  DCHECK(builder()->RemainderOfBlockIsDead());
}

void BytecodeGenerator::GenerateBytecodeBody() {
  // Build the arguments object if it is used.
  VisitArgumentsObject(closure_scope()->arguments());

  // Build rest arguments array if it is used.
  Variable* rest_parameter = closure_scope()->rest_parameter();
  VisitRestArgumentsArray(rest_parameter);

  // Build assignment to the function name or {.this_function}
  // variables if used.
  VisitThisFunctionVariable(closure_scope()->function_var());
  VisitThisFunctionVariable(closure_scope()->this_function_var());

  // Build assignment to {new.target} variable if it is used.
  VisitNewTargetVariable(closure_scope()->new_target_var());

  // Create a generator object if necessary and initialize the
  // {.generator_object} variable.
  FunctionLiteral* literal = info()->literal();
  if (IsResumableFunction(literal->kind())) {
    BuildGeneratorObjectVariableInitialization();
  }

  // Emit tracing call if requested to do so.
  if (v8_flags.trace) builder()->CallRuntime(Runtime::kTraceEnter);

  // Increment the function-scope block coverage counter.
  BuildIncrementBlockCoverageCounterIfEnabled(literal, SourceRangeKind::kBody);

  // Visit declarations within the function scope.
  if (closure_scope()->is_script_scope()) {
    VisitGlobalDeclarations(closure_scope()->declarations());
  } else if (closure_scope()->is_module_scope()) {
    VisitModuleDeclarations(closure_scope()->declarations());
  } else {
    VisitDeclarations(closure_scope()->declarations());
  }

  // Emit initializing assignments for module namespace imports (if any).
  VisitModuleNamespaceImports();

  // The derived constructor case is handled in VisitCallSuper.
  if (IsBaseConstructor(function_kind())) {
    if (literal->class_scope_has_private_brand()) {
      ClassScope* scope = info()->scope()->outer_scope()->AsClassScope();
      DCHECK_NOT_NULL(scope->brand());
      BuildPrivateBrandInitialization(builder()->Receiver(), scope->brand());
    }

    if (literal->requires_instance_members_initializer()) {
      BuildInstanceMemberInitialization(Register::function_closure(),
                                        builder()->Receiver());
    }
  }

  // Visit statements in the function body.
  VisitStatements(literal->body());

  // Emit an implicit return instruction in case control flow can fall off the
  // end of the function without an explicit return being present on all paths.
  if (!builder()->RemainderOfBlockIsDead()) {
    builder()->LoadUndefined();
    BuildReturn(literal->return_position());
  }
}

我们主要关注VisitStatements(literal->body());

void BytecodeGenerator::VisitStatements(
    const ZonePtrList<Statement>* statements) {
  for (int i = 0; i < statements->length(); i++) {
    // Allocate an outer register allocations scope for the statement.
    RegisterAllocationScope allocation_scope(this);
    Statement* stmt = statements->at(i);
    Visit(stmt);
    if (builder()->RemainderOfBlockIsDead()) break;
  }
}

到这里线索似乎断了,因为Visit的实现不好找,BytecodeGenerator和其父类AstVisitor都没有找到有意义的实现。根据参考链接4和动态调试才知道,BytecodeGenerator调用了一个叫DEFINE_AST_VISITOR_SUBCLASS_MEMBERS的宏,这个宏定义在src\ast\ast.h,里面就有Visit方法。

  void Visit(AstNode* node) {                               \
    if (CheckStackOverflow()) return;                       \
    VisitNoStackOverflowCheck(node);                        \
  }                                                         \

这就是BytecodeGenerator.Visit的实现。继续跟进VisitNoStackOverflowCheck

#define GENERATE_VISIT_CASE(NodeType)                                   \
  case AstNode::k##NodeType:                                            \
    return this->impl()->Visit##NodeType(static_cast<NodeType*>(node));

#define GENERATE_AST_VISITOR_SWITCH()        \
  switch (node->node_type()) {               \
    AST_NODE_LIST(GENERATE_VISIT_CASE)       \
    FAILURE_NODE_LIST(GENERATE_FAILURE_CASE) \
  }

#define DEFINE_AST_VISITOR_SUBCLASS_MEMBERS()               \
 public:                                                    \
  void VisitNoStackOverflowCheck(AstNode* node) {           \
    GENERATE_AST_VISITOR_SWITCH()                           \
  }                                                         \

至此我们可以知道,Assignment类型的AstNode对应的遍历函数就是VisitAssignment

void BytecodeGenerator::VisitAssignment(Assignment* expr) {
  AssignmentLhsData lhs_data = PrepareAssignmentLhs(expr->target());

  VisitForAccumulatorValue(expr->value());

  builder()->SetExpressionPosition(expr);
  BuildAssignment(lhs_data, expr->op(), expr->lookup_hoisting_mode());
}

根据上一节的分析结果,expr->target(), expr->op(), expr->value()分别相当于Babel AST AssignmentExpression中的left, op, right属性。

TODO

参考资料

  1. 由ES规范学JavaScript(二):深入理解“连等赋值”问题:segmentfault.com/a/119000000…
  2. es5.github.io/#x11.13.1
  3. 理解ECMAScript规范(4):zhuanlan.zhihu.com/p/373857729
  4. 连载《Chrome V8 原理讲解》第六篇 bytecode字节码生成:zhuanlan.zhihu.com/p/427676313
转载自:https://juejin.cn/post/7240666488336465976
评论
请登录