JEP 394: Pattern Matching for instanceof

Official original (English): https://openjdk.java.net/jeps/394
This is a personal translation; please credit the source when reposting.

Summary

Enhance the Java programming language with pattern matching for the instanceof operator. Pattern matching allows common logic in a program, namely the conditional extraction of components from objects, to be expressed more concisely and safely.

History

Pattern matching for instanceof was proposed by JEP 305 and delivered in JDK 14 as a preview feature. It was re-proposed by JEP 375 and delivered in JDK 15 for a second round of preview.

This JEP proposes to finalize the feature in JDK 16, with the following refinements:

  • Lift the restriction that pattern variables are implicitly final, to reduce asymmetries between local variables and pattern variables.
  • Make it a compile-time error for a pattern instanceof expression to compare an expression of type S against a pattern of type T, where S is a subtype of T. (This instanceof expression will always succeed and is then pointless. The opposite case, where a pattern match will always fail, is already a compile-time error.)

Other refinements may be incorporated based on further feedback.

Motivation

Nearly every program includes some sort of logic that combines testing if an expression has a certain type or structure, and then conditionally extracting components of its state for further processing. For example, all Java programmers are familiar with the instanceof-and-cast idiom:

if (obj instanceof String) {
    String s = (String) obj;
    // use s
}

There are three things going on here: a test (is obj a String?), a conversion (casting obj to String), and the declaration of a new local variable (s) so we can use the string value. This pattern is straightforward and understood by all Java programmers, but is suboptimal for several reasons. It is tedious; doing both the type test and cast should be unnecessary (what else would you do after an instanceof test?). This boilerplate, in particular the three occurrences of the type String, obfuscates the more significant logic that follows. But most importantly, the repetition provides opportunities for errors to creep unnoticed into programs.

Rather than reach for ad-hoc solutions, we believe it is time for Java to embrace pattern matching. Pattern matching allows the desired 'shape' of an object to be expressed concisely (the pattern), and for various statements and expressions to test that 'shape' against their input (the matching). Many languages, from Haskell to C#, have embraced pattern matching for its brevity and safety.

Description

A pattern is a combination of (1) a predicate, or test, that can be applied to a target, and (2) a set of local variables, known as pattern variables, that are extracted from the target only if the predicate successfully applies to it.

A type pattern consists of a predicate that specifies a type, along with a single pattern variable.

The instanceof operator (JLS 15.20.2) is extended to take a type pattern instead of just a type.

This allows us to refactor the tedious code above to the following:

if (obj instanceof String s) {
    // Let pattern matching do the work!
    ...
}

(In this code, the phrase String s is the type pattern.) The meaning is intuitive. The instanceof operator matches the target obj to the type pattern as follows: If obj is an instance of String, then it is cast to String and the value is assigned to the variable s.
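
To make the comparison concrete, here is a minimal sketch that puts the classic idiom and the type pattern side by side. The class name TypePatternDemo and the method names describeOld and describeNew are made up for this illustration and do not come from the JEP.

```java
// Hypothetical demo class; names are illustrative only.
class TypePatternDemo {
    // Classic idiom: test, cast, and declare in three separate steps.
    static String describeOld(Object obj) {
        if (obj instanceof String) {
            String s = (String) obj;
            return "String of length " + s.length();
        }
        return "not a String";
    }

    // Type pattern: the test, the cast, and the declaration of s happen together.
    static String describeNew(Object obj) {
        if (obj instanceof String s) {
            return "String of length " + s.length();
        }
        return "not a String";
    }

    public static void main(String[] args) {
        System.out.println(describeOld("hello")); // String of length 5
        System.out.println(describeNew("hello")); // String of length 5
        System.out.println(describeNew(42));      // not a String
        System.out.println(describeNew(null));    // not a String (null never matches)
    }
}
```

Both methods behave the same for every input, including null, which never matches a type pattern just as it never passes a plain instanceof test.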

The conditionality of pattern matching — if a value does not match a pattern, then the pattern variable is not assigned a value — means that we have to consider carefully the scope of the pattern variable. We could do something simple and say that the scope of the pattern variable is the containing statement and all subsequent statements in the enclosing block. But this has unfortunate poisoning consequences, for example:

if (a instanceof Point p) {
    ...
}
if (b instanceof Point p) { // ERROR - p is in scope
    ...
}

In other words, by the second statement the pattern variable p would be in a poisoned state — it is in scope, but it should not be accessible since it may not be assigned a value. But even though it shouldn't be accessed, since it is in scope, we can't just declare it again. This means that a pattern variable can become poisoned after it is declared, so programmers would have to think of lots of distinct names for their pattern variables.

Rather than using a coarse approximation for the scope of pattern variables, pattern variables instead use the concept of flow scoping. A pattern variable is only in scope where the compiler can deduce that the pattern has definitely matched and the variable will have been assigned a value. This analysis is flow sensitive and works in a similar way to existing flow analyses such as definite assignment. Returning to our example:

if (a instanceof Point p) {
    // p is in scope
    ...
}
// p not in scope here
if (b instanceof Point p) { // Sure!
    ...
}

The motto is: "A pattern variable is in scope where it has definitely matched". This allows for the safe reuse of pattern variables and is both intuitive and familiar, since Java developers are already used to flow sensitive analyses.
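
A compilable version of the same idea might look like the sketch below. The JEP does not define Point, so the nested record here, along with the class name FlowScopeDemo and the method sumOfX, is an assumption made for the example.

```java
// Hypothetical demo class; Point is declared here only to make the sketch compile.
class FlowScopeDemo {
    record Point(int x, int y) {}

    // Adds up the x coordinates of whichever arguments are Points.
    static int sumOfX(Object a, Object b) {
        int total = 0;
        if (a instanceof Point p) {
            total += p.x();   // p is in scope: the match has definitely succeeded
        }
        // p is not in scope here, so the name is free to be declared again
        if (b instanceof Point p) {
            total += p.x();
        }
        return total;
    }
}
```

Because each p is only in scope inside its own if block, the two declarations do not clash.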

When the conditional expression of the if statement grows more complicated than a single instanceof, the scope of the pattern variable grows accordingly. For example, in this code:

if (obj instanceof String s && s.length() > 5) {
    flag = s.contains("jdk");
}

the pattern variable s is in scope on the right hand side of the && operator, as well as in the true block. (The right hand side of the && operator is only evaluated if the pattern match succeeded and assigned a value to s.) On the other hand, the following code does not compile:

if (obj instanceof String s || s.length() > 5) { // Error!
    ...
}

Because of the semantics of the || operator, the pattern variable s might not have been assigned and so the flow analysis dictates that the variable s is not in scope on the right hand side of the || operator.
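
The sketch below shows the two forms that do compile. The second method is not taken from the JEP: it is an illustration of the same flow rule applied to a negated test, where the right-hand side of || is only evaluated once the match has succeeded. The names ShortCircuitDemo, isLongString and notStringOrLong are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class ShortCircuitDemo {
    // s is in scope to the right of && because that operand only runs after a match.
    static boolean isLongString(Object obj) {
        return obj instanceof String s && s.length() > 5;
    }

    // With the test negated, the right operand of || only runs when the negation
    // is false, that is, when obj did match, so s is in scope there.
    static boolean notStringOrLong(Object obj) {
        return !(obj instanceof String s) || s.length() > 5;
    }
}
```

In both methods the compiler can prove that s has been assigned whenever s.length() is evaluated, which is exactly what the rejected || example cannot guarantee.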

The use of pattern matching in instanceof should significantly reduce the overall number of explicit casts in Java programs. Type test patterns are particularly useful when writing equality methods. Consider the following equality method taken from Item 10 of Effective Java:

public boolean equals(Object o) {
    return (o instanceof CaseInsensitiveString) &&
        ((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
}

Using a type pattern means it can be rewritten to the clearer:

public boolean equals(Object o) {
    return (o instanceof CaseInsensitiveString cis) &&
        cis.s.equalsIgnoreCase(s);
}

Other equals methods are even more dramatically improved. Consider the class Point from above, where we might write an equals method as follows:

public boolean equals(Object o) {
    if (!(o instanceof Point))
        return false;
    Point other = (Point) o;
    return x == other.x
        && y == other.y;
}

Using pattern matching instead, we can combine these multiple statements into a single expression, eliminating the repetition and simplifying the control flow:

public boolean equals(Object o) {
    return (o instanceof Point other)
        && x == other.x
        && y == other.y;
}
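
For readers who want to try this, the method can be dropped into a complete class along the following lines. The JEP only shows the equals method, so the fields, the constructor and the hashCode implementation here are assumptions added to make the sketch self-contained.

```java
// Minimal sketch of a Point class; everything except equals is assumed.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        // other is only in scope to the right of &&, after the match succeeded
        return (o instanceof Point other)
            && x == other.x
            && y == other.y;
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals: equal points produce equal hash codes
        return 31 * x + y;
    }
}
```

A null argument or an object of another type simply fails the instanceof test, so equals returns false without any explicit null check.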

The flow scoping analysis for pattern variables is sensitive to the notion of whether a statement can complete normally. For example, consider the following method:

public void onlyForStrings(Object o) throws MyException {
    if (!(o instanceof String s))
        throw new MyException();
    // s is in scope
    System.out.println(s);
    ...
}

This method tests whether its parameter o is a String, and throws an exception if not. It is only possible to reach the println statement if the conditional statement has completed normally. Because the contained statement of the conditional statement can never complete normally, this can only occur if the conditional expression has evaluated to the value false, which, in turn, means that the pattern matching has succeeded. Accordingly, the scope of the pattern variable s safely includes the statements following the conditional statement in the method block.
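
The same reasoning applies to any guard whose body cannot complete normally. As a rough sketch, the two methods below use a throw and an early return respectively. The class and method names are hypothetical, and IllegalArgumentException stands in for the MyException used in the JEP's example.

```java
// Hypothetical demo class; names are illustrative only.
class GuardDemo {
    // Guard with throw: execution only continues past the if when o matched.
    static int requireLength(Object o) {
        if (!(o instanceof String s)) {
            throw new IllegalArgumentException("expected a String");
        }
        // s is in scope here
        return s.length();
    }

    // Guard with an early return: the same rule applies.
    static int lengthOrMinusOne(Object o) {
        if (!(o instanceof String s)) {
            return -1;
        }
        // s is in scope here as well
        return s.length();
    }
}
```

If the guard body could fall through, for example by only logging a message, the compiler would no longer treat s as in scope after the if statement.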

Pattern variables are just a special case of local variables, and aside from the definition of their scope, in all other respects pattern variables are treated as local variables. In particular, this means that (1) they can be assigned to, and (2) they can shadow a field declaration. For example:

class Example1 {
    String s;
    void test1(Object o) {
        if (o instanceof String s) {
            System.out.println(s); // Field s is shadowed
            s = s + "\n";          // Assignment to pattern variable
            ...
        }
        System.out.println(s);     // Refers to field s
        ...
    }
}
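
A runnable variant of Example1 can make the two points observable. In this sketch the class name ShadowDemo, the initial field value and the returned string format are all assumptions chosen for the illustration.

```java
// Hypothetical variant of Example1 that returns what it observed.
class ShadowDemo {
    String s = "field";

    String test(Object o) {
        String seen = "";
        if (o instanceof String s) {
            s = s + "!";           // assigns the pattern variable, not the field
            seen = s;
        }
        return seen + "/" + s;     // outside the if, s is the field again
    }
}
```

Calling new ShadowDemo().test("hi") yields "hi!/field": the assignment changed the pattern variable only, and the field kept its value.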

However, the flow scoping nature of pattern variables means that some care must be taken to determine whether a name refers to a pattern variable declaration shadowing a field declaration or to the field declaration itself.

class Example2 {
    Point p;
    void test2(Object o) {
        if (o instanceof Point p) {
            // p refers to the pattern variable
            ...
        } else {
            // p refers to the field
            ...
        }
    }
}
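
The difference between the two branches can be checked with a small sketch like this one. To stay self-contained it uses String rather than Point for the field and the pattern; the class name ElseScopeDemo and the returned messages are likewise assumptions.

```java
// Hypothetical variant of Example2, using String in place of Point.
class ElseScopeDemo {
    String p = "field p";

    String which(Object o) {
        if (o instanceof String p) {
            return "pattern variable: " + p;   // p is the pattern variable
        } else {
            return "field: " + p;              // no match, so p is the field
        }
    }
}
```

The same name therefore resolves to different declarations depending on the branch, which is the situation the text above warns about.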

The instanceof grammar is extended accordingly:

RelationalExpression:
    ...
    RelationalExpression instanceof ReferenceType
    RelationalExpression instanceof Pattern

Pattern:
    ReferenceType Identifier

Future Work

Future JEPs will enhance the Java programming language with richer forms of patterns, such as deconstruction patterns for record classes, and pattern matching for other language constructs, such as switch expressions and statements.

Alternatives

The benefits of type patterns could be obtained by flow typing in if statements, or by a type switch construct. Pattern matching generalizes both of these constructs.
