JEP 378: Text Blocks

Official original (English): https://openjdk.java.net/jeps/378
Personal original translation; please credit the source when reposting.

Summary

Add text blocks to the Java language. A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in a predictable way, and gives the developer control over the format when desired.

摘要

向Java语言添加文本块。文本块是一种多行字符串字面量,它免去了大多数转义序列的需要,以可预测的方式自动格式化字符串,并在需要时让开发者控制格式。

History

Text blocks were proposed by JEP 355 in early 2019 as a follow-on to explorations begun in JEP 326 (Raw String Literals), which was initially targeted to JDK 12 but eventually withdrawn and did not appear in that release. JEP 355 was targeted to JDK 13 in June 2019 as a preview feature. Feedback on JDK 13 suggested that text blocks should be previewed again in JDK 14, with the addition of two new escape sequences. Consequently, JEP 368 was targeted to JDK 14 in November 2019 as a preview feature. Feedback on JDK 14 suggested that text blocks were ready to become final and permanent in JDK 15 with no further changes.

历史

文本块最初由JEP 355于2019年初提出,作为JEP 326(原始字符串字面量)中探索工作的后续;JEP 326最初以JDK 12为目标,但最终被撤回,并未出现在该版本中。2019年6月,JEP 355被确定纳入JDK 13,作为预览特性。JDK 13的反馈建议在JDK 14中再次预览文本块,并增加两个新的转义序列,因此JEP 368于2019年11月被确定纳入JDK 14,仍作为预览特性。JDK 14的反馈表明,文本块已经可以在JDK 15中成为最终的永久特性,无需进一步修改。

Goals

  • Simplify the task of writing Java programs by making it easy to express strings that span several lines of source code, while avoiding escape sequences in common cases.
  • Enhance the readability of strings in Java programs that denote code written in non-Java languages.
  • Support migration from string literals by stipulating that any new construct can express the same set of strings as a string literal, interpret the same escape sequences, and be manipulated in the same ways as a string literal.
  • Add escape sequences for managing explicit white space and newline control.

目标

  • 让跨越多行源代码的字符串易于表达,同时在常见情况下避免转义序列,从而简化Java程序的编写。
  • 增强Java程序中表示非Java语言代码的字符串的可读性。
  • 规定任何新构造都能表示与字符串字面量相同的字符串集合、解释相同的转义序列,并能以与字符串字面量相同的方式进行操作,以此支持从字符串字面量迁移。
  • 添加用于显式控制空白和换行的转义序列。

Non-Goals

  • It is not a goal to define a new reference type, distinct from java.lang.String, for the strings expressed by any new construct.
  • It is not a goal to define new operators, distinct from +, that take String operands.
  • Text blocks do not directly support string interpolation. Interpolation may be considered in a future JEP. In the meantime, the new instance method String::formatted aids in situations where interpolation might be desired.
  • Text blocks do not support raw strings, that is, strings whose characters are not processed in any way.

非目标

  • 为任何新构造表示的字符串定义不同于java.lang.String的新引用类型,这不是目标。
  • 定义不同于+的、用于String的新运算符,这不是目标。
  • 文本块不直接支持字符串插值。将来的JEP可能会考虑插值。在此之前,新的实例方法String::formatted可以在需要插值的场合提供帮助。
  • 文本块不支持原始字符串,也就是字符不以任何方式处理的字符串。
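This is a minimal sketch of the String::formatted workaround mentioned above. It assumes JDK 15 or later, where text blocks and String::formatted are both final. FormattedDemo and render are illustrative names only. TEMPLATE.formatted(args) behaves like String.format(TEMPLATE, args).

```java
class FormattedDemo {
    // A text block used as a template; %s and %d are ordinary format specifiers.
    static final String TEMPLATE = """
        Hello, %s!
        You have %d new messages.
        """;

    // String::formatted (JDK 15+) is equivalent to String.format(TEMPLATE, name, count).
    static String render(String name, int count) {
        return TEMPLATE.formatted(name, count);
    }

    public static void main(String[] args) {
        System.out.print(render("Duke", 3));
    }
}
```

Unlike true interpolation, the arguments are positional. The format specifiers and the argument list therefore have to be kept in sync by hand.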

Motivation

In Java, embedding a snippet of HTML, XML, SQL, or JSON in a string literal "..." usually requires significant editing with escapes and concatenation before the code containing the snippet will compile. The snippet is often difficult to read and arduous to maintain.

动机

在Java中,在字符串字面量"..."中嵌入HTML、XML、SQL或JSON片段通常需要先进行转义和串联的大量编辑,然后才能编译包括该片段的代码。这样的代码通常难以阅读且难以维护。

More generally, the need to denote short, medium, and long blocks of text in a Java program is near universal, whether the text is code from other programming languages, structured text representing golden files, or messages in natural languages. On the one hand, the Java language recognizes this need by allowing strings of unbounded size and content; on the other hand, it embodies a design default that strings should be small enough to denote on a single line of a source file (surrounded by " characters), and simple enough to escape easily. This design default is at odds with the large number of Java programs where strings are too long to fit comfortably on a single line.

更一般地说,无论文本是其他编程语言的代码、表示golden file(标准参照文件)的结构化文本,还是自然语言消息,在Java程序中表示短、中、长文本块的需求几乎无处不在。一方面,Java语言允许字符串的大小和内容不受限制,从而认可了这一需求;另一方面,它又体现了一种默认设计:字符串应当足够短,能写在源文件的一行中(用"字符包围),并且足够简单,便于转义。这一默认设计与大量Java程序的实际情况相冲突:这些程序中的字符串太长,无法舒适地放在一行中。

Accordingly, it would improve both the readability and the writability of a broad class of Java programs to have a linguistic mechanism for denoting strings more literally than a string literal -- across multiple lines and without the visual clutter of escapes. In essence, a two-dimensional block of text, rather than a one-dimensional sequence of characters.

因此,如果有一种语言机制能够比字符串字面量更“字面”地表示字符串,即跨越多行且没有转义带来的视觉混乱,那么一大类Java程序的可读性和可写性都会得到提高。本质上,这是一个二维的文本块,而不是一维的字符序列。

Still, it is impossible to predict the role of every string in Java programs. Just because a string spans multiple lines of source code does not mean that newline characters are desirable in the string. One part of a program may be more readable when strings are laid out over multiple lines, but the embedded newline characters may change the behavior of another part of the program. Accordingly, it would be helpful if the developer had precise control over where newlines appear, and, as a related matter, how much white space appears to the left and right of the "block" of text.

尽管如此,仍然无法预测Java程序中每个字符串的角色。一个字符串跨越源代码的多行,并不意味着该字符串中需要换行符。把字符串排成多行可能让程序的某一部分更易读,但嵌入的换行符可能会改变程序另一部分的行为。因此,如果开发者能够精确控制换行符出现的位置,以及与此相关的、文本“块”左右两侧出现多少空白,将会很有帮助。

HTML example

Using "one-dimensional" string literals

HTML示例

使用“一维”字符串字面量:

String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";

Using a "two-dimensional" block of text

使用“二维”文本块:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;

SQL example

Using "one-dimensional" string literals

SQL示例

使用“一维”字符串字面量:

String query = "SELECT \"EMP_ID\", \"LAST_NAME\" FROM \"EMPLOYEE_TB\"\n" +
               "WHERE \"CITY\" = 'INDIANAPOLIS'\n" +
               "ORDER BY \"EMP_ID\", \"LAST_NAME\";\n";

Using a "two-dimensional" block of text

使用“二维”文本块:

String query = """
               SELECT "EMP_ID", "LAST_NAME" FROM "EMPLOYEE_TB"
               WHERE "CITY" = 'INDIANAPOLIS'
               ORDER BY "EMP_ID", "LAST_NAME";
               """;

Polyglot language example

Using "one-dimensional" string literals

多语言示例

使用“一维”字符串字面量:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("function hello() {\n" +
                         "    print('\"Hello, world\"');\n" +
                         "}\n" +
                         "\n" +
                         "hello();\n");

Using a "two-dimensional" block of text

使用“二维”文本块:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("""
                         function hello() {
                             print('"Hello, world"');
                         }
                         
                         hello();
                         """);

Description

This section is identical to the same section in this JEP's predecessor, JEP 355, except for the addition of the subsection on new escape sequences.

描述

此部分与本JEP的前身JEP 355中对应的部分相同,只是添加了新增转义序列的部分。

A text block is a new kind of literal in the Java language. It may be used to denote a string anywhere that a string literal could appear, but offers greater expressiveness and less accidental complexity.

文本块是Java语言中一种新型字面量。在字符串字面量可以出现的任何地方,它都可以用于表示字符串,但是可以提供更高的表现力和更少的意外复杂性。

A text block consists of zero or more content characters, enclosed by opening and closing delimiters.

文本块由零到多个内容字符组成,并由开头和结尾分隔符括起来。

The opening delimiter is a sequence of three double quote characters (""") followed by zero or more white spaces followed by a line terminator. The content begins at the first character after the line terminator of the opening delimiter.

开头分隔符是由三个双引号字符(""")组成的序列,后跟零个或多个空白字符,再跟一个行终止符。内容从开头分隔符的行终止符之后的第一个字符开始。

The closing delimiter is a sequence of three double quote characters. The content ends at the last character before the first double quote of the closing delimiter.

结尾分隔符是三个双引号字符的序列。内容在结尾分隔符的第一个双引号之前的最后一个字符处结束。

The content may include double quote characters directly, unlike the characters in a string literal. The use of \" in a text block is permitted, but not necessary or recommended. Fat delimiters (""") were chosen so that " characters could appear unescaped, and also to visually distinguish a text block from a string literal.

与字符串字面量中的字符不同,内容可以直接包含双引号字符。文本块中允许使用\",但既无必要,也不推荐。之所以选择胖分隔符("""),是为了让"字符无需转义即可出现,同时在视觉上区分文本块与字符串字面量。

The content may include line terminators directly, unlike the characters in a string literal. The use of \n in a text block is permitted, but not necessary or recommended. For example, the text block:

与字符串字面量中的字符不同,内容可以直接包含行终止符。允许在文本块中使用\n,但不是必需或建议使用。例如,文本块:

"""
line 1
line 2
line 3
"""

is equivalent to the string literal:

等效于字符串字面量:

"line 1\nline 2\nline 3\n"

or a concatenation of string literals:

或者字符串字面量的拼接:

"line 1\n" +
"line 2\n" +
"line 3\n"

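The sketch below checks the equivalence claimed above. It assumes JDK 15+, and EquivalenceDemo is an illustrative name. The content lines and the closing delimiter share the same indentation, so that indentation is removed as incidental white space. This step is described under compile-time processing.

```java
class EquivalenceDemo {
    // The text block from the example above (its indentation is incidental).
    static final String BLOCK = """
        line 1
        line 2
        line 3
        """;

    // The equivalent string literal.
    static final String LITERAL = "line 1\nline 2\nline 3\n";

    // The equivalent concatenation of string literals.
    static final String CONCAT = "line 1\n" +
                                 "line 2\n" +
                                 "line 3\n";

    public static void main(String[] args) {
        System.out.println(BLOCK.equals(LITERAL)); // true
        System.out.println(BLOCK.equals(CONCAT));  // true
    }
}
```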
If a line terminator is not required at the end of the string, then the closing delimiter can be placed on the last line of content. For example, the text block:

如果在字符串的末尾不需要行终止符,那么可以将结尾分隔符放在内容的最后一行。例如,文本块:

"""
line 1
line 2
line 3"""

is equivalent to the string literal:

等效于字符串字面量:

"line 1\nline 2\nline 3"

A text block can denote the empty string, although this is not recommended because it needs two lines of source code:

文本块可以表示空字符串,但不建议这样做,因为这需要两行源代码:

String empty = """
""";

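The sketch below contrasts the two closing-delimiter placements and the empty text block. It assumes JDK 15+, and the class name is illustrative. The indentation in front of the closing delimiter of EMPTY is incidental, so the result is still the empty string.

```java
class ClosingDelimiterDemo {
    // Closing delimiter on its own line: the string ends with LF.
    static final String WITH_LF = """
        line 1
        line 2
        line 3
        """;

    // Closing delimiter on the last content line: no trailing LF.
    static final String WITHOUT_LF = """
        line 1
        line 2
        line 3""";

    // The empty string as a text block: legal, but "" is clearer.
    static final String EMPTY = """
        """;

    public static void main(String[] args) {
        System.out.println(WITH_LF.endsWith("\n"));    // true
        System.out.println(WITHOUT_LF.endsWith("\n")); // false
        System.out.println(EMPTY.isEmpty());           // true
    }
}
```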
Here are some examples of ill-formed text blocks:

下面是一些格式错误的文本块示例:

String a = """""";   // no line terminator after opening delimiter
String b = """ """;  // no line terminator after opening delimiter
String c = """
           ";        // no closing delimiter (text block continues to EOF)
String d = """
           abc \ def
           """;      // unescaped backslash (see below for escape processing)

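For comparison, here are well-formed counterparts of the four examples, assuming JDK 15+. The values intended for a, b and c are guesses, since the examples only show why they fail to compile. D is the same text block with its backslash escaped.

```java
class WellFormedDemo {
    // a, b: """ must be followed by a line terminator, so very short strings are
    // simpler as ordinary literals. (The intended values are assumptions.)
    static final String A = "";
    static final String B = " ";

    // c: the text block needs a closing delimiter; a lone " is fine as content.
    static final String C = """
        "
        """;

    // d: a literal backslash is written as the escape sequence \\.
    static final String D = """
        abc \\ def
        """;

    public static void main(String[] args) {
        System.out.print(C); // prints a single " followed by a newline
        System.out.print(D); // prints abc \ def followed by a newline
    }
}
```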
Compile-time processing

A text block is a constant expression of type String, just like a string literal. However, unlike a string literal, the content of a text block is processed by the Java compiler in three distinct steps:

  1. Line terminators in the content are translated to LF (\u000A). The purpose of this translation is to follow the principle of least surprise when moving Java source code across platforms.
  2. Incidental white space surrounding the content, introduced to match the indentation of Java source code, is removed.
  3. Escape sequences in the content are interpreted. Performing interpretation as the final step means developers can write escape sequences such as \n without them being modified or deleted by earlier steps.

编译期处理

文本块是String类型的常量表达式,这一点与字符串字面量相同。但与字符串字面量不同的是,Java编译器会分三个不同的步骤处理文本块的内容:

  1. 内容中的行终止符被转换为LF(\u000A)。这一转换的目的是在跨平台移动Java源代码时遵循最小惊讶原则。
  2. 删除内容周围为匹配Java源代码缩进而引入的附带空白。
  3. 解释内容中的转义序列。将解释放在最后一步,意味着开发者可以编写\n之类的转义序列,而不会被前面的步骤修改或删除。
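The JEP exposes the re-indentation algorithm as String::stripIndent and escape processing as String::translateEscapes (both JDK 15+). The three steps can therefore be approximated at run time. This sketch assumes the input is the raw content between the opening delimiter's line terminator and the closing delimiter. The helper process is hypothetical; it is not what javac literally calls.

```java
class CompileTimeSketch {
    // Hypothetical helper approximating the three compile-time steps on raw content
    // (the characters after the opening delimiter's line terminator, up to the closing """).
    static String process(String rawContent) {
        // Step 1: normalize CRLF and CR to LF.
        String normalized = rawContent.replace("\r\n", "\n").replace('\r', '\n');
        // Step 2: remove incidental white space (String::stripIndent, JDK 15+).
        String stripped = normalized.stripIndent();
        // Step 3: interpret escape sequences (String::translateEscapes, JDK 15+).
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content as a CRLF-edited source file might contain it: 4 columns of
        // indentation and a two-character backslash-t escape.
        String raw = "    <p>Hello,\\tworld</p>\r\n    ";
        String textBlock = """
            <p>Hello,\tworld</p>
            """;
        System.out.println(process(raw).equals(textBlock)); // true
    }
}
```

The order matters. If translateEscapes ran before stripIndent, an escaped \t at the start of a line would become a real tab and be stripped as incidental white space.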

The processed content is recorded in the class file as a CONSTANT_String_info entry in the constant pool, just like the characters of a string literal. The class file does not record whether a CONSTANT_String_info entry was derived from a text block or a string literal.

处理的内容会作为常量池中的CONSTANT_String_info条目记录在class文件中,就像字符串字面量的字符一样。该class文件不会记录CONSTANT_String_info条目是从文本块还是字符串字面量派生的。

At run time, a text block is evaluated to an instance of String, just like a string literal. Instances of String that are derived from text blocks are indistinguishable from instances derived from string literals. Two text blocks with the same processed content will refer to the same instance of String due to interning, just like for string literals.

在运行时,文本块会被求值为一个String实例,这一点与字符串字面量相同。从文本块派生的String实例与从字符串字面量派生的实例无法区分。由于字符串驻留(interning),处理后内容相同的两个文本块会引用同一个String实例,这一点同样与字符串字面量一致。
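A small check of the interning behavior, assuming JDK 15+; InterningDemo is an illustrative name. The == comparisons are for demonstration only. equals remains the right way to compare string contents.

```java
class InterningDemo {
    static final String FROM_BLOCK = """
        Hello
        """;
    static final String FROM_LITERAL = "Hello\n";

    public static void main(String[] args) {
        // Both are constant expressions with equal values, so they share one interned instance.
        System.out.println(FROM_BLOCK == FROM_LITERAL);   // true
        // A string built at run time is a separate instance with equal content.
        String built = new StringBuilder("Hello").append('\n').toString();
        System.out.println(built == FROM_BLOCK);          // false
        System.out.println(built.equals(FROM_BLOCK));     // true
        // intern() returns the canonical instance, which is the one the constants share.
        System.out.println(built.intern() == FROM_BLOCK); // true
    }
}
```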

The following sections discuss compile-time processing in more detail.

下面各小节将更详细地讨论编译期处理。

1. Line terminators

Line terminators in the content are normalized from CR (\u000D) and CRLF (\u000D\u000A) to LF (\u000A) by the Java compiler. This ensures that the string derived from the content is equivalent across platforms, even if the source code has been translated to a platform encoding (see javac -encoding).

1. 行终止符

Java编译器会将内容中的行终止符从CR(\u000D)和CRLF(\u000D\u000A)规范化为LF(\u000A)。这样可以确保从内容派生的字符串在各个平台上都是等价的,即使源代码已被转换为平台编码(参见javac -encoding)。

For example, if Java source code that was created on a Unix platform (where the line terminator is LF) is edited on a Windows platform (where the line terminator is CRLF), then without normalization, the content would become one character longer for each line. Any algorithm that relied on LF being the line terminator might fail, and any test that needed to verify string equality with String::equals would fail.

例如,在Windows平台(行终止符为CRLF)上编辑了在Unix平台(行终止符为LF)上创建的Java源代码,如果不进行标准化,那么内容会在每一行上都多一个字符。任何依赖LF作为行终止符的算法都可能失败,并且任何需要使用String::equals验证字符串相等的测试都将失败。

The escape sequences \n (LF), \f (FF), and \r (CR) are not interpreted during normalization; escape processing happens later.

转义序列\n(LF)、\f(FF)和\r(CR)在标准化过程中不会被解释;转义处理会稍后进行。
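The length argument above can be reproduced with ordinary string literals. normalize is a hypothetical helper that mimics the compiler's CR/CRLF-to-LF normalization. The last lines show that an escaped \r survives normalization and only becomes a CR during escape processing (JDK 15+).

```java
class LineTerminatorDemo {
    // Hypothetical helper mimicking the compiler's normalization: CRLF and CR become LF.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String unix = "line 1\nline 2\nline 3\n";
        String windows = "line 1\r\nline 2\r\nline 3\r\n";

        // Without normalization: one extra character per line, so equals fails.
        System.out.println(windows.length() - unix.length()); // 3
        System.out.println(windows.equals(unix));             // false
        // With normalization the two spellings agree.
        System.out.println(normalize(windows).equals(unix));  // true

        // An escaped \r is two ordinary characters during normalization
        // and only becomes a CR during escape processing.
        String escaped = """
            a\r
            """;
        System.out.println(escaped.equals("a\r\n"));          // true
    }
}
```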

2. Incidental white space

The text blocks shown above were easier to read than their concatenated string literal counterparts, but the obvious interpretation for the content of a text block would include the spaces added to indent the embedded string so that it lines up neatly with the opening delimiter. Here is the HTML example using dots to visualize the spaces that the developer added for indentation:

2. 附带空格

上面展示的文本块比对应的拼接字符串字面量更易读,但对文本块内容最直接的解释,会包括为缩进嵌入字符串而添加的空格,使其与开头分隔符整齐对齐。下面的HTML示例用点来可视化开发者为缩进而添加的空格:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";

Since the opening delimiter is generally positioned to appear on the same line as the statement or expression which consumes the text block, there is no real significance to the fact that 14 visualized spaces start each line. Including those spaces in the content would mean the text block denotes a string different from the one denoted by the concatenated string literals. This would hurt migration, and be a recurring source of surprise: it is overwhelmingly likely that the developer does not want those spaces in the string. Also, the closing delimiter is generally positioned to align with the content, which further suggests that the 14 visualized spaces are insignificant.

由于开头分隔符通常与使用该文本块的语句或表达式位于同一行,因此每行开头有14个可视化空格这一事实并没有实际意义。如果在内容中包含这些空格,文本块表示的字符串就会与拼接的字符串字面量所表示的字符串不同。这会妨碍迁移,并反复造成意外:开发者极有可能并不希望字符串中包含这些空格。此外,结尾分隔符通常与内容对齐,这进一步表明这14个可视化空格无关紧要。

Spaces may also appear at the end of each line, especially when a text block is populated by copy-pasting snippets from other files (which may themselves have been formed by copy-pasting from yet more files). Here is the HTML example reimagined with some trailing white space, again using dots to visualize spaces:

空格也可能出现在每一行的末尾,尤其是当文本块是从其他文件中复制粘贴的片段时(这些片段本身可能是从更多文件中复制粘贴的)。这是用一些末尾的空格重新构想的HTML示例,同样使用点来可视化空格:

String html = """
..............<html>...
..............    <body>
..............        <p>Hello, world</p>....
..............    </body>.
..............</html>...
..............""";

Trailing white space is most often unintentional, idiosyncratic, and insignificant. It is overwhelmingly likely that the developer does not care about it. Trailing white space characters are similar to line terminators, in that both are invisible artifacts of the source code editing environment. With no visual guide to the presence of trailing white space characters, including them in the content would be a recurring source of surprise, as it would affect the length, hash code, etc, of the string.

末尾空格大多是无意为之、因人而异且无关紧要的,开发者极有可能并不在意。末尾空格字符与行终止符类似,两者都是源代码编辑环境中不可见的产物。由于没有任何视觉提示表明末尾空格字符的存在,若将它们包含在内容中,就会反复造成意外,因为它们会影响字符串的长度、哈希值等。

Accordingly, an appropriate interpretation for the content of a text block is to differentiate incidental white space at the start and end of each line, from essential white space. The Java compiler processes the content by removing incidental white space to yield what the developer intended. String::indent can then be used to further manipulate indentation if desired. Using | to visualize margins:

因此,对文本块内容的恰当解释是:将每行开头和结尾处的附带空格与必要空格区分开。Java编译器通过删除附带空格来处理内容,从而得到开发者想要的结果。如果需要,随后可以使用String::indent进一步调整缩进。使用|可视化边距:

|<html>|
|    <body>|
|        <p>Hello, world</p>|
|    </body>|
|</html>|
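String::stripIndent applies the same re-indentation as the compiler, so its effect on trailing white space can be observed directly. This sketch assumes JDK 15+; IncidentalWhiteSpaceDemo and RAW are illustrative names. String::indent, available since JDK 12, then adds indentation back explicitly.

```java
class IncidentalWhiteSpaceDemo {
    // Raw content with 4 columns of incidental indentation and trailing spaces
    // on some lines; the last "line" is the closing delimiter's indentation.
    static final String RAW = "    <html>   \n        <body>\n    </html> \n    ";

    // String::stripIndent (JDK 15+) removes incidental white space at both ends of each line.
    static String stripped() {
        return RAW.stripIndent();
    }

    // String::indent (JDK 12+) adds indentation back when it is actually wanted;
    // it also normalizes line terminators and ends every line with LF.
    static String reindented() {
        return stripped().indent(2);
    }

    public static void main(String[] args) {
        System.out.print(stripped());   // "<html>\n    <body>\n</html>\n"
        System.out.print(reindented()); // "  <html>\n      <body>\n  </html>\n"
    }
}
```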

The re-indentation algorithm takes the content of a text block whose line terminators have been normalized to LF. It removes the same amount of white space from each line of content until at least one of the lines has a non-white space character in the leftmost position. The position of the opening """ characters has no effect on the algorithm, but the position of the closing """ characters does have an effect if placed on its own line. The algorithm is as follows:

  1. Split the content of the text block at every LF, producing a list of individual lines. Note that any line in the content which was just an LF will become an empty line in the list of individual lines.
  2. Add all non-blank lines from the list of individual lines into a set of determining lines. (Blank lines -- lines that are empty or are composed wholly of white space -- have no visible influence on the indentation. Excluding blank lines from the set of determining lines avoids throwing off step 4 of the algorithm.)
  3. If the last line in the list of individual lines (i.e., the line with the closing delimiter) is blank, then add it to the set of determining lines. (The indentation of the closing delimiter should influence the indentation of the content as a whole -- a significant trailing line policy.)
  4. Compute the common white space prefix of the set of determining lines, by counting the number of leading white space characters on each line and taking the minimum count.
  5. Remove the common white space prefix from each non-blank line in the list of individual lines.
  6. Remove all trailing white space from all lines in the modified list of individual lines from step 5. This step collapses wholly-white-space lines in the modified list so that they are empty, but does not discard them.
  7. Construct the result string by joining all the lines in the modified list of individual lines from step 6, using LF as the separator between lines. If the final line in the list from step 6 is empty, then the joining LF from the previous line will be the last character in the result string.

重新缩进算法所处理的文本块中,行终止符已经标准化为LF。它将从内容的每一行中删除相同数量的空格,直到其中至少一行在最左侧位置具有非空格字符。开头的"""字符的位置对算法没有影响,但是结尾"""字符的位置如果放在自己的行上就会有影响。算法如下:

  1. 在每个LF处拆分文本块的内容,生成由各行组成的列表。请注意,内容中只包含一个LF的行,会成为该列表中的空行。
  2. 将列表中所有非空白行加入确定行集合。(空白行,即为空或完全由空白组成的行,对缩进没有可见影响。将空白行排除在确定行集合之外,可以避免干扰算法第4步的计算。)
  3. 如果独立行列表中的最后一行(即带有结尾分隔符的一行)是空白,那么将其添加到确定行集合中。(结尾分隔符的缩进影响整个内容的缩进——这是重要尾行策略。)
  4. 计算出每行前置空格字符的数目并取得最小值,来确定整个集合的公共空白前缀。
  5. 为独立行列表中的每一个非空白行删除公共空白前缀。
  6. 为第5步中已修改的所有行都删除末尾的空格。该步骤折叠已修改列表中的全空白行,使它们为空,但不丢弃它们。
  7. 用LF作为行之间的分隔符,将第6步中已修改的所有行连接起来,构造结果字符串。如果第6步的列表中最后一行是空的,则前一行连接的LF将是字符串中最后一个字符。
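This is a direct transcription of the seven steps, meant as a sketch for study rather than a reference implementation. reindent and leadingWhiteSpace are hypothetical helpers. The sketch assumes the input already uses LF line terminators. Like String::stripIndent, it treats white space as Character.isWhitespace. The comparison with stripIndent in main is a sanity check on one input, not proof of equivalence in every case.

```java
import java.util.ArrayList;
import java.util.List;

class ReindentSketch {
    // Hypothetical transcription of the seven steps; assumes LF-only input.
    static String reindent(String content) {
        // Step 1: split at every LF; limit -1 keeps empty lines, including a final empty one.
        String[] lines = content.split("\n", -1);

        // Steps 2-4: determining lines are the non-blank lines plus the last line
        // (the closing delimiter's line); the common prefix is their minimum indentation.
        int prefix = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            boolean isLast = i == lines.length - 1;
            if (!lines[i].isBlank() || isLast) {
                prefix = Math.min(prefix, leadingWhiteSpace(lines[i]));
            }
        }

        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Step 5: remove the common prefix from non-blank lines only.
            String s = line.isBlank() ? line : line.substring(prefix);
            // Step 6: remove trailing white space; blank lines become empty but are kept.
            result.add(s.stripTrailing());
        }
        // Step 7: join with LF; an empty final line leaves LF as the last character.
        return String.join("\n", result);
    }

    // Counts the leading white space characters of a line.
    static int leadingWhiteSpace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String content = "    <html>\n        <body>\n\n    </html>\n    ";
        System.out.print(reindent(content));
        System.out.println(reindent(content).equals(content.stripIndent())); // true
    }
}
```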

The escape sequences \b (backspace), \t (tab) and \s (space) are not interpreted by the algorithm; escape processing happens later. Similarly, the \<line-terminator> escape sequence does not prevent the splitting of lines on the line-terminator since the sequence is treated as two separate characters until escape processing.

转义序列\b(退格)、\t(Tab)和\s(空格)不会被算法解释;转义会稍后发生。类似地,\<行终止符>转义序列不会阻止行终止符的行拆分,因为在转义处理之前,该序列被视为两个单独的字符。

The re-indentation algorithm will be normative in The Java Language Specification. Developers will have access to it via String::stripIndent, a new instance method.

重新缩进算法将成为《Java语言规范》中的规范性内容。开发者可以通过新的实例方法String::stripIndent使用该算法。

Significant trailing line policy

Normally, one would format a text block in two ways: first, position the left edge of the content to appear under the first " of the opening delimiter, and second, place the closing delimiter on its own line to appear exactly under the opening delimiter. The resulting string will have no white space at the start of any line, and will not include the trailing blank line of the closing delimiter.

重要尾行策略

通常,人们会从两方面设置文本块的格式:第一,将内容的左边缘放在开头分隔符的第一个"下方;第二,将结尾分隔符单独放在一行,并正好位于开头分隔符下方。这样得到的字符串在任何行的开头都没有空白,也不会包含结尾分隔符所在的末尾空白行。

However, because the trailing blank line is considered a determining line, moving it to the left has the effect of reducing the common white space prefix, and therefore reducing the amount of white space that is stripped from the start of every line. In the extreme case, where the closing delimiter is moved all the way to the left, that reduces the common white space prefix to zero, effectively opting out of white space stripping.

但是,由于末尾空白行被视为确定行,将它向左移动会减小公共空白前缀,从而减少从每行开头去除的空白数量。在极端情况下,结尾分隔符一直移到最左侧,公共空白前缀将减小为零,这实际上等于选择不去除空白。

For example, with the closing delimiter moved all the way to the left, there is no incidental white space to visualize with dots:

例如,当结束分隔符一直向左移动时,这里没有附带的空格可以用点表示:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
""";

Including the trailing blank line with the closing delimiter, the common white space prefix is zero, so zero white space is removed from the start of each line. The algorithm thus produces: (using | to visualize the left margin)

包括末尾的空白行和结尾分隔符在内,公共空白前缀为零,因此从每行的开头删除零空白。该算法因此生成:(使用|表示左侧边距)

|              <html>
|                  <body>
|                      <p>Hello, world</p>
|                  </body>
|              </html>

Alternatively, suppose the closing delimiter is not moved all the way to the left, but rather under the t of html so it is eight spaces deeper than the variable declaration:

或者,假设结尾分隔符没有一直移到最左侧,而是移到html中t的下方,从而比变量声明多缩进8个空格:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
        """;

The spaces visualized with dots are considered to be incidental:

用点表示的空格被认为是附带的:

String html = """
........      <html>
........          <body>
........              <p>Hello, world</p>
........          </body>
........      </html>
........""";

Including the trailing blank line with the closing delimiter, the common white space prefix is eight, so eight white spaces are removed from the start of each line. The algorithm thus preserves the essential indentation of the content relative to the closing delimiter:

包括末尾的空白行和结尾分隔符在内,公共空白前缀为8,因此从每行的开头删除了8个空格。因此,该算法保留了内容相对于结尾分隔符的本来缩进:

|      <html>
|          <body>
|              <p>Hello, world</p>
|          </body>
|      </html>

Finally, suppose the closing delimiter is moved slightly to the right of the content:

最后,假设结尾分隔符被移到内容的稍右侧:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
                  """;

The spaces visualized with dots are considered to be incidental:

用点表示的空格被认为是附带的:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............    """;

The common white space prefix is 14, so 14 white spaces are removed from the start of each line. The trailing blank line is stripped to leave an empty line, which being the last line is then discarded. In other words, moving the closing delimiter to the right of the content has no effect, and the algorithm again preserves the essential indentation of the content:

公共空白前缀为14,所以从每行的开头删除了14个空格。删除末尾的空白行以留下空行,然后将其作为最后一行丢弃。也就是说,将结尾分隔符移到内容的右侧没有任何作用,并且算法再次保留了内容的基本缩进:

|<html>
|    <body>
|        <p>Hello, world</p>
|    </body>
|</html>
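The three placements discussed in this subsection can be compared side by side. In this sketch (JDK 15+, illustrative names), the content is always indented 8 columns and only the closing delimiter moves: to column 0, to column 4, and to column 12.

```java
class TrailingLinePolicyDemo {
    // The content line is indented 8 columns in every case; only the closing delimiter moves.

    // Closing delimiter at column 0: the common prefix is 0, so all 8 spaces are kept.
    static final String FAR_LEFT = """
        <p>Hi</p>
""";

    // Closing delimiter at column 4: the common prefix is 4, so 4 spaces are kept.
    static final String SHALLOWER = """
        <p>Hi</p>
    """;

    // Closing delimiter right of the content: the common prefix is 8 (set by the content),
    // and the delimiter's blank line is stripped to an empty last line.
    static final String RIGHT = """
        <p>Hi</p>
            """;

    public static void main(String[] args) {
        System.out.print("|" + FAR_LEFT);
        System.out.print("|" + SHALLOWER);
        System.out.print("|" + RIGHT);
    }
}
```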

3. Escape sequences

After the content is re-indented, any escape sequences in the content are interpreted. Text blocks support all of the escape sequences supported in string literals, including \n, \t, \', \", and \\. See section 3.10.6 of The Java Language Specification for the full list. Developers will have access to escape processing via String::translateEscapes, a new instance method.

3. 转义序列

重新缩进内容后,内容中的所有转义序列都会被解释。文本块支持字符串字面量所支持的全部转义序列,包括\n、\t、\'、\"和\\。完整列表请参阅《Java语言规范》第3.10.6节。开发者可以通过新的实例方法String::translateEscapes进行转义处理。

Interpreting escapes as the final step allows developers to use \n, \f, and \r for vertical formatting of a string without it affecting the translation of line terminators in step 1, and to use \b and \t for horizontal formatting of a string without it affecting the removal of incidental white space in step 2. For example, consider this text block that contains the \r escape sequence (CR):

将转义解释放在最后一步,使开发者可以使用\n、\f和\r对字符串进行垂直格式化,而不影响第1步中行终止符的转换;也可以使用\b和\t对字符串进行水平格式化,而不影响第2步中附带空格的去除。例如,考虑下面这个包含转义序列\r(CR)的文本块:

String html = """
              <html>\r
                  <body>\r
                      <p>Hello, world</p>\r
                  </body>\r
              </html>\r
              """;

The CR escapes are not processed until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), the result is:

直到将行终止符标准化为LF后,转义字符CR才会被处理。使用Unicode转义可视化LF(\u000A)和CR(\u000D),结果是:

|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
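A check of the \r example, assuming JDK 15+; EscapeDemo is an illustrative name. The second comparison applies String::translateEscapes to a string whose backslash sequences are still plain characters.

```java
class EscapeDemo {
    // The \r escapes are interpreted only after line terminators are normalized to LF.
    static final String HTML = """
        <html>\r
            <body>\r
                <p>Hello, world</p>\r
            </body>\r
        </html>\r
        """;

    public static void main(String[] args) {
        // Every line ends with CR (from the escape) followed by LF (the normalized terminator).
        System.out.println(HTML.equals("<html>\r\n    <body>\r\n        <p>Hello, world</p>\r\n"
                + "    </body>\r\n</html>\r\n")); // true

        // String::translateEscapes (JDK 15+) performs the same escape processing at run time;
        // raw contains backslashes as ordinary characters.
        String raw = "tab:\\t quote:\\\" newline:\\n";
        System.out.println(raw.translateEscapes().equals("tab:\t quote:\" newline:\n")); // true
    }
}
```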

Note that it is legal to use " and "" freely inside a text block, except immediately before the closing delimiter. For example, the following text blocks are legal:

请注意,在文本块内可以自由使用"和"",但紧挨在结尾分隔符之前的位置除外。例如,下面的文本块是合法的:

String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """;    // Note the newline before the closing delimiter

String code =
    """
    String empty = "";
    """;

However, a sequence of three " characters requires at least one " to be escaped, in order to avoid mimicking the closing delimiter. (A sequence of n " characters requires at least Math.floorDiv(n,3) of them to be escaped.) The use of " immediately before the closing delimiter also requires escaping. For example:

然而,连续3个"字符需要有至少一个"被转义,以避免被当做结尾分隔符。(由n个"字符组成的序列,要求至少Math.floorDiv(n,3)个被转义。)紧邻在结尾分隔符前的"同样需要转义。例如:

String code = 
    """
    String text = \"""
        A text block inside a text block
    \""";
    """;

String tutorial1 =
    """
    A common character
    in Java programs
    is \"""";

String tutorial2 =
    """
    The empty string literal
    is formed from " characters
    as follows: \"\"""";

System.out.println("""
        1 "
        2 ""
        3 ""\"
        4 ""\""
        5 ""\"""
        6 ""\"""\"
        7 ""\"""\""
        8 ""\"""\"""
        9 ""\"""\"""\"
    10 ""\"""\"""\""
    11 ""\"""\"""\"""
    12 ""\"""\"""\"""\"
""");

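The escaping examples above, condensed into a sketch that can be checked (JDK 15+, illustrative names). In main, each field is compared with the string it evaluates to.

```java
class QuotesDemo {
    // Escaping one " in each run of three keeps the content from ending the block.
    static final String CODE = """
        String text = \"""
            A text block inside a text block
        \""";
        """;

    // A " immediately before the closing delimiter must be escaped.
    static final String TUTORIAL1 = """
        is \"""";

    static final String TUTORIAL2 = """
        as follows: \"\"""";

    public static void main(String[] args) {
        System.out.println(CODE.equals(
            "String text = \"\"\"\n    A text block inside a text block\n\"\"\";\n")); // true
        System.out.println(TUTORIAL1.equals("is \""));            // true
        System.out.println(TUTORIAL2.equals("as follows: \"\"")); // true
    }
}
```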
New escape sequences

To allow finer control of the processing of newlines and white space, we introduce two new escape sequences.

新的转义序列

为了更好地控制换行符和空格的处理,我们引入了两个新的转义序列。

First, the \<line-terminator> escape sequence explicitly suppresses the insertion of a newline character.

第一,转义序列\<行终止符>会显式抑制换行符的插入。

For example, it is common practice to split very long string literals into concatenations of smaller substrings, and then hard wrap the resulting string expression onto multiple lines:

例如,常见的做法是将很长的字符串字面量拆分成较小的子字符串并拼接起来,然后将得到的字符串表达式硬换行为多行:

String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                 "elit, sed do eiusmod tempor incididunt ut labore " +
                 "et dolore magna aliqua.";

With the \<line-terminator> escape sequence this could be expressed as:

\<行终止符>转义序列时,这可以表示为:

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing \
                elit, sed do eiusmod tempor incididunt ut labore \
                et dolore magna aliqua.\
                """;

For the simple reason that character literals and traditional string literals don't allow embedded newlines, the \<line-terminator> escape sequence is only applicable to text blocks.

原因很简单:字符字面量和传统字符串字面量都不允许嵌入换行符,因此转义序列\<行终止符>仅适用于文本块。
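As a sanity check of the \<line-terminator> escape, this minimal sketch compares the text block above with the concatenated literal. It assumes JDK 15 or later, and the class name is hypothetical. Escapes are interpreted only after incidental white space is stripped, so the space before each trailing backslash survives and the two strings should be equal.

```java
// Sketch: \<line-terminator> joins lines without inserting '\n' (assumes JDK 15+).
class LineContinuationDemo {

    // Traditional approach: concatenate several shorter literals.
    static String literal() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // Text block: each trailing backslash suppresses the newline after its line.
    // The space before a backslash is not trailing white space, so it is kept.
    static String text() {
        return """
            Lorem ipsum dolor sit amet, consectetur adipiscing \
            elit, sed do eiusmod tempor incididunt ut labore \
            et dolore magna aliqua.\
            """;
    }

    public static void main(String[] args) {
        System.out.println(text().equals(literal())); // true
        System.out.println(text().contains("\n"));    // false
    }
}
```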

Second, the new \s escape sequence simply translates to a single space (\u0020).

第二点,新的转义序列\s仅转换为一个空格(\u0020)。

Escape sequences aren't translated until after incidental space stripping, so \s can act as a fence to prevent the stripping of trailing white space. Using \s at the end of each line in this example guarantees that each line is exactly six characters long:

转义序列直到附带空格被去除之后才会被转换,因此\s可以充当栅栏,防止末尾的空白被去除。在本例中,每行末尾使用\s可以确保每行的长度正好是6个字符:

String colors = """
    red  \s
    green\s
    blue \s
    """;

The \s escape sequence can be used in text blocks, traditional string literals, and character literals.

转义序列\s可以在文本块、传统字符串字面量和字符字面量中使用。
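This small sketch of \s assumes JDK 15 or later, where \s is also accepted in string and character literals; the class and field names are hypothetical. It checks that the colors example yields three six-character lines, and that \s in the other literal forms is just a space.

```java
// Sketch: \s in a text block, a string literal, and a character literal (assumes JDK 15+).
class SpaceEscapeDemo {

    // The spaces before \s are not trailing during stripping, and \s itself
    // is translated only afterwards, so every line keeps six characters.
    static String colors() {
        return """
            red  \s
            green\s
            blue \s
            """;
    }

    // \s in a traditional string literal and in a character literal.
    static final String LITERAL = "a\sb";
    static final char SPACE = '\s';

    public static void main(String[] args) {
        colors().lines().forEach(line -> System.out.println("[" + line + "] " + line.length()));
        System.out.println(LITERAL);      // a b
        System.out.println(SPACE == ' '); // true
    }
}
```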

Concatenation of text blocks

Text blocks can be used anywhere a string literal can be used. For example, text blocks and string literals may be concatenated interchangeably:

拼接文本块

所有可以使用字符串字面量的地方都可以使用文本块。例如,文本块和字符串字面量可以互换地拼接在一起:

String code = "public void print(Object o) {" +
              """
                  System.out.println(Objects.toString(o));
              }
              """;
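This minimal sketch of the concatenation above assumes JDK 15 or later; the class name is hypothetical. One detail is worth checking: the line terminator after an opening delimiter is not part of a text block's content. So the result has no newline after the {, and the text block's part begins with its four spaces of essential indentation.

```java
// Sketch: a string literal concatenated with a text block (assumes JDK 15+).
class ConcatenationDemo {

    static String code() {
        return "public void print(Object o) {" +
               """
                   System.out.println(Objects.toString(o));
               }
               """;
    }

    public static void main(String[] args) {
        // Prints "public void print(Object o) {    System.out..." on one line.
        System.out.print(code());
    }
}
```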

However, concatenation involving a text block can become rather clunky. Take this text block as a starting point:

然而,文本块的拼接可能会显得很笨拙。例如下面的文本块:

String code = """
              public void print(Object o) {
                  System.out.println(Objects.toString(o));
              }
              """;

Suppose it needs to be changed so that the type of o comes from a variable. Using concatenation, the text block that contains the trailing code will need to start on a new line. Unfortunately, the straightforward insertion of a newline in the program, as below, will cause a long span of white space between the type and the text beginning with o:

假设需要修改代码,使o的类型来自一个变量。使用拼接时,包含后续代码的文本块需要从新的一行开始。不幸的是,如下所示,直接在程序中插入换行会导致类型与以o开头的文本之间出现一大段空白:

String code = """
              public void print(""" + type + """
                                                 o) {
                  System.out.println(Objects.toString(o));
              }
              """;

The white space can be removed manually, but this hurts readability of the quoted code:

可以手动删除空格,但这会损害引用代码的可读性:

String code = """
              public void print(""" + type + """
               o) {
                  System.out.println(Objects.toString(o));
              }
              """;
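This sketch builds both variants to make the problem concrete. It assumes JDK 15 or later, and the class and method names and the type argument are hypothetical. Each text block strips its own incidental white space independently. The indentation of o) { relative to the second block's closing delimiter therefore decides how many spaces follow the type.

```java
// Sketch: splitting a text block around a variable (assumes JDK 15+).
class SplitTextBlockDemo {

    // Naive split: "o) {" keeps its deep indentation relative to the other
    // lines of the second text block, so a long run of spaces follows the type.
    static String naive(String type) {
        return """
            public void print(""" + type + """
                                             o) {
                System.out.println(Objects.toString(o));
            }
            """;
    }

    // Manual fix: "o) {" is indented one column more than "}" and the
    // closing delimiter, so exactly one space follows the type.
    static String manual(String type) {
        return """
            public void print(""" + type + """
             o) {
                System.out.println(Objects.toString(o));
            }
            """;
    }

    public static void main(String[] args) {
        System.out.print(naive("String"));
        System.out.print(manual("String"));
    }
}
```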

A cleaner alternative is to use String::replace or String::format, as follows:

更清晰的替代方案是用String::replaceString::format,如下所示:

String code = """
              public void print($type o) {
                  System.out.println(Objects.toString(o));
              }
              """.replace("$type", type);
String code = String.format("""
              public void print(%s o) {
                  System.out.println(Objects.toString(o));
              }
              """, type);
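A short sketch can compare the two substitution styles; both should produce the same string. It assumes JDK 15 or later. The class name is hypothetical, and $type is just the placeholder used above.

```java
// Sketch: placeholder substitution with replace and String.format (assumes JDK 15+).
class SubstitutionDemo {

    // String.replace(CharSequence, CharSequence) matches literally,
    // so the "$" in the placeholder needs no escaping.
    static String viaReplace(String type) {
        return """
            public void print($type o) {
                System.out.println(Objects.toString(o));
            }
            """.replace("$type", type);
    }

    // %s is the only format specifier in the template.
    static String viaFormat(String type) {
        return String.format("""
            public void print(%s o) {
                System.out.println(Objects.toString(o));
            }
            """, type);
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("String").equals(viaFormat("String"))); // true
    }
}
```

The sketch uses String.replace rather than replaceAll on purpose. replaceAll treats its arguments as a regular expression and a replacement pattern, and in both of those $ has a special meaning.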

Another alternative involves the introduction of a new instance method, String::formatted, which could be used as follows:

另一种选择是引入新的实例方法String::formatted,该方法可以按如下方式使用:

String source = """
                public void print(%s object) {
                    System.out.println(Objects.toString(object));
                }
                """.formatted(type);
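This sketch performs the same substitution with formatted. It assumes JDK 15 or later, where String.formatted(Object...) is part of the standard API, and the class name is hypothetical. The method is expected to behave like String.format with the receiver as the format string.

```java
// Sketch: String::formatted as the instance-method form of String.format (assumes JDK 15+).
class FormattedDemo {

    static final String TEMPLATE = """
        public void print(%s object) {
            System.out.println(Objects.toString(object));
        }
        """;

    static String source(String type) {
        return TEMPLATE.formatted(type);
    }

    public static void main(String[] args) {
        System.out.print(source("String"));
        // Same result as passing the template to String.format.
        System.out.println(source("String").equals(String.format(TEMPLATE, "String"))); // true
    }
}
```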

Additional Methods

The following methods will be added to support text blocks:

  • String::stripIndent(): used to strip away incidental white space from the text block content
  • String::translateEscapes(): used to translate escape sequences
  • String::formatted(Object... args): simplify value substitution in the text block

其他方法

为支持文本块,将添加以下方法:

  • String::stripIndent():用于去除文本块的附带空格。
  • String::translateEscapes():用于转换转义序列。
  • String::formatted(Object... args):简化文本块中的值替换。
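The following sketch applies each method to an ordinary string built at run time. It assumes JDK 15 or later; the class name and sample strings are hypothetical. For a text block literal, the compiler already strips incidental white space and translates escapes. These methods are therefore presumably most useful for strings that arrive from elsewhere, such as a file.

```java
// Sketch: stripIndent, translateEscapes and formatted on run-time strings (assumes JDK 15+).
class AdditionalMethodsDemo {

    // stripIndent: removes the common leading indentation (two spaces here)
    // and keeps the relative indentation of the second line.
    static String stripped() {
        return "  a\n    b".stripIndent();
    }

    // translateEscapes: the two characters '\' and 't' become a real tab.
    static String translated() {
        return "col1\\tcol2".translateEscapes();
    }

    // formatted: same as String.format("%s=%d", "x", 42).
    static String substituted() {
        return "%s=%d".formatted("x", 42);
    }

    public static void main(String[] args) {
        System.out.println(stripped());
        System.out.println(translated());
        System.out.println(substituted()); // x=42
    }
}
```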

Alternatives

Do nothing

Java has prospered for over 20 years with string literals that required newlines to be escaped. IDEs ease the maintenance burden by supporting automatic formatting and concatenation of strings that span several lines of source code. The String class has also evolved to include methods that simplify the processing and formatting of long strings, such as a method that presents a string as a stream of lines. However, strings are such a fundamental part of the Java language that the shortcomings of string literals are apparent to vast numbers of developers. Other JVM languages have also made advances in how long and complex strings are denoted. Unsurprisingly, then, multi-line string literals have consistently been one of the most requested features for Java. Introducing a multi-line construct of low to moderate complexity would have a high payoff.

备选方案

什么都不做

Java已经繁荣了20多年,其间字符串字面量一直要求对换行符进行转义。IDE通过支持对跨越多行源代码的字符串进行自动格式化和拼接,减轻了维护负担。String类也在不断演进,加入了简化长字符串处理和格式化的方法,例如将字符串表示为行的流的方法。但是,字符串是Java语言中极为基础的组成部分,因此字符串字面量的缺点对大量开发者来说都显而易见。其他JVM语言在表示长而复杂的字符串方面也取得了进展。因此毫不奇怪,多行字符串字面量一直是Java呼声最高的特性之一。引入一个复杂度低到中等的多行构造将获得很高的回报。

Allow a string literal to span multiple lines

Multi-line string literals could be introduced in Java simply by allowing line terminators in existing string literals. However, this would do nothing about the pain of escaping " characters. \" is the most frequently occurring escape sequence after \n, because of the frequency of code snippets. The only way to avoid escaping " in a string literal would be to provide an alternate delimiter scheme for string literals. Delimiters were much discussed for JEP 326 (Raw String Literals), and the lessons learned were used to inform the design of text blocks, so it would be misguided to upset the stability of string literals.

允许字符串字面量跨越多行

只需要在现有的字符串字面量中允许行终止符,就可以在Java中引入多行字符串字面量。但是,这对转义"字符带来的痛苦毫无帮助。由于代码片段出现得很频繁,\"是仅次于\n的最常见转义序列。避免在字符串字面量中转义"的唯一方法,是为字符串字面量提供另一套分隔符方案。JEP 326(原始字符串字面量)曾对分隔符进行过大量讨论,其中汲取的教训已用于指导文本块的设计,因此破坏字符串字面量的稳定性是不明智的。

Adopt another language's multi-line string literal

According to Brian Goetz:

采用另一种语言的多行字符串字面量

Brian Goetz所说:

Many people have suggested that Java should adopt multi-line string literals from Swift or Rust. However, the approach of “just do what language X does” is intrinsically irresponsible; nearly every feature of every language is conditioned by other features of that language. Instead, the game is to learn from how other languages do things, assess the tradeoffs they’ve chosen (explicitly and implicitly), and ask what can be applied to the constraints of the language we have and user expectations within the community we have.

很多人建议Java应该采用Swift或Rust的多行字符串字面量。但是,“X语言怎么做,我们就怎么做”的方法本质上是不负责任的:几乎每种语言的每个特性都受到该语言其他特性的制约。相反,关键在于学习其他语言的做法,评估它们(显式或隐式)所做出的权衡,并思考其中哪些能够适用于我们这门语言的约束,以及我们社区中用户的期望。

For JEP 326 (Raw String Literals), we surveyed many modern programming languages and their support for multi-line string literals. The results of these surveys influenced the current proposal, such as the choice of three " characters for delimiters (although there were other reasons for this choice too) and the recognition of the need for automatic indentation management.

对于JEP 326(原始字符串字面量),我们调查了许多现代编程语言及其对多行字符串字面量的支持。这些调查的结果影响了当前的提议,例如选择三个"字符作为分隔符(尽管这一选择也有其他原因),以及认识到需要自动管理缩进。

Do not remove incidental white space

If Java introduced multi-line string literals without support for automatically removing incidental white space, then many developers would write a method to remove it themselves, or lobby for the String class to include a removal method. However, that implies a potentially expensive computation every time the string is instantiated at run time, which would reduce the benefit of string interning. Having the Java language mandate the removal of incidental white space, both in leading and trailing positions, seems the most appropriate solution. Developers can opt out of leading white space removal by careful placement of the closing delimiter.

不删除附带空格

如果Java引入了多行字符串字面量,却不支持自动删除附带空格,那么许多开发者会自己编写删除附带空格的方法,或者游说在String类中加入这样的方法。但是,这意味着每次在运行时实例化字符串时都可能进行一次开销较大的计算,从而削弱字符串驻留(interning)带来的好处。由Java语言强制删除开头和结尾位置的附带空格,似乎是最合适的解决方案。开发者可以通过仔细放置结尾分隔符来选择不删除开头空格。
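This minimal sketch illustrates the last sentence. It assumes JDK 15 or later, and the class and method names are hypothetical. Moving the closing delimiter to the left of the content lowers the minimum indentation, so the extra columns become part of the string.

```java
// Sketch: the closing delimiter's position decides how much indentation is incidental (assumes JDK 15+).
class ClosingDelimiterDemo {

    // Closing delimiter aligned with the content: all 12 leading columns are stripped.
    static String aligned() {
        return """
            <p>Hello</p>
            """;
    }

    // Closing delimiter four columns further left: only 8 columns are stripped,
    // so four leading spaces remain in the value.
    static String keepsIndent() {
        return """
            <p>Hello</p>
        """;
    }

    public static void main(String[] args) {
        System.out.print(aligned());
        System.out.print(keepsIndent());
    }
}
```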

Raw string literals

For JEP 326 (Raw String Literals), we took a different approach to the problem of denoting strings without escaping newlines and quotes, focusing on the raw-ness of strings. We now believe that this focus was wrong, because while raw string literals could easily span multiple lines of source code, the cost of supporting unescaped delimiters in their content was extreme. This limited the effectiveness of the feature in the multi-line use case, which is a critical one because of the frequency of embedding multi-line (but not truly raw) code snippets in Java programs. A good outcome of the pivot from raw-ness to multi-line-ness was a renewed focus on having a consistent escape language between string literals, text blocks, and related features that may be added in future.

原始字符串字面量

对于JEP 326(原始字符串字面量),我们采用了另一种方法来解决在表示字符串时不转义换行符和引号的问题,重点放在字符串的原始性上。现在我们认为这种关注点是错误的:尽管原始字符串字面量可以轻松跨越多行源代码,但要在其内容中支持未转义的分隔符,代价极高。这限制了该特性在多行用例中的有效性,而多行用例至关重要,因为在Java程序中嵌入多行(但并非真正原始)代码片段的情况非常常见。从原始性转向多行性的一个好结果,是重新关注让字符串字面量、文本块以及将来可能添加的相关特性使用一致的转义语言。

Testing

Tests that use string literals for the creation, interning, and manipulation of instances of String should be duplicated to use text blocks too. Negative tests should be added for corner cases involving line terminators and EOF.

测试

使用字符串字面量来创建、驻留(intern)和操作String实例的测试,也应复制一份改用文本块。对于涉及行终止符和EOF的边界情况,应添加负面测试。

Tests should be added to ensure that text blocks can embed Java-in-Java, Markdown-in-Java, SQL-in-Java, and at least one JVM-language-in-Java.

应该添加测试,以确保文本块可以嵌入Java中的Java、Java中的Markdown、Java中的SQL,以及Java中的至少一种JVM语言。
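This is one possible shape for the duplicated interning test, offered only as a sketch. It assumes JDK 15 or later; the class name and SQL snippet are hypothetical, and a real suite would use its own test framework. Text blocks and string literals are both constant expressions, so equal values should refer to the same interned String instance.

```java
// Sketch of a literal/text-block interning check (assumes JDK 15+; names are hypothetical).
class TextBlockInterningTest {

    // Both values are constant expressions, which the JLS requires to be interned.
    static boolean sameInstance() {
        String literal = "SELECT id\nFROM users\n";
        String block = """
            SELECT id
            FROM users
            """;
        return literal == block; // reference comparison on purpose
    }

    public static void main(String[] args) {
        if (!sameInstance()) {
            throw new AssertionError("text block was not interned with the equal literal");
        }
        System.out.println("ok");
    }
}
```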