Original (English) text:
https://openjdk.java.net/jeps/378
Personal original translation; please credit the source when reposting.
Add text blocks to the Java language. A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in a predictable way, and gives the developer control over the format when desired.
Text blocks were proposed by JEP 355 in early 2019 as a follow-on to explorations begun in JEP 326 (Raw String Literals), which was initially targeted to JDK 12 but eventually withdrawn and did not appear in that release. JEP 355 was targeted to JDK 13 in June 2019 as a preview feature. Feedback on JDK 13 suggested that text blocks should be previewed again in JDK 14, with the addition of two new escape sequences. Consequently, JEP 368 was targeted to JDK 14 in November 2019 as a preview feature. Feedback on JDK 14 suggested that text blocks were ready to become final and permanent in JDK 15 with no further changes.
It is not a goal to define a new reference type, distinct from java.lang.String, for the strings expressed by any new construct.
It is not a goal to define new operators, distinct from +, that take String operands.
Text blocks do not directly support string interpolation; the new instance method String::formatted aids in situations where interpolation might be desired.
In Java, embedding a snippet of HTML, XML, SQL, or JSON in a string literal "..." usually requires significant editing with escapes and concatenation before the code containing the snippet will compile. The snippet is often difficult to read and arduous to maintain.
More generally, the need to denote short, medium, and long blocks of text in a Java program is near universal, whether the text is code from other programming languages, structured text representing golden files, or messages in natural languages. On the one hand, the Java language recognizes this need by allowing strings of unbounded size and content; on the other hand, it embodies a design default that strings should be small enough to denote on a single line of a source file (surrounded by " characters), and simple enough to escape easily. This design default is at odds with the large number of Java programs where strings are too long to fit comfortably on a single line.
Accordingly, it would improve both the readability and the writability of a broad class of Java programs to have a linguistic mechanism for denoting strings more literally than a string literal -- across multiple lines and without the visual clutter of escapes. In essence, a two-dimensional block of text, rather than a one-dimensional sequence of characters.
Still, it is impossible to predict the role of every string in Java programs. Just because a string spans multiple lines of source code does not mean that newline characters are desirable in the string. One part of a program may be more readable when strings are laid out over multiple lines, but the embedded newline characters may change the behavior of another part of the program. Accordingly, it would be helpful if the developer had precise control over where newlines appear, and, as a related matter, how much white space appears to the left and right of the "block" of text.
Using "one-dimensional" string literals:
String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";
Using a "two-dimensional" block of text:
String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;
Using "one-dimensional" string literals:
String query = "SELECT \"EMP_ID\", \"LAST_NAME\" FROM \"EMPLOYEE_TB\"\n" +
               "WHERE \"CITY\" = 'INDIANAPOLIS'\n" +
               "ORDER BY \"EMP_ID\", \"LAST_NAME\";\n";
Using a "two-dimensional" block of text:
String query = """
               SELECT "EMP_ID", "LAST_NAME" FROM "EMPLOYEE_TB"
               WHERE "CITY" = 'INDIANAPOLIS'
               ORDER BY "EMP_ID", "LAST_NAME";
               """;
Using "one-dimensional" string literals:
ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("function hello() {\n" +
                         "    print('\"Hello, world\"');\n" +
                         "}\n" +
                         "\n" +
                         "hello();\n");
Using a "two-dimensional" block of text:
// Note: Nashorn, the JavaScript engine bundled with earlier JDKs, was removed in JDK 15,
// so getEngineByName("js") returns null unless another JavaScript engine is available.
ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("""
                         function hello() {
                             print('"Hello, world"');
                         }

                         hello();
                         """);
This section is identical to the same section in this JEP's predecessor, JEP 355, except for the addition of the subsection on new escape sequences.
A text block is a new kind of literal in the Java language. It may be used to denote a string anywhere that a string literal could appear, but offers greater expressiveness and less accidental complexity.
A text block consists of zero or more content characters, enclosed by opening and closing delimiters.
The opening delimiter is a sequence of three double quote characters (""") followed by zero or more white spaces followed by a line terminator. The content begins at the first character after the line terminator of the opening delimiter.
The closing delimiter is a sequence of three double quote characters. The content ends at the last character before the first double quote of the closing delimiter.
The content may include double quote characters directly, unlike the characters in a string literal. The use of \" in a text block is permitted, but not necessary or recommended. Fat delimiters (""") were chosen so that " characters could appear unescaped, and also to visually distinguish a text block from a string literal.
The content may include line terminators directly, unlike the characters in a string literal. The use of \n in a text block is permitted, but not necessary or recommended. For example, the text block:
"""
line 1
line 2
line 3
"""
is equivalent to the string literal:
"line 1\nline 2\nline 3\n"
or a concatenation of string literals:
"line 1\n" +
"line 2\n" +
"line 3\n"
If a line terminator is not required at the end of the string, then the closing delimiter can be placed on the last line of content. For example, the text block:
"""
line 1
line 2
line 3"""
is equivalent to the string literal:
"line 1\nline 2\nline 3"
A text block can denote the empty string, although this is not recommended because it needs two lines of source code:
String empty = """
""";
Here are some examples of ill-formed text blocks:
String a = """""";   // no line terminator after opening delimiter
String b = """ """;  // no line terminator after opening delimiter
String c = """
           ";        // no closing delimiter (text block continues to EOF)
String d = """
           abc \ def
           """;      // unescaped backslash (see below for escape processing)
A text block is a constant expression of type String, just like a string literal. However, unlike a string literal, the content of a text block is processed by the Java compiler in three distinct steps:
1. Line terminators in the content are translated to LF (\u000A). The purpose of this translation is to follow the principle of least surprise when moving Java source code across platforms.
2. Incidental white space surrounding the content, introduced to match the indentation of Java source code, is removed.
3. Escape sequences in the content are interpreted. Performing interpretation as the final step means developers can write escape sequences such as \n without them being modified or deleted by earlier steps.
The processed content is recorded in the class file as a CONSTANT_String_info entry in the constant pool, just like the characters of a string literal. The class file does not record whether a CONSTANT_String_info entry was derived from a text block or a string literal.
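A text block compiles to an ordinary String constant, so the equivalences stated earlier in this section can be checked directly at run time. The following is a minimal sketch that assumes JDK 15 or later. The class and field names are made up for illustration.

```java
// Requires JDK 15 or later (text blocks became a permanent feature in JDK 15).
class TextBlockEquivalence {
    // Closing delimiter on its own line: the content ends with a line terminator.
    static final String WITH_NEWLINE = """
        line 1
        line 2
        line 3
        """;

    // Closing delimiter on the last line of content: no trailing line terminator.
    static final String WITHOUT_NEWLINE = """
        line 1
        line 2
        line 3""";

    // The empty string: legal, but it needs two lines of source code.
    static final String EMPTY = """
        """;

    public static void main(String[] args) {
        System.out.println(WITH_NEWLINE.equals("line 1\nline 2\nline 3\n"));   // true
        System.out.println(WITHOUT_NEWLINE.equals("line 1\nline 2\nline 3")); // true
        System.out.println(EMPTY.isEmpty());                                   // true
    }
}
```

All three blocks are indented by eight columns in the source, but their closing delimiters (or last lines) sit at the same depth, so that indentation is incidental and none of it reaches the strings.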
At run time, a text block is evaluated to an instance of String, just like a string literal. Instances of String that are derived from text blocks are indistinguishable from instances derived from string literals. Two text blocks with the same processed content will refer to the same instance of String due to interning, just like for string literals.
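The `==` operator compares references, so it can show the interning behavior directly. This sketch assumes JDK 15 or later and uses made-up names. It is for illustration only; application code should compare strings with String::equals.

```java
class TextBlockInterning {
    static final String FROM_LITERAL = "<p>Hello</p>\n";
    static final String FROM_TEXT_BLOCK = """
        <p>Hello</p>
        """;

    public static void main(String[] args) {
        // Same processed content, both compile-time constants: one interned instance.
        System.out.println(FROM_LITERAL == FROM_TEXT_BLOCK);    // true
        // A string assembled at run time is a separate object until it is interned.
        String built = new StringBuilder("<p>Hello</p>").append('\n').toString();
        System.out.println(built == FROM_LITERAL);              // false
        System.out.println(built.intern() == FROM_LITERAL);     // true
    }
}
```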
The following sections discuss compile-time processing in more detail.
Line terminators in the content are normalized from CR (\u000D) and CRLF (\u000D\u000A) to LF (\u000A) by the Java compiler. This ensures that the string derived from the content is equivalent across platforms, even if the source code has been translated to a platform encoding (see javac -encoding).
For example, if Java source code that was created on a Unix platform (where the line terminator is LF) is edited on a Windows platform (where the line terminator is CRLF), then without normalization, the content would become one character longer for each line. Any algorithm that relied on LF being the line terminator might fail, and any test that needed to verify string equality with String::equals would fail.
The escape sequences \n (LF), \f (FF), and \r (CR) are not interpreted during normalization; escape processing happens later.
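Step 1 can be pictured as two string replacements applied to the raw source text. The normalize method below is a hypothetical sketch of that idea and not the compiler's implementation. It shows that CRLF has to be handled before a lone CR.

```java
class LineTerminatorNormalization {
    // Sketch of step 1 on raw source text (not the compiler's actual code).
    // CRLF is replaced before a lone CR, so each CRLF pair becomes one LF, not two.
    static String normalize(String raw) {
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windows = "line 1\r\nline 2\r\n"; // as saved by a CRLF editor
        String unix = "line 1\nline 2\n";
        System.out.println(windows.equals(unix));              // false: two extra CR characters
        System.out.println(normalize(windows).equals(unix));   // true
        System.out.println(normalize("a\rb").equals("a\nb"));  // true: a lone CR also becomes LF
    }
}
```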
The text blocks shown above were easier to read than their concatenated string literal counterparts, but the obvious interpretation for the content of a text block would include the spaces added to indent the embedded string so that it lines up neatly with the opening delimiter. Here is the HTML example using dots to visualize the spaces that the developer added for indentation:
String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";
Since the opening delimiter is generally positioned to appear on the same line as the statement or expression which consumes the text block, there is no real significance to the fact that 14 visualized spaces start each line. Including those spaces in the content would mean the text block denotes a string different from the one denoted by the concatenated string literals. This would hurt migration, and be a recurring source of surprise: it is overwhelmingly likely that the developer does not want those spaces in the string. Also, the closing delimiter is generally positioned to align with the content, which further suggests that the 14 visualized spaces are insignificant.
Spaces may also appear at the end of each line, especially when a text block is populated by copy-pasting snippets from other files (which may themselves have been formed by copy-pasting from yet more files). Here is the HTML example reimagined with some trailing white space, again using dots to visualize spaces:
String html = """
..............<html>...
..............    <body>
..............        <p>Hello, world</p>....
..............    </body>.
..............</html>...
..............""";
Trailing white space is most often unintentional, idiosyncratic, and insignificant. It is overwhelmingly likely that the developer does not care about it. Trailing white space characters are similar to line terminators, in that both are invisible artifacts of the source code editing environment. With no visual guide to the presence of trailing white space characters, including them in the content would be a recurring source of surprise, as it would affect the length, hash code, etc., of the string.
Accordingly, an appropriate interpretation for the content of a text block is to differentiate incidental white space at the start and end of each line from essential white space. The Java compiler processes the content by removing incidental white space to yield what the developer intended. String::indent can then be used to further manipulate indentation if desired. Using | to visualize margins:
|<html>|
|    <body>|
|        <p>Hello, world</p>|
|    </body>|
|</html>|
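At run time, String::stripIndent (JDK 15) applies the same incidental white space removal to a string that never appeared in source code, such as text read from a file. String::indent (JDK 12) can then add a uniform indentation back. The input below is made up, and its final line of four spaces plays the role of the closing delimiter's line.

```java
class IncidentalWhiteSpace {
    // Made-up input: 4 columns of common indentation, trailing spaces on two lines,
    // and a last line of 4 spaces standing in for the closing delimiter's line.
    static final String RAW = "    <html>   \n        <body>\n    </html> \n    ";

    public static void main(String[] args) {
        String stripped = RAW.stripIndent();
        System.out.println(stripped.equals("<html>\n    <body>\n</html>\n"));                  // true
        // indent(2) adds two spaces to every line and ends each line with "\n".
        System.out.println(stripped.indent(2).equals("  <html>\n      <body>\n  </html>\n")); // true
        // A string ending in "\n" has an empty last line with zero indentation, which acts
        // like a closing delimiter at column 0: no leading white space is removed.
        System.out.println("    a\n    b\n".stripIndent().equals("    a\n    b\n"));        // true
    }
}
```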
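The re-indentation algorithm takes the content of a text block whose line terminators have been normalized to LF. It removes the same amount of white space from each line of content until at least one of the lines has a non-white space character in the leftmost position. The position of the opening """ characters has no effect on the algorithm, but the position of the closing """ characters does have an effect if placed on its own line. The algorithm is as follows:
1. Split the content at every LF, producing a list of individual lines. A line in the content that was just an LF becomes an empty line in the list.
2. Add all non-blank lines to a set of determining lines. Blank lines (lines that are empty or consist wholly of white space) have no visible influence on the indentation.
3. If the last line in the list (the line with the closing delimiter) is blank, add it to the set of determining lines too, so that the indentation of the closing delimiter influences the indentation of the content as a whole.
4. Compute the minimum indentation of the determining lines by counting the leading white space characters on each line and taking the smallest count.
5. Remove that minimum indentation from each non-blank line.
6. Remove all trailing white space from all lines. This collapses wholly-white-space lines to empty lines but does not discard them.
7. Join the lines using LF as the separator. If the final line is empty, the LF that joined it to the previous line is the last character of the result.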
The escape sequences \b (backspace), \t (tab) and \s (space) are not interpreted by the algorithm; escape processing happens later. Similarly, the \<line-terminator> escape sequence does not prevent the splitting of lines on the line-terminator since the sequence is treated as two separate characters until escape processing.
The re-indentation algorithm will be normative in The Java Language Specification. Developers will have access to it via String::stripIndent, a new instance method.
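The following sketch writes out the re-indentation in Java and checks it against String::stripIndent. It assumes line terminators have already been normalized to LF, and the names are made up. It illustrates the behavior described in this section and does not reproduce the normative JLS text or the JDK's implementation.

```java
class ReindentSketch {
    // Re-indentation sketch; assumes the content's line terminators are already LF.
    static String reindent(String content) {
        String[] lines = content.split("\n", -1); // -1 keeps a trailing empty line
        int last = lines.length - 1;

        // Determining lines: every non-blank line, plus the last line even if blank
        // (the line that holds the closing delimiter).
        int min = Integer.MAX_VALUE;
        for (int i = 0; i <= last; i++) {
            if (!lines[i].isBlank() || i == last) {
                min = Math.min(min, leadingWhiteSpace(lines[i]));
            }
        }

        StringBuilder result = new StringBuilder();
        for (int i = 0; i <= last; i++) {
            String line = lines[i];
            // Blank lines become empty; other lines lose the common prefix and trailing white space.
            result.append(line.isBlank() ? "" : line.substring(min).stripTrailing());
            if (i < last) {
                result.append('\n');
            }
        }
        return result.toString();
    }

    static int leadingWhiteSpace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Content of a text block whose closing delimiter is indented 4 columns.
        String content = "      <html>\n        <p>Hi</p>\n      </html>\n    ";
        System.out.println(reindent(content).equals(content.stripIndent())); // true
    }
}
```

The last line is always a determining line, so `min` is always set. An empty last line (content ending in LF) forces the minimum to zero, which is how moving the closing delimiter to column 0 opts out of stripping.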
Normally, one would format a text block in two respects: first, position the left edge of the content to appear under the first " of the opening delimiter, and second, place the closing delimiter on its own line to appear exactly under the opening delimiter. The resulting string will have no white space at the start of any line, and will not include the trailing blank line of the closing delimiter.
However, because the trailing blank line is considered a determining line, moving it to the left has the effect of reducing the common white space prefix, and therefore reducing the amount of white space that is stripped from the start of every line. In the extreme case, where the closing delimiter is moved all the way to the left, that reduces the common white space prefix to zero, effectively opting out of white space stripping.
For example, with the closing delimiter moved all the way to the left, there is no incidental white space to visualize with dots:
String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
""";
Including the trailing blank line with the closing delimiter, the common white space prefix is zero, so zero white space is removed from the start of each line. The algorithm thus produces: (using | to visualize the left margin)
|              <html>
|                  <body>
|                      <p>Hello, world</p>
|                  </body>
|              </html>
Alternatively, suppose the closing delimiter is not moved all the way to the left, but rather under the t of html so it is eight spaces deeper than the variable declaration:
String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
        """;
The spaces visualized with dots are considered to be incidental:
String html = """
........      <html>
........          <body>
........              <p>Hello, world</p>
........          </body>
........      </html>
........""";
Including the trailing blank line with the closing delimiter, the common white space prefix is eight, so eight white spaces are removed from the start of each line. The algorithm thus preserves the essential indentation of the content relative to the closing delimiter:
|      <html>
|          <body>
|              <p>Hello, world</p>
|          </body>
|      </html>
Finally, suppose the closing delimiter is moved slightly to the right of the content:
String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
                  """;
The spaces visualized with dots are considered to be incidental:
String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............    """;
The common white space prefix is 14, so 14 white spaces are removed from the start of each line. The trailing blank line is stripped to leave an empty line, which being the last line is then discarded. In other words, moving the closing delimiter to the right of the content has no effect, and the algorithm again preserves the essential indentation of the content:
|<html>
|    <body>
|        <p>Hello, world</p>
|    </body>
|</html>
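The closing-delimiter placements discussed above (flush left, partway into the indentation, and to the right of the content) can be modeled with String::stripIndent. Each case writes out the content the compiler would see, including the indentation of the closing delimiter's line. This sketch uses made-up content and assumes JDK 15 or later. It uses explicit strings so the results do not depend on how the source file itself is indented.

```java
class ClosingDelimiterPosition {
    // Content as the compiler sees it before re-indentation: the body is indented
    // 6 columns, and the final line carries the closing delimiter's indentation.
    static String content(String delimiterIndent) {
        return "      <html>\n          <p>Hello</p>\n      </html>\n" + delimiterIndent;
    }

    public static void main(String[] args) {
        // Delimiter at column 0: nothing is stripped.
        System.out.print(content("").stripIndent());
        // Delimiter at column 2: two columns are stripped; relative indentation is kept.
        System.out.print(content("  ").stripIndent());
        // Delimiter right of the content (column 10): only the 6 common columns are stripped.
        System.out.print(content("          ").stripIndent());
    }
}
```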
After the content is re-indented, any escape sequences in the content are interpreted. Text blocks support all of the escape sequences supported in string literals, including \n, \t, \', \", and \\. See section 3.10.6 of The Java Language Specification for the full list. Developers will have access to escape processing via String::translateEscapes, a new instance method.
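String::translateEscapes (JDK 15) performs the escape processing of step 3 at run time. It works on a string whose backslashes are still literal characters, for example text loaded from a file. The inputs below are made up for illustration.

```java
class EscapeTranslation {
    // Each input holds a backslash followed by a character, as text read at run time would.
    static final String TAB_AND_NEWLINE = "name:\\tvalue\\n";
    static final String SPACE = "a\\sb";
    static final String BACKSLASH = "a\\\\b";
    static final String CONTINUATION = "one \\\ntwo"; // a backslash immediately followed by LF

    public static void main(String[] args) {
        System.out.println(TAB_AND_NEWLINE.translateEscapes().equals("name:\tvalue\n")); // true
        System.out.println(SPACE.translateEscapes().equals("a b"));                      // true
        System.out.println(BACKSLASH.translateEscapes().equals("a\\b"));                 // true
        System.out.println(CONTINUATION.translateEscapes().equals("one two"));           // true
    }
}
```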
Interpreting escapes as the final step allows developers to use \n, \f, and \r for vertical formatting of a string without it affecting the translation of line terminators in step 1, and to use \b and \t for horizontal formatting of a string without it affecting the removal of incidental white space in step 2. For example, consider this text block that contains the \r escape sequence (CR):
String html = """
              <html>\r
                  <body>\r
                      <p>Hello, world</p>\r
                  </body>\r
              </html>\r
              """;
The CR escapes are not processed until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), the result is:
|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
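The same behavior can be checked in code. Each \r escape passes through steps 1 and 2 as two plain characters and becomes CR only in step 3. This sketch assumes JDK 15 or later and uses a shortened, made-up version of the HTML.

```java
class CarriageReturnEscapes {
    // Source lines end in LF after step 1; each \r escape becomes a CR in step 3,
    // after incidental white space has already been removed in step 2.
    static final String HTML = """
        <html>\r
            <body>\r
            </body>\r
        </html>\r
        """;

    public static void main(String[] args) {
        System.out.println(HTML.equals("<html>\r\n    <body>\r\n    </body>\r\n</html>\r\n")); // true
        System.out.println(HTML.lines().count()); // 4
    }
}
```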
Note that it is legal to use " and "" freely inside a text block, except immediately before the closing delimiter. For example, the following text blocks are legal:
String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """; // Note the newline before the closing delimiter

String code =
    """
    String empty = "";
    """;
However, a sequence of three " characters requires at least one " to be escaped, in order to avoid mimicking the closing delimiter. (A sequence of n " characters requires at least Math.floorDiv(n,3) of them to be escaped.) The use of " immediately before the closing delimiter also requires escaping. For example:
String code =
    """
    String text = \"""
        A text block inside a text block
    \""";
    """;

String tutorial1 =
    """
    A common character
    in Java programs
    is \"""";

String tutorial2 =
    """
    The empty string literal
    is formed from " characters
    as follows: \"\"""";

System.out.println("""
1 "
2 ""
3 ""\"
4 ""\""
5 ""\"""
6 ""\"""\"
7 ""\"""\""
8 ""\"""\"""
9 ""\"""\"""\"
10 ""\"""\"""\""
11 ""\"""\"""\"""
12 ""\"""\"""\"""\"
""");
To allow finer control of the processing of newlines and white space, we introduce two new escape sequences.
First, the \<line-terminator> escape sequence explicitly suppresses the insertion of a newline character.
For example, it is common practice to split very long string literals into concatenations of smaller substrings, and then hard wrap the resulting string expression onto multiple lines:
String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                 "elit, sed do eiusmod tempor incididunt ut labore " +
                 "et dolore magna aliqua.";
With the \<line-terminator> escape sequence this could be expressed as:
String text = """
    Lorem ipsum dolor sit amet, consectetur adipiscing \
    elit, sed do eiusmod tempor incididunt ut labore \
    et dolore magna aliqua.\
    """;
For the simple reason that character literals and traditional string literals don't allow embedded newlines, the \<line-terminator> escape sequence is only applicable to text blocks.
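The following is a quick check that the \<line-terminator> version denotes the same string as the concatenation. It assumes JDK 15 or later, and the class and field names are hypothetical. The space before each backslash survives because trailing white space is stripped only after the last non-white-space character on the line, and here that character is the backslash itself.

```java
class LineContinuation {
    static final String CONCATENATED = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                                       "elit, sed do eiusmod tempor incididunt ut labore " +
                                       "et dolore magna aliqua.";

    static final String TEXT_BLOCK = """
        Lorem ipsum dolor sit amet, consectetur adipiscing \
        elit, sed do eiusmod tempor incididunt ut labore \
        et dolore magna aliqua.\
        """;

    public static void main(String[] args) {
        System.out.println(TEXT_BLOCK.equals(CONCATENATED)); // true
        System.out.println(TEXT_BLOCK.contains("\n"));       // false
    }
}
```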
Second, the new \s escape sequence simply translates to a single space (\u0020).
Escape sequences aren't translated until after incidental white space has been stripped, so \s can act as a fence to prevent the stripping of trailing white space. Using \s at the end of each line in this example guarantees that each line is exactly six characters long:
String colors = """
    red  \s
    green\s
    blue \s
    """;
The \s escape sequence can be used in text blocks, traditional string literals, and character literals.
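The following checks the six-character claim in code. It assumes JDK 15 or later, and the class name is hypothetical.

```java
class SpaceEscape {
    // \s is translated in step 3, after trailing white space was stripped in step 2,
    // so the plain spaces in front of it survive and every line is 6 characters long.
    static final String COLORS = """
        red  \s
        green\s
        blue \s
        """;

    public static void main(String[] args) {
        COLORS.lines().forEach(line -> System.out.println("[" + line + "] " + line.length()));
    }
}
```

Each printed line reports a length of 6. Without the \s fences, the trailing spaces after red and blue would be stripped as incidental.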
Text blocks can be used anywhere a string literal can be used. For example, text blocks and string literals may be concatenated interchangeably:
String code = "public void print(Object o) {" +
              """
                  System.out.println(Objects.toString(o));
              }
              """;
However, concatenation involving a text block can become rather clunky. Take this text block as a starting point:
String code = """
              public void print(Object o) {
                  System.out.println(Objects.toString(o));
              }
              """;
Suppose it needs to be changed so that the type of o comes from a variable. Using concatenation, the text block that contains the trailing code will need to start on a new line. Unfortunately, the straightforward insertion of a newline in the program, as below, will cause a long span of white space between the type and the text beginning o:
String code = """
              public void print(""" + type + """
                                                o) {
                  System.out.println(Objects.toString(o));
              }
              """;
The white space can be removed manually, but this hurts readability of the quoted code:
String code = """
              public void print(""" + type + """
               o) {
                  System.out.println(Objects.toString(o));
              }
              """;
A cleaner alternative is to use String::replace or String::format, as follows:
String code = """
              public void print($type o) {
                  System.out.println(Objects.toString(o));
              }
              """.replace("$type", type);

String code = String.format("""
              public void print(%s o) {
                  System.out.println(Objects.toString(o));
              }
              """, type);
Another alternative involves the introduction of a new instance method, String::formatted, which could be used as follows:
String source = """
                public void print(%s object) {
                    System.out.println(Objects.toString(object));
                }
                """.formatted(type);
The following methods will be added to support text blocks:
String::stripIndent(): used to strip away incidental white space from the text block content
String::translateEscapes(): used to translate escape sequences
String::formatted(Object... args): simplify value substitution in the text block
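The three methods can be combined at run time. This sketch assumes JDK 15 or later and uses a made-up template and input text. It uses formatted for substitution. It then chains stripIndent and translateEscapes in the same order the compiler applies steps 2 and 3. The last check shows why that order matters when \s is used as a fence.

```java
class TextBlockStringMethods {
    // Template with a %s placeholder; formatted(args) is equivalent to String.format(this, args).
    static final String TEMPLATE = """
        public void print(%s o) {
            System.out.println(Objects.toString(o));
        }
        """;

    // Made-up text as it might be read from a file: indented, with a literal
    // backslash-t and a trailing backslash-s that are not yet escape sequences.
    static final String RAW = "    name\\tvalue\n    padded \\s\n    ";

    static String withFormatted(String type) {
        return TEMPLATE.formatted(type);
    }

    static String process(String raw) {
        // Same order as the compiler: strip incidental white space, then translate escapes.
        return raw.stripIndent().translateEscapes();
    }

    public static void main(String[] args) {
        String type = "Object"; // hypothetical run-time value
        System.out.println(withFormatted(type).equals(String.format(TEMPLATE, type))); // true
        System.out.print(withFormatted(type));
        System.out.println(process(RAW).equals("name\tvalue\npadded  \n"));            // true
        // Reversing the order lets stripIndent remove the spaces that \s was meant to keep.
        System.out.println(RAW.translateEscapes().stripIndent().equals("name\tvalue\npadded\n")); // true
    }
}
```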
Java has prospered for over 20 years with string literals that required newlines to be escaped. IDEs ease the maintenance burden by supporting automatic formatting and concatenation of strings that span several lines of source code. The String class has also evolved to include methods that simplify the processing and formatting of long strings, such as a method that presents a string as a stream of lines. However, strings are such a fundamental part of the Java language that the shortcomings of string literals are apparent to vast numbers of developers. Other JVM languages have also made advances in how long and complex strings are denoted. Unsurprisingly, then, multi-line string literals have consistently been one of the most requested features for Java. Introducing a multi-line construct of low to moderate complexity would have a high payoff.
Multi-line string literals could be introduced in Java simply by allowing line terminators in existing string literals. However, this would do nothing about the pain of escaping " characters. \" is the most frequently occurring escape sequence after \n, because of the frequency of code snippets. The only way to avoid escaping " in a string literal would be to provide an alternate delimiter scheme for string literals. Delimiters were much discussed for JEP 326 (Raw String Literals), and the lessons learned were used to inform the design of text blocks, so it would be misguided to upset the stability of string literals.
According to Brian Goetz:
Many people have suggested that Java should adopt multi-line string literals from Swift or Rust. However, the approach of “just do what language X does” is intrinsically irresponsible; nearly every feature of every language is conditioned by other features of that language. Instead, the game is to learn from how other languages do things, assess the tradeoffs they’ve chosen (explicitly and implicitly), and ask what can be applied to the constraints of the language we have and user expectations within the community we have.
For JEP 326 (Raw String Literals), we surveyed many modern programming languages and their support for multi-line string literals. The results of these surveys influenced the current proposal, such as the choice of three " characters for delimiters (although there were other reasons for this choice too) and the recognition of the need for automatic indentation management.
If Java introduced multi-line string literals without support for automatically removing incidental white space, then many developers would write a method to remove it themselves, or lobby for the String class to include a removal method. However, that implies a potentially expensive computation every time the string is instantiated at run time, which would reduce the benefit of string interning. Having the Java language mandate the removal of incidental white space, both in leading and trailing positions, seems the most appropriate solution. Developers can opt out of leading white space removal by careful placement of the closing delimiter.
For JEP 326 (Raw String Literals), we took a different approach to the problem of denoting strings without escaping newlines and quotes, focusing on the raw-ness of strings. We now believe that this focus was wrong, because while raw string literals could easily span multiple lines of source code, the cost of supporting unescaped delimiters in their content was extreme. This limited the effectiveness of the feature in the multi-line use case, which is a critical one because of the frequency of embedding multi-line (but not truly raw) code snippets in Java programs. A good outcome of the pivot from raw-ness to multi-line-ness was a renewed focus on having a consistent escape language between string literals, text blocks, and related features that may be added in future.
Tests that use string literals for the creation, interning, and manipulation of instances of String should be duplicated to use text blocks too. Negative tests should be added for corner cases involving line terminators and EOF.
Tests should be added to ensure that text blocks can embed Java-in-Java, Markdown-in-Java, SQL-in-Java, and at least one JVM-language-in-Java.