Class String
Object
An immutable sequence of Unicode codepoints. Each Unicode codepoint is a number between 0 and
1,114,111 (U+10FFFF); a String may consist of any sequence of zero or more codepoints, regardless of
whether this sequence forms a sensible Unicode string. Note that a single logical character may be
composed of multiple Unicode codepoints, such as a string consisting of REGIONAL INDICATOR SYMBOL
LETTER B followed by REGIONAL INDICATOR SYMBOL LETTER R, which many platforms will render as a
Brazilian flag Emoji. Even though this string will generally be displayed as a single logical
character, the String class deals with Unicode codepoints and thus considers the string to have a
length of 2.
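The codepoint count described above can be checked in any language whose strings are sequences of codepoints. The following is a minimal sketch in Python (not Frost), where `len()` counts codepoints; the variable names are illustrative only.

```python
import unicodedata

# REGIONAL INDICATOR SYMBOL LETTER B followed by REGIONAL INDICATOR SYMBOL LETTER R
flag = "\U0001F1E7\U0001F1F7"

# Python's len() counts codepoints, matching the length definition used above.
codepoint_count = len(flag)

# In UTF-8, each of these two codepoints occupies four bytes.
utf8_byte_count = len(flag.encode("utf-8"))

names = [unicodedata.name(c) for c in flag]
print(codepoint_count, utf8_byte_count, names)
```

A renderer may draw this string as one flag, but both counts are independent of how it is displayed.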
Internally, Strings are stored using the UTF-8 encoding. The fact that UTF-8 is a variable-length
encoding impacts the performance of some operations, as determining the offset of a given codepoint
requires traversing the string from the beginning. Because of this, it can be much faster to use
String.Index as opposed to numeric offsets to index into Strings. For instance, the code:
for i in 0 .. string.length {
    process(string[i])
}
is relatively slow, as the repeated calls to string[i] constantly re-scan the String from the
beginning to find each successive character. We can rewrite this code using String.Index:
var index := string.start
while index != string.end {
    process(string[index])
    index := string.next(index)
}
This avoids the expensive re-scan of the string. Of course, iteration over the string is even simpler:
for c in string {
    process(c)
}
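To see why numeric offsets are costly on a UTF-8 buffer, here is a minimal Python sketch (not Frost, and not the actual String implementation). It assumes well-formed UTF-8; `utf8_width`, `byte_offset_of`, and `next_index` are hypothetical helper names. A byte-offset cursor plays the role that String.Index plays above.

```python
def utf8_width(lead_byte: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    return 4  # assumes well-formed UTF-8


def byte_offset_of(data: bytes, codepoint_offset: int) -> int:
    """Find a codepoint by numeric offset: walks from the start every time."""
    pos = 0
    for _ in range(codepoint_offset):
        pos += utf8_width(data[pos])
    return pos


def next_index(data: bytes, pos: int) -> int:
    """Advance a byte-offset cursor by one codepoint in constant time."""
    return pos + utf8_width(data[pos])


# Codepoints of width 1, 2, 3, 4 and 1 bytes respectively.
data = "a\u00e9\u20ac\U0001F600z".encode("utf-8")

# Cursor-based traversal: one constant-time step per codepoint.
offsets = []
pos = 0
while pos != len(data):
    offsets.append(pos)
    pos = next_index(data, pos)

print(offsets)
print(byte_offset_of(data, 4))
```

Calling `byte_offset_of` inside a loop over every offset repeats the walk each time, which is the quadratic behaviour the text warns about.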
As String is immutable, assembling a string via repeated concatenation is a very slow operation
that creates many temporary objects. Instead create a MutableString, repeatedly call append on it,
and then finally turn it into an immutable String using finish.
Inner Classes
String.Index - Represents the position of a Unicode codepoint within a String.
String.Match - Represents a regular expression match within a string.
Class Method Summary
-- multiply operator --
*(count:Int, s:String):String - Returns a string consisting of count copies of s.
-- add operator --
+(o:Object, s:String):String - Returns the concatenation of another object's string representation and this string.
Initializer Summary
init(chars:ListView<Char8>) - Creates a new string containing the given characters.
init(chars:ListView<Char32>) - Creates a new string containing the given characters.
Field Summary
utf8:ListView<Char8> - A view of the UTF8 bytes this string contains.
utf16:Iterator<Char16> - A view of the UTF16 words this string contains.
length:Int - The number of Unicode codepoints this string contains.
byteLength:Int - The number of UTF8 bytes this string contains.
start:Index - An Index representing the beginning of the string.
end:Index - An Index representing the end of the string.
trimmed:String - A copy of this string with leading and trailing whitespace characters removed.
asInt:Int? - Converts this string to a signed number.
asInt64:Int64? - Converts this string to a signed number.
asUInt64:UInt64? - Converts this string to an unsigned number.
asReal64:Real64? - Converts this string to a real number.
Instance Method Summary
format(fmt:String):String - Returns a formatted representation of this string.
startsWith(other:String):Bit - Returns true if this string begins with other.
endsWith(other:String):Bit - Returns true if this string ends with other.
-- add operator --
+(other:String):String - Returns the concatenation of this string and another string.
-- add operator --
+(other:Object):String - Returns the concatenation of this string and another object's string representation.
-- multiply operator --
*(count:Int):String - Returns a string consisting of count copies of this string.
-- equals operator --
=(other:String):Bit - Returns true if these two strings are equal (contain the same sequence of codepoints).
-- greater than operator --
>(other:String):Bit - Returns true if this string is greater than the other string when considered in a codepoint-by-codepoint fashion.
-- index operator --
[](index:Index):Char32 - Returns the Unicode codepoint at the given offset within the string.
-- index operator --
[](index:Int):Char32 - Returns the Unicode codepoint at the given offset within the string.
substring(r:Range<Index>):String - Returns a 'dependent' substring of a string.
substring(r:Range<Index?>):String - Returns a 'dependent' substring of a string.
-- index operator --
[](r:Range<Index>):String - Returns a substring of a string.
-- index operator --
[](r:Range<Index?>):String - Returns a substring of a string.
-- index operator --
[](r:SteppedRange<Index?, Int>):String - Returns a substring of a string.
-- index operator --
[](r:Range<Int>):String - Returns a substring of a string.
-- index operator --
[](r:Range<Int?>):String - Returns a substring of a string.
-- index operator --
[](r:SteppedRange<Int?, Int>):String - Returns a substring of a string.
contains(c:Char8):Bit - Returns true if this string contains at least one occurrence of the given character.
contains(s:String):Bit - Returns true if this string contains at least one occurrence of the given substring.
indexOf(s:String):Index? - Returns the index of the first occurrence of the string s within this string, or null if not found.
indexOf(s:String, start:Index):Index? - Returns the index of the first occurrence of the string s within this string, starting from the specified index, or null if not found.
lastIndexOf(s:String):Index? - Returns the index of the last occurrence of the string s within this string, or null if not found.
lastIndexOf(s:String, start:Index):Index? - Returns the index of the last occurrence of the string s within this string, starting the search backwards from the specified index, or null if not found.
matches(regex:RegularExpression):Bit - Returns true if this string matches the given regular expression.
contains(needle:RegularExpression):Bit - Returns true if this string contains a match for the given regular expression.
parse(regex:RegularExpression):Array<String?>? - Matches the string against the given regular expression, returning an array of its capture groups.
replace(search:String, replacement:String):String - Returns a new string with every occurrence of search replaced with replacement.
replace(search:RegularExpression, replacement:String):String - Returns a new string with every match of search replaced with replacement.
replace(search:RegularExpression, replacement:String, allowGroupReferences:Bit):String
replace(search:RegularExpression, replacement:(String)=>(Object)):String - Searches the string for a regular expression, replacing occurrences of the regular expression with new text determined by a function.
replace(search:RegularExpression, replacement:(String)=&>(Object)):String
replace(search:RegularExpression, replacement:(ListView<String?>)=>(Object)):String - As replace(RegularExpression, (String)=>(Object)), but the replacement function receives the capture groups from the regular expression rather than the raw matched text.
replace(search:RegularExpression, replacement:(ListView<String?>)=&>(Object)):String
find(needle:String):Iterator<Index>
find(needle:String, overlapping:Bit):Iterator<Index>
find(needle:RegularExpression):Iterator<Match>
find(needle:RegularExpression, overlapping:Bit):Iterator<Match>
next(i:Index):Index - Returns the index of the Unicode codepoint after the given index.
previous(i:Index):Index - Returns the index of the Unicode codepoint before the given index.
offset(index:Index, offset:Int):Index - Returns the index offset by offset Unicode codepoints.
leftAlign(width:Int):String - Returns a new string consisting of this string left-justified in a field of at least width characters.
leftAlign(width:Int, fill:Char32):String - Returns a new string consisting of this string left-justified in a field of at least width characters, filled with the specified character.
rightAlign(width:Int):String - Returns a new string consisting of this string right-justified in a field of at least width characters.
rightAlign(width:Int, fill:Char32):String - Returns a new string consisting of this string right-justified in a field of at least width characters, filled with the specified character.
centerAlign(width:Int):String - Returns a new string consisting of this string centered in a field of at least width characters.
centerAlign(width:Int, fill:Char32):String - Returns a new string consisting of this string centered in a field of at least width characters, filled with the specified character.
split(delimiter:String):Array<String> - Splits this string into tokens separated by a delimiter.
split(delimiter:String, maxResults:Int):Array<String> - Splits this string into tokens separated by a delimiter.
split(delimiter:RegularExpression, maxResults:Int):Array<String> - Splits this string into tokens separated by a delimiter.
split(delimiter:RegularExpression):Array<String> - Splits this string into tokens separated by a delimiter.
Initializers
init(chars:ListView<Char8>)
Creates a new string containing the given characters.
- Parameters:
- chars - value of type ListView<Char8>
init(chars:ListView<Char32>)
Creates a new string containing the given characters.
- Parameters:
- chars - value of type ListView<Char32>
Fields
utf8:ListView<Char8>
A view of the UTF8 bytes this string contains.
utf16:Iterator<Char16>
A view of the UTF16 words this string contains.
length:Int
The number of Unicode codepoints this string contains. As the string is internally stored in the variable-width UTF8 format, determining the length of the string takes an amount of time proportional to the number of characters it contains.
byteLength:Int
The number of UTF8 bytes this string contains.
start:Index
An Index representing the beginning of the string.
end:Index
An Index representing the end of the string.
trimmed:String
A copy of this string with leading and trailing whitespace characters removed.
asInt:Int?
Converts this string to a signed number. The string must be a sequence of decimal digits, optionally preceded by a minus sign (-), whose numeric representation can fit into an Int64. Returns null if the conversion fails.
asInt64:Int64?
Converts this string to a signed number. The string must be a sequence of decimal digits, optionally preceded by a minus sign (-), whose numeric representation can fit into an Int64. Returns null if the conversion fails.
asUInt64:UInt64?
Converts this string to an unsigned number. The string must be a sequence of decimal digits whose numeric representation can fit into a UInt64. Returns null if the conversion fails.
asReal64:Real64?
Converts this string to a real number. The string must be a valid Frost real literal. Returns null if the conversion fails.
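The rules for the integer conversions above (decimal digits, an optional leading minus sign for the signed variants, a range check, and a null result on failure) can be sketched as follows. This is a Python model under stated assumptions, not Frost code: `as_int64` and `as_uint64` are hypothetical names, `None` stands in for null, and only ASCII digits are accepted. How Frost treats inputs such as "-0" or surrounding whitespace is not stated in the text and is not modelled.

```python
import re
from typing import Optional

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1
UINT64_MAX = 2 ** 64 - 1


def as_int64(text: str) -> Optional[int]:
    """Decimal digits with an optional leading '-', or None on failure."""
    if not re.fullmatch(r"-?[0-9]+", text):
        return None
    value = int(text)
    return value if INT64_MIN <= value <= INT64_MAX else None


def as_uint64(text: str) -> Optional[int]:
    """Decimal digits only, or None on failure (no sign allowed)."""
    if not re.fullmatch(r"[0-9]+", text):
        return None
    value = int(text)
    return value if value <= UINT64_MAX else None


print(as_int64("-42"), as_int64("12a"), as_uint64("-1"))
```

Returning a nullable value rather than raising keeps the failure case visible in the type, which mirrors the `Int64?` and `UInt64?` field types listed above.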
Class Methods
-- add operator --
+(o:Object, s:String):String
Returns the concatenation of another object's string representation and this string. The object's string representation is computed using its toString property.
Instance Methods
format(fmt:String):String
Returns a formatted representation of this string. With an empty format string, the raw string is returned. With the format string "frost", a representation of the string as it would appear in Frost source code is returned.
- Parameters:
- fmt - the format string
- Returns:
- a formatted string
- Overrides:
- frost.core.Formattable.format
startsWith(other:String):Bit
Returns true if this string begins with other.
- Parameters:
- other - value of type String
endsWith(other:String):Bit
Returns true if this string ends with other.
- Parameters:
- other - value of type String
-- add operator --
+(other:String):String
Returns the concatenation of this string and another string.
- Parameters:
- other - value of type String
-- add operator --
+(other:Object):String
Returns the concatenation of this string and another object's string representation. The object's string representation is computed using its toString property.
- Parameters:
- other - value of type Object
-- multiply operator --
*(count:Int):String
Returns a string consisting of count copies of this string.
- Parameters:
- count - value of type Int
-- equals operator --
=(other:String):Bit
Returns true if these two strings are equal (contain the same sequence of codepoints). Strings which logically mean the same thing but contain different codepoints are not equal. For instance, the string with Unicode codepoint LATIN CAPITAL LETTER A WITH ACUTE and the string with Unicode codepoints LATIN CAPITAL LETTER A followed by COMBINING ACUTE ACCENT will (in most programs) display and behave exactly the same, but they do not contain the same sequence of codepoints and therefore are not equal.
- Parameters:
- other - value of type String
- Overrides:
- frost.core.Equatable.=
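The distinction between codepoint equality and logical equality can be demonstrated with the two spellings named above. This Python sketch (not Frost) relies on Python's `==` also comparing codepoint sequences. The normalization step uses Python's `unicodedata` module and is shown only as one way callers sometimes reconcile such strings; nothing in this page says String performs it.

```python
import unicodedata

precomposed = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "A\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

# Codepoint-by-codepoint comparison: different sequences, so not equal.
equal_as_codepoints = precomposed == decomposed

# After composing both to NFC, the sequences are identical.
equal_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(equal_as_codepoints, equal_after_nfc)
```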
-- greater than operator --
>(other:String):Bit
Returns true if this string is greater than the other string when considered in a codepoint-by-codepoint fashion. This is sufficient to provide some ordering of strings, and will generally work acceptably for pure ASCII strings, but will not yield the expected sort order in most locales.
- Parameters:
- other - value of type String
- Overrides:
- frost.core.Comparable.>
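A short illustration of why codepoint ordering surprises readers: Python (used here as a stand-in, not Frost) also compares strings codepoint by codepoint, so uppercase ASCII sorts before lowercase, and accented letters sort after "z". The word list is made up for the example.

```python
words = ["apple", "Zebra", "\u00e9clair", "zoo"]

# sorted() uses the same codepoint-by-codepoint comparison described above:
# "Z" (90) < "a" (97) < "z" (122) < "\u00e9" (233).
ordered = sorted(words)

print(ordered)
```

A dictionary in most locales would place "éclair" near "e" and ignore case, which is the mismatch the description warns about.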
-- index operator --
[](index:Index):Char32
Returns the Unicode codepoint at the given offset within the string.
- Parameters:
- index - value of type Index
-- index operator --
[](index:Int):Char32
Returns the Unicode codepoint at the given offset within the string. This overload of the [] operator is slower than the overload that accepts an Index parameter, as it must scan the (internally UTF-8) string from the beginning to find the correct index.
- Parameters:
- index - value of type Int
@pre(r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength &
     r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))
function substring(r:Range<Index>):String
Returns a 'dependent' substring of a string. string.substring(range) behaves exactly the same as the more-common string[range], except that substring does not copy the characters into a new memory buffer, instead referring directly to the memory held by the "parent" string. This means that the parent string will remain in memory as long as any of its substrings do. string.substring(range) is therefore much more efficient than string[range], provided that forcing the parent string to remain in memory is acceptable.
- Parameters:
- r - value of type Range<Index>
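The trade-off between a dependent substring and a copying slice has a close analogue in Python's `memoryview`, shown here purely as an illustration (this is not how Frost implements substring). A view refers to the parent's buffer and keeps the parent alive, whereas slicing `bytes` directly makes an independent copy.

```python
parent = "hello, world".encode("utf-8")

# Like substring(range): no bytes are copied, the view refers to parent's buffer.
view = memoryview(parent)[7:12]

# Like string[range]: a new, independent buffer is allocated.
copy = parent[7:12]

shares_parent = view.obj is parent   # the view holds a reference to the parent
same_content = bytes(view) == copy

print(shares_parent, same_content)
```

Because the view holds a reference to `parent`, the whole parent buffer stays in memory for as long as the view exists, even though only five bytes are in use.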
@pre((r.min == null | (r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength)) &
     (r.max == null | (r.max.byteOffset >= 0 & r.max.byteOffset < byteLength +
     r.inclusive.choose(0, 1))))
function substring(r:Range<Index?>):String
Returns a 'dependent' substring of a string. string.substring(range) behaves exactly the same as the more-common string[range], except that substring does not copy the characters into a new memory buffer, instead referring directly to the memory held by the "parent" string. This means that the parent string will remain in memory as long as any of its substrings do. string.substring(range) is therefore much more efficient than string[range], provided that forcing the parent string to remain in memory is acceptable.
As with other Range methods, a null min starts at the beginning of the string, and a null max ends at the end of the string.
- Parameters:
- r - value of type Range<Index?>
-- index operator --
@pre(r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength &
     r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))
function [](r:Range<Index>):String
Returns a substring of a string. If Range.min is greater than Range.max, the resulting substring will be empty.
- Parameters:
- r - value of type Range<Index>
-- index operator --
@pre((r.min == null | (r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength)) &
     (r.max == null | (r.max.byteOffset >= 0 & r.max.byteOffset < byteLength +
     r.inclusive.choose(0, 1))))
function [](r:Range<Index?>):String
Returns a substring of a string. If Range.min is not specified, the substring will start at the beginning of the string. If Range.max is not specified, the substring will end at the end of the string. If Range.min is greater than Range.max, the resulting substring will be empty.
- Parameters:
- r - value of type Range<Index?>
-- index operator --
@pre((r.start == null | (r.start.byteOffset >= 0 & r.start.byteOffset <= byteLength)) &
     (r.end == null | (r.end.byteOffset >= 0 & r.end.byteOffset < byteLength +
     r.inclusive.choose(0, 1))))
function [](r:SteppedRange<Index?, Int>):String
Returns a substring of a string. The Range.step value is interpreted in terms of Unicode codepoints: that is, s[... by 2] will return a String consisting of every other Unicode codepoint in s. As some Unicode characters consist of more than one codepoint (e.g. when using combining diacriticals, Emoji skin tone modifiers, or Emoji flags), this will mangle such characters when the step is not 1.
A negative step value will scan the string backwards from the starting point, thus [.. by -1] will reverse the Unicode codepoints in the input string. Note again that this will mangle Unicode characters which consist of more than one Unicode codepoint.
- Parameters:
- r - value of type SteppedRange<Index?, Int>
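The mangling described above is easy to reproduce with Python's extended slices, which also step over codepoints. This is a Python sketch of the effect, not Frost syntax; the sample string is chosen for illustration.

```python
# "n", "e", COMBINING ACUTE ACCENT, "e": the accent belongs to the first "e".
s = "ne\u0301e"

# Every other codepoint: keeps "n" and the bare combining accent,
# so the accent now attaches to "n".
every_other = s[::2]

# Reversed codepoints: the accent now follows the other "e".
reversed_codepoints = s[::-1]

print(every_other.encode("unicode_escape"), reversed_codepoints.encode("unicode_escape"))
```

In both results the combining accent ends up next to a different base letter than in the input, so the rendered text changes even though no codepoint was altered.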
-- index operator --
function [](r:Range<Int>):String
Returns a substring of a string. This version of the [] operator is slower than the one that accepts a Range<Index> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.
- Parameters:
- r - value of type Range<Int>
-- index operator --
function [](r:Range<Int?>):String
Returns a substring of a string. This version of the [] operator is slower than the one that accepts a Range<Index?> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.
- Parameters:
- r - value of type Range<Int?>
-- index operator --
function [](r:SteppedRange<Int?, Int>):String
Returns a substring of a string. This version of the [] operator is slower than the one that accepts a SteppedRange<Index?, Int> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.
- Parameters:
- r - value of type SteppedRange<Int?, Int>
contains(c:Char8):Bit
Returns true if this string contains at least one occurrence of the given character.
- Parameters:
- c - value of type Char8
contains(s:String):Bit
Returns true if this string contains at least one occurrence of the given substring.
- Parameters:
- s - value of type String
indexOf(s:String):Index?
Returns the index of the first occurrence of the string s within this string, or null if not found.
- Parameters:
- s - the string to search for
- Returns:
- the index of the match, or null if not found
indexOf(s:String, start:Index):Index?
Returns the index of the first occurrence of the string s within this string, starting from the specified index, or null if not found.
- Parameters:
- s - the string to search for
- start - the index to begin searching from
- Returns:
- the index of the match, or null if not found
lastIndexOf(s:String):Index?
Returns the index of the last occurrence of the string s within this string, or null if not found.
- Parameters:
- s - the string to search for
- Returns:
- the index of the match, or null if not found
lastIndexOf(s:String, start:Index):Index?
Returns the index of the last occurrence of the string s within this string, starting the search backwards from the specified index, or null if not found.
- Parameters:
- s - the string to search for
- start - the index to begin searching from
- Returns:
- the index of the match, or null if not found
function matches(regex:RegularExpression):Bit
Returns true if this string matches the given regular expression. The regular expression must match the entire string.
- Parameters:
- regex - the regular expression to compare against
- Returns:
- true if the string matches
function contains(needle:RegularExpression):Bit
Returns true if this string contains a match for the given regular expression. The regular expression may match zero or more characters of the string, starting at any point.
- Parameters:
- needle - the regular expression to search for
- Returns:
- true if the string contains a match
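The difference between the two functions above (whole-string match versus match anywhere) corresponds to `re.fullmatch` versus `re.search` in Python. The sketch below uses Python's `re` module as an analogy only; Frost's regular expression dialect may differ in details.

```python
import re

digits = re.compile(r"\d+")

# Analogue of matches(): the pattern must cover the entire string.
whole_string_ok = digits.fullmatch("12345") is not None
whole_string_fails = digits.fullmatch("abc123") is not None

# Analogue of contains(): a match may start at any point.
found_anywhere = digits.search("abc123") is not None

print(whole_string_ok, whole_string_fails, found_anywhere)
```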
function parse(regex:RegularExpression):Array<String?>?
Matches the string against the given regular expression, returning an array of its capture groups. Group 0, the group containing the entire string, is not returned. If the string does not match the regular expression, returns null. For example, "1,2,34".parse(/(\d+),(\d+),(\d+)/) will return a list consisting of "1", "2", and "34".
- Parameters:
- regex - the regular expression to parse against
- Returns:
- a list of the capture groups, or null if the string did not match
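The behaviour described for parse can be modelled in Python as follows. This is a sketch, not Frost code: the free function `parse` is a hypothetical stand-in, `None` stands in for null, and it assumes the expression must match the whole string, which the wording about group 0 suggests but does not state outright.

```python
import re
from typing import List, Optional


def parse(text: str, pattern: str) -> Optional[List[Optional[str]]]:
    """Return the capture groups (excluding group 0), or None if no match."""
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    # groups() already omits group 0; unmatched optional groups are None.
    return list(match.groups())


print(parse("1,2,34", r"(\d+),(\d+),(\d+)"))
print(parse("1,2", r"(\d+),(\d+),(\d+)"))
```

The nullable element type mirrors `Array<String?>?`: an optional group that did not participate in the match yields a null entry, while a failed match yields a null array.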
replace(search:String, replacement:String):String
Returns a new string with every occurrence of search replaced with replacement.
- Parameters:
- search - the string to search for
- replacement - the replacement text
- Returns:
- a string with all matches replaced
function replace(search:RegularExpression, replacement:String):String
Returns a new string with every match of search replaced with replacement. The replacement string may contain $1-style regular expression group references; for instance s.replace(regex, "$1") will replace every occurrence of the regex with the contents of its first group.
- Parameters:
- search - the regular expression to search for
- replacement - the replacement text
- Returns:
- a string with all matches replaced
function replace(search:RegularExpression, replacement:String, allowGroupReferences:Bit):String
- Parameters:
- search - value of type RegularExpression
- replacement - value of type String
- allowGroupReferences - value of type Bit
function replace(search:RegularExpression, replacement:(String)=>(Object)):String
Searches the string for a regular expression, replacing occurrences of the regular expression with new text determined by a function. For instance, given:
"This is a test!".replace(/\w+/, word => word.length)
The regular expression /\w+/ matches sequences of one or more word characters; in other words, it matches all words occurring in the string. The replacement function word => word.length replaces each matched sequence with the number of characters in the sequence, resulting in the text:
4 2 1 4!
- Parameters:
- search - the regular expression to match the string with
- replacement - a function generating the replacement text
- Returns:
- a new string with all occurrences of the regular expression replaced
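For readers more familiar with other regex APIs, the same word-length example can be reproduced with Python's `re.sub`, which also accepts a replacement function. This is an analogy in Python rather than Frost; note that Python passes a match object, so the matched text is obtained with `group(0)`, and the result must be converted to a string explicitly.

```python
import re

# Replace each run of word characters with its length.
result = re.sub(r"\w+", lambda m: str(len(m.group(0))), "This is a test!")

print(result)
```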
method replace(search:RegularExpression, replacement:(String)=&>(Object)):String
- Parameters:
- search - value of type RegularExpression
- replacement - value of type (String)=&>(Object)
function replace(search:RegularExpression, replacement:(ListView<String?>)=>(Object)):String
As replace(RegularExpression, (String)=>(Object)), but the replacement function receives the capture groups from the regular expression rather than the raw matched text. The groups list includes the special whole-match group at index 0, with the first set of parentheses in the regular expression corresponding to index 1.
- Parameters:
- search - the regular expression to match the string with
- replacement - a function generating the replacement text
- Returns:
- a new string with all occurrences of the regular expression replaced
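The group-indexing convention above (index 0 is the whole match, index 1 the first parenthesised group) can be sketched in Python. This is an illustration with `re.sub`, not Frost code; the `swap` function and the sample input are made up, and the `groups` list is built by hand to mirror the list the Frost replacement function is described as receiving.

```python
import re


def swap(match: "re.Match[str]") -> str:
    # Index 0 is the whole match; indices 1.. are the capture groups.
    groups = [match.group(0), *match.groups()]
    return f"{groups[2]}={groups[1]}"


result = re.sub(r"(\w+)=(\w+)", swap, "a=1, b=2")

print(result)
```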
method replace(search:RegularExpression, replacement:(ListView<String?>)=&>(Object)):String
- Parameters:
- search - value of type RegularExpression
- replacement - value of type (ListView<String?>)=&>(Object)
function find(needle:String):Iterator<Index>
- Parameters:
- needle - value of type String
function find(needle:String, overlapping:Bit):Iterator<Index>
function find(needle:RegularExpression):Iterator<Match>
- Parameters:
- needle - value of type RegularExpression
function find(needle:RegularExpression, overlapping:Bit):Iterator<Match>
- Parameters:
- needle - value of type RegularExpression
- overlapping - value of type Bit
next(i:Index):Index
Returns the index of the Unicode codepoint after the given index. It is an error to call next() when already at the end of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.
- Parameters:
- i - value of type Index
previous(i:Index):Index
Returns the index of the Unicode codepoint before the given index. It is an error to call previous() when already at the beginning of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.
- Parameters:
- i - value of type Index
offset(index:Index, offset:Int):Index
Returns the index offset by offset Unicode codepoints. It is an error to index before the beginning or after the end of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.
leftAlign(width:Int):String
Returns a new string consisting of this string left-justified in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are appended until the resulting string is width characters long.
- Parameters:
- width - the minimum width of the string
- Returns:
- a string at least width characters long
leftAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string left-justified in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are appended until the resulting string is width characters long.
- Parameters:
- width - the minimum width of the string
- fill - the fill character
- Returns:
- a string at least width characters long
rightAlign(width:Int):String
Returns a new string consisting of this string right-justified in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are prepended until the resulting string is width characters long.
- Parameters:
- width - the minimum width of the string
- Returns:
- a string at least width characters long
rightAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string right-justified in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are prepended until the resulting string is width characters long.
- Parameters:
- width - the minimum width of the string
- fill - the fill character
- Returns:
- a string at least width characters long
centerAlign(width:Int):String
Returns a new string consisting of this string centered in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are added as equally as possible to the left and right until the resulting string is width characters long. If the number of characters to be added is odd, the right side of the string will receive one more space than the left side.
- Parameters:
- width - the minimum width of the string
- Returns:
- a string at least width characters long
centerAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string centered in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are added as equally as possible to the left and right until the resulting string is width characters long. If the number of characters to be added is odd, the right side of the string will receive one more fill character than the left side.
- Parameters:
- width - the minimum width of the string
- fill - the fill character
- Returns:
- a string at least width characters long
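The padding rule for centering, including the odd case where the right side receives the extra fill character, can be written out explicitly. The following Python function is a sketch of the documented rule, not the Frost implementation; `center_align` is a hypothetical name, and width is measured in codepoints as elsewhere in this class. Python's built-in `str.center` is deliberately not used, because its handling of the odd case is not guaranteed to match this rule.

```python
def center_align(text: str, width: int, fill: str = " ") -> str:
    """Center text in a field of at least `width` codepoints."""
    missing = width - len(text)
    if missing <= 0:
        return text              # already wide enough: returned unchanged
    left = missing // 2          # rounds down
    right = missing - left       # receives the extra character when odd
    return fill * left + text + fill * right


print(center_align("ab", 5, "*"))
print(center_align("ab", 6, "-"))
```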
function split(delimiter:String):Array<String>
Splits this string into tokens separated by a delimiter. For instance, "This is a long string".split(" ") yields "This", "is", "a", "long", and "string".
- Parameters:
- delimiter - the token delimiter
- Returns:
- the split tokens
function split(delimiter:String, maxResults:Int):Array<String>
Splits this string into tokens separated by a delimiter. At most maxResults results will be returned; any additional delimiters beyond that point will be ignored. For instance, "This is a long string".split(" ", 3) yields "This", "is", and "a long string".
- Parameters:
- delimiter - the token delimiter
- maxResults - the maximum number of results to return
- Returns:
- the split tokens
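Readers coming from other languages should note that maxResults counts results, not splits. Python's `str.split` takes the number of splits instead, so the documented example corresponds to a `maxsplit` of `maxResults - 1`. The wrapper below is a Python sketch of that mapping, not Frost code; `split_max_results` is a hypothetical name, and it assumes `max_results` is at least 1.

```python
def split_max_results(text: str, delimiter: str, max_results: int) -> list:
    """Return at most max_results tokens; the last one keeps any leftover delimiters."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    # N results require at most N - 1 splits.
    return text.split(delimiter, max_results - 1)


tokens = split_max_results("This is a long string", " ", 3)

print(tokens)
```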
@pre(maxResults > 0)
function split(delimiter:RegularExpression, maxResults:Int):Array<String>
Splits this string into tokens separated by a delimiter. At most maxResults different strings will be returned; any additional delimiters beyond that point will be ignored. For instance, "This is a long string".split(/\s+/, 3) yields "This", "is", and "a long string".
- Parameters:
- delimiter - the token delimiter
- maxResults - the maximum number of results to return
- Returns:
- the split tokens
function split(delimiter:RegularExpression):Array<String>
Splits this string into tokens separated by a delimiter. For instance, "This is a long string".split(/\s+/) yields "This", "is", "a", "long", and "string".
- Parameters:
- delimiter - the token delimiter
- Returns:
- the split tokens