frost.core

Class String

    └ Immutable
         └ Object

An immutable sequence of Unicode codepoints. Each Unicode codepoint is a number between 0 and 1,114,111 (0x10FFFF); a String may consist of any sequence of zero or more codepoints, regardless of whether this sequence forms a sensible Unicode string. Note that a single logical character may be composed of multiple Unicode codepoints. For example, a string consisting of REGIONAL INDICATOR SYMBOL LETTER B followed by REGIONAL INDICATOR SYMBOL LETTER R will be rendered by many platforms as a Brazilian flag emoji. Even though this string will generally be displayed as a single logical character, the String class deals with Unicode codepoints and thus considers the string to have a length of 2.
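
The flag case can be sketched as the fragment below. It assumes the source file is saved as UTF-8 so that the flag can be written literally; the variable names are illustrative only.

```frost
-- "🇧🇷" is REGIONAL INDICATOR SYMBOL LETTER B followed by
-- REGIONAL INDICATOR SYMBOL LETTER R: one visible flag, two codepoints.
def flag := "🇧🇷"
def count := flag.length   -- 2, because length counts codepoints
```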

Internally, Strings are stored using the UTF-8 encoding. The fact that UTF-8 is a variable-length encoding impacts the performance of some operations, as determining the offset of a given codepoint requires traversing the string from the beginning. Because of this, it can be much faster to use String.Index as opposed to numeric offsets to index into Strings. For instance, the code:

for i in 0 .. string.length {
    process(string[i])
}

is relatively slow, as each call to string[i] re-scans the String from the beginning to find the requested character. We can rewrite this code using String.Index:

var index := string.start
while index != string.end {
    process(string[index])
    index := string.next(index)
}

This avoids the expensive re-scan of the string. Of course, iteration over the string is even simpler:

for c in string {
    process(c)
}

As String is immutable, assembling a string via repeated concatenation is a very slow operation that creates many temporary objects. Instead, create a MutableString, repeatedly call append on it, and finally turn it into an immutable String using finish.
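
A minimal sketch of the two approaches follows. It assumes that MutableString has a no-argument initializer, that append accepts a String, and that finish is called as a method; none of these signatures appear on this page, so check them against the MutableString documentation.

```frost
-- Slow: each + allocates a new String holding everything built so far.
var slow := ""
for i in 0 .. 3 {
    slow := slow + "ab"
}

-- Faster: accumulate into a MutableString, then convert once.
-- (MutableString(), append and finish() are assumed signatures.)
def builder := MutableString()
for i in 0 .. 3 {
    builder.append("ab")
}
def fast := builder.finish()   -- "ababab", the same text as slow
```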

Inner Classes

String.Index
Represents the position of a Unicode codepoint within a String.
String.Match
Represents a regular expression match within a string.

Class Method Summary

-- multiply operator --
*(count:Int, s:String):String
Returns a string consisting of count copies of s.
-- add operator --
+(o:Object, s:String):String
Returns the concatenation of another object's string representation and this string.

Initializer Summary

init(chars:ListView<Char8>)
Creates a new string containing the given characters.
init(chars:ListView<Char32>)
Creates a new string containing the given characters.

Field Summary

utf8:ListView<Char8>
A view of the UTF-8 bytes this string contains.
utf16:Iterator<Char16>
A view of the UTF-16 words this string contains.
length:Int
The number of Unicode codepoints this string contains.
byteLength:Int
The number of UTF-8 bytes this string contains.
start:Index
An Index representing the beginning of the string.
end:Index
An Index representing the end of the string.
trimmed:String
A copy of this string with leading and trailing whitespace characters removed.
asInt:Int?
Converts this string to a signed number.
asInt64:Int64?
Converts this string to a signed number.
asUInt64:UInt64?
Converts this string to an unsigned number.
asReal64:Real64?
Converts this string to a real number.

Instance Method Summary

format(fmt:String):String
Returns a formatted representation of this string.
startsWith(other:String):Bit
Returns true if this string begins with other.
endsWith(other:String):Bit
Returns true if this string ends with other.
-- add operator --
+(other:String):String
Returns the concatenation of this string and another string.
-- add operator --
+(other:Object):String
Returns the concatenation of this string and another object's string representation.
-- multiply operator --
*(count:Int):String
Returns a string consisting of count copies of this string.
-- equals operator --
=(other:String):Bit
Returns true if these two strings are equal (contain the same sequence of codepoints).
-- greater than operator --
>(other:String):Bit
Returns true if this string is greater than the other string when considered in a codepoint-by-codepoint fashion.
-- index operator --
[](index:Index):Char32
Returns the Unicode codepoint at the given index within the string.
-- index operator --
[](index:Int):Char32
Returns the Unicode codepoint at the given offset within the string.
substring(r:Range<Index>):String
Returns a 'dependent' substring of a string.
substring(r:Range<Index?>):String
Returns a 'dependent' substring of a string.
-- index operator --
[](r:Range<Index>):String
Returns a substring of a string.
-- index operator --
[](r:Range<Index?>):String
Returns a substring of a string.
-- index operator --
[](r:SteppedRange<Index?, Int>):String
Returns a substring of a string.
-- index operator --
[](r:Range<Int>):String
Returns a substring of a string.
-- index operator --
[](r:Range<Int?>):String
Returns a substring of a string.
-- index operator --
[](r:SteppedRange<Int?, Int>):String
Returns a substring of a string.
contains(c:Char8):Bit
Returns true if this string contains at least one occurrence of the given character.
contains(s:String):Bit
Returns true if this string contains at least one occurrence of the given substring.
indexOf(s:String):Index?
Returns the index of the first occurrence of the string s within this string, or null if not found.
indexOf(s:String, start:Index):Index?
Returns the index of the first occurrence of the string s within this string, starting from the specified index, or null if not found.
lastIndexOf(s:String):Index?
Returns the index of the last occurrence of the string s within this string, or null if not found.
lastIndexOf(s:String, start:Index):Index?
Returns the index of the last occurrence of the string s within this string, starting the search backwards from the specified index, or null if not found.
matches(regex:RegularExpression):Bit
Returns true if this string matches the given regular expression.
contains(needle:RegularExpression):Bit
Returns true if this string contains a match for the given regular expression.
parse(regex:RegularExpression):Array<String?>?
Matches the string against the given regular expression, returning an array of its capture groups.
replace(search:String, replacement:String):String
Returns a new string with every occurrence of search replaced with replacement.
replace(search:RegularExpression, replacement:String):String
Returns a new string with every match of search replaced with replacement.
replace(search:RegularExpression, replacement:String, allowGroupReferences:Bit):String
replace(search:RegularExpression, replacement:(String)=>(Object)):String
Searches the string for a regular expression, replacing occurrences of the regular expression with new text determined by a function.
replace(search:RegularExpression, replacement:(String)=&>(Object)):String
replace(search:RegularExpression, replacement:(ListView<String?>)=>(Object)):String
As replace(RegularExpression, (String)=>(Object)), but the replacement function receives the capture groups from the regular expression rather than the raw matched text.
replace(search:RegularExpression, replacement:(ListView<String?>)=&>(Object)):String
find(needle:String):Iterator<Index>
find(needle:String, overlapping:Bit):Iterator<Index>
find(needle:RegularExpression):Iterator<Match>
find(needle:RegularExpression, overlapping:Bit):Iterator<Match>
next(i:Index):Index
Returns the index of the Unicode codepoint after the given index.
previous(i:Index):Index
Returns the index of the Unicode codepoint before the given index.
offset(index:Index, offset:Int):Index
Returns the index offset by offset Unicode codepoints.
leftAlign(width:Int):String
Returns a new string consisting of this string left-justified in a field of at least width characters.
leftAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string left-justified in a field of at least width characters, filled with the specified character.
rightAlign(width:Int):String
Returns a new string consisting of this string right-justified in a field of at least width characters.
rightAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string right-justified in a field of at least width characters, filled with the specified character.
centerAlign(width:Int):String
Returns a new string consisting of this string centered in a field of at least width characters.
centerAlign(width:Int, fill:Char32):String
Returns a new string consisting of this string centered in a field of at least width characters, filled with the specified character.
split(delimiter:String):Array<String>
Splits this string into tokens separated by a delimiter.
split(delimiter:String, maxResults:Int):Array<String>
Splits this string into tokens separated by a delimiter.
split(delimiter:RegularExpression, maxResults:Int):Array<String>
Splits this string into tokens separated by a delimiter.
split(delimiter:RegularExpression):Array<String>
Splits this string into tokens separated by a delimiter.

Initializers

init (chars:ListView<Char8>)

Creates a new string containing the given characters.

Parameters:
chars - value of type ListView<Char8>
init (chars:ListView<Char32>)

Creates a new string containing the given characters.

Parameters:
chars - value of type ListView<Char32>

Fields

property utf8:ListView<Char8>

A view of the UTF-8 bytes this string contains.

property utf16:Iterator<Char16>

A view of the UTF-16 words this string contains.

property length:Int

The number of Unicode codepoints this string contains. As the string is internally stored in the variable-width UTF-8 format, determining the length of the string takes time proportional to the number of codepoints it contains.

property byteLength:Int

The number of UTF-8 bytes this string contains.
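
As a small illustration of how length and byteLength differ, consider the fragment below. It assumes the source file is saved as UTF-8; the variable names are illustrative only.

```frost
def price := "€5"
def codepoints := price.length   -- 2: EURO SIGN and DIGIT FIVE
def bytes := price.byteLength    -- 4: EURO SIGN takes 3 UTF-8 bytes, "5" takes 1
```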

property start:Index

An Index representing the beginning of the string.

property end:Index

An Index representing the end of the string.

property trimmed:String

A copy of this string with leading and trailing whitespace characters removed.

property asInt:Int?

Converts this string to a signed number. The string must be a sequence of decimal digits, optionally preceded by a minus sign (-), whose numeric representation can fit into an Int64. Returns null if the conversion fails.

property asInt64:Int64?

Converts this string to a signed number. The string must be a sequence of decimal digits, optionally preceded by a minus sign (-), whose numeric representation can fit into an Int64. Returns null if the conversion fails.

property asUInt64:UInt64?

Converts this string to an unsigned number. The string must be a sequence of decimal digits whose numeric representation can fit into a UInt64. Returns null if the conversion fails.

property asReal64:Real64?

Converts this string to a real number. The string must be a valid Frost real literal. Returns null if the conversion fails.
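
The conversion properties above might be used as in the following sketch. The results in the comments follow from the descriptions above; input and reportError are hypothetical names standing in for a String variable and for whatever error handling the caller needs.

```frost
def good := "-42".asInt64                   -- -42
def bad := "4x2".asInt64                    -- null: "x" is not a decimal digit
def big := "99999999999999999999".asInt64   -- null: does not fit into an Int64
def unsigned := "-1".asUInt64               -- null: no minus sign is allowed here

-- Every conversion can fail, so test for null before using the value.
def parsed := input.asInt64
if parsed == null {
    reportError(input)   -- hypothetical handler
}
```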

Class Methods

-- multiply operator --
@class
@pre(count >= 0)
function * (count:Int,
 s:String
):String

Returns a string consisting of count copies of s.

Parameters:
count - value of type Int
s - value of type String
-- add operator --
@class
function + (o:Object,
 s:String
):String

Returns the concatenation of another object's string representation and this string. The object's string representation is computed using its toString property.

Parameters:
o - value of type Object
s - value of type String

Instance Methods

@override
function format (fmt:String
):String

Returns a formatted representation of this string. With an empty format string, the raw string is returned. With the format string "frost", a representation of the string as it would appear in Frost source code is returned.

Parameters:
fmt - the format string
Returns:
a formatted string
Overrides:
frost.core.Formattable.format
function startsWith (other:String
):Bit

Returns true if this string begins with other.

Parameters:
other - value of type String
function endsWith (other:String
):Bit

Returns true if this string ends with other.

Parameters:
other - value of type String
-- add operator --
function + (other:String
):String

Returns the concatenation of this string and another string.

Parameters:
other - value of type String
-- add operator --
function + (other:Object
):String

Returns the concatenation of this string and another object's string representation. The object's string representation is computed using its toString property.

Parameters:
other - value of type Object
-- multiply operator --
@pre(count >= 0)
function * (count:Int
):String

Returns a string consisting of count copies of this string.

Parameters:
count - value of type Int
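
A brief sketch of the concatenation and repetition operators described above (variable names are illustrative only):

```frost
def greeting := "Hello, " + "World"   -- "Hello, World"
def dashes := "-" * 10                -- "----------"
def same := 10 * "-"                  -- class-method form, also "----------"
def empty := "ab" * 0                 -- "": count may be 0, but not negative
```
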
-- equals operator --
@override
function = (other:String
):Bit

Returns true if these two strings are equal (contain the same sequence of codepoints). Strings which logically mean the same thing but contain different codepoints are not equal. For instance, the string with Unicode codepoint LATIN CAPITAL LETTER A WITH ACUTE and the string with Unicode codepoints LATIN CAPITAL LETTER A followed by COMBINING ACUTE ACCENT will (in most programs) display and behave exactly the same, but they do not contain the same sequence of codepoints and therefore are not equal.

Parameters:
other - value of type String
Overrides:
frost.core.Equatable.=
-- greater than operator --
@override
function > (other:String
):Bit

Returns true if this string is greater than the other string when considered in a codepoint-by-codepoint fashion. This is sufficient to provide some ordering of strings, and will generally work acceptably for pure ASCII strings, but will not yield the expected sort order in most locales.

Parameters:
other - value of type String
Overrides:
frost.core.Comparable.>
-- index operator --
function [] (index:Index
):Char32

Returns the Unicode codepoint at the given index within the string.

Parameters:
index - value of type Index
-- index operator --
function [] (index:Int
):Char32

Returns the Unicode codepoint at the given offset within the string. This overload of the [] operator is slower than the overload that accepts an Index parameter, as it must scan the (internally UTF-8) string from the beginning to find the correct index.

Parameters:
index - value of type Int
@pre(r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength & r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))
function substring (r:Range<Index>
):String

Returns a 'dependent' substring of a string. string.substring(range) behaves exactly the same as the more-common string[range], except that substring does not copy the characters into a new memory buffer, instead referring directly to the memory held by the "parent" string. This means that the parent string will remain in memory as long as any of its substrings do.

string.substring(range) is therefore much more efficient than string[range], provided that forcing the parent string to remain in memory is acceptable.

Parameters:
r - value of type Range<Index>
@pre((r.min == null | (r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength)) & (r.max == null | (r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))))
function substring (r:Range<Index?>
):String

Returns a 'dependent' substring of a string. string.substring(range) behaves exactly the same as the more-common string[range], except that substring does not copy the characters into a new memory buffer, instead referring directly to the memory held by the "parent" string. This means that the parent string will remain in memory as long as any of its substrings do.

string.substring(range) is therefore much more efficient than string[range], provided that forcing the parent string to remain in memory is acceptable.

As with other Range methods, a null min starts at the beginning of the string, and a null max ends at the end of the string.

Parameters:
r - value of type Range<Index?>
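
The difference between substring and the [] operator can be sketched as follows. The fragment assumes, as in the loop example at the top of this page, that the .. operator excludes its upper bound; variable names are illustrative only.

```frost
def s := "Hello, World"
def comma := s.offset(s.start, 5)           -- index of the "," after "Hello"
def copy := s[s.start .. comma]             -- "Hello", copied into a new buffer
def view := s.substring(s.start .. comma)   -- "Hello", sharing the memory of s
```

Here view keeps s alive for as long as view itself is reachable, whereas copy is independent of s.
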
-- index operator --
@pre(r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength & r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))
function [] (r:Range<Index>
):String

Returns a substring of a string. If Range.min is greater than Range.max, the resulting substring will be empty.

Parameters:
r - value of type Range<Index>
-- index operator --
@pre((r.min == null | (r.min.byteOffset >= 0 & r.min.byteOffset <= byteLength)) & (r.max == null | (r.max.byteOffset >= 0 & r.max.byteOffset < byteLength + r.inclusive.choose(0, 1))))
function [] (r:Range<Index?>
):String

Returns a substring of a string. If Range.min is not specified, the substring will start at the beginning of the string. If Range.max is not specified, the substring will end at the end of the string. If Range.min is greater than Range.max, the resulting substring will be empty.

Parameters:
r - value of type Range<Index?>
-- index operator --
@pre((r.start == null | (r.start.byteOffset >= 0 & r.start.byteOffset <= byteLength)) & (r.end == null | (r.end.byteOffset >= 0 & r.end.byteOffset < byteLength + r.inclusive.choose(0, 1))))
function [] (r:SteppedRange<Index?, Int>
):String

Returns a substring of a string. The step value is interpreted in terms of Unicode codepoints: that is, s[... by 2] will return a String consisting of every other Unicode codepoint in s. As some Unicode characters consist of more than one codepoint (e.g. when using combining diacriticals, Emoji skin tone modifiers, or Emoji flags), this will mangle such characters when the step is not 1.

A negative step value will scan the string backwards from the starting point, thus s[.. by -1] will reverse the Unicode codepoints in the input string. Note again that this will mangle Unicode characters which consist of more than one Unicode codepoint.

Parameters:
r - value of type SteppedRange<Index?, Int>
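
Going by the description above, stepped ranges over a plain ASCII string would behave as in this sketch (variable names are illustrative only):

```frost
def s := "abcdef"
def everyOther := s[... by 2]   -- "ace": every other codepoint
def reversed := s[.. by -1]     -- "fedcba": codepoints in reverse order
```
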
-- index operator --
function [] (r:Range<Int>
):String

Returns a substring of a string. This version of the [] operator is slower than the one that accepts a Range<Index> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.

Parameters:
r - value of type Range<Int>
-- index operator --
function [] (r:Range<Int?>
):String

Returns a substring of a string. This version of the [] operator is slower than the one that accepts a Range<Index?> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.

Parameters:
r - value of type Range<Int?>
-- index operator --
function [] (r:SteppedRange<Int?, Int>
):String

Returns a substring of a string. This version of the [] operator is slower than the one that accepts a SteppedRange<Index?, Int> parameter, as it must scan the (internally UTF-8) string from the beginning to find the right offsets.

Parameters:
r - value of type SteppedRange<Int?, Int>
function contains (c:Char8
):Bit

Returns true if this string contains at least one occurrence of the given character.

Parameters:
c - value of type Char8
function contains (s:String
):Bit

Returns true if this string contains at least one occurrence of the given substring.

Parameters:
s - value of type String
function indexOf (s:String
):Index?

Returns the index of the first occurrence of the string s within this string, or null if not found.

Parameters:
s - the string to search for
Returns:
the index of the match, or null if not found
function indexOf (s:String,
 start:Index
):Index?

Returns the index of the first occurrence of the string s within this string, starting from the specified index, or null if not found.

Parameters:
s - the string to search for
start - the index to begin searching from
Returns:
the index of the match, or null if not found
function lastIndexOf (s:String
):Index?

Returns the index of the last occurrence of the string s within this string, or null if not found.

Parameters:
s - the string to search for
Returns:
the index of the match, or null if not found
function lastIndexOf (s:String,
 start:Index
):Index?

Returns the index of the last occurrence of the string s within this string, starting the search backwards from the specified index, or null if not found.

Parameters:
s - the string to search for
start - the index to begin searching from
Returns:
the index of the match, or null if not found
function matches (regex:RegularExpression
):Bit

Returns true if this string matches the given regular expression. The regular expression must match the entire string.

Parameters:
regex - the regular expression to compare against
Returns:
true if the string matches
function contains (needle:RegularExpression
):Bit

Returns true if this string contains a match for the given regular expression. The regular expression may match zero or more characters of the string, starting at any point.

Parameters:
needle - the regular expression to search for
Returns:
true if the string contains a match
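
The contrast between matches, which must cover the entire string, and contains, which may match anywhere, can be sketched as follows (variable names are illustrative only):

```frost
def s := "abc123"
def whole := s.matches(/\d+/)       -- false: the leading "abc" is not digits
def part := s.contains(/\d+/)       -- true: "123" matches
def exact := "123".matches(/\d+/)   -- true: the whole string is digits
```
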
function parse (regex:RegularExpression
):Array<String?>?

Matches the string against the given regular expression, returning an array of its capture groups. Group 0, the group containing the entire string, is not returned. If the string does not match the regular expression, returns null. For example, "1,2,34".parse(/(\d+),(\d+),(\d+)/) will return an array consisting of "1", "2", and "34".

Parameters:
regex - the regular expression to parse against
Returns:
a list of the capture groups, or null if the string did not match
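
A short sketch of both outcomes, with illustrative variable names:

```frost
def fields := "1,2,34".parse(/(\d+),(\d+),(\d+)/)   -- "1", "2", "34"
def none := "1,2".parse(/(\d+),(\d+),(\d+)/)        -- null: only two numbers present
```
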
function replace (search:String,
 replacement:String
):String

Returns a new string with every occurrence of search replaced with replacement.

Parameters:
search - the string to search for
replacement - the replacement text
Returns:
a string with all matches replaced
function replace (search:RegularExpression,
 replacement:String
):String

Returns a new string with every match of search replaced with replacement. The replacement string may contain $1-style regular expression group references; for instance s.replace(regex, "$1") will replace every occurrence of the regex with the contents of its first group.

Parameters:
search - the regular expression to search for
replacement - the replacement text
Returns:
a string with all matches replaced
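
As an illustration of group references, the sketch below reorders the parts of a date. It assumes that $2 and $3 refer to the second and third groups in the same way that $1 refers to the first.

```frost
def date := "2024-01-15"
-- Groups: $1 = "2024", $2 = "01", $3 = "15"
def swapped := date.replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1")   -- "15/01/2024"
```
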
function replace (search:RegularExpression,
 replacement:String,
 allowGroupReferences:Bit
):String
Parameters:
search - value of type RegularExpression
replacement - value of type String
allowGroupReferences - value of type Bit
function replace (search:RegularExpression,
 replacement:(String)=>(Object)
):String

Searches the string for a regular expression, replacing occurrences of the regular expression with new text determined by a function. For instance, given:

"This is a test!".replace(/\w+/, word => word.length)

The regular expression /\w+/ matches sequences of one or more word characters; in other words, it matches all words occurring in the string. The replacement function word => word.length replaces each matched sequence with the number of characters in the sequence, resulting in the text:

4 2 1 4!
Parameters:
search - the regular expression to match the string with
replacement - a function generating the replacement text
Returns:
a new string with all occurrences of the regular expression replaced
method replace (search:RegularExpression,
 replacement:(String)=&>(Object)
):String
Parameters:
search - value of type RegularExpression
replacement - value of type (String)=&>(Object)
function replace (search:RegularExpression,
 replacement:(ListView<String?>)=>(Object)
):String

As replace(RegularExpression, (String)=>(Object)), but the replacement function receives the capture groups from the regular expression rather than the raw matched text. The groups list includes the special whole-match group at index 0, with the first set of parentheses in the regular expression corresponding to index 1.

Parameters:
search - the regular expression to match the string with
replacement - a function generating the replacement text
Returns:
a new string with all occurrences of the regular expression replaced
method replace (search:RegularExpression,
 replacement:(ListView<String?>)=&>(Object)
):String
Parameters:
search - value of type RegularExpression
replacement - value of type (ListView<String?>)=&>(Object)
function find (needle:String
):Iterator<Index>
Parameters:
needle - value of type String
function find (needle:String,
 overlapping:Bit
):Iterator<Index>
Parameters:
needle - value of type String
overlapping - value of type Bit
function find (needle:RegularExpression
):Iterator<Match>
Parameters:
needle - value of type RegularExpression
function find (needle:RegularExpression,
 overlapping:Bit
):Iterator<Match>
Parameters:
needle - value of type RegularExpression
overlapping - value of type Bit
function next (i:Index
):Index

Returns the index of the Unicode codepoint after the given index. It is an error to call next() when already at the end of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.

Parameters:
i - value of type Index
function previous (i:Index
):Index

Returns the index of the Unicode codepoint before the given index. It is an error to call previous() when already at the beginning of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.

Parameters:
i - value of type Index
function offset (index:Index,
 offset:Int
):Index

Returns the index offset by offset Unicode codepoints. It is an error to index before the beginning or after the end of the string. Note that because a logical character can consist of multiple Unicode codepoints (such as LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT), this may return an index in the middle of such a compound character.

Parameters:
index - value of type Index
offset - value of type Int
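
For example, previous can be used to walk a string backwards, mirroring the forward loop shown at the top of this page. This is a sketch that assumes end refers to the position just past the last codepoint, as that loop implies; string and process are the same placeholders used there.

```frost
var index := string.end
while index != string.start {
    -- Step back first: end itself does not refer to a codepoint.
    index := string.previous(index)
    process(string[index])
}
```
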
function leftAlign (width:Int
):String

Returns a new string consisting of this string left-justified in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are appended until the resulting string is width characters long.

Parameters:
width - the minimum width of the string
Returns:
a string at least width characters long
function leftAlign (width:Int,
 fill:Char32
):String

Returns a new string consisting of this string left-justified in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are appended until the resulting string is width characters long.

Parameters:
width - the minimum width of the string
fill - the fill character
Returns:
a string at least width characters long
function rightAlign (width:Int
):String

Returns a new string consisting of this string right-justified in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are prepended until the resulting string is width characters long.

Parameters:
width - the minimum width of the string
Returns:
a string at least width characters long
function rightAlign (width:Int,
 fill:Char32
):String

Returns a new string consisting of this string right-justified in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are prepended until the resulting string is width characters long.

Parameters:
width - the minimum width of the string
fill - the fill character
Returns:
a string at least width characters long
function centerAlign (width:Int
):String

Returns a new string consisting of this string centered in a field of at least width characters. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, space characters are added as equally as possible to the left and right until the resulting string is width characters long. If the number of characters to be added is odd, the right side of the string will receive one more space than the left side.

Parameters:
width - the minimum width of the string
Returns:
a string at least width characters long
function centerAlign (width:Int,
 fill:Char32
):String

Returns a new string consisting of this string centered in a field of at least width characters, filled with the specified character. If this string has a length greater than or equal to width, this string is returned. If this string is shorter than width, fill characters are added as equally as possible to the left and right until the resulting string is width characters long. If the number of characters to be added is odd, the right side of the string will receive one more fill character than the left side.

Parameters:
width - the minimum width of the string
fill - the fill character
Returns:
a string at least width characters long
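
The alignment rules above work out as follows for a few small inputs (a sketch using the default space fill; variable names are illustrative only):

```frost
def left := "42".leftAlign(5)          -- "42   "
def right := "42".rightAlign(5)        -- "   42"
def center := "abc".centerAlign(6)     -- " abc  ": 1 space on the left, 2 on the right
def long := "overflow".leftAlign(3)    -- "overflow": the string is never truncated
```
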
function split (delimiter:String
):Array<String>

Splits this string into tokens separated by a delimiter. For instance, "This is a long string".split(" ") yields "This", "is", "a", "long", and "string".

Parameters:
delimiter - the token delimiter
Returns:
the split tokens
function split (delimiter:String,
 maxResults:Int
):Array<String>

Splits this string into tokens separated by a delimiter. At most maxResults results will be returned; any additional delimiters beyond that point will be ignored. For instance, "This is a long string".split(" ", 3) yields "This", "is", and "a long string".

Parameters:
delimiter - the token delimiter
maxResults - the maximum number of results to return
Returns:
the split tokens
@pre(maxResults > 0)
function split (delimiter:RegularExpression,
 maxResults:Int
):Array<String>

Splits this string into tokens separated by a delimiter. At most maxResults different strings will be returned; any additional delimiters beyond that point will be ignored. For instance, "This is a long string".split(/\s+/, 3) yields "This", "is", and "a long string".

Parameters:
delimiter - the token delimiter
maxResults - the maximum number of results to return
Returns:
the split tokens
function split (delimiter:RegularExpression
):Array<String>

Splits this string into tokens separated by a delimiter. For instance, "This is a long string".split(/\s+/) yields "This", "is", "a", "long", and "string".

Parameters:
delimiter - the token delimiter
Returns:
the split tokens