Strings and Symbols

Strings#

The string is probably the data type most commonly used by sysadmins. Most scripts are about processing strings - file names, SQL statements, command output, etc.

Exactly as in shell scripts, a string literal is written between single or double quotes. Double-quoted strings are parsed, so they can contain escape sequences (like \n for a newline, or \xnn for the character whose ASCII code is nn in hexadecimal). You can also use expression substitution to include the value of any Ruby expression placed between #{ and }. Expression substitution is similar to $VARIABLE inside a double-quoted string in a shell script - but you can use any Ruby expression, not only a variable!

If the evaluated expression does not return a String object, Ruby calls the to_s method on the result. Every object has this method, but if the class does not define its own version, the default one returns only the class name and an object identifier, so in the worst case you get something like #<Server:0x007fdd9> inside the string.

Single-quoted strings are not processed - the only escapes recognized are \' for a single quote and \\ for a backslash.

puts 'Single \'quoted\' string.\nNo newline parsed.'  # \n is just a backslash and the letter n
Single 'quoted' string.\nNo newline parsed.
puts "Double \"quoted\" string.\nWith newline."       # escape double quotes with \"
Double "quoted" string.
With newline.
"2 + 2 = #{ 2 + 2 }"       # Ruby evaluates the expression between #{ and }
#=> "2 + 2 = 4"
# and puts the value of it inside
ip = Ip.new(192, 168, 1, 1)
#=> #<Ip:0x007fdd90820e90 @p1=192, @p2=168, @p3=1, @p4=1>
"My IP is '#{ip}'"         # it could be any expression which returns string
#=> "My IP is '192.168.1.1'"   # or have 'to_s' method defined
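
The role of to_s in expression substitution can be sketched with two small classes. Server, NamedServer and describe are hypothetical names made up for this illustration; they are not part of Ruby or of the examples above.

```ruby
# Hypothetical classes, used only for illustration.
class Server
  def initialize(name)
    @name = name
  end
end

class NamedServer < Server
  # Expression substitution calls to_s, so defining it
  # controls how the object appears inside a string.
  def to_s
    "server #{@name}"
  end
end

# Hypothetical helper: interpolates any object into a message.
def describe(object)
  "Connecting to #{object}"
end

puts describe(NamedServer.new('db1'))  # uses NamedServer#to_s
puts describe(Server.new('db1'))       # falls back to the default to_s: #<Server:0x...>
puts describe(42)                      # Integer has its own to_s
```

Only the class with its own to_s produces readable output; the plain Server shows the class name and an object identifier.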

String Formatting#

Another way to put values inside a Ruby string is the % operator. It constructs a new string from the given pattern and arguments. The pattern is similar to the one used by the printf command in Bash, where %s means a string, %d - a decimal integer, %f - a floating point number, etc. There is a good description of this format in the documentation: ri Kernel.sprintf.

If there is more than one argument, you must pass them in an Array, or in a Hash for named arguments (which may be handy for bigger expressions). Arrays and Hashes will be described in the next chapters, for now just take a look at the examples:

"The answer is %d" % 42     # simple substitution of the number
#=> "The answer is 42"

"The highest address in 8-bit Atari is %X" % 65535   # convert to hex
#=> "The highest address in 8-bit Atari is FFFF"         # %x would give lowercase characters

"%d/%d equals to %.3f" % [5, 3, 5.0/3]   # more than one argument - in an Array
#=> "5/3 equals to 1.667"                # note that the value of 5.0/3 is rounded to 3 decimal places

"The answer for %<question>s is %<answer>d." % { question: 'everything', answer: 42 }
#=> "The answer for everything is 42."

Similar to the % operator is the sprintf function. It returns a string formatted according to the pattern and arguments. There is a printf method as well, which prints the result instead of returning it, but it is not widely used - most Rubyists prefer puts. With sprintf and printf, unlike with the % operator, you don't have to use an Array to pass more than one argument.

s = sprintf('42 in binary: %08B', 42)  # 08 - put leading zeros to fill 8 characters
#=> "42 in binary: 00101010"
printf("42 = %8B\n84 = %8B\n", 42, 84)
42 =   101010
84 =  1010100
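
Width and precision modifiers are handy for aligned reports. The sketch below uses format, which is another name for sprintf; usage_line and its arguments are hypothetical and serve only as an illustration.

```ruby
# Hypothetical helper: formats one line of a disk usage report.
def usage_line(mount, used_gb, total_gb)
  percent = 100.0 * used_gb / total_gb
  # %-10s  left-aligns the mount point in 10 columns,
  # %6.1f  prints a float in 6 columns with 1 decimal place,
  # %%     is a literal percent sign.
  format('%-10s %6.1f GB of %6.1f GB (%5.1f%%)', mount, used_gb, total_gb, percent)
end

puts usage_line('/', 12.5, 50)
puts usage_line('/var/log', 3, 8)
```

Because every field has a fixed width, the numbers line up in columns when several lines are printed one after another.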

Substrings#

To get a specified character or characters from a string, use the substring operator []. Given an integer, it returns the character at that position, counting from zero (zero means the first character). You can pass a range of integers as well - in this case it returns the substring between the given positions, inclusive. Finally, negative integers count from the end: -1 means the last character, -2 the second to last, etc.

In a similar way you can replace a part of the string with the []= method: it works on single characters and on ranges.

s = "The ultimate answer"
#=> "The ultimate answer"
s[0]        # the first character
#=> "T"

s[4]         # the fifth character - counting from zero
#=> "u"

s[-1]        # the last one
#=> "r"

s[0..2]      # the range from the first to the third character
#=> "The"

s[4..11]     # from the fifth to the twelfth character
#=> "ultimate"

s[-6..-1]    # from the sixth from the end to the last one
#=> "answer"

s[4..11] = 'only true'   # replace the fifth to the twelfth character with a new substring
#=> "only true"          # note that the assignment returns the new substring, not 's'

s                        # string 's' is now modified
#=> "The only true answer"
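
Position-based access works well for fixed-width data, such as the timestamp at the beginning of a syslog line. This is a minimal sketch: timestamp_parts and mask_time are hypothetical helpers, and the column positions are assumed to match the sample line.

```ruby
# Hypothetical helper: cuts a syslog-style timestamp by character position.
def timestamp_parts(line)
  {
    month: line[0..2],   # characters 0, 1 and 2
    day:   line[4..5],   # characters 4 and 5
    time:  line[7..14]   # eight characters: hh:mm:ss
  }
end

# Hypothetical helper: hides the time in a log line.
def mask_time(line)
  masked = line.dup            # work on a copy, because []= changes the string in place
  masked[7..14] = '--:--:--'   # replace the range with a new substring
  masked
end

log = 'Jun 17 02:47:59 joe kernel[0]'
p timestamp_parts(log)
puts mask_time(log)
puts log                       # the original line is unchanged
```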

More String Operators#

There are a number of operators on strings, and most of them work as you would expect. For example, >= checks if the string on the left is greater than or equal to the one on the right (the comparison is done the ASCII way: both strings are compared character by character). The plus operator + concatenates two strings and returns the result as a new string, while the << operator appends the right string to the left one, modifying its content. Finally, the star operator * repeats the string a given number of times.

'abc' >= 'abb'     # it is greater, because 'c' is greater than 'b' on the third position
#=> true

'foo' < 'fooz'     # the same beginning, but 'fooz' is longer and wins
#=> true

s = 'foo'          # creating object 'foo' to play with
#=> "foo"
s + "bar"          # concatenating s with 'bar' produces 'foobar'
#=> "foobar"

s                  # but leaves the object untouched
#=> "foo"

s << "bar"         # and this operator appends 'bar' to the object 's'
#=> "foobar"
s                  # modified content of the object s
#=> "foobar"

"0123456789" * 8   # repeat the string 8 times
#=> "01234567890123456789012345678901234567890123456789012345678901234567890123456789"
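
The three operators can be combined, for example to build a simple text banner. The banner method below is a hypothetical helper written for this illustration.

```ruby
# Hypothetical helper: frames a title with lines of '=' characters.
def banner(title)
  line = '=' * title.length   # '*' repeats the string once per character of the title
  text = line + "\n"          # '+' builds a new string, 'line' stays unchanged
  text << title << "\n"       # '<<' appends to 'text' in place and can be chained
  text << line
  text
end

puts banner('Backup report')
```

Using + for the first step matters here: it creates a new object, so the later << calls modify only text and never line.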

String Methods#

Ruby provides a bunch of string manipulation methods (see ri String for the whole list), and it is time to get familiar with another method naming convention. Methods whose names end with a question mark ask a question and should return true or false. Methods whose names end with an exclamation mark usually modify the object on which they are called (you may think of it as a warning: something will change!). All other methods normally do not change existing objects, but construct new ones.

Remember that this naming is only a convention, and there are methods that modify objects although their names do not end with an exclamation mark.

Here are the most interesting String methods. Notice that many of them come in two variants - one returning a new object and one modifying the existing one.

length - returns the size of the string as an integer (the number of characters, not bytes - the two may differ for non-ASCII strings, e.g. encoded in UTF-8)
bytesize - the size of the string, in bytes
chomp, chomp! - removes the line ending (\n, \r\n or \r) from the end of the string
chop, chop! - removes the last character
crypt(salt) - creates a one-way cryptographic hash (like an encoded password in /etc/passwd) with the given salt
downcase, downcase! - converts the string to lowercase
upcase, upcase! - converts the string to uppercase
empty? - returns true if the string has zero length
include?(other_string) - returns true if the string includes the other one
index(other_string) - returns the index of the first occurrence of the other string, or nil if not found
rindex(other_string) - returns the index of the last occurrence, or nil
insert(index, other_string) - inserts other_string at the specified index (note that this method modifies the string although its name does not end with an exclamation mark: this is because the name clearly indicates that the string will be changed)
lstrip, lstrip! - removes leading whitespace (spaces, tabs, newlines)
rstrip, rstrip! - removes trailing whitespace
strip, strip! - removes whitespace from both the left and the right side
reverse, reverse! - reverses the string
split(delimiter) - divides the string into substrings, based on the delimiter (like awk -F'delimiter')
tr(from_characters, to_characters), tr!(from_characters, to_characters) - just like the Unix tr command, replaces each character listed in the first argument with the corresponding character of the second argument (the last character of the second argument is reused if it is shorter)

s = "200\u20ac"  # '\u' escapes a Unicode character in Ruby strings: \uXXXX, where XXXX is the hex code of the character
#=> "200\u20AC"      # 20AC (hex) is the euro sign (€) in Unicode
puts s           # puts does not escape non-ASCII characters
200€
s.length         # '200€' is a 4-character string
#=> 4
s.bytesize       # but it occupies 6 bytes in memory (UTF-8 encoding)
#=> 6

s = "Hello world.\r\n" # CRLF at the end of the string
#=> "Hello world.\r\n"
s.chomp                # chomp removes CR and LF, creates a new string object and returns it
#=> "Hello world."
s                      # but the variable 's' remains untouched
#=> "Hello world.\r\n"
s.chomp!               # chomp! removes CRLF from existing object
#=> "Hello world."
s                      # variable 's' has been modified
#=> "Hello world."

'secret'.crypt('salt') # creates a one-way hash of the string 'secret' (similar to how /etc/passwd keeps the hashes)
#=> "saHW9GdxihkGQ"

'small'.upcase  # uppercase the string, method upcase! would modify the object itself
#=> "SMALL"
'Big'.downcase  # lowercase the string 'Big'
#=> "big"

"".empty?       # only "" is an empty string
#=> true

"Hello world".include? "Hell"          # search for a substring 'Hell'
#=> true
"Hello world".include? "hell"          # all functions are case-sensitive
#=> false
"Hello world".downcase.include? "hell" # to do a case-insensitive search, just lowercase the object first
#=> true                               # so the method 'include?' will search in the lowercased object

"Hello world".index 'o'    # the first occurrence of 'o' is at position 4, counting from zero
#=> 4
"Hello world".rindex 'o'   # the last occurrence of 'o' is at position 7, counting from zero
#=> 7

"Hello world".insert(5, ' cruel')  # inserts the substring at position 5 (counting from zero)
#=> "Hello cruel world"

" Hello  \n".lstrip   # removes whitespace from the left
#=> "Hello  \n"
" Hello  \n".rstrip   # removes all trailing whitespace, including the newline
#=> " Hello"
" Hello  \n".strip    # removes whitespace both from the beginning and from the end
#=> "Hello"

"kayak ".reverse      # create new string by reversing the given object
#=> " kayak"

"Jun 17 02:47:59 joe kernel[0]".split         # splits the string with default delimiter (whitespace)
#=> ["Jun", "17", "02:47:59", "joe", "kernel[0]"]
"02:47:59".split(":")                         # and with given delimiter
#=> ["02", "47", "59"]

'Hello world'.tr 'aeiouy', '*'  # change any of chars: 'aeiouy' with star
#=> "H*ll* w*rld"
'Hello world'.tr 'a-z', '*'     # change all lowercase letters to stars
#=> "H**** *****"
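
Several of these methods are typically chained to clean up a line of text. The sketch below assumes a simple "key = value" configuration format; parse_setting and chomp_result are hypothetical helpers. The second one shows a detail worth remembering: a method such as chomp! returns nil when it had nothing to change.

```ruby
# Hypothetical helper: turns one "key = value" line into a pair.
def parse_setting(line)
  key, value = line.chomp.split('=')   # drop the newline, then split on '='
  [key.strip.downcase, value.strip]    # trim spaces, normalise the key to lowercase
end

# Hypothetical helper: shows what a method ending with '!' returns.
def chomp_result(text)
  copy = text.dup   # keep the caller's string untouched
  copy.chomp!       # returns the modified string, or nil if nothing was removed
end

p parse_setting("  Port = 8080 \n")
p chomp_result("line\n")
p chomp_result("line")
```

Because of the possible nil, it is safer not to chain further calls onto the result of a method ending with an exclamation mark.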

There are many more string processing methods, but most of them are based on Ruby regular expressions. Because this is quite a big and complicated topic, we are going to cover it later, in Regular Expressions.

Instances and Reference#

Strings, like everything in Ruby, are objects, but it is important to understand that even identical literals are not the same object instance:

'string'.object_id     # construct the first 'string' object
#=> 70281130395700

'string'.object_id     # the second one, different than the previous one
#=> 70281130382560

'string'.object_id == 'string'.object_id   # these are different objects
#=> false

'string' == 'string'   # but comparison works as expected
#=> true

It is worth making a note about references. In Ruby, variables contain references to objects, not the objects themselves. Thus, when assigning one variable to another, only the reference is copied. In the example below both variables refer to the same object in memory.

a = 'string'                 # a is the reference to object 'string'
#=> "string"

b = a                        # b is the reference to the same object 'string'
#=> "string"

a.object_id == b.object_id   # true, because a and b refer to the same object
#=> true
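
A practical consequence of copying only the reference is that a change made through one variable is visible through the other. The following sketch uses two hypothetical helpers, same_object? and alias_demo, and the dup method, which creates an independent copy.

```ruby
# Hypothetical helper: true when both arguments are the very same object.
def same_object?(a, b)
  a.object_id == b.object_id
end

# Hypothetical helper: returns the three strings and two identity checks.
def alias_demo
  original = String.new('web')  # String.new creates a fresh string object
  shared = original             # copies only the reference
  copy = original.dup           # creates a separate object with the same content
  shared << '01'                # modifies the one object both names refer to
  [original, shared, copy, same_object?(original, shared), same_object?(original, copy)]
end

p alias_demo
```

The appended text shows up in original and shared, while copy keeps the old content, because dup produced a different object.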

Symbols#

Symbols are similar to immutable strings (strings which can not be changed). Symbols are commonly used to describe method and variable names. You may think of a symbol as a kind of constant string. The syntax for a symbol is a colon followed by the name, like :symbol. If you want whitespace inside the symbol, use apostrophes: :'the symbol', but symbols with spaces are not in common use.

Unlike strings, symbols are created in memory only once, so every symbol with the same name refers to the same object. This is the main advantage of symbols over strings - operations on them are much faster than on character strings. For example, to check the equality of two strings Ruby must iterate over them and compare them character by character. The same operation on Symbol objects requires only a comparison of object identifiers (memory addresses).

s1 = :the_symbol               # s1 is the reference to object :the_symbol
#=> :the_symbol

s2 = :the_symbol               # s2 is, unlike the String, the same object
#=> :the_symbol

s1.object_id                   # both s1 and s2 refer to the same object in memory
#=> 527848

s2.object_id                   # the same object identifier as s1
#=> 527848
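
Strings and symbols can be converted into each other with to_sym and to_s, which is useful when a name arrives as text (for example from a command line) but is stored as a symbol. This is only a sketch: SETTINGS, setting and same_symbol? are hypothetical names, and the Hash used here belongs to a later chapter.

```ruby
# Hypothetical settings, keyed by symbols.
SETTINGS = { host: 'db1', port: 5432 }.freeze

# Hypothetical helper: looks up a setting by a string or a symbol.
def setting(name)
  SETTINGS[name.to_sym]   # to_sym works on both 'port' and :port
end

# Hypothetical helper: converting the same text twice gives the same object.
def same_symbol?(text)
  text.to_sym.equal?(text.to_sym)   # equal? compares object identity
end

p setting('port')
p setting(:host)
p same_symbol?('port')
p :port.to_s
```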