Strings are sequences of characters contained within two quotes (either double or single - just be consistent!)
Strings may contain letters, digits, spaces or special characters:
"String with a 3 in it!"
Strings can be bound to variables:
string = "String with a 3 in it!"
Strings are best thought of as an ordered sequence. Each element in the sequence can be accessed using an index, shown as the row of numbers below each character:
| S | t | r | i | n | g |   | w | i | t | h |   | a |   | 3 |   | i | n |   | i | t | ! |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
Items in the string can be accessed by referencing their index; indexing starts at 0.
Print the first element in the string and the element at index 5 (the sixth character):
print(string[0])
print(string[5])
Will output:
S
g
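As a minimal sketch of the index table above, the loop below uses enumerate() to pair each character with its index. The variable name text is chosen here for illustration only. It also shows that an index past the last valid position raises an IndexError.

```python
text = "String with a 3 in it!"

# Pair each character with its index, mirroring the table above
pairs = list(enumerate(text))
for index, char in pairs:
    print(index, repr(char))

# Valid indexes run from 0 to 21; index 22 is out of range
try:
    text[22]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```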
Negative Indexing
Negative indexing can also be used with strings: a negative index counts elements from the end of the string.
| S | t | r | i | n | g |   | w | i | t | h |   | a |   | 3 |   | i | n |   | i | t | ! |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -22 | -21 | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
The last element is given by the index -1, and the first element can be obtained with the index -22.
Print the last element in the string and the first element in the string:
print(string[-1])
print(string[-22])
Will output:
!
S
The number of characters in a string can be found using len(), short for length:
Find the length of string:
len(string)
Will output:
22
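Combining len() with negative indexing gives a quick way to check the two index tables against each other. The sketch below assumes nothing beyond the example string; the names text, n and k are illustrative.

```python
text = "String with a 3 in it!"
n = len(text)  # 22 characters

# The negative index -k and the positive index n - k name the same element
for k in (1, 2, n):
    print(-k, n - k, repr(text[-k]), repr(text[n - k]))
```

For k equal to n, the negative index is -22 and the positive index is 0, which is why -22 (and not -21) reaches the first character.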
Multiple characters can be obtained from a string using slicing.
When taking a slice, the first number is the starting index (counting from 0) and the second number is the stop index, which is not included in the slice. The length of the slice is therefore the stop index minus the start index.
Take the first 4 characters of the variable, then slice from index 8 up to and including index 11:
string[0:4]
string[8:12]
Will output:
'Stri'
'ith '
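The following sketch illustrates the slicing rules on the same string: the stop index is excluded, omitted bounds default to the start and end of the string, and a slice that runs past the end is simply cut short. The variable names are illustrative.

```python
text = "String with a 3 in it!"

start, stop = 8, 12
piece = text[start:stop]
print(repr(piece))                 # 'ith '
print(len(piece) == stop - start)  # True

# Omitted bounds default to the beginning and the end of the string
head = text[:6]
tail = text[19:]
print(head)  # String
print(tail)  # it!

# Unlike indexing, slicing past the end does not raise an error
clipped = text[19:100]
print(repr(clipped))  # 'it!'
```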
A stride can be applied by adding a step value after the slice bounds (with or without specifying the start and stop indexes).
Select every second element, and select every second element in the range from index 0 to index 4:
string[::2]
string[0:5:2]
Will output:
'Srn iha3i t'
'Srn'
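The step can also be negative, in which case the string is traversed from the end towards the beginning. As a small illustration (not part of the original examples), a step of -1 produces a reversed copy:

```python
text = "String with a 3 in it!"

# A step of -1 walks the string backwards, producing a reversed copy
reversed_text = text[::-1]
print(reversed_text)  # !ti ni 3 a htiw gnirtS

# Reversing twice gives back the original string
print(reversed_text[::-1] == text)  # True
```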
Two or more strings can be concatenated by using the + symbol. The result is a new string that is a combination of strings. String values can also be replicated by multiplying the variable or string by the number of times it should be replicated.
Concatenate two strings, and print the string 3 times:
string = "String with a 3 in it!"
new_string = string + " With this too."
print(new_string)
print(3 * string)
Will output:
String with a 3 in it! With this too.
String with a 3 in it!String with a 3 in it!String with a 3 in it!
The original variable can also be rebound to the result of a concatenation. Replace string with the concatenated value:
string = string + " With this too."
print(string)
Will output:
String with a 3 in it! With this too.
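When many pieces need to be combined, repeated use of + works, but the built-in str.join() method is a common alternative. The sketch below is an illustration under that assumption; the list words and the other names are hypothetical and not part of the original examples.

```python
words = ["String", "with", "a", "3", "in", "it!"]

# Concatenation with + inside a loop builds a new string at every step
sentence = ""
for word in words:
    if sentence:
        sentence = sentence + " "
    sentence = sentence + word

# str.join() produces the same result in a single call
joined = " ".join(words)

print(sentence)            # String with a 3 in it!
print(joined)              # String with a 3 in it!
print(sentence == joined)  # True
```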
Backslashes mark the beginning of escape sequences. Escape sequences represent characters that may be difficult to type. For example, \n represents a new line, so the output continues on a new line where the \n is encountered. Similarly, \t represents a tab:
New line escape sequence and tab escape sequence:
print("String\nwith a 3 in it!")
print("String\twith a 3 in it!")
Will output:
String
with a 3 in it!
String with a 3 in it!
To place a backslash in a string, use a double backslash, or prefix the string with r to make it a raw string:
Include a backslash in a string:
print("String \\ with a 3 in it!")
# Or
print(r"String \ with a 3 in it!")
Will output:
String \ with a 3 in it!
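One way to see what an escape sequence actually stores is to check the length of the string. In this short sketch (variable names are illustrative), \n counts as a single character, while an escaped or raw backslash followed by n counts as two:

```python
escaped = "a\nb"   # three characters: 'a', newline, 'b'
doubled = "a\\nb"  # four characters: 'a', backslash, 'n', 'b'
raw = r"a\nb"      # the same four characters, written as a raw string

print(len(escaped))    # 3
print(len(doubled))    # 4
print(doubled == raw)  # True
```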
See also f-strings
There are many string operation methods in Python that can be used to manipulate data. Some basic string operations include:
upper()
The upper() method converts all text characters to upper case.
# Convert all the characters in string to upper case
a = "String with a 3 in it!"
print("before upper:", a)
b = a.upper()
print("After upper:", b)
Will output:
before upper: String with a 3 in it!
After upper: STRING WITH A 3 IN IT!
lower()
The lower() method converts all text characters to lower case.
# Convert all the characters in string to lower case
a = "STRING WITH A 3 IN IT!"
print("before lower:", a)
b = a.lower()
print("After lower:", b)
Will output:
before lower: STRING WITH A 3 IN IT!
After lower: string with a 3 in it!
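One common use of these methods, sketched here as an illustration rather than taken from the original text, is comparing strings without regard to case. Note that upper() and lower() return new strings and leave the original unchanged.

```python
a = "String with a 3 in it!"
b = "STRING WITH A 3 IN IT!"

# A direct comparison is case-sensitive
same_exact = (a == b)
# Lower-casing both sides first makes the comparison ignore case
same_ignoring_case = (a.lower() == b.lower())

print(same_exact)          # False
print(same_ignoring_case)  # True
print(a)                   # String with a 3 in it!  (a itself is unchanged)
```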
find()
The find() method finds a sub-string. The argument is the sub-string you would like to find, and the output is the index at which its first occurrence starts. For example, we can find the sub-string 'with' or the sub-string 'i'.
# Find the substring in the string. Only the index of the first element of the substring in the string is returned
string = "String with a 3 in it!"
string.find('with')
# Find the substring in the string.
string.find('i')
Will output:
7
3
If the sub-string is not in the string, the output is -1. For example, 'Jasd' is not a sub-string of the string:
string = "String with a 3 in it!"
string.find('Jasd')
Will output:
-1
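Because find() returns only the first index and signals "not found" with -1, it can be called repeatedly to locate every occurrence. The sketch below relies on the optional second argument of find(), which sets the position where the search starts; the list positions is an illustrative name.

```python
string = "String with a 3 in it!"

# Collect the index of every occurrence of 'i', not just the first one
positions = []
index = string.find('i')
while index != -1:
    positions.append(index)
    # Resume the search one position after the previous hit
    index = string.find('i', index + 1)

print(positions)  # [3, 8, 16, 19]
```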
replace()
The replace() method replaces a segment of a string, i.e. a substring, with a new string. The first argument is the substring to be replaced, and the second argument is the substring to replace it with. The result is a new string with the segment changed; the original string is not modified.
# Replace the word 'String' with 'Biskit'
a = "String with a 3 in it!"
b = a.replace('String', 'Biskit')
print(b)
Will output:
Biskit with a 3 in it!
Similarly:
# Remove some punctuation by replacing each punctuation mark with an empty string
a = "String, with a 3. in it!"
print(a)
b = a.replace('!','').replace(',','').replace('.','')
print(b)
Will output:
String, with a 3. in it!
String with a 3 in it
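Two small variations on the examples above, given as a sketch: replace() accepts an optional third argument that limits how many occurrences are replaced, and the chained calls can be written as a loop over the characters to remove. The names limited and chars_to_remove are illustrative.

```python
a = "String, with a 3. in it!"

# The optional third argument limits how many occurrences are replaced
limited = "it it it".replace("it", "IT", 2)
print(limited)  # IT IT it

# Looping over the characters to drop is equivalent to chaining replace()
chars_to_remove = "!,."
b = a
for char in chars_to_remove:
    b = b.replace(char, "")
print(b)  # String with a 3 in it
```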
In Python, RegEx (Regular Expression) is a tool for matching and manipulating strings. Python provides a built-in module called re, which allows working with regular expressions.
The re module provides several functions for working with regular expressions, including search, split, findall, and sub.
There are several special sequences in RegEx that can be used to match specific characters or patterns.
| Special Sequence | Meaning | Example |
|---|---|---|
| \d | Matches any digit character (0-9) | "123" matches "\d\d\d" |
| \D | Matches any non-digit character | "hello" matches "\D\D\D\D\D" |
| \w | Matches any word character (a-z, A-Z, 0-9, and _) | "hello_world" matches "\w\w\w\w\w\w\w\w\w\w\w" |
| \W | Matches any non-word character | "@#$%" matches "\W\W\W\W" |
| \s | Matches any whitespace character (space, tab, newline, etc.) | "hello world" matches "\w\w\w\w\w\s\w\w\w\w\w" |
| \S | Matches any non-whitespace character | "hello_world" matches "\S\S\S\S\S\S\S\S\S\S\S" |
| \b | Matches the boundary between a word character and a non-word character | "cat" matches "\bcat\b" in "The cat sat on the mat" |
| \B | Matches any position that is not a word boundary | "cat" matches "\Bcat\B" in "concatenate" but not in "The cat sat on the mat" |
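
The difference between \b and \B is easy to get wrong, so here is a short check using re.search(). The test words are chosen for illustration: in "concatenate" the letters "cat" have word characters on both sides, while in "category" they sit at the start of the word, where a boundary exists.

```python
import re

sentence = "The cat sat on the mat"

# \b requires a word boundary on each side of "cat"
whole_word_in_sentence = bool(re.search(r"\bcat\b", sentence))
whole_word_in_concatenate = bool(re.search(r"\bcat\b", "concatenate"))

# \B requires that there is no word boundary on either side of "cat"
inner_in_concatenate = bool(re.search(r"\Bcat\B", "concatenate"))
inner_in_sentence = bool(re.search(r"\Bcat\B", sentence))
inner_in_category = bool(re.search(r"\Bcat\B", "category"))

print(whole_word_in_sentence)     # True
print(whole_word_in_concatenate)  # False
print(inner_in_concatenate)       # True
print(inner_in_sentence)          # False
print(inner_in_category)          # False: a boundary precedes the leading "c"
```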
The following is a simple example of using the \d special sequence in a regular expression pattern.
The regular expression pattern is defined as r"\d\d\d\d\d\d\d\d\d\d": the \d special sequence matches any digit character (0-9), and it is repeated ten times to match ten consecutive digits:
import re
pattern = r"\d\d\d\d\d\d\d\d\d\d" # Matches any ten consecutive digits
text = "My Phone number is 1234567890"
match = re.search(pattern, text)
if match:
    print("Phone number found:", match.group())
else:
    print("No match")
Will output:
Phone number found: 1234567890
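Repeating \d ten times works, but the same pattern can be written more compactly with a quantifier. The sketch below compares the two forms on the same text; the names repeated and quantified are illustrative.

```python
import re

text = "My Phone number is 1234567890"

repeated = r"\d\d\d\d\d\d\d\d\d\d"  # the pattern from the example above
quantified = r"\d{10}"              # same meaning: exactly ten digits in a row

match_repeated = re.search(repeated, text)
match_quantified = re.search(quantified, text)

print(match_repeated.group())    # 1234567890
print(match_quantified.group())  # 1234567890
```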
The next example uses the \W special sequence to match any character that is not a word character (a-z, A-Z, 0-9, or _). The string being searched is "Hello, world!".
pattern = r"\W" # Matches any non-word character
text = "Hello, world!"
matches = re.findall(pattern, text)
print("Matches:", matches)
Will output:
Matches: [',', ' ', '!']
The search() function searches for specified patterns within a string. Here is an example that explains how to use the search() function to search for the word "with" in the string "String with a 3 in it!".
string = "String with a 3 in it!"
# Define the pattern to search for
pattern = r"with"
# Use the search() function to search for the pattern in the string
result = re.search(pattern, string)
# Check if a match was found
if result:
    print("Match found!")
else:
    print("Match not found.")
Will output:
Match found!
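Beyond telling whether a match exists, the object returned by search() also records what was matched and where. A brief sketch using the same string:

```python
import re

string = "String with a 3 in it!"
result = re.search(r"with", string)

if result:
    print(result.group())  # with
    print(result.start())  # 7
    print(result.end())    # 11
    # Slicing with start() and end() recovers the matched text
    print(string[result.start():result.end()])  # with
```

The start index agrees with the value returned by string.find('with') earlier in this section.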
The findall() function finds all occurrences of a specified pattern within a string.
string = "String with some threes and also a biskit in it!"
# Use the findall() function to find all occurrences of "it" in the string
result = re.findall("it", string)
# Print out the list of matches
print(result)
Will output:
['it', 'it', 'it']
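findall() returns only the matched text. If the positions are needed as well, re.finditer() can be used instead, as in this sketch; the list positions is an illustrative name.

```python
import re

string = "String with some threes and also a biskit in it!"

# finditer() yields match objects, so the position of each occurrence is available
positions = [match.start() for match in re.finditer("it", string)]
print(positions)  # [8, 39, 45]
```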
The split() method splits the string at the specified separator (any whitespace by default) and returns a list:
# Split the string into a list of words
string = "String with a 3 in it!"
split_string = string.split()
print(split_string)
Will output:
['String', 'with', 'a', '3', 'in', 'it!']
The re module's split() function splits a string into a list of substrings based on a specified pattern.
# Use the split function to split the string by the "\s" (Whitespace)
split_array = re.split(r"\s", string)
# The split_array contains all the substrings, split by whitespace characters
print(split_array)
Will output:
['String', 'with', 'a', '3', 'in', 'it!']
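The two approaches give the same result here only because the words are separated by single spaces. The sketch below uses a hypothetical string with repeated spaces to show where they differ, and how the pattern \s+ brings them back into agreement.

```python
import re

spaced = "String  with   a 3"  # note the repeated spaces

by_method = spaced.split()
by_single = re.split(r"\s", spaced)
by_run = re.split(r"\s+", spaced)

print(by_method)  # ['String', 'with', 'a', '3']
print(by_single)  # ['String', '', 'with', '', '', 'a', '3']
print(by_run)     # ['String', 'with', 'a', '3']
```

With \s, every individual whitespace character is a separator, so empty strings appear between adjacent spaces.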
The sub() function of a regular expression in Python is used to replace all occurrences of a pattern in a string with a specified replacement.
string = "String with a 3 in it!"
# Define the regular expression pattern to search for
pattern = "String"
# Define the replacement string
replacement = "Biskit"
# Use the sub function to replace the pattern with the replacement string
new_string = re.sub(pattern, replacement, string, flags=re.IGNORECASE)
# The new_string contains the original string with the pattern replaced by the replacement string
print(new_string)
Will output:
Biskit with a 3 in it!
While the two perform similar tasks, sub() supports regular expressions, allowing for complex pattern matching. It can match and replace based on patterns such as character classes, repetitions, and groupings. It also allows the use of back references and can handle more complex replacements.
While more powerful and flexible than replace(), sub() is generally slower and carries more overhead, so for simple find-and-replace tasks replace() may be the better option.
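As an illustration of the groupings and back references mentioned above, the sketch below performs two replacements that replace() alone cannot express. The patterns and variable names are examples chosen here, not part of the original text.

```python
import re

string = "String with a 3 in it!"

# Wrap every digit in square brackets; \1 refers back to the captured group
bracketed = re.sub(r"(\d)", r"[\1]", string)
print(bracketed)  # String with a [3] in it!

# Swap the first two words using two groups and two back references
swapped = re.sub(r"^(\w+) (\w+)", r"\2 \1", string)
print(swapped)  # with String a 3 in it!
```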
Exercise 1:
What are the values of the variables a, b and c in the following code?
a = "1"
b = "2"
c = a + b
Solution:
"1"
"2"
"12"
Exercise 2:
Use slicing to print out the first three elements of the variable d:
d = "ABCDEFG"
Solution:
d = "ABCDEFG"
print(d[:3])
# or
print(d[0:3])
Will output:
ABC
Exercise 3:
Use a stride value of 2 to print out every second character of the string e:
e = 'clocrkr1e1c1t'
Solution:
e = 'clocrkr1e1c1t'
print(e[::2])
Will output:
correct
Exercise 4:
Print out a backslash:
Solution:
print("\\")
# or
print(r"\ ")
Will output:
\
Exercise 5:
Convert the variable f to uppercase:
f = "You are wrong"
Solution:
f = "You are wrong"
f.upper()
Will output:
'YOU ARE WRONG'
Exercise 6:
Convert the variable f2 to lowercase:
f2="YOU ARE RIGHT"
Solution:
f2="YOU ARE RIGHT"
f2.lower()
Will output:
'you are right'
Exercise 7:
Find the first index of the sub-string snow in the variable g:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
Solution:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
g.find("snow")
Will output:
95
Exercise 8:
In the variable g, replace the sub-string Mary with Bob:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
Solution:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
g.replace("Mary", "Bob")
Will output:
'Bob had a little lamb Little lamb, little lamb Bob had a little lamb Its fleece was white as snow And everywhere that Bob went Bob went, Bob went Everywhere that Bob went The lamb was sure to go'
Exercise 9:
In the variable g, replace the sub-string , with .:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
Solution:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
g.replace(',','.')
Will output:
'Mary had a little lamb Little lamb. little lamb Mary had a little lamb Its fleece was white as snow And everywhere that Mary went Mary went. Mary went Everywhere that Mary went The lamb was sure to go'
Exercise 10:
In the variable g, split the string into a list of words:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
Solution:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
g.split()
Will output:
['Mary', 'had', 'a', 'little', 'lamb', 'Little', 'lamb,', 'little', 'lamb', 'Mary', 'had', 'a', 'little', 'lamb', 'Its', 'fleece', 'was', 'white', 'as', 'snow', 'And', 'everywhere', 'that', 'Mary', 'went', 'Mary', 'went,', 'Mary', 'went', 'Everywhere', 'that', 'Mary', 'went', 'The', 'lamb', 'was', 'sure', 'to', 'go']
Exercise 11:
In the string s3, find the four consecutive digit characters using \d and the search() function:
s3 = "House number- 1105"
Solution:
import re
s3 = "House number- 1105"
# Use the search() function to search for four consecutive digits in the string
result = re.search(r"\d\d\d\d", s3)
# Check if a match was found
if result:
    print("Digit found")
else:
    print("Digit not found.")
Will output:
Digit found
Exercise 12:
In the string str1, replace the sub-string fox with bear using the sub() function:
str1= "The quick brown fox jumps over the lazy dog."
Solution:
import re
str1= "The quick brown fox jumps over the lazy dog."
# Use re.sub() to replace "fox" with "bear"
new_str1 = re.sub(r"fox", "bear", str1)
print(new_str1)
Will output:
The quick brown bear jumps over the lazy dog.
Exercise 13:
In the string str2, find all the occurrences of woo using the findall() function:
str2= "How much wood would a woodchuck chuck, if a woodchuck could chuck wood?"
Solution:
import re
str2= "How much wood would a woodchuck chuck, if a woodchuck could chuck wood?"
# Use re.findall() to find all occurrences of "woo"
matches = re.findall(r"woo", str2)
print(matches)
Will output:
['woo', 'woo', 'woo', 'woo']