3.4.2. Strings

We have already been introduced to strings as a basic data type. Now we take a look at them again from a different point of view. Strings are containers. This means you can look at their insides and do things like check whether the first character is capitalized and whether the third character is “e”.

3.4.2.1. Indexing strings, string slices

To get at the inner components of strings, Python uses the same syntax and operators as it does for lists. The Pythonic conception is that both lists and strings belong to a ‘super’ data type, sequences. Sequence types are containers that contain elements in a particular order, so indexing by number makes sense for all sequences:

>>> X = 'dogs'
>>> X[0]
'd'
>>> X[1]
'o'
>>> X[-1]
's'

The following raises an IndexError, as it would with a 4-element list:

>>> X[4]
...
IndexError: string index out of range

Strings can also be one element long:

>>> Y = 'd'

Note

Unlike C, there is no special type for characters in Python. Characters are just one-element strings.

And they can be empty, just as lists can:

>>> Z = ''

As with lists, you can check the contents of strings. So:

>>> 'd' in X
True
>>> 'do' in X
True
>>> 'dg' in X
False

So not just any character (like ‘d’) but any substring (like ‘do’) of a string is regarded as in the string. However, such a substring must contain all the characters starting at one index up to and including the character at some higher index, without skipping any. So ‘dg’ is not in X. Such contiguous substrings are called slices. Python provides easy access to slices of a string, just as it does for lists. The following examples illustrate how to make such references:

>>> X[0:2] # string of 1st and 2nd characters
'do'
>>> X[:-1] # string excluding last character
'dog'
>>> X[1:3] # string of 2nd and 3rd characters
'og'

Keep in mind the following rule when picking slices of a Pythonic sequence X. The slice X[i:j] will start at X[i] and it will have length j-i. Thus, it will not include X[j].

Guido van Rossum says: “The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n”:

 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0…5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

For nonnegative indices, the length of a slice is the difference of the indices, if both are within bounds, e.g., the length of word[1:3] is 2.
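
The slicing rule can be checked directly. The following is a minimal sketch that uses the five-character string from the diagram above; the names word, first_two, middle and last_two are chosen here purely for illustration.

```python
# 'word' is an illustrative name; the string matches the diagram above.
word = 'HelpA'

# A slice runs from edge i up to, but not including, edge j.
first_two = word[0:2]      # characters between edges 0 and 2
middle = word[1:3]         # characters between edges 1 and 3
last_two = word[-2:]       # from edge -2 to the end of the string

print(first_two)           # He
print(middle)              # el
print(last_two)            # pA

# For in-bounds nonnegative indices, the length is j - i.
print(len(middle) == 3 - 1)    # True

# Omitting both indices copies the whole string.
print(word[:] == word)     # True

# Out-of-range slice indices are clipped rather than raising IndexError.
print(word[3:100])         # pA
```

Note the contrast with plain indexing: word[100] raises an IndexError, while a slice that runs past the end simply stops at the last character.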

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

Strings can also be concatenated into longer sequences, just as lists can:

>>> X + Y
'dogsd'

Using the name of the type as a function gives us a way of making strings, just as it did with lists:

>>> One = str(1)

One is a string, not an int!

>>> One
'1'

Python reminds us of this when printing it by including string quotes. We can turn the string back into an integer just by using the official name of the integer type, int. Note that the string quotes disappear:

>>> I = int(str(1))
>>> I
1

And as with lists, calling the type with no arguments produces the empty string:

>>> Empty = str()
>>> Empty
''

There is one thing that can be done with lists that cannot be done with strings: assigning a new value to an element.

>>> 'spin'[2]= 'a'
...
TypeError: 'str' object does not support item assignment

This can be worked around either by avoiding the assignment, for example by building a new string, or by converting the string into a mutable sequence, such as a list, that contains the same information.
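
Both workarounds can be sketched as follows, assuming the goal is to get 'span' from 'spin'. The names word, letters, fixed and rebuilt are illustrative only, and the join method used here is described in the section on splitting and joining.

```python
word = 'spin'

# Workaround 1: convert to a list (mutable), assign, then join back.
letters = list(word)           # ['s', 'p', 'i', 'n']
letters[2] = 'a'               # item assignment is allowed on lists
fixed = ''.join(letters)
print(fixed)                   # span

# Workaround 2: avoid assignment; build a new string from slices.
rebuilt = word[:2] + 'a' + word[3:]
print(rebuilt)                 # span

# The original string is unchanged in both cases.
print(word)                    # spin
```

The slice-based version is usually enough for a single change; the list-based version may be more convenient when many positions have to be modified.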

See also

Section Mutability (advanced).

3.4.2.2. Splitting and Joining

There is an easy and important way to go from a string to a list. This is the split method, which returns the list obtained by splitting the string at a given separator. If no separator is given, the string is split at whitespace. Thus:

>>> 'cats are fun'.split()
['cats', 'are', 'fun']

The inverse operation is the join operation, which joins the elements of a list of strings into a single string. To undo the above we do:

>>> ' '.join(['cats', 'are', 'fun'])
'cats are fun'

A common use for join is to take a list of strings and produce a single string with line breaks:

>>> print('\n'.join(['Roses are red.', 'Violets are blue.', 'Sugar is sweet', 'but not so pooh.']))
Roses are red.
Violets are blue.
Sugar is sweet
but not so pooh.
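
A few further properties of split and join are sketched below. The sample strings and the names line, record, fields and numbers are made up for illustration.

```python
line = 'cats   are\tfun'

# With no argument, split() breaks on any run of whitespace.
print(line.split())            # ['cats', 'are', 'fun']

# With an explicit separator, every occurrence splits, so
# consecutive separators produce empty strings.
record = 'a,b,,c'
fields = record.split(',')
print(fields)                  # ['a', 'b', '', 'c']

# join is the inverse when the same separator is used.
print(','.join(fields) == record)   # True

# join requires strings; convert other values first.
numbers = [1, 2, 3]
print('-'.join(str(n) for n in numbers))   # 1-2-3
```

Passing a list that contains non-strings directly to join raises a TypeError, which is why the last line converts each number with str first.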

3.4.2.3. Unicode

Python 2.X maintained distinct unicode and string types. That distinction has been abolished in Python 3.X. Unicode-bearing strings are written the same way other strings are written. Everything that has been said about strings is also true of strings containing Unicode characters: they are still immutable sequences of characters, and indexing and slicing work the same way. As we will see below, they also have the same methods as ordinary strings.

What is the point, then? The point is that Unicode can represent characters (and writing systems) that ASCII-based strings can’t. There are 128 official ASCII characters, extended to 256 in various semi-official standards. There are 1,114,112 possible code points in modern Unicode (17 times the original Unicode setup of 65,536 characters), and somewhat more than 10% of this space has been assigned to characters in various writing systems, international symbols, and emoji.

Here’s one way to define a string that includes Cyrillic characters. Each Unicode character is associated with a number called its code point; we simply type the code point numbers, written as four hexadecimal digits, into the string, preceding each of them with “\u”:

>>> print('\u0420\u043e\u0441\u0441\u0438\u044f')
Россия
>>> russia = '\u0420\u043e\u0441\u0441\u0438\u044f'
>>> russia
'Россия'
>>> print(russia)
Россия
>>> print(russia[0])
Р
>>> russia[0]
'Р'
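
The relation between characters and code points can be explored with the built-in functions ord() and chr(), which are not otherwise discussed in this section. The following is a small sketch under that assumption; the names code_point and encoded are illustrative.

```python
russia = '\u0420\u043e\u0441\u0441\u0438\u044f'

# len() counts characters (code points), not bytes.
print(len(russia))                     # 6

# ord() gives the code point of a one-character string; chr() inverts it.
code_point = ord(russia[0])
print(hex(code_point))                 # 0x420
print(chr(code_point) == russia[0])    # True

# Encoding to UTF-8 produces bytes; each of these Cyrillic letters
# takes two bytes, so the byte length differs from the string length.
encoded = russia.encode('utf-8')
print(len(encoded))                    # 12

# Decoding the bytes recovers the original string.
print(encoded.decode('utf-8') == russia)   # True
```

The point of the last two steps is only that the length of a string is measured in characters, whatever number of bytes is needed to store them.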

3.4.2.4. String/Unicode methods

In all the following examples, S is a string. This is just a sample. See the official Python docs for the complete list of string methods. Or just type help(str) at the Python prompt!

S.capitalize()

Return a string just like S, except that its first character is uppercase and the rest are lowercase. If S already has that form, the result is identical to S.

S.count(x)

Return the number of times x appears in the string S.

S.index(x)

Return the index in S of the first substring that is identical to x. A ValueError is raised if there is no such substring.

S.find(t)

Return the index of the first instance of t in S, or -1 if not found.

S.rfind(t)

Return the index of the last instance of t in S, or -1 if not found.

S.join(Seq)

Combine the strings of Seq into a single string using S as the glue. ' '.join(["See", "John", "run"]) produces:

"See John run"

S.replace(x,y)

Return a string in which every instance of the substring x in S is replaced with y:

>>> X = 'abracadabra'
>>> X.replace('dab','bad')
'abracabadra'
>>> X.replace('a','b')
'bbrbcbdbbrb'

S.split(t)

Split S into a list wherever a t is found. If t is not supplied, split wherever a run of whitespace is found.

S.splitlines()

Split S into a list of strings, one per line.

S.strip()

Copy of S without leading or trailing whitespace.

S.title()

Return a string just like S in which all words are capitalized:

>>> 'los anGeles'.title()
'Los Angeles'

S.istitle()

Return True if every word in S is capitalized and its remaining letters are lowercase. Otherwise, return False:

>>> 'los anGeles'.istitle()
False
>>> 'Los AnGeles'.istitle()
False
>>> 'Los Angeles'.istitle()
True

S[::-1]

Return a copy of S with its characters in reverse order. Note that strings, unlike lists, have no reverse() method: list.reverse() changes a list in place, but strings are immutable, so this slice, like all the methods above, returns a new string and leaves S unchanged.
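
Several of the methods listed in this section are exercised in the sketch below. The sample text and the name stripped are made up for illustration; the positions in the comments refer to that sample only.

```python
S = '  the cat sat on the mat\nthe end  '

stripped = S.strip()               # remove leading/trailing whitespace
print(stripped.splitlines())       # ['the cat sat on the mat', 'the end']

print(stripped.count('the'))       # 3
print(stripped.find('the'))        # 0
print(stripped.rfind('the'))       # 23
print(stripped.find('dog'))        # -1

# index() behaves like find() but raises ValueError when nothing matches.
try:
    stripped.index('dog')
except ValueError:
    print('not found')

print('hello WORLD'.capitalize())  # Hello world

# A slice with step -1 gives the characters in reverse order.
print('dogs'[::-1])                # sgod

# Strings are immutable: S itself is unchanged by all of the above.
print(S == '  the cat sat on the mat\nthe end  ')   # True
```

Choosing between find and index is mostly a matter of how a missing substring should be handled: find lets the caller test for -1, while index signals the failure with an exception.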