This article describes how to split strings by delimiters, line breaks, regular expressions, and the number of characters in Python.
- Split by delimiter: split()
  - Specify the delimiter: sep
  - Specify the maximum number of split: maxsplit
- Split from right by delimiter: rsplit()
- Split by line break: splitlines()
- Split by regular expression: re.split()
  - Split by multiple different delimiters
- Concatenate list of strings
- Split based on the number of characters: slice
Split by delimiter: split()
Use the split() method to split a string by a single delimiter.
If the argument is omitted, the string is split on whitespace. Whitespace includes spaces, newlines \n, and tabs \t, and consecutive whitespace characters are treated as a single delimiter.
A list of the words is returned.
```python
s_blank = 'one two three\nfour\tfive'
print(s_blank)
# one two three
# four    five

print(s_blank.split())
# ['one', 'two', 'three', 'four', 'five']

print(type(s_blank.split()))
# <class 'list'>
```
Use join(), described below, to concatenate a list into a string.
Specify the delimiter: sep
Specify the delimiter with the first parameter, sep.
```python
s_comma = 'one,two,three,four,five'

print(s_comma.split(','))
# ['one', 'two', 'three', 'four', 'five']

print(s_comma.split('three'))
# ['one,two,', ',four,five']
```
If you want to specify multiple delimiters, use regular expressions as described later.
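Unlike the no-argument form, an explicit sep does not merge consecutive delimiters. The following is a minimal sketch of that difference; the variable names s_spaces and s_gaps are chosen only for illustration.

```python
s_spaces = 'one  two   three'  # two, then three consecutive spaces

# No argument: a run of whitespace counts as one delimiter.
result_default = s_spaces.split()
print(result_default)
# ['one', 'two', 'three']

# Explicit sep: every single space is a delimiter, so empty strings appear.
result_space = s_spaces.split(' ')
print(result_space)
# ['one', '', 'two', '', '', 'three']

# The same applies to any other delimiter.
s_gaps = 'one,,three'
result_comma = s_gaps.split(',')
print(result_comma)
# ['one', '', 'three']
```

If the empty strings are unwanted, they can be filtered out afterwards, for example with a list comprehension such as [x for x in result_comma if x].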
Specify the maximum number of split: maxsplit
Specify the maximum number of splits with the second parameter, maxsplit.
If maxsplit is given, at most maxsplit splits are done.
```python
print(s_comma.split(',', 2))
# ['one', 'two', 'three,four,five']
```
This is useful, for example, when you want to delete the first line from a string.
With sep='\n' and maxsplit=1, you get a list of strings split at the first newline character \n. The second element [1] of this list is the string without its first line. Since it is also the last element, it can be specified as [-1].
```python
s_lines = 'one\ntwo\nthree\nfour'
print(s_lines)
# one
# two
# three
# four

print(s_lines.split('\n', 1))
# ['one', 'two\nthree\nfour']

print(s_lines.split('\n', 1)[0])
# one

print(s_lines.split('\n', 1)[1])
# two
# three
# four

print(s_lines.split('\n', 1)[-1])
# two
# three
# four
```
Similarly, to delete the first two lines:
```python
print(s_lines.split('\n', 2)[-1])
# three
# four
```
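The pattern above can be wrapped in a small function. The sketch below is one possible way to do it; drop_first_lines is a hypothetical helper name, not a built-in. It also handles the case where the string has fewer lines than requested, in which indexing with [-1] would return the last line instead of an empty string.

```python
def drop_first_lines(text, n):
    """Return text without its first n lines (hypothetical helper)."""
    if n <= 0:
        return text
    parts = text.split('\n', n)
    # If the text has n lines or fewer, nothing remains after dropping them.
    return parts[n] if len(parts) > n else ''


s_lines = 'one\ntwo\nthree\nfour'

print(drop_first_lines(s_lines, 2))
# three
# four

print(repr(drop_first_lines(s_lines, 10)))
# ''
```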
Split from right by delimiter: rsplit()
rsplit() splits from the right of the string.
The result differs from split() only when the second parameter maxsplit is given.
In the same way as with split(), use rsplit() if you want to delete the last line.
```python
print(s_lines.rsplit('\n', 1))
# ['one\ntwo\nthree', 'four']

print(s_lines.rsplit('\n', 1)[0])
# one
# two
# three

print(s_lines.rsplit('\n', 1)[1])
# four
```
To delete the last two lines:
```python
print(s_lines.rsplit('\n', 2)[0])
# one
# two
```
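To make the relationship between split() and rsplit() concrete, the following short sketch compares the two on the same string. The variable name s_abcd is used only for illustration.

```python
s_abcd = 'a,b,c,d'

# With maxsplit, the split starts from opposite ends.
left = s_abcd.split(',', 1)
right = s_abcd.rsplit(',', 1)
print(left)
# ['a', 'b,c,d']
print(right)
# ['a,b,c', 'd']

# Without maxsplit, both methods return the same list.
same = s_abcd.split(',') == s_abcd.rsplit(',')
print(same)
# True
```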
Split by line break: splitlines()
There is also splitlines(), which splits at line boundaries.
As in the previous examples, split() and rsplit() split on whitespace, including line breaks, by default, and you can also specify a line break with the parameter sep.
However, it is often better to use splitlines().
For example, consider a string that mixes \n (LF), used by Unix-like systems including macOS, and \r\n (CR + LF), used by Windows.
```python
s_lines_multi = '1 one\n2 two\r\n3 three\n'
print(s_lines_multi)
# 1 one
# 2 two
# 3 three
```
When split() is called with no arguments, the string is split not only at line breaks but also at spaces.
```python
print(s_lines_multi.split())
# ['1', 'one', '2', 'two', '3', 'three']
```
Since only one newline sequence can be specified in sep, the string cannot be split correctly if different newline characters are mixed. The string is also split at the trailing newline, which leaves an empty string at the end of the list.
```python
print(s_lines_multi.split('\n'))
# ['1 one', '2 two\r', '3 three', '']
```
splitlines() splits at various newline characters but not at other whitespace.
```python
print(s_lines_multi.splitlines())
# ['1 one', '2 two', '3 three']
```
If the first argument keepends is set to True, the result includes the newline characters at the end of each line.
```python
print(s_lines_multi.splitlines(True))
# ['1 one\n', '2 two\r\n', '3 three\n']
```
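A few edge cases show where splitlines() and split('\n') diverge. The sketch below is only illustrative and uses short literal strings rather than the variables above.

```python
# An empty string yields an empty list with splitlines(),
# but a list containing one empty string with split('\n').
empty_splitlines = ''.splitlines()
empty_split = ''.split('\n')
print(empty_splitlines)
# []
print(empty_split)
# ['']

# A trailing newline does not produce an extra empty element with splitlines().
trailing_splitlines = 'a\n'.splitlines()
trailing_split = 'a\n'.split('\n')
print(trailing_splitlines)
# ['a']
print(trailing_split)
# ['a', '']

# Blank lines in the middle are kept as empty strings.
blank_middle = 'a\n\nb'.splitlines()
print(blank_middle)
# ['a', '', 'b']
```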
Split by regular expression: re.split()
split() and rsplit() split only where sep matches exactly.
To split on a regular expression pattern instead of an exact match, use split() from the re module.
In re.split(), specify the regular expression pattern as the first argument and the target string as the second.
The following example splits on runs of consecutive digits.
```python
import re

s_nums = 'one1two22three333four'

# A raw string avoids an invalid-escape warning for \d.
print(re.split(r'\d+', s_nums))
# ['one', 'two', 'three', 'four']
```
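The maximum number of splits can be specified with the third parameter, maxsplit. Passing it as a keyword argument is the safer choice, because positional use of maxsplit in re.split() is deprecated as of Python 3.13.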
```python
print(re.split(r'\d+', s_nums, maxsplit=2))
# ['one', 'two', 'three333four']
```
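One behavior of re.split() that str.split() does not offer: if the pattern contains a capturing group, the matched delimiters are included in the result. The following sketch reuses the digit pattern from above with parentheses added.

```python
import re

s_nums = 'one1two22three333four'

# Parentheses around the pattern keep the matched delimiters in the list.
with_delimiters = re.split(r'(\d+)', s_nums)
print(with_delimiters)
# ['one', '1', 'two', '22', 'three', '333', 'four']
```

This can be handy when the delimiters themselves carry information that is needed later.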
Split by multiple different delimiters
The following two patterns are worth remembering even if you are not familiar with regular expressions.
Enclosing characters in [] matches any single one of them, which lets you split on several different single-character delimiters.
```python
s_marks = 'one-two+three#four'

print(re.split('[-+#]', s_marks))
# ['one', 'two', 'three', 'four']
```
Patterns separated by | match any one of the alternatives. Each alternative can use regular expression special characters, but a plain string works as well. This lets you split on several different multi-character delimiters.
```python
s_strs = 'oneXXXtwoYYYthreeZZZfour'

print(re.split('XXX|YYY|ZZZ', s_strs))
# ['one', 'two', 'three', 'four']
```
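When the delimiters themselves contain regular expression special characters such as + or |, they have to be escaped before being joined with |. The sketch below shows one way to build such a pattern with re.escape(); the delimiter list and variable names are assumptions made for the example.

```python
import re

# Hypothetical delimiters that contain regex special characters.
delimiters = ['+', '||', '.']

# Escape each delimiter, and try longer ones first so that a short
# delimiter does not match part of a longer one.
ordered = sorted(delimiters, key=len, reverse=True)
pattern = '|'.join(re.escape(d) for d in ordered)

s_special = 'one+two||three.four'
parts = re.split(pattern, s_special)
print(parts)
# ['one', 'two', 'three', 'four']
```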
Concatenate list of strings
In the previous examples, splitting a string produced a list.
To concatenate a list of strings into a single string, use the string method join().
Call join() on the separator string, as in 'separator'.join(list), and pass the list of strings to concatenate as the argument.
```python
l = ['one', 'two', 'three']

print(','.join(l))
# one,two,three

print('\n'.join(l))
# one
# two
# three

print(''.join(l))
# onetwothree
```
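join() accepts only strings, so a list that contains numbers or other objects has to be converted first. The following is a small sketch of that case; the list items is made up for illustration.

```python
items = ['id', 42, 3.5]  # mixed types, not all strings

# Passing non-string elements to join() raises TypeError.
try:
    ','.join(items)
    raised = False
except TypeError:
    raised = True
print(raised)
# True

# Converting each element with str() first makes it work.
joined = ','.join(str(x) for x in items)
print(joined)
# id,42,3.5
```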
Split based on the number of characters: slice
Use slicing to split a string based on the number of characters.
```python
s = 'abcdefghij'

print(s[:5])
# abcde

print(s[5:])
# fghij
```
The parts can be obtained as a tuple or assigned to separate variables.
```python
s_tuple = s[:5], s[5:]

print(s_tuple)
# ('abcde', 'fghij')

print(type(s_tuple))
# <class 'tuple'>

s_first, s_last = s[:5], s[5:]

print(s_first)
# abcde

print(s_last)
# fghij
```
Split into three:
```python
s_first, s_second, s_last = s[:3], s[3:6], s[6:]

print(s_first)
# abc

print(s_second)
# def

print(s_last)
# ghij
```
The number of characters can be obtained with the built-in function len(), which makes it possible to split a string into halves.
```python
half = len(s) // 2
print(half)
# 5

s_first, s_last = s[:half], s[half:]

print(s_first)
# abcde

print(s_last)
# fghij
```
To concatenate strings, use the + operator.
```python
print(s_first + s_last)
# abcdefghij
```
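The slicing approach generalizes to splitting a string into pieces of a fixed length. The following is a minimal sketch using a list comprehension; split_by_length is a hypothetical helper name, not part of the standard library.

```python
def split_by_length(text, size):
    """Split text into chunks of at most size characters (hypothetical helper)."""
    if size <= 0:
        raise ValueError('size must be a positive integer')
    # Step through the string in increments of size and slice each chunk.
    return [text[i:i + size] for i in range(0, len(text), size)]


s = 'abcdefghij'
chunks = split_by_length(s, 3)
print(chunks)
# ['abc', 'def', 'ghi', 'j']

# Joining the chunks with an empty separator restores the original string.
print(''.join(chunks))
# abcdefghij
```

The last chunk is shorter when the length of the string is not a multiple of size.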