To check whether a string contains a substring in Python, we can use the `in` operator, the `__contains__` method, or the `find` and `index` string methods. In this article we will look at a code example for each approach and measure how long each one takes, so you can see which one performs best.
Code Example 1 – Using in
```python
def in_(haystack, needle):
    return needle in haystack


print(in_("Captain America is the first Avenger", "first"))
```
In this code example, we have defined a Python function `in_` which accepts two parameters, `haystack` and `needle`, and uses the `in` operator to check whether `needle` occurs in `haystack`. Notice that we named our function `in_` and not `in`. This is because `in` is a reserved keyword, so Python does not allow it as a function name.
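Here is a quick sketch of how `in_` behaves on a few inputs, reusing the sentence from the example. The extra needles `"second"`, `"captain"` and `""` are illustrative choices, not part of the original example. Note that the check is case-sensitive and that the empty string counts as a substring of every string.

```python
def in_(haystack, needle):
    # The `in` operator returns True if needle occurs anywhere in haystack
    return needle in haystack


sentence = "Captain America is the first Avenger"

print(in_(sentence, "first"))    # True
print(in_(sentence, "second"))   # False
print(in_(sentence, "captain"))  # False: the match is case-sensitive
print(in_(sentence, ""))         # True: the empty string is in every string
```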
Code Example 2 – Using __contains__
```python
def contains_(haystack, needle):
    return haystack.__contains__(needle)


print(contains_("Captain America is the first Avenger", "first"))
```
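Here we call the string's `__contains__` method directly. This is the special method that the `in` operator invokes for us, so both return the same result; calling it explicitly just adds an extra attribute lookup and method call.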
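To illustrate that `in` and `__contains__` perform the same check, here is a sketch with a hypothetical `Shelf` class (not part of the original article) that defines its own `__contains__` and counts how often it runs. Both `"shield" in shelf` and a direct `shelf.__contains__(...)` call end up in the same method.

```python
class Shelf:
    """Hypothetical container used only to show how `in` dispatches."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0  # counts how often __contains__ runs

    def __contains__(self, item):
        self.calls += 1
        return item in self.items


shelf = Shelf(["shield", "hammer"])
print("shield" in shelf)          # True, via Shelf.__contains__
print(shelf.__contains__("bow"))  # False, the same method called directly
print(shelf.calls)                # 2

# For built-in strings the two spellings agree as well
text = "Captain America is the first Avenger"
print(("first" in text) == text.__contains__("first"))  # True
```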
Code Example 3 – Using find
```python
def find_(haystack, needle):
    return haystack.find(needle) != -1


print(find_("Captain America is the first Avenger", "first"))
```
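`find` returns the index where `needle` first starts, or `-1` when it is missing, which is why `find_` compares the result with `-1` instead of treating it as a boolean. The sketch below shows the pitfall: a match at position 0 is falsy, while the "not found" value `-1` is truthy.

```python
sentence = "Captain America is the first Avenger"

# str.find returns the lowest index where the substring starts, or -1 if absent
print(sentence.find("first"))    # 23
print(sentence.find("second"))   # -1

# Pitfall: using the result directly as a condition gives the wrong answer
print(bool(sentence.find("Captain")))  # False, although "Captain" is found at index 0
print(bool(sentence.find("second")))   # True, although -1 means "not found"

# Comparing with -1, as find_ does, is correct in both cases
print(sentence.find("Captain") != -1)  # True
print(sentence.find("second") != -1)   # False
```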
Code Example 4 – Using index
```python
def index_(haystack, needle):
    try:
        haystack.index(needle)
    except ValueError:
        return False
    else:
        return True


print(index_("Captain America is the first Avenger", "first"))
```
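Unlike `find`, `index` raises a `ValueError` when the substring is missing, which is why `index_` wraps the call in `try`/`except`. When you also need the position, a small wrapper can turn the exception into a sentinel value. The `index_or_none` function below is a hypothetical helper for illustration, not something from the original article or the standard library.

```python
sentence = "Captain America is the first Avenger"


def index_or_none(haystack, needle):
    """Hypothetical helper: position of needle, or None when it is absent."""
    try:
        return haystack.index(needle)
    except ValueError:
        # str.index raises ValueError instead of returning -1
        return None


print(index_or_none(sentence, "first"))    # 23
print(index_or_none(sentence, "Captain"))  # 0
print(index_or_none(sentence, "second"))   # None
```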
Now let's compare all these methods and measure their running time:
```python
import timeit
import json


def in_(haystack, needle):
    return needle in haystack


def contains_(haystack, needle):
    return haystack.__contains__(needle)


def find_(haystack, needle):
    return haystack.find(needle) != -1


def index_(haystack, needle):
    try:
        haystack.index(needle)
    except ValueError:
        return False
    else:
        return True


# Each value is the fastest of several runs, where one run makes 1000 calls.
# ":True" means the substring is present, ":False" means it is absent.
perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('Captain America is the first Avenger', 'first'), number=1000)),
    'in:False': min(timeit.repeat(lambda: in_('Captain America is the first Avenger', 'second'), number=1000)),
    '__contains__:True': min(timeit.repeat(lambda: contains_('Captain America is the first Avenger', 'first'), number=1000)),
    '__contains__:False': min(timeit.repeat(lambda: contains_('Captain America is the first Avenger', 'second'), number=1000)),
    'find:True': min(timeit.repeat(lambda: find_('Captain America is the first Avenger', 'first'), number=1000)),
    'find:False': min(timeit.repeat(lambda: find_('Captain America is the first Avenger', 'second'), number=1000)),
    'index:True': min(timeit.repeat(lambda: index_('Captain America is the first Avenger', 'first'), number=1000)),
    'index:False': min(timeit.repeat(lambda: index_('Captain America is the first Avenger', 'second'), number=1000)),
}

print(json.dumps(perf_dict, indent=2))
```
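To measure each approach we use `timeit.repeat`, which runs the 1000-call timing loop several times (five by default) and returns the total time of each run. We keep the minimum, since the fastest run is the one least disturbed by other activity on the machine. When I ran the code, I got this output (times in seconds per 1000 calls):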
```json
{
  "in:True": 0.00028002727776765823,
  "in:False": 0.0002694167196750641,
  "__contains__:True": 0.0004137204959988594,
  "__contains__:False": 0.00040215253829956055,
  "find:True": 0.00045281555503606796,
  "find:False": 0.000452638603746891,
  "index:True": 0.0004562549293041229,
  "index:False": 0.000827358104288578
}
```
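From the benchmark, you can see that the `in` operator is the fastest option: roughly 1.5 times faster than `__contains__`, `find` and `index` when the substring is present, and about three times faster than `index` when it is absent, because raising and catching a `ValueError` is expensive. These timings come from a single machine, so expect the absolute numbers to vary on yours.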
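The numbers above also include the cost of calling a wrapper function through a lambda on every iteration, which is the same for all four approaches and can blur the differences between them. Below is a sketch that times the bare checks instead, passing the variables through the `globals` argument of `timeit.repeat`. The `best_time` helper and the loop count of 100,000 are illustrative choices, not part of the original benchmark, and your absolute times will differ from the ones above, so compare the results relative to each other.

```python
import timeit

haystack = "Captain America is the first Avenger"

# Each statement is the bare check, without a wrapper function or lambda
statements = {
    "in": "needle in haystack",
    "__contains__": "haystack.__contains__(needle)",
    "find": "haystack.find(needle) != -1",
    "index": (
        "try:\n"
        "    haystack.index(needle)\n"
        "except ValueError:\n"
        "    pass"
    ),
}


def best_time(stmt, needle, number=100_000, repeat=5):
    # Run the statement `number` times per run, `repeat` runs, keep the fastest run
    env = {"haystack": haystack, "needle": needle}
    return min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat))


results = {}
for name, stmt in statements.items():
    for needle in ("first", "second"):
        # Key format matches the original benchmark, e.g. "in:True"
        results[f"{name}:{needle == 'first'}"] = best_time(stmt, needle)

# Print from fastest to slowest
for key, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{key:20} {seconds:.6f} s")
```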