Regular Expression in Python
In the early days of computing, text processing and pattern matching were a huge challenge. There were no standards and no dedicated pattern-matching characters, so finding text patterns in a large file was extremely difficult.
That changed in the 1950s, when the American mathematician Stephen Kleene introduced regular expressions, which went on to transform text processing, pattern matching, and bulk data manipulation. The regular expression metacharacter known as the Kleene star (*) is named after him.
What is a Python Regular Expression?
A Python regular expression is a sequence of characters, some of them metacharacters, that defines a search pattern. We use these patterns with string-searching operations to “find” or “find and replace” text in strings. In other words, the pattern is a string that spells out what to match.
The term “regular expression” is frequently shortened to “regex”.
In this article, we will learn the basics of regular expressions in Python using the built-in re module, which is brought in with import re.
Regular expressions are used in many applications that involve a lot of text processing. Some programming languages, such as Perl and Ruby, build support for regular expressions into the language syntax, whereas others, such as C, C++, and Python, provide it through libraries.
What are the uses of Regular Expressions?
Since the job of a regex is to find and/or replace a given pattern, regexes are useful wherever pattern matching is a top priority. Some major applications are:
- Text Editors
- Search Engines and Search Mechanism back-ends of websites and APIs
- Code Editors and IDEs
- Data Entry Software
- Form and User Input data validation
- Data Analytics, Web Scraping
Python Regex – Metacharacters
Every character in a Python regex is either a metacharacter or a regular character. A metacharacter has a special meaning, whereas a regular character matches itself.
Because the backslash is special both in regexes and in ordinary Python string literals, patterns are usually written as raw strings. A raw string does not handle backslashes in any special way; to get one, prepend an r to the pattern. Without it, you have to write '\\\\' to match a single backslash character. With it, you only need r'\\'.
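As a small sketch of the raw-string point, the snippet below builds the same backslash-matching pattern both ways and uses it on a sample string. The variable names and the sample path are made up for illustration.

```python
import re

# A regex that matches one literal backslash is the two-character pattern \\ .
# As a normal string literal that takes four backslashes; as a raw string, two.
normal_pattern = '\\\\'
raw_pattern = r'\\'
same_pattern = normal_pattern == raw_pattern

# Hypothetical sample text containing a single backslash.
path = r'C:\temp'
backslash_index = re.search(raw_pattern, path).start()

print(same_pattern)      # True
print(backslash_index)   # 2
```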
Some of the basic metacharacters used in regex are:
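As a rough illustration, the sketch below exercises a handful of commonly used metacharacters on one sample string. The sample text and variable names are assumptions chosen for the demonstration, and the selection is not exhaustive.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'cat cut coat 42'

any_char = re.findall(r'c.t', text)      # .  any single character except a newline
optional = re.findall(r'co?at', text)    # ?  zero or one of the preceding item
char_set = re.findall(r'c[au]t', text)   # [] any one character from the set
either = re.findall(r'cut|coat', text)   # |  either alternative
digits = re.findall(r'\d+', text)        # \d a digit, + one or more of them
at_start = re.search(r'^cat', text)      # ^  anchors the match at the start
at_end = re.search(r'42$', text)         # $  anchors the match at the end

print(any_char)   # ['cat', 'cut']
print(optional)   # ['cat', 'coat']
print(char_set)   # ['cat', 'cut']
print(either)     # ['cut', 'coat']
print(digits)     # ['42']
print(at_start is not None, at_end is not None)   # True True
```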
Finding Patterns in Text
The most common use for regex is to search for patterns in text. The search() function takes the pattern and the text to scan, and returns a Match object when the pattern is found. If the pattern is not found, search() returns None.
Each Match object holds information about the nature of the match, including the original input string, the regular expression used, and the location within the original string where the pattern occurs.
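A minimal sketch of both outcomes, assuming a made-up sample sentence: one search that succeeds and exposes the information stored on the Match object, and one that fails and returns None.

```python
import re

# Hypothetical sample text for illustration.
text = 'regex makes text search easy'

found = re.search('text', text)     # pattern occurs, so a Match object comes back
missing = re.search('perl', text)   # pattern does not occur, so None comes back

print(missing)            # None
print(found.re.pattern)   # text
print(found.string)       # regex makes text search easy
print(found.span())       # (12, 16)
```

Because a failed search gives None, it is usually safest to test the result with an if statement before calling any Match methods on it.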
Here is an example pattern and the text in which we want to search for it.
Now we will call the search() function from the re module and store the result in a variable.
We then take two more variables to hold the start and end index of the matched pattern in our text.
We will print the result with the help of the print() function.
The start() and end() methods give the indexes into the string showing where the text matched by the pattern occurs.
So our output will look like this:
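The steps above can be sketched roughly as follows. The pattern, the text, and the variable names s and e are assumptions picked for illustration, not necessarily the values originally used.

```python
import re

# Assumed example values for the pattern and the text to scan.
pattern = 'this'
text = 'Does this text match the pattern?'

# Search the text and keep the resulting Match object.
match = re.search(pattern, text)

# Start and end index of the matched text.
s = match.start()
e = match.end()

print('Found "{}" in "{}" from {} to {} ("{}")'.format(
    match.re.pattern, match.string, s, e, text[s:e]))
# Found "this" in "Does this text match the pattern?" from 5 to 9 ("this")
```

Note that end() is one past the last matched character, so text[s:e] slices out exactly the matched text.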
Compiling Expressions
The re module includes module-level functions for working with regular expressions as text strings, but it is more efficient to compile the expressions a program uses frequently. The compile() function converts an expression string into a compiled pattern object.
Take a look at one example of compiling expressions: we use a list comprehension to compile two text patterns with the compile() function.
Now we will define our text data.
We then print our text data.
Next we run a for loop over the regexes variable and print which pattern we are seeking in the text data.
We use an if statement to report when the compiled pattern is found in the text data.
Then we use an else clause to report when the compiled pattern is not found in the text data.
Finally, the complete output will be:
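A sketch of this walkthrough, under the assumption that the two patterns are 'this' and 'that' and that the text is the sample sentence below. The results dictionary is a hypothetical addition used only to record each outcome.

```python
import re

# Compile two assumed patterns up front with a list comprehension.
regexes = [re.compile(p) for p in ['this', 'that']]

# Assumed sample text.
text = 'Does this text match the pattern?'
print('Text: {!r}\n'.format(text))

results = {}  # hypothetical helper: pattern -> whether it was found
for regex in regexes:
    print('Seeking "{}" ->'.format(regex.pattern), end=' ')
    if regex.search(text):
        print('match!')
        results[regex.pattern] = True
    else:
        print('no match')
        results[regex.pattern] = False

# Prints:
# Text: 'Does this text match the pattern?'
#
# Seeking "this" -> match!
# Seeking "that" -> no match
```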
The module-level functions maintain a cache of compiled expressions, but the cache has a limited size, and using compiled expressions directly avoids the overhead associated with cache lookup. Another advantage of precompiled regex objects is that, by compiling all of the expressions when the module is loaded, the compilation work is shifted to application start time instead of occurring at a point where the program may be responding to a user action.
Multiple Matches
The example patterns so far have all used search() to look for single instances of text strings. The findall() function returns all of the substrings of the input that match the pattern, without overlapping.
For instance, we take a pattern and some text data.
Now we will use the findall() function to find every ‘ab’ in our text data.
The output will look like this:
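A short sketch of findall(), assuming the sample text below. The second call is an extra illustration of what "without overlapping" means in practice.

```python
import re

# Assumed sample values for illustration.
pattern = 'ab'
text = 'abbaaabbbbaaaaa'

matches = re.findall(pattern, text)
print(matches)   # ['ab', 'ab']

# Matches never overlap: 'aaaa' holds two separate 'aa' matches, not three.
non_overlapping = re.findall('aa', 'aaaa')
print(non_overlapping)   # ['aa', 'aa']
```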
The finditer() function returns an iterator that produces Match instances instead of the strings returned by findall().
To see finditer() in use, we will again take a pattern and text data.
Now we will run a for loop over the finditer() function, passing it our pattern and string data.
We will get both occurrences of ‘ab’ in the output, along with their starting and ending index in the string.
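The loop described above might look like the following sketch. The sample text is an assumption, and the spans list is a hypothetical addition that collects the positions so they can be inspected afterwards.

```python
import re

# Assumed sample values for illustration.
pattern = 'ab'
text = 'abbaaabbbbaaaaa'

spans = []  # hypothetical helper: (start, end) of every match
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    spans.append((s, e))
    print('Found {!r} at {}:{}'.format(text[s:e], s, e))

# Prints:
# Found 'ab' at 0:2
# Found 'ab' at 5:7
```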
Conclusion
These were the basics of Python regular expressions. Try experimenting, and make a small project with them.
For instance, take a large batch of data that may contain people's names, phone numbers, or email addresses, then try to extract every phone number and email address from it. Honestly, it is really cool to have such a tool in hand.
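One possible starting point for such a project is sketched below. The sample records are fictional, and both patterns are deliberately simple assumptions: they expect phone numbers in NNN-NNN-NNNN form and plain email addresses, so they will miss many real-world formats.

```python
import re

# Made-up sample records; the names, numbers, and addresses are fictional.
data = """
Alice Smith, 555-123-4567, alice@example.com
Bob Jones, 555-987-6543, bob.jones@example.org
"""

# Simplified patterns, compiled once up front.
phone_re = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')
email_re = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

phones = phone_re.findall(data)
emails = email_re.findall(data)

print(phones)   # ['555-123-4567', '555-987-6543']
print(emails)   # ['alice@example.com', 'bob.jones@example.org']
```

The group in the email pattern is written as non-capturing, (?:...), so that findall() returns each whole address rather than just the text captured by the group.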