[ Curiosity,Experimentation ]

Random stuff from the parallel universe of Ones and Zeroes

A Case of Regular Expression [Python]

Posted by appusajeev on March 30, 2010


Regular expressions can be really cool when you need to search for and extract particular text patterns. Today I had to extract the Microprocessor Based Design (ironically, there's absolutely no design involved :D) external marks of 66 students from 66 different web pages, because the university site publishes each student's result on a separate page.
Python was of course the obvious choice. 🙂
At first I thought of using file operations and plain string matching to do the job, but I later decided to use regular expressions. I searched each page's HTML for the text between “<td>MICROPROCESSOR BASED DESIGN</td>” and the first “</td>” that follows it. If you look at the page source, that is the pattern to match in this case.
The match contains the external mark, but it also contains some HTML tags that need to be removed. Getting the mark out of the matched section is simple enough that ordinary string matching suffices. Each extracted mark is then saved to a file.
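The two steps just described can be sketched on a single page as follows. This is a minimal illustration and not the author's code (which is not shown in text form). The page markup below is made up, so the real university HTML may differ. The names `PAGE`, `SUBJECT_CELL`, `strip_tags` and `extract_mark` are hypothetical. The regex uses the non-greedy `*?` so that the match ends at the first `</td>`. The leftover tags are then removed with plain string handling rather than a second regex.

```python
import re

# Hypothetical markup for one result page; the real page source is not shown.
PAGE = """
<tr>
  <td>MICROPROCESSOR BASED DESIGN</td>
  <td align=center><b>38</b></td>
</tr>
"""

SUBJECT_CELL = "<td>MICROPROCESSOR BASED DESIGN</td>"

# Step 1 (regex): the subject cell plus everything up to the first following
# "</td>". The non-greedy "*?" stops at the earliest "</td>" it can reach.
SECTION = re.compile(r"<td>MICROPROCESSOR BASED DESIGN</td>[a-zA-Z0-9=<>\s/]*?</td>")


def strip_tags(fragment):
    """Step 2 (plain string handling): drop every character between '<' and '>'."""
    kept = []
    inside_tag = False
    for ch in fragment:
        if ch == "<":
            inside_tag = True
        elif ch == ">":
            inside_tag = False
        elif not inside_tag:
            kept.append(ch)
    return "".join(kept)


def extract_mark(html):
    """Return the mark as a string, or None if the subject cell is not found."""
    match = SECTION.search(html)
    if match is None:
        return None
    # Remove the subject cell itself, then the remaining tags around the mark.
    leftover = match.group(0).replace(SUBJECT_CELL, "", 1)
    return strip_tags(leftover).strip()


print(extract_mark(PAGE))  # → 38
```

Returning `None` when the subject cell is missing makes it easy to spot pages whose layout differs from the assumed one.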

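The regular expression for this would be “<td>MICROPROCESSOR BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>”. Shorter still, you could use “BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>”. The non-greedy *? makes the match end at the first “</td>” after the subject cell. With a greedy *, it could run on through later table cells, because the character class also matches tags.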
The expression cannot start with just “DESIGN”, because that would also match COMPUTER ORGANIZATION AND DESIGN, another subject!
The symbols <, >, /, \n and \t have to be included in the character class because the section to be extracted contains HTML tags and line breaks. Strictly speaking, \s already covers \n and \t.
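Both points above, the choice of anchor and where the match stops, can be checked on a small made-up snippet. This is a sketch that assumes hypothetical markup in which both subject names end in “DESIGN”. The character class is the post's, minus the redundant \n and \t. The variable names are illustrative only.

```python
import re

# Hypothetical markup: two subjects whose names both end in "DESIGN".
HTML = (
    "<tr><td>MICROPROCESSOR BASED DESIGN</td>\n"
    "<td><b>38</b></td></tr>\n"
    "<tr><td>COMPUTER ORGANIZATION AND DESIGN</td>\n"
    "<td><b>41</b></td></tr>\n"
)

# The post's character class; \s already covers \n and \t.
CLASS = r"[a-zA-Z0-9=<>\s/]"

# Anchoring on "DESIGN</td>" alone matches both subjects.
too_loose = re.findall(r"DESIGN</td>" + CLASS + r"*?</td>", HTML)

# "BASED DESIGN</td>" singles out the subject we actually want.
specific = re.findall(r"BASED DESIGN</td>" + CLASS + r"*?</td>", HTML)

# Greedy "*" swallows tags and letters, then backs off only to the LAST "</td>".
greedy = re.search(r"BASED DESIGN</td>" + CLASS + r"*</td>", HTML).group(0)

# Non-greedy "*?" stops at the FIRST "</td>" after the subject cell.
lazy = re.search(r"BASED DESIGN</td>" + CLASS + r"*?</td>", HTML).group(0)

print(len(too_loose), len(specific))  # → 2 1
print(repr(lazy))
```

In this snippet, the greedy version runs through the whole second row and ends after the mark 41. The non-greedy version ends right after 38.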

Here's the complete code. As you can see, regular expressions make this a lot easier than the usual approach.

Regular expression put to use
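The original program is only available as an image, so the following is not the author's code. It is a sketch of the same overall flow under some stated assumptions. The 66 result pages are assumed to be saved already as `.html` files in one directory, and each file name (without extension) is assumed to identify the student. The directory name, output file name and function names (`extract_mark`, `marks_from_pages`, `collect_marks`) are all hypothetical. For brevity, the leftover tags are removed with a second small regex, `<[^>]*>`, instead of the plain string matching the post describes.

```python
import re
from pathlib import Path

# Subject cell plus everything up to the first following "</td>" (non-greedy *?).
SUBJECT_CELL = "<td>MICROPROCESSOR BASED DESIGN</td>"
SECTION = re.compile(r"<td>MICROPROCESSOR BASED DESIGN</td>[a-zA-Z0-9=<>\s/]*?</td>")
TAG = re.compile(r"<[^>]*>")  # stands in for the post's plain string matching


def extract_mark(html):
    """Return the mark found in one result page, or None if the subject is absent."""
    match = SECTION.search(html)
    if match is None:
        return None
    leftover = match.group(0).replace(SUBJECT_CELL, "", 1)
    return TAG.sub("", leftover).strip()


def marks_from_pages(pages):
    """Map an iterable of (student, html) pairs to a list of (student, mark) pairs."""
    return [(student, extract_mark(html)) for student, html in pages]


def collect_marks(directory, output_file):
    """Hypothetical driver: read every *.html file in `directory`, write 'student mark' lines."""
    files = sorted(Path(directory).glob("*.html"))
    # Assumption: the file name (minus extension) identifies the student.
    pages = ((f.stem, f.read_text(errors="ignore")) for f in files)
    with open(output_file, "w") as out:
        for student, mark in marks_from_pages(pages):
            out.write(f"{student} {mark if mark is not None else 'NOT FOUND'}\n")
```

Suppose the pages were saved in a folder called `results`. Then `collect_marks("results", "marks.txt")` would write one line per student. Any page whose layout does not match the pattern is flagged as `NOT FOUND` instead of being skipped silently.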


6 Responses to “A Case of Regular Expression [Python]”

  1. Alex P said

    Great work.. Was really useful.. Would have taken a hell of time to make this list had it not been for this program..

  2. vidya s said

    well done dear!!:) wit in a short time its great what you have worked..:)
    go ahead:)

  3. I’ve always loved the power and flexibility that Regular Expressions give you as a developer.

    Nice job btw 🙂

  4. Aswin said

    This is great!! Glad to know that someone among us actually learned to use Regular Expressions. 🙂

    “it was needed to extract the Microprocessor Based Design(ironically,theres absolutely no design involved 😀 )” – 😀

  5. Nice work buddy!

  6. devidas said

    That was indeed nice application of reg exp.. good job ! 🙂
