A Case of Regular Expression [Python]

Posted by appusajeev on March 30, 2010

Regular expressions can be so cool at times, when there is a need to search for and find certain text patterns. Today I had to extract the Microprocessor Based Design (ironically, there's absolutely no design involved :D) external marks of 66 students from 66 different web pages, since the university site publishes each student's result on a separate page.
Python was of course the obvious choice. 🙂
At first I thought of using file operations and string matching to do the job, but later decided to use regular expressions. I tried to find all occurrences of the text between “<td>MICROPROCESSOR BASED DESIGN</td>” and the first following “</td>” in the HTML code of each page (if you look at the HTML, that is the pattern to be matched in this case).
This produces the section containing the external mark, but the section also contains some HTML tags in between, which need to be removed. Extracting the mark from that result is simple enough that ordinary string matching suffices. The extracted mark is then saved to a file.
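A minimal sketch of the two steps for a single page, assuming the mark sits in the table cell right after the subject cell (the sample HTML below is made up, not the university's actual markup). Step 1 is the regular expression; step 2 uses plain string methods to peel the remaining tags off, as described above. The names SUBJECT_CELL, SECTION and extract_mark are illustrative.

```python
import re

SUBJECT_CELL = "<td>MICROPROCESSOR BASED DESIGN</td>"

# Step 1: the subject cell, then (lazily) everything up to the first </td>.
# \s already covers \n and \t, so they are not listed separately here.
SECTION = re.compile(re.escape(SUBJECT_CELL) + r"[a-zA-Z0-9=<>\s/]*?</td>")


def extract_mark(html):
    """Return the text of the cell after the subject cell, or None if absent."""
    match = SECTION.search(html)
    if match is None:
        return None
    # Step 2: plain string matching. Drop the subject cell and the final </td> ...
    inner = match.group(0)[len(SUBJECT_CELL):-len("</td>")].rstrip()
    # ... peel off trailing closing tags such as </b> ...
    while inner.endswith(">"):
        inner = inner[: inner.rfind("<")].rstrip()
    # ... and keep whatever follows the last remaining '>' (the opening tags).
    return inner[inner.rfind(">") + 1:].strip()


# Hypothetical fragment of one student's result page.
sample = """<tr>
<td>MICROPROCESSOR BASED DESIGN</td>
<td align=center><b>41</b></td>
</tr>"""

print(extract_mark(sample))  # 41
```

Returning None when the subject cell is missing makes it easy to spot a page whose layout differs, instead of silently recording an empty mark.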

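The regular expression for this would be “<td>MICROPROCESSOR BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>”. Or, better still, you could try “BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>”. The ? after the * makes the repetition lazy, so the match stops at the first following “</td>” instead of running on to a later one.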
The expression cannot start with just “DESIGN”, because that would also match COMPUTER ORGANIZATION AND DESIGN, another subject!
The symbols <, >, \n, \t and / need to be included because the section to be extracted contains HTML tags and line breaks (strictly speaking, \s already covers \n and \t).
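Both choices can be checked on a small, made-up page fragment (the markup and the extra “P” cell are hypothetical). Anchoring the pattern at “DESIGN” matches both subjects, anchoring it at “BASED DESIGN” matches only one, and a lazy *? stops at the first “</td>” after the subject, whereas a plain greedy * runs on to the last “</td>” the character class can reach.

```python
import re

# Hypothetical fragment of one result page: two subjects end in "DESIGN",
# and another cell follows the microprocessor mark.
page = """<td>COMPUTER ORGANIZATION AND DESIGN</td>
<td><b>38</b></td>
<td>MICROPROCESSOR BASED DESIGN</td>
<td><b>41</b></td>
<td>P</td>"""

CLASS = r"[a-zA-Z0-9=<>\s/]"  # the post's character class (\s already includes \n, \t)

# Starting at "DESIGN" picks up both subjects.
too_loose = re.findall(r"DESIGN</td>" + CLASS + r"*?</td>", page)
# Starting at "BASED DESIGN" picks up only the one we want.
anchored = re.findall(r"BASED DESIGN</td>" + CLASS + r"*?</td>", page)

# Greedy * runs on to the last </td> the class can reach; lazy *? stops at the first.
greedy = re.search(r"BASED DESIGN</td>" + CLASS + r"*</td>", page).group(0)
lazy = re.search(r"BASED DESIGN</td>" + CLASS + r"*?</td>", page).group(0)

print(len(too_loose), len(anchored))  # 2 1
print(greedy.endswith("<td>P</td>"), lazy.endswith("<b>41</b></td>"))  # True True
```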

Here's the complete code. As you can see, regular expressions make things a lot easier than the usual approach.

Regular expression put to use
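A minimal sketch of a program along these lines, not the original listing. It assumes the result pages were first saved locally as .html files in one folder; the folder layout, the function names and the NOT FOUND marker are hypothetical. Unlike the string-matching step described earlier, this version strips the tags with a second small regex, a swapped-in choice that assumes no “>” appears inside attribute values.

```python
import re
from pathlib import Path

SUBJECT_CELL = "<td>MICROPROCESSOR BASED DESIGN</td>"
SECTION = re.compile(re.escape(SUBJECT_CELL) + r"[a-zA-Z0-9=<>\s/]*?</td>")
TAG = re.compile(r"<[^>]*>")  # any HTML tag; assumes no '>' inside attribute values


def extract_mark(html):
    """Text of the cell following the subject cell, or None if it is missing."""
    match = SECTION.search(html)
    if match is None:
        return None
    # Drop the subject cell, then remove every remaining tag.
    return TAG.sub("", match.group(0)[len(SUBJECT_CELL):]).strip()


def collect_marks(page_dir, out_file):
    """Extract the mark from every saved *.html page and write one line per page."""
    lines = []
    for page in sorted(Path(page_dir).glob("*.html")):
        mark = extract_mark(page.read_text(encoding="utf-8", errors="replace"))
        # Flag pages where the pattern did not match instead of skipping them silently.
        lines.append(f"{page.stem}\t{mark if mark is not None else 'NOT FOUND'}")
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling collect_marks("results", "marks.txt") would write one tab-separated line per saved page (file name, then mark), so a page whose layout differs shows up as NOT FOUND rather than disappearing from the list.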

  1. Alex P said

    Great work.. Was really useful.. Would have taken a hell of a time to make this list had it not been for this program..

  2. vidya s said

    well done dear!!:) within a short time, it's great what you have worked..:)
    go ahead:)

  3. I’ve always loved the power and flexibility that Regular Expressions give you as a developer.

    Nice job btw 🙂

  4. Aswin said

    This is great!! Glad to know that someone among us actually learned to use Regular Expressions. 🙂

    “it was needed to extract the Microprocessor Based Design(ironically,theres absolutely no design involved 😀 )” – 😀

  5. Nice work buddy!

  6. devidas said

    That was indeed nice application of reg exp.. good job ! 🙂

