Regular expressions can be really cool when you need to search for certain text patterns. Today I needed to extract the Microprocessor Based Design (ironically, there's absolutely no design involved) external marks of 66 students from 66 different web pages (the university site returns each student's result separately).
Python was of course the obvious choice.
At first I thought of using file operations and plain string matching to do the job, but later decided to use regular expressions. I tried to find all occurrences of the text between “<td>MICROPROCESSOR BASED DESIGN</td>” and the first following “</td>” in the HTML code of each page (that is the pattern to match in this case, if you look at the HTML code).
This produces the section containing the external mark, but it also contains some HTML tags that need to be removed. The mark has to be extracted from the result, and for this the usual string matching suffices. The extracted mark is then saved to a file.
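The clean-up step can be sketched with plain string operations, as below. This is only a rough sketch: the sample `section` string, the helper names `extract_mark` and `save_marks`, and the output file name are all made up for illustration, and the real markup on the university site may differ.

```python
def extract_mark(section):
    """Pull the mark out of a matched section using plain string operations."""
    # Drop the final closing tag, so the mark is the last thing in the string.
    before_close = section.rsplit("</td>", 1)[0]
    # Whatever follows the last '>' is the text of the cell, i.e. the mark.
    return before_close.rsplit(">", 1)[1].strip()


def save_marks(marks, path):
    """Write one mark per line to the file at `path` (hypothetical helper)."""
    with open(path, "w") as out:
        for mark in marks:
            out.write(mark + "\n")


# Hypothetical matched section; the real page layout may differ.
section = "<td>MICROPROCESSOR BASED DESIGN</td>\n\t<td align=center>57</td>"
mark = extract_mark(section)
print(mark)
```

Calling `save_marks([mark], "marks.txt")` would then write the result to a file; the file name is just an example.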
The regular expression for this would be “<td>MICROPROCESSOR BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*</td>”. Or better still, you could also try “BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*</td>”.
The expression cannot start with just “DESIGN” because it would also match COMPUTER ORGANIZATION AND DESIGN, another subject!
The symbols <, >, \n, \t and / need to be included because the section to be extracted contains HTML tags.
Here's the complete code. As you can see, regular expressions make things a lot easier than the usual approach.
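Here is a small sketch of the pattern at work on a made-up fragment of a result page (the `html` string is an assumption, not the real markup). It closes the pattern with `</td>` and uses a non-greedy `*?`, which is my own variation, so that the match stops at the first following `</td>`. It also shows why starting the pattern at “DESIGN” would pick up the other subject.

```python
import re

# Hypothetical fragment of one result page; the real markup may differ.
html = """
<tr><td>COMPUTER ORGANIZATION AND DESIGN</td>
    <td align=center>61</td></tr>
<tr><td>MICROPROCESSOR BASED DESIGN</td>
    <td align=center>57</td></tr>
"""

# Character class from the post; *? stops at the first </td> after the subject.
pattern = re.compile(r"BASED DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>")
matches = pattern.findall(html)

# Starting at "DESIGN" alone also matches COMPUTER ORGANIZATION AND DESIGN.
loose = re.findall(r"DESIGN</td>[a-zA-Z0-9=<>\s\n\t/]*?</td>", html)

print(len(matches), len(loose))
```

Note that `\s` already covers `\n` and `\t`, so the character class could be shortened without changing what it matches.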




![dos Port Scanning [Screenshot]](http://appusajeev.files.wordpress.com/2009/08/dos1.jpg)
