Thursday, 29 March 2012

Extracting All Hyperlinks From Webpages - Python

In this example, I am going to show how easily you can extract all the links in a webpage using python. If you are learning to write some small scale crawler, this can be a quick startup on how you can extract the links in any webpage.

Basically, we will send the http request to any webpage and we will read the HTML response except in the case when the connection can not be established. In such case, we will simply inform the user that we could not connect to the website.

For all these stuffs, we will import few modules and most important ones are re and urllib2 for regular expression stuff and HTTP request/response stuffs respectively.

We then write the regex for the hyperlinks for which we will make a search in the HTML data we get back after sending the request from the server. Note the <a href=[\'"]?([^\'" >]+). The small brackets are there to let us capture our necessary information i.e. the actual links.

Now you understood what we'll be doing, below is the python script to extract the hyperlinks from any webpage.

#!/usr/bin/python

import re, urllib2
from sys import argv

if (len(argv) != 2):
    print "No URL specified. Taking default URL for link extraction"
    url = "http://www.techgaun.com"
else:
    url = str(argv[1])
    
links_regex = re.compile('<a href=[\'"]?([^\'" >]+)', re.IGNORECASE)
url_request = urllib2.Request(url)
try:
    response = urllib2.urlopen(url_request)
    html = response.read()
    links = links_regex.findall(html)
    print '\n'.join(links)
except urllib2.URLError:
    print "Can't Connect to the website"

Now run the script as python extracter.py http://www.techgaun.com or any URL you wish to.

So isn't it a good start for writing your own simple web crawler? :P


Read more...

Wednesday, 28 March 2012

How To Fix NTFS Disk Partition From Linux

If you have problematic NTFS partition in your hard disk, you can fix many of the common NTFS inconsistencies from linux. Linux consists of a set of tools that allow you to manipulate and perform different types of actions on the NTFS partitions. This package is known as ntfsprogs.

If your linux distribution does not consist of the ntfsprogs package, you can install it by using the package manager tool that comes in your distribution or from command line. Debian and ubuntu users can type the following command:

sudo apt-get install ntfsprogs

Now to fix the NTFS drive, we must first determine the partition we want to fix. We can use the simplest one, the fdisk utility to determine the partition of hard disk we want to fix. Type the following command to view the list of partitions:

sudo fdisk -l

If you have more than one HDDs and want to view partitions of specific HDD, you can always do so by issuing the commands such as sudo fdisk -l /dev/sda or sudo fdisk -l /dev/sdb and so on.

Now lets suppose its /dev/sdb5 we need to fix. We can now use the ntfsfix command that comes in the ntfsprogs package.

sudo ntfsprogs /dev/sdb5

Note that it only repairs some fundamental NTFS inconsistencies, resets the NTFS journal file and schedules an NTFS consistency check for the first boot into Windows. You may run ntfsfix on an NTFS volume if you think it was damaged by Windows or some other way and it cannot be mounted.


Read more...

Saturday, 24 March 2012

Thoughts On Combining Compression and Encryption

One of the issues while talking about encryption and cryptography is how should we combine compression with encryption. Data compression is one of the tasks people often do. Combining compression and encryption needs some addressing since compression should always be done before the encryption and not the other way.

The results are generally not good if encryption is done before compressing the data. This is because of the nature of the encryption. Compression takes advantage of non-randomness of data but a good encryption generates the random stream of data which is unlikely to get good compression in cases of loss-less compressions. Of course, some image compression which are not loss-less will still get some compression.

Compression technology looks for the repeatability of data and performs compression by looking such patterns. Most encryption schemes transform the data such that it is random or very very close to being random. Output of good encryption scheme must be indistinguishable from truly random. And compressing the truly random data would not produce effective result. Hence, compress first and then do the encryption. :)


Read more...

Friday, 23 March 2012

How To Copy Text To Clipboard From Command Prompt

I had earlier posted about alternate data streams and the post consisted of texts copied from command line. I was on local IRC channel, one guy was curious if I was using the redirection operator to get the content from the command prompt. So I thought to share this simple tip to copy text from command prompt in windows. Follow the steps as below: 1) Right click anywhere on the command prompt window and then select the Mark option.
2) Now start selecting the text you need to copy using your mouse. You could keep on holding mouse and then do the selection. Alternatively, you could click on the starting point and then while holding the SHIFT key, click on the end of text you wish to copy.
3) After selecting the required text, just press Enter. Alternatively, you can right click on the top title bar of command prompt and then go to Edit -> Copy. If you are looking for copy pasting methods in linux terminals, you can read my article. I hope this helps some of you guys. :)


Read more...

Tuesday, 20 March 2012

Some Fun With Alternate Data Streams

I have not been blogging for a while because of exams but now I'm free for few days so here comes another post back from my home village. This time, I'm going to share some basic funs with alternate data streams from theory to some practical stuffs.

What is Alternate Data Stream Alternate Data Stream(ADS) is a kind of file system fork which allows more than one data stream to be associated with a single filename. Alternate Data Stream was introduced by Microsoft as a part of its NTFS file system. Alternate Data Streams are not shown by Windows Explorer and even the dir command and size of ADS is also excluded from the file size. The dir command however allows us to view the alternate data streams using the dir /R command in Windows Vista and above.

One use of ADS could be hiding the information as alternate data streams in the file but beware that copying the file to non-NTFS file systems will make you loose the information in the ADS. ADS was originally introduced to store file information and properties however any user can hide any kind of information in the ADS. Some malwares have utilized the ADS to hide their code so most antiviruses today also scan the ADS of any file to find anything fishy.

Note that the format used to create(and access) ADS is filename:ADSname. A relatively simple guide I had written a while ago is HERE.

Now lets move on to some interesting stuffs and for that, I am creating a directory named "samar" in Desktop. We will first create a simple text file by using the command below:

echo An ordinary text file > ads.txt

Now lets add an alternate data stream by issuing the following command:

echo I am secret > ads.txt:private.txt

Lets issue the dir command to see what it lists:

Volume in drive C has no label.
 Volume Serial Number is 90E7-CBCA

 Directory of C:\Users\SINDHUS\Desktop\samar

03/20/2012  09:58 AM    <DIR>          .
03/20/2012  09:58 AM    <DIR>          ..
03/20/2012  09:56 AM                24 ads.txt
               3 File(s)             24 bytes
               3 Dir(s)  22,683,332,608 bytes free

We can see no information regarding the alternate data stream we just added to the file and lets see if the type command shows anything by just opening the file.

C:\Users\SINDHUS\Desktop\samar>type ads.txt
An ordinary text file

So where is the private stuff we've put as ADS in the file? Even viewing the file from windows explorer does not show the content in ADS and of course the size is also not included. The point here is the malicious user might add something bad in the alternate data stream and send to a normal PC user. The unsuspecting user will not know if there's anything other than just the text file. Now lets see how we can see the alternate data stream.

For a while, lets pretend that we don't know that the ADS is added in the file. So first we will use the commands to see if there's any ADS in the file. The simplest one is to use dir /R command as below:

C:\Users\SINDHU'S\Desktop\samar>dir /R
 Volume in drive C has no label.
 Volume Serial Number is 90E7-CBCA

 Directory of C:\Users\SINDHU'S\Desktop\samar

03/20/2012  09:58 AM    <DIR>          .
03/20/2012  09:58 AM    <DIR>          ..
03/20/2012  09:56 AM                24 ads.txt
                                    14 ads.txt:private.txt:$DATA
03/20/2012  09:58 AM               496 info.txt
               2 File(s)            520 bytes
               3 Dir(s)  22,881,669,120 bytes free

We can see that besides the ads.txt file, there is another entry ads.txt:private:$DATA. By examining this file, we come to know that the alternate data stream with the name private is present in the file ads.txt and the alternate data stream is nothing but just the data. However, as stated earlier, only Vista and above contain the dir command that lets us list the alternate data streams. In such case, you can download a small utility named streams from Microsoft Technet. The streams tool also allows us to delete the ADS easily which is possible but a bit obscure for normal PC user. Now to view the content of the alternate data stream, we will use notepad:

C:\Users\SINDHUS\Desktop\samar>notepad ads.txt:private.txt

Note that this time we didn't use type command since it does not support the use of colon in the command. We used the notepad but we could also use another command known as more as below:

C:\Users\SINDHU'S\Desktop\samar>more < ads.txt:private

I am secret

The fun with ADS just does not stop here. We could do much more than this but the basic idea is same. We can embed executables and codes within the ADS and run those executable whenever necessary. I'll leave this as homework for you guys since it won't be hard to figure it out once you've understood the basics I've discussed above.

The alternate data stream has already been exploited in IIS, the primary web server from Microsoft. Following is the example I've taken from OWASP on how it could be exploited in IIS.
Normal access:
http://www.alternate-data-streams.com/default.asp Show code bypass accessing the :$DATA alternate data stream:
http://www.alternate-data-streams.com/default.asp::$DATA

Last thing I would like to discuss is how to delete the alternate data streams. The streamers tool provides a -d switch to delete the ADS and it also supports the wildcards for deleting the streams. Another way of deleting the alternate streams is to copy the file in non-NTFS drives such as to FAT32-formatted pendrives and then copying back. Of course, you could also save the content of main stream in another file and then delete the original file that consists of stream.

I hope this helps you. Please let me know if I should add something to it. :)


Read more...

Sunday, 11 March 2012

Determine Directory Size From Terminal In Linux [How To]

Sometimes you are working on command line and you want to find the total size of any directory. An instance is while working over SSh. Here is a technique on how you can determine the size of any directory from terminal.

du command lets us estimate the file space usage and can be recursively used for directories as well. This command can also be useful if you want to find the folder sizes of each subdirectories in any specified directory, something that would have been hard to achieve from the GUI.

To find the total size of a directory, use the -sch switch as below:

samar@Techgaun:~/Desktop/samar$ du -sch directory_name

The screenshot below will help you understand more clearly:


If you would like to see some more details like the size of each subdirectory, use the -hc switch as below:

samar@Techgaun:~/Desktop/samar$ du -hc directory_name

Check the screenshot below:


The du command provides more advanced stuffs such as exclusions of files and directories and depths for determining size. I hope this helps you. :)


Read more...

Finding Prime Factors Of A Number From Linux Terminal

Linux is awesome because of its powerful commanline interface from where you can do any tasks of varied complexities. I've been posting different command line tricks for system administrations & stuffs like that. But this time I am posting a little command line trick to find the prime factors of any number.


Linux provides a command factor that lets you find the prime factors of any number. So if you are into mathematics and working on prime factorizations, why worry? Just open the terminal and use the factor command.

This command takes any number of integer values as the argument and prints the prime factor of each of them. If no number is specified, it takes the value from standard input.

samar@Techgaun:~$ factor 2056 1234567

More information on factor command: The factor command makes use of the Pollard Rho algorithm which is suitable for numbers with relatively small factors. It is not quite good for computing factors of large numbers whose factors are not small values.


Read more...

Friday, 9 March 2012

Being Away and Back In All IRC Chans At Once [XChat Tip]

Hi everybody, one friend of mine asked me today if he could mark himself as away at once in all the open servers and he was using the XChat IRC client in mint and I quickly remembered the allserv and allchanl commands. So here I am sharing the quick tip on how you can mark yourself away in all the open IRC channels at once in XChat IRC client.


XChat provides two commands allserv and allchanl that can be used to run any specified command for all the open IRC servers and IRC channels of currently selected server respectively. The syntax is very simple as below:

/allserv <cmd>

The first command i.e. allserv allows to run any IRC command in all servers you have open currently. So if you want to mark yourself away, all you have to use is away command in conjunction with allserv command(fyi, away syntax is /away Reason for being away), type the command as below:

/allserv away Brb Breakfast time

Once you come back and do not need to be marked as away in all servers, enter the command as below:

/allserv back

The above allserv example is for marking you as away for all the servers. What if you want to mark yourself as away in all the channels of currently selected server? Don't worry, xchat offers allchanl for this very purpose. See the examples below using this command.

Set yourself away:

/allchanl away Brb Breakfast time

Set yourself no longer away:

/allchanl back

I hope these information help you. :)


Read more...