Tuesday, July 20, 2010

WGET

I don't have an internet connection at home, but I still want to browse websites. That leaves me with only two options: either hack some Wi-Fi hotspot (which I'm seriously trying to do) or download complete websites.

I had seriously tried HTTrack (http://www.httrack.com/) before for downloading websites, but it wasn't a great option, since it can't handle script-driven websites.
Today I tried Wget, which I found on Yahoo! Answers.

Wget is GNU's answer to downloading websites. It's simply awesome.
At first I thought of compiling it on Windows, which I'll figure out later. But you can get a prebuilt Windows .exe (a Visual Studio release build) here: (http://users.ugent.be/~bpuype/wget/)


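I went through wget --help and found the following command useful. It crawls the site recursively (-r), but the -A "*.exe" accept list means only files ending in .exe are kept. The HTML pages are fetched just to follow their links and are then deleted. Leave out -A "*.exe" to keep the whole website: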

wget -A "*.exe" -r "http://nehe.gamedev.net/"
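If the aim is a browsable offline copy of the whole site rather than just its .exe files, wget's mirroring options fit better than an accept list. Below is a minimal sketch. The function name mirror_site and its DRY_RUN switch are hypothetical conveniences of this sketch. The options --mirror, --convert-links, --page-requisites, --no-parent and --directory-prefix are standard wget options, but they are worth double-checking against wget --help on the Windows build.

```shell
# mirror_site URL [DIR]  (hypothetical helper, not part of wget)
# Makes a browsable offline copy of URL under DIR (default: current dir).
#   --mirror           recursive, unlimited depth, with timestamping
#   --convert-links    rewrite links so the pages work from disk
#   --page-requisites  also fetch the images, CSS, etc. each page needs
#   --no-parent        never climb above the starting directory
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
  url=$1
  dir=${2:-.}
  set -- --mirror --convert-links --page-requisites --no-parent \
         --directory-prefix="$dir" "$url"
  if [ -n "${DRY_RUN:-}" ]; then
    echo wget "$@"
  else
    wget "$@"
  fi
}
```

For example, mirror_site http://nehe.gamedev.net/ nehe would save the copy under ./nehe. It keeps the HTML pages and rewrites their links to point at the local files.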

Monday, January 26, 2009

Hadoop pseudo-distributed mode in Windows XP

Hadoop is a software framework that works on grids.
A grid is just a group of ordinary networked computers (connected over the Internet or a LAN, nothing hi-fi) whose owners are willing to install a few pieces of software.
Hadoop is one of them.

Say you have a task to run on a large dataset. Hadoop splits the data into many parts, processes the parts on many connected computers, and gives you the combined result.
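The split, process and combine idea can be tried on a single machine with ordinary Unix tools. This is only an illustration, not Hadoop. The function split_count_words and its default chunk size are made up for this sketch. It cuts a file into chunks and counts the words of each chunk in a separate background process, which stands in for the separate computers. It then adds up the partial counts.

```shell
# split_count_words FILE [LINES_PER_CHUNK]  (hypothetical, illustration only)
# 1. split:   cut FILE into chunks of LINES_PER_CHUNK lines
# 2. process: count the words of every chunk in parallel (background jobs)
# 3. combine: sum the per-chunk counts into one result
split_count_words() {
  file=$1
  lines_per_chunk=${2:-1000}
  work=$(mktemp -d)

  split -l "$lines_per_chunk" "$file" "$work/chunk."

  for chunk in "$work"/chunk.*; do
    [ -e "$chunk" ] || continue          # no chunks at all (empty input)
    wc -w < "$chunk" > "$chunk.count" &  # one "worker" per chunk
  done
  wait                                   # wait for every worker to finish

  total=0
  for count_file in "$work"/chunk.*.count; do
    [ -e "$count_file" ] || continue
    total=$((total + $(cat "$count_file")))
  done

  rm -rf "$work"
  echo "$total"
}
```

Hadoop does the same three steps, but it ships the chunks to different machines and lets you supply the processing code.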

Hadoop only does the splitting and combining part; you have to write the code that processes the data yourself.
If you find it interesting, visit websites and blogs on
Hadoop, grid computing, distributed computing, etc.

I'll only talk about configuring Hadoop here.
It took me a whole day to get the configuration working.

Prerequisites:

1. Cygwin with OpenSSH (server) and Emacs (or another Unix-based editor)
2. The latest version of Hadoop
3. Java SE JDK
Step 1.

Install Cygwin with OpenSSH; there are many sites that explain how. The Univ page might be useful for setting up OpenSSH. Make sure that you have a text editor.

Hayes Davis' page has detailed instructions. Hope you're back.

Step 2.
Untar the archive that you downloaded from Hadoop's site.

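Open conf/hadoop-env.sh from Cygwin. Use a Unix-based editor (Windows editors such as Notepad save CRLF line endings, which break the shell scripts) and
find the JAVA_HOME line (the 13th line, I think). Remove the # in front of it and change the path according to your Java installation, for example:
export JAVA_HOME=/cygdrive/c/"Program Files"/java1.5
There must be no spaces around the =, because the file is read as a shell script.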
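Because hadoop-env.sh is read by the shell, a path with a space in it (like Program Files) must be quoted. A wrong path only shows up later as a JAVA_HOME error. A quick sanity check before starting Hadoop could look like the sketch below. check_java_home is a hypothetical helper, not part of Hadoop. It only checks for an executable bin/java or bin/java.exe under the given directory.

```shell
# check_java_home [DIR]  (hypothetical helper, not part of Hadoop)
# Succeeds if DIR (default: $JAVA_HOME) contains an executable
# bin/java or bin/java.exe; otherwise prints the reason and fails.
check_java_home() {
  dir=${1:-${JAVA_HOME:-}}
  if [ -z "$dir" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  # Quoting "$dir" keeps paths like .../Program Files/... in one piece.
  if [ -x "$dir/bin/java" ] || [ -x "$dir/bin/java.exe" ]; then
    echo "JAVA_HOME looks fine: $dir"
    return 0
  fi
  echo "no executable bin/java under: $dir" >&2
  return 1
}
```

For example, after saving the file, running . conf/hadoop-env.sh && check_java_home from the Hadoop directory should print the path if it is right.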

Now save the file and run
ssh-host-config

When it asks "Should privilege separation be used?", I recommend answering "no" to save tons of trouble.

It may or may not pause and prompt you to enter a value for "CYGWIN="; if it does, enter "ntsec tty".

Start sshd (sometimes it is already running):

/usr/sbin/sshd

Then run:

ssh localhost


If it asks for a passphrase to ssh to localhost, press Ctrl+C and type the following commands to set up a key with an empty passphrase:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
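Passwordless login often fails because of file permissions. sshd's StrictModes setting (on by default) ignores authorized_keys when ~/.ssh or the file is writable by group or others. Repeating the steps also appends the same key again. The helper below is a hypothetical sketch; authorize_key is not an OpenSSH command. It appends a public key only if that exact line isn't there yet, then tightens the permissions. It works for any key type. Recent OpenSSH releases have dropped DSA support, so on a newer system you would generate an RSA or Ed25519 key (ssh-keygen -t rsa or -t ed25519) instead of id_dsa.

```shell
# authorize_key PUBKEY_FILE [SSH_DIR]  (hypothetical helper)
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys
# (SSH_DIR defaults to ~/.ssh) unless that exact line is already
# present, then sets the permissions that sshd's StrictModes expects.
authorize_key() {
  pub=$1
  sshdir=${2:-$HOME/.ssh}
  keys="$sshdir/authorized_keys"

  mkdir -p "$sshdir"
  chmod 700 "$sshdir"
  touch "$keys"

  # -F fixed string, -x whole line, -f read the pattern from PUBKEY_FILE
  if ! grep -qxF -f "$pub" "$keys"; then
    cat "$pub" >> "$keys"
  fi
  chmod 600 "$keys"
}
```

For example, authorize_key ~/.ssh/id_dsa.pub does the job of the cat ... >> ~/.ssh/authorized_keys line and is safe to run twice.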

You're done configuring ssh. Type
exit
to leave the ssh session for now.
Step 3

You're almost done.
There are three modes in Hadoop:


1. Standalone
The easiest and fastest mode for beginners.
It works with just the setup you've done so far.

Try an example from the Hadoop installation directory:


$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

If it complains that JAVA_HOME is not set, check the path in conf/hadoop-env.sh and make sure you edited the file with a Unix-based editor.
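To get a feel for what the grep example computes, you can approximate it locally with a plain pipeline. This is not how Hadoop runs it, only the same idea in miniature:

- grep -o plays the map step (emit every match).
- sort plays the shuffle (bring equal matches together).
- uniq -c plays the reduce (count them).
- A final numeric sort puts the most frequent matches first.

local_grep_count is a hypothetical name. It uses POSIX extended regular expressions rather than Java's, which makes no difference for 'dfs[a-z.]+'.

```shell
# local_grep_count REGEX FILE...  (hypothetical helper; runs locally, no Hadoop)
# Prints "count<TAB>match" for every distinct match of REGEX, most frequent first.
local_grep_count() {
  regex=$1
  shift
  grep -ohE "$regex" "$@" |      # map:     one line per match (-o), no file names (-h)
    sort |                       # shuffle: identical matches become adjacent
    uniq -c |                    # reduce:  "   N match" for each distinct match
    sort -k1,1nr -k2,2 |         # order by count, highest first
    awk '{ print $1 "\t" $2 }'   # reshape to count<TAB>match
}
```

For example, local_grep_count 'dfs[a-z.]+' input/* should report the same matches and counts as cat output/*. Matches with equal counts may come out in a different order.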


2. Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

For this, back up conf/hadoop-site.xml and then replace its contents with:


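<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>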

Now you're almost there, but it's essential to set up the master and slave keys:

$> scp ~/.ssh/id_dsa.pub masterusername@localhost:~/.ssh/master-key.pub
$> cat ~/.ssh/master-key.pub >> ~/.ssh/authorized_keys

$> scp ~/.ssh/id_dsa.pub slaveusername@localhost:~/.ssh/slave-key.pub
$> cat ~/.ssh/slave-key.pub >> ~/.ssh/authorized_keys

Substitute masterusername and slaveusername with your Windows XP account name (one with administrator privileges).

Now you're ready to go.

Run

ssh masterusername@localhost

(or the same with slaveusername)

If you can't connect, start sshd first:

/usr/sbin/sshd
then
ssh masterusername@localhost

I hope you're connected.

Bingo! You're ready to execute your first example on a pseudo-distributed cluster in Windows XP.

Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh
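Once the steps above work, they can be wrapped in one function so that a failing step doesn't silently carry on into the next. Below is a minimal sketch, assuming it is given the Hadoop installation directory; run_pseudo_example is a hypothetical name. Two details are deliberate:

- 'output/*' is quoted so that your local shell doesn't expand it against the local output directory left over from the standalone run.
- The chain stops at the first failure, so the daemons may still be running afterwards. Run bin/stop-all.sh by hand in that case.

```shell
# run_pseudo_example HADOOP_HOME  (hypothetical wrapper, not part of Hadoop)
# Runs the steps above in order and stops at the first one that fails.
# The ( ) body is a subshell, so the cd doesn't change your own shell.
run_pseudo_example() (
  cd "$1" &&
  bin/hadoop namenode -format &&                # may ask to confirm a re-format
  bin/start-all.sh &&
  bin/hadoop fs -put conf input &&
  bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' &&
  bin/hadoop fs -cat 'output/*' &&              # quoted: Hadoop expands it in HDFS
  bin/stop-all.sh
)
```

Usage: run_pseudo_example /path/to/hadoop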


If you have problems after you restart your system, delete the tmp directory in the root of your Hadoop installation and format the namenode again. Note that this wipes everything stored in HDFS.

3. Distributed mode in Windows XP

Hayes Davis' site is good for this; it's where I figured out pseudo-distributed operation.


Happy Map-Reduce!

Please post a comment about any mistakes or anything I've left out.


Acknowledgments:

Hayes Davis
Brandeis University site

Friday, January 23, 2009

Wednesday, January 14, 2009

Red Hat's commercial

A sensational video; take a look:

Red Hat Serious commercial

Open Source is cool

Recently, I had the opportunity to attend classes on Hadoop organised at our college.
Hadoop's architecture is fairly simple and understandable, but building an application that runs on Hadoop requires some practice.

Two years back the whiteboard was blank. Now there are phrases on it, and every phrase took a lifetime of someone's development work.
There are so many good people all around the world willing to share.

It's logical to go for it when so many are willing to share and a person is inspired.


Open source is cool, and I'm serious that it's my hobby :) (finally I discovered one).


Truth Happens.

Thursday, November 27, 2008

HUAWEI EC325 Fedora 8 Works or My Thr

I tried in Ubuntu with the GUI tools as well as by changing the wbconfig.conf file. Things didn't work until I realised that ppp0 was taken by the Ethernet connection I had at home. I understand some of the jargon I'm using here, but not most of it.

I came across a manual with instructions for installing the device in Fedora Core, and one of my friends had a Fedora distro.

My Fedora release is code-named Psyche. The install took 30 minutes. Using the manual, I could configure NetworkManager for my USB modem. Pretty simple, and all GUI.

My modem works fine; I still need to get a network monitor. Fedora is great. I got RPMs for the Code::Blocks IDE, RealPlayer, GnoCHM, MPlayer and the xine codecs, so I have everything I want.

I must try GIMP sometime; it's cool with its features, and hot (more and more people are using it). I've always liked art. I'm doing fine with Fedora, and it's doing me good.

In a free world without fences, who needs gates.

Wednesday, November 5, 2008

HUAWEI EC325 Linux 1

I tried using it in Ubuntu Hardy by changing the network configuration in NetworkManager, but the USB-based connection doesn't show up. I have to figure a way out.
I hope ubuntuforums.com will help; I've posted a modest question there.

This is my first attempt to move to open source, and I rejoice in trying. Join if you care.



You have to put in many, many, many tiny efforts that nobody sees or appreciates before you achieve anything worthwhile. -- Brian Tracy