Data Mining and Business Intelligence Lab Manual

A) INTRODUCTION TO WEKA TOOL. 

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.

The buttons in the Weka GUI Chooser window start the following applications:

Explorer: An environment for exploring data with WEKA.

Experimenter: An environment for performing experiments and conducting statistical tests between learning schemes.

Knowledge Flow: This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.

Simple CLI: Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.

Explorer

At the top of the Explorer window, just below the title bar, there is a row of six tabs:

1. Preprocess – Used to choose and modify the data.

2. Classify – Used to apply classification algorithms.

3. Cluster – Used to apply clustering algorithms to the data.

4. Associate – Used to learn association rules from the data.

5. Select Attributes – Used to select the most relevant attributes in the data.

6. Visualize – Used to view an interactive 2D plot of the data.

Loading Data

The first three buttons at the top of the Preprocess panel load data into WEKA:

1. Open file: brings up a dialog box for browsing to a data file on the local file system.

2. Open URL: asks for the Uniform Resource Locator (URL) address where the data is stored.

3. Open DB: reads data from a database. (To make this work you may have to edit the file weka/experiment/DatabaseUtils.props.)

Now click Open file and select the weather.arff file from the data folder.

The Current Relation

Once some data has been loaded, the Preprocess panel shows a variety of information. The Current relation box (the “current relation” is the currently loaded data, which can be interpreted as a single relational table in database terminology) has three entries:

1. Relation. The name of the relation, as given in the file it was loaded from. Filters (described below) modify the name of a relation.

2. Instances. The number of instances (data points/records) in the data.

3. Attributes. The number of attributes (features) in the data.

When you click on different rows in the list of attributes, the fields change in the box to the right titled Selected attribute.

This box displays the characteristics of the currently highlighted attribute in the list:

Name: The name of the attribute, the same as that given in the attribute list.

Type: The type of attribute, most commonly Nominal or Numeric.

Missing: The number (and percentage) of instances in the data for which this attribute is missing (unspecified).

Distinct: The number of different values that the data contains for this attribute.

Unique: The number (and percentage) of instances in the data having a value for this attribute that no other instances have.
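
The Missing, Distinct and Unique figures can be reproduced by hand. The following Python sketch computes them for one attribute column, assuming a missing value is represented as None (written ? in an ARFF file) and that percentages are taken over all instances; the function name attribute_summary and the sample column are hypothetical.

```python
from collections import Counter


def attribute_summary(values):
    """Return the Missing/Distinct/Unique statistics for one attribute column.

    None stands for a missing value ('?' in an ARFF file).
    """
    n = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)
    missing = n - len(present)
    distinct = len(counts)  # number of different values that occur
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "missing": missing,
        "missing_pct": 100.0 * missing / n if n else 0.0,
        "distinct": distinct,
        "unique": unique,
        "unique_pct": 100.0 * unique / n if n else 0.0,
    }


# Hypothetical nominal column with one missing value.
outlook = ["sunny", "sunny", "overcast", None, "rainy", "overcast"]
print(attribute_summary(outlook))
```

Here sunny and overcast each occur twice, so only rainy counts as a unique value.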

B) IMPLEMENTATION OF DIFFERENT PREPROCESSING TECHNIQUES. 

A) Replace Missing Values.

Step 1)

a. Open the weather.numeric.arff file in WordPad.

b. Replace 2 or 3 of the numeric values with the ? sign, which marks a missing value in ARFF.

c. Save it as weather_numeric_Missing.arff.

Step 2)

a. Open weather_numeric_Missing.arff in Weka.

b. From weka.filters (i.e. from the Choose button),

select unsupervised -> attribute -> ReplaceMissingValues.

c. Click Apply.

d. Save the result as weather_numeric_Replaced.arff.

e. Open weather_numeric_Replaced.arff in WordPad again.

Before Replacing [weather_numeric_Missing.arff]

After applying the filter

After Replacing [weather_numeric_Replaced.arff]
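
The ReplaceMissingValues filter is documented as replacing missing values with the mean (numeric attributes) or the mode (nominal attributes) of the data. A minimal Python sketch of that rule follows, assuming rows are lists with None in place of ?; the function name and the sample rows are illustrative, and ties between equally frequent values are broken by first occurrence, which may not match Weka.

```python
from collections import Counter
from statistics import mean


def replace_missing(rows, numeric_cols):
    """Fill None with the column mean (numeric) or most frequent value (nominal)."""
    n_cols = len(rows[0])
    fills = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        if c in numeric_cols:
            fills.append(mean(present))
        else:
            # most_common keeps first-seen order for equal counts
            fills.append(Counter(present).most_common(1)[0][0])
    return [[fills[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


# Hypothetical rows: outlook (nominal), temperature and humidity (numeric).
rows = [
    ["sunny", 85, 85],
    ["sunny", None, 90],
    ["overcast", 83, None],
    [None, 70, 96],
]
for row in replace_missing(rows, numeric_cols={1, 2}):
    print(row)
```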

B) Discretize: Discretization converts continuous (numeric) attributes into discrete (nominal) ones by dividing each attribute's range into intervals.

Step 1) Open the weather.arff file in Weka (the version with numeric temperature and humidity attributes).

Step 2) Click the Choose button and select filters -> unsupervised -> attribute -> Discretize.

Step 3) Click on the filter name shown next to the Choose button to open its parameter editor.

The parameter editor is displayed; make the changes shown here.

After clicking OK, click Apply; the following screen is displayed.

Now click Visualize All to get the following screen.
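
By default, Weka's unsupervised Discretize filter uses equal-width binning with 10 bins. The Python sketch below illustrates equal-width binning with 3 bins for readability; the sample temperatures and the function name are hypothetical, and a value equal to a cut point is put in the lower interval, intended to mirror right-closed intervals such as (-inf-71].

```python
def equal_width_discretize(values, bins=3):
    """Split [min, max] into `bins` equal-width intervals.

    Returns the inner cut points and, for each value, the index of its interval.
    Intervals are closed on the right: a value equal to a cut point goes to the
    lower interval.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cuts = [lo + i * width for i in range(1, bins)]
    labels = [sum(1 for c in cuts if v > c) for v in values]
    return cuts, labels


temperatures = [64, 68, 70, 72, 75, 80, 85]  # hypothetical sample
cuts, labels = equal_width_discretize(temperatures, bins=3)
print(cuts)    # boundaries between the three bins
print(labels)  # bin index of each temperature
```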

C) Add: [To add a new attribute to the dataset]

Step 1) Open the weather.arff file in Weka.

From weka.filters (i.e. from the Choose button), select unsupervised -> attribute -> Add.

Then click Apply; the following screen is displayed.

Step 2)

To set the attribute name, click on Add (the text beside the Choose button). The following screen appears.

Enter Color as the attribute name (the attributeName field), click OK and then click Apply.

The resulting screen looks like this:

C) IMPLEMENTATION OF ASSOCIATION TECHNIQUES. 

A) Create an .ARFF (Attribute Relation File Format) file and find the association rules with 30% support and 80% confidence for the following data:

TransId	Items
1	milk, egg, bread, chip
2	egg, popcorn, chip, beer
3	egg, bread, chip
4	milk, egg, bread, popcorn, chip, beer
5	milk, bread, beer
6	egg, bread, beer
7	milk, bread, chip
8	milk, egg, bread, butter, chip
9	milk, egg, butter, chip
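
One common way to encode market-basket data for Weka's Apriori, used for example by the supermarket.arff sample distributed with Weka, is one nominal attribute per item with the single value t, and ? where the item is absent. The Python sketch below writes the transactions above in that form; the relation name, the function name to_arff and the choice of encoding are assumptions, not necessarily the file this manual intends.

```python
TRANSACTIONS = [
    ["milk", "egg", "bread", "chip"],
    ["egg", "popcorn", "chip", "beer"],
    ["egg", "bread", "chip"],
    ["milk", "egg", "bread", "popcorn", "chip", "beer"],
    ["milk", "bread", "beer"],
    ["egg", "bread", "beer"],
    ["milk", "bread", "chip"],
    ["milk", "egg", "bread", "butter", "chip"],
    ["milk", "egg", "butter", "chip"],
]


def to_arff(relation, transactions):
    """Encode each item as a nominal attribute {t}; absent items become '?'."""
    items = sorted({item for basket in transactions for item in basket})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {item} {{t}}" for item in items]
    lines += ["", "@data"]
    for basket in transactions:
        lines.append(",".join("t" if item in basket else "?" for item in items))
    return "\n".join(lines) + "\n"


print(to_arff("ass2", TRANSACTIONS))
```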

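Step 1) Creating the ARFF file: open Notepad, type the following code and save it as ass2.arff.

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Associate tab, then click the Choose button and select Apriori.

Step 4) Click the Apriori text box to open its editor and set lowerBoundMinSupport (minimum support) to 0.3 and minMetric (minimum confidence, with the default metricType of Confidence) to 0.8. Apriori lists at most numRules rules (10 by default), so increase numRules to see every rule that meets both thresholds.

Step 5) Click Start.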
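
To check the rules Weka reports, the Python sketch below implements the textbook Apriori algorithm (level-wise candidate generation with subset pruning, followed by rule generation) on the same nine transactions. The function name apriori and the output format are illustrative; Weka's implementation differs in details, such as lowering the support step by step from upperBoundMinSupport, so its rule list and ordering may differ.

```python
from itertools import combinations

TRANSACTIONS = [
    ["milk", "egg", "bread", "chip"],
    ["egg", "popcorn", "chip", "beer"],
    ["egg", "bread", "chip"],
    ["milk", "egg", "bread", "popcorn", "chip", "beer"],
    ["milk", "bread", "beer"],
    ["egg", "bread", "beer"],
    ["milk", "bread", "chip"],
    ["milk", "egg", "bread", "butter", "chip"],
    ["milk", "egg", "butter", "chip"],
]


def apriori(transactions, min_support, min_confidence):
    """Return ({itemset: count}, rules) for thresholds given as fractions."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    # Level 1: frequent single items.
    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if count(frozenset([i])) / n >= min_support]
    frequent = {s: count(s) for s in level}

    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        level = []
        for c in candidates:
            cnt = count(c)
            if cnt / n >= min_support:
                frequent[c] = cnt
                level.append(c)
        k += 1

    # Rules X => Y where X and Y are non-empty and X u Y is frequent.
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = cnt / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, cnt / n, confidence))
    rules.sort(key=lambda rule: (-rule[3], -rule[2], sorted(rule[0])))
    return frequent, rules


frequent, rules = apriori(TRANSACTIONS, min_support=0.3, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(f"{sorted(lhs)} => {sorted(rhs)}  support={support:.2f}  confidence={confidence:.2f}")
```

With these thresholds the sketch finds 12 rules, more than Weka's default numRules of 10.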

B) Create an .ARFF (Attribute Relation File Format) file and find the association rules using the Apriori algorithm with 50% support and 75% confidence for the following data.

TransId	Items
1	Laptop, Mobile, Memory card, Card reader
2	Laptop, Mobile, Card reader
3	Laptop, digi cam, LCD TV
4	Laptop, Card reader, digi cam
5	Mobile, Card reader, digi cam

Step 1) Creating the ARFF file: open Notepad, type the following code and save it as testweka.arff.

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Associate tab, then click the Choose button and select Apriori.

Step 4) Click the Apriori text box to open its editor and set lowerBoundMinSupport to 0.5 and minMetric (confidence) to 0.75.

Step 5) Click Start.

For the above dataset, also find the association rules using the Apriori algorithm with support = 40% and confidence = 75%.
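
Because this dataset has only six distinct items, its frequent itemsets can also be found by brute force, counting every possible itemset instead of using Apriori's level-wise search. The Python sketch below does this for both support thresholds as a cross-check of Weka's output; it assumes the item names are used exactly as written in the table, and the function names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [
    ["Laptop", "Mobile", "Memory card", "Card reader"],
    ["Laptop", "Mobile", "Card reader"],
    ["Laptop", "digi cam", "LCD TV"],
    ["Laptop", "Card reader", "digi cam"],
    ["Mobile", "Card reader", "digi cam"],
]


def frequent_itemsets_bruteforce(transactions, min_support):
    """Count every non-empty itemset and keep those meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    n = len(baskets)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            cnt = sum(1 for b in baskets if itemset <= b)
            if cnt / n >= min_support:
                frequent[itemset] = cnt
    return frequent


def rules_from(frequent, min_confidence):
    """Generate rules X => Y (with confidence) from the frequent itemsets."""
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                confidence = cnt / frequent[frozenset(lhs)]
                if confidence >= min_confidence:
                    rules.append((sorted(lhs), sorted(itemset - frozenset(lhs)), confidence))
    return rules


for support in (0.5, 0.4):
    print(f"support >= {support:.0%}, confidence >= 75%")
    for lhs, rhs, conf in rules_from(frequent_itemsets_bruteforce(TRANSACTIONS, support), 0.75):
        print(f"  {lhs} => {rhs}  confidence={conf:.2f}")
```

Brute force is only practical for a handful of items, since the number of candidate itemsets doubles with every extra item; Apriori's pruning is what makes larger datasets feasible.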

D) IMPLEMENTATION OF CLASSIFICATION TECHNIQUES. 

A) Create an .ARFF (Attribute Relation File Format) file and construct a decision tree for the following data.

RID	Age	Income	Student	Credit-rating	Class
1	Youth	High	No	Fair	No
2	Youth	High	No	Excellent	No
3	Middle-age	High	No	Fair	Yes
4	Senior	Medium	No	Fair	Yes
5	Senior	Low	Yes	Fair	Yes
6	Senior	Low	Yes	Excellent	No
7	Middle-age	Low	Yes	Excellent	Yes
8	Youth	Medium	No	Fair	No
9	Youth	Low	Yes	Fair	Yes
10	Senior	Medium	Yes	Fair	Yes
11	Youth	Medium	Yes	Excellent	Yes
12	Middle-age	Medium	No	Excellent	Yes
13	Middle-age	High	Yes	Fair	Yes
14	Senior	Medium	No	Excellent	No

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as electronics.arff.

@relation electronics

@attribute age{youth,middle_age,senior}

@attribute income{high,medium,low}

@attribute student{yes,no}

@attribute credit{fair,excellent}

@attribute class{yes,no}

@data

youth,high,no,fair,no
youth,high,no,excellent,no
middle_age,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middle_age,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middle_age,medium,no,excellent,yes
middle_age,high,yes,fair,yes
senior,medium,no,excellent,no

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Classify tab, then click the Choose button and select J48 under trees.

Step 4) Click Start.

Step 5) Right-click the J48 entry in the Result list and select Visualize tree.
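
J48 is Weka's implementation of the C4.5 decision-tree learner, which chooses splits by gain ratio (information gain divided by split information). The Python sketch below computes both quantities for each attribute of electronics.arff so the choice of root attribute can be checked by hand; it is a simplified illustration, not J48's code, and ignores details such as pruning and J48's minimum number of instances per leaf.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit"]
DATA = [  # age, income, student, credit, class (from electronics.arff)
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_age", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_age", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_age", "medium", "no", "excellent", "yes"),
    ("middle_age", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(rows, index):
    """Information gain and gain ratio of splitting `rows` on attribute `index`."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row[-1])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([row[-1] for row in rows]) - remainder
    split_info = entropy([row[index] for row in rows])
    return gain, gain / split_info if split_info > 0 else 0.0


for i, name in enumerate(ATTRIBUTES):
    gain, ratio = gain_and_ratio(DATA, i)
    print(f"{name:8s} gain={gain:.3f} gain_ratio={ratio:.3f}")
```

The sketch gives age the highest information gain (about 0.247 bits) and the highest gain ratio, which suggests age at the root of the tree; compare this with the tree J48 draws.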

B) Create an .ARFF file and construct a decision tree for the following data.

Name	Gender	Height	Output
Anil	Male	2.0 m	Short
Ankit	Male	2.1 m	Short
Priya	Female	3.1 m	Tall
Ankita	Female	2.6 m	Medium
Anand	Male	3.0 m	Tall
Ganesh	Male	2.7 m	Medium

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as student.arff.

@relation student

@attribute gender{male,female}

@attribute height real

@attribute output{short,tall,medium}

@data

male,2.0,short

male,2.1,short

female,3.1,tall

female,2.6,medium

male,3.0,tall

male,2.7,medium

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Classify tab, then click the Choose button and select J48 under trees.

Step 4) Click Start.

Step 5) Right-click the J48 entry in the Result list and select Visualize tree.
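
Height is a numeric attribute, so C4.5-style learners such as J48 split it with a binary test of the form height <= threshold, choosing the threshold that gives the best split. The Python sketch below tries the midpoint between each pair of adjacent sorted heights and scores it by information gain; using midpoints and plain information gain is a simplifying assumption, and the threshold J48 displays may be a different value between the same two heights.

```python
from collections import Counter
from math import log2

HEIGHTS = [2.0, 2.1, 3.1, 2.6, 3.0, 2.7]
OUTPUTS = ["short", "short", "tall", "medium", "tall", "medium"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best split value <= t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:  # keeps the first threshold on ties
            best_t, best_gain = t, gain
    return best_t, best_gain


t, gain = best_threshold(HEIGHTS, OUTPUTS)
print(f"best split: height <= {t:.2f} (gain {gain:.3f} bits)")
```

For this data, thresholds of 2.35 and 2.85 give the same gain (about 0.918 bits), because each separates one class cleanly from the other two; the sketch keeps the first.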

E) IMPLEMENTATION OF CLUSTERING TECHNIQUES

A) Create the following .ARFF (Attribute Relation File Format) file and implement the agglomerative algorithm using the single-link, complete-link and average-link methods.

	A	B	C	D	E
A	0	1	2	2	3
B	1	0	2	4	3
C	2	2	0	1	5
D	2	4	1	0	3
E	3	3	5	3	0

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as weka1.arff

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Cluster tab, then click the Choose button and select HierarchicalClusterer.

Step 4) Click the HierarchicalClusterer text box to open its editor and set linkType to SINGLE.

Step 5) Click Start.

Step 6) Right-click the HierarchicalClusterer entry in the Result list and select Visualize tree.

Step 7) Now change linkType to COMPLETE and click Start.

Step 8) Visualize the tree.

Step 9) Now change linkType to AVERAGE and click Start.

Step 10) Visualize the tree.
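
The exercise gives the distances directly, so the three linkage methods can be checked by hand. The Python sketch below runs naive agglomerative clustering on that 5 x 5 matrix and lists each merge with its linkage distance. It assumes the matrix itself is the input; Weka's HierarchicalClusterer instead computes distances from the attribute values (Euclidean distance by default), so if weka1.arff stores the matrix rows as attributes, Weka's dendrogram can differ. Ties are broken by taking the first pair found, and the function names are hypothetical.

```python
NAMES = ["A", "B", "C", "D", "E"]
DIST = [
    [0, 1, 2, 2, 3],
    [1, 0, 2, 4, 3],
    [2, 2, 0, 1, 5],
    [2, 4, 1, 0, 3],
    [3, 3, 5, 3, 0],
]


def linkage(c1, c2, method):
    """Distance between two clusters (lists of point indices) under a linkage method."""
    ds = [DIST[i][j] for i in c1 for j in c2]
    if method == "single":
        return min(ds)  # closest pair
    if method == "complete":
        return max(ds)  # farthest pair
    if method == "average":
        return sum(ds) / len(ds)  # mean over all pairs
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(method):
    """Merge the two closest clusters until one remains; return (members, distance) per merge."""
    clusters = [[i] for i in range(len(NAMES))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], method)
                if best is None or d < best[0]:  # first pair wins on ties
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append(("".join(NAMES[i] for i in merged), d))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


for method in ("single", "complete", "average"):
    print(method, agglomerate(method))
```

All three methods first join A with B and C with D at distance 1; they differ in how the remaining clusters are combined.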

B) Create the following .ARFF (Attribute Relation File Format) file and implement the one-dimensional k-means algorithm on the following data:

15,15,16,19,19,20,20,21,22,28,35,40,41,42,43,44,60,61,65

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Cluster tab, then click the Choose button and select SimpleKMeans.

Step 4) Click the SimpleKMeans text box to open its editor and set numClusters to 2.

Step 5) Click Start.

Step 6) Click the Visualize tab.
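
The k-means steps (assign each value to the nearest centroid, move each centroid to the mean of its values, and repeat until nothing changes) can be traced on the 19 values above. In the Python sketch below the initial centroids are the smallest and largest values, a hypothetical deterministic choice; SimpleKMeans picks its initial centroids at random (controlled by its seed parameter), so Weka's clusters can differ.

```python
VALUES = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]


def kmeans_1d(values, init_centroids, max_iter=100):
    """Plain k-means on one numeric attribute; returns (centroids, clusters)."""
    centroids = list(init_centroids)
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = new
    return centroids, clusters


centroids, clusters = kmeans_1d(VALUES, init_centroids=[15, 65])
for centre, members in zip(centroids, clusters):
    print(f"centroid {centre:.2f}: {members}")
```

With this start, the values 15 to 35 end up in one cluster (mean about 20.9) and 40 to 65 in the other (mean 49.5).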

C) Create the following .ARFF (Attribute Relation File Format) file and implement k-means clustering.

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Cluster tab, then click the Choose button and select SimpleKMeans.

Step 4) Click Start.

Step 5) Click the Visualize tab.

D) Perform EM clustering on the following data using the Weka tool.

Person	Weight	Height
A	102	147
B	130	162
C	111	147
D	170	182
E	175	180
F	132	157

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as emcluster.arff

@relation emcluster

@attribute weight real

@attribute height real

@data

102,147

130,162

111,147

170,182

175,180

132,157

Step 2) Open this file in WEKA by clicking the Open file button.

Step 3) Click the Cluster tab, then click the Choose button and select EM.

Step 4) Click Start.

Step 5) Click the Visualize tab.
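
EM fits a mixture of probability distributions and gives each instance a probability of belonging to each cluster. The Python sketch below runs EM for two clusters on the six people, modelling weight and height within each cluster as independent normal distributions (as far as we know, Weka's EM makes the same independence assumption for numeric attributes). The number of clusters, the starting means (persons A and D) and the variance floor are hypothetical choices; Weka's EM by default selects the number of clusters itself by cross-validation, so its output can differ.

```python
import math

POINTS = [(102, 147), (130, 162), (111, 147), (170, 182), (175, 180), (132, 157)]  # A..F


def log_normal(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def em(points, init_means, iterations=100, min_var=1e-6):
    n, dims, k = len(points), len(points[0]), len(init_means)
    means = [list(m) for m in init_means]
    # Start every cluster with the overall variance of each attribute.
    overall = []
    for d in range(dims):
        mu = sum(p[d] for p in points) / n
        overall.append(sum((p[d] - mu) ** 2 for p in points) / n)
    variances = [list(overall) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: resp[i][c] = probability that point i belongs to cluster c.
        resp = []
        for p in points:
            logs = [math.log(weights[c])
                    + sum(log_normal(p[d], means[c][d], variances[c][d]) for d in range(dims))
                    for c in range(k)]
            top = max(logs)  # subtract the max before exp() for numerical stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            rc = sum(r[c] for r in resp)
            weights[c] = rc / n
            for d in range(dims):
                mu = sum(r[c] * p[d] for r, p in zip(resp, points)) / rc
                var = sum(r[c] * (p[d] - mu) ** 2 for r, p in zip(resp, points)) / rc
                means[c][d] = mu
                variances[c][d] = max(var, min_var)
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, weights, labels


means, weights, labels = em(POINTS, init_means=[POINTS[0], POINTS[3]])
for person, label in zip("ABCDEF", labels):
    print(person, "-> cluster", label)
print("means:", [[round(v, 1) for v in m] for m in means])
```

With these starting values the sketch places D and E in one cluster and A, B, C and F in the other.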


Chapter 3: Implementation of Data Mining Algorithms using WEKA

Practical No. 3

AIM: Implementation of different Data Mining Algorithms using WEKA.

Description:

A) INTRODUCTION TO WEKA TOOL.

B) IMPLEMENTATION OF DIFFERENT PREPROCESSING TECHNIQUES.

A) Replace Missing Values.

C) IMPLEMENTATION OF ASSOCIATION TECHNIQUES.

D) IMPLEMENTATION OF CLASSIFICATION TECHNIQUES.

E) IMPLEMENTATION OF CLUSTERING TECHNIQUES