Data Mining and Business Intelligence Lab Manual
ISBN 9788119221509


Chapter 3 Implementation of Data Mining Algorithms using WEKA

Practical No 3

AIM: Implementation of different Data Mining Algorithms using WEKA.

Description:

A) Introduction to WEKA Tool.

B) Implementation of different Preprocessing Techniques.

C) Implementation of Association Techniques.

D) Implementation of Classification Techniques.

E) Implementation of Clustering Techniques.

INTRODUCTION TO WEKA TOOL. 

Weka is a collection of machine learning algorithms for data mining tasks. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

The buttons in the WEKA GUI Chooser can be used to start the following applications:

Explorer: An environment for exploring data with WEKA.

Experimenter: An environment for performing experiments and conducting statistical tests between learning schemes.

Knowledge Flow: This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.

Simple CLI: Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.

At the top of the Explorer window, just below the title bar, there is a row of tabs:

    1. Preprocess – Used to choose and modify the data.

    2. Classify – Used to apply classification algorithms.

    3. Cluster – Used to apply different clustering algorithms to the data.

    4. Associate – Used to learn association rules from the data.

    5. Select Attributes – Used to select the most relevant attributes in the data.

    6. Visualize – Used to view an interactive 2D plot of the data.

Loading Data

The first three buttons at the top of the preprocess section enable us to load data into WEKA:

    1. Open file: Brings up a dialog box allowing us to browse for the data file on the local file system.

    2. Open URL: Asks for a Uniform Resource Locator address for where the data is stored.

    3. Open DB: Reads data from a database. (Note that to make this work you might have to edit the file weka/experiment/DatabaseUtils.props.)

Now choose the Open file option and select the weather.arff file from the data folder.

The Current Relation

Once some data has been loaded, the Preprocess panel shows a variety of information. The Current relation box (the “current relation” is the currently loaded data, which can be interpreted as a single relational table in database terminology) has three entries:

    1. Relation. The name of the relation, as given in the file it was loaded from. Filters (described below) modify the name of a relation.

    2. Instances. The number of instances (data points/records) in the data.

    3. Attributes. The number of attributes (features) in the data.

    When you click on different rows in the list of attributes, the fields change in the box to the right titled Selected attribute.

    This box displays the characteristics of the currently highlighted attribute in the list:

    Name: The name of the attribute, the same as that given in the attribute list.

    Type: The type of attribute, most commonly Nominal or Numeric.

    Missing: The number (and percentage) of instances in the data for which this attribute is missing (unspecified).

    Distinct: The number of different values that the data contains for this attribute.

    Unique: The number (and percentage) of instances in the data having a value for this attribute that no other instances have.
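The quantities shown in the Current relation and Selected attribute boxes can be reproduced by hand. The following is a minimal sketch, assuming a small made-up ARFF file (`weather_sample` is hypothetical, not the file shipped with WEKA) and a simplified parser that handles only dense, unquoted ARFF with a space after each attribute name. It counts instances and attributes, and computes Missing, Distinct and Unique for each attribute.

```python
from collections import Counter

# Hypothetical miniature ARFF content; '?' marks a missing value.
ARFF_TEXT = """\
@relation weather_sample
@attribute outlook {sunny,overcast,rainy}
@attribute temperature real
@attribute play {yes,no}
@data
sunny,85,no
overcast,83,yes
rainy,?,yes
sunny,72,no
rainy,68,yes
"""


def parse_arff(text):
    """Parse a minimal dense ARFF file (no quoting, no sparse rows)."""
    relation, attributes, rows, in_data = None, [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


def attribute_summary(rows, index):
    """Return the Missing, Distinct and Unique counts for one attribute."""
    values = [row[index] for row in rows]
    present = [v for v in values if v != "?"]
    counts = Counter(present)
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
    }


relation, attributes, rows = parse_arff(ARFF_TEXT)
print("Relation:", relation)
print("Instances:", len(rows))
print("Attributes:", len(attributes))
for i, name in enumerate(attributes):
    print(name, attribute_summary(rows, i))
```

Note that Unique counts values occurring exactly once, so it can never exceed Distinct.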

IMPLEMENTATION OF DIFFERENT PREPROCESSING TECHNIQUES. 

A) Replace Missing Values.

  • Step 1)

      a. Open the weather.numeric.arff file in WordPad.

      b. Place a ? sign in 2 or 3 places instead of numeric values.

      c. Save it as weather_numeric_Missing.arff.

  • Step 2)

      a. Open the weather_numeric_Missing.arff file in Weka.

      b. From weka.filters (i.e. from the Choose option), select unsupervised -> attribute -> ReplaceMissingValues.

      c. After this, click on Apply.

      d. Then save it as weather_numeric_Replaced.arff.

      e. Open the file weather_numeric_Replaced.arff in WordPad again to check that the ? signs have been replaced.

Before Replacing [weather_numeric_Missing.arff]

After applying the ReplaceMissingValues filter

After Replacing [weather_numeric_Replaced.arff]
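The ReplaceMissingValues filter fills each missing numeric value with the mean of the attribute and each missing nominal value with the mode. The following is a minimal sketch of that idea in plain Python; the column values are made up for illustration and the function name `replace_missing` is hypothetical, not part of WEKA.

```python
from collections import Counter

MISSING = "?"


def replace_missing(column, numeric):
    """Fill '?' entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in column if v != MISSING]
    if numeric:
        fill = sum(float(v) for v in present) / len(present)
        return [fill if v == MISSING else float(v) for v in column]
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in column]


# Illustrative columns only; these are not the WEKA weather data.
humidity = ["85", "?", "86", "96", "?", "70"]
outlook = ["sunny", "?", "rainy", "sunny", "overcast"]

print(replace_missing(humidity, numeric=True))
print(replace_missing(outlook, numeric=False))
```

Comparing the file before and after replacement should show the same pattern: every ? in a numeric column becomes one common value, the column mean.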


B) Discretize: Discretization is the process of transforming continuous (numeric) attribute values into a set of discrete intervals.

Step 1) Open the weather.arff file in Weka.

Step 2) Click on the Choose button and, from the filters, select Discretize.

Step 3) Now click on the filter name shown beside the Choose button.

The options screen of the filter is then displayed; make the required changes there.

After clicking on OK, click on Apply; we will get the following screen.

Now click on Visualize All; we will get the following screen.
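To see what discretization does to a numeric attribute, the following sketch applies equal-width binning, which is the default behaviour of the unsupervised Discretize filter. The temperature values and the choice of 3 bins are assumptions made for illustration, and `equal_width_discretize` is a hypothetical helper, not a WEKA call. Intervals are treated as closed on the right, in the style of the interval labels WEKA displays.

```python
def equal_width_discretize(values, bins):
    """Split the range [min, max] into `bins` intervals of equal width.

    Returns the interior cut points and, for every value, the index of
    the interval it falls into (intervals are closed on the right).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cut_points = [lo + i * width for i in range(1, bins)]
    labels = [sum(1 for c in cut_points if v > c) for v in values]
    return cut_points, labels


# Illustrative temperature readings (assumed values).
temperature = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]

cut_points, labels = equal_width_discretize(temperature, bins=3)
print("Cut points:", cut_points)
for value, label in zip(temperature, labels):
    print(value, "-> bin", label)
```

Here the range 64 to 85 is split at 71 and 78, so each numeric value is replaced by one of three nominal intervals.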

C) Add: [To add a new attribute to the dataset]

Step 1) Open the weather.arff file in Weka.

From weka.filters (i.e. from the Choose option), select unsupervised -> attribute -> Add.

After this, click on Apply; it will show us the following screen.

Step 2)

To set the attribute name, right-click on Add, which is written beside the Choose button. The following screen will then appear.

Type Color as the attribute name, then click on OK and then Apply.

We will get a screen like this.

IMPLEMENTATION OF ASSOCIATION TECHNIQUES. 

A) Create an .ARFF (Attribute Relation File Format) file and find association rules with 30% support and 80% confidence for the following data:

    TransId   Items
    1         milk, egg, bread, chip
    2         egg, popcorn, chip, beer
    3         egg, bread, chip
    4         milk, egg, bread, popcorn, chip, beer
    5         milk, bread, beer
    6         egg, bread, beer
    7         milk, bread, chip
    8         milk, egg, bread, butter, chip
    9         milk, egg, butter, chip

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as ass2.arff.

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Associate tab, then click on Choose and select Apriori.

Step 4) Set the minimum support to 0.3 (lowerBoundMinSupport) and the confidence to 0.8 (minMetric) by clicking on the editor box of Apriori.

Step 5) Click on the Start button.
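One possible way to prepare the ARFF file for Step 1 is to generate it from the transaction table. The sketch below assumes a market-basket encoding in which every item becomes a nominal attribute with the single value t and an absent item is written as ? (missing), so that only the presence of items is mined. This encoding and the helper name `transactions_to_arff` are assumptions; the manual's own ass2.arff may be laid out differently.

```python
# Transactions copied from the table above.
TRANSACTIONS = [
    ["milk", "egg", "bread", "chip"],
    ["egg", "popcorn", "chip", "beer"],
    ["egg", "bread", "chip"],
    ["milk", "egg", "bread", "popcorn", "chip", "beer"],
    ["milk", "bread", "beer"],
    ["egg", "bread", "beer"],
    ["milk", "bread", "chip"],
    ["milk", "egg", "bread", "butter", "chip"],
    ["milk", "egg", "butter", "chip"],
]


def transactions_to_arff(relation, transactions):
    """Build ARFF text with one {t} attribute per item, '?' when absent."""
    items = []
    for transaction in transactions:
        for item in transaction:
            if item not in items:  # keep order of first appearance
                items.append(item)
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {item} {{t}}" for item in items]
    lines.append("@data")
    for transaction in transactions:
        lines.append(",".join("t" if item in transaction else "?" for item in items))
    return "\n".join(lines) + "\n"


arff_text = transactions_to_arff("ass2", TRANSACTIONS)
print(arff_text)
# To save the file: open("ass2.arff", "w").write(arff_text)
```

Item names containing spaces would need to be quoted or rewritten with underscores before being used as attribute names.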

B) Create an .ARFF (Attribute Relation File Format) file and find association rules using the Apriori algorithm with 50% support and 75% confidence for the following data.

    TransId   Items
    1         Laptop, Mobile, Memory card, Card reader
    2         Laptop, Mobile, Card reader
    3         Laptop, digi cam, LCD TV
    4         Laptop, Card reader, digi cam
    5         Mobile, Card reader, digi cam

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as testweka.arff

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Associate tab, then click on Choose and select Apriori.

Step 4) Set the minimum support to 0.5 and the confidence to 0.75 by clicking on the editor box of Apriori.

Step 5) Click on the Start button.

For the above dataset, also find association rules using the Apriori algorithm with support = 40% and confidence = 75%.
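The rules WEKA reports can be cross-checked by hand. The following sketch enumerates itemsets by brute force (not the candidate-pruning strategy of Apriori itself, which gives the same frequent itemsets on data this small) and keeps rules that meet the support and confidence thresholds. Exact fractions are used so that a confidence of exactly 75% is not lost to floating-point rounding. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Transactions copied from the table above.
TRANSACTIONS = [
    {"Laptop", "Mobile", "Memory card", "Card reader"},
    {"Laptop", "Mobile", "Card reader"},
    {"Laptop", "digi cam", "LCD TV"},
    {"Laptop", "Card reader", "digi cam"},
    {"Mobile", "Card reader", "digi cam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
                found = True
        if not found:  # no frequent itemset of this size, so none larger
            break
    return frequent


def association_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every qualifying rule."""
    rules = []
    for itemset, s in frequent_itemsets(transactions, min_support).items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                confidence = s / support(lhs, transactions)
                if confidence >= min_confidence:
                    rules.append((sorted(lhs), sorted(itemset - lhs), s, confidence))
    return rules


rules_50 = association_rules(TRANSACTIONS, Fraction("0.5"), Fraction("0.75"))
rules_40 = association_rules(TRANSACTIONS, Fraction("0.4"), Fraction("0.75"))
for lhs, rhs, s, c in rules_50:
    print(lhs, "=>", rhs, "support", float(s), "confidence", float(c))
```

The number and ordering of rules printed by WEKA may differ, since that depends on how the ARFF file encodes absent items and on the numRules setting.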

IMPLEMENTATION OF CLASSIFICATION TECHNIQUES. 

A) Create an .ARFF (Attribute Relation File Format) file and construct a decision tree for the following data.

    RID   Age          Income   Student   Credit-rating   Class
    1     Youth        High     No        Fair            No
    2     Youth        High     No        Excellent       No
    3     Middle-age   High     No        Fair            Yes
    4     Senior       Medium   No        Fair            Yes
    5     Senior       Low      Yes       Fair            Yes
    6     Senior       Low      Yes       Excellent       No
    7     Middle-age   Low      Yes       Excellent       Yes
    8     Youth        Medium   No        Fair            No
    9     Youth        Low      Yes       Fair            Yes
    10    Senior       Medium   Yes       Fair            Yes
    11    Youth        Medium   Yes       Excellent       Yes
    12    Middle-age   Medium   No        Excellent       Yes
    13    Middle-age   High     Yes       Fair            Yes
    14    Senior       Medium   No        Excellent       No

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as electronics.arff.

  • @relation electronics
  • @attribute age{youth,middle_age,senior}
  • @attribute income{high,medium,low}
  • @attribute student{yes,no}
  • @attribute credit{fair,excellent}
  • @attribute class{yes,no}
  • @data
  • youth,high,no,fair,no
  • youth,high,no,excellent,no
  • middle_age,high,no,fair,yes
  • senior,medium,no,fair,yes
  • senior,low,yes,fair,yes
  • senior,low,yes,excellent,no
  • middle_age,low,yes,excellent,yes
  • youth,medium,no,fair,no
  • youth,low,yes,fair,yes
  • senior,medium,yes,fair,yes
  • youth,medium,yes,excellent,yes
  • middle_age,medium,no,excellent,yes
  • middle_age,high,yes,fair,yes
  • senior,medium,no,excellent,no

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Classify tab, then click on Choose and under trees select J48.

Step 4) Click on Start.

Step 5) Right-click on the J48 entry in the Result list and select Visualize tree.
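To understand why a particular attribute ends up at the root of the tree, it helps to compute the information gain of each attribute on the 14 records. The sketch below does this in plain Python. It is only an illustration of the entropy-based idea: J48 implements C4.5, which ranks splits by gain ratio and also prunes the tree, so this is not a reimplementation of what WEKA does.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit"]

# Same 14 records as in electronics.arff; the last field is the class.
ROWS = [tuple(line.split(",")) for line in """\
youth,high,no,fair,no
youth,high,no,excellent,no
middle_age,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middle_age,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middle_age,medium,no,excellent,yes
middle_age,high,yes,fair,yes
senior,medium,no,excellent,no""".splitlines()]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, index):
    """Entropy of the class minus the weighted entropy after splitting."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[index] for row in rows):
        subset = [row[-1] for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Highest information gain:", best)
```

The attribute with the highest gain separates the classes best on its own, which is why it is a natural candidate for the first split.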

B) Create an .ARFF file and construct decision tree for the following data.

    Name     Gender   Height   Output
    Anil     Male     2.0 m    Short
    Ankit    Male     2.1 m    Short
    Priya    Female   3.1 m    Tall
    Ankita   Female   2.6 m    Medium
    Anand    Male     3.0 m    Tall
    Ganesh   Male     2.7 m    Medium

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as student.arff.

  • @relation student
  • @attribute gender{male,female}
  • @attribute height real
  • @attribute output{short,tall,medium}
  • @data
  • male,2.0,short
  • male,2.1,short
  • female,3.1,tall
  • female,2.6,medium
  • male,3.0,tall
  • male,2.7,medium

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Classify tab, then click on Choose and under trees select J48.

Step 4) Click on Start.

Step 5) Right-click on the J48 entry in the Result list and select Visualize tree.

IMPLEMENTATION OF CLUSTERING TECHNIQUES

A) Create an .ARFF (Attribute Relation File Format) file for the following data and implement the agglomerative algorithm using the single-link, complete-link and average-link methods.

         A   B   C   D   E
    A    0   1   2   2   3
    B    1   0   2   4   3
    C    2   2   0   1   5
    D    2   4   1   0   3
    E    3   3   5   3   0

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as weka1.arff

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Cluster tab and under that select HierarchicalClusterer.

Step 4) Click on the editor box of HierarchicalClusterer and set the link type to SINGLE.

Step 5) Click on Start.

Step 6) Right-click on the HierarchicalClusterer entry in the Result list and select Visualize tree.

Step 7) Now change the link type to COMPLETE and then click on Start.

Step 8) Visualize the tree.

Step 9) Now change the link type to AVERAGE and then click on Start.

Step 10) Visualize the tree.
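The three link types differ only in how the distance between two clusters is derived from the distances between their members. The sketch below works directly on the distance matrix given above, rather than on an ARFF file, and records the order and height of each merge. Ties are broken by taking the first closest pair found, which is an assumption; the names `LINKAGES` and `agglomerate` are hypothetical.

```python
LABELS = ["A", "B", "C", "D", "E"]

# Distance matrix copied from the table above.
DIST = [
    [0, 1, 2, 2, 3],
    [1, 0, 2, 4, 3],
    [2, 2, 0, 1, 5],
    [2, 4, 1, 0, 3],
    [3, 3, 5, 3, 0],
]

# How to combine the pairwise distances between two clusters.
LINKAGES = {
    "single": min,                               # closest pair
    "complete": max,                             # farthest pair
    "average": lambda ds: sum(ds) / len(ds),     # mean over all pairs
}


def agglomerate(dist, linkage):
    """Merge the two closest clusters repeatedly; return (members, height)."""
    combine = LINKAGES[linkage]
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        name = "".join(sorted(LABELS[i] for i in merged))
        merges.append((name, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


for linkage in ("single", "complete", "average"):
    print(linkage, agglomerate(DIST, linkage))
```

The merge heights correspond to the levels at which branches join in the dendrogram, so the three runs can be compared against the trees WEKA draws.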

B) Create an .ARFF (Attribute Relation File Format) file and implement the one-dimensional K-means algorithm using the following data:

15,15,16,19,19,20,20,21,22,28,35,40,41,42,43,44,60,61,65

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Cluster tab and under that select SimpleKMeans.

Step 4) Click on the editor box of SimpleKMeans and set the number of clusters (numClusters) to 2.

Step 5) Click on the Start button.

Step 6) Click on the Visualize tab.
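The K-means procedure on this one-dimensional data can be traced by hand. The sketch below assumes that the two starting centroids are the smallest and largest values; SimpleKMeans chooses its starting points from a random seed, so WEKA may pass through different intermediate steps. The function name `kmeans_1d` is hypothetical.

```python
# Values copied from the problem statement above.
DATA = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]


def kmeans_1d(values, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until nothing changes."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[k]
                   for k, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Assumed initialisation: smallest and largest value.
centroids, clusters = kmeans_1d(DATA, [min(DATA), max(DATA)])
for centre, members in zip(centroids, clusters):
    print(f"centroid {centre:.2f}: {members}")
```

With this starting point the values up to 35 form one cluster and the values from 40 upwards form the other.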

C) Create the following .ARFF (Attribute Relation File Format) file and implement K-means clustering.

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Cluster tab and under that select SimpleKMeans.

Step 5) Click on the Start button.

Step 6) Click on the Visualize tab.

D) Perform EM clustering on the following data using the WEKA tool.

    Person   Weight   Height
    A        102      147
    B        130      162
    C        111      147
    D        170      182
    E        175      180
    F        132      157

Step 1) Creating ARFF file: Open Notepad and type the following code and save it as emcluster.arff

  • @relation emcluster
  • @attribute weight real
  • @attribute height real
  • @data
  • 102,147
  • 130,162
  • 111,147
  • 170,182
  • 175,180
  • 132,157

Step 2) Open this file in WEKA by clicking on the Open file button.

Step 3) Click on the Cluster tab and under that select EM.

Step 5) Click on the Start button.

Step 6) Click on the Visualize tab.
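The idea behind EM clustering can be sketched on the same six records. The code below fits a mixture of two Gaussians with independent attributes (one variance per attribute), alternating an expectation step (soft assignment of each person to each cluster) and a maximisation step (re-estimating weights, means and variances). The number of clusters, the starting means (persons A and E), the fixed 50 iterations and the variance floor `MIN_VAR` are all assumptions made for illustration; WEKA's EM has its own initialisation and can select the number of clusters itself.

```python
from math import exp, log, pi

# (weight, height) pairs for persons A..F, as in emcluster.arff.
POINTS = [(102, 147), (130, 162), (111, 147), (170, 182), (175, 180), (132, 157)]
MIN_VAR = 1e-3  # assumed floor that keeps variances strictly positive


def log_density(x, mean, var):
    """Log density of x under a Gaussian with independent attributes."""
    return sum(-0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def em(points, start_means, iterations=50):
    n, dims, k = len(points), len(points[0]), len(start_means)
    centre = [sum(p[d] for p in points) / n for d in range(dims)]
    spread = [sum((p[d] - centre[d]) ** 2 for p in points) / n for d in range(dims)]
    means = [list(m) for m in start_means]
    variances = [list(spread) for _ in range(k)]  # start from the overall variance
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for p in points:
            logs = [log(weights[j]) + log_density(p, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            probs = [exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weight, mean and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            for d in range(dims):
                means[j][d] = sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                variances[j][d] = max(MIN_VAR, sum(
                    r[j] * (p[d] - means[j][d]) ** 2
                    for r, p in zip(resp, points)) / nj)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means, weights


labels, means, weights = em(POINTS, [POINTS[0], POINTS[4]])
for person, label in zip("ABCDEF", labels):
    print(person, "-> cluster", label)
print("means:", means)
```

Unlike K-means, each person initially belongs to both clusters with some probability; the hard labels printed at the end simply take the more probable cluster.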