A) Introduction to WEKA Tool.
B) Implementation of different Preprocessing Techniques.
C) Implementation of Association Techniques.
D) Implementation of Classification Techniques.
E) Implementation of Clustering Techniques.
Weka is a collection of machine learning algorithms for data mining tasks. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
The buttons in the WEKA GUI Chooser window can be used to start the following applications:
Explorer: An environment for exploring data with WEKA.
Experimenter: An environment for performing experiments and conducting statistical tests between learning schemes.
Knowledge Flow: This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.
Simple CLI: Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.
At the top of the Explorer window, just below the title bar, there is a row of tabs. These tabs are:
1. Preprocess – It is used to choose and modify the data.
2. Classify – It is used to apply classification algorithms.
3. Cluster – It is used to apply different clustering algorithms to the data.
4. Associate – It is used to learn association rules from the data.
5. Select Attributes – It is used to select the most relevant attributes in the data.
6. Visualize – It is used to view an interactive 2D plot of the data.
Loading Data
The first three buttons at the top of the preprocess section enable us to load data into WEKA:
1. Open file – Brings up a dialog box allowing us to browse for the data file on the local file system.
2. Open URL – Asks for a Uniform Resource Locator (URL) address where the data is stored.
3. Open DB – Reads data from a database. (Note that to make this work you might have to edit the file weka/experiment/DatabaseUtils.props.)
Now click Open file and select the weather.arff file from the data folder of the WEKA installation.
The Current Relation
Once some data has been loaded, the Preprocess panel shows a variety of information. The Current relation box (the “current relation” is the currently loaded data, which can be interpreted as a single relational table in database terminology) has three entries:
1. Relation. The name of the relation, as given in the file it was loaded from. Filters (described below) modify the name of a relation.
2. Instances. The number of instances (data points/records) in the data.
3. Attributes. The number of attributes (features) in the data.
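To see where the numbers in the Current relation box come from, here is a minimal Python sketch that reads ARFF text and returns the relation name, the instance count and the attribute count. It is not WEKA's own loader: it assumes a simple dense ARFF file (no sparse rows, no attribute names containing spaces), and the function name `summarize_arff` and the toy file are hypothetical.

```python
def summarize_arff(text):
    """Return (relation name, number of instances, number of attributes).

    Minimal sketch: assumes a dense ARFF file without quoted names containing
    spaces; it is not WEKA's own ARFF loader.
    """
    relation = None
    attributes = []
    instances = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        lower = line.lower()
        if in_data:
            instances += 1  # each remaining line is one instance (record)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, instances, len(attributes)


# Hypothetical toy file, written only for this example
sample = """% toy weather-like data
@relation toy
@attribute outlook {sunny, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
rainy,?,yes
sunny,70,yes
"""

print(summarize_arff(sample))  # ('toy', 3, 3)
```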
When you click on different rows in the list of attributes, the fields change in the box to the right titled Selected attribute.
This box displays the characteristics of the currently highlighted attribute in the list:
Name: The name of the attribute, the same as that given in the attribute list.
Type: The type of attribute, most commonly Nominal or Numeric.
Missing: The number (and percentage) of instances in the data for which this attribute is missing (unspecified).
Distinct: The number of different values that the data contains for this attribute.
Unique: The number (and percentage) of instances in the data having a value for this attribute that no other instances have.
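The Missing, Distinct and Unique counts can be reproduced by hand. The sketch below computes them for one attribute column, using `None` for a missing value (`?` in an ARFF file); the function name `attribute_stats` and the humidity values are hypothetical, and only the counts (not WEKA's percentages) are shown.

```python
from collections import Counter


def attribute_stats(values):
    """Missing, Distinct and Unique counts for one attribute.

    None stands for a missing value (written as ? in an ARFF file).
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),  # number of different values
        "unique": sum(1 for c in counts.values() if c == 1),  # values held by one instance only
    }


# Hypothetical humidity column with one missing value
humidity = [85, 90, 86, None, 80, 90, 70]
print(attribute_stats(humidity))  # {'missing': 1, 'distinct': 5, 'unique': 4}
```

Here 90 occurs twice, so it counts as distinct but not unique.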
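A) ReplaceMissingValues: [To replace missing values in the dataset]
a. Open the weather.numeric.arff file in WordPad.
b. Replace the numeric values in 2 or 3 places with the ? sign (the symbol ARFF uses for a missing value).
c. Save it as weather_numeric_Missing.arff.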
a. Open the weather_numeric_Missing.arff file in WEKA.
b. From weka.filters (i.e. from the Choose button), select unsupervised -> attribute -> ReplaceMissingValues.
c. Click Apply.
d. Save the result as weather_numeric_Replaced.arff.
e. Open weather_numeric_Replaced.arff in WordPad again to compare it with the original.
Before Replacing [weather_numeric_Missing.arff]
After Replacing [weather_numeric_Replaced.arff]
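As we understand its documentation, the ReplaceMissingValues filter fills each missing numeric value with the attribute's mean and each missing nominal value with the attribute's mode, both taken from the loaded data. A minimal Python sketch of that idea follows; the function name `replace_missing` and the rows are hypothetical, and rows are plain lists rather than WEKA instances.

```python
from collections import Counter
from statistics import mean


def replace_missing(rows):
    """Fill None (missing) with the column mean for numeric columns and the mode otherwise.

    rows: list of equal-length lists, one per instance.
    """
    fills = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if not present:
            fills.append(None)  # nothing to estimate from; leave the column missing
        elif all(isinstance(v, (int, float)) for v in present):
            fills.append(mean(present))  # numeric attribute -> mean
        else:
            fills.append(Counter(present).most_common(1)[0][0])  # nominal attribute -> mode
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical rows: outlook, temperature, humidity, play
rows = [
    ["sunny", 85, 85, "no"],
    ["sunny", None, 90, "no"],
    [None, 83, 86, "yes"],
    ["rainy", 70, None, "yes"],
]
for row in replace_missing(rows):
    print(row)
```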
B) Discretize: Discretization is the process of converting continuous (numeric) attributes into discrete (nominal) ones by dividing their range into intervals (bins).
Step 1) Open weather.arff file in Weka.
Step 2) Click the Choose button and, from the filters, select Discretize.
Step 3) Now click on the filter name shown next to the Choose button.
The following screen is displayed; make the changes shown here.
Click OK and then Apply; the following screen is displayed.
Now click Visualize All to display the following screen.
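With its default settings the unsupervised Discretize filter bins each numeric attribute into equal-width intervals (10 bins by default, as far as we know). The sketch below shows equal-width binning with 3 bins on hypothetical temperature values. Putting a value that equals a cut point into the lower interval, i.e. intervals of the form (a, b], is meant to mirror the range labels WEKA displays, but treat that detail as an assumption; the function name is hypothetical.

```python
def equal_width_discretize(values, bins=3):
    """Equal-width binning: returns the cut points and a bin index for each value.

    A value equal to a cut point goes into the lower interval, i.e. intervals are (a, b].
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cuts = [lo + i * width for i in range(1, bins)]
    labels = []
    for v in values:
        # first interval whose upper cut point is >= v; larger values fall in the last bin
        labels.append(next((k for k, c in enumerate(cuts) if v <= c), bins - 1))
    return cuts, labels


# Hypothetical temperature values
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
cuts, labels = equal_width_discretize(temps, bins=3)
print(cuts)    # [71.0, 78.0]
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
```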
C) Add: [To add a new attribute to the dataset]
Step 1) Open the weather.arff file in WEKA.
Step 2) From weka.filters (i.e. from the Choose button), select unsupervised -> attribute -> Add.
Step 3) Click Apply; the following screen is displayed.
Step 4) To set the attribute name, click on the word Add written beside the Choose button. The following screen appears.
Step 5) Type Color as the attribute name, then click OK and then Apply.
We get the following screen.
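The Add filter inserts a new attribute whose value is missing (`?` in the saved ARFF file) for every instance; by default, as we understand it, the new attribute is placed last. A minimal sketch of the same operation on plain Python lists, with a hypothetical function name and data:

```python
def add_attribute(attributes, rows, name, index=None):
    """Return copies of the attribute list and rows with one new attribute.

    Every instance gets None (missing) for the new attribute; by default it is appended last.
    """
    if index is None:
        index = len(attributes)
    new_attributes = attributes[:index] + [name] + attributes[index:]
    new_rows = [row[:index] + [None] + row[index:] for row in rows]
    return new_attributes, new_rows


attributes = ["outlook", "temperature", "play"]
rows = [["sunny", 85, "no"], ["rainy", 70, "yes"]]
new_attributes, new_rows = add_attribute(attributes, rows, "Color")
print(new_attributes)  # ['outlook', 'temperature', 'play', 'Color']
print(new_rows)        # [['sunny', 85, 'no', None], ['rainy', 70, 'yes', None]]
```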
A) Create an .ARFF (Attribute-Relation File Format) file and find the association rules with 30% support and 80% confidence for the following data:
Step 1) Creating ARFF file: Open Notepad and type the following code and save it as ass2.arff.
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Associate tab, then click Choose and select Apriori.
Step 4) Click on the Apriori text box to open its editor, then set the minimum support (lowerBoundMinSupport) to 0.3 and the minimum confidence (minMetric) to 0.8.
Step 5) Click the Start button.
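The two thresholds refer to the standard definitions: support(A => B) is the fraction of transactions containing every item of A and B, and confidence(A => B) = support(A and B) / support(A). The sketch below checks two rules against 30% support and 80% confidence. The transactions are hypothetical market-basket data, not the table of this exercise, and in WEKA the items are attribute=value pairs rather than plain names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(A => B) = support(A and B together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
min_support, min_confidence = 0.3, 0.8
for lhs, rhs in [({"bread"}, {"milk"}), ({"butter"}, {"bread"})]:
    s = support(transactions, lhs | rhs)
    c = confidence(transactions, lhs, rhs)
    keep = s >= min_support and c >= min_confidence
    print(sorted(lhs), "=>", sorted(rhs), f"support={s:.2f} confidence={c:.2f} kept={keep}")
```

Here bread => milk has enough support (0.6) but only 0.75 confidence, so it is rejected at 80%, while butter => bread (support 0.4, confidence 1.0) is kept.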
B) Create an .ARFF (Attribute-Relation File Format) file and find the association rules using the Apriori algorithm with 50% support and 75% confidence for the following data.
Step 1) Creating ARFF file: Open Notepad, type the following code and save it as testweka.arff.
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Associate tab, then click Choose and select Apriori.
Step 4) Click on the Apriori text box to open its editor, then set the minimum support (lowerBoundMinSupport) to 0.5 and the minimum confidence (minMetric) to 0.75.
Step 5) Click the Start button.
For the above dataset, find the association rules using the Apriori algorithm with support = 40% and confidence = 75%.
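The level-wise idea behind Apriori can be sketched in a few lines: find frequent single items, join frequent (k-1)-itemsets into k-item candidates, prune candidates that have an infrequent subset, then generate rules that meet the confidence threshold. This is a plain fixed-threshold version on hypothetical transactions; WEKA's Apriori, as documented, instead starts from an upper support bound and lowers it step by step until it finds the requested number of rules, so its output can differ. The `EPS` tolerance is an addition of this sketch to keep ratios like 3/4 from failing a 0.75 threshold through rounding.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for comparing ratios against the thresholds


def apriori(transactions, min_support, min_confidence):
    """Return (frequent itemsets with counts, rules as (lhs, rhs, support, confidence))."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def is_frequent(c):
        return c / n >= min_support - EPS

    level = {}
    for item in sorted({i for t in transactions for i in t}):
        c = count(frozenset([item]))
        if is_frequent(c):
            level[frozenset([item])] = c
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # join step: unite two frequent (k-1)-itemsets into a k-itemset
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        level = {}
        for cand in candidates:
            # prune step: every (k-1)-subset must already be frequent
            if all(frozenset(sub) in frequent for sub in combinations(cand, k - 1)):
                c = count(cand)
                if is_frequent(c):
                    level[cand] = c
        frequent.update(level)
        k += 1

    rules = []
    for itemset, c in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = c / frequent[lhs]  # subsets of frequent itemsets are frequent
                if conf >= min_confidence - EPS:
                    rules.append((sorted(lhs), sorted(itemset - lhs), c / n, conf))
    return frequent, rules


# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
frequent, rules = apriori(transactions, min_support=0.4, min_confidence=0.75)
for lhs, rhs, s, c in rules:
    print(lhs, "=>", rhs, f"support={s:.2f} confidence={c:.2f}")
```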
A) Create an .ARFF (Attribute-Relation File Format) file and construct a decision tree for the following data.
Step 1) Creating ARFF file: Open Notepad, type the following code and save it as electronics.arff.
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Classify tab, then click Choose and, under trees, select J48.
Step 4) Click Start.
Step 5) Right-click on the J48 entry in the result list and select Visualize tree.
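J48 is WEKA's implementation of the C4.5 decision tree learner, which chooses the attribute to split on using gain ratio: the information gain of the split divided by the split information (the entropy of the attribute's own values). The sketch below computes both measures for nominal attributes on a hypothetical four-row table; it only illustrates the split criterion and leaves out pruning, numeric thresholds and missing-value handling.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Reduction in class entropy obtained by splitting rows on a nominal attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def gain_ratio(rows, attr, target):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy([row[attr] for row in rows])
    return info_gain(rows, attr, target) / split_info if split_info > 0 else 0.0


# Hypothetical training data
rows = [
    {"age": "youth", "student": "no", "buys": "no"},
    {"age": "youth", "student": "yes", "buys": "yes"},
    {"age": "senior", "student": "no", "buys": "no"},
    {"age": "senior", "student": "yes", "buys": "yes"},
]
for attr in ("age", "student"):
    print(attr, round(info_gain(rows, attr, "buys"), 3), round(gain_ratio(rows, attr, "buys"), 3))
```

Here "student" separates the classes perfectly and would be chosen as the root, while "age" gives no gain.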
B) Create an .ARFF file and construct a decision tree for the following data.
Step 1) Creating ARFF file: Open Notepad, type the following code and save it as student.arff.
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Classify tab, then click Choose and, under trees, select J48.
Step 4) Click Start.
Step 5) Right-click on the J48 entry in the result list and select Visualize tree.
A) Create the following two .ARFF (Attribute-Relation File Format) files and implement the agglomerative (hierarchical) clustering algorithm using the single-link, complete-link and average-link methods for each dataset.
Step 1) Creating ARFF file: Open Notepad and type the following code and save it as weka1.arff
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Cluster tab, then click Choose and select HierarchicalClusterer.
Step 4) Click on the HierarchicalClusterer text box to open its editor and set the link type (linkType) to SINGLE.
Step 5) Click Start.
Step 6) Right-click on the result in the result list and select Visualize tree.
Step 7) Now change the link type to COMPLETE and click Start.
Step 8) Visualize the tree.
Step 9) Now change the link type to AVERAGE and click Start.
Step 10) Visualize the tree.
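The three link types differ only in how the distance between two clusters is measured: single link uses the closest pair of points, complete link the farthest pair, and average link the mean over all pairs. The naive sketch below merges hypothetical one-dimensional points bottom-up and records the distance of each merge, which corresponds to the merge heights in the tree drawn by Visualize tree. WEKA's HierarchicalClusterer works on multi-attribute instances with its own distance function, so this only illustrates the linkage rules.

```python
def agglomerative(points, linkage="single"):
    """Bottom-up clustering of 1-D points.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def cluster_distance(a, b):
        d = [abs(x - y) for x in a for y in b]  # all pairwise point distances
        if linkage == "single":
            return min(d)            # closest pair of points
        if linkage == "complete":
            return max(d)            # farthest pair of points
        if linkage == "average":
            return sum(d) / len(d)   # mean over all pairs
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters that are closest under the chosen linkage
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


points = [1, 2, 4, 8]  # hypothetical one-dimensional data
for linkage in ("single", "complete", "average"):
    heights = [round(d, 3) for _, _, d in agglomerative(points, linkage)]
    print(linkage, heights)
```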
B) Create the following two .ARFF (Attribute-Relation File Format) files and implement the one-dimensional K-means algorithm using the following data:
15,15,16,19,19,20,20,21,22,28,35,40,41,42,43,44,60,61,65
Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Cluster tab, then click Choose and select SimpleKMeans.
Step 4) Click on the SimpleKMeans text box to open its editor and set the number of clusters (numClusters) to 2.
Step 5) Click the Start button.
Step 6) Click the Visualize tab.
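The following sketch runs Lloyd's K-means with k = 2 on the one-dimensional data listed above: assign each value to its nearest centroid, recompute each centroid as the mean of its values, and stop when the centroids no longer change. Starting from the smallest and largest values (15 and 65) is an assumption of this sketch; SimpleKMeans chooses its initial centres at random (controlled by its seed), so WEKA's clusters may differ.

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Lloyd's k-means on one-dimensional data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            # assign x to the nearest centroid (ties go to the lower index)
            nearest = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            clusters[nearest].append(x)
        new = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if new == centroids:
            break  # centroids unchanged: the assignment has converged
        centroids = new
    return centroids, clusters


data = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]
centroids, clusters = kmeans_1d(data, centroids=[15, 65])
print([round(c, 2) for c in centroids])  # [20.91, 49.5]
print(clusters)
```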
C) Create the following .ARFF (Attribute-Relation File Format) file and implement K-means clustering.
Step 1) Creating ARFF file: Open Notepad and type the following code and save it as kmean1.arff
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Cluster tab, then click Choose and select SimpleKMeans.
Step 5) Click the Start button.
Step 6) Click the Visualize tab.
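The SimpleKMeans output includes a "Within cluster sum of squared errors" figure, which is useful for comparing runs with different numbers of clusters (lower means tighter clusters). The sketch below computes that quantity on raw values for hypothetical two-dimensional clusters. WEKA's distance function normalizes numeric attributes by default, so the figure WEKA reports will generally not equal this raw computation.

```python
def within_cluster_sse(clusters):
    """Sum, over all clusters, of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centre = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(sum((p[d] - centre[d]) ** 2 for d in range(dim)) for p in points)
    return total


# Hypothetical two-dimensional clusters, e.g. as produced by a k-means run with k = 2
clusters = [
    [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)],
    [(8.0, 8.0), (9.0, 8.0)],
]
print(within_cluster_sse(clusters))  # 2.5
```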
D) Perform EM clustering using the WEKA tool.
Step 1) Creating ARFF file: Open Notepad and type the following code and save it as emcluster.arff
Step 2) Open this file in WEKA by clicking the Open file button.
Step 3) Click the Cluster tab, then click Choose and select EM.
Step 5) Click the Start button.
Step 6) Click the Visualize tab.
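EM (expectation maximization) fits a mixture of probability distributions and gives every instance a probability of belonging to each cluster, instead of a hard assignment. As far as we know, WEKA's EM chooses the number of clusters by cross-validation when numClusters is left at its default of -1. The sketch below is a simplified version: a mixture of two normal distributions over one numeric attribute, with hypothetical data and hypothetical starting means, variances and weights.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """EM for a one-dimensional Gaussian mixture with a fixed number of components.

    Returns the fitted means, variances, mixing weights and the final
    responsibilities (soft cluster memberships) of every point.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: probability that each component generated each point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # keep a component from collapsing to zero width
    return means, variances, weights, resp


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, resp = em_1d(data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 3) for m in means])    # [1.0, 5.0]
print([round(w, 3) for w in weights])  # [0.5, 0.5]
```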