1 INTRODUCTION
1.1 WELCOME TO SUPERQUERY
1.2 FEATURE SUMMARY
1.3 SYSTEM REQUIREMENTS
1.4 INSTALLATION
1.5 SERVICE AND SUPPORT
2
GETTING STARTED
2.1 STARTING SUPERQUERY
2.2 READING YOUR DATA
2.3 BROWSING YOUR DATA
2.4 UPDATING YOUR DATA
2.5 EXPORTING YOUR DATA
3
ORGANIZING YOUR DATA
3.1 TABS
3.2 VIEWS
3.3 PANES
3.3.1 DATA TABLE
PANE
3.3.2 DATA
PAGE PANE
3.3.3 REPS TABLE
PANE
3.3.4 REPS GRAPH
PANE
3.3.5 DATA GRAPH
PANE
3.3.6 TOTAL PAGE
PANE
3.3.7 TOTAL
TABLE PANE
3.3.8 NOTES PANE
3.3.9 DATA
CELL PANE
3.3.10 FACTS PANE
3.4 QUICK FILTERS
3.5 INDEXING YOUR DATA
4
WORKING WITH MULTIPLE TABLES
4.1
LINKING TABLES
4.2
BROWSING MULTIPLE DATA VIEWS
5
ANALYZING YOUR DATA
5.1
STATISTICAL ANALYSIS
5.2
MAPPING NUMERICAL DATA INTO RANGES
5.3
MATHEMATICAL AND LOGICAL
CALCULATIONS
5.4
COMPREHENSIVE QUERIES
USING THE FILTER COLUMN
5.5
CLASSIFYING DATA
USING THE CLASSIFICATION COLUMN
5.6
ANALYZING TEXTUAL
DATA USING THE KEYWORD COLUMN
6
SUMMARIZING YOUR DATA
6.1
WHAT IS THE SUMMARY TABLE?
6.2
SUMMARY OPERATORS
6.3
CREATING A SUMMARY TABLE
7
DATA MINING --
DISCOVERING FACTS AND EXCEPTIONS
7.1
PREPARING THE DATA FOR MINING
7.2
FACT DISCOVERY ENGINE
FOR THE DISCOVERY EDITION
7.2.1 RUNNING THE FACT DISCOVERY
ENGINE
7.2.2 TUNING THE FACT ENGINE
7.2.2.1 FINE-TUNE THE
FACT-FINDING PERFORMANCE
7.2.2.2 IGNORE DEPENDENT COLUMNS
7.2.2.3 FILTER IMPLIED FACTS
1.1 Welcome to SuperQuery
SuperQuery is a data analysis and mining tool that runs under the Windows. Using rule induction technology, SuperQuery searches data tables and reports all interesting patterns and exceptions. The Fact Discovery Engine is easily tuned to meet various analysis needs.
SuperQuery also assists in preparing data for analysis by providing a number of facilities for partitioning, classifying and processing data columns. In addition, SuperQuery helps to explore and analyze data by automatically displaying graphs and calculating statistics. It contains a number of Wizards that help read, update, and analyze data effortlessly. SuperQuery can access and query a number of databases, spreadsheets, text files directly, and through ODBC drivers .
1. Asking Questions -- Visual Query SuperQuery helps you read the entire database or any part of it. You can specify criteria, and/or scope. For example: you can just read those records that contain data for Year = 96. You can also read a random sample of your data. Once the data is read in SuperQuery, you can easily create quick filters, indexes, and choose and order columns into views.
A view in SuperQuery is a window of your data. A view displays a group of columns that you select. It shows all rows that satisfy any filter you set, in the order you specify.
Furthermore, a view consists of a number of panes. Each pane displays the data in a different format. Table Pane displays data in a table, Page Pane displays one row in a page form, Reps Pane that displays the frequency (i.e. how common each value).
For example: If you are looking at a column that has TRUE and FALSE values, the Reps pane will show you how many TRUE’s and FALSE’s are in that column. There are also Graphics Panes that display Values or Reps of the current column in a bar graph. SuperQuery has also Total Panes that display any set of calculations like total, maximum, average, standard deviation, etc.
You do not need to do anything, SuperQuery will do it automatically. Each view is pre-configured with a set of panes suitable to its contents. You just look at the view and resize the panes to fit your needs. If you want, you can change the type of any pane and change its configuration. SuperQuery is very flexible and easy to customize.
As you scroll through or set filters on your table, all associated views recalculate and redisplay statistics and graphs, automatically. For example: If you set a filter on one category of products, all statistics and graphs are recalculated for this category. If you scroll to the "order amount column", you will see Min, Max, Total, and Average of orders for this product category. In addition graphs are updated to show the top orders and most common order amounts.
2. Analyzing Data SuperQuery also helps you create Calculated Columns. These are Virtual Columns that are calculated from other columns. You can’t make a syntax error, SuperQuery calculations are entered using simple dialogs. Furthermore, SuperQuery provides additional types of virtual columns. These are analysis-oriented short cuts to an otherwise complicated Calculated column. There are Range, Filter, Classification and Keyword columns.
The Range column is an easy way to divide a numerical column into ranges. For example you can divide sales into 3 levels; High, Medium, and Low. This is useful in setting filters on these ranges and looking at Graphs, and/or Total panes. This is also very helpful in discovering patterns and facts on numerical columns. You can create this example with just 3 mouse clicks.
The Filter column is a virtual column that contains TRUE or FALSE values depending on its criteria. You can create complex criteria using one or more Filter columns. This makes your criteria more modular and easier to understand.
The Classification column is a virtual column that simplifies categorizing your data. This helps you study each individual class and identify patterns among those classes. Suppose you want to study your new orders starting from 1996. You can classify your orders into 4 categories; Large, Medium, Small and Old. You can create a Classification column that has 4 values and 4 criteria or conditions as follows:
Classification Column Criteria This gives you the ability to see graphs and statistics for each class of your orders and run the fact discovery engine.
The Keyword column is a shortcut for extracting keywords from a "description" column. For example, you can create a Keyword column on "Services Description" to identify your entire client rows with the word "hardware" in their service description. You can then study those clients and identify any patterns they share.
All Data, Graphs and Total panes are automatically prepared for any new virtual column.
Another Important analysis facility is the Summary Table. Suppose you would like to see the number of orders of Large, Medium, Small and Old orders. Suppose also that you would like to see Total, Average, and other statistics for each order class. You can create a summary table that groups your orders by the Order_Class column. The summary table may include: Count of orders, Total Amount, Average Amount, etc. This summary table will look like this:
Summary Table Example
You can group your data on more than one column. For example: on Year, then State, then Product Type. On the other hand, for each cell in the summary table, you can see all the details.
Double click on any cell and a view that shows the details is created. This kind of analysis is also known as Multi-Dimensional Analysis and Drill Down.
Summary tables are normal tables that you can add virtual columns to, summarize, and search for facts. Also their views can contain all data, graphics and total panes.
Viewing related tables. SuperQuery helps you analyze any number of tables quickly and easily. After defining primary and foreign keys in your tables, you can tell SuperQuery to synchronize your data views. Suppose you have two views: Customers, and Orders. If you scroll in Orders, the Customers view will automatically scroll to show you the customer details. On the other hand, if you scroll in the Customer view, SuperQuery will automatically set a filter on the Orders view to show you all the orders of the current customer. So as you look at each customer record, you also see all their orders. Furthermore, if you have total and graph panes open, you will also see all the statistics and graphs on those orders AUTOMATICALLY. The more screen real estate you give SuperQuery, the more information you have at your fingertips. It is that simple!
3. Discovering Facts The ultimate form of intelligence is Inference. The simplest form of inference is to:
· observe patterns and similarities among objects or concepts then, · create general theories and exceptions.
By observing that many individual birds can fly, you can create a theory or a Fact that: All birds can fly. This allows you to create a class of living beings called Birds and associate the property Fly to the class rather than to each individual bird. We are probably programmed to do this, just to save memory in our brain! However, once you have a class, you can search for and expect more similarities among its members. Notice that most theories have exceptions -- there are some birds that can not fly.
What are SuperQuery Facts and Exceptions? SuperQuery scans your table and searches for patterns. For example, it could find that all you Large orders have the same month -- December. SuperQuery considers this a "Fact" and reports it. A Fact may have exceptions, so SuperQuery will still report that fact even if only 80% of your Large orders happens in December. SuperQuery differentiates between "All" and "Most" type facts. Furthermore, SuperQuery reports the exact percentage (e.g. 80%). Facts can be seen as rules that govern your data, or patterns that exist among your objects.
SuperQuery also reports any exceptions to those patterns. For example, if only 5% of your orders are paid in cash, then SuperQuery reports this as an exception. Furthermore, exceptions may be errors in your database. This means that you can use SuperQuery to improve the quality of data.
How to hunt down those FACTS using SuperQuery? There are more opportunities for finding patterns in a table if it contains columns that have repeated values. For example columns like Year, Month, State, Season, Class, Color, etc. If your data is mostly numeric and has no repeated values, then you have to classify it into Levels, Ranges, or Categories. To simplify this process, SuperQuery has the "Range Virtual Column" that you can create with minimum effort. Here are the steps to help hunt down those interesting facts:
· Create Range columns on your numerical columns. · Create Classification columns to categorize your data entities into interesting classes ( e.g. Clients, Orders, Products, etc. ). · Create Filter columns · Create Summaries on useful categories. · Fire the Fact engine on the data tables as well as the summary tables.
SuperQuery saves you a lot of time and gives you a head start by doing its systematic search for facts. You start your search for patterns from a Factbase rather than from a database. After all, that is what computers are good for.
4. Organizing Views While conducting your analysis, you create many interesting views on your data and you need to be able to organize these views and find them efficiently. You also need to present your findings in an easy to use medium.
SuperQuery allows you to create tabs and folders with Titles and different Color Codes. Furthermore, you can create tabs within tabs. For example, you can create a tab for each year and in each year you can have tabs for each product type.
|
||||||||||||||||||||||||||||||||||||||||||
1.2 Feature Summary
|
||||||||||||||||||||||||||||||||||||||||||
1.3 System RequirementsSuperQuery version1.20 32-Bit · Windows 95, or Windows NT. · 486 Processor or Better. · 8 MB of memory or more. · 8 MB of available Hard Disk space. |
||||||||||||||||||||||||||||||||||||||||||
1.4 InstallationCheck your hard disk space to ensure you have enough space to install SuperQuery depending on the version you are installing.
After installation is complete, restart Windows. SuperQuery creates a group in the Program Manager and a SuperQuery directory on your hard disk.
SuperQuery ships with a 7-days standard trial period. In order to register or to upgrade your SuperQuery copy, please follow the following steps:
Make sure you have obtained both the User ID and Registration key from Azmy Thinkware Inc.
|
||||||||||||||||||||||||||||||||||||||||||
1.5 Service and SupportFor questions that you cannot answer using this manual or online help, contact technical support through the AZMY Thinkware Inc. World Wide Web at www.azmy.com. Or you can call AZMY Thinkware technical support directly at (201) 947 1881.
Registered users of AZMY Thinkware SuperQuery receive 60 days of complimentary technical support, beginning with their first call to AZMY Thinkware Technical Support.
|
||||||||||||||||||||||||||||||||||||||||||
2 Getting started |
||||||||||||||||||||||||||||||||||||||||||
2.1 Starting SuperQueryWhen you run SuperQuery the Getting Started dialog (figure 1) will be displayed. You can either select any SuperQuery document from the drop down list, or click the Use Wizard to read your data button to run the Open Wizard (figure 2) to open your data.
Figure 1 Getting Started dialog
You can also start SuperQuery from the File menu; select one of the three menu items:
|
||||||||||||||||||||||||||||||||||||||||||
2.2 Reading your dataSuperQuery can read most common types of databases directly using “Data Access Objects. SuperQuery also supports ODBC if you have both ODBC and your database drivers installed.
From the Table menu select the Open Wizard menu item. The Open Wizard (figure 2) will run and will assist you to open your data.
Figure 2 Open wizard - WELCOME Page
Figure 3 Open wizard - MORE Page
After opening the data table, the Wizard (figure 3) will give you the options to: - Read another table , - Define links among open tables, or to - Call the analysis Wizard to analyze your data. |
||||||||||||||||||||||||||||||||||||||||||
2.3 Browsing your dataAfter reading your data , SuperQuery will automatically create a Table View (figure 4) to show the data. The Table View contains several panes separated by Splitters. All panes are kept synchronized at all times. You can completely configure any view by including some or all pane types.
Figure 4 SuperQuery Table View showing two the panes: Data Table Pane and Reps Graph Pane
As you scroll through the data columns , look at the Reps Graph to see the most common values in each column. Change the graph type to Data Graph and see the highest values in each column. Look at the Totals pane to see the total, min, max and average values. You can select from more than 20 statistical functions.
Double click on a bar in the Graph Pane or a value in the Total Pane and SuperQuery immediately position the cursor at that row in the Data Pane. Now push the filter button, and you immediately see only those rows satisfying that value. Scroll through your columns and watch all graphs and statistics calculated for that filter.
Now push the summary table button and create a summary on any category. Or push the Facts button and launch the fact discovery engine.
Map a numerical column into High, Medium, and Low by just a couple of clicks. Use any of the five Virtual Column Shortcuts to quickly perform calculations , classifications , and other analysis tasks.
Scroll in many tables simultaneously. For example, link your Customer and Order tables and scroll in the Customer table. SuperQuery automatically sets a filter to all orders by that customer and shows you the total, min, max, and average orders by that customer. Scroll in the orders table and SuperQuery automatically highlights the customer who made these orders.
SuperQuery , automatically does what used to require many complex SQL and programming procedures. It is that simple! You have to try it yourself, to believe how simple data analysis could be.
|
||||||||||||||||||||||||||||||||||||||||||
2.4 Updating your dataSuperQuery allows you to update your data tables if the original data is changed. SuperQuery will read the new data from its original location (you can specify another source) and update the entire analysis and virtual columns you have performed and added to your original data .
From the Table menu select the Update Wizard menu item. The Update Wizard (figure 5) will run and will assist XE "assist" you to update your data .
Figure 5 Update Wizard - WELCOME page
|
||||||||||||||||||||||||||||||||||||||||||
2.5 Exporting your dataSuperQuery allows you to export both the data tables and the generated facts into text files.
From the Table menu select Export menu item then select any of the following menu items:
Select Export | Tab delimited to export data tables into a tab delimited file. The system will display the Select Table Dialog to select a table that you want to export, then will display the Save As dialog to save the new file.
Select Export | Facts to export the facts generated into a "Fact - Exception" format text file.
Select Export | Rules to export the facts generated into an "If ... Then" format text file.
|
||||||||||||||||||||||||||||||||||||||||||
3 Organizing your data
In order to create a comprehensive analysis on your data , you need to create several views . Some views showing rows sharing common criteria or compose a specific fact.
In order to be able to organize these views, SuperQuery provides a Hierarchical Tab Organizer . Each tab may contain one or more view. Each view consists of one or more pane. Initially, you have eight empty Tabs . You can start organizing your data views in these tabs . Each view has a “pin” button. You simply click the button to “fix the view in the current tab. Similarly you click the pin again to make the view “floating ”, then change to another tab and “fix” it there. If you need more tabs, you can add a new group of tabs inside any of the exiting ones. This allows you to create as many tabs as you want.
|
||||||||||||||||||||||||||||||||||||||||||
3.1 TabsThe main window in SuperQuery contains a set of tabs or folders (figure 6) . Each tab may have data Table Views or other tabs (but not both). This means you can have a hierarchy of tabs. A new SuperQuery document contains one tab-level that has eight tabs. You may add any number of tabs at this level or create more levels of tabs within any tab.
Figure 6 SuperQuery hierarchical levels of tabs. A second tab level was created under tab3
In order to customize the appearance of a tab, click on the Right Mouse Button (RMB) when placed on the tab you want to customize. The property menu will pop up, select the property you want to change. You can also customize the tab by activating the tab first then selecting the Tab menu, and the item you want to change. Tab customization includes:
· Editing the tab title To edit the tab title, select Title menu item the Edit Title dialog will be displayed. Type the new title to apply on the active tab.
· Changing the tab background color To change the tabs color select Color menu item. The standard color dialog will be displayed. Select the new color to apply on the active tab.
· Moving views from one tab to the other Each view has a “pin” button. Simply click the button to make the view “floating ”, then change to another tab and “fix it there.
· Adding more tabs Select New menu item. Either select the Tab menu item to add a new tab at the end of the active tab row or, select the Tab Level to display Insert Tab (s) dialog to add a new row of tabs inside the active tab row.
|
||||||||||||||||||||||||||||||||||||||||||
3.2 ViewsSuperQuery Table View is a new innovative way of viewing the data (figure 7) . Each Table View consists of a set of ‘Panes’. Each pane displays data from its own point of view .
Figure 7 SuperQuery standard Table View with the four panes Data Table Pane, Total Page Pane , Reps Graph Pane and the Data Page Pane .
The SuperQuery standard Table View consists of four panes ; the main pane is the Data Table Pane that controls the scrolling of the other three panes ; Total Page Pane , Reps Graph Pane and the Data Page Pane .
Each pane has a re-sizable frame, to allow you to
change the pane size according to your preferences and the type of
the analysis you are performing. As you move the mouse on the pane
frames, the cursor changes to
Table view contains a row of 12 sets of buttons:
1. Fix button Press the Fix button to fix or float a view in the current tab. To move a view from one tab to another, first float the view, switch to the other tab, and then fix the view.
2.
View setup button
Press the View setup button to display the Select and reorder columns dialog in order to select / deselect and change the order of the columns in the view .
3.
Duplicate button Press the Duplicate button to duplicate the current view . You can use the Duplicate button to create a new copy of the view. You can then set a different filter , maximize a graph pane , or in general, perform further analysis on the new view.
4.
Print button
Press the Print button , to see a print-preview the current data table view . Press the print button from the preview screen to start the printing to your printer.
5.
Find button
Press the Find button to display the Find dialog that allows you to search for a specific word or phrase in the current column.
6.
Filter buttons
Filter buttons are composed of three buttons. When you press the left button, SuperQuery sets a filter on the value in the current cell in the Data Table Pane . Only rows that have the filter value are shown in the view . If you move to another cell and press the button again, the filter further restricted by the new value. Each time you press this button, you are ANDing the current cell value to the filter.
The button at the right side, is an "undo" filter operation. It will remove the last filter you set and hence expand the view .
The middle button acts mainly as an indicator to show you if there is a filter set. You can also use it as a shortcut to clear all filters. It is important to note that all panes will change their contents according to the current filter . For example , “Total” panes will display totals for all rows in the filter.
7.
Index buttons
Index buttons are composed of two buttons. The left button displays the Add / Edit Table Index dialog that allows you to create, edit, or select a Table Index. The right button acts as an indicator and a shortcut to set or set the current index.
8.
Add
Virtual Columns
Press the Add virtual column button to display the Add / Edit Virtual Columns dialog, which helps you, creates, edit, or delete a virtual column. SuperQuery supports five types of the virtual columns :
· Filter column · Range column · Keyword column · Classification column · Formula column
9.
Drill - Down button
This button creates a detailed view of the current row in the view. This button exists for Summary and Facts tables only. When you press it in a Summary view, SuperQuery creates a "Drill-Down" view the original data of the current row. Similarly, in the facts table view, the Drill-Down button displays all data rows that support the current fact.
10.
Summary button
Press the Summary button to display the Add / Edit Summary Table Dialog that will help you summarizes a data table .
11.
Facts
button Press the Facts button to display the Add / Edit Table Facts dialog or the Fact Discovery dialog that will let you run the fact engine.
12.
The three standard Windows buttons: minimize, restore
and close buttons.
|
||||||||||||||||||||||||||||||||||||||||||
3.3 PanesA pane is a re-sizable window that displays data in a certain way. SuperQuery provides ten types of panes :
· Data Table Pane · Data Page Pane · Reps Table Pane · Reps Graph Pane · Data Graph Pane · Total Page Pane · Total Table Pane · Notes Pane · Data Cell Pane · Facts Pane
The standard Table View panes are the Data Table Pane , Total Page Pane , Reps Graph Pane and Data Page Pane . You can change the type of any pane but you must have one Data Table Pane . You can't have more than one pane of same type.
To change the pane type:
1. Place the mouse on the pane you want to change. 2. Click on the right mouse button (RMB) to get the property menu . 3. Select the “Change Type” menu item to view all the available pane types, and select the new pane type.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.1 Data table paneThe Data Table Pane is the main pane in SuperQuery (figure 8) . The first row shows column headers, and the first column shows the row numbers (initially hidden). You can keep the first column or hide it, using the property menu item "Display row headers". The status bar shows the total number of rows and the current row number.
As you scroll across columns and the rows of the Data Table Pane , all other panes in the view will consequently update their content.
Figure 8 Data Table Pane
When you double click on any virtual column in the Data Table Pane , SuperQuery will bring up the corresponding edit dialog for that virtual column type. To change the properties of the Data Table Pane , click on the right mouse button (when placed on the Data Table pane ) the Data table pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.2 Data page paneData Page Pane is a page view of the current row (figure 9). The pane ’s first column contains names of the view columns . The second column contains the current row data . When you scroll in the Data Page Pane , SuperQuery updates all other panes in the Table View .
Figure 9 Data Page Pane
To change the properties of the Data Page Pane , click on the right mouse button (when placed on the Data Page Pane ) the Data page pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.3 Reps table paneReps Table Pane displays the repetition of values in the current column (figure 10). It shows the number of occurrences (or frequency ) of each individual value. If you double click in any of the values, SuperQuery will search for the first row (in the Data Table Pane) that has this value. You can then set a filter to study all rows that have this value.
Figure 10 Reps Table Pane
To change the properties of the Reps table Pane , click on the right mouse button (when placed on the Reps Table Pane ) the Reps table pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.4 Reps graph paneThis pane displays the repetition of values in the current column (figure 11). It shows the number of occurrences (or frequency ) of each individual value in a graphical representation. The graph consists of horizontal bars representing the unique values in the current column. On the right side of the bars are the corresponding number of rows and their percentage to the total number of rows.
If you double click on any bar , SuperQuery will search for the first row (in the Data Table Pane) that has that value. You can then set a filter to study all rows that have this value.
Figure 11 Reps Graph Pane
To change the properties of the Reps Graph Pane , click on the right mouse button (when placed on the Reps Graph Pane ) the Reps graph pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.5 Data graph paneThe graph consists of horizontal bars representing the numerical values of the data fields in the current column (figure 12). The left side of the graph shows the row number and the content of the first column. The right side of the bar shows the numerical value. You can set the graph to show the highest or lowest number of values.
Figure 12 Data Graph Pane
If you double click on any bar , SuperQuery will go to that row number (in the Data Table Pane). You can then set a filter to study all rows that have this value.
To change the properties of the Data Graph Pane , click on the right mouse button (when placed on the Data Graph Pane ) the Data graph pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.6 Total page paneTotal Page Pane displays statistics about the current numerical column (figure 13). You can add or remove from the set of default statistical calculations that were selected. When you double click on a value in the pane , SuperQuery will search for the first row (in the Data Table Pane) that has that value. This makes it easy to locate rows that have the maximum or minimum values for example .
Figure 13 Total Page Pane
To change the properties of the Total Page Pane click, on the right mouse button (when placed on the Total Page Pane ) the Total page pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.7 Total table paneTotal Table Pane works exactly the same as the Total Page Pane but with extended view to show the result of the selected statistical calculations when applied to all the numerical columns in the view (figure 14). When you double click on a value in the pane , SuperQuery will search for the first row that has that value. This makes it easy to locate rows that have the maximum or minimum values for example .
Figure 14 Total Table Pane
To change the properties of the Total Table Pane , click on the right mouse button (when placed on the Total Table Pane ) the Total table pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.8 Notes paneNotes Pane is a edit text window that you can use to write any comments about the view (figure 15) .
Figure 15 Notes Pane
To change the properties of the Notes Pane , click on the right mouse button (when placed on the Notes Pane ) the Notes pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.9 Data cell paneData Cell Pane is a view of the current cell (figure 16) . It is useful for displaying memo fields or long text lines.
Figure 16 Data Cell Pane
To change the properties of the Data Cell Pane , click on the right mouse button (when placed on the Data Cell Pane ) the Data cell pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.3.10 Facts paneFacts pane rewrites the current row in the facts table in a more natural language form (figure 17). You have the option to select the writing format in either a "Facts - Exception" or "if ... then" form.
Figure 17 Facts Pane
To change the properties of the Facts Pane , click on the right mouse button (when placed on the Facts Pane ) the Facts pane menu will pop up . Select the property you want change.
|
||||||||||||||||||||||||||||||||||||||||||
3.4 Quick Filters
The left part of the button
Now if you move to another column say Color and your current cell has Black and press the same button again, then the filter will be set on Product = Jacket AND Color = Black. This means each time you press the filter button, you add a new value to the filter. All values are ANDed together.
The middle button
The right button
The filter button is designed to provide for fast and easy analysis of your data . Each time you change the filter, all panes are recalculated for rows in the filter. |
||||||||||||||||||||||||||||||||||||||||||
3.5 Indexing your data
The index button is composed of two buttons. The left
button |
||||||||||||||||||||||||||||||||||||||||||
4 Working with multiple tables
The need to linking and joining tables belonging to the same database, and being able to synchronize these tables together, so that scrolling in one table will show related information on the other tables and to exchange columns among them is a very important and vital feature in data analysis and mining. |
||||||||||||||||||||||||||||||||||||||||||
4.1 Linking tablesIn order to link your data tables together, you have to open all the related tables under the same SuperQuery database name (see the Open Wizard), then define the primary and foreign keys for each table you want to link.
The primary key is the table key . It is a data column with the unique values. An example of the primary key is the "Employer Name" column in the "Employee" table.
The foreign key in a table is a column that represents the primary key of another table . Each table may contain non-or many foreign keys.
Figure 18 Edit Table Keys dialog
How to link tables together ? 1. Open all tables you want to link using the Open Wizard. Make sure to select (name) the same database for all the tables you want to link.
2. After you finish reading all the tables select the define links in the Open Wizard or from the application main menu select Table | Link menu item. The Edit Table Keys dialog displays (figure 18). The dialog displays all the open tables in the current database in the “Table” column.
3. Click on the drop-down button next to the table you want to define its primary key .
4. To select the foreign key (s) double click on the “Foreign key ” cell next to the table you want to define its foreign key(s), or place the active grid cell on the table and click on the “Edit Foreign Key” button. The Edit Foreign Keys Dialog (figure 19) displays
5. Define the foreign keys of the table . After you finish with the dialog the keys you defined will be displayed in the cell next to the table under the “Foreign Key” column. In order to define a foreign key (s) you will have to specify the foreign table and its primary key from within the same database and to associate them with one of the table (the one you are editing) columns . The dialog displays all column names in the table you are editing to select the foreign column (s) from them. Click on the drop-down button next to foreign column you selected under the “Foreign Table- Key” column to display all the [Primary Table - Key] combinations in the open database to associate on combination with selected foreign key .
Figure 19 Edit Foreign Keys dialog
|
||||||||||||||||||||||||||||||||||||||||||
4.2 Browsing multiple data viewsWhen you have a multi-table database with its tables linked, you can synchronize the table views and exchange foreign columns among these tables.
A foreign column is an external column you create to show the corresponding values from the foreign table the column is driven from.
If you do not have views opened for the tables you want to synchronize . You may select the View | New menu items then select the table you want to create a view for it.
Synchronizing tables means as you are scrolling in one view SuperQuery will try to synchronize the scrolling and the panes update of all the other views in same tab. SuperQuery will synchronize the tables depending on their relationships. For example , if we assume that view 'X' is a Customer view and view 'Y' is an Orders view. Each customer may have a number of orders. Therefore, there is a "one to many" relation between customer and orders (or a "many to one" between orders and customers). Now, if you scroll in the customer view to customer 'A', SuperQuery sets a filter on all orders from customer 'A' in the Orders view. Consequently, when you scroll in Orders to order 'B', SuperQuery finds the customer associated with that order and make it the current row in the Customer view.
|
||||||||||||||||||||||||||||||||||||||||||
5 Analyzing your data
Browsing through raw data may be a very frustrating job, especially when the data contains numbers and long text fields. One way to understand your data is to try to categorize or to reclassify the data. This can be done, by replacing the actual data with values that best represent these data. Reclassifying your data will enable you to easily group your data and study the performance of all related fields together.
For numerical data, you may need to map the numbers into a more elaborated and meaningful easy to read form. For example a price column may be mapped to a three values column: inexpensive, moderate and expensive. This mapping will facilitate the browsing through the data and will enable you to compare the performance of the other columns when studied side by side with this column.
For the textual data, you may reclassify the data, into groups, each group will share a common nature of the data it represents. For Example a Month column can be reclassified into four groups Winter, Spring, Summer and Fall. This reclassification will facilitate and reduce the data browsing time since the number of repetitions in the column was reduced from 12 to 4.
Also, as a special case for classifying text columns is to classify the text fields into two values TRUE/ FALSE values. For example a profit column could be reclassified to TRUE if the profit is positive and FALSE if the profit is negative.
Sometimes it is also very useful to categorize your data based on a phrase or word in the fields. For example in the Industry column you may be interested only in industries that have the word "Food" in its description.
How about some statistical information about each column in your data? Totals, average, minimum, maximum, deviation, most repeated value…etc. When such information is made available to each column then, the understanding of the data and hence the classification of the column will be much more accurate.
The ability to drive a new column from exiting ones using calculation formula is another strong aid to study your data and to help the data-mining engine to discover important patterns in your data. Calculation formula should be applied to Textual, Numerical, Date and Logical columns each, with its own relevant operators.
Data reclassification into a more elaborated form will reduce the time you take in studying your data and will make it much more easier to link the different related columns together, and further more will prepare your data for mining.
|
||||||||||||||||||||||||||||||||||||||||||
5.1 Statistical AnalysisSuperQuery can automatically perform statistical analysis on your data . The two panes; Total Page Pane and the Total Table Pane are dedicated to the Statistical analysis .
By default SuperQuery displays four statistical calculation in these two panes ; Total, maximum, minimum and Average. However you have the option to add or remove from the following statistical calculations using the panes property menu item Select Calculations:
|
||||||||||||||||||||||||||||||||||||||||||
5.2 Mapping numerical data into rangesSuperQuery allows you to easily map numerical columns into ranges , by creating a new column Range Column and map all the numerical values into number of ranges you define. For example you can divide sales into 3 levels; High, Medium, and Low. This is useful when setting filters on these ranges and looking at Graphs, and/or Total panes.
The Range column is also a very important step in preparing the data for mining while it helps the Fact Discovery Engine to study the related patterns in each of these ranges in relation with all the other columns in the data .
To create a Range column follow these three steps:
1.
Press on the Add Virtual Columns button
Figure 20 Add / Edit Virtual dialog
2. The Add / Edit Virtual Columns dialog (figure 20) will be displayed. Type in the name for the new column and press on the Range Column button.
3. The Enter Ranges dialog (figure 21) will be displayed to create a number of ranges to map the selected numerical column values.
Figure 21 Enter Ranges dialog
The Enter Ranges dialog consists of a grid with three columns; the Value column, the Minimum range and the Maximum range. You can classify any numeric column into a number of ranges by entering the name of each range in the Value column and the corresponding maximum value.
The first minimum value and the last maximum value are automatically calculated by the system. As the data table is updated, the minimum and maximum are automatically recalculated.
SuperQuery copies each maximum to the minimum of the following range. This ensures the continuity of all ranges.
Following are a five pre-defined ranges that you choose from: 1. Standard: Low, Medium and High. 2. Money: Inexpensive, Average and Expensive. 3. Length: Short, Medium and Long. 4. Weight: Light, Medium and Heavy. 5. Temperature: Cold, Moderate and Hot.
You can change the number of ranges from 3 as shown above to 5 by checking the 5 - Ranges button. The system will add two more ranges by adding the word “very” to both the first and the last ranges. You can also edit the range names and add or delete any range.
· Recalc. Min. & Max. The Recalc. Min. & Max. button recalculates all the minimum and maximum ranges.
· Clear Grid The Clear Grid button clears all ranges and entries made by you leaving the two default rows.
· Delete Range The Delete Range button deletes the current range.
· Add Range The Add Range button adds a blank range above the current cell.
SuperQuery will automatically create a new column and add it to the table view (figure 22) . The new column will be given a cyan color indicating that the column is a calculated one.
Figure 22 Data Table Pane showing the Profit Range Column that maps PROFITDLR into the four ranges: Loss, Low, Medium and High
As an example for the Range column: Table name: Orders Columns: Customer, Order_Value, Order_Date
Suppose you want to classify your orders into three classes Low, Medium and High orders.
You can create a range virtual column as follows:
1. Push the virtual column button to get the Add / Edit Virtual Column dialog. 2. Enter a new name e.g. Order_Class 3. Push the Range button to get to the Enter Ranges dialog 4. Press OK to create a three range virtual column.
Low: from the lowest value to 1000 Medium: from 1000 to 3000 High: from 3000 to the highest value available
|
||||||||||||||||||||||||||||||||||||||||||
5.3 Mathematical and logical CalculationsSuperQuery allows you to easily perform calculations on your data , by creating a new column Formula Column that performs any calculation statement you define on the data column(s) you specify. You may perform the calculation on any of the following data types: · Numerical data · Text data · Date data · Logical data · Another calculation statement · Any mix of any of the above data types.
Following are the calculation operator types:
· Numerical Operators 1. + Add : adds two numerical operands. 2. - Subtract : subtracts two numerical operands. 3. * Multiply : multiplies two numerical operands. 4. / Divide : divides two numerical operands. 5. / Remainder Divide : Remainder Division operator returns the quotient of the division of the two integer operands . 6. Convert to Text : converts one numeric field to a text field. 7. Offset Numeric : offsets (shifts) the selected column with the specified number of rows .
· Text Operators 1. Concatenate : concatenates two text operands. 2. Convert to Numeric : converts one text field to its numerical equivalent. 3. Get word : extracts a word from a string by giving its position (zero based) and the separator character that separates the words in the string. 4. Substring : extracts a substring from a text field, the operator takes three operands; the string to extract from, the start position (zero base) and the number of characters to be extracted. 5. Offset Text : offsets (shifts) the selected column with the specified number of rows .
· Date Operators 1. Day : extracts the day from a date field. 2. Day of week : translates the day in a date field to the day name ( e.g., Sunday ). 3. Weekday : determines if the data field was a weekday or not. 4. Weekend : determines if the data field was a weekend or not. 5. Month : extracts the month from a date field. 6. Month name : translates the month in a date field to the month name (e.g., December ). 7. Year : extracts the year from a date field. 8. Century : returns the century of the date field. 9. Convert to date : converts any field in the form of YYYYMMDD to a standard Date field 10. Offset Date : offsets (shifts) the selected column with the specified number of rows .
· Logical Operators 1. Is Equal : evaluates the Is Equal expression of the two selected operands 2. Is Not Equal : evaluates the Is Not Equal expression of the two selected operands 3. Offset Logic : offsets (shifts) the selected column with the specified number of rows
· Mathematical Operators 1. Square Root : returns the square root of only positive numbers. 2. Power of n : returns the n - power of the numeric field. If field is not 0.0 and n is 0.0 the result is 1. If the field is 0.0 and n is negative the result is error and returns 0.0. If both are 0.0, or if the field is negative and n is not an integer the function returns error. 3. Offset Math : offsets (shifts) the selected column with the specified number of rows .
To create a formula column follow these steps:
1.
Press on the Add Virtual Columns button
2. The Add / Edit Virtual Columns dialog will be displayed. Type in the name for the new column and press on the Formula Column button, see figure 20. 3. The Enter Value-Formula Dialog (figure 23) will be displayed allowing you to enter a number of calculation statement
Figure 23 Enter Value - Formula dialog. The prof/qty is calculation statement name for the statement to be constructed.
This dialog allows you to easily create comprehensive calculations. It allows you to break down the formula into a number of calculation statements. Each calculation statement is evaluated separately on the current row of the Data Table. The result is placed in the last column in the dialog to allow you to test its functionality.
The “Variable” column contains the name of the Calculation Statement. You can enter any name. Since calculating the last calculation statement will automatically calculate all statements involved, the system will only execute the last calculation statement.
The Insert row button is used for insetting a blank calculation statement above the current cell. The Delete row button deletes the current row with its content.
To construct the calculation statement you double click on cell next to the calculation statement name you entered in the Calculation column in the Enter Value - Formula dialog. The Edit the calculation statement dialog (figure 24) will be displayed.
Figure 24 Edit the Calculation statement dialog
This Dialog allows you to create calculation statements by selecting an operator and its operands.
A list of six buttons shows the five different types of operators and the sixth button selects all types together. When choosing one of these buttons, the “Select operator” list box displays a list of operators from that type.
After you choose the operator, the “Operands - Current row” group will show the required number of operands.
An operand combo box contains:
· All table columns, including virtual columns. · All calculation statements entered before this statement. You can also enter directly any number or text as the operand value.
As a quick test of your statement, you can press the “Calc” button to view the value of operands at the current row, as well as the result of the calculation statement. You can review the result for other rows in the table by using the right and left arrow buttons to change the current row position.
SuperQuery will automatically create a new column and add it to the table view (figure 25) . The new column will be given a cyan color indicating that the column is a calculated one.
Figure 25 Data Table Pane showing the Profit per unit Formula Column that calculates the formula PROFITDLR / QTYSOL.
As an example for the Formula column: Table name : Orders Columns: Customer, Qty, Unit Price
Suppose you want to create a new column that contains the total Order_Value. You can create a formula virtual column as follows:
1. Push the virtual column button to get the Add / Edit Virtual Column dialog. 2. Enter a new name e.g. Order_Value 3. Push the Formula button to get to the Enter Name-formula dialog
Name Calculation statement Result
Line1 Qty * Unit_Price 30000
Customer Qty Unit_Price Order_Value A 100 10 10 B 200 50 40 C 100 20 20 D 500 30 30 E 300 100 50 F 200 300 20 G 600 5 10 H 200 40 20 I 500 20 50 K 300 10 30
|
||||||||||||||||||||||||||||||||||||||||||
5.4 Comprehensive Queries using the Filter ColumnSuperQuery allows you to create complex criteria using one or more Filter columns . This makes your criteria more modular and easier to understand. The Filter column is a virtual column that contains TRUE or FALSE values depending on its criteria .
To create a Filter column follow these three steps:
1.
Press on the Add Virtual Columns button
2. The Add / Edit Virtual Columns dialog will be displayed. Type in the name for the new column and press on the Filter Column button, see figure 20. 3. The Enter Criteria dialog (figure 26) will be displayed to allow you to build the criteria .
Figure 26 Enter Criteria dialog
To specify criteria (example: Color = Red and Shape = Box), follow these steps:
1. Select your column identifier (Color)
2. Double click on the adjacent criteria cells. This will display the Edit Criterion dialog to allow you to build the actual criterion, see figure 27.
3. When finished specifying the criterion, you will be back in this dialog with current cell containing the text of your criterion (Color=Red).
4. To AND another criterion, simply find your second column identifier (Shape) and double click on the cell in the same vertical column containing the previous criterion. This will repeat steps 2 and 3 to form your second criterion (Shape = Box ).
On the other hand, to OR a criterion, instead of using the same vertical column as in step 4, use another vertical column.
The rule is simple; any criteria sharing the same vertical column in this dialog are ANDed otherwise they are ORed. Read the final criteria from top to bottom.
The complete criteria are always displayed in a text box at the bottom of the dialog.
Figure 27 Enter Criterion dialog
SuperQuery will automatically create a new column and add it to the table view (figure 28) . The new column will be given a cyan color indicating that the column is a calculated one.
Figure 28 Data Table Pane showing the Jacket S_M Filter Column that evaluates the criteria PRODUCT is equal to Jacket and SIZE is one of (Small, Medium).
As an example for the Filter column Table name : Orders Columns: Customer, Order_Value, Order_Date
Suppose that you want to view only orders that are greater than $1000.
You can create a filter virtual column as follows:
1. Push the virtual column button to get the Add / Edit Virtual Column dialog. 2. Enter a new name e.g. BigOrders 3. Push the Filter button to get the Enter Criteria dialog 4. Write the criterion statement.
A new column is inserted that has the two values TRUE and FALSE depending on whether the Order_Value satisfy the criteria or not.
You can move to a cell that has a TRUE value and push the filter button to view only those rows having the TRUE value. Similarly, you can view only rows that have FALSE values.
Notice that the criteria may contain more than one condition. For example , Order_Value > 1000 and Order_Month = August.
Customer Order_Value BigOrder
1 A 1000 FALSE 2 B 2000 TRUE 3 C 120 FALSE 4 D 222 FALSE 5 E 5000 TRUE 6 F 9000 TRUE 7 G 123 FALSE 8 H 3444 TRUE 9 I 0 FALSE 10 J 1111 TRUE
|
||||||||||||||||||||||||||||||||||||||||||
5.5 Classifying data using the Classification ColumnSuperQuery helps you to study each individual class and identify patterns among those classes. Suppose you want to study your new orders starting from 1996. You can classify your orders into 4 categories ; Large, Medium, Small and Old. You can create a Classification column that has 4 values and 4 criteria or conditions as follows:
The Classification column is a virtual column that simplifies categorizing your data .
To create a Classification column follow these three steps:
1.
Press on the Add Virtual Columns button
2. The Add / Edit Virtual Columns dialog will be displayed. Type in the name for the new column and press on the Classification Column XE "Classification Column" button, see figure 20. 3. The Enter Value - Criteria Dialog (figure 29) will be displayed and allows you to enter a number of criteria - value combination.
Figure 29 Enter Value - Criteria, showing the four classifications: Winter, Spring, Summer and Fall entered in the Value column
This dialog is a list of value / criteria pairs. For each row in the table, the system evaluates each of these criteria. If a criterion is satisfied, the corresponding value is placed in the classification column. If none of the criteria were satisfied, the system places “Unclassified” in the corresponding row in the classification column.
You can change the “Unclassified” word by entering any text in the value column without an associated criterion at the last entry.
The Insert row button is used for inserting a blank line above the current cell. The Delete row button deletes the current line.
This dialog consists of grid with two columns: The Value column, where you enter a name for the criterion, and the Criteria column, where you enter the criterion. Double click on the criteria column to display the Enter Criteria dialog to edit the criteria.
Figure 30 Enter Criteria dialog with the cell next to the MONTH column having the criterion is one of (December, January, February) constructed using the Edit Criterion dialog see figure 31.
To specify criteria (example: Winter is one of (December, January, February), follow these steps:
1. Double click on the cell next to the Winter in the Enter Value - Criteria dialog to display the Enter Criteria dialog (figure 30).
2. Select your column identifier (Month), and double click on the adjacent criteria cells. This will display the Edit Criterion dialog (figure 31) to allow you to build the actual criterion.
3. When finished specifying the criterion, you will be back in this dialog with current cell containing the text of your criterion is one of (December, January, February)
4. To AND another criterion, simply find your second column identifier and double click on the cell in the same vertical column containing the previous criterion.
On the other hand, to OR a criterion, instead of using the same vertical column as in step 4, use another vertical column.
The rule is simple; any criteria sharing the same vertical column in this dialog are ANDed otherwise they are ORed. Read the final criteria from top to bottom.
The complete criteria are always displayed in a text box at the bottom of the dialog.
Figure 31 Enter Criterion dialog
SuperQuery will automatically create a new column and add it to the table view (figure 32) . The new column will be given a cyan color indicating that the column is a calculated one.
Figure 32 Data Table Pane showing the Season Classification Column that evaluates the criteria: Winter = MONTH is one of (December, January, February), Spring = MONTH is one of (March, April, May), Summer = MONTH is one of (Jun, July, August, September) and Fall = MONTH is one of (October, November).
As an example for the Classification column Table name: Orders Columns: Customer, Order_Value, Order_Date
Suppose you want to classify your orders into 4 classes Large, Medium, Small and Old. You can create a classification virtual column as follows:
1. Push the virtual column button to get the Add / Edit Virtual Column dialog. 2. Enter a new name e.g. Order_Class 3. Push the Classification button to get to the Enter Value - Criteria dialog.
4. Type each class value in the value column and double click in the criteria column to enter the corresponding criteria as follows:
Value Criteria
Large Order_Date >= 01/01/96 and Order_Value > 5000 Medium Order_Date >= 01/01/96 and Order_Value > 1000 Small Order_Date >= 01/01/96 Old any other order
Order_Value Order_Date Order_Range
1 1000 01/01/96 Low 2 2000 01/02/96 Medium 3 120 01/11/95 Old 4 222 01/03/96 Low 5 5000 01/04/96 High 6 9000 01/06/96 High 7 123 01/07/96 Low 8 3444 01/12/95 Old 9 0 01/10/95 Old 10 1111 01/09/94 Old
|
||||||||||||||||||||||||||||||||||||||||||
5.6 Analyzing textual data using the Keyword ColumnSuperQuery allows you to extract keywords from a "description " column, by creating a Keyword column. You can for example , create a Keyword column on "Services Description" to identify all your client rows with the word "hardware" in their service description. You can then study those clients and identify any patterns they share.
To create a Keyword column follow these three steps:
1.
Press on the Add Virtual Columns button
2. The Add / Edit Virtual Columns dialog will be displayed. Type in the name for the new column and press on the Keyword Column button, see figure 20. 3. The Edit / Add Keyword dialog (figure 33) will be displayed to let you enter the Keyword.
Figure 33 Edit / Add Keyword dialog. The Keyword "Food" will be looked in the INDUSTRY column
SuperQuery will automatically create a new column and add it to the table view (figure 34) . The new column will be given a cyan color indicating that the column is a calculated one.
Figure 34 Data Table Pane showing the Food Keyword Column that writes the word Food whenever the word "Food" is found in the
As an example for the Keyword column Using the Orders table , the user may create a VISA column to reclassify the Payment_Method column. This allows for the analysis of customers who used Visa card for payment.
Payment Method VISA
1 VISA & MC VISA 2 CASH 3 CASH & VISA VISA 4 CREDIT 5 VISA VISA 6 VISA VISA 7 MC 8 AMEX 9 AMEX 10 CASH
|
||||||||||||||||||||||||||||||||||||||||||
6 Summarizing your data
Another way to analyze your data is to narrow down the number of rows by grouping them. You can group the data rows by applying a calculation on those rows sharing a common nature.
|
||||||||||||||||||||||||||||||||||||||||||
6.1 What is the Summary table?Imagine you have sales data and you want to study the sales of each product, for example you want to study the following:
1. The most common city selling each of the product 2. The most selling month for each product 3. The total quantity sold for each product 4. The average profit for each product
You can create a summary table that groups your sales by the PRODUCT column. The summary table will include: the most common of the CITY column, the most common of the MONTH column the Total of the QUANTITY column, the Average of the PROFIT column, etc. This summary table will look like this:
SuperQuery will create a new table for the summarized data. The table will be viewed in a summary view that has the same capabilities as the data view. Hence you will be able to see automatically generated statistics and graphs and have the chance to run the Fact Discovery Engine and even the Summary again on the generated table.
|
||||||||||||||||||||||||||||||||||||||||||
6.2 Summary OperatorsSuperQuery allows you to choose among the following 21 summary operators to use in summarizing your data. You may apply more than one summary operator on the same data column.
|
||||||||||||||||||||||||||||||||||||||||||
6.3 Creating a Summary table
To create a summary table on your data
you can either press the summary button
Figure 35 Add/ Edit Summary Table dialog. The table will be summarized according to the CITY
You can create or select a summary table from the "Select / Add Summary Table" box. Select the “Group By” column(s) that you want to summarize your table on. This column should represent groups of values. Therefore, The "Group By" box will not show key columns or columns that has no repeated values.
If you want to select more than one Group By column, press the More button that will display four more combo boxes. Check the Ascending check box if you want to summarize columns in ascending manner or uncheck it to summarize in a descending manner.
Press the ” Select Cols” button to display the Select columns and their summary calculations dialog (figure 36) to select the summary columns and their associated calculations.
When you finish entering all the information, press the OK button, the system will then display a progress bar indicating its progress in creating the summary table.
You can also delete an existing summary table by selecting the table from the "Select / Add Summary Table" list and press the “Delete” button.
Select columns and their summary calculations dialog helps you to select the columns you want to include in the summary table and the how you want them to be summarized.
Figure 36 Select columns and their summary calculations dialog. The system will evaluate the most common PRODUCT, most common SIZE, the total of the QTYSOLD and the total of the PROFITDLR in each CITY of the table.
The dialog contains a list of all table columns with their corresponding Sql type. Select all columns that you want to include in the summary table by checking the corresponding check box.
In the “Calculation” column, click on the corresponding drop-down button to select one of the summary calculation. SuperQuery gives the new summary column a name that consists of the selected summary operator and the data column name. However, you can write a different name in the “New Column Name” next to the selected column.
Sometimes, it is useful to summarize one column with more than one operator (for example you may want to summarize Sales with the average, total and maximum operators). You can do this by duplicating the data column (Sales) and selecting the required summary operator for each duplicate.
· Select the data column and press the “Duplicate Line” button. · Select the new summary operator from the “Calculation” column.
When you finish with the dialog press OK, the system automatically generates a new table view on the new created Summary Table (figure 37).
Figure 37 Summary View showing the Group by column CITY in yellow color and the four select columns and their summary calculations.
You can browse through the summary view, change the pane types and do more analysis on the summary data.
|
||||||||||||||||||||||||||||||||||||||||||
7 Data Mining -- Discovering Facts and Exceptions
The ultimate form of intelligence is Inference. The simplest form of inference is to: · Observe patterns and similarities among objects or concepts then, · Create general theories and exceptions .
By observing that many individual birds can fly, you can create a theory or a Fact that: All birds can fly. This allows you to create a class of living beings called Birds and associate the property Fly to the class rather than to each individual bird. We are probably programmed to do this, just to save memory in our brain! However, once you have a class, you can search for and expect more similarities among its members. Notice that most theories have exceptions -- there are some birds that can not fly.
What are SuperQuery Facts and Exceptions?
SuperQuery scans your table and searches for patterns . For example , it could find that all you Large orders have the same month -- December. SuperQuery considers this a "Fact" and reports it. A Fact may have exceptions , so SuperQuery will still report that fact even if only 80% of your Large orders happens in December. SuperQuery differentiates between "All" and "Most" type facts. Furthermore, SuperQuery reports the exact percentage (e.g. 80%). Facts can be seen as rules that govern your data , or patterns that exist among your objects.
Most of the large orders happen in December.Fact Most Orders where Amount = Large have Month = December. 80% of orders.Rule If Order Amount = Large then Month = December. 80% Confidence. |
||||||||||||||||||||||||||||||||||||||||||
7.1 Preparing the data for miningThere are more opportunities for finding patterns in a table if it contains columns that have repeated values. For example columns like Year , Month , State, Season, Class, Color, etc. If your data is mostly numeric and has no repeated values, then you have to classify it into Levels, Ranges, or Categories. To simplify this process , SuperQuery has the "Range Virtual Column " that you can create with minimum effort. Here are the steps to help hunt down those interesting facts:
· Create Range columns on your numerical columns. · Create Classification columns to categorize your data entities into interesting classes (e.g. Clients, Orders, Products, etc.) . · Create Filter columns · Create Summaries on useful categories . · Fire the Fact engine on the data tables as well as the summary tables.
SuperQuery saves you a lot of time and gives you a head start by doing its systematic search for facts. You start your search for patterns from a Factbase rather than from a database . After all, that is what computers are good for.
|
||||||||||||||||||||||||||||||||||||||||||
7.2 Fact Discovery Engine for the Discovery Edition
|
||||||||||||||||||||||||||||||||||||||||||
7.2.1 Running the Fact Discovery Engine
Press the Facts button
Figure 38 Add / Edit Table Facts dialog
The "If Condition" and "Then Conclusion" combo boxes allow you to instruct the engine to search using one or all columns in the condition part and in the conclusion part. Depending on the kind of patterns you are looking for.
For example , in the following fact; Most products where Product = Jacket have Profit = High. Percentage = 80% "If Condition" column is Product and the "Then Conclusion" column is Profit.
You may be interested in facts about a specific column or about all columns .
You can store new facts in a new Table or you can use an existing Facts Table. If you use an existing facts table , then you can append or replace the new facts to the table. Check the “Append new facts” box to append any new generated facts to the selected Facts Table. If you uncheck the button the new facts will replace the existing ones.
You can select whether you want Facts , Exceptions or both from the radio button group.
If you need to fine tune the performance of the Fact Discovery Engine, then press the Settings XE "Settings" button to display the Fact Settings Dialog (figure 39) .
Figure 39 Facts Settings dialog
When you press the Run button, the fact engine starts, and displays the Fact Discovery Engine Progress dialog (figure 40) showing the facts and exceptions as it discovers them. The engine will then, automatically generates a new table view on the new created Facts Table (figure 41).
Figure 40 Facts Engine Progress dialog
Figure 41 Detailed Fact View
|
||||||||||||||||||||||||||||||||||||||||||
7.2.2 Tuning the Fact Engine
|
||||||||||||||||||||||||||||||||||||||||||
7.2.2.1 Fine-tune the fact-finding performance
There are two sets of sliders used to fine-tune the fact-finding performance . The first set is the Fact setting sliders and the second is the Exception settings set.
In the Facts set, each slider controls the number of facts reported in a specific range. The first slider controls facts that are of type “All” and have a 100% confidence. The second slider is for facts of type “Most” and in the confidence range from 80% to 99%. The rest of the sliders cover the remaining 4 ranges . Sliders that control confidence ranges less than 50% are only useful when fact level is more than 1.
For example , if you increase the first slider, you will increase the number of facts reported that have a confidence of 100%. The fact engine will take more time and the new facts will be supported by fewer rows from the table .
In the Exception set, the first slider control exceptions that are less than 1% and the remaining sliders control the range up to 10%. When you increase a slider, you increase the number of exceptions reported in that range. On the other hand, decreasing the slider value requires a higher clustering level. On the other hand, increasing the slider value relaxes the clustering requirements.
Most Level : This combo-box allows you to define the level of the facts of type “Most”. For example if you select 70% then any fact of type Most must have at least 70% confidence factor. The confidence factor is the percentage of rows satisfying the fact to all rows in the filter . The default level is 60%. For example, the following table describes abstract objects that have color, shape and weight.
Color Shape Weight
-> Red Box 100 -> Red Box 200 -> Red Box 100 Red Cone 300 Blue Box 300 Blue Cone 300
The following fact is generated if the “Most Level ” is set at 60%.
Most (75%) objects that have Color = Red have Shape = Box (3 rows )
Consequently, if the “Most Level ” is increased to 80%, then this fact is not reported.
Maximum fact level : This combo-box allows you to define the Fact Level. The fact level is defined by the number of columns in the fact left side. For example in the following table :
Color Shape Weight -> Red Box 100 -> Red Box 100 -> Red Box 100 Red Cone 300 Blue Box 300 Blue Cone 300
The fact: Most objects that have Color = Red have Shape = Box is fact level 1 because it has one column at the left side (Color). On the other hand, the fact: All objects that have Color = Red and Shape = Box have weight = 100 is fact level 2 because it has two columns at the left side (Color and Shape ). Increasing the fact level increase the fact finding time. |
||||||||||||||||||||||||||||||||||||||||||
7.2.2.2 Ignore dependent columnsSince the data sometimes contains groups of columns that are known to be dependent and their relation is not useful It is important to avoid spending time discovering already known facts. The Fact Discovery Engine's objective is to report only interesting facts. SuperQuery allows you to enter these groups of columns. From the Add / Edit Table Facts dialog (figure 38) press the Dependency button the Select Dependent Column Groups dialog displays (figure 42). The dialog contains an example of such a situation.
Figure 42 Select Dependent Column Groups dialog
|
||||||||||||||||||||||||||||||||||||||||||
7.2.2.3 Filter implied factsIt is also important not to report any fact that can be implied from other more general facts. There are a number of other dependencies that SuperQuery is able to detect automatically and avoid. The Fact Discovery Engine can automatically ignore:
· Facts that contains blank values · Facts between a calculated column and its source columns · Facts that conclude a most common value
The Fact Filters dialog (figure 42) shows all the patterns that the engine automatically ignores. This allows you to cancel any or all filters and in the case of Most Common Values (MCV), you can also adjust the filter. You can display this dialog by pressing the Filters button from the Add / Edit Table Facts dialog (figure 38).
Figure 43 Facts Filters dialog
Following is the explanation of each option: · Ignore blank values: When reporting facts, do not report any fact that has a blank value in its condition or conclusion parts. · Ignore facts between a virtual column and its sources: Do not report facts resulting from existing dependencies between a virtual (calculated) column and its source columns. Source columns are those columns that were used to calculate the virtual column. · Ignore facts that conclude a most common value (MCV): Suppose that we have a Profit column that has 3 values Low, Medium, and High, and Medium is the most common value. Any fact that concludes this value is usually uninteresting. For example, a fact like: Most Product = Jacket have Profit = Medium may not be interesting, since we already know that most products have Medium profit. On the other hand, a fact like: Most Product = Jacket have profit = High is interesting, since profit = High is not expected. This filter can be set for facts of type ALL and Facts of type MOST. The percentage, with which a value is considered a Most Common, can also be specified. |
||||||||||||||||||||||||||||||||||||||||||