Table of Contents


1 INTRODUCTION
            1.1
WELCOME TO SUPERQUERY
            1.2
FEATURE SUMMARY
            1.3 SYSTEM REQUIREMENTS
            1.4 INSTALLATION
            1.5 SERVICE AND SUPPORT

2 GETTING STARTED
            2.1 STARTING SUPERQUERY
            2.2 READING YOUR DATA
            2.3 BROWSING YOUR DATA
            2.4 UPDATING YOUR DATA
            2.5 EXPORTING YOUR DATA
 

3 ORGANIZING YOUR DATA
            3.1 TABS
            3.2 VIEWS
            3.3 PANES
                3.3.1 DATA TABLE PANE
                3.3.2 DATA PAGE PANE
                3.3.3 REPS TABLE PANE
                3.3.4 REPS GRAPH PANE
                3.3.5 DATA GRAPH PANE
                3.3.6 TOTAL PAGE PANE
                3.3.7 TOTAL TABLE PANE
                3.3.8 NOTES PANE
                3.3.9 DATA CELL PANE
                3.3.10 FACTS PANE
            3.4 QUICK FILTERS
            3.5 INDEXING YOUR DATA

 

4 WORKING WITH MULTIPLE TABLES
            4.1 LINKING TABLES
            4.2 BROWSING MULTIPLE DATA VIEWS
 

5 ANALYZING YOUR DATA
            5.1 STATISTICAL ANALYSIS
            5.2 MAPPING NUMERICAL DATA INTO RANGES
            5.3 MATHEMATICAL AND LOGICAL CALCULATIONS
            5.4 COMPREHENSIVE QUERIES USING THE FILTER COLUMN
            5.5 CLASSIFYING DATA USING THE CLASSIFICATION COLUMN
            5.6 ANALYZING TEXTUAL DATA USING THE KEYWORD COLUMN
 

6 SUMMARIZING YOUR DATA
            6.1 WHAT IS THE SUMMARY TABLE?
            6.2 SUMMARY OPERATORS
            6.3 CREATING A SUMMARY TABLE
 

7 DATA MINING -- DISCOVERING FACTS AND EXCEPTIONS
            7.1 PREPARING THE DATA FOR MINING
            7.2 FACT DISCOVERY ENGINE FOR THE DISCOVERY EDITION
                7.2.1 RUNNING THE FACT DISCOVERY ENGINE
                7.2.2 TUNING THE FACT ENGINE
                    7.2.2.1 FINE-TUNE THE FACT-FINDING PERFORMANCE
                    7.2.2.2 IGNORE DEPENDENT COLUMNS
                    7.2.2.3 FILTER IMPLIED FACTS

1 Introduction

 

1.1 Welcome to SuperQuery

 

SuperQuery is a data analysis and mining tool that runs under the Windows. Using rule induction technology, SuperQuery searches data tables and reports all interesting patterns and exceptions.  The Fact Discovery Engine is easily tuned to meet various analysis needs. 

 

SuperQuery also assists in preparing data for analysis by providing a number of facilities for partitioning, classifying and processing data columns.  In addition, SuperQuery helps to explore and analyze data by automatically displaying graphs and calculating statistics. It contains a number of Wizards that help read, update, and analyze data effortlessly.  SuperQuery can access and query a number of databases, spreadsheets, text files directly, and through ODBC drivers . 

 

1.  Asking Questions -- Visual Query

SuperQuery helps you read the entire database or any part of it. You can specify criteria, and/or scope. For example: you can just read those records that contain data for Year = 96. You can also read a random sample of your data. Once the data is read in SuperQuery, you can easily create quick filters, indexes, and choose and order columns into views.

 

A view in SuperQuery is a window of your data. A view displays a group of columns that you select. It shows all rows that satisfy any filter you set, in the order you specify.

 

Furthermore, a view consists of a number of panes. Each pane displays the data in a different format. Table Pane displays data in a table, Page Pane displays one row in a page form, Reps Pane that displays the frequency (i.e. how common each value).

 

For example: If you are looking at a column that has TRUE and FALSE values, the Reps pane will show you how many TRUE’s and FALSE’s are in that column. There are also Graphics Panes that display Values or Reps of the current column in a bar graph. SuperQuery has also Total Panes that display any set of calculations like total, maximum, average, standard deviation, etc.

 

You do not need to do anything, SuperQuery will do it automatically. Each view is pre-configured with a set of panes suitable to its contents. You just look at the view and resize the panes to fit your needs. If you want, you can change the type of any pane and change its configuration. SuperQuery is very flexible and easy to customize.

 

As you scroll through or set filters on your table, all associated views recalculate and redisplay statistics and graphs, automatically. For example: If you set a filter on one category of products, all statistics and graphs are recalculated for this category. If you scroll to the "order amount column", you will see Min, Max, Total, and Average of orders for this product category. In addition graphs are updated to show the top orders and most common order amounts.

 

2.  Analyzing Data

SuperQuery also helps you create Calculated Columns. These are Virtual Columns that are calculated from other columns. You can’t make a syntax error, SuperQuery calculations are entered using simple dialogs. Furthermore, SuperQuery provides additional types of virtual columns. These are analysis-oriented short cuts to an otherwise complicated Calculated column. There are Range, Filter, Classification and Keyword columns.

 

The Range column is an easy way to divide a numerical column into ranges. For example you can divide sales into 3 levels; High, Medium, and Low. This is useful in setting filters on these ranges and looking at Graphs, and/or Total panes. This is also very helpful in discovering patterns and facts on numerical columns. You can create this example with just 3 mouse clicks.

 

The Filter column is a virtual column that contains TRUE or FALSE values depending on its criteria. You can create complex criteria using one or more Filter columns. This makes your criteria more modular and easier to understand.

 

The Classification column is a virtual column that simplifies categorizing your data. This helps you study each individual class and identify patterns among those classes. Suppose you want to study your new orders starting from 1996. You can classify your orders into 4 categories; Large, Medium, Small and Old. You can create a Classification column that has 4 values and 4 criteria or conditions as follows:

Value

Criteria

Large

Order_Date >= 01/01/96 and Order_Value > 5000

Medium

Order_Date >= 01/01/96 and Order_Value > 1000

Small

Order_Date >= 01/01/96

Old

Any other order

Classification Column Criteria

This gives you the ability to see graphs and statistics for each class of your orders and run the fact discovery engine.

 

The Keyword column is a shortcut for extracting keywords from a "description" column. For example, you can create a Keyword column on "Services Description" to identify your entire client rows with the word "hardware" in their service description. You can then study those clients and identify any patterns they share.

 

All Data, Graphs and Total panes are automatically prepared for any new virtual column.

 

Another Important analysis facility is the Summary Table. Suppose you would like to see the number of orders of Large, Medium, Small and Old orders. Suppose also that you would like to see Total, Average, and other statistics for each order class. You can create a summary table that groups your orders by the Order_Class column. The summary table may include: Count of orders, Total Amount, Average Amount, etc. This summary table will look like this:

 

Order Class

No of Orders

Total Amount

Average Amount 

Large

1000

100,000

100

Medium

4000

200,000

50

Small

8000

200,000

25

Old

9000

450,000

50

Summary Table Example

 

You can group your data on more than one column. For example: on Year, then State, then Product Type. On the other hand, for each cell in the summary table, you can see all the details.

 

Double click on any cell and a view that shows the details is created. This kind of analysis is also known as Multi-Dimensional Analysis and Drill Down.

 

Summary tables are normal tables that you can add virtual columns to, summarize, and search for facts. Also their views can contain all data, graphics and total panes.

 

 

Viewing related tables.

SuperQuery helps you analyze any number of tables quickly and easily. After defining primary and foreign keys in your tables, you can tell SuperQuery to synchronize your data views. Suppose you have two views: Customers, and Orders. If you scroll in Orders, the Customers view will automatically scroll to show you the customer details. On the other hand, if you scroll in the Customer view, SuperQuery will automatically set a filter on the Orders view to show you all the orders of the current customer. So as you look at each customer record, you also see all their orders. Furthermore, if you have total and graph panes open, you will also see all the statistics and graphs on those orders AUTOMATICALLY. The more screen real estate you give SuperQuery, the more information you have at your fingertips. It is that simple!

 

3.  Discovering Facts

The ultimate form of intelligence is Inference. The simplest form of inference is to:

 

·         observe patterns and similarities among objects or concepts then,

·         create general theories and exceptions.

 

By observing that many individual birds can fly, you can create a theory or a Fact that: All birds can fly. This allows you to create a class of living beings called Birds and associate the property Fly to the class rather than to each individual bird. We are probably programmed to do this, just to save memory in our brain! However, once you have a class, you can search for and expect more similarities among its members. Notice that most theories have exceptions -- there are some birds that can not fly.

 

What are SuperQuery Facts and Exceptions?

SuperQuery scans your table and searches for patterns. For example, it could find that all you Large orders have the same month -- December. SuperQuery considers this a "Fact" and reports it. A Fact may have exceptions, so SuperQuery will still report that fact even if only 80% of your Large orders happens in December. SuperQuery differentiates between "All" and "Most" type facts. Furthermore, SuperQuery reports the exact percentage (e.g. 80%). Facts can be seen as rules that govern your data, or patterns that exist among your objects.

 

 

Most of the large orders happen in December.

Fact

Most Orders where Amount = Large have Month = December. 80% of orders.

Rule

If Order Amount = Large then Month = December. 80% Confidence.

 

SuperQuery also reports any exceptions to those patterns. For example, if only 5% of your orders are paid in cash, then SuperQuery reports this as an exception. Furthermore, exceptions may be errors in your database. This means that you can use SuperQuery to improve the quality of data.

 

How to hunt down those FACTS using SuperQuery?

There are more opportunities for finding patterns in a table if it contains columns that have repeated values. For example columns like Year, Month, State, Season, Class, Color, etc. If your data is mostly numeric and has no repeated values, then you have to classify it into Levels, Ranges, or Categories. To simplify this process, SuperQuery has the "Range Virtual Column" that you can create with minimum effort. Here are the steps to help hunt down those interesting facts:

 

·         Create Range columns on your numerical columns.

·         Create Classification columns to categorize your data entities into interesting classes ( e.g. Clients, Orders, Products, etc. ).

·         Create Filter columns

·         Create Summaries on useful categories.

·         Fire the Fact engine on the data tables as well as the summary tables.

 

SuperQuery saves you a lot of time and gives you a head start by doing its systematic search for facts. You start your search for patterns from a Factbase rather than from a database. After all, that is what computers are good for.

 

4.  Organizing Views

While conducting your analysis, you create many interesting views on your data and you need to be able to organize these views and find them efficiently. You also need to present your findings in an easy to use medium.

 

SuperQuery allows you to create tabs and folders with Titles and different Color Codes. Furthermore, you can create tabs within tabs.  For example, you can create a tab for each year and in each year you can have tabs for each product type.

 

 

1.2 Feature Summary

 

 

Visual Query

  • Visual Query Creation -- no programming or SQL required.

  •  Quick Filters.

  • Easy Indexing of your data tables.

  • Automatically create graphs that shows common values, exceptions, high, and low values

  • Automatically create Total, Min, Max, Average, etc. for values in the current column.

  • Automatically synchronize related views.

  • Open Access, xBASE, Excel, Paradox, Text files directly .

  • Open most popular databases using ODBC drivers .

 

 

 

Data Analysis

 Virtual Columns:

  • Range. Automatically Divide a numerical column into a number of ranges (High, Medium, Low).

  • Filter. Visually create comprehensive criteria.

  • Keyword. Easy short cut to extract keywords from description-type columns

  • Classification. Easy short cut to map values in one or more column(s) .

  • Formula. Visually create calculations.

·Create Summaries using more than 20 functions.

  • Summary tables are normal tables that can be re-analyzed and re-summarized.

  •   Drill Down. Just double click on any summary cell to create a detail view.

Fact Discovery

  •  on button click to start the Fact Discovery Engine.

  • Find Facts (rules or patterns) and Exceptions or Errors.

  • Many options to control the search process.

  • Report facts as a Table so that you can re-analyze the Factbase itself 

  • Report facts in an easy to read " If...then" rules or "All / Most" statements.

  • Drill Down. Just double click on any Fact cell to create a detail view . 

Organization

 

  •  Organize your views into tabs.

  • Create tabs within tabs .

 

1.3 System Requirements

SuperQuery version1.20 32-Bit

·         Windows 95, or Windows NT.

·         486 Processor or Better.

·         8 MB of memory or more.

·         8 MB of available Hard Disk space.

 

1.4  Installation

Check your hard disk space to ensure you have enough space to install SuperQuery depending on the version you are installing.

 

  1. Insert the first disk labeled SuperQuery Setup Disk 1 of -- into your Floppy disk drive.

  2. From the program Manager File menu, choose Run (If you' re running Windows 95, from the Start menu, choose Run.)

  3. For command Line, type a:\Setup, (in which a is your Floppy disk drive)

  4. Click OK to start Setup, then follow the instructions on screen.

 

After installation is complete, restart Windows. SuperQuery creates a group in the Program Manager and a SuperQuery directory on your hard disk.

 

SuperQuery ships with a 7-days standard trial period. In order to register or to upgrade your SuperQuery copy, please follow the following steps:

 

Make sure you have obtained both the User ID and Registration key from Azmy Thinkware Inc.

  1. Run SuperQuery

  2. In the Getting Started Dialog, press the Register (or upgrade) button.  This will display the Registration Dialog.

  3. In Version, select the version you want to upgrade to.

  4. In Term, select term of your usage to SuperQuery; purchase or trial.

  5. In Days, put "0" if you choose purchase or the number of the days for the trial.

  6. In User ID, and the Registration Key type the User ID and Registration Key you obtained from Azmy Thinkware Inc.

  7. Press OK.

 

 

1.5 Service and Support

For questions that you cannot answer using this manual or online help, contact technical support through the AZMY Thinkware Inc. World Wide Web at www.azmy.com. Or you can call AZMY Thinkware technical support directly at (201) 947 1881.

 

Registered users of AZMY Thinkware SuperQuery receive 60 days of complimentary technical support, beginning with their first call to AZMY Thinkware Technical Support.

 

2 Getting started

2.1 Starting SuperQuery

When you run SuperQuery the Getting Started dialog  (figure 1) will be displayed. You can either select any SuperQuery document from the drop down list, or click the Use Wizard to read your data button to run the Open Wizard (figure 2) to  open your data. 

 

 

Figure 1 Getting Started dialog

 

You can also start SuperQuery from the File menu; select one of the three menu items:

 

New

To create a new document.

 

Open SuperQuery Document

To select an exiting SuperQuery document (*.sqd).

Open  Database

To run the Open Wizard and to  open your data .

 

 

2.2 Reading your data

SuperQuery can read most common types of databases directly using  “Data Access Objects. SuperQuery also supports ODBC  if you have both ODBC and your database drivers installed.

 

From the Table menu select the Open Wizard menu item. The Open Wizard (figure 2) will run and will assist  you to open your data.

Figure 2 Open wizard - WELCOME Page

 

Figure 3 Open wizard - MORE Page

 

After opening the data table, the Wizard (figure 3) will give you the options to:

- Read another table ,          

- Define links among open tables, or to

- Call the analysis Wizard to analyze your data.

 

2.3 Browsing your data

After reading your data , SuperQuery will automatically create a Table View (figure 4) to show the data. The Table View   contains several panes separated by Splitters. All panes are kept synchronized at all times. You can completely configure any view by including some or all pane types.

 

Figure 4 SuperQuery Table View showing two the panes: Data Table Pane and Reps Graph Pane

 

As you scroll through the data columns , look at the Reps Graph to see the most common values in each column. Change the graph type to Data Graph and see the highest values in each column. Look at the Totals pane to see the total, min, max and average values. You can select from more than 20 statistical functions.

 

Double click on a bar in the Graph Pane or a value in the Total Pane and SuperQuery immediately position the cursor   at that row in the Data Pane. Now push the filter button, and you immediately see only those rows satisfying that value. Scroll through your columns and watch all graphs  and statistics calculated for that filter.

 

Now push the summary table button and create a summary on any category. Or push the Facts button and launch the fact discovery engine.

 

Map a numerical column into High, Medium, and Low by just a couple of clicks. Use any of the five Virtual Column Shortcuts to quickly perform calculations , classifications , and other analysis tasks.

 

Scroll in many tables simultaneously. For example, link your Customer and Order tables and scroll in the Customer table. SuperQuery automatically sets a filter to all orders by that customer and shows you the total, min, max, and average orders by that customer. Scroll in the orders table and SuperQuery automatically highlights the customer who made these orders.

 

SuperQuery  , automatically   does what used to require many complex SQL and programming procedures. It is that simple! You have to try it yourself, to believe how simple data   analysis  could be.

 

 

2.4 Updating your data

SuperQuery   allows you to update your data   tables if the original data is changed. SuperQuery will read the new data from its original location (you can specify another source) and update the entire analysis and virtual columns you have performed and added to your original data  .

 

From the Table menu   select the Update Wizard   menu item. The Update Wizard   (figure 5) will run and will assist XE "assist"  you to update your data  .

 

 

Figure 5 Update Wizard - WELCOME page

 

 

2.5 Exporting your data

SuperQuery   allows you to export both the data   tables and the generated facts into text files.

 

From the Table menu   select Export menu item then select any of the following menu items:

 

Tab  delimited

Write data   table into a text file with columns   separated by tabs  and rows   separated by new line.

Facts 

Write fact table   rows   in "Fact - Exception" format

Rules

Write fact table   rows   in "If ... Then" format.

 

Select Export | Tab delimited   to export data   tables into a tab delimited   file. The system will display the Select Table Dialog   to select a table   that you want to export, then will display the Save As dialog to save the new file.

 

Select Export | Facts   to export the facts generated into a "Fact - Exception" format text file.

 

Select Export | Rules to export the facts generated into an "If ... Then" format text file.

 

 

3 Organizing your data

 

In order to create a comprehensive analysis   on your data  , you need to create several views  . Some views showing rows   sharing common criteria   or compose a specific fact.

 

In order to be able to organize these views, SuperQuery   provides a Hierarchical Tab   Organizer  .  Each tab may contain one or more view. Each view consists of one or more pane. Initially, you have eight empty Tabs  .  You can start organizing your data views in these tabs  . Each view   has a “pin” button. You simply click the button to “fix  the view in the current   tab. Similarly you click the pin again to make the view “floating  ”, then change to another tab and “fix” it there.  If you need more tabs, you can add a new group of tabs inside any of the exiting ones. This allows you to create as many tabs as you want.

 

 

3.1 Tabs

The main   window in SuperQuery   contains a set of tabs   or folders (figure 6)  .  Each tab may have data   Table Views or other tabs (but not both). This means you can have a hierarchy   of tabs. A new SuperQuery document contains one tab-level that has eight tabs. You may add any number of tabs at this level or create more levels of tabs within any tab.

 

Figure 6 SuperQuery hierarchical   levels of tabs. A second tab level was created under tab3

 

In order to customize the appearance of a tab, click on the Right Mouse Button (RMB) when placed on the tab you want to customize. The property menu will pop up, select the property you want to change. You can also customize the tab by activating the tab first then selecting the Tab menu, and the item you want to change. 

Tab   customization includes:

 

·         Editing   the tab title

To edit the tab title,     select Title menu item the Edit Title dialog will be displayed. Type the new title to apply on the active tab.

 

·         Changing the tab background color

To change the tabs   color select     Color menu item. The standard color dialog will be displayed. Select the new color to apply on the active tab.

 

·         Moving views from one tab to the other

Each view   has a “pin” button. Simply click the button to make the view “floating  ”, then change to another tab and “fix  it there.

 

·         Adding more tabs   

    Select New   menu item. Either select the Tab menu item to add a new tab at the end of the active tab row or, select the Tab Level to display Insert Tab (s) dialog to add a new row of tabs   inside the active tab row.

 

 

3.2 Views

SuperQuery   Table View   is a new innovative way of viewing the data (figure 7)  . Each Table View consists of a set of ‘Panes’.  Each pane   displays data from its own point of view  .

 

Figure 7 SuperQuery standard Table View with the four panes Data Table Pane,   Total Page Pane  , Reps Graph Pane   and the Data Page Pane  .

 

The SuperQuery   standard Table View   consists of four panes  ; the main   pane   is the Data Table Pane   that controls the scrolling of the other three panes  ; Total Page Pane  ,  Reps Graph Pane   and the Data Page Pane  .

 

Each pane has a re-sizable frame, to allow you to change the pane size according to your preferences and the type of the analysis you are performing. As you move the mouse on the pane frames, the cursor changes toin order to resize the vertical splitters and   to resize the horizontal splitters separating the panes. As you move the cursor to the splitter end, the cursor change to one of the following four shapes  in order to maximize or minimize the pane in the arrow's direction.

 

Table view   contains a row of 12 sets of buttons:

 

1.       Fix button  

Press the Fix button   to fix   or float a view   in the current   tab. To move a view from one tab to another, first float the view, switch to the other tab, and then fix the view.

 

2.       View setup button

Press the View setup button   to display the Select and reorder columns   dialog in order to select / deselect and change the order of the columns in the view  .

 

3.       Duplicate button

Press the Duplicate button   to duplicate the current   view  .  You can use the Duplicate button to create a new copy of the view.  You can then set a different filter  , maximize a graph   pane  , or in general, perform   further analysis   on the new view.

 

4.       Print button

Press the Print button  , to see a print-preview the current   data   table   view  .  Press the print button from the preview screen to start the printing to your printer.

 

5.       Find button

Press the Find button   to display the Find dialog that allows you to search   for a specific word or phrase in the current   column.

 

6.       Filter buttons  

Filter buttons   are composed of three buttons.  When you press the left button, SuperQuery   sets a filter   on the value in the current   cell   in the Data Table Pane  .  Only rows   that have the filter value are shown in the view  .  If you move to another cell and press the button again, the filter further restricted by the new value.  Each time you press this button, you are ANDing the current cell value to the filter.

 

The button at the right side, is an "undo" filter   operation.  It will remove the last filter you set and hence expand the view  . 

 

The middle button acts mainly as an indicator  to show you if there is a filter   set. You can also use it as a shortcut to clear all filters. It is important to note that all panes   will change their contents according to the current   filter  .  For example  , “Total” panes will display totals for all rows   in the filter.

 

7.       Index buttons

Index buttons   are composed of two buttons. The left button displays the Add  / Edit Table Index dialog that allows you to create, edit, or select a Table Index.  The right button acts as an indicator   and a shortcut to set or set the current   index.

 

8.       Add   Virtual Columns

Press the Add   virtual   column button to display the Add / Edit Virtual Columns   dialog, which helps you, creates, edit, or delete a virtual column. SuperQuery   supports five types of the virtual columns  :

 

·         Filter column 

·         Range column

·         Keyword   column

·         Classification column

·         Formula column

 

9.        Drill - Down button  (only in Summary and Facts   views  )

This button creates a detailed view   of the current   row in the view. This button exists for Summary and Facts   tables only. When you press it in a Summary view, SuperQuery   creates a "Drill-Down" view the original data   of the current row.  Similarly, in the facts table   view, the Drill-Down button displays all data rows   that support the current fact.

 

10.    Summary button

Press the Summary button   to display the Add   / Edit Summary Table Dialog   that will help   you summarizes a data   table  .

 

11.     Facts button

Press the Facts   button to display the Add   / Edit Table Facts dialog or the Fact Discovery dialog  that will let you run the fact engine.

 

12.    The three standard Windows buttons: minimize, restore and close buttons.  

 

 

3.3 Panes

A pane   is a re-sizable window that displays data   in a certain way.  SuperQuery   provides ten types of panes  : 

 

·         Data Table Pane 

·         Data Page Pane 

·         Reps Table Pane 

·         Reps Graph Pane 

·         Data Graph Pane 

·         Total Page Pane 

·         Total Table Pane 

·         Notes Pane 

·         Data Cell Pane 

·         Facts   Pane 

 

The standard Table View   panes   are the Data Table Pane  , Total Page Pane  ,

Reps Graph Pane   and Data Page Pane  . You can change the type of any pane   but you must have one Data Table Pane  . You can't have more than one pane of same type.

 

To change the pane   type:

 

1.        Place the mouse   on the pane   you want to change.

2.        Click on the right mouse   button (RMB) to get the property   menu  .

3.        Select the “Change Type” menu   item to view   all the available pane   types, and select the new pane type.

 

 

3.3.1 Data table pane

The Data Table Pane   is the main   pane   in SuperQuery (figure 8)  . The first row shows column headers, and the first column shows the row numbers (initially hidden). You can keep the first column or hide it, using the property   menu   item "Display row headers". The status bar   shows the total number of rows   and the current   row number.

 

As you scroll   across columns   and the rows   of the Data Table Pane  , all other panes   in the view   will consequently update their content.

 

Figure 8 Data Table Pane

 

When you double click   on any virtual   column in the Data Table Pane  , SuperQuery   will bring up the corresponding edit dialog for that virtual column type. To change the properties of the Data Table Pane  , click on the right mouse   button (when placed on the Data Table pane  ) the Data table   pane menu   will pop up  . Select the property   you want change.

 

 

3.3.2 Data page  pane

Data Page Pane   is a page   view   of the current   row (figure 9). The pane  ’s first column contains names of the view columns  . The second column contains the current row data  . When you scroll   in the Data Page Pane  , SuperQuery   updates all other panes   in the Table View  .

 

Figure 9 Data Page Pane

 

To change the properties of the Data Page Pane  , click on the right mouse   button (when placed on the Data Page Pane  ) the Data page   pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.3 Reps table pane

Reps Table Pane   displays the repetition   of values in the current   column (figure 10). It shows the number of occurrences   (or frequency  ) of each individual value. If you double click   in any of the values, SuperQuery   will search   for the first row (in the Data Table Pane) that has this value. You can then set a filter   to study all rows   that have this value.

 

Figure 10 Reps Table Pane

 

To change the properties of the Reps table   Pane  , click on the right mouse   button (when placed on the Reps Table Pane  ) the Reps table pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.4 Reps graph pane

This pane   displays the repetition   of values in the current   column (figure 11). It shows the number of occurrences   (or frequency  ) of each individual value in a graphical representation.  The graph   consists of horizontal bars representing the unique values in the current   column. On the right side of the bars are the corresponding number of rows   and their percentage to the total number of rows.

 

If you double click   on any bar  , SuperQuery   will search   for the first row (in the Data Table Pane) that has that value. You can then set a filter   to study all rows   that have this value.

 

Figure 11 Reps Graph Pane

 

To change the properties of the Reps Graph Pane  , click on the right mouse   button (when placed on the Reps Graph Pane  ) the Reps graph   pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.5 Data graph pane

The graph   consists of horizontal bars representing the numerical   values of the data   fields in the current   column (figure 12). The left side of the graph shows the row number and the content of the first column.  The right side of the bar   shows the numerical value. You can set the graph to show the highest or lowest number of values.

 

Figure 12 Data Graph Pane

 

If you double click   on any bar  , SuperQuery   will go to that row number (in the Data Table Pane). You can then set a filter   to study all rows   that have this value. 

 

To change the properties of the Data Graph Pane  , click on the right mouse   button (when placed on the Data Graph Pane  ) the Data graph   pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.6 Total page pane

Total Page Pane   displays statistics about the current   numerical   column (figure 13). You can add or remove from the set of default   statistical   calculations   that were selected.  When you double click   on a value in the pane  , SuperQuery   will search   for the first row (in the Data Table Pane) that has that value. This makes it easy to locate rows   that have the maximum or minimum values for example  .

 

Figure 13 Total Page Pane

 

To change the properties of the Total Page Pane   click, on the right mouse   button (when placed on the Total Page Pane  ) the Total page   pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.7 Total table pane  

Total Table Pane   works exactly the same as the Total Page Pane   but with extended view   to show the result of the selected statistical   calculations   when applied to all the numerical   columns   in the view (figure 14).  When you double click   on a value in the pane  , SuperQuery   will search   for the first row that has that value. This makes it easy to locate rows   that have the maximum or minimum values for example  .

 

Figure 14 Total Table Pane

 

To change the properties of the Total Table Pane  , click on the right mouse   button (when placed on the Total Table Pane  ) the Total table   pane   menu   will pop up  . Select the property   you want change.

 

 

3.3.8 Notes pane   

Notes Pane   is a edit text window that you can use to write any comments   about the view (figure 15)  .

 

Figure 15 Notes Pane

 

To change the properties of the Notes Pane  , click on the right mouse   button (when placed on the Notes Pane  ) the Notes pane   menu   will pop up  . Select the property you want change.

 

 

3.3.9 Data cell pane 

Data Cell Pane   is a view   of the current   cell (figure 16)  . It is useful for displaying memo fields   or long text lines.

 

Figure 16 Data Cell Pane

 

To change the properties of the Data Cell Pane  , click on the right mouse   button (when placed on the Data Cell Pane  ) the Data cell   pane   menu   will pop up  . Select the property   you want change.

 

3.3.10 Facts pane  

Facts   pane   rewrites the current   row in the facts table   in a more natural language   form (figure 17). You have the option to select the writing format in either a "Facts - Exception" or "if ... then" form.

 

Figure 17 Facts Pane

 

To change the properties of the Facts   Pane  , click on the right mouse   button (when placed on the Facts Pane  ) the Facts pane   menu   will pop up  . Select the property   you want change.

 

 

3.4 Quick Filters   

The left part of the button , will set filter   on the current   cell   value. For example   if you are at the Product column and the current value is Jacket, then when you press this button, only rows   that have Product = Jacket will appear in the data pane and hence all the other panes in the view will show their contents with respect to this filter.

 

Now if you move to another column say Color and your current cell has Black and press the same button again, then the filter will be set on Product = Jacket AND Color = Black. This means each time you press the filter button, you add a new value to the filter. All values are ANDed together.

 

The middle button , is mainly an indicator   to whether there is a filter   set or not. When there is a filter set, a green indicator is turned on. You can also use this button to reset all filters (if any).

 

The right button , removes the last filter value from the filter  . For example if you have the above filter set Product = Jacket AND Color = Black. Pressing this button will remove the last filter Color = Black, and the current filter will be Product = Jacket. Pressing this button again will clear the remaining filter.

 

The filter   button is designed to provide for fast and easy analysis   of your data  . Each time you change the filter, all panes   are recalculated for rows   in the filter.

 

3.5 Indexing your data 

The index button is composed of two buttons. The left button creates, (or edits) the current   index. The right button is mainly an indicator   to whether there is an index in use or not. You can also use this button to set and rest the last index.

 

4 Working with multiple tables

 

The need to linking and joining tables belonging to the same database, and being able to synchronize   these tables together, so that scrolling in one table will show related information on the other tables and to exchange  columns   among them is a very important and vital feature in data analysis and mining.

 

4.1 Linking tables   

In order to link your data   tables together, you have to open all the related tables under the same SuperQuery database name (see the Open Wizard), then define the primary   and foreign   keys for each table   you want to link

 

The primary   key   is the table   key  . It is a data   column with the unique values. An example   of the primary key is the "Employer Name" column in the "Employee" table.

 

The foreign   key   in a table is a column that represents the primary   key   of another table  . Each table may contain non-or many foreign keys.

 

Figure 18 Edit Table Keys dialog

 

How to link tables together ?

1.        Open all tables you want to link using the Open Wizard. Make sure to select (name) the same database for all the tables you want to link. 

 

2.        After you finish reading all the tables select the define links in the Open Wizard or from the application main menu select Table | Link menu   item. The Edit Table Keys dialog displays (figure 18).  The dialog displays all the open tables in the current   database   in the “Table” column.

 

3.        Click on the drop-down button next to the table   you want to define its primary   key  . 

 

4.        To select the foreign   key  (s) double click   on the “Foreign key  ” cell   next to the table   you want to define its foreign key(s), or place the active grid cell on the table and click on the “Edit Foreign Key” button. The Edit Foreign Keys Dialog   (figure 19) displays

 

5.        Define the foreign   keys of the table  . After you finish with the dialog the keys you defined will be displayed in the cell   next to the table under the “Foreign Key” column. In order to define a foreign   key  (s) you will have to specify the foreign table   and its primary   key   from within the same database   and to associate them with one of the table (the one you are editing) columns  .  The dialog displays all column names in the table   you are editing to select the foreign   column  (s) from them. Click on the drop-down button next to foreign   column   you selected under the “Foreign Table- Key” column to display all the [Primary Table - Key] combinations in the open database   to associate on combination with selected foreign key  .

 

Figure 19 Edit Foreign Keys dialog

 

 

4.2 Browsing multiple data views  

When you have a multi-table   database   with its tables linked, you can synchronize   the table views   and exchange foreign   columns   among these tables.

 

A foreign   column   is an external   column you create to show the corresponding values from the foreign table   the column is driven from.

 

If you do not have views   opened for the tables you want to synchronize  . You may select the View | New   menu   items then select the table you want to create a view for it. 

 

Synchronizing tables means as you are scrolling in one view SuperQuery   will try to synchronize   the scrolling and the panes update of all the other views   in same tab. SuperQuery will synchronize the tables depending on their relationships. For example  , if we assume that view   'X' is a Customer view and view 'Y' is an Orders view.  Each customer may have a number of orders.  Therefore, there is a "one to many" relation between customer and orders (or a "many to one" between orders and customers).  Now, if you scroll   in the customer view to customer 'A', SuperQuery   sets a filter   on all orders from customer 'A' in the Orders view.  Consequently, when you scroll in Orders to order 'B', SuperQuery finds the customer associated with that order and make it the current   row in the Customer view.

 

5 Analyzing your data   

 

Browsing through raw data may be a very frustrating job, especially when the data contains numbers and long text fields. One way to understand your data is to try to categorize or to reclassify the data. This can be done, by replacing the actual data with values that best represent these data. Reclassifying your data will enable you to easily group your data and study the performance of all related fields together.

 

For numerical data, you may need to map the numbers into a more elaborated and meaningful easy to read form. For example a price column may be mapped to a three values column: inexpensive, moderate and expensive. This mapping will facilitate the browsing through the data and will enable you to compare the performance of the other columns when studied side by side with this column.

 

For the textual data, you may reclassify the data, into groups, each group will share a common nature of the data it represents. For Example a Month column can be reclassified into four groups Winter, Spring, Summer and Fall. This reclassification will facilitate and reduce the data browsing time since the number of repetitions in the column was reduced from 12 to 4.

 

Also, as a special case for classifying text columns is to classify the text fields into two values TRUE/ FALSE values. For example a profit column could be reclassified to TRUE if the profit is positive and FALSE if the profit is negative.

 

Sometimes it is also very useful to categorize your data based on a phrase or word in the fields. For example in the Industry column you may be interested only in industries that have the word "Food" in its description.

 

How about some statistical information about each column in your data? Totals, average, minimum, maximum, deviation, most repeated value…etc. When such information is made available to each column then, the understanding of the data and hence the classification of the column will be much more accurate.

 

The ability to drive a new column from exiting ones using calculation formula is another strong aid to study your data and to help the data-mining engine to discover important patterns in your data. Calculation formula should be applied to Textual, Numerical, Date and Logical columns each, with its own relevant operators.

 

Data reclassification into a more elaborated form will reduce the time you take in studying your data and will make it much more easier to link the different related columns together, and further more will prepare your data for mining.

 

 

5.1 Statistical Analysis   

SuperQuery   can automatically   perform   statistical   analysis   on your data  . The two panes;   Total Page Pane   and the Total Table Pane     are dedicated to the Statistical analysis  .

 

By default   SuperQuery   displays four statistical   calculation in these two panes  ; Total, maximum, minimum and Average. However you have the option to add or remove from the following statistical calculations   using the panes property   menu   item Select Calculations:

 

  1. TOTAL   calculates the sum of the data   points.

  2. AVERAGE   calculates the average of data   points.

  3. MAXIMUM   calculates the maximum data   point among the data points.

  4. MINIMUM   calculates the minimum data   point among the data points.

  5. AVEDEV   calculates the average of the absolute deviations of data   points from their mean. AVEDEV is a measure of the variability in a data   set.

  6. HARMEAN   calculates the harmonic mean of a data   set. The harmonic mean is the reciprocal of the arithmetic mean of reciprocals.

  7. KURT   calculates the kurtosis of a data   set. Kurtosis characterizes the relative peakness or flatness of a distribution   compared to the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution.

  8. SKEW   calculates the skewness of a distribution  . Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending towards more positive values. Negative skewness indicates a distribution with an asymmetric tail extending towards more negative values.

  9. STDEV   calculates standard deviation based on a sample. The standard deviation is a measure of how widely values are dispersed from the average value (the mean).

  10. STDEVP XE "STDEVP"  calculates standard deviation based on the entire population given as arguments. The standard deviation is a measure of how widely values are dispersed from the average value (the mean).

  11. VAR   calculates variance based on a sample.

  12. VARP   calculates variance based on the entire population.

  13. Count   counts the number of total number of rows  .

  14. Distinct Count     the number of the distinct values.

  15. Median   calculates the median of the column values.

  16. Most Common   finds the most common value.

  17. MC %   calculates the percentage of the most common value to the total number of the rows  .

  18. MC #   counts of the occurrence of the most common value.

  19. Least Common   finds the least common value.

  20. LC %   calculates the percentage of the least common value to the total number of the rows  .

  21. LC #   counts of the occurrence of the most common value.

 

5.2 Mapping numerical data into ranges   

SuperQuery   allows you to easily map numerical   columns into ranges  , by creating a new column Range Column   and map all the numerical values   into number of ranges you define. For example   you can divide sales into 3 levels; High, Medium, and Low. This is useful when setting filters on these ranges and looking at Graphs, and/or Total panes.

 

The Range column is also a very important step in preparing the data for mining while it helps the Fact Discovery Engine to study the related patterns in each of these ranges in relation with all the other columns in the data  .

 

To create a Range column follow these three steps:

 

1.                    Press on the Add Virtual Columns   button  from the Table View or select View | Add   column from the main   menu   

 

Figure 20 Add / Edit Virtual dialog

 

2.                    The Add   / Edit Virtual Columns   dialog (figure 20) will be displayed. Type in the name for the new column and press on the Range Column   button.

 

3.                    The Enter Ranges dialog (figure 21) will be displayed to create a number of ranges   to map the selected numerical   column values.

Figure 21 Enter Ranges dialog

 

The Enter Ranges dialog consists of a grid with three columns; the Value column, the Minimum range and the Maximum range. You can classify any numeric column into a number of ranges by entering the name of each range in the Value column and the corresponding maximum value.

 

The first minimum value and the last maximum value are automatically calculated by the system. As the data table is updated, the minimum and maximum are automatically recalculated.

 

SuperQuery copies each maximum to the minimum of the following range.  This ensures the continuity of all ranges.

 

Following are a five pre-defined ranges that you choose from:

1.             Standard:               Low, Medium and High.

2.             Money:                  Inexpensive, Average and Expensive.

3.             Length:                  Short, Medium and Long.

4.             Weight:                  Light, Medium and Heavy.

5.             Temperature:         Cold, Moderate and Hot.

               

 

You can change the number of ranges from 3 as shown above to 5 by checking the 5 - Ranges button. The system will add two more ranges by adding the word “very” to both the first and the last ranges. You can also edit the range names and add or delete any range.

 

·         Recalc. Min. & Max.

The Recalc. Min. & Max. button recalculates all the minimum and maximum ranges.

 

·         Clear Grid

The Clear Grid button clears all ranges and entries made by you leaving the two default rows. 

 

·         Delete Range

The Delete Range button deletes the current range.

 

·         Add Range

The Add Range button adds a blank range above the current cell.

 

SuperQuery   will automatically   create a new column and add it to the table   view (figure 22)  . The new column will be given a cyan   color indicating that the column is a calculated   one.

 

Figure 22 Data Table Pane showing the Profit Range Column that maps PROFITDLR into the four ranges: Loss, Low, Medium and High

 

As an example   for the Range column:

Table name:           Orders

Columns:               Customer, Order_Value, Order_Date

 

Suppose you want to classify   your orders into three classes Low, Medium and High orders.

 

You can create a range virtual   column as follows:

 

1. Push the virtual   column button to get the Add   / Edit Virtual Column   dialog.

2. Enter a new name e.g. Order_Class

3. Push the Range button to get to the Enter Ranges dialog

4. Press OK to create a three range virtual   column.

 

Low:         from      the lowest value   to            1000

Medium:   from     1000                        to            3000

High:        from      3000                        to            the highest value available

 

Row #

Order_Value

Order_Class

 1

1000     

Low

 2

2000      

Medium

 3

120     

Low

 4

222     

Low

 5

5000     

High

 6

9000     

High

 7

123     

Low

 8

3444     

High

 9

0     

Low

10

1111     

Low

 

 

 

 

5.3 Mathematical and logical Calculations

SuperQuery   allows you to easily perform   calculations   on your data  , by creating a new column Formula Column   that performs any calculation statement   you define on the data column(s) you specify. You may perform the calculation on any of the following data types:

·         Numerical data    

·         Text data 

·         Date data   

·         Logical data 

·         Another calculation statement 

·         Any mix of any of the above data   types.

 

Following are the calculation operator types:

 

·         Numerical Operators 

1.       + Add  : adds two numerical   operands.

2.        - Subtract  : subtracts two numerical   operands.

3.       * Multiply  : multiplies two numerical   operands.

4.       / Divide  : divides two numerical   operands.

5.        / Remainder   Divide  : Remainder Division operator returns the quotient of  the division of the two integer operands .   

6.        Convert to Text  : converts one numeric field to a text field.

7.        Offset Numeric  : offsets (shifts) the selected column with the specified number of rows  .

 

·         Text Operators 

1.        Concatenate  : concatenates two text operands.

2.       Convert to Numeric  : converts one text field to its numerical   equivalent.

3.        Get word  : extracts a word from a string by giving its position (zero based) and the separator character that separates the words in the string.

4.        Substring  : extracts a substring from a text field, the operator takes three operands; the string to extract from, the start position (zero base) and the number of characters to be extracted.

5.        Offset Text  : offsets (shifts) the selected column with the specified number of rows  .

 

·         Date Operators 

1.        Day  : extracts the day from a date field.

2.        Day   of week  : translates the day in a date field to the day name ( e.g., Sunday ).

3.        Weekday  : determines if the data   field was a weekday or not.

4.        Weekend  : determines if the data   field was a weekend or not.

5.        Month  : extracts the month from a date field.

6.        Month   name  : translates the month in a date field to the month name (e.g., December ).

7.        Year  : extracts the year from a date field.

8.        Century  : returns the century of the date field.

9.        Convert to date  : converts any field in the form of YYYYMMDD to a standard Date field

10.     Offset Date  : offsets (shifts) the selected column with the specified number of rows  .

 

·         Logical Operators 

1.        Is Equal  : evaluates the Is Equal expression of the two selected operands

2.        Is Not Equal  : evaluates the Is Not Equal expression of the two selected operands

3.        Offset Logic  : offsets (shifts) the selected column with the specified number of rows 

 

·         Mathematical   Operators

1.        Square Root  : returns the square root of  only positive numbers.

2.        Power of n  : returns the n - power of the numeric field. If field is not 0.0 and n is 0.0 the result is 1. If the field is 0.0 and n is negative the result is error and returns 0.0. If both are 0.0, or if the field is negative and n is not an integer the function returns error.

3.        Offset Math  : offsets (shifts) the selected column with the specified number of rows  .

 

To create a formula column follow these steps:

1.        Press on the Add Virtual Columns   button  from the Table View or select View | Add   column from the main   menu  

2.        The Add   / Edit Virtual Columns   dialog will be displayed. Type in the name for the new column and press on the Formula Column   button, see figure 20.

3.        The Enter Value-Formula Dialog   (figure 23) will be displayed allowing you to enter a number of  calculation statement 

 

Figure 23 Enter Value - Formula dialog. The prof/qty is calculation statement name for the statement to be constructed.

 

This dialog allows you to easily create comprehensive calculations. It allows you to break down the formula into a number of calculation statements. Each calculation statement is evaluated separately on the current row of the Data Table. The result is placed in the last column in the dialog to allow you to test its functionality.

  

The “Variable” column contains the name of the Calculation Statement.  You can enter any name.

Since calculating the last calculation statement will automatically calculate all statements involved, the system will only execute the last calculation statement.

 

The Insert row button is used for insetting a blank calculation statement above the current cell. The Delete row button deletes the current row with its content.

 

To construct the calculation statement you double click on cell next to the calculation statement name you entered in the Calculation column in the Enter Value - Formula dialog. The Edit the calculation statement dialog (figure 24) will be displayed.

 

Figure 24 Edit the Calculation statement dialog

 

This Dialog allows you to create calculation statements by selecting an operator and its operands.

 

A list of six buttons shows the five different types of operators and the sixth button selects all types together. When choosing one of these buttons, the “Select operator” list box displays a list of operators from that type.

 

After you choose the operator, the “Operands - Current row” group will show the required number of operands. 

 

An operand combo box contains:

 

·         All table columns, including virtual columns. 

·         All calculation statements entered before this statement.

You can also enter directly any number or text as the operand value. 

 

As a quick test of your statement, you can press the “Calc” button to view the value of operands at the current row, as well as the result of the calculation statement. You can review the result for other rows in the table by using the right and left arrow  buttons to change the current row position.

 

SuperQuery   will automatically   create a new column and add it to the table   view (figure 25)  . The new column will be given a cyan   color indicating that the column is a calculated   one.

 

Figure 25 Data Table Pane showing the Profit per unit Formula Column that calculates the formula PROFITDLR / QTYSOL.

 

As an example   for the Formula column:

Table name :          Orders

Columns:               Customer, Qty, Unit Price

 

Suppose you want to create a new column that contains the total Order_Value.

You can create a formula virtual   column as follows:

 

1. Push the virtual   column button to get the Add   / Edit Virtual Column   dialog.

2. Enter a new name e.g. Order_Value

3. Push the Formula button to get to the Enter Name-formula dialog

 

Name                      Calculation statement                          Result

 

Line1                      Qty * Unit_Price                                  30000     

 

Customer               Qty         Unit_Price    Order_Value

      A                       100                          10                           10           

      B                       200                          50                           40

      C                       100                          20                           20

      D                       500                          30                           30

      E                        300                          100                          50

      F                        200                          300                          20

      G                       600                            5                            10

      H                       200                          40                           20

      I                         500                          20                           50

      K                       300                          10                           30

 

 

 

5.4 Comprehensive Queries using the Filter Column

SuperQuery   allows you to create complex criteria   using one or more Filter columns  . This makes your criteria more modular and easier to understand.

The Filter column   is a virtual   column that contains TRUE or FALSE values depending on its criteria  .

 

To create a Filter column follow these three steps:

1.        Press on the Add Virtual Columns   button  from the Table View or select View | Add   column from the main   menu  

2.        The Add   / Edit Virtual Columns   dialog will be displayed. Type in the name for the new column and press on the Filter Column button, see figure 20.

3.        The Enter Criteria dialog (figure 26) will be displayed to allow you to build the criteria  .

 

Figure 26 Enter Criteria dialog

 

To specify criteria (example: Color = Red and Shape = Box), follow these steps:

 

1.        Select your column identifier (Color) 

 

2.        Double click on the adjacent criteria cells. This will display the Edit Criterion dialog to allow you to build the actual criterion, see figure 27.

 

3.        When finished specifying the criterion, you will be back in this dialog with current cell containing the text of your criterion (Color=Red).

 

4.        To AND another criterion, simply find your second column identifier (Shape) and double click on the cell in the same vertical column containing the previous criterion. This will repeat steps 2 and 3 to form your second criterion (Shape = Box ).

 

On the other hand, to OR a criterion, instead of using the same vertical column as in step 4, use another vertical column.  

               

The rule is simple; any criteria sharing the same vertical column in this dialog are ANDed otherwise they are ORed.   Read the final criteria from top to bottom.

               

The complete criteria are always displayed in a text box at the bottom of the dialog.

 

Figure 27 Enter Criterion dialog

 

SuperQuery   will automatically   create a new column and add it to the table   view (figure 28)  . The new column will be given a cyan   color indicating that the column is a calculated   one.

 

Figure 28 Data Table Pane showing the Jacket S_M Filter Column that evaluates the criteria PRODUCT is equal to Jacket and SIZE is one of (Small, Medium).

 

As an example   for the Filter column 

Table name :          Orders

Columns:               Customer, Order_Value, Order_Date

 

Suppose that you want to view   only orders that are greater than $1000.

 

You can create a filter   virtual   column as follows:

 

1. Push the virtual   column button to get the Add   / Edit Virtual Column   dialog.

2. Enter a new name e.g. BigOrders

3. Push the Filter button to get the Enter Criteria dialog

4. Write the criterion statement. 

 

A new column is inserted that has the two values TRUE and FALSE depending on whether the Order_Value satisfy the criteria   or not.

 

You can move to a cell   that has a TRUE value and push the filter   button to view   only those rows   having the TRUE value.  Similarly, you can  view only rows that have FALSE values. 

 

Notice that the criteria   may contain more than one condition. For example  , Order_Value > 1000 and Order_Month   = August.

 

     Customer          Order_Value          BigOrder

 

 1             A              1000                       FALSE                  

 2             B              2000                       TRUE

 3             C               120                        FALSE

 4             D               222                        FALSE

 5             E              5000                       TRUE

 6             F              9000                       TRUE

 7             G               123                        FALSE

 8             H              3444                       TRUE

 9             I                   0                          FALSE

10            J               1111                       TRUE

 

 

 

5.5 Classifying data using the Classification Column

SuperQuery   helps you to study each individual class and identify patterns   among those classes. Suppose you want to study your new orders starting from 1996. You can classify   your orders into 4 categories  ; Large, Medium, Small and Old. You can create a Classification column that has 4 values and 4 criteria   or conditions as follows:

 

Value

Criteria

Large

Order_Date >= 01/01/96 and Order_Value > 5000

Medium

Order_Date >= 01/01/96 and Order_Value > 1000

Small

Order_Date >= 01/01/96

any other order

Old

 

The Classification column is a virtual   column that simplifies categorizing your data  .

 

To create a Classification column follow these three steps:

1.        Press on the Add Virtual Columns   button  from the Table View or select View | Add   column from the main   menu  

2.        The Add   / Edit Virtual Columns   dialog will be displayed. Type in the name for the new column and press on the Classification Column XE "Classification Column"  button, see figure 20.

3.        The Enter Value - Criteria Dialog   (figure 29) will be displayed and allows you to enter a number of criteria   - value combination.

 

Figure 29 Enter Value - Criteria, showing the four classifications: Winter, Spring, Summer and Fall entered in the Value column

 

 

This dialog is a list of value / criteria pairs. For each row in the table, the system evaluates each of these criteria.  If a criterion is satisfied, the corresponding value is placed in the classification column. If none of the criteria were satisfied, the system places “Unclassified” in the corresponding row in the classification column.

 

You can change the “Unclassified” word by entering any text in the value column without an associated criterion at the last entry.

 

The Insert row button is used for inserting a blank line above the current cell. 

The Delete row button deletes the current line.

 

This dialog consists of grid with two columns: The Value column, where you enter a name for the criterion, and the Criteria column, where you enter the criterion. Double click on the criteria column to display the Enter Criteria dialog to edit the criteria.

 

Figure 30 Enter Criteria dialog with the cell next to the MONTH column having the criterion is one of (December, January, February) constructed using the Edit Criterion dialog see figure 31.

 

To specify criteria (example: Winter is one of (December, January, February), follow these steps:

 

1.        Double click on the cell next to the Winter in the Enter Value - Criteria dialog to display the Enter Criteria dialog (figure 30).

 

2.        Select your column identifier (Month), and double click on the adjacent criteria cells. This will display the Edit Criterion dialog (figure 31) to allow you to build the actual criterion.

 

3.        When finished specifying the criterion, you will be back in this dialog with current cell containing the text of your criterion is one of (December, January, February)

 

4.        To AND another criterion, simply find your second column identifier and double click on the cell in the same vertical column containing the previous criterion.

 

On the other hand, to OR a criterion, instead of using the same vertical column as in step 4, use another vertical column.  

               

The rule is simple; any criteria sharing the same vertical column in this dialog are ANDed otherwise they are ORed.   Read the final criteria from top to bottom.

               

The complete criteria are always displayed in a text box at the bottom of the dialog.

 

Figure 31 Enter Criterion dialog

 

SuperQuery   will automatically   create a new column and add it to the table   view (figure 32)  . The new column will be given a cyan   color indicating that the column is a calculated   one.

 

Figure 32 Data Table Pane showing the Season Classification Column that evaluates the criteria: Winter = MONTH is one of (December, January, February), Spring = MONTH is one of (March, April, May), Summer = MONTH is one of (Jun, July, August, September) and Fall = MONTH is one of (October, November).

 

As an example   for the Classification column

 Table name:          Orders

Columns:               Customer, Order_Value, Order_Date

 

Suppose you want to classify   your orders into 4 classes Large, Medium, Small and Old.

You can create a classification virtual   column as follows:

 

1. Push the virtual   column button to get the Add   / Edit Virtual Column   dialog.

2. Enter a new name e.g. Order_Class

3. Push the Classification button to get to the Enter Value - Criteria dialog.

 

4. Type each class value in the value column and double click   in the criteria   column to enter the corresponding criteria as follows:

 

Value                   Criteria

 

Large                      Order_Date >= 01/01/96 and Order_Value > 5000

Medium                 Order_Date >= 01/01/96 and Order_Value > 1000

Small                       Order_Date >= 01/01/96

Old                          any other order

 

Order_Value          Order_Date           Order_Range

 

 1    1000                 01/01/96                 Low                       

 2    2000                 01/02/96                 Medium

 3      120                 01/11/95                 Old

 4       222                01/03/96                 Low

 5     5000                01/04/96                 High

 6     9000                01/06/96                 High

 7       123                01/07/96                 Low

 8     3444                01/12/95                 Old

 9           0                01/10/95                 Old

10     1111               01/09/94                 Old

 

 

 

5.6 Analyzing textual data using the Keyword Column

SuperQuery   allows you to extract keywords from a "description  " column, by creating a Keyword   column. You can for example  , create a Keyword column   on "Services Description" to identify all your client rows   with the word "hardware" in their service description. You can then study those clients and identify any patterns   they share.

 

To create a Keyword column follow these three steps:

1.        Press on the Add Virtual Columns   button  from the Table View or select View | Add   column from the main   menu  

2.        The Add   / Edit Virtual Columns   dialog will be displayed. Type in the name for the new column and press on the Keyword   Column button, see figure 20.

3.        The Edit / Add   Keyword   dialog (figure 33) will be displayed to let you enter the Keyword.

 

Figure 33 Edit / Add Keyword dialog. The Keyword "Food" will be looked in the INDUSTRY column

 

SuperQuery   will automatically   create a new column and add it to the table   view (figure 34)  . The new column will be given a cyan   color indicating that the column is a calculated   one.

 

Figure 34 Data Table Pane showing the Food Keyword Column that writes the word Food whenever the word "Food" is found in the

 

As an example   for the Keyword   column

Using the Orders table  , the user may create a VISA column to reclassify the Payment_Method column.  This allows for the analysis   of customers who used Visa card for payment.

 

                Payment Method                                 VISA

 

 1             VISA & MC                                          VISA

 2             CASH

 3             CASH & VISA                                      VISA

 4             CREDIT

 5             VISA                                                      VISA

 6             VISA                                                      VISA

 7             MC

 8             AMEX

 9             AMEX

10            CASH

 

 

6 Summarizing your data 

 

Another way to analyze your data is to narrow down the number of rows by grouping them. You can group the data rows by applying a calculation on those rows sharing a common nature.

 

 

6.1 What is the Summary table?

Imagine you have sales data and you want to study the sales of each product, for example you want to study the following:

 

1.        The most common city selling each of the product

2.        The most selling month for each product

3.        The total quantity sold for each product

4.        The average profit for each product

 

You can create a summary   table   that groups your sales by the PRODUCT column. The summary table   will include: the most common   of the CITY column, the most common of the MONTH column the Total of the QUANTITY column, the Average of the PROFIT column, etc. This summary table will look like this:

 

PRODUCT

Most common CITY

Most common MONTH

Total QUANTITY

Average PROFIT

Coat

New York

October

7260

505

Glove

New York

September

5035

387

Hat

Los Angeles

May

9825

502

Jacket

Chicago

December

15265

467

Shirt

Boston

November

20130

344

 

SuperQuery will create a new table for the summarized data. The table will be viewed in a summary view that has the same capabilities as the data view. Hence you will be able to see automatically generated statistics and graphs and have the chance to run the Fact Discovery Engine and even the Summary again on the generated table.

 

 

6.2 Summary Operators

SuperQuery allows you to choose among the following 21 summary operators to use in summarizing your data. You may apply more than one summary operator on the same data column.

 

Total

The total of the group values.

Average

The average of the group values.

Maximum

The maximum of the group values.

Minimum

The minimum of the group values.

AVEDEV

The average deviation of the group values.

HARMEAN

The harmonic mean of the group values.

Skew

The skewness of the group values.

Kurt

The kurtosis of the group values.

STDEV

The standard deviation of the group values.

STDEVP

The standard deviation based on the entire population of the group values.

VAR

The variance of the group values.

VARP

The variance based on the entire population of the group values.

Count

The number of rows in the group.

Distinct Count

The number of distinct values in the group.

Median

The median of the group values.

Most Common

The most common value in the group.

MC %

The  percentage of the most common value within the group.

MC #

The  count of the most common value within the group.

Least Common

The least common value in the group.

LC %

The  percentage of the least common value within the group.

LC #

The least common value in the group.

 

 

 

6.3 Creating a Summary table

To create a summary   table   on your data   you can either press the summary button    , or select View | Summary menu   item or   . The Add   / Edit Summary Table dialog (figure 35) will be displayed to create a new Summary Table.

 

Figure 35 Add/ Edit Summary Table dialog. The table will be summarized according to the CITY

 

You can create or select a summary table from the "Select / Add Summary Table" box.  Select the “Group By” column(s) that you want to summarize your table on.  This column should represent groups of values.  Therefore, The "Group By" box will not show key columns or columns that has no repeated values.

 

If you want to select more than one Group By column, press the More button that will display four more combo boxes. Check the Ascending check box if you want to summarize columns in ascending manner or uncheck it to summarize in a descending manner.

 

Press the ” Select Cols” button to display the Select columns and their summary calculations dialog (figure 36) to select the summary columns and their associated calculations.

 

When you finish entering all the information, press the OK button, the system will then display a progress bar indicating its progress in creating the summary table.

 

You can also delete an existing summary table by selecting the table from the "Select / Add Summary Table" list and press the “Delete” button.

 

Select columns and their summary calculations dialog helps you to select the columns   you want to include in the summary table   and the how you want them to be summarized.

 

Figure 36 Select columns and their summary calculations dialog. The system will evaluate the most common PRODUCT, most common SIZE, the total of the QTYSOLD and the total of the PROFITDLR in each CITY of the table.

 

The dialog contains a list of all table columns with their corresponding Sql type. Select all columns that you want to include in the summary table by checking the corresponding check box.

 

In the “Calculation” column, click on the corresponding drop-down button to select one of the summary calculation. SuperQuery gives the new summary column a name that consists of the selected summary operator and the data column name. However, you can write a different name in the “New Column Name” next to the selected column.

 

Sometimes, it is useful to summarize one column with more than one operator (for example you may want to summarize Sales with the average, total and maximum operators). You can do this by duplicating the data column (Sales) and selecting the required summary operator for each duplicate.

 

·         Select the data column and press the “Duplicate Line” button.

·         Select the new summary operator from the “Calculation” column.

 

When you finish with the dialog press OK, the system automatically   generates a new table view on the new created Summary Table (figure 37).

 

Figure 37 Summary View showing the Group by column CITY in yellow color and the four select columns and their summary calculations.

 

You can browse through the summary view, change the pane types and do more analysis on the summary data.

 

 

7 Data Mining -- Discovering Facts and Exceptions

 

The ultimate form of intelligence   is Inference. The simplest form of inference   is to:

·         Observe patterns   and similarities   among objects or concepts then,

·         Create general theories   and exceptions  .

 

By observing that many individual birds can fly, you can create a theory or a Fact that: All birds can fly. This allows you to create a class of living beings called Birds and associate the property   Fly to the class rather than to each individual bird. We are probably programmed to do this, just to save memory   in our brain! However, once you have a class, you can search   for and expect more similarities   among its members. Notice that most theories   have exceptions   -- there are some birds that can not fly.

 

What are SuperQuery   Facts   and Exceptions?

 

SuperQuery   scans   your table   and searches for patterns  . For example  , it could find that all you Large orders have the same month -- December. SuperQuery considers this a "Fact" and reports it. A Fact may have exceptions  , so SuperQuery will still report   that fact even if only 80% of your Large orders happens in December. SuperQuery differentiates between "All" and "Most" type facts. Furthermore, SuperQuery reports the exact percentage (e.g. 80%). Facts   can be seen as rules that govern your data  , or patterns that exist among your objects.

 

 Most of the large orders happen in December.Fact Most Orders where Amount = Large have Month   = December. 80% of orders.Rule If Order Amount = Large then Month = December. 80% Confidence.

 

7.1 Preparing the data for mining  

There are more opportunities for finding patterns   in a table   if it contains columns   that have repeated values. For example   columns like Year  , Month  , State, Season, Class, Color, etc. If your data   is mostly numeric and has no repeated values, then you have to classify   it into Levels, Ranges, or Categories. To simplify this process  , SuperQuery   has the "Range Virtual Column  " that you can create with minimum effort. Here are the steps to help   hunt down those interesting facts:

 

·         Create Range columns   on your numerical   columns.

·         Create Classification columns   to categorize your data   entities into interesting classes (e.g. Clients, Orders, Products, etc.) .

·         Create Filter columns   

·         Create Summaries on useful categories  .

·         Fire the Fact engine   on the data   tables as well as the summary   tables.

 

SuperQuery   saves you a lot of time and gives you a head start by doing its systematic search   for facts. You start your search for patterns   from a Factbase   rather than from a database  . After all, that is what computers are good for.

 

 

7.2 Fact Discovery Engine for the Discovery Edition 

 

 

7.2.1 Running the Fact Discovery Engine

 

Press the Facts button    or select the View | Facts   menu   item or to display the Add   / Edit Table Facts dialog (figure 38). Enter a new name in the ‘Select / Add Fact' list box to create a new facts table   or select one of the existing tables to edit. 

 

Figure 38 Add / Edit Table Facts dialog

 

The "If Condition" and "Then Conclusion" combo boxes allow you to instruct the engine to search   using one or all columns   in the condition part and in the conclusion part.  Depending on the kind of patterns   you are looking for.

 

For example  , in the following fact;

Most products where Product = Jacket have Profit = High. Percentage = 80%

"If Condition" column is Product and the "Then Conclusion" column is Profit.

 

You may be interested in facts about a specific column or about all columns  .

 

You can store   new facts in a new Table or you can use an existing Facts   Table.  If you use an existing facts table  , then you can append or replace the new facts to the table.  Check the “Append new facts” box to append any new generated facts to the selected Facts Table. If you uncheck the button the new facts will replace the existing ones.

 

You can select whether you want Facts  , Exceptions or both from the radio button group.

 

If you need to fine tune   the performance   of the Fact Discovery   Engine, then press the Settings XE "Settings"  button to display the Fact Settings Dialog (figure 39)  .

 

 

Figure 39 Facts Settings dialog

 

When you press the Run button, the fact engine starts, and displays the Fact Discovery Engine Progress dialog (figure 40) showing the facts and exceptions as it discovers them.

The engine will then, automatically   generates a new table   view   on the new created Facts   Table (figure 41).

 

Figure 40 Facts Engine Progress dialog

 

Figure 41 Detailed Fact View

 

 

 

7.2.2 Tuning the Fact Engine 

 

7.2.2.1 Fine-tune the fact-finding performance

 

There are two sets of sliders used to fine-tune   the fact-finding performance  . The first set is the Fact   setting sliders and the second is the Exception settings set. 

 

In the Facts   set, each slider controls the number of facts reported in a specific range.  The first slider controls facts that are of type “All” and have a 100% confidence.  The second slider is for facts of type “Most” and in the confidence range from 80% to 99%.  The rest of the sliders cover the remaining 4 ranges  . Sliders that control confidence ranges less than 50% are only useful when fact level is more than 1.

 

For example  , if you increase the first slider, you will increase the number of facts reported that have a confidence of 100%.  The fact engine will take more time and the new facts will be supported by fewer rows   from the table  .

 

In the Exception set, the first slider control exceptions   that are less than 1% and the remaining sliders control the range up to 10%.  When you increase a slider, you increase the number of exceptions reported in that range.  On the other hand, decreasing the slider value requires a higher clustering level.  On the other hand, increasing the slider value relaxes the clustering requirements.

 

Most Level  :

This combo-box allows you to define the level of the facts of type “Most”.  For example   if you select 70% then any fact of type Most must have at least 70% confidence factor.  The confidence factor is the percentage of rows   satisfying the fact to all rows in the filter  .  The default   level is 60%. For example, the following table   describes abstract objects that have color, shape and weight.

 

                Color                      Shape                     Weight

 

->            Red                         Box                         100

->            Red                         Box                         200

->            Red                         Box                         100

                Red                         Cone                       300

                Blue                        Box                         300

                Blue                        Cone                       300

 

The following fact is generated if the “Most Level  ” is set at 60%. 

 

Most (75%) objects that have Color = Red have Shape = Box  (3 rows  )

 

Consequently, if the “Most Level  ” is increased to 80%, then this fact is not reported. 

 

Maximum fact level  :

This combo-box allows you to define the Fact Level.  The fact level is defined by the number of columns   in the fact left side. For example   in the following table  :

 

Color                      Shape                     Weight

->            Red                         Box                         100

->            Red                         Box                         100

->            Red                         Box                         100

                Red                         Cone                       300

                Blue                        Box                         300

                Blue                        Cone                       300

 

The fact: Most objects that have Color = Red have Shape = Box

is fact level 1 because it has one column at the left side (Color). On the other hand, the fact:

All objects that have Color = Red and Shape = Box have weight = 100 is fact level 2 because it has two columns   at the left side (Color and Shape ). Increasing the fact level increase the fact finding time.

 

7.2.2.2     Ignore dependent columns

Since the data sometimes contains groups of columns that are known to be dependent and their relation is not useful It is important to avoid spending time discovering already known facts. The Fact Discovery Engine's objective is to report only interesting facts. SuperQuery allows you to enter these groups of columns. From the Add / Edit Table Facts dialog (figure 38) press the Dependency button the Select Dependent Column Groups dialog displays (figure 42). The dialog contains an example of such a situation.

 

Figure 42 Select Dependent Column Groups dialog

 

 

7.2.2.3     Filter implied facts

It is also important not to report any fact that can be implied from other more general facts. There are a number of other dependencies that SuperQuery is able to detect automatically and avoid. The Fact Discovery Engine can automatically ignore:

 

·         Facts that contains blank values

·         Facts between a calculated column and its source columns

·         Facts that conclude a most common value

 

The Fact Filters dialog (figure 42) shows all the patterns that the engine automatically ignores. This allows you to cancel any or all filters and in the case of Most Common Values (MCV), you can also adjust the filter. You can display this dialog by pressing the Filters button from the Add / Edit Table Facts dialog (figure 38).

 

Figure 43 Facts Filters dialog

 

Following is the explanation of each option:

·         Ignore blank values: When reporting facts, do not report any fact that has a blank value in its condition or conclusion parts.

·         Ignore facts between a virtual column and its sources: Do not report facts resulting from existing dependencies between a virtual (calculated) column and its source columns. Source columns are those columns that were used to calculate the virtual column.

·         Ignore facts that conclude a most common value (MCV): Suppose that we have a Profit column that has 3 values Low, Medium, and High, and Medium is the most common value. Any fact that concludes this value is usually uninteresting. For example, a fact like: Most Product = Jacket have Profit = Medium may not be interesting, since we already know that most products have Medium profit. On the other hand, a fact like: Most Product = Jacket have profit = High is interesting, since profit = High is not expected. This filter can be set for facts of type ALL and Facts of type MOST. The percentage, with which a value is considered a Most Common, can also be specified.