Introduction to Orange Tool Part-3

ARPAN KANANI
3 min read · Sep 15, 2021

Aim: Data Pre-processing and text analytics using Orange

Theory

What is text analytics?

Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns.
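To make "translating unstructured text into quantitative data" concrete, here is a minimal sketch in plain Python that turns two sentences into a table of word counts (a bag-of-words matrix). The sentences and the `word_counts` helper are made up for illustration; this is not how Orange's text add-on does it internally.

```python
import re
from collections import Counter

def word_counts(text):
    """Lower-case the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Two made-up documents, used only for illustration.
docs = [
    "Orange makes data mining fun.",
    "Data mining with Orange is visual data analysis.",
]

counts = [word_counts(d) for d in docs]

# The vocabulary is every distinct word seen in any document.
vocabulary = sorted(set().union(*counts))

# One row per document, one column per vocabulary word.
matrix = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Once text is in this numeric form, the usual tools (clustering, classification, visualization) can be applied to it.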

What is sentiment analysis?

Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or neutral. A sentiment analysis system for text analysis combines natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase.
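As a rough illustration of "weighted sentiment scores", the sketch below uses a much simpler technique than the NLP-plus-machine-learning systems described above: a tiny hand-made lexicon whose word weights are summed. The lexicon, its weights, and the `sentiment` function are assumptions invented for this example.

```python
import re

# Tiny hand-made lexicon; the words and weights are illustrative assumptions.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}

def sentiment(text):
    """Sum the lexicon weights of the words in text and label the result."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0.0) for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label

for sentence in ["I love this great tool!",
                 "The interface is terrible, I hate it.",
                 "It opens a file."]:
    print(sentence, "->", sentiment(sentence))
```

A lexicon like this ignores negation and context ("not good" would still score positive), which is one reason real systems add machine learning on top.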

Why is it useful?

Sentiment analysis is extremely useful in social media monitoring because it gives an overview of wider public opinion on particular topics. More generally, it is useful for quickly gaining insights from large volumes of text.

What is the effect of discretization, continuization, normalization, and randomization on the data w.r.t. Orange?

Randomization:

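In general, randomization is a method based on chance alone, for example for assigning study participants to treatment groups so that people with particular characteristics are spread evenly across all trial arms. In Orange, the Randomize preprocessor shuffles the values of the class, attribute, and/or meta columns across rows. Shuffling the class column breaks the link between features and target, which is useful for checking whether a model has learned a real pattern.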

Sample Code:

>>> from Orange.data import Table
>>> from Orange.preprocess import Randomize
>>> data = Table("titanic")
>>> # Shuffle only the class column; the features are left untouched.
>>> randomizer = Randomize(Randomize.RandomizeClasses)
>>> randomized_data = randomizer(data)
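To see the effect of class randomization without installing Orange, the following plain-Python sketch shuffles only the label column of a few made-up rows. The data and the `randomize_classes` helper are hypothetical; it imitates the idea behind RandomizeClasses, not Orange's actual implementation.

```python
import random

# Made-up rows: (feature value, class label).
rows = [(22, "no"), (38, "yes"), (26, "yes"), (35, "yes"), (54, "no"), (2, "no")]

def randomize_classes(rows, seed=0):
    """Return rows with the class column shuffled; features stay in place."""
    labels = [label for _, label in rows]
    random.Random(seed).shuffle(labels)  # seeded, so the result is repeatable
    return [(feature, label) for (feature, _), label in zip(rows, labels)]

shuffled = randomize_classes(rows)

for before, after in zip(rows, shuffled):
    print(before, "->", after)
```

The feature values and the overall class distribution are unchanged; only the pairing between them is lost.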

Discretization:

Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. It is usually carried out as a first step toward making them suitable for numerical evaluation and for use by models.

Discretization replaces continuous features with the corresponding categorical features:

Sample Code:

import Orange

# Load the data set.
store = Orange.data.Table("superstore.tab")

# Split every continuous feature into 3 bins with roughly equal counts.
disc = Orange.preprocess.Discretize()
disc.method = Orange.preprocess.discretize.EqualFreq(n=3)
d_store = disc(store)

print("Original dataset:")
for e in store[:3]:
    print(e)

print("Discretized dataset:")
for e in d_store[:3]:
    print(e)
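The idea behind equal-frequency discretization with three bins can be sketched in plain Python as below. The `sales` values and the `equal_freq_bins` helper are made up for illustration, and the rank-based binning is a simplification: Orange computes cut-off thresholds instead, so tied values always land in the same bin there.

```python
def equal_freq_bins(values, n=3):
    """Assign each value a bin index 0..n-1 so bins hold roughly equal counts."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n // len(values)
    return bins

# Made-up continuous feature.
sales = [12.0, 250.0, 47.5, 8.0, 99.0, 310.0]
labels = ["low", "medium", "high"]

for value, b in zip(sales, equal_freq_bins(sales, n=3)):
    print(value, "->", labels[b])
```

Each of the three categories receives two of the six values, regardless of how far apart the values are.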

Continuization:

Given a data table, return a new table in which the discrete attributes are replaced with continuous ones or removed.

  • binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending upon the argument zero_based.
  • multinomial variables are treated according to the argument multinomial_treatment.
  • discrete attributes with only one possible value are removed.

Sample Code:

import Orange

products = Orange.data.Table("Products")

# Replace discrete features with continuous indicator columns.
continuizer = Orange.preprocess.Continuize()
products1 = continuizer(products)
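The three rules listed above can be imitated for a single categorical column in plain Python. This is a sketch under assumptions: the `continuize` helper is hypothetical, it always uses 0.0/1.0 indicators, and it treats multinomial columns with one indicator per value, which is only one of the treatments Orange offers.

```python
def continuize(column):
    """Turn a list of categorical values into 0.0/1.0 indicator columns.

    Returns (names, rows). A column with one distinct value is dropped,
    a two-valued column becomes a single indicator, and a column with
    more values gets one indicator per value.
    """
    values = sorted(set(column))
    if len(values) < 2:
        return [], [[] for _ in column]
    if len(values) == 2:
        positive = values[1]
        return [positive], [[1.0 if v == positive else 0.0] for v in column]
    return values, [[1.0 if v == name else 0.0 for name in values] for v in column]

print(continuize(["a", "b", "c", "a"]))   # multinomial: three indicators
print(continuize(["yes", "no", "yes"]))   # binary: one indicator
print(continuize(["x", "x"]))             # constant: removed
```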

Normalization:

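In Orange, normalization rescales continuous features so that they are on a comparable scale. Normalize.NormalizeBySpan rescales each feature by its range (minimum to maximum), while Normalize.NormalizeBySD standardizes it to zero mean and unit standard deviation. This is distinct from database normalization, which is the systematic decomposition of tables to eliminate data redundancy (repetition) and insertion, update, and deletion anomalies.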

Sample Code:

>>> from Orange.data import Table
>>> from Orange.preprocess import Normalize
>>> data = Table("Customers")
>>> # Rescale each continuous feature by its span (minimum to maximum).
>>> normalizer = Normalize(norm_type=Normalize.NormalizeBySpan)
>>> normalized_data = normalizer(data)
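The NormalizeBySpan option used in the sample above corresponds to scaling by a feature's span, often called min-max scaling. A minimal plain-Python sketch of that arithmetic for one column follows; the `normalize_by_span` helper and the sample values are made up, and mapping a constant column to zeros is an assumption of this sketch rather than documented Orange behaviour.

```python
def normalize_by_span(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column carries no information, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

ages = [10, 20, 40]
print(normalize_by_span(ages))
```

After scaling, features measured in very different units contribute on the same footing to distance-based methods.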

How to work with Orange in Python and vice-versa?

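Orange is an open-source data visualization and analysis tool in which data mining is done through visual programming or Python scripting. The tool has components for machine learning and add-ons for bioinformatics and text mining, and it is packed with features for data analytics. … Orange is a Python library, so its classes can be imported into any Python script, as in the samples above. In the other direction, the Python Script widget lets you run Python code inside an Orange workflow.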
