Data Mining Tasks Description

Task 1: DDoS Attacks Detection for Enterprise Network Security

This dataset contains 24 hours of network traffic captured from an enterprise under an attack scenario. For data analysis, over 70 features were extracted from the network traffic of all machines in the organization.

The training data contains two behaviours: "Benign" and "attack". The normal behaviour ("Benign") covers five different types of daily user activity, such as checking email and transferring files between users in the organisation. The abnormal activity is a DDoS attack on one machine, and its traffic is labeled "attack". The labels for the training data are in the last column.

The testing data is a mixture of normal and attack traffic. Participants are expected to label each flow (record) as "Benign" or "Attack".
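
A minimal sketch of turning training records into (features, target) pairs, assuming each record is a comma-separated row whose last field is the label, as stated above, and whose other fields are all numeric. The function names `split_record` and `encode_label` are hypothetical. The label comparison is case-insensitive because the description spells the attack label both as "attack" and "Attack". If some columns turn out to be non-numeric (for example flow identifiers or addresses), they would have to be dropped or encoded before this step.

```python
from typing import List, Tuple


def encode_label(label: str) -> int:
    """Map a flow label to a binary target: 0 for Benign, 1 for attack (case-insensitive)."""
    normalized = label.strip().lower()
    if normalized == "benign":
        return 0
    if normalized == "attack":
        return 1
    raise ValueError(f"unexpected label: {label!r}")


def split_record(line: str) -> Tuple[List[float], int]:
    """Split one comma-separated training row into (features, target).

    Assumption: every field except the last is numeric; the last field is the label.
    """
    fields = line.rstrip("\n").split(",")
    features = [float(value) for value in fields[:-1]]
    target = encode_label(fields[-1])
    return features, target


if __name__ == "__main__":
    # Hypothetical rows with three features each (the real data has 76 features).
    for row in ["80,1.5,120,Benign", "443,0.01,9000,attack"]:
        print(split_record(row))
```

Encoding the two classes as 0 and 1 keeps the target compatible with most binary classifiers; the test labels would be mapped back to "Benign" or "Attack" for submission.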

The statistical information of this dataset is summarized as:

No. of Samples    No. of Features    No. of Training    No. of Testing
1,278,824         76                 464,976            813,848


Reference: Ali Ghorbani and Arash Habibi Lashkari, CDMC2018 Dataset: DDoS Attacks Detection for Enterprise Network Security, Canadian Institute for Cybersecurity, University of New Brunswick, http://www.csmining.org/


Task 2: Age of Abalone Prediction

The age of an abalone is generally determined by cutting the shell through the cone, staining it, and counting the number of rings under a microscope. Rather than performing this tedious and time-consuming task, the age can be predicted from other measurements that are easier to obtain. The name and type of each of the 7 attributes are given below. The number of rings (an integer) is the value to predict.

The attributes are:

Attribute         Type
Length            continuous
Diameter          continuous
Height            continuous
Whole Weight      continuous
Shucked Weight    continuous
Viscera Weight    continuous
Shell Weight      continuous
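
As a simple baseline for the integer-valued target, a single attribute can be regressed against the ring count and the estimate rounded to a whole number. The sketch below uses ordinary least squares on one feature (for example Shell Weight). The choice of feature, the clamp at one ring, the function names `fit_line` and `predict_rings`, and the sample values in the usage block are illustrative assumptions, not part of the task definition.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rings(x: float, slope: float, intercept: float) -> int:
    """Round the linear estimate to an integer ring count, never below 1."""
    return max(1, round(slope * x + intercept))


if __name__ == "__main__":
    # Hypothetical shell weights and ring counts, chosen to lie on a line.
    shell_weight = [0.1, 0.2, 0.3, 0.4]
    rings = [5, 7, 9, 11]
    slope, intercept = fit_line(shell_weight, rings)
    print(predict_rings(0.25, slope, intercept))  # 8
```

Rounding turns the continuous estimate into a valid integer answer; a multi-feature model over all attributes would follow the same pattern with a matrix solve instead of the closed-form slope.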

The statistical information of this dataset is summarized as:

No. of Samples    No. of Features    No. of Training    No. of Testing
4,177             8                  2,924              1,253

Reference: Iqbal Gondal, Sam Waugh, and Warwick Nash, CDMC2018 Dataset: Age of Abalone Prediction, Marine Research Laboratories, Department of Primary Industry and Fisheries, Tasmania, http://www.csmining.org/

Task 3: Social Network Services Sentiment Recognition

Social Network Services (SNS) are now an essential part of daily life, and identifying the sentiments and tendencies in the opinions posted on SNS is valuable in many respects. The overall sentiment of a text is determined by the semantics of each sentence and by contextual information. This sentiment recognition task evaluates text classification techniques for sentiment analysis.

Note that the text messages in the dataset are encrypted to prevent human involvement, which is NOT allowed in this competition.
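
Because the messages are encrypted, a classifier has to work on the encrypted tokens directly. The sketch below is a multinomial Naive Bayes classifier with add-one smoothing, chosen here only as an illustration and not prescribed by the competition. It assumes that each encrypted message can be split on whitespace and that the same plaintext word always maps to the same encrypted token; if the encryption does not preserve this, token-level features would not carry meaning. The class name `NaiveBayesText` and the tokens in the usage block are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace tokens.

    Assumption: identical plaintext words map to identical encrypted tokens.
    """

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()  # documents per class
        self.token_counts: Dict[int, Counter] = defaultdict(Counter)  # token frequencies per class
        self.vocabulary: Set[str] = set()  # all tokens seen in training

    def fit(self, texts: List[str], labels: List[int]) -> "NaiveBayesText":
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> int:
        if not self.class_counts:
            raise ValueError("model has not been fitted")
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = -1, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + vocab_size
            score = math.log(doc_count / total_docs)  # log prior of the class
            for token in text.split():
                if token not in self.vocabulary:
                    continue  # tokens never seen in training carry no evidence
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical encrypted tokens with class IDs 0 (Happiness) and 1 (Anger).
    model = NaiveBayesText().fit(["xq zr", "xq zr zr", "pk lm", "pk lm lm"], [0, 0, 1, 1])
    print(model.predict("zr xq"))  # 0
```

Naive Bayes needs no knowledge of the plaintext vocabulary, which is why it fits encrypted input; it ignores the sentence order and context mentioned above, so it is best read as a baseline.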

The statistical information of the dataset is summarized as:

No. of Classes
No. of Training
No. of Testing
7       24,807

The 7 sentiment tags are described below:

Class   Sentiment Tag   Description
0       Happiness       Sentences that express happiness, including fondness, agreement, appreciation, pride and other positive emotional expressions.
1       Anger           Sentences that express anger, criticism, insults and negative judgements.
2       Sadness         Sentences that express complaint, depression, exhaustion, regret, missing someone or something, etc.
3       Surprise        Sentences that express surprise caused by unexpected events.
4       Fear            Sentences that express fear, anxiety, nervousness, confusion, etc.
5       Neutral         Sentences considered to convey no obvious emotion.
6       Boredom         Sentences that express boredom, loneliness or a lack of willingness to continue the current conversation.
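
The class IDs in the table can be kept in a lookup so that predicted IDs are translated back into tag names and out-of-range IDs are caught early. The dictionary `SENTIMENT_TAGS` and the function `decode_predictions` are hypothetical helpers; the submission format expected by the competition is not specified here.

```python
from typing import Dict, List

# Class IDs and sentiment tags from the task's class table.
SENTIMENT_TAGS: Dict[int, str] = {
    0: "Happiness",
    1: "Anger",
    2: "Sadness",
    3: "Surprise",
    4: "Fear",
    5: "Neutral",
    6: "Boredom",
}


def decode_predictions(class_ids: List[int]) -> List[str]:
    """Translate predicted class IDs into tag names, rejecting IDs outside 0-6."""
    unknown = [cid for cid in class_ids if cid not in SENTIMENT_TAGS]
    if unknown:
        raise ValueError(f"unknown class IDs: {unknown}")
    return [SENTIMENT_TAGS[cid] for cid in class_ids]


if __name__ == "__main__":
    print(decode_predictions([0, 5, 6]))  # ['Happiness', 'Neutral', 'Boredom']
```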

Reference: CDMC2018 Dataset: Social Network Services Sentiment Recognition, International Cybersecurity Data Mining Competition, http://www.csmining.org/