COMP 333 Introduction to Data Analytics

Summer 2020 Semester 1: Marks


Marks will be posted here.


Timed Quiz 4

See Moodle for marks.

The results were spread from 5.0/10 to 10.0/10, somewhat bimodal with modes at 7.0 and 8.5, and one outlier at 3.0/10. The average was around 7.7.

Only one common mistake:
Which of the following are not problems for data cleaning?

Lab Assignment 4

Marks/10
slide1 markup /2
slide2 markup /2
slide3 markup /2
slide4 markup /2
notebook /2

10% penalty for each day late

Total /10

SID		S1/2	S2/2	S3/2	S4/2	N/2	Days	Total/10
21347829	2	2	2	2	2	0	10
26101267	2	2	1	2	2	3	6.3
26321755	2	1	1	2	1	0	7
26353479	0	0	0	0	0	0	0
26487815	2	2	2	1	1	0	8
26565549	2	2	2	2	2	0	10
26671187	2	2	1	2	2	0	9
26993702	2	2	2	1	1	0	8
27017839	2	2	2	2	2	0	10
27117507	2	1	2	2	0	0	7
27148224	2	2	2	2	0	0	8
27491042	2	2	2	2	2	0	10
27739060	2	2	2	0	2	0	8
27739656	2	2	2	2	1	0	9
27742339	2	2	2	2	1	0	9
27808518	2	1	2	2	2	0	9
27819137	0	0	0	0	2	0	2
27877986	2	2	2	2	2	0	10
29683852	2	2	2	2	2	0	10
29794840	2	2	2	2	2	0	10
40000457	1	1	1	1	1	0	5
40005573	2	1	1	2	1	0	7
40009265	2	2	2	2	0	0	8
40013408	2	2	2	2	2	0	10
40016103	2	2	2	2	2	0	10
40016333	2	2	1	2	1	0	8
40016392	2	2	1	2	2	0	9
40018002	2	2	2	2	1	0	9
40018632	2	2	1	2	1	0	8
40022364	2	2	2	2	2	0	10
40024743	1	1	1	2	1	0	6
40024810	2	2	2	2	1	0	9
40032879	2	2	2	2	2	0	10
40033178	1	2	1	2	1	1	6.3
40034769	1	2	1	0	2	0	6
40034960	2	2	2	2	2	0	10
40037231	1	2	1	0	1	0	5
40038047	2	2	1	2	1	0	8
40039979	2	1	2	2	2	0	9
40042677	1	1	2	2	2	0	8
40043261	2	2	2	1	1	0	8
40043950	1	1	2	1	2	0	7
40044161	2	2	1	1	1	0	7
40044215	0	2	1	2	1	0	6
40044353	2	2	1	1	1	0	7
40045719	2	2	2	2	1	0	9
40045894	2	2	2	2	2	1	9
40046280	2	2	2	2	2	0	10
40049573	2	2	2	2	2	0	10
40051125	2	2	1	2	2	0	9
40051654	2	2	2	2	0	0	8
40054557	2	2	2	2	1	0	9
40055084	2	2	2	2	2	0	10
40056138	2	2	2	2	1	0	9
40057065	2	2	2	2	2	0	10
40057375	2	2	2	2	2	0	10
40058722	2	2	2	2	2	0	10
40060940	2	2	2	2	2	0	10
40061450	2	1	2	2	2	0	9
40061530	2	1	2	2	2	0	9
40063347	1	1	1	1	1	0	5
40063896	2	1	2	2	2	0	9
40065338	2	2	2	2	2	0	10
40065716	1	1	1	1	2	0	6
40065756	2	1	2	2	2	0	9
40065815	2	1	1	2	2	0	8
40066210	0	0	0	0	0	0	0
40066502	2	2	2	2	2	0	10
40066599	2	2	2	2	2	0	10
40074762	2	2	1	2	2	0	9
40077502	2	2	2	2	2	0	10
40082976	2	2	2	2	2	0	10
40086388	1	2	1	1	2	0	7
40086963	1	1	2	1	2	0	7
40087900	2	2	1	2	2	0	9
40089856	2	2	2	2	2	0	10
40090981	1	1	1	1	2	0	6
40092165	1	1	2	1	1	0	6
40095436	2	2	2	2	2	0	10
40099541	2	1	2	2	2	0	9
40108654	2	1	1	1	1	0	6
40130695	0	2	0	0	2	0	4
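
The late rows in the table above are consistent with a linear deduction: 10% of the raw total is removed for each day late (for example, a raw 9 that is 3 days late becomes 6.3). The following is a minimal Python sketch of that arithmetic. The function name and the `penalty_per_day` parameter are illustrative assumptions, not part of the course materials, and the linear rule is inferred from the posted totals rather than stated by the instructor.

```python
def late_adjusted_total(component_marks, days_late, penalty_per_day=0.10):
    """Sum the component marks, then deduct a fixed fraction of the
    raw total for each day late (assumed linear rule, floored at 0)."""
    raw = sum(component_marks)
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return round(raw * factor, 1)


# Rows taken from the Lab Assignment 4 table: (S1, S2, S3, S4, N), days late
print(late_adjusted_total([2, 2, 1, 2, 2], 3))  # raw 9, 3 days late
print(late_adjusted_total([1, 2, 1, 2, 1], 1))  # raw 7, 1 day late
print(late_adjusted_total([2, 2, 2, 2, 2], 1))  # raw 10, 1 day late
```

The same rule also matches the late rows in the other lab tables, for example a raw 8 that is 2 days late giving 6.4.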

Lab Assignment 3

Marks /10
testing /2
facets, sorting /2
clustering /2
string processing /2
notebook /2

10% penalty for each day late

Total /10

SID		T/2	F/2	C/2	S/2	N/2	Days	Total/10
21347829	2	2	1	2	1	0	8
26101267	2	2	1	1	1	2	5.6
26321755	2	1	1	2	1	0	7
26487815	2	2	0	1	2	0	7
26565549	2	1	0	2	1	0	6
26671187	2	2	2	2	1	0	9
26993702	2	2	0	2	1	0	7
27017839	2	2	0	1	1	0	6
27117507	2	1	1	2	1	0	7
27491042	2	2	1	2	2	0	9
27739060	2	2	0	1	2	0	7
27739656	2	2	2	2	2	0	10
27742339	2	0	1	1	1	0	5
27808518	2	2	2	2	2	0	10
27819137	2	2	0	1	1	0	6
27877986	2	2	0	2	2	0	8
29683852	2	2	2	2	2	0	10
29794840	2	2	0	1	2	0	7
40000457	2	1	0	1	1	0	5
40005573	2	1	1	1	1	0	6
40009265	2	2	2	2	2	0	10
40013408	2	2	2	1	2	0	9
40016103	1	2	2	2	2	0	9
40016333	2	1	0	1	2	0	6
40016392	2	2	0	1	1	0	6
40018002	2	2	2	2	1	0	9
40018632	2	2	0	2	1	0	7
40022364	2	2	2	1	2	0	9
40024743	2	2	0	1	2	0	7
40024810	2	2	2	2	2	0	10
40032879	2	2	0	2	1	0	7
40033178	2	2	0	2	1	0	7
40034769	2	1	0	2	1	0	6
40034960	2	2	2	1	1	0	8
40037231	2	2	1	2	1	0	8
40038047	2	2	2	2	1	0	9
40039979	2	2	0	1	1	0	6
40042677	2	2	0	1	1	0	6
40043261	2	2	0	2	1	0	7
40043950	2	2	0	1	1	0	6
40044161	2	2	0	1	1	0	6
40044215	2	1	2	1	1	0	7
40044353	2	1	0	2	2	0	7
40045719	2	2	0	1	1	0	6
40045894	2	1	2	1	1	0	7
40046280	2	2	0	1	2	0	7
40049573	2	2	2	2	1	0	9
40051125	2	2	1	2	1	0	8
40051654	2	2	2	2	1	0	9
40054557	2	2	0	1	2	0	7
40055084	2	2	2	1	2	0	9
40056138	2	2	2	1	1	0	8
40057065	2	1	2	2	2	0	9
40057375	2	0	0	2	1	0	5
40058722	2	1	1	2	1	0	7
40060940	2	2	1	2	2	0	9
40061450	2	2	2	1	2	0	9
40061530	2	2	1	1	2	0	8
40063347	2	2	0	1	1	0	6
40063896	2	2	1	1	1	0	7
40065338	2	1	1	2	1	0	7
40065716	2	2	2	2	1	0	9
40065756	2	2	1	0	2	0	7
40065815	2	2	0	1	1	0	6
40066502	2	2	2	1	2	0	9
40066599	2	2	1	2	1	0	8
40074762	2	2	2	2	2	0	10
40077502	2	2	2	1	2	0	9
40082976	2	2	2	1	1	0	8
40086388	2	2	0	1	1	0	6
40086963	2	2	2	1	2	0	9
40087900	2	2	2	2	1	0	9
40089856	2	2	2	1	2	0	9
40090981	2	1	2	2	2	0	9
40092165	2	2	0	2	2	0	8
40095436	2	2	1	2	1	0	8
40099541	2	1	1	2	1	0	7
40108654	2	2	1	1	1	0	7
40130695	2	2	0	1	1	0	6

Timed Quiz 3

See Moodle for marks.

The results had a reasonable spread from 4.5/10 to 10.0/10. The average was around 7.3.

Four very common mistakes:
Steps involved in EDA
Objectives of EDA
Uses of PCA
Which modeling techniques are regression and which are classification

Also, Data Wrangling is part of EDA, so the time and effort for EDA include the time and effort for Data Wrangling.

Lab Assignment 2

Part (1) marks
function quantDDA() /2
testing quantDDA() /2
notebook /2

Part (2) mark /3

Part (3) mark /1

10% penalty for each day late

Total /10

SID		P1f/2	P1t/2	P1n/2	P2/3	P3/1	Days	Total/10
21347829	2	2	2	3	1	0	10
26321755	1	2	1	3	1	0	8
26487815	1	2	1	3	1	0	8
26565549	2	2	1	3	1	0	9
26671187	2	2	2	3	0	0	9
26993702	1	2	1	3	1	0	8
27017839	1	0	1	1	1	0	4
27117507	1	0	1	1	1	0	4
27148224	2	2	1	1	0	0	6
27491042	2	2	1	3	1	0	9
27739060	1	2	1	1	1	0	6
27739656	1	2	2	3	1	0	9
27742339	1	2	0	3	1	0	7
27808518	2	2	2	3	1	0	10
27819137	2	2	1	3	1	0	9
27877986	1	2	2	3	1	0	9
29683852	2	2	2	3	1	0	10
29794840	1	2	1	3	1	0	8
40000457	2	2	1	3	1	0	9
40005573	2	2	1	3	1	0	9
40009265	2	2	2	3	1	0	10
40013408	2	2	2	3	0	0	9
40016103	2	2	2	3	1	0	10
40016333	2	2	2	3	1	0	10
40016392	1	2	1	2	1	2	5.6
40018002	2	2	1	3	1	0	9
40018632	2	2	1	1	1	1	6.3
40022364	2	2	2	3	1	0	10
40024743	1	2	2	0	0	0	5
40024810	2	2	2	3	1	0	10
40032879	2	2	1	2	1	0	8
40033178	1	2	1	3	1	2	6.4
40034769	1	2	2	1	1	0	7
40034960	1	2	2	3	1	0	9
40037231	2	2	1	3	1	0	9
40038047	2	2	2	3	1	0	10
40039979	2	2	2	3	1	0	10
40042677	1	2	1	0	1	0	5
40043261	2	2	1	3	1	0	9
40043950	1	2	2	2	0	0	7
40044161	1	2	1	3	1	0	8
40044215	1	1	2	1	1	0	6
40044353	1	2	2	3	1	0	9
40045719	1	2	1	3	1	0	8
40045894	2	2	2	3	1	0	10
40046280	2	2	2	3	1	0	10
40049573	2	2	2	3	1	0	10
40051125	2	2	1	3	1	0	9
40051654	2	2	1	3	1	1	8.1
40054557	2	2	0	3	0	0	7
40055084	2	1	0	2	0	0	5
40056138	2	2	2	3	1	0	10
40057065	2	2	2	3	1	0	10
40057375	2	2	1	3	1	0	9
40058722	2	2	0	3	1	0	8
40060940	2	2	0	2	1	0	7
40061450	2	2	0	3	1	0	8
40061530	2	2	2	3	1	0	10
40063347	1	2	1	3	1	0	8
40063896	1	2	1	3	1	0	8
40065338	2	2	1	3	1	0	9
40065716	2	2	1	3	1	0	9
40065756	2	2	2	3	1	0	10
40065815	2	2	2	3	1	0	10
40066502	1	2	2	3	1	0	9
40066599	2	2	2	3	1	0	10
40074762	2	2	2	3	1	0	10
40077502	2	2	2	3	1	0	10
40082976	2	2	1	3	1	0	9
40086388	1	2	0	3	1	0	7
40086963	2	2	2	2	1	0	9
40087900	2	2	2	3	1	0	10
40089856	2	2	2	3	1	0	10
40090981	2	2	0	3	1	0	8
40092165	2	2	1	3	1	0	9
40095436	2	2	1	3	1	0	9
40099541	2	2	2	3	1	0	10
40108654	2	2	0	3	1	0	8
40130695	1	2	0	1	0	1	3.6

Lab Assignment 1

Marks /2 for each of four plots
Mark /2 for notebook
10% penalty for each day late
Total /10

SID		P1	P2	P3	P4	N	Days	Total/10
21347829	2	2	2	2	2	0	10
26101267	2	2	2	2	1	1	8.1
26321755	2	2	2	2	0	1	7.2
26487815	2	2	2	2	2	0	10
26565549	2	2	2	2	1	0	9
26671187	2	2	2	2	1	1	8.1
26993702	1	1	2	2	1	0	7
27017839	1	1	2	2	1	0	7
27117507	1	2	2	1	0	0	6
27148224	1	1	1	1	1	0	5
27491042	2	2	2	2	1	0	9
27739060	2	2	2	2	1	0	9
27739656	2	2	2	1	1	0	8
27742339	2	2	2	2	0	0	8
27808518	2	2	2	2	0	0	8
27819137	2	2	2	1	0	0	7
27877986	2	2	2	2	2	0	10
29683852	2	2	2	2	2	0	10
29794840	1	1	2	2	1	0	7
40000457	2	2	2	2	0	0	8
40005573	1	1	2	2	0	0	6
40009265	2	2	2	2	2	0	10
40013408	2	2	2	2	1	0	9
40016103	2	2	2	2	1	0	9
40016333	2	2	2	1	1	0	8
40018002	1	1	2	2	1	0	7
40018632	2	2	2	2	1	0	9
40022364	2	2	2	2	2	0	10
40024743	2	2	2	1	1	0	8
40024810	2	2	2	2	2	0	10
40032879	2	2	2	2	2	0	10
40033178	2	0	2	2	1	1	6.3
40034769	2	2	2	2	2	0	10
40034960	2	2	2	2	2	0	10
40037231	2	2	2	1	0	0	7
40038047	2	2	2	2	0	0	8
40039979	2	2	2	2	1	0	9
40042677	2	2	2	2	1	0	9
40043261	2	2	2	2	1	3	6.3
40043950	2	2	2	0	1	0	7
40044161	2	2	2	2	1	0	9
40044215	0	0	2	1	1	0	4
40044353	1	1	2	1	1	0	6
40045719	2	2	2	2	0	0	8
40045894	2	2	2	2	2	0	10
40046280	2	2	2	2	2	0	10
40049573	2	2	2	2	2	0	10
40051125	2	2	2	2	2	0	10
40051654	2	2	2	2	0	0	8
40054557	2	2	2	2	0	0	8
40055084	2	2	2	2	0	0	8
40056138	2	2	2	2	2	0	10
40057065	2	2	2	2	2	0	10
40057375	1	2	2	2	2	0	9
40058722	2	2	2	2	1	0	9
40060940	1	1	1	0	0	0	3
40061450	2	2	2	2	1	0	9
40061530	2	2	2	2	2	0	10
40063347	2	2	2	2	0	0	8
40063896	2	2	2	2	1	0	9
40065338	2	2	2	2	1	0	9
40065716	2	2	2	2	1	0	9
40065756	2	2	2	2	1	0	9
40065815	2	2	2	2	2	0	10
40066502	2	2	2	2	0	0	8
40066599	2	2	2	2	2	0	10
40074762	2	2	2	2	2	0	10
40077502	2	2	2	2	2	0	10
40082976	2	2	2	2	2	0	10
40086388	2	2	2	2	0	0	8
40086963	2	2	2	2	1	0	9
40087900	2	2	2	2	1	0	9
40089856	2	2	2	2	2	0	10
40090981	2	2	2	2	1	0	9
40092165	2	2	2	2	1	0	9
40095436	2	2	2	2	0	0	8
40099541	2	2	2	2	1	0	9
40108654	2	2	2	2	1	0	9
40130695	2	2	2	2	1	0	9

Timed Quiz 2

See Moodle for marks.

The results had a reasonable spread from 5.0/10 to 9.0/10, with a few people below 5/10. The average was around 7.0.

Three very common mistakes:
The different types of problems handled by data cleaning.
The role of clustering in data wrangling.
The steps for data wrangling, as different steps and different sequences of steps are proposed by different people.

Timed Quiz 1

See Moodle for marks.

The results were a nice bell curve from 4.5/10 to 9.5/10. The average was around 7.5.

Two very common mistakes:
The count of the number of boys in second grade is a continuous and ratio variable.
For coin tossing, Heads/Tails is a nominal variable. It remains a nominal variable even when encoded as 0/1. For nominal variables the mean is not defined. So 0.4 is not the mean for the dataset.
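
To illustrate the coin-tossing point, here is a minimal Python sketch of summaries that do make sense for a nominal variable: counts, proportions, and the mode. The ten tosses are hypothetical data chosen so that 4 of 10 are Heads; they are not taken from the quiz.

```python
from collections import Counter

# Hypothetical data: 10 coin tosses with 4 Heads (not from the quiz itself).
tosses = ["H", "T", "T", "H", "T", "T", "H", "T", "H", "T"]

counts = Counter(tosses)                # frequency of each category
mode = counts.most_common(1)[0][0]      # most frequent category
proportions = {k: v / len(tosses) for k, v in counts.items()}

print(counts["H"], counts["T"])  # 4 6
print(mode)                      # T
print(proportions["H"])          # 0.4
```

Here 0.4 appears as the proportion of tosses in the Heads category, which is a valid summary of a nominal variable. Averaging a 0/1 encoding produces the same number, but it describes how often the category coded 1 occurs, not a mean of Heads/Tails.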


Last modified on 05 May 2020 by gregb@cse.concordia.ca