COMP3425 and COMP8410 Data Mining S1 2024 Assignment 2

Published: 2024-06-29

Assignment 2: Description of Data

Data and Metadata

The data supplied for the assignment arises from the Australian Data Archive’s ANU Poll Dataverse [1]. As a student of the course, you are assumed to accept the Terms and Conditions of Use reproduced below. Please read them carefully. The custodian of the data has further requested that you delete your data at the end of the course. However, you would be able to obtain another copy by request at the website.

In particular, the data captures the results of a survey poll conducted in late 2023 on the topic of the 14th October 2023 Australian Constitutional Referendum on the Aboriginal and Torres Strait Islander Voice to Parliament. You can find a complete description of the purpose of the poll and the coding of the data (metadata), and also a descriptive summary of the poll results, here:

https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/13NPGQ

The data is provided to you for the assignment. You have the original dataset as downloaded from the ADA, called 02_ANUPoll_57_CSV_100150_general.csv, in comma-separated-values format. This data is described by the metadata in 01_ANUPoll_57_DataDictionary_100150_general.xlsx and the corresponding question text in 01_ANUPoll_57_Questionnaire_100150.docx.

If you are a COMP3425 (undergraduate) student, you are required to undertake some preprocessing steps as specified in the assignment specification. If you are a COMP8410 (postgraduate) student, you may choose your own preprocessing actions, but you may find that referring to the COMP3425 assignment specification will help you.

A Note on Data Types

Note that most of the data is either nominal or ordinal. Many ordinal variables include some marker values that are not ordinal, but indicate unordered categories as exceptions to the ordinal values. Be careful that you do not blindly handle those marker values as ordinal, and that you do not treat nominal data as ordinal without specifically justifying why you do so. Appropriate handling may depend on the mining methods you use.
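As an illustration of the point above, the following is a minimal sketch in Python with pandas, not part of the assignment's Rattle workflow. It separates an ordinal-coded column into an ordered part and an unordered marker part. The function name split_ordinal_and_markers and the marker codes -99 and -98 are hypothetical; the actual marker values for each variable must be taken from the data dictionary.

```python
import pandas as pd

def split_ordinal_and_markers(series, marker_codes):
    """Split one coded column into an ordered column and a nominal marker column.

    marker_codes is an assumption supplied by the caller: the codes that
    stand for unordered exceptions rather than points on the ordinal scale.
    """
    is_marker = series.isin(marker_codes)

    # Ordinal part: marker codes become missing, remaining codes keep their order.
    ordinal_values = series.mask(is_marker)
    levels = sorted(ordinal_values.dropna().unique())
    ordinal = pd.Series(
        pd.Categorical(ordinal_values, categories=levels, ordered=True),
        index=series.index,
    )

    # Marker part: an unordered label per row; "valid" means an ordinary answer,
    # and cells that were already blank stay missing.
    labels = [
        None if pd.isna(v) else (str(v) if m else "valid")
        for v, m in zip(series, is_marker)
    ]
    marker = pd.Series(pd.Categorical(labels), index=series.index)
    return ordinal, marker

# Illustrative values only, not taken from the poll data.
answers = pd.Series([1, 3, -99, 2, -98, 5])
ordinal, marker = split_ordinal_and_markers(answers, marker_codes=[-99, -98])
```

Keeping the two parts in separate columns lets a method that relies on order use the first column, while the second can be treated as nominal or used to filter rows. Whether that is appropriate depends on the mining method.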

You can translate a nominal variable that is, by default, loaded in Rattle as numeric, using Rattle’s “Transform” tab (Recode -> As Categoric). Alternatively, you can use Excel prior to loading, as in the following example:

For example, for the nominal variable p_state_sdc, the formula CONCATENATE("""", <p_state_sdc>, """") is used. If the variable has empty cells that you want to map to the “0” nominal value, you can use the formula CONCATENATE("""", TEXT(<p_state_sdc>, "0"), """") instead. In both cases, replace the variable name, shown as <p_state_sdc> in these examples, with the Excel cell reference, such as FB2.
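If you preprocess outside Excel and Rattle, the same idea can be sketched in Python with pandas: turn numeric codes into text labels so they are treated as categories, and map empty cells to the label "0". This is a hedged sketch, not the course's prescribed method. The function name to_nominal is hypothetical, and the code assumes the codes are whole numbers.

```python
import pandas as pd

def to_nominal(series, blank_label="0"):
    """Convert numeric codes to text categories, mapping blanks to blank_label.

    Assumes the codes are whole numbers. Anything that cannot be read as a
    number is treated as blank (errors="coerce").
    """
    numeric = pd.to_numeric(series, errors="coerce")
    labels = [blank_label if pd.isna(v) else str(int(v)) for v in numeric]
    return pd.Series(pd.Categorical(labels), index=series.index)

# Illustrative values only; in practice this would be a column such as
# p_state_sdc read from the supplied CSV file.
codes = pd.Series([1.0, None, 3.0, 1.0])
nominal = to_nominal(codes)
```

Unlike the Excel formulas, this does not wrap the values in quotation marks, because the categorical type already marks the column as non-numeric.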

References

[1] Biddle, Nicholas; McAllister, Ian, 2023, "ANU Poll 57/Australian Constitutional Referendum Survey (ACRS) (October 2023): Aboriginal and Torres Strait Islander Voice to Parliament", doi:10.26193/13NPGQ, ADA Dataverse, V4

Terms and Conditions of Use

This data has been distributed exclusively for students of COMP3425 and COMP8410 S1 2024 only. Data must be destroyed at the end of the course but may be re-obtained by request to the Australian Data Archive.

Furthermore, from

https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/13NPGQ

I acknowledge that:

1. Use of the material is restricted to use for analytical purposes and that this means that I can only use the material to produce information of an analytical nature. Examples of such uses are:

(a) the manipulation of data to produce means, correlations or other descriptive summary measures;

(b) the estimation of population characteristics from sample data;

(c) the use of data as input to mathematical models and for other types of analyses (e.g. factor analysis); and

(d) to provide graphical and pictorial representation of characteristics of the population or sub-sets of the population.

2. The material is not to be used for any non-analytical purposes, or for commercial or financial gain, without the express written permission of the Australian Data Archive. Examples of non-analytical purposes are:

(a) transmitting or allowing access to the data in part or whole to any other person / Department / Organisation not a party to this undertaking; and

(b) attempting to match unit record data in whole or in part with any other information for the purposes of attempting to identify individuals.

3. Outputs (such as statistics, tables and graphs) obtained from analysis of these data may be further disseminated provided that I:

(a) acknowledge both the original depositors and the Australian Data Archive;

(b) acknowledge another archive where the data file is made available through the Australian Data Archive by another archive; and

(c) declare that those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of it.

4. Use of the material is solely at my risk and I indemnify the Australian Data Archive and its host institution, The Australian National University.

5. The Australian Data Archive and its host institution, The Australian National University, shall not be held liable for any breach of this undertaking.

6. The Australian Data Archive and its host institution, The Australian National University, shall not be held responsible for the accuracy and completeness of the material supplied.

7. Once access has been granted to the data, abuses of access rights, breaches of this undertaking, or failure to keep the data safe, may result in the application of restrictions.

Restrictions will escalate in severity depending upon the seriousness of the breach and vary from termination of access to the user and/or institution, on either a temporary or permanent basis, through to potential legal action in the most extreme cases.

8. I will notify promptly the ADA of any non-compliance with these Terms and Conditions of Use or of any infringements of the data, including unintentional disclosure or any errors within the data of which I become aware.

9. At the conclusion of my research, I will notify the ADA of use. This may include offering to the ADA, for publication, any new dataset that has been derived from the materials supplied or that has been created by combining the material supplied with other available data. The deposit of the derived dataset will include sufficient supporting documentation to enable the new dataset to be made accessible to other users.