Installation using CSD & Parcel

This tutorial requires a running CDP 7.1.7+ platform with admin access to Cloudera Manager.

Note that this tutorial was written for CDP-7.1.9.3 and DATAGEN-0.4.13. For later releases, change the repository URLs to point to the new release.

To get the links matching the Datagen version you want and your CDP version, browse the S3 repository and locate the corresponding CSD and parcel links:

https://datagen-repo.s3.eu-west-3.amazonaws.com/index.html

Note: The advice is to always use the latest Datagen version (0.4.13 at the time this tutorial was written).
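The links used in the rest of this tutorial follow a regular pattern, so they can be derived from the two version numbers. The sketch below is only an illustration: the function names are hypothetical, and the path layout is inferred from the links shown in this tutorial rather than from any documented contract, so always confirm the exact file names in the repository index before downloading.

```shell
# Base URL of the public Datagen repository (taken from this tutorial).
DATAGEN_REPO_BASE="https://datagen-repo.s3.eu-west-3.amazonaws.com"

# Build the CSD jar URL from a Datagen version and a CDP version.
# ASSUMPTION: the layout
#   <base>/<datagen_version>/<cdp_version>/csd/DATAGEN-<datagen_version>.<cdp_version>.jar
# is inferred from the links in this tutorial; check it against the index page.
datagen_csd_url() {
  datagen_version="$1"
  cdp_version="$2"
  printf '%s/%s/%s/csd/DATAGEN-%s.%s.jar\n' \
    "$DATAGEN_REPO_BASE" "$datagen_version" "$cdp_version" \
    "$datagen_version" "$cdp_version"
}

# Build the parcel repository URL (same inferred layout, "parcels/" directory).
datagen_parcel_repo_url() {
  printf '%s/%s/%s/parcels/\n' "$DATAGEN_REPO_BASE" "$1" "$2"
}

# Example usage:
#   datagen_csd_url 0.4.13 7.1.9.2
#   datagen_parcel_repo_url 0.4.13 7.1.9.2
```

Keeping the two version numbers as parameters means that moving to a later release only requires changing the arguments.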

Setup CSD

On the Cloudera Manager host, download the CSD jar with wget:

wget https://datagen-repo.s3.eu-west-3.amazonaws.com/0.4.13/7.1.9.3/csd/DATAGEN-0.4.13.7.1.9.2.jar

Copy the downloaded jar file into /opt/cloudera/csd/:

cp DATAGEN-*.jar /opt/cloudera/csd/

Restart the Cloudera Manager Server:

systemctl restart cloudera-scm-server
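After the restart, it can be worth confirming that the jar really is in the CSD directory and that the server came back up. The helper below is a minimal sketch with a hypothetical function name; it only checks that a file matching DATAGEN-*.jar exists in the directory used above.

```shell
# Return success if at least one Datagen CSD jar is present in the given
# directory (defaults to the CSD directory used in this tutorial).
has_datagen_csd() {
  csd_dir="${1:-/opt/cloudera/csd}"
  for jar in "$csd_dir"/DATAGEN-*.jar; do
    # If nothing matches, the pattern stays literal and -f fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}

# Example usage on the Cloudera Manager host:
#   has_datagen_csd && systemctl is-active cloudera-scm-server
```

`systemctl is-active` prints `active` once the service is running; the Cloudera Manager web UI may still take a little longer to become reachable.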

Setup Parcel

In Cloudera Manager, go to Parcels > Parcel Repositories & Network:

Add this public repository to Cloudera Manager: https://datagen-repo.s3.eu-west-3.amazonaws.com/0.4.13/7.1.9.2/parcels/
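Before saving, the repository URL can be sanity-checked from a shell. Parcel repositories expose a manifest.json file at their root, so requesting it is a quick reachability test. The function name below is hypothetical and the snippet is only a sketch; it builds the manifest URL and leaves the actual network call as a commented example.

```shell
# Build the manifest.json URL for a parcel repository, tolerating a
# trailing slash on the repository URL.
parcel_manifest_url() {
  repo="${1%/}"
  printf '%s/manifest.json\n' "$repo"
}

# Example usage (needs network access from the Cloudera Manager host):
#   curl -fsSI "$(parcel_manifest_url "https://datagen-repo.s3.eu-west-3.amazonaws.com/0.4.13/7.1.9.2/parcels/")"
# curl exits with a non-zero status if the manifest cannot be fetched.
```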

Click Save & Verify to make sure the URL is correct. You should have:

It is now possible to download the Datagen parcel:

Then distribute it:

And finally activate it:

At the end, the result should be:

Add Service wizard

Go to the Home page in Cloudera Manager and pick the cluster where you want to install Datagen.

Click on Actions > Add a Service.

Now, it is possible to add Datagen as a Service to CDP:

Start the Add Wizard by clicking on Continue.

Select the Ranger dependency if you are running Ranger (and you should), so that Datagen can automatically create policies in Ranger.

Select where to place the Datagen servers (it is best to start with only one and scale up later if needed):

Review the changes. All values should be filled in automatically; however, it is recommended to set the Ranger properties correctly (they can be removed later):

You should end up with:

Restart CMS (Cloudera Management Service) before going on: Clusters > Cloudera Management Service, then Actions > Restart.

Start Service

Before launching commands, jq must be installed. Install it with the following command:

yum install jq
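To confirm that the tool is actually available on the PATH before starting the service, a small check like the following can be used. The function name is hypothetical and this is only a sketch of one way to do it.

```shell
# Report whether a command is available on the PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing" >&2
    return 1
  fi
}

# Example usage:
#   require_cmd jq && jq --version
```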

Start the service with Actions > Start.

Once the command pop-up has opened, you can browse the Role Log and click on Full Log File:

Verify that the service started correctly. You should have:
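As a complement to reading the log in the UI, a downloaded copy of the log can be scanned for errors from a shell. This is a rough sketch under assumptions: the function name is hypothetical, the log path is whatever file you saved, and it assumes log lines carry a level keyword such as ERROR, which this tutorial does not state.

```shell
# Count lines containing "ERROR" in a log file; prints 0 when there are none.
log_error_count() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  # grep -c exits with status 1 when the count is 0, so mask that status.
  grep -c 'ERROR' "$1" || true
}

# Example usage (the path is a placeholder for a saved copy of the role log):
#   log_error_count /path/to/datagen-role.log
```

A count of 0 does not prove that the service is healthy; it only means that no line matched the keyword.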

You can now proceed to the Data Generation Basic Part.