google cloud Archives - Indium
https://www.indiumsoftware.com/blog/tag/google-cloud/

Training Custom Machine Learning Model on Vertex AI with TensorFlow
https://www.indiumsoftware.com/blog/training-custom-machine-learning-model-on-vertex-ai-with-tensorflow/
"Vertex AI is Google's platform, which provides many machine learning services such as training models using AutoML or custom training."

AutoML vs Custom Training

To quickly compare the functionality of AutoML and custom training, and the expertise each requires, check out the following table provided by Google.

Choose a training method | Vertex AI | Google Cloud

In this article, we are going to train a custom machine learning model on Vertex AI with TensorFlow.

To learn about Vertex AI's AutoML feature, read my previous blog: Machine Learning using Google's Vertex AI.

About Dataset

We will be using the Crab Age Prediction dataset from Kaggle. The dataset is used to estimate the age of a crab based on its physical attributes.

There are 9 columns in the dataset, as follows:

  1. Sex: Crab gender (Male, Female and Indeterminate)
  2. Length: Crab length (in feet; 1 foot = 30.48 cm)
  3. Diameter: Crab diameter (in feet; 1 foot = 30.48 cm)
  4. Height: Crab height (in feet; 1 foot = 30.48 cm)
  5. Weight: Crab weight (in ounces; 1 pound = 16 ounces)
  6. Shucked Weight: Weight without the shell (in ounces; 1 pound = 16 ounces)
  7. Viscera Weight: Weight of the internal organs (in ounces; 1 pound = 16 ounces)
  8. Shell Weight: Shell weight (in ounces; 1 pound = 16 ounces)
  9. Age: Crab age (in months)

We must predict the Age column with the help of the rest of the columns.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from Kaggle. There is only one csv file in the downloaded dataset, called CrabAgePrediction.csv; I have uploaded this csv to a bucket called vertex-ai-custom-ml on Google Cloud Storage.
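If you prefer to script this step instead of using the console, the bucket creation and upload can be done with the google-cloud-storage client library. The following is a minimal sketch (not part of the original walkthrough), assuming the crab-age-pred project ID used later in this post and a local copy of the downloaded csv.

from google.cloud import storage

# Create the bucket and upload the Kaggle csv (the local file path is a placeholder).
client = storage.Client(project="crab-age-pred")
bucket = client.create_bucket("vertex-ai-custom-ml", location="us-central1")
blob = bucket.blob("CrabAgePrediction.csv")
blob.upload_from_filename("CrabAgePrediction.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")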

Step 2: Working on Workbench

Go to Vertex AI, then to the Workbench section, and enable the Notebooks API. Then click on New Notebook and select TensorFlow Enterprise; we are using TensorFlow Enterprise 2.6 without GPU for this project. Make sure to select the us-central1 (Iowa) region.

It will take a few minutes to create the notebook instance. Once the notebook is created, click on Open JupyterLab to launch JupyterLab.

In JupyterLab, open the Terminal and run the following commands one by one.

mkdir crab_folder     # This will create crab_folder                       

cd crab_folder        # To enter the folder

mkdir trainer         # This will create trainer folder

touch Dockerfile      # This will create a Dockerfile

We can see all the files and folders on the left side of JupyterLab. From there, open the Dockerfile and add the following lines.

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6

WORKDIR /

COPY trainer /trainer

ENTRYPOINT ["python", "-m", "trainer.train"]

Now save the Dockerfile; with this we have defined the entrypoint for the container.

To save the model’s output, we’ll make a bucket called crab-age-pred-bucket.

For model training, I have already uploaded the Python file to a GitHub repository. To clone this repository, click on Git at the top of JupyterLab, select Clone a Repository, paste the repository link, and hit Clone.

In JupyterLab, we can see the crab-age-pred folder; copy the train.py file from this folder to crab_folder/trainer/.

Let's look at the train.py file before we build the Docker image.

# Importing the required packages
import numpy as np
import pandas as pd
import pathlib
import tensorflow as tf

# Importing TensorFlow 2.6
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

# Reading data from the GCS bucket
dataset = pd.read_csv(r"gs://vertex-ai-custom-ml/CrabAgePrediction.csv")
dataset.tail()

BUCKET = 'gs://crab-age-pred-bucket'

dataset.isna().sum()
dataset = dataset.dropna()

# Data transformation
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

# Dataset splitting
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_stats = train_dataset.describe()

# Removing the Age column, since it is the target column
train_stats.pop("Age")
train_stats = train_stats.transpose()
train_stats

# Removing the Age column from the train and test data
train_labels = train_dataset.pop('Age')
test_labels = test_dataset.pop('Age')

def norma_data(x):
    # To normalise the numerical values
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norma_data(train_dataset)
normed_test_data = norma_data(test_dataset)

def build_model():
    # Model building function
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])

    optimizer = tf.keras.optimizers.RMSprop(0.001)

    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model

# model = build_model()
# model.summary()

model = build_model()

EPOCHS = 10

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2,
                          callbacks=[early_stop])

model.save(BUCKET + '/model')

Summary of train.py

When all of the necessary packages have been imported, TensorFlow 2.6 is used for modelling. pandas reads the csv file stored in the vertex-ai-custom-ml bucket, and the BUCKET variable specifies the bucket where we will store the trained model.

We then apply some transformations, such as creating dummy variables for the categorical column. Next, we split the data into training and test sets and normalize the data.

We wrote a function called build_model that defines a simple two-layer TensorFlow model. The model is trained for ten epochs. Finally, we save the model to crab-age-pred-bucket/model on Cloud Storage and can confirm that it has been trained.

Now, in the JupyterLab Terminal, execute the following commands one by one to build the Docker image.

PROJECT_ID=crab-age-pred

IMAGE_URI="gcr.io/$PROJECT_ID/crab:v1"

docker build ./ -t $IMAGE_URI

Before running the build command, make sure to enable the Artifact Registry API and the Google Container Registry API from APIs & Services in the Google Cloud console.

After running these commands, our Docker image is built successfully. Now we will push the Docker image with the following command.

docker push $IMAGE_URI

Once pushed, we can see our Docker image in the Container Registry. You can find Container Registry by searching for it in the Google Cloud console.

Step 3: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In Dataset, select No managed dataset and click Continue.

In Model details, I have given the model's name as "pred-age-crab"; under the advanced options, select the available service account and keep the defaults for the rest. Make sure that the service account has Cloud Storage permissions; if not, grant them from the IAM & Admin section.

In the Training container step, select Custom container as the container image, then browse to and select the newly created Docker image. Next, browse to and select crab-age-pred-bucket as the Model output directory. Now press Continue.

Ignore any selections for Hyperparameters and click Continue.

In Compute and pricing, select the machine type n1-standard-32 (32 vCPUs, 120 GiB memory) and hit Continue.

For the Prediction container, select Pre-built container with TensorFlow framework 2.6 and start the model training.

You can see the model in training in the Training section.

In about 8 minutes, our custom model training is finished.
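The console steps above can also be scripted with the Vertex AI Python SDK. The following is a hedged sketch (not from the original post) that reuses the image URI and buckets from the earlier steps and the pre-built TensorFlow 2.6 CPU serving container; the names and region are assumptions based on this walkthrough.

from google.cloud import aiplatform

aiplatform.init(project="crab-age-pred", location="us-central1",
                staging_bucket="gs://crab-age-pred-bucket")

# Launch the same custom container training job programmatically.
job = aiplatform.CustomContainerTrainingJob(
    display_name="pred-age-crab",
    container_uri="gcr.io/crab-age-pred/crab:v1",
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest",
)

model = job.run(
    model_display_name="pred-age-crab",
    base_output_dir="gs://crab-age-pred-bucket",
    machine_type="n1-standard-32",
    replica_count=1,
)

Once the job completes, the SavedModel can be checked in the output bucket; TensorFlow's GFile API understands gs:// paths.

import tensorflow as tf

# Expect entries such as saved_model.pb and the variables/ directory.
print(tf.io.gfile.listdir("gs://crab-age-pred-bucket/model"))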

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. The region should be us-central1.

Give crab_age_pred as the name of Endpoint and click Continue.

In the Model settings, select pred_age_crab as the model name, Version 1 as the version, 2 as the number of compute nodes, and n1-standard-8 (8 vCPUs, 30 GiB memory) as the machine type, then select the service account. Click Done and then Create.

Ignore the Model monitoring selection and click Create to deploy this version.

It may take 11 minutes to deploy the model.

With the above step our model is deployed.
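If you prefer to do this step from Python instead of the console, here is a minimal sketch (names reused from this walkthrough; `model` is the aiplatform.Model object returned by the training job, or one fetched by its resource name).

from google.cloud import aiplatform

aiplatform.init(project="crab-age-pred", location="us-central1")

# Create the endpoint and deploy the trained model to it.
endpoint = aiplatform.Endpoint.create(display_name="crab_age_pred")
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="pred_age_crab",
    machine_type="n1-standard-8",
    min_replica_count=2,
    max_replica_count=2,
)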

Step 5: Testing Model

Once the model is deployed, we can make predictions. For this project we are going to use Python to make predictions. We will need to give the Vertex AI Admin and Cloud Storage Admin permissions to the service account, which we can do in the IAM & Admin section of Google Cloud. Once the permissions are granted, we download the service account key in JSON format; it will be used for authentication in our environment.

Following is the code used for the prediction.

pip install google-cloud-aiplatform

from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import os

def predict_tabular_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com"):

    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}

    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

    # For more info on the instance schema, please use get_model_sample.py
    # and look at the yaml found in instance_schema_uri
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]

    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())

    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )

    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )

    predictions = response.predictions
    print(predictions)

# Authentication using the service account.
# We are giving the path to the JSON key.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/crab-age-pred-7c1b7d9be185.json"

# Normalized values
inputs = [0, 0, 1, 1.4375, 1.175, 0.4125, 0.63571550, 0.3220325, 1.5848515, 0.747181]

project_id = "crab-age-pred"            # Project ID from Vertex AI
endpoint_id = "7762332189773004800"     # Endpoint ID from the Endpoints section

predict_tabular_sample(project_id, endpoint_id, inputs)

Output

[[8.01214314]]

This is how we can make predictions. For the inputs, make sure to apply the same transformation and normalization that we applied to the training data.
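As an illustration (hypothetical record and helper names; it assumes train_dataset and train_stats from train.py are available, for example recomputed from the same csv), here is how a raw record can be put through the same get_dummies and norma_data steps before being sent to the endpoint.

import pandas as pd

raw = pd.DataFrame([{
    "Sex": "M", "Length": 1.4375, "Diameter": 1.175, "Height": 0.4125,
    "Weight": 24.63, "Shucked Weight": 12.33,
    "Viscera Weight": 5.58, "Shell Weight": 6.75,
}])

# Same dummy encoding as training; a single row only produces the "M" column,
# so reindex to the full training column set and order.
encoded = pd.get_dummies(raw, prefix='', prefix_sep='')
encoded = encoded.reindex(columns=train_dataset.columns, fill_value=0)

# Same normalization as norma_data() in train.py.
normed = (encoded - train_stats['mean']) / train_stats['std']

inputs = normed.iloc[0].tolist()
predict_tabular_sample(project_id, endpoint_id, inputs)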

With this we have completed the project and learned how to train, deploy, and get predictions from a custom-trained ML model.

I hope you will find it useful.

See you again.

Overview of Big Query's Unique feature, BQML with a regression model example
https://www.indiumsoftware.com/blog/overview-of-big-querys-unique-feature-bqml-with-a-regression-model-example/

In this blog you are going to see what Big Query is, its best feature (BQML), the areas where BQML can be used, and a clear example that shows how easy it is to build a machine learning model with simple SQL code.

The blog will go through the following topics:

  • What is Big Query?
  • Best features of Big Query?
  • Why BQML? Areas of BQML?
  • Regression model to show efficiency of BQML

Let's dive into the article.

What is Big Query?

With built-in technologies like machine learning, business intelligence, and geospatial analysis, Big Query is a managed data warehouse service that helps you manage and analyse your data. With no need for infrastructure administration, Big Query's serverless architecture lets you use SQL queries to tackle the most critical issues facing your company. Thanks to Big Query's robust, distributed analytical engine, you can query terabytes of data in seconds and petabytes of data in minutes.

Best features of Big Query?

Big Query's standout features include built-in ML integration (BQ ML), multi-cloud functionality (BQ Omni), geospatial analysis (BQ GIS), a foundation for BI (BQ BI Engine), free access (BQ Sandbox), and automated data transfer (BQ Data Transfer Service). In this blog we will discuss the most amazing of these features, Big Query ML.

*An amazing feature of Big Query is Big Query ML.

Big Query ML allows you to use standard SQL queries to develop and run machine learning models in Big Query. Machine learning on huge datasets usually requires extensive programming and ML framework skills. These requirements restrict solution development to a small group of people within each organization, and they exclude data analysts who understand the data but lack machine learning and programming skills. This is where Big Query ML comes in handy: it allows data analysts to employ machine learning using their existing SQL tools and skills, and to create and evaluate machine learning models in Big Query over large volumes of data.


Why BQML?

The major advantages I've identified using BQML:

  • Like other ML frameworks, BQML can subsample your dataset, but it can also train your model directly in your database, so there is no need to pull your data into local memory.
  • Working in SQL can make collaboration easier if you're working in a team and most of your teammates don't know Python, R, or your favourite modelling language.
  • Because your model lives in the same place as your data, you can serve it immediately after it has been trained and make predictions directly from it.

Areas we can use BQML

  • Retail Industry (Demand forecasting, Customer segmentation, Propensity to purchase or propensity to click on item, Product recommendations by emails and ads).
  • Logistics Industry (Time estimation of package delivery, Predictive maintenance).
  • Finance Industry (Product recommendations by emails and ads).
  • Gaming Industry (Content recommendation, Predicting churn customers).


Regression model to show efficiency of BQML

  • For this, we will build a linear regression model to predict house prices in the USA, since linear regression is a good fit for predicting the value of one variable from others. I am also using a regression model in this article because it is simpler to explain how the model works and to interpret the results.
  • With the USA housing dataset, we will see how efficient and easy Big Query ML is for building a machine learning linear regression model with SQL code.

Step 1: Creating the Model

CREATE OR REPLACE MODEL `testproject-351804.regression.house_prices2`
OPTIONS(model_type = 'linear_reg',
        input_label_cols = ['price'],
        l2_reg = 1,
        early_stop = false,
        max_iterations = 12,
        optimize_strategy = 'batch_gradient_descent') AS
SELECT avg_house_age, avg_rooms, avg_bedrooms, avg_income, population, price/100000 AS price
FROM `regression.usa_housing_train`

Model creation

  • The above code will create and train the model.
  • With the simple CREATE MODEL statement we can create the ML model. In OPTIONS, only model_type and input_label_cols (the column to predict) are strictly needed to create a model; the reason I used the other OPTIONS will become clear in the evaluation section. The same statement can also be submitted from Python, as shown in the sketch below.
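For completeness, here is a minimal sketch (not from the original post) of submitting the same statement through the BigQuery Python client library; the project and dataset names follow the example above.

from google.cloud import bigquery

client = bigquery.Client(project="testproject-351804")

query = """
CREATE OR REPLACE MODEL `testproject-351804.regression.house_prices2`
OPTIONS(model_type = 'linear_reg',
        input_label_cols = ['price'],
        l2_reg = 1,
        early_stop = false,
        max_iterations = 12,
        optimize_strategy = 'batch_gradient_descent') AS
SELECT avg_house_age, avg_rooms, avg_bedrooms, avg_income, population,
       price/100000 AS price
FROM `regression.usa_housing_train`
"""

# Blocks until BigQuery has finished training the model.
client.query(query).result()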

Step 2: Evaluating the Model

SELECT * FROM ML.EVALUATE(MODEL `regression.house_prices2`,
    TABLE `testproject-351804._8b41b9f5a2e85d72c62e834e3e9dd60a58ba542d.anoncb5de70d_1e3d_4213_8c5d_bb10d6b9385b_imported_data_split_eval_data`)

Model Evaluation

  • We have to see how well our model is performing by using the ML.EVALUATE function. Now we will see why I used the other OPTIONS when creating the model.
  • First I created a model in BigQuery ML with only the options model_type = 'linear_reg' and input_label_cols = ['price'], but when evaluating the model the R-squared was only 0.3, which I felt was not accurate enough, and the huge difference between the training loss and the evaluation loss told me the model was overfitting.
  • So, as a solution, I added more options when creating the model: I used L2 regularization to overcome the overfitting and help the model generalize to the data points, and after adjusting the values a few times the R-squared rose to 0.92, i.e. above 90% accuracy.

*We need to look at R-squared, the coefficient of determination (the fraction of variance in the target explained by the model). Higher is better.

Step 3: Predicting the Model


The model's prediction process is as simple as calling ML.PREDICT.

SELECT * FROM ML.PREDICT(MODEL `regression.house_prices2`,
    TABLE `regression.usa_housing_predict`)

Model Prediction

See how efficient Big Query ML is: it predicted the house prices based on the trained columns avg_house_age, avg_rooms, avg_bedrooms, avg_income, and population.

Summary

Now you know how to create linear regression models in BigQuery ML. We have discussed how to build a model, evaluate it, and apply it to make predictions.

In upcoming blogs you will see other unique features of Big Query, like geospatial analytics and arrays/structs.

Happy Reading

Hope you find this useful.

Kubeflow Pipeline on Vertex AI for Custom ML Models
https://www.indiumsoftware.com/blog/kubeflow-pipeline-on-vertex-ai-for-custom-ml-models/

What is Kubeflow?

"Kubeflow is an open-source project created to help with the deployment of ML pipelines. It uses components, written as Python functions, for each step of the pipeline. Each component runs in an isolated container with all the required libraries. The components are run in series, one by one."

In this article we are going to train a custom machine learning model on Vertex AI using a Kubeflow pipeline.

About Dataset

The Credit Card Customers dataset from Kaggle will be used. The 10,000 customer records in this dataset include columns for age, salary, marital status, credit card limit, credit card category, and other information. To predict the customers who are most likely to leave, we must analyse the data to determine the causes of customer churn.


Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from GitHub. There are two csv files in the downloaded dataset, called churner_p1 and churner_p2. I have created a Big Query dataset credit_card_churn with the tables churner_p1 and churner_p2 from these csv files. I have also created a bucket called credit-card-churn on Cloud Storage. This bucket will be used to store the artifacts of the pipeline.

Step 2: Employing Workbench

Enable the Notebooks API by going to Vertex AI and then to the Workbench section. Then click on New Notebook and select Python 3. Make sure to choose the us-central1 region.

It will take a few minutes to create the Notebook instance. Once the notebook is created click on the Open JupyterLab to launch the JupyterLab.

We will also have to enable the following APIs from the APIs & Services section of the Google Cloud console.

  1. Artifact Registry API
  2. Container Registry API
  3. AI Platform API
  4. ML API
  5. Cloud Functions API
  6. Cloud Build API

Now click on Python 3 to open a Jupyter notebook in the JupyterLab Notebook section and run the code cells below.

USER_FLAG = "--user"

!pip3 install {USER_FLAG} google-cloud-aiplatform==1.7.0

!pip3 install {USER_FLAG} kfp==1.8.9

This will install the Google Cloud AI Platform and Kubeflow Pipelines packages. Make sure to restart the kernel after the packages are installed.
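Restarting the kernel can also be done from a notebook cell; a common snippet for this (optional if you restart from the JupyterLab UI):

import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)  # True restarts the kernel after shutdown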

import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

This creates the variable PROJECT_ID with the name of the project.

BUCKET_NAME = "gs://" + PROJECT_ID

BUCKET_NAME

This creates the variable BUCKET_NAME; it resolves to the same bucket name we created earlier.

import matplotlib.pyplot as plt
import pandas as pd

from kfp.v2 import compiler, dsl
from kfp.v2.dsl import pipeline, component, Artifact, Dataset, Input, Metrics, Model, Output, InputPath, OutputPath

from google.cloud import aiplatform

# We'll use this namespace for metadata querying
from google.cloud import aiplatform_v1

PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

REGION = "us-central1"

PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

This imports the required packages and sets the pipeline root path in the credit-card-churn bucket.

# First component in the pipeline to fetch data from Big Query.
# Table 1 data is fetched.
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_1.yaml"
)
def get_data_1(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The first component of the pipeline will fetch the data from the table churner_p1 in Big Query and pass a csv file as the output for the next component. The structure is the same for every component: we use the @component decorator to install the required packages and specify the base image and output component file, then define the get_data_1 function to get the data from Big Query.

# Second component in the pipeline to fetch data from Big Query.
# Table 2 data is fetched.
# The first and second components don't need inputs from any other component.
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_2.yaml"
)
def get_data_2(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The second component of the pipeline will fetch the data from the table churner_p2 in Big Query and pass a csv file as the output for the next component. The first and second components do not need inputs from any other component.

# Third component in the pipeline to combine data from the two sources and do some data transformation.
# Note: the pip package name is "scikit-learn" (the old "sklearn" alias is deprecated).
@component(
    packages_to_install=["scikit-learn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="model_training.yaml",
)
def data_transformation(
    dataset1: Input[Dataset],
    dataset2: Input[Dataset],
    output_data_path: OutputPath("Dataset"),
):
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split
    from joblib import dump
    from sklearn.metrics import confusion_matrix
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd

    data1 = pd.read_csv(dataset1.path)
    data2 = pd.read_csv(dataset2.path)
    data = pd.merge(data1, data2, on='CLIENTNUM', how='outer')
    data.drop(["CLIENTNUM"], axis=1, inplace=True)
    data = data.dropna()
    cols_categorical = ['Gender', 'Dependent_count', 'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category']
    data['Attrition_Flag'] = [1 if cust == "Existing Customer" else 0 for cust in data['Attrition_Flag']]
    data_encoded = pd.get_dummies(data, columns=cols_categorical)
    data_encoded.to_csv(output_data_path)

The third component is where we combine the data from the first and second components and do the data transformation, such as dropping the "CLIENTNUM" column, dropping the null values, and converting the categorical columns into numerical ones. We pass this transformed data as a csv to the next component.

# Fourth component in the pipeline to train the classification model using decision trees or random forest.
@component(
    packages_to_install=["scikit-learn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="model_training.yaml",
)
def training_classmod(
    data1: Input[Dataset],
    metrics: Output[Metrics],
    model: Output[Model]
):
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split
    from joblib import dump
    from sklearn.metrics import confusion_matrix
    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd

    data_encoded = pd.read_csv(data1.path)
    X = data_encoded.drop(columns=['Attrition_Flag'])
    y = data_encoded['Attrition_Flag']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)

    model_classifier = RandomForestClassifier()
    model_classifier.fit(X_train, y_train)
    y_pred = model_classifier.predict(X_test)
    score = model_classifier.score(X_test, y_test)
    print('accuracy is:', score)

    metrics.log_metric("accuracy", (score * 100.0))
    metrics.log_metric("model", "RandomForest")
    dump(model_classifier, model.path + ".joblib")

In the fourth component we train the model with a Random Forest classifier, using "accuracy" as the evaluation metric.

@component(
    packages_to_install=["google-cloud-aiplatform"],
    base_image="python:3.9",
    output_component_file="model_deployment.yaml",
)
def model_deployment(
    model: Input[Model],
    project: str,
    region: str,
    vertex_endpoint: Output[Artifact],
    vertex_model: Output[Model]
):
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    deployed_model = aiplatform.Model.upload(
        display_name="custom-model-pipeline",
        artifact_uri=model.uri.replace("model", ""),
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest"
    )
    endpoint = deployed_model.deploy(machine_type="n1-standard-4")

    # Save data to the output params
    vertex_endpoint.uri = endpoint.resource_name
    vertex_model.uri = deployed_model.resource_name

The fifth component is the last one; in it we create the endpoint on Vertex AI and deploy the model. We have used a pre-built scikit-learn serving container image and have deployed the model on an "n1-standard-4" machine.

@pipeline(
    # Default pipeline root. You can override it when submitting the pipeline.
    pipeline_root=PIPELINE_ROOT,
    # A name for the pipeline.
    name="custom-pipeline",
)
def pipeline(
    bq_table_1: str = "",
    bq_table_2: str = "",
    output_data_path: str = "data.csv",
    project: str = PROJECT_ID,
    region: str = REGION
):
    dataset_task_1 = get_data_1(bq_table_1)
    dataset_task_2 = get_data_2(bq_table_2)
    data_transform = data_transformation(dataset_task_1.output, dataset_task_2.output)
    model_task = training_classmod(data_transform.output)
    deploy_task = model_deployment(model=model_task.outputs["model"], project=project, region=region)

Finally, we have the pipeline function, which calls all the components sequentially: dataset_task_1 and dataset_task_2 get the data from Big Query, data_transform transforms the data, model_task trains the Random Forest model, and deploy_task deploys the model on Vertex AI.

compiler.Compiler().compile(pipeline_func=pipeline, package_path="custom-pipeline-classifier.json")

Compiling the pipeline.

run1 = aiplatform.PipelineJob(
    display_name="custom-training-vertex-ai-pipeline",
    template_path="custom-pipeline-classifier.json",
    job_id="custom-pipeline-rf8",
    parameter_values={"bq_table_1": "credit-card-churn.credit_card_churn.churner_p1",
                      "bq_table_2": "credit-card-churn.credit_card_churn.churner_p2"},
    enable_caching=False,
)

Creating the pipeline job.

run1.submit()

Running the pipeline job.

With this we have completed creating the Kubeflow pipeline, and we can see it in the Pipelines section of Vertex AI.

 

Our Pipeline has run successfully and we have managed to get 100% accuracy for the classification.

We can use this model to get online predictions using the REST API or Python. We can also create different pipelines and compare their metrics on Vertex AI.
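For example, an online prediction call with the Vertex AI SDK could look like the sketch below (the endpoint ID and feature vector are placeholders; the instance must use the same one-hot encoded column order produced by the data_transformation component).

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location="us-central1")

# Placeholder endpoint ID from the Endpoints page (or pass the full resource name).
endpoint = aiplatform.Endpoint("1234567890123456789")

# Placeholder instance: one row of features in the training column order.
num_features = 33  # replace with the width of the encoded feature frame
instance = [0.0] * num_features

prediction = endpoint.predict(instances=[instance])
print(prediction.predictions)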

With this we have completed the project and learned how to create a pipeline on Vertex AI for custom-trained models.

I hope you will find it useful.


See you again.

Data Modernization with Google Cloud
https://www.indiumsoftware.com/blog/data-modernization-with-google-cloud/

L.L. Bean was established in 1912. It is a Freeport, Maine-based retailer known for its mail-order catalog of boots. The retailer runs 51 stores, kiosks, and outlets in the United States. It generates US $1.6 billion in annual revenue, of which US $1 billion comes from its e-commerce engine. This means delivering a great omnichannel customer experience is a must and an essential part of its business strategy. But the retailer faced a significant challenge in sustaining its seamless omnichannel experience: it was relying on on-premises mainframes and distributed servers, which made upgrading clusters and nodes very cumbersome. It wanted to modernize its capabilities by migrating to the cloud. Through cloud adoption, it wanted to improve its online performance, accelerate time to market, upgrade effortlessly, and enhance customer experience.

L.L. Bean turned to Google Cloud to fulfill its cloud requirements. By modernizing its data on Google Cloud, it experienced faster page loads and was able to access transaction histories more easily. It could also focus on value addition instead of infrastructure management. And it reduced release cycles and rapidly delivered cross-channel services. These collectively improved its overall delivery of an agile, cutting-edge customer experience.

Data Modernization with Google Cloud for Success

Many businesses that rely on siloed data find it challenging to make fully informed business decisions, and in turn accelerate growth. They need a unified view of data to be able to draw actionable, meaningful insights that can help them make fact-based decisions that improve operational efficiency, deliver improved services, and identify growth opportunities. In fact, businesses don’t just need unified data. They need quality data that can be stored, managed, scaled and accessed easily.

Google Cloud Platform empowers businesses with flexible and scalable data storage solutions. Some of its tools and features that enable this include:

BigQuery

This is a cost-effective, serverless, and highly scalable multi-cloud data warehouse that provides businesses with agility.

Vertex AI

This enables businesses to build, deploy, and scale ML models on a unified AI platform using pre-trained and custom tooling.

Why should businesses modernize with Google Cloud?

It provides faster time to value with serverless analytics, it lowers TCO (Total Cost of Ownership) by up to 52%, and it ensures data is secure and compliant.


Google Cloud Features

Improved Data Management

BigQuery, the serverless data warehouse from Google Cloud Platform (GCP), makes managing, provisioning, and dimensioning infrastructure easier. This frees up resources to focus on the quality of decision-making, operations, products, and services.

Improved Scalability

Storage and computing are decoupled in BigQuery, which improves availability and scalability, and makes it cost-efficient.

Analytics and BI

GCP also improves website analytics by integrating with other GCP and Google products. This helps businesses get a better understanding of the customer’s behavior and journey. The BI Engine packaged with BigQuery provides users with several data visualization tools, speeds up responses to queries, simplifies architecture, and enables smart tuning.

Data Lakes and Data Marts

GCP enables ingestion of batch and streaming/real-time data, change data capture, and landing zones for raw data to meet the other data needs of businesses.

Data Pipelines

GCP tools such as Dataflow, Dataform, BigQuery Engine, Dataproc, DataFusion, and Dataprep help create and manage even complex data pipelines.


Data Orchestration

For data orchestration too, GCP’s managed or serverless tools minimize infrastructure, configuration, and operational overheads. Workflows is a popular tool for simple workloads while Cloud Composer can be used for more complex workloads.

Data Governance

Google enables data governance, security, and compliance with tools such as Data Catalog, which facilitates data discoverability, metadata management, and data class-level controls. This helps separate sensitive and other data within containers. Data Loss Prevention and Identity Access Management are some of the other trusted tools.

Data Visualization

Google Cloud Platform provides two fully managed tools for data visualization, Data Studio and Looker. Data Studio is free and transforms data into easy-to-read and share, informative, and customizable dashboards and reports. Looker is flexible and scalable and can handle large data and query volumes.

ML/AI

Google Cloud Platform leverages Google’s expertise in ML/AI and provides Managed APIs, BigQuery ML, and Vertex AI. Managed APIs enable solving common ML problems without having to train a new model or even having technical skills. Using BigQuery, models can be built and deployed based on SQL language. Vertex AI, as already seen, enables the management of the ML product lifecycle.
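As a small illustration of a managed API (not from the original post; it assumes the google-cloud-language client library and default application credentials), a sentiment score can be obtained with a few lines of Python and no model training at all:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The delivery was fast and the boots are great!",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
# Returns a sentiment score in [-1, 1] and a magnitude for the whole text.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)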

Indium to Modernize Your Data Platform With GCP

Indium Software is a recognized data and cloud solution provider with cross-domain expertise and experience. Our range of services includes data and app modernization, data analytics, and digital transformation across various cloud platforms such as Amazon Web Services, Azure, and Google Cloud. We work closely with our customers to understand their modernization needs and align them with business goals to improve outcomes for faster growth, better insights, and enhanced operational efficiency.


FAQs

What Cloud storage tools and libraries are available in Google Cloud?

Along with the JSON API and the XML API, Google Cloud provides the gcloud storage commands, a command-line interface to Cloud Storage in the Google Cloud CLI, for operations on buckets and objects. Client libraries also provide programmatic support for languages such as Java, Python, and Ruby.
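A small example of the Python client library mentioned above (the bucket name is a placeholder), listing the objects in a Cloud Storage bucket:

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("my-example-bucket"):
    print(blob.name, blob.size)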
