Data and Analytics Archives - Indium
https://www.indiumsoftware.com/blog/tag/data-and-analytics/

Accelerating Data-Driven Decisions: Empowering Enterprises with Real-Time Insights using Striim
https://www.indiumsoftware.com/blog/how-to-accelerate-decision-making-with-striim/
Wed, 28 Jun 2023

McKinsey’s report, ‘The Data-Driven Enterprise of 2025’, points out that although organizations apply data-driven approaches such as predictive analytics and AI-driven automation, these efforts are still sporadic, ineffective, and time-consuming. By 2025, it projects, all employees will leverage data more uniformly, using innovative data techniques that help solve problems faster.

This will help to effect continuous improvement in performance and create differentiated experiences for customers and employees. It will also enable accelerated development of innovative new solutions.

McKinsey also identifies the current challenges to optimizing data sources as:

  • Limited capabilities of legacy technologies
  • Challenges in modernizing the architecture
  • Demand for high computational resources for real-time processing jobs

This results in only a small part of the data from connected devices being leveraged. As companies balance speed and computational intensity, they are unable to perform complex analyses or implement real-time use cases.

Getting the right data technologies to ingest, process, analyze, and visualize in real-time is going to be a game-changer in improving decision-making, enhancing customer experience, and accelerating growth.

Improved Decision Making

Real-time data is critical for real-time analytics, which in turn enables faster decision-making. Data is collected with minimal delay from a variety of sources, including sensors, databases, operational systems, cameras, and social media feeds, and is processed and analyzed quickly. These inputs can range from alerts and notifications to signals from user behavior.

Real-time data can be of two types:

  • Event Data: The generation of a collection of data points based on well-defined conditions within a system.
  • Stream Data: The continuous generation of a large volume of data without any identifiable beginning or end.

Easy access to data in real time enables teams to draw insights quickly, make informed decisions, and respond as events unfold. It helps capture trends, both past and present, which can be analyzed in real time to decide on the next course of action.

Some of the benefits of real-time data include

Being Proactive

In the absence of real-time data, there is a lag between insights and responses. This reactive approach can prove costly, resulting in losing customers or production-related issues escalating. Real-time data analytics allows enterprises to proactively approach developments and respond appropriately.

Enhance Customer Experience

Visibility and transparency have become key in several client-business relationships. It helps improve decision-making based on project status and enhances customer experience and retention. Responding to customer requirements and empowering them with information in real-time further strengthens the relationship between the two.

To know more about how Indium can help you, please check out our services.

Unify Data

Different teams end up creating data silos to suit their requirements. This can distort the view when making strategic decisions at the enterprise level and delay the process. A cloud-based data streaming solution helps to provide a unified view in real-time while allowing different teams access to secure and permission-based data they need to make decisions for their department.

Improve Operational Excellence

Real-time data allows you to manage your organization’s assets proactively. It lets you plan downtime for maintenance and repair, extend the life of your assets, and take timely steps to replace them where needed, with minimum disruption to operations. This naturally leads to better quality of products and services and improved profit margins, as it lowers overheads.

Striim Power For Real-time Data Analytics

The Striim unified real-time data integration and streaming platform unifies data across multiple sources and targets. It offers built-in adapters and supports more than 125 sources and targets, enabling the management of multiple data pipelines in a Striim cluster. Striim 4.1 offers features such as OJet to let customer applications read multiple terabytes of data per day and a high-performance Oracle Change Data Capture (CDC) reader. It also sends real-time alerts and notifications to identify emerging workload patterns and facilitates collaboration between developers and database administrators.

Striim users can build smart real-time data pipelines quickly for streaming large volumes of events daily. It is scalable and secure, and the features are highly available. It is easy to maintain and allows the rapid adoption of new cloud models, infrastructure modernization, and digitalizing legacy systems.

Striim enables data integration using a streaming-first approach, supporting incremental, real-time views in the cloud database and the streaming layer. It includes Streaming SQL to facilitate real-time analytics on data, as well as to train machine learning models in real time.

Business analysts, data scientists, and data engineers can use Streaming SQL to build data pipelines quickly and without the need for custom coding. Striim also moves data in real time, allowing stream processing applications to operate continuously for years. This further speeds up decision-making, as insights can be drawn quickly, without latency between receiving the data and running analytics on it.

Check out our case study on real-time data analytics

Case in Point: Simplifying Healthcare Predictions in 3 Expert Steps

Understanding Symptom Patterns: Our first step involves data acquisition and thorough analysis of historical patient data. We tap into the treasure trove of symptoms, medical records, and outcomes to discern intricate patterns that might remain hidden from traditional analysis.

Feature Engineering with Domain Knowledge: With a team of domain experts, we transform raw symptom data into meaningful features. These features are carefully curated to capture the nuances of various symptoms, their interplay, and potential implications. Our domain knowledge empowers us to create a robust feature set that forms the foundation of accurate predictions.

Advanced Machine Learning Models: Equipped with a rich feature set, we employ advanced machine learning models. From ensemble methods to deep learning architectures, we evaluate and fine-tune models that can effectively map symptoms to probable outcomes. This step requires rigorous experimentation to ensure optimal model performance. 
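Purely as an illustration of these three steps, here is a minimal pandas/TensorFlow sketch; the file name, symptom columns, and model shape are hypothetical assumptions made for readability, not the actual pipeline described above.

# Illustrative only: file name, columns, and model are hypothetical.
import pandas as pd
from tensorflow import keras

# Step 1: acquire and inspect historical patient data
records = pd.read_csv("patient_history.csv")             # symptoms, vitals, outcome (hypothetical)

# Step 2: feature engineering guided by domain knowledge
records["symptom_count"] = records[["fever", "cough", "fatigue"]].sum(axis=1)
features = pd.get_dummies(records.drop(columns=["outcome"]))
labels = records["outcome"]                               # 1 = adverse outcome, 0 = recovery

# Step 3: a simple model mapping engineered features to outcomes
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=[features.shape[1]]),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=10, validation_split=0.2)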

The utilization of Symptom Pattern Analysis, Feature Engineering, and Advanced Machine Learning Models in the healthcare domain, along with Indium’s implementation of Striim for real-time data migration and processing, brings substantial and quantifiable business value to the table.

Healthcare Providers: Reduced diagnosis time through rapid predictions, from days to hours, accelerates patient care. Enhanced efficiency through streamlined operations leads to quicker decisions and resource allocation. Improved patient care is achieved through early intervention based on predictions, resulting in better treatment outcomes. Predictive insights inform resource allocation, optimizing staff schedules, room usage, and equipment availability. Optimized, personalized treatment plans yield better outcomes and higher patient satisfaction. Cost savings achieved through fewer hospital stays, fewer redundant tests, and efficient resource use lower overall costs, benefiting patients as well as providers by optimizing their resources.

Healthcare Payers and Insurance Companies: The implementation offers a competitive edge for healthcare providers, attracting patients and enhancing the providers’ reputation due to quick and accurate diagnoses. This, in turn, leads to efficient resource utilization, potentially reducing the overall cost of treatments. Cost savings arising from reduced hospital stays and redundant tests contribute to lower healthcare expenditures, benefiting healthcare payers and insurance companies. Healthcare payers such as insurance companies can also reduce fraudulent claims as they will have access to patient diagnosis history in real-time.

Medical Researchers and Innovators: The curated data fosters research opportunities, facilitating medical insights and potential innovation generation. The advanced analytical capabilities of Symptom Pattern Analysis and Machine Learning Models open avenues for new discoveries and improvements in medical practices, benefiting the broader healthcare research community.

Overall, the integration of advanced technologies, real-time data processing, and predictive analytics in the healthcare domain offers benefits that extend to healthcare providers, payers, patients, and the research community. This synergy drives efficiency, quality of care, and cost-effectiveness, ultimately transforming healthcare delivery and outcomes.

Indium for Instant Decisions with Striim

Indium Software, a cutting-edge solution provider, has deep expertise in Striim implementation and can help businesses create exciting digital experiences for their customers.

A private sector bank offering specialized services to 9 million customers across various business verticals, and with a global presence, required data to be updated in real time from its core banking systems to a reliable destination database for downstream analytics. By migrating data from the bank’s legacy systems in real time using Striim, Indium helped the customer improve its responsiveness and operational efficiency, among other benefits.

Indium’s team of Striim experts have cross-domain experience and can provide custom-built solutions to meet the unique needs of our customers.

To know more about Indium’s Striim capabilities and solutions

Visit Here

FAQs

Is Striim an ETL tool?

The Striim platform offers customers the flexibility to use real-time ETL and ELT on data from multiple sources, including on-prem and cloud databases.

How does Striim use the database?

Striim ingests data from major enterprise databases using log-based change data capture (CDC). This lowers the performance load on the database while making data available even before it has been processed.

Domo for Dummies: A Guide to Creating Powerful Data Visualizations with Domo
https://www.indiumsoftware.com/blog/domo-for-dummies-a-guide-to-creating-powerful-data-visualizations-with-domo/
Wed, 15 Feb 2023

Domo is a cloud-based business intelligence platform that offers a comprehensive solution for data management, analysis, and visualisation. It enables organisations to collect data from various sources, transform and cleanse the data, and present it in the form of interactive dashboards, reports, and charts. Domo enables businesses to make data-driven decisions and communicate insights effectively.

With the use of interactive dashboards, personalised graphics, and intuitive data visualisations, Domo lets you create your own stories. Furthermore, it provides data governance and role-based access controls to guarantee that users can only access the information that they are authorised to see. To further safeguard your data visualisations, you can quickly and easily set up single sign-on and multi-factor authentication.

This guide is intended to assist beginners in learning the fundamentals of Domo and creating powerful data visualizations that drive business results.

Getting Started with Domo

To get started with Domo, you must first create an account. To do so, visit the Domo Website and register for a free trial. After you’ve created your account, log in to access the Domo dashboard. Domo’s dashboard is the primary interface for interacting with data in the software. It displays a summary of all your data connections, reports, and dashboards. To make a new report or dashboard, go to the dashboard and click the “Create” button.

Given the volume of the client’s data, the existing procedure required the SAS Data Admin to intervene manually and forced key decision makers to wait three days before they could view useful processed data.

Read our success story to find out how our team of professionals overcame challenges and helped the client with data visualisation of tailored analytics programmed in D3.js (Data-Driven Documents).

Click Here

Workflow for Domo Data Visualization

The workflow for data visualisation in Domo typically includes the following steps:

1. Data Connection

Domo offers numerous data connection options, including databases, spreadsheets, cloud-based applications, and APIs. To connect to a data source, go to the dashboard’s “Data” tab and then click on “Connections.”

Once you’ve selected your data source, follow the on-screen instructions to connect to it. You can also use Domo’s data connectors to bring in data from popular sources such as Salesforce, Google Analytics, and more.
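Besides the point-and-click connectors, data can also be moved in and out of Domo programmatically. The sketch below assumes the pydomo Python client (pip install pydomo) and its ds_create/ds_get convenience helpers; the credentials and dataset name are placeholders, so treat this as a rough outline rather than a definitive recipe.

# Rough outline assuming the pydomo client; credentials are placeholders.
import pandas as pd
from pydomo import Domo

domo = Domo("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", api_host="api.domo.com")

# Push a local DataFrame into Domo as a new dataset
sales = pd.DataFrame({"region": ["North", "South"], "sales": [1200, 950]})
dataset_id = domo.ds_create(sales, "Sales by Region")

# Pull the dataset back into pandas for inspection
pulled = domo.ds_get(dataset_id)
print(pulled.head())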

2. Data Transformation

After connecting to your data source, you can begin transforming the data. To do so, go to the “Data” tab and then click “Transform.”

Domo offers a simple drag-and-drop interface for data transformation. Data transformation options include filtering, merging, and pivoting. You can also use the “Calculated Fields” feature to create custom calculations based on your data.

3. Data Visualization

Once your data has been transformed, you can start visualizing it. To do so, go to the dashboard’s “Visualize” tab and then click “Create.”

Domo provides a wide range of visualisation options, such as bar charts, line charts, pie charts, and more. You can also create custom visualisations based on your data by using the “Custom Visualizations” feature. Simply drag and drop the data elements you want to visualise into the visualisation builder to create a visualisation. To further customise your visualisation, you can add filters, calculated fields, and other data elements.

4. Creating Dashboards

Dashboards are an effective way to communicate insights and share data with others. To create a dashboard, go to the dashboard’s “Visualize” tab and click on “Dashboards.”

After you’ve created your dashboard, you can add visualizations, text, and images to create a comprehensive and interactive presentation of your data. You can also use the “Widgets” feature to add interactive elements to your dashboard such as charts, graphs, and maps.

5. Sharing and Collaborating

One of Domo’s most important features is its ability to share and collaborate on data. To share a report or dashboard, simply click the “Share” button.

You can share your report or dashboard with others by emailing them a link or embedding it in a web page. You can also limit who has access to the report or dashboard and what actions they can perform on it, such as viewing, editing, or commenting on it. This facilitates your teams to deliver data-driven collaboration and decision-making.

Business intelligence and data analytics depend heavily on data visualisation because this is how customers will see the outputs and outcomes they need. Check out this blog to learn more.

Domo vs. Tableau vs. Power BI

When comparing top BI tools, many factors must be considered. Keeping track of all business affairs is becoming increasingly difficult for any company, as large volumes of data pose a challenge to companies of all sizes. However, each of these three tools is capable of solving them in its own way.

| Feature | Domo | Tableau | Power BI |
|---|---|---|---|
| Integration | Wide range of data sources, including databases, spreadsheets, cloud-based applications, and APIs | Wide range of data sources, including databases, spreadsheets, cloud-based applications, and APIs | Connects to a variety of data sources, including databases, spreadsheets, cloud-based applications, and APIs |
| Data Transformation | Simple and intuitive interface for transforming data | Advanced data transformation and cleaning capabilities | Limited data transformation options |
| Visualization | Wide range of visualization options, including bar charts, line charts, pie charts, and more | Advanced visualization options, including maps, graphs, and infographics | Limited visualization options, including bar charts, line charts, and pie charts |
| Customization | Flexible customization options, including custom calculations and visualizations | Advanced customization options, including custom calculations and visualizations | Limited customization options |
| Collaboration | Sharing and collaboration features make it easy for teams to work together on data | Collaboration features, including version control and team sharing | Collaboration features, including team sharing and commenting |
| Mobile Access | Mobile access to data and visualizations | Mobile access to data and visualizations | Mobile access to data and visualizations |
| Security | Robust security measures to protect data and ensure privacy | Robust security measures to protect data and ensure privacy | Robust security measures to protect data and ensure privacy |
| Scalability | Cloud-based platform, making it easy to scale data management and analysis capabilities | Cloud-based platform, making it easy to scale data management and analysis capabilities | Cloud-based platform, making it easy to scale data management and analysis capabilities |

Domo: How Can It Benefit Your Business?

1. Better Decision Making: By providing a centralized platform for data analysis, visualization, and collaboration, Domo enables businesses to make data-driven decisions, leading to improved outcomes and increased efficiency.

2. Improved Insights: With its powerful visualization options and ability to integrate with a wide range of data sources, Domo helps businesses gain new insights into their data, enabling them to identify trends, patterns, and opportunities.

3. Increased Productivity: By automating manual data collection and cleaning processes, Domo saves businesses time and resources, freeing up employees to focus on more strategic tasks.

4. Better Collaboration: With its sharing and collaboration features, Domo makes it easy for teams to work together on data, improving communication and collaboration among team members.

5. Improved Data Management: With its centralized platform for data management and analysis, Domo makes it easier for businesses to keep track of their data, reducing the risk of data loss or corruption.

6. Increased Flexibility: Domo’s cloud-based platform and ability to integrate with a wide range of data sources makes it easy for businesses to adapt to changing needs, increasing their ability to respond quickly to new opportunities.

7. Better Data Security: With its robust security measures, Domo helps businesses protect their data and ensure the privacy of their information.

Overall, Domo helps businesses turn data into insights and drive results, enabling them to make informed decisions, improve outcomes, and stay ahead of the competition.

If you are still unsure about the answer you need, let our knowledgeable experts assist you.

Click Here

Conclusion

When comparing Domo vs. Tableau vs. Microsoft Power BI, consider who will be using these tools.

Power BI is designed for the general business stakeholder rather than the data analyst. As a result, the interface is more reliant on drag and drop and automatic features.

Tableau is equally powerful, but the interface isn’t entirely intuitive, making it more difficult to use and learn. Domo, on the other hand, is ideal for businesses looking for an all-in-one cloud-based data and analytics solution. It is used by many customers to supplement existing solutions. Given its flexibility and breadth, Domo is a good choice for any organization looking to get more value from its data.

Domo is a powerful data management, analysis, and visualisation tool. Domo’s user-friendly interface, extensive data connections, and powerful visualisation tools make it simple for businesses to turn data into insights and drive results. Domo is a valuable tool for any organisation looking to harness the power of data, whether you are a beginner or an experienced data analyst.

We’ll get to the next interesting topic soon.

I hope this was useful.

Happy Reading…!!!!!

Training Custom Machine Learning Model on Vertex AI with TensorFlow
https://www.indiumsoftware.com/blog/training-custom-machine-learning-model-on-vertex-ai-with-tensorflow/
Fri, 03 Feb 2023

“Vertex AI is Google’s platform which provides many Machine learning services such as training models using AutoML or Custom Training.”

AutoML vs Custom Training

To quickly compare the functionality of AutoML and custom training, and the expertise each requires, check out the following table provided by Google.

Choose a training method | Vertex AI | Google Cloud

In this article, we are going to train a custom machine learning model on Vertex AI with TensorFlow.

To know about Vertex AI's AutoML feature, read my previous blog: Machine Learning using Google's Vertex AI.

About Dataset

We will be using the Crab Age Prediction dataset from Kaggle. The dataset is used to estimate the age of a crab based on its physical attributes.

To learn more about how our AI and machine learning capabilities can assist you.

Click here

There are 9 columns in the Dataset as follows.

  1. Sex: Crab gender (Male, Female and Indeterminate)
  2. Length: Crab length (in Feet; 1 foot = 30.48 cms)
  3. Diameter: Crab Diameter (in Feet; 1 foot = 30.48 cms)
  4. Height: Crab Height (in Feet; 1 foot = 30.48 cms)
  5. Weight: Crab Weight (in ounces; 1 Pound = 16 ounces)
  6. Shucked Weight: Without Shell Weight (in ounces; 1 Pound = 16 ounces)
  7. Viscera Weight: Viscera Weight
  8. Shell Weight: Shell Weight (in ounces; 1 Pound = 16 ounces)
  9. Age: Crab Age (in months)

We must predict the Age column with the help of the rest of the columns.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from Kaggle. There is only one CSV file in the downloaded dataset, called CrabAgePrediction.csv; I have uploaded this CSV to a bucket called vertex-ai-custom-ml on Google Cloud Storage.

Step 2: Working on Workbench

Go to Vertex AI, then to the Workbench section, and enable the Notebooks API. Then click on New Notebook and select TensorFlow Enterprise; we are using TensorFlow Enterprise 2.6 without a GPU for this project. Make sure to select the us-central1 (Iowa) region.

It will take a few minutes to create the notebook instance. Once the notebook is created, click on Open JupyterLab to launch JupyterLab.

In JupyterLab, open the Terminal and run the following commands one by one.

mkdir crab_folder     # This will create crab_folder                       

cd crab_folder        # To enter the folder

mkdir trainer         # This will create trainer folder

touch Dockerfile      # This will create a Dockerfile

We can see all the files and folders on the left side of JupyterLab. From there, open the Dockerfile and start editing it with the following lines of code.

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6

WORKDIR /

COPY trainer /trainer

ENTRYPOINT ["python", "-m", "trainer.train"]

Now save the Dockerfile; with this, we have set the entrypoint for the Docker image.

To save the model’s output, we’ll make a bucket called crab-age-pred-bucket.

For the model training file, I have already uploaded the Python file to a GitHub repository. To clone this repository, click on Git at the top of JupyterLab, select Clone a Repository, paste the repository link, and hit Clone.

In the Lab, we can see the crab-age-pred folder; copy the train.py file from this folder to crab_folder/trainer/.

Let’s look at the train.py file before we create the Docker IMAGE.

# Importing the required packages
import numpy as np
import pandas as pd
import pathlib
import tensorflow as tf          # TensorFlow 2.6
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

# Reading data from the GCS bucket
dataset = pd.read_csv(r"gs://vertex-ai-custom-ml/CrabAgePrediction.csv")
dataset.tail()

# Bucket where the trained model will be stored
BUCKET = 'gs://crab-age-pred-bucket'

# Dropping rows with missing values
dataset.isna().sum()
dataset = dataset.dropna()

# Data transformation: one-hot encode the categorical column (Sex)
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

# Dataset splitting
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_stats = train_dataset.describe()

# Removing the Age column, since it is the target column
train_stats.pop("Age")
train_stats = train_stats.transpose()
train_stats

# Removing the Age column from train and test data
train_labels = train_dataset.pop('Age')
test_labels = test_dataset.pop('Age')

def norma_data(x):
    # To normalise the numerical values
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norma_data(train_dataset)
normed_test_data = norma_data(test_dataset)

def build_model():
    # Model building function
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model

model = build_model()

EPOCHS = 10

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2,
                          callbacks=[early_stop])

# Saving the trained model to the bucket
model.save(BUCKET + '/model')

Summary of train.py

Once all of the necessary packages are imported, TensorFlow 2.6 is used for modelling. pandas reads the CSV file stored in the vertex-ai-custom-ml bucket, and the BUCKET variable specifies the bucket where we will store the trained model.

We apply some transformations, such as creating dummy variables for the categorical column. Next, we split the data into training and test sets and normalize it.

We wrote a function called build_model that defines a simple two-layer TensorFlow model, which is trained for ten epochs. Once it has been trained, we save the model to crab-age-pred-bucket/model in Cloud Storage.

Now, in the JupyterLab Terminal, execute the following commands one by one to create a Docker image.

PROJECT_ID=crab-age-pred

IMAGE_URI="gcr.io/$PROJECT_ID/crab:v1"

docker build ./ -t $IMAGE_URI

Before running the build command, make sure to enable the Artifact Registry API and the Google Container Registry API from APIs & Services in the Google Cloud console.

After running these commands, our Docker image is built successfully. Now we will push the Docker image with the following command.

docker push $IMAGE_URI

Once pushed, we can see our Docker image in the Container Registry. To find the Container Registry, you can search for it in the Google Cloud console.

Best Read: Our success story about how we assisted an oil and gas company, as well as Nested Tables and Machine Drawing Text Extraction

Step 3: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In Datasets, select No managed dataset and click Continue.

In Model details, I have given the model the name “pred-age-crab”; under the advanced options, select the available service account and keep the defaults for the rest. Make sure the service account has Cloud Storage permissions; if not, grant them from the IAM & Admin section.

Select the custom container for the Container image in the Training container. Navigate to and select the newly created Docker image. Next, navigate to and select the crab-age-pred-bucket in the Model output directory. Now press the continue button.

Ignore any selections for Hyperparameters and click Continue.

In Compute and pricing, Select the machine type n1-standard-32, 32 vCPUs, 120 GiB memory and hit continue.

For Prediction Container select Pre-Built container with TensorFlow Framework 2.6 and start the model training.

You can see the model in training in the Training section.

In about 8 minutes, our custom model training is finished.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. The region should be us-central1.

Give crab_age_pred as the name of Endpoint and click Continue.

In the Model Settings, select pred_age_crab as the Model Name, Version 1 as the Version, 2 as the number of compute nodes, and n1-standard-8 (8 vCPUs, 30 GiB memory) as the Machine Type, and select the service account. Click Done and Create.

Ignore the Model monitoring selection and click Create to deploy this version.

It may take 11 minutes to deploy the model.

With the above step our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can make predictions. For this project, we are going to use Python to make predictions. We will need to give the Vertex AI Admin and Cloud Storage Admin permissions to the service account, which we can do in the IAM and Admin section of Google Cloud. Once the permissions are given, we download the service account key in JSON format; it will be used to authenticate our environment.

Following is the code used for the prediction.

pip install google-cloud-aiplatform

from typing import Dict
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import os

def predict_tabular_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com"):

    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}

    # Initialize the client that will be used to create and send requests.
    # This client only needs to be created once and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

    # For more info on the instance schema, please use get_model_sample.py
    # and look at the yaml found in instance_schema_uri
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]

    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())

    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )

    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )

    predictions = response.predictions
    print(predictions)

# Authentication using the service account:
# point GOOGLE_APPLICATION_CREDENTIALS to the downloaded JSON key.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/crab-age-pred-7c1b7d9be185.json"

# Normalized input values (same encoding and normalization as the training data)
inputs = [0, 0, 1, 1.4375, 1.175, 0.4125, 0.63571550, 0.3220325, 1.5848515, 0.747181]

project_id = "crab-age-pred"                 # Project ID from Vertex AI
endpoint_id = "7762332189773004800"          # Endpoint ID from the Endpoints section

predict_tabular_sample(project_id, endpoint_id, inputs)

Output

[[8.01214314]]

This is how we can make predictions. For the inputs, make sure to apply the same transformation and normalization that we applied to the training data.
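As a rough illustration of that point, a helper like the one below could reuse the train_stats computed in train.py to encode and normalize a raw crab record before sending it to the endpoint; the helper name and the raw values are assumptions, not part of the original code.

# Illustrative helper: apply the same dummy encoding and normalization
# used on the training data to a single raw record.
import pandas as pd

def prepare_instance(raw: dict, train_stats: pd.DataFrame) -> list:
    row = pd.DataFrame([raw])
    row = pd.get_dummies(row, prefix='', prefix_sep='')            # same encoding as training
    row = row.reindex(columns=train_stats.index, fill_value=0)     # align columns with training data
    normed = (row - train_stats['mean']) / train_stats['std']      # same as norma_data()
    return normed.iloc[0].tolist()

raw_crab = {'Sex': 'F', 'Length': 1.4375, 'Diameter': 1.175, 'Height': 0.4125,
            'Weight': 24.63, 'Shucked Weight': 12.32, 'Viscera Weight': 5.58,
            'Shell Weight': 6.74}                                  # illustrative values

# inputs = prepare_instance(raw_crab, train_stats)
# predict_tabular_sample(project_id, endpoint_id, inputs)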

With this, we have completed the project and learned how to train and deploy a custom ML model on Vertex AI and get predictions from it.

I hope you will find it useful.

See you again.

Machine Learning using Google’s Vertex AI
https://www.indiumsoftware.com/blog/machine-learning-using-googles-vertex-ai/
Thu, 02 Feb 2023

Image by Google

What is Vertex AI?

“Vertex AI is Google’s platform which provides many Machine learning services such as training models using AutoML or Custom Training.”

Image by Google

Features of Vertex AI

We use Vertex AI to perform the following tasks in the ML workflow:

  • Creating a dataset and uploading data
  • Training an ML model
  • Evaluating model accuracy
  • Hyperparameter tuning (custom training only)
  • Storing the model in Vertex AI
  • Deploying the trained model to an endpoint for predictions
  • Sending prediction requests to the endpoint
  • Managing models and endpoints

To understand the Vertex AI workflow, we will train a “Dogs vs. Cats” classification model using Vertex AI’s AutoML feature.
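As an optional aside, the same end-to-end workflow can also be scripted with the Vertex AI Python SDK (google-cloud-aiplatform). The sketch below is a rough outline under that assumption, with the project and display names as placeholders; it is not required for the console-based steps that follow.

# Rough outline of the same AutoML workflow via the Vertex AI SDK.
# Project, bucket, and display names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.ImageDataset.create(
    display_name="cat_dog_classification",
    gcs_source="gs://dogs_cats_bucket1/train.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="cat_dog_automl",
    prediction_type="classification",
)

model = job.run(
    dataset=dataset,
    model_display_name="cat_dog_classification",
    budget_milli_node_hours=8000,   # 8 node hours
)

endpoint = model.deploy()           # deploys to a new endpoint for online predictions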

Step 1: Creating Dataset

We will download the dataset from Kaggle. In the downloaded zip file there are two zip files train.zip and test.zip. Train.zip contains the labelled images for training.

There are about 25,000 images in the train.zip file and 12,500 in the test.zip file. For this project we will only use 200 cat and 200 dog images to train. We will use the test set to evaluate the performance of our model.

After extracting the data, I uploaded the images to a Google Cloud Storage bucket called dogs_cats_bucket1, which I created in the us-central1 region. The images are stored in two folders, train and test, in the bucket.
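For reference, uploading a local subset of the extracted Kaggle images can be done with the Cloud Storage Python client. The sketch below is illustrative and assumes the Kaggle file layout (train/cat.0.jpg, train/dog.0.jpg, and so on).

# Illustrative: upload 200 cat and 200 dog images to the bucket.
import glob
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("dogs_cats_bucket1")

for label in ("cat", "dog"):
    for path in sorted(glob.glob(f"train/{label}.*.jpg"))[:200]:
        blob = bucket.blob(f"train/{path.split('/')[-1]}")   # keep files under the train/ folder
        blob.upload_from_filename(path)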

Best Read: Top 10 AI Challenges

Now we need to create a CSV file with each image's address and label; for that, I have written the following lines of code.

from google.cloud import storage
import pandas as pd
import os

# Authentication using the service account key.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/dogs-vs-cats-354105-19b7b157b2b8.json"

BUCKET = 'dogs_cats_bucket1'
DELIMITER = '/'
TRAIN_PREFIX = 'train/'
TRAIN_BASE_PATH = f'gs://{BUCKET}/{TRAIN_PREFIX}'

print("Starting the import file generation process")
print("Process Details")
print(f"BUCKET : {BUCKET}")

storage_client = storage.Client()
data = []

print("Fetching list of Train objects")
train_blobs = storage_client.list_blobs(BUCKET, prefix=TRAIN_PREFIX, delimiter=DELIMITER)

for blob in train_blobs:
    # Kaggle filenames contain the label, e.g. cat.0.jpg or dog.0.jpg
    label = "cat" if "cat" in blob.name else "dog"
    full_path = f"gs://{BUCKET}/{blob.name}"
    data.append({
        'GCS_FILE_PATH': full_path,
        'LABEL': label
    })

df = pd.DataFrame(data)
df.to_csv('train.csv', index=False, header=False)

After running the script in a Jupyter notebook, we have the required CSV file; we will upload this file to the same storage bucket as well.
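Each row of the generated import file is simply a GCS file path followed by its label, for example (filenames illustrative):

gs://dogs_cats_bucket1/train/cat.0.jpg,cat
gs://dogs_cats_bucket1/train/dog.0.jpg,dog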

Now in the Vertex AI section go to Datasets and enable the Vertex AI API.

Click Create Dataset and name it. I have named it cat_dog_classification. We will select Image Classification (Single-label). Make sure the region is us-central1. Hit Create.

In the next section, mark Select import files from Cloud Storage, select train.csv via Browse, and hit Continue.

 

Vertex AI took 16 minutes to import the data. Now we can see the data in the Browse and Analyse tabs.

 

Now we can train the model.

Step 2: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In the Dataset select cat_dog_classification and keep default for everything else with Model Training Method as AutoML.

Click continue for the Model Details and Explainability with the default settings.

For Compute and Pricing give 8 maximum node hours.

Hit Start Training.

 

The model training is completed after 29 mins.

Step 3: Model Evaluation

Clicking on the trained model takes us to the model stats page, where we have stats like the precision-recall curve, precision-recall by threshold, and the confusion matrix.

With the above stats the model looks good.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. Make sure the region is us-central1.

Give dogs_cats as the name of Endpoint and click Continue.

In the Model Settings, select cat_dog_classification as the Model Name, Version 1 as the Version, and 2 as the number of compute nodes.

Click Done and Create.

It takes about 10 minutes to deploy the model.

With this our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can test it by uploading a test image or creating a Batch Prediction.

To Test the Model, we go to the Deploy and Test section on the Model page.

Click on Upload Image to upload a test image.

With this, we can see that our model is working well on test images.

We can also connect to the Endpoint using Python and get the results.
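A minimal sketch of such a call, reusing the PredictionServiceClient pattern shown in the custom-model post, could look like the following; the project ID, endpoint ID, and image path are placeholders to replace with your own values.

# Sketch: online prediction against the deployed AutoML image endpoint.
# Project ID, endpoint ID, and image path are placeholders.
import base64
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

client = aiplatform.gapic.PredictionServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"})

with open("test/1.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

instance = json_format.ParseDict({"content": encoded}, Value())
parameters = json_format.ParseDict({"confidenceThreshold": 0.5, "maxPredictions": 2}, Value())

endpoint = client.endpoint_path(
    project="YOUR_PROJECT_ID", location="us-central1", endpoint="YOUR_ENDPOINT_ID")

response = client.predict(endpoint=endpoint, instances=[instance], parameters=parameters)
for prediction in response.predictions:
    print(dict(prediction))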

For more details on our AI and ML services

Visit this link

This is the end of my blog. We have learned how to train an image classification model on Google's Vertex AI using the AutoML feature. I have enjoyed every minute of working on it.

For the next article, we will see how to train a custom model on Vertex AI with TensorFlow.

Stay Tuned.
