ml Archives - Indium
https://www.indiumsoftware.com/blog/tag/ml/

Real-Time Data Analysis and its Impact on Healthcare
https://www.indiumsoftware.com/blog/real-time-data-analysis-and-its-impact-on-healthcare/ (15 Feb 2024)

In the grand scheme of things, it's becoming increasingly evident that data is the new black gold. Industries across the board are awakening to the realization that data is no longer just an afterthought or an add-on; it's an essential component of success. In the 19th century, oil was the lifeblood of the global economy and politics. In the 21st century, data is poised to take on the same critical role.

Of course, data in its raw and unrefined form is essentially useless. It’s only when data is skillfully gathered, integrated, and analyzed that it starts to unlock its actual value. This value can manifest in many ways, from enhancing decision-making capabilities to enabling entirely new business models. In the healthcare industry, data is playing a particularly pivotal role. Refined data is helping professionals make better-informed decisions, improve patient outcomes, and unlock new frontiers of medical research. The future of healthcare is all about data, and those who know how to wield it will undoubtedly emerge as leaders in the field.

In particular, healthcare providers' timely access to real-time or just-in-time information can significantly enhance patient care, optimize clinician efficiency, streamline workflows, and reduce healthcare costs.

Investing in robust electronic health record (EHR) systems encompassing all clinical data is crucial for healthcare organizations to understand patient conditions and comprehensively predict patient outcomes.

Is Data a Real Game Changer in the Healthcare Industry?

The answer to whether the analytical application of existing data will shape the future of healthcare is a resounding “yes.” With advances in data-collecting tools and healthcare technology, we’re witnessing a new era of healthcare delivery that will revolutionize the industry.

Imagine a world where wearable medical devices warn you of potential health risks or medical advice apps offer personalized guidance based on your unique DNA profile. These are just a few examples of how cutting-edge technology is making its way into the healthcare space, enabling data-driven decisions that improve patient outcomes and drive down costs.

Real-time data is a game-changer for case review and clinical time management, allowing healthcare professionals to understand patient situations and forecast outcomes more effectively. To fully realize the potential of data-driven healthcare, healthcare organizations must implement robust data management systems that can store all clinical data and provide the necessary tools for data analysis. By doing so, healthcare professionals will be empowered to make informed decisions that enhance patient care, improve outcomes, and ultimately transform the healthcare landscape.

Also read: The best approach to testing digital healthcare.

How do you use data for a better future?

When it comes to healthcare, data is everything. However, with the massive amounts of data that healthcare professionals must contend with, the sheer volume of information can be overwhelming.
As the industry has shifted toward electronic record keeping, healthcare organizations have had to allocate more resources to purchasing servers and computing power to handle the influx of data. This has led to a significant surge in spending across the sector.

Despite the clear advantages of data-driven healthcare, managing such large amounts of information presents unique challenges. Sorting through and making sense of the data requires robust data management systems and advanced analytical tools. However, with the right approach, healthcare professionals can leverage this data to make informed decisions that improve patient outcomes and transform the industry.

How does data analytics benefit the healthcare industry?

A small diagnostic error can have devastating consequences in the healthcare industry, potentially costing lives. Correctly distinguishing a malignant tumor from a benign one can be the difference between life and death. This is where data analytics comes into play, helping to reduce the potential for error by identifying the most relevant patterns in the available data and predicting the most likely outcome.

Beyond improving patient care, data analytics can also assist hospital administration in evaluating the effectiveness of their medical personnel and treatment processes. As the industry continues to shift toward providing high-quality and reasonable care, the insights derived from data analysis can help organizations stay on the cutting edge of patient care.

With data analytics, healthcare professionals can harness the power of big data to identify patterns and trends, predict patient outcomes, and improve the overall quality of care. Healthcare organizations can optimize their processes by leveraging data-driven insights, minimizing errors, and ultimately delivering better patient outcomes.

Approaches of Data Analytics

Data analytics is a complex process involving several approaches, for example predictive, descriptive, and prescriptive analysis, together with steps such as feature understanding, feature selection, data cleaning, wrangling, and transformation. Which techniques are applied depends on the type of data being analyzed.

Analysts must first understand the features and variables relevant to the analysis to derive insights from the data. From there, they can select the most relevant features and begin cleaning and wrangling the data to ensure accuracy and completeness.

Once the data has been prepared, analysts can apply various transformation techniques to derive insights and patterns. The specific methods used will depend on the nature of the data being analyzed but may include methods such as regression analysis, clustering, and decision trees.
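
As an illustration of these methods, the short scikit-learn sketch below applies clustering and a decision tree to a small synthetic patient table; the feature names, labels, and thresholds are made up purely for demonstration and are not taken from any real dataset.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Synthetic patient features (illustrative column names only).
rng = np.random.default_rng(42)
patients = pd.DataFrame({
    "age": rng.integers(20, 90, 200),
    "systolic_bp": rng.normal(130, 20, 200),
    "glucose": rng.normal(100, 25, 200),
})

# Clustering: group patients into three rough risk groups.
patients["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(patients)

# Decision tree: predict a synthetic "high glucose" flag from the other features.
features = patients[["age", "systolic_bp"]]
labels = (patients["glucose"] > 120).astype(int)
tree = DecisionTreeClassifier(max_depth=3).fit(features, labels)
print("Training accuracy:", tree.score(features, labels))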

Predictive Analysis

Analysts leverage sophisticated techniques such as relational, dimensional, and entity-relationship analysis methodologies to forecast outcomes. By applying these powerful analytical methods, they can extract insights from large and complex datasets, identifying patterns and relationships that might otherwise be obscured.

Whether analyzing patient data to forecast disease progression or studying market trends to predict demand for new medical products, these advanced analytical techniques are essential for making informed decisions in today’s data-driven world. By leveraging the latest tools and techniques, healthcare professionals can stay ahead of the curve, improving patient outcomes and driving innovation in the industry.

Descriptive Analysis

In the data analytics process, descriptive analysis is a powerful technique that can be used to identify trends and patterns in large datasets. Unlike more complex analytical methods, descriptive analysis relies on simple arithmetic and statistics to extract insights from the data.

Analysts can gain a deeper understanding of data distribution by analyzing descriptive statistics such as mean, median, and mode, helping to identify common trends and patterns. This information is invaluable during the data mining phase, helping analysts uncover hidden insights and identify opportunities for further analysis.
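
For example, a few lines of pandas are enough to produce these descriptive statistics; the table below is a tiny, invented sample used only to show the calls.

import pandas as pd

# Tiny illustrative sample; in practice this would come from an EHR extract.
vitals = pd.DataFrame({
    "heart_rate": [72, 85, 90, 66, 78, 95, 88],
    "length_of_stay_days": [3, 5, 2, 7, 4, 6, 5],
})

print(vitals.mean())          # average of each column
print(vitals.median())        # middle value of each column
print(vitals.mode().iloc[0])  # most frequent value(s)
print(vitals.describe())      # count, mean, std, min, quartiles, max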

Prescriptive Analysis

In data analytics, prescriptive analysis represents the pinnacle of analytical techniques. Beyond simple descriptive or predictive analysis, prescriptive analysis offers recommendations for proceeding based on insights gleaned from the data.

This highly advanced analysis is the key to unlocking new opportunities in the healthcare industry, enabling professionals to make more informed decisions about everything from treatment protocols to resource allocation. By leveraging sophisticated algorithms and machine learning techniques, prescriptive analysis can identify the optimal path forward for any situation, helping organizations optimize processes, maximize efficiency, and drive better patient outcomes.

Gathering Real-time Data in Healthcare

Real-time data refers to data that is immediately obtained upon its creation and can be collected using various methods, including:

  • Health Records
  • Prescriptions
  • Diagnostics Data
  • Apps and IoTs

Real-time data is crucial for managing the healthcare industry’s patient care, operations, and staffing routines. By leveraging real-time data, the industry can optimize its entire IT infrastructure, gaining greater insight and understanding of its complex networks.

Examples of Real-time Data Technologies in Healthcare

Role of AI/ML in healthcare

Regarding medical diagnostics, the power of data analytics cannot be overstated. Thanks to cutting-edge machine learning and deep learning methods, it’s now possible to analyze medical records and predict future outcomes with unprecedented precision.

Take machine learning, for example. By leveraging this technology, medical practitioners can reduce the risk of human error in the diagnosis process while also gaining new insights into graphic and picture data that could help improve accuracy. Additionally, analyzing healthcare consumption data using machine learning algorithms makes it possible to allocate resources more effectively and reduce waste.

But that's not all. Deep learning is also a game-changer in the fight against cancer. Researchers have achieved remarkable results by training a model to recognize cancer cells using deep neural networks. After being fed a wealth of cancer cell images, the model could "memorize" their appearance and use that knowledge to accurately detect cancerous cells in new images. The potential for this technology to save lives is truly staggering.
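
As a rough sketch of what such a model can look like, the Keras snippet below defines a small convolutional classifier for labelled cell images. The input size, layer sizes, and training call are illustrative assumptions; a real diagnostic model would be much deeper and would be trained and validated on carefully curated clinical data.

import tensorflow as tf
from tensorflow.keras import layers

# Minimal two-class image classifier (assumed 64x64 RGB inputs).
model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability that the image shows a malignant cell
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training would then be a single call on a labelled image dataset, e.g.:
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)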

RPA (Robotic process automation) in healthcare

The potential for RPA in healthcare is fascinating. By scanning incoming data and scheduling appointments based on a range of criteria like symptoms, suspected diagnosis, doctor availability, and location, RPA can dramatically boost efficiency. This would relieve the burden of time-consuming scheduling tasks from the healthcare staff and probably improve patient satisfaction.

In addition to appointment scheduling, RPA can also be used to speed up health payment settlements. By consolidating charges for different services, including testing, medications, food, and doctor fees, into a single, more straightforward payment, healthcare practitioners can save time and avoid billing errors. Plus, if there are any issues with cost or delays, RPA can be set up to email patients with customized reminders.

But perhaps the most exciting use of RPA in healthcare is data analysis. By leveraging this technology to produce insightful analytics tailored to each patient’s needs, healthcare providers can deliver more precise diagnoses and treatment plans. Ultimately, this can lead to better outcomes and an enhanced patient care experience.

Role of Big Data in Healthcare

In today’s world, the healthcare industry needs an innovation that can empower medical practitioners to make informed decisions and ultimately enhance patient outcomes. Big data is the transformative force that can revolutionize how we approach healthcare. With the ability to analyze massive amounts of data from various sources, big data can provide medical practitioners with the insights they need to understand better and treat diseases. By leveraging this data, doctors can develop more targeted treatments and therapies that have the potential to improve patient outcomes drastically.

Beyond the immediate benefits of improved treatment options, big data also plays a vital role in driving new drug development. Through advanced clinical research analysis, big data can predict the efficacy of potential new drugs, making it easier for scientists to identify the most promising candidates for further development. This is just one example of how big data is revolutionizing the way we approach healthcare, and the benefits will only continue to grow as we explore more ways to harness its power.

Finally, big data is helping healthcare practitioners to create focused treatments that are tailored to improve population health. By analyzing population health data, big data can detect patterns and trends that would be impossible to identify through other means. With this information, medical professionals can develop targeted treatments that can be applied on a large scale, ultimately improving health outcomes for entire populations. This is just one of the many ways that big data is changing the way we approach healthcare, and it’s clear that the possibilities are endless. As we continue to explore this transformative technology, there’s no doubt that we’ll discover even more innovative ways to leverage big data to improve health outcomes for patients around the world.

Wrapping Up

In conclusion, real-time data analysis is a transformative force in the healthcare industry that has the potential to revolutionize the way we approach patient care. With the ability to analyze vast amounts of data in real-time, medical practitioners can make faster and more informed decisions, resulting in improved patient outcomes and ultimately saving lives.

From predicting potential health risks to identifying disease outbreaks and monitoring patient progress, real-time data analysis is driving innovation in healthcare and changing the way medical professionals approach treatment. By leveraging cutting-edge technologies and advanced analytics tools, healthcare organizations can collect and analyze data from various sources, including wearable devices, electronic health records, and social media, to better understand patient needs and provide personalized care.

As the healthcare industry continues to evolve, it’s clear that real-time data analysis will play an increasingly important role in delivering better health outcomes for patients worldwide. Real-time data analysis can improve patient care, reduce costs, and save lives by giving medical practitioners the insights they need to make more informed decisions. The possibilities for the future of healthcare services are endless, and I’m excited to see the continued innovations that will arise from this transformative technology.

Maximizing AI and ML Performance: A Guide to Effective Data Collection, Storage, and Analysis
https://www.indiumsoftware.com/blog/maximizing-ai-and-ml-performance-a-guide-to-effective-data-collection-storage-and-analysis/ (12 May 2023)

Data is often referred to as the new oil of the 21st century because it is a valuable resource that powers the digital economy much as oil fueled the industrial economy of the 20th century. Like oil, data is a raw material that must be collected, refined, and analyzed to extract its value. Companies are collecting vast amounts of data from various sources, such as social media, internet searches, and connected devices. This data can then be used to gain insights into customer behavior, market trends, and operational efficiencies.

In addition, data is increasingly being used to power artificial intelligence (AI) and machine learning (ML) systems, which are driving innovation and transforming businesses across various industries. AI and ML systems require large amounts of high-quality data to train models, make predictions, and automate processes. As such, companies are investing heavily in data infrastructure and analytics capabilities to harness the power of data.

Data is also a highly valuable resource because it is not finite, meaning that it can be generated, shared, and reused without diminishing its value. This creates a virtuous cycle where the more data that is generated and analyzed, the more insights can be gained, leading to better decision-making, increased innovation, and new opportunities for growth. Thus, data has become a critical asset for businesses and governments alike, driving economic growth and shaping the digital landscape of the 21st century.

There are various data storage methods in data science, each with its own strengths and weaknesses. Some of the most common data storage methods include:

  • Relational databases: Relational databases are the most common method of storing structured data. They are based on the relational model, which organizes data into tables with rows and columns. Relational databases use SQL (Structured Query Language) for data retrieval and manipulation and are widely used in businesses and organizations of all sizes.
  • NoSQL databases: NoSQL databases are a family of databases that do not use the traditional relational model. Instead, they use other data models such as document, key-value, or graph-based models. NoSQL databases are ideal for storing unstructured or semi-structured data and are used in big data applications where scalability and flexibility are key.
  • Data warehouses: Data warehouses are specialized databases that are designed to support business intelligence and analytics applications. They are optimized for querying and analyzing large volumes of data and typically store data from multiple sources in a structured format.
  • Data lakes: Data lakes are a newer type of data storage method that is designed to store large volumes of raw, unstructured data. Data lakes can store a wide range of data types, from structured data to unstructured data such as text, images, and videos. They are often used in big data and machine learning applications.
  • Cloud-based storage: Cloud-based storage solutions, such as Amazon S3, Microsoft Azure, or Google Cloud Storage, offer scalable, secure, and cost-effective options for storing data. They are especially useful for businesses that need to store and access large volumes of data or have distributed teams that need access to the data.
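
As a small illustration of the cloud-based option above, the sketch below uses the Google Cloud Storage Python client to upload a file and read it back; the bucket name, object path, and credentials setup are assumptions made for the example.

from google.cloud import storage

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key
# and that the bucket named below already exists in your project.
client = storage.Client()
bucket = client.bucket("example-analytics-bucket")  # hypothetical bucket name

# Upload a local CSV so it can feed downstream warehouses, lakes, or ML jobs.
blob = bucket.blob("raw/transactions.csv")
blob.upload_from_filename("transactions.csv")

# Read it back later, for example from a training job.
print(blob.download_as_text()[:200])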

To learn more, read: How AI and ML models are assisting the retail sector in reimagining the consumer experience.

Data collection is an essential component of data science and there are various techniques used to collect data. Some of the most common data collection techniques include:

  • Surveys: Surveys involve collecting information from a sample of individuals through questionnaires or interviews. Surveys are useful for collecting large amounts of data quickly and can provide valuable insights into customer preferences, behavior, and opinions.
  • Experiments: Experiments involve manipulating one or more variables to measure the impact on the outcome. Experiments are useful for testing hypotheses and determining causality.
  • Observations: Observations involve collecting data by watching and recording behaviors, actions, or events. Observations can be useful for studying natural behavior in real-world settings.
  • Interviews: Interviews involve collecting data through one-on-one conversations with individuals. Interviews can provide in-depth insights into attitudes, beliefs, and motivations.
  • Focus groups: Focus groups involve collecting data from a group of individuals who participate in a discussion led by a moderator. Focus groups can provide valuable insights into customer preferences and opinions.
  • Social media monitoring: Social media monitoring involves collecting data from social media platforms such as Twitter, Facebook, or LinkedIn. Social media monitoring can provide insights into customer sentiment and preferences.
  • Web scraping: Web scraping involves collecting data from websites by extracting information from HTML pages. Web scraping can be useful for collecting large amounts of data quickly.
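
To illustrate the web-scraping technique, here is a minimal sketch using requests and BeautifulSoup; the URL and CSS selectors are hypothetical placeholders, and any real scraper should respect the target site's terms of service and robots.txt.

import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"  # placeholder URL
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Collect name and price from each product card (class names are assumed).
rows = []
for item in soup.select(".product"):
    rows.append({
        "name": item.select_one(".name").get_text(strip=True),
        "price": item.select_one(".price").get_text(strip=True),
    })

print(rows[:5])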

Data analysis is an essential part of data science and there are various techniques used to analyze data. Some of the top data analysis techniques in data science include:

  • Descriptive statistics: Descriptive statistics involve summarizing and describing data using measures such as mean, median, mode, variance, and standard deviation. Descriptive statistics provide a basic understanding of the data and can help identify patterns or trends.
  • Inferential statistics: Inferential statistics involve making inferences about a population based on a sample of data. Inferential statistics can be used to test hypotheses, estimate parameters, and make predictions.
  • Data visualization: Making charts, graphs, and other visual representations of data to better understand patterns and relationships is known as data visualization. Data visualization is helpful for expressing complex information and spotting trends or patterns that might not be immediately apparent from the data.
  • Machine learning: Machine learning involves using algorithms to learn patterns in data and make predictions or decisions based on those patterns. Machine learning is useful for applications such as image recognition, natural language processing, and recommendation systems.
  • Text analytics: Text analytics involves analyzing unstructured data such as text to identify patterns, sentiment, and topics. Text analytics is useful for applications such as customer feedback analysis, social media monitoring, and content analysis.
  • Time series analysis: Time series analysis involves analyzing data over time to identify trends, seasonality, and cycles. Time series analysis is useful for applications such as forecasting, trend analysis, and anomaly detection.
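
As a short example of time series analysis, the sketch below flags unusual days in a synthetic daily-admissions series with a rolling z-score; the data, window size, and threshold are invented for illustration.

import numpy as np
import pandas as pd

# Synthetic daily series with a weekly pattern and one injected spike.
idx = pd.date_range("2023-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
admissions = pd.Series(100 + 10 * np.sin(np.arange(120) / 7) + rng.normal(0, 3, 120), index=idx)
admissions.iloc[60] += 40  # inject an anomaly

rolling_mean = admissions.rolling(window=14).mean()
rolling_std = admissions.rolling(window=14).std()
z_score = (admissions - rolling_mean) / rolling_std

print(z_score[z_score.abs() > 3])  # days that deviate strongly from the recent trend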

Use Cases

To illustrate the importance of data in AI and ML, let’s consider a few use cases:

  • Predictive Maintenance: In manufacturing, AI and ML can be used to predict when machines are likely to fail, enabling organizations to perform maintenance before a breakdown occurs. To achieve this, the algorithms require vast amounts of data from sensors and other sources to learn patterns that indicate when maintenance is necessary.
  • Fraud Detection: AI and ML can also be used to detect fraud in financial transactions. This requires large amounts of data on past transactions to train algorithms to identify patterns that indicate fraudulent behavior.
  • Personalization: In e-commerce, AI and ML can be used to personalize recommendations and marketing messages to individual customers. This requires data on past purchases, browsing history, and other customer behaviors to train algorithms to make accurate predictions.
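
To make the fraud-detection use case above a little more concrete, here is a hedged scikit-learn sketch that flags anomalous transactions with an Isolation Forest; the features, amounts, and contamination rate are synthetic assumptions rather than a production design.

import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic transactions: [amount, hour_of_day]; real systems use many more features.
rng = np.random.default_rng(1)
normal = np.column_stack([rng.normal(50, 15, 1000), rng.integers(8, 22, 1000)])
unusual = np.column_stack([rng.normal(900, 100, 10), rng.integers(0, 5, 10)])
X = np.vstack([normal, unusual])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)  # -1 = suspected fraud, 1 = normal
print(int((flags == -1).sum()), "transactions flagged for review")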

Real-Time Analysis

To achieve optimal results in AI and ML applications, data must be analyzed in real-time. This means that organizations must have the infrastructure and tools necessary to process large volumes of data quickly and accurately. Real-time analysis also requires the ability to detect and respond to anomalies or unexpected events, which can impact the accuracy of the algorithms.

Wrapping Up

In conclusion, data is an essential component of artificial intelligence (AI) and machine learning (ML) applications. Collecting, storing, and analyzing data effectively is crucial to maximizing the performance of AI and ML systems and obtaining optimal results. Data visualization, machine learning, time series analysis, and other data analysis techniques can be used to gain valuable insights from data and make data-driven decisions.

No matter where you are in your transformation journey, contact us and our specialists will help you make technology work for your organization.

Training Custom Machine Learning Model on Vertex AI with TensorFlow
https://www.indiumsoftware.com/blog/training-custom-machine-learning-model-on-vertex-ai-with-tensorflow/ (3 Feb 2023)

"Vertex AI is Google's platform which provides many machine learning services such as training models using AutoML or Custom Training."

AutoML vs Custom Training

To quickly compare AutoML and custom training functionality, and expertise required, check out the following table given by Google.

Choose a training method | Vertex AI | Google Cloud

In this article we are going to train the Custom Machine Learning Model on Vertex AI with TensorFlow.

To know about Vertex AI's AutoML feature, read my previous blog: Machine Learning using Google's Vertex AI.

About Dataset

We will be using the Crab Age Prediction dataset from Kaggle. The dataset is used to estimate the age of a crab based on its physical attributes.

To learn more about how our AI and machine learning capabilities can assist you, click here.

There are 9 columns in the Dataset as follows.

  1. Sex: Crab gender (Male, Female and Indeterminate)
  2. Length: Crab length (in Feet; 1 foot = 30.48 cms)
  3. Diameter: Crab Diameter (in Feet; 1 foot = 30.48 cms)
  4. Height: Crab Height (in Feet; 1 foot = 30.48 cms)
  5. Weight: Crab Weight (in ounces; 1 Pound = 16 ounces)
  6. Shucked Weight: Without Shell Weight (in ounces; 1 Pound = 16 ounces)
  7. Viscera Weight: Viscera Weight
  8. Shell Weight: Shell Weight (in ounces; 1 Pound = 16 ounces)
  9. Age: Crab Age (in months)

We must predict the Age column with the help of the rest of the columns.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from Kaggle. There is only one csv file in the downloaded dataset, called CrabAgePrediction.csv; I have uploaded this csv to the bucket called vertex-ai-custom-ml on Google Cloud Storage.

Step 2: Working on Workbench

Go to Vertex AI, then to the Workbench section, and enable the Notebook API. Then click on New Notebook and select TensorFlow Enterprise; we are using TensorFlow Enterprise 2.6 without a GPU for this project. Make sure to select the us-central1 (Iowa) region.

It will take a few minutes to create the Notebook instance. Once the notebook is created click on the Open JupyterLab to launch the JupyterLab.

In the JupyterLab, open the Terminal and run the following commands one by one.

mkdir crab_folder     # This will create crab_folder
cd crab_folder        # To enter the folder
mkdir trainer         # This will create trainer folder
touch Dockerfile      # This will create a Dockerfile

We can see all the files and folders on the left side of the JupyterLab. From there, open the Dockerfile and start editing it with the following lines of code.

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6

WORKDIR /

COPY trainer /trainer

ENTRYPOINT ["python", "-m", "trainer.train"]

Now save the Dockerfile; with this we have defined the entrypoint for the Docker image.

To save the model’s output, we’ll make a bucket called crab-age-pred-bucket.

For the model training file, I have already uploaded the python file into the GitHub Repository. To clone this Repository, click on the Git from the top of JupyterLab and select Clone a Repository and paste the repository link and hit clone.

In the Lab, we can see the crab-age-pred folder; copy the train.py file from this folder to crab_folder/trainer/.

Let’s look at the train.py file before we create the Docker IMAGE.

#Importing the required packages..
import numpy as np
import pandas as pd
import pathlib
import tensorflow as tf
#Importing tensorflow 2.6
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

#Reading data from the gcs bucket
dataset = pd.read_csv(r"gs://vertex-ai-custom-ml/CrabAgePrediction.csv")
dataset.tail()

#Bucket where the trained model will be stored
BUCKET = 'gs://crab-age-pred-bucket'

dataset.isna().sum()
dataset = dataset.dropna()

#Data transformation..
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

#Dataset splitting..
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_stats = train_dataset.describe()
#Removing age column, since it is a target column
train_stats.pop("Age")
train_stats = train_stats.transpose()
train_stats

#Removing age column from train and test data
train_labels = train_dataset.pop('Age')
test_labels = test_dataset.pop('Age')

def norma_data(x):
    #To normalise the numerical values
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norma_data(train_dataset)
normed_test_data = norma_data(test_dataset)

def build_model():
    #model building function
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model

#model.summary()
model = build_model()

EPOCHS = 10
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2,
                          callbacks=[early_stop])

model.save(BUCKET + '/model')

Summary of train.py

Once all the necessary packages are imported, TensorFlow 2.6 is used for modelling. Pandas is used to read the csv file stored in the vertex-ai-custom-ml bucket, and the BUCKET variable specifies the bucket where we will store the trained model.

We are doing some transformation such as creating dummy variable for the categorical column. Next, we are splitting the data into training and testing and normalizing the data.

We wrote a function called build_model that defines a simple two-layer TensorFlow model. The model is trained for ten epochs, and the trained model is saved to crab-age-pred-bucket/model on Cloud Storage.

Now, in the JupyterLab Terminal, execute the following commands one by one to build a Docker image.

PROJECT_ID=crab-age-pred

IMAGE_URI="gcr.io/$PROJECT_ID/crab:v1"

docker build ./ -t $IMAGE_URI

Before running the build command make sure to enable the Artifact Registry API and Google Container Registry API by going to the APIs and services in Vertex AI.

After running these commands, our Docker image is built successfully. Now we will push the Docker image with the following command.

docker push $IMAGE_URI

Once pushed we can see our Docker IMAGE in the Container registry. To find the Container registry you can search it on Vertex AI.

Best Read: Our success story about how we assisted an oil and gas company, as well as Nested Tables and Machine Drawing Text Extraction

Step 3: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In Datasets select no managed dataset and click continue.

In Model details I have given the model's name as "pred-age-crab"; under the advanced options, select the available service account and keep the defaults for the rest. Make sure that the service account has Cloud Storage permissions; if not, grant them from the IAM and Admin section.

Select the custom container for the Container image in the Training container. Navigate to and select the newly created Docker image. Next, navigate to and select the crab-age-pred-bucket in the Model output directory. Now press the continue button.

Ignore any selections for Hyperparameters and click Continue.

In Compute and pricing, Select the machine type n1-standard-32, 32 vCPUs, 120 GiB memory and hit continue.

For Prediction Container select Pre-Built container with TensorFlow Framework 2.6 and start the model training.

You can see the model in training in the Training section.

In about 8 minutes, our custom model training is finished.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. The region should be us-central1.

Give crab_age_pred as the name of Endpoint and click Continue.

In the Model Settings, select pred_age_crab as the Model Name, Version 1 as the Version, 2 as the number of compute nodes, and n1-standard-8 (8 vCPUs, 30 GiB memory) as the Machine Type, then select the service account. Click Done and Create.

Ignore the Model monitoring selection and click Create to deploy this version.

It may take 11 minutes to deploy the model.

With the above step our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can make predictions. For this project we are going to use Python to make predictions. We will need to give the Vertex AI Admin and Cloud Storage Admin permissions to the service account. We can do that in the IAM and administration section of Google Cloud. Once the permissions are given, we will download the service account key in JSON format; it will be used for authentication in our environment.

Following is the code used for the prediction.

pip install google-cloud-aiplatform

from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import os

def predict_tabular_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com"):

    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # for more info on the instance schema, please use get_model_sample.py
    # and look at the yaml found in instance_schema_uri
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    predictions = response.predictions
    print(predictions)

#Authentication using service account.
#We are giving the path to the JSON key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/crab-age-pred-7c1b7d9be185.json"

#normalized values
inputs = [0, 0, 1, 1.4375, 1.175, 0.4125, 0.63571550, 0.3220325, 1.5848515, 0.747181]

project_id = "crab-age-pred"            #Project ID from the Vertex AI
endpoint_id = "7762332189773004800"     #Endpoint ID from the Endpoints section

predict_tabular_sample(project_id, endpoint_id, inputs)

Output

[[8.01214314]]

This is how we can make the predictions. For the inputs, make sure to apply the same transformation and normalization that we applied to the training data.

With this we have completed the project and learned how to train, deploy, and get predictions from a custom-trained ML model.

I hope you will find it useful.

See you again.

Overview of Big Query's Unique feature, BQML with a regression model example
https://www.indiumsoftware.com/blog/overview-of-big-querys-unique-feature-bqml-with-a-regression-model-example/ (2 Feb 2023)

In this blog you are going to see what Big Query is, its best feature (BQML), the areas where BQML is used, and a clear example that shows how easy it is to build a machine learning model with simple SQL code.

The blog will go through the following topics:

  • What is Big Query?
  • Best features of Big Query?
  • Why BQML? Areas of BQML?
  • Regression model to show efficiency of BQML

Let's dive into the article.

What is Big Query?

With built-in technologies like machine learning, business intelligence and geospatial analysis, Big Query is a fully managed data warehouse service that enables you to manage and analyse your data. With no need for infrastructure administration, Big Query's serverless architecture lets you use SQL queries to tackle the most critical issues facing your company. Thanks to Big Query's robust, distributed analytical engine, you can query terabytes of data in a matter of seconds and petabytes of data in a matter of minutes.

Best features of Big Query?

Built-in ML Integration (BQ ML), Multi-cloud Functionality (BQ Omni), Geospatial Analysis (BQ GIS), Foundation for BI (BQ BI Engine), Free Access (BQ Sandbox), and Automated Data Transfer (BQ Data Transfer Service) are the standout features of Big Query. In this blog we will discuss the most remarkable of them, Big Query ML.

*An amazing feature of Big Query is Big Query ML.

Big Query ML allows you to use standard SQL queries to develop and run machine learning models in Big Query. Machine learning on huge datasets requires extensive programming and ML framework skills. These criteria restrict solution development within each organization to a small group of people, and they exclude data analysts who understand the data but lack machine learning and programming skills. This is where Big Query ML comes in handy; it allows data analysts to employ machine learning using their existing SQL tools and skills. Big Query ML allows analysts to create and evaluate machine learning models in Big Query with large volumes of data.

For more information on Big Query machine learning services and solutions, contact us today.

Why BQML?

The major advantages I've identified using BQML:

  • There is no need to read your data into local memory: like other ML frameworks, BQML can subsample your dataset, but it can also train your model directly in your database.
  • Working in SQL can help you collaborate more easily if you’re working in a team and the majority of your teammates don’t know Python, R, or your favourite modelling language. 
  • Because your model will be in the same location as your data, you can serve it immediately after it has been trained and make predictions directly from it.

Areas we can use BQML

  • Retail Industry (Demand forecasting, Customer segmentation, Propensity to purchase or propensity to click on item, Product recommendations by emails and ads).
  • Logistics Industry (Time estimation of package delivery, Predictive maintenance).
  • Finance Industry (Product recommendations by emails and ads).
  • Gaming Industry (Content recommendation, Predicting churn customers).

 Another blog worth reading: Databricks Overview, Why Databricks, and More

Regression model to show efficiency of BQML

  • For this we will build a linear regression model to predict house prices in the USA, since linear regression is a good fit for predicting the value of one variable from others. I am also using a regression model as the example in this article because it is simpler to communicate how the model works and to interpret its results.
  • With the USA housing dataset, we will see how efficiently and easily Big Query ML lets us build a machine learning linear regression model with SQL code.

Step 1: Creating the Model

CREATE OR REPLACE MODEL `testproject-351804.regression.house_prices2`
OPTIONS(
  model_type = 'linear_reg',
  input_label_cols = ['price'],
  l2_reg = 1,
  early_stop = false,
  max_iterations = 12,
  optimize_strategy = 'batch_gradient_descent'
) AS
SELECT avg_house_age, avg_rooms, avg_bedrooms, avg_income, population, price/100000 AS price
FROM `regression.usa_housing_train`

Model creation

  • The above code will create and train the model.
  • With the simple CREATE MODEL statement we can create the ML model. You need to specify the OPTIONS; strictly speaking, only model_type and input_label_cols (the variable to predict) are needed to create the model. Why I used the other OPTIONS will become clear in the evaluation section.

Step 2: Evaluating the Model

SELECT *
FROM ML.EVALUATE(MODEL `regression.house_prices2`,
    TABLE `testproject-351804._8b41b9f5a2e85d72c62e834e3e9dd60a58ba542d.anoncb5de70d_1e3d_4213_8c5d_bb10d6b9385b_imported_data_split_eval_data`)

Model Evaluation

  • We have to see how well our model is performing by using the ML.EVALUATE function. This is also where the extra OPTIONS used when creating the model come in.
  • First I created a model in BigQuery ML with only the options model_type = 'linear_reg' and input_label_cols = ['price'], but while evaluating the model the R-squared was only 0.3, which I felt was not accurate enough, and the large gap between the training loss and the evaluation loss showed that the model was overfitting.
  • So, as a solution, I added options when creating the model: I used L2 regularization to overcome the overfitting and generalize the model to the data points, adjusted the values three times, and after that the R-squared was 0.92, i.e. above 90% accuracy.

*We need to look at R-squared, which is the coefficient of determination. Higher is better.

Step 3: Predicting the Model


The model’s prediction process is as simple as calling ML.PREDICT

SELECT *
FROM ML.PREDICT(MODEL `regression.house_prices2`,
    TABLE `regression.usa_housing_predict`)

Model Prediction

See how efficient the Big Query ML feature is: it predicted the house prices based on the trained columns avg_house_age, avg_rooms, avg_bedrooms, avg_income, and population.
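
The same prediction can also be triggered from Python with the BigQuery client library, which is handy when the results need to feed another application. A minimal sketch, assuming application-default credentials for the same project used above, could look like this:

from google.cloud import bigquery

client = bigquery.Client(project="testproject-351804")

sql = """
SELECT *
FROM ML.PREDICT(MODEL `regression.house_prices2`,
                TABLE `regression.usa_housing_predict`)
"""
predictions = client.query(sql).to_dataframe()

# BQML names the output column predicted_<label>, here predicted_price.
print(predictions["predicted_price"].head())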

Summary

Now you know how to create linear regression models in BigQuery ML. We have discussed how to build a model, assess it, apply it to make predictions, and analyse model coefficients.

In upcoming blogs you will see other unique features of Big Query, such as geospatial analytics and arrays/structs.

Happy Reading

Hope you find this useful.

Kubeflow Pipeline on Vertex AI for Custom ML Models
https://www.indiumsoftware.com/blog/kubeflow-pipeline-on-vertex-ai-for-custom-ml-models/ (2 Feb 2023)

What is Kubeflow?

"Kubeflow is an open-source project created to help deploy ML pipelines. It uses components, written as Python functions, for each step of the pipeline. Each component runs in an isolated container with all the required libraries, and the components run in series one by one."

In this article we are going to train a custom machine learning model on Vertex AI using Kubeflow Pipeline.

About Dataset

Credit Card Customers dataset from Kaggle will be used. The 10,000 customer records in this dataset include columns for age, salary, marital status, credit card limit, credit card category, and other information. In order to predict the customers who are most likely to leave, we must analyse the data to determine the causes of customer churn.

Interesting Read: In the world of hacking, we’ve reached the point where we’re wondering who is a better hacker: humans or machines.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from GitHub. There are two csv files in the downloaded dataset, churner_p1 and churner_p2. I have created a Big Query dataset credit_card_churn with the tables churner_p1 and churner_p2 from these csv files. I have also created a bucket called credit-card-churn on Cloud Storage; this bucket will be used to store the artifacts of the pipeline.

Step 2: Employing Workbench

Enable the Notebook API by going to Vertex AI and then to the Workbench section. Then select Python 3 by clicking on New Notebook. Make sure to choose the us-central1 region.

It will take a few minutes to create the Notebook instance. Once the notebook is created click on the Open JupyterLab to launch the JupyterLab.

We will also have to enable the following APIs from API and services section of Vertex AI.

  1. Artifact Registry API
  2. Container Registry API
  3. AI Platform API
  4. ML API
  5. Cloud Functions API
  6. Cloud Build API

Now click on the Python 3 to open a jupyter notebook in the JupyterLab Notebook section and run the below code cells.

USER_FLAG = "--user"

!pip3 install {USER_FLAG} google-cloud-aiplatform==1.7.0
!pip3 install {USER_FLAG} kfp==1.8.9

This will install google cloud AI platform and Kubeflow packages. Make sure to restart the kernel after the packages are installed.

import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Create the variable PROJECT_ID with the name of project.

BUCKET_NAME = "gs://" + PROJECT_ID
BUCKET_NAME

Create the variable BUCKET_NAME, this will return the same bucket name we have created earlier.

import matplotlib.pyplot as plt
import pandas as pd

from kfp.v2 import compiler, dsl
from kfp.v2.dsl import pipeline, component, Artifact, Dataset, Input, Metrics, Model, Output, InputPath, OutputPath

from google.cloud import aiplatform
# We'll use this namespace for metadata querying
from google.cloud import aiplatform_v1

PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

REGION = "us-central1"

PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

This will import required packages and create the pipeline folder in the credit-card-churn bucket.

#First Component in the pipeline to fetch data from big query.
#Table1 data is fetched
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_1.yaml"
)
def get_data_1(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The first component of the pipeline will fetch the data from the table churner_p1 in BigQuery and pass the csv file as the output to the next component. The structure is the same for every component: we have used the @component decorator to install the required packages and specify the base image and output file, then we create the get_data_1 function to get the data from BigQuery.

#Second Component in the pipeline to fetch data from big query.
#Table2 data is fetched
#First component and second component don't need inputs from any components
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_2.yaml"
)
def get_data_2(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The second component of the pipeline will fetch the data from the table churner_p2 in BigQuery and pass the csv file as the output to the next component. The first and second components do not need inputs from any other components.

#Third component in the pipeline to combine data from 2 sources and for some data transformation
@component(
    packages_to_install=["sklearn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="model_training.yaml",
)
def data_transformation(
    dataset1: Input[Dataset],
    dataset2: Input[Dataset],
    output_data_path: OutputPath("Dataset"),
):
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split
    from joblib import dump
    from sklearn.metrics import confusion_matrix
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd

    data1 = pd.read_csv(dataset1.path)
    data2 = pd.read_csv(dataset2.path)
    data = pd.merge(data1, data2, on='CLIENTNUM', how='outer')
    data.drop(["CLIENTNUM"], axis=1, inplace=True)
    data = data.dropna()
    cols_categorical = ['Gender', 'Dependent_count', 'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category']
    data['Attrition_Flag'] = [1 if cust == "Existing Customer" else 0 for cust in data['Attrition_Flag']]
    data_encoded = pd.get_dummies(data, columns=cols_categorical)
    data_encoded.to_csv(output_data_path)

The third component is where we combine the data from the first and second components and perform data transformations such as dropping the "CLIENTNUM" column, dropping the null values, and converting the categorical columns into numerical ones. We will pass this transformed data as csv to the next component.

#Fourth component in the pipeline to train the classification model using decision trees or random forest
@component(
    packages_to_install=["sklearn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="model_training.yaml",
)
def training_classmod(
    data1: Input[Dataset],
    metrics: Output[Metrics],
    model: Output[Model]
):
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split
    from joblib import dump
    from sklearn.metrics import confusion_matrix
    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd

    data_encoded = pd.read_csv(data1.path)
    X = data_encoded.drop(columns=['Attrition_Flag'])
    y = data_encoded['Attrition_Flag']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)

    model_classifier = RandomForestClassifier()
    model_classifier.fit(X_train, y_train)
    y_pred = model_classifier.predict(X_test)
    score = model_classifier.score(X_test, y_test)
    print('accuracy is:', score)
    metrics.log_metric("accuracy", (score * 100.0))
    metrics.log_metric("model", "RandomForest")
    dump(model_classifier, model.path + ".joblib")

In the fourth component we train the model with a Random Forest classifier, and we use accuracy as the evaluation metric.

@component(
    packages_to_install=["google-cloud-aiplatform"],
    base_image="python:3.9",
    output_component_file="model_deployment.yaml",
)
def model_deployment(
    model: Input[Model],
    project: str,
    region: str,
    vertex_endpoint: Output[Artifact],
    vertex_model: Output[Model]
):
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    deployed_model = aiplatform.Model.upload(
        display_name="custom-model-pipeline",
        artifact_uri=model.uri.replace("model", ""),
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest"
    )
    endpoint = deployed_model.deploy(machine_type="n1-standard-4")

    # Save data to the output params
    vertex_endpoint.uri = endpoint.resource_name
    vertex_model.uri = deployed_model.resource_name

The fifth and last component creates the endpoint on Vertex AI and deploys the model. We have used a pre-built scikit-learn serving container image and deployed the model on an "n1-standard-4" machine.

@pipeline(
    # Default pipeline root. You can override it when submitting the pipeline.
    pipeline_root=PIPELINE_ROOT,
    # A name for the pipeline.
    name="custom-pipeline",
)
def pipeline(
    bq_table_1: str = "",
    bq_table_2: str = "",
    output_data_path: str = "data.csv",
    project: str = PROJECT_ID,
    region: str = REGION
):
    dataset_task_1 = get_data_1(bq_table_1)
    dataset_task_2 = get_data_2(bq_table_2)
    data_transform = data_transformation(dataset_task_1.output, dataset_task_2.output)
    model_task = training_classmod(data_transform.output)
    deploy_task = model_deployment(model=model_task.outputs["model"], project=project, region=region)

Finally, we have the pipeline function, which calls all the components sequentially: dataset_task_1 and dataset_task_2 get the data from BigQuery, data_transform transforms the data, model_task trains the Random Forest model, and deploy_task deploys the model on Vertex AI.

compiler.Compiler().compile(pipeline_func=pipeline, package_path="custom-pipeline-classifier.json")

Compiling the pipeline.

run1 = aiplatform.PipelineJob(
    display_name="custom-training-vertex-ai-pipeline",
    template_path="custom-pipeline-classifier.json",
    job_id="custom-pipeline-rf8",
    parameter_values={"bq_table_1": "credit-card-churn.credit_card_churn.churner_p1", "bq_table_2": "credit-card-churn.credit_card_churn.churner_p2"},
    enable_caching=False,
)

Creating the pipeline job.

run1.submit()

Running the pipeline job.

With this we have completed creating the Kubeflow pipeline and we can see it on the Pipelines section of Vertex AI.

 

Our Pipeline has run successfully and we have managed to get 100% accuracy for the classification.

We can use this model to get the online prediction using Rest API or Python. We can also create the different pipelines and compare their metrics on Vertex AI.
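
For example, once the endpoint is live, an online prediction can be requested with a few lines of the Vertex AI Python SDK; the endpoint ID and the feature values below are placeholders, and the instance must use the same encoding and column order produced by the data_transformation component.

from google.cloud import aiplatform

aiplatform.init(project="credit-card-churn", location="us-central1")
endpoint = aiplatform.Endpoint("1234567890")  # hypothetical endpoint ID from the Endpoints page

# One instance with illustrative, already-encoded feature values.
instance = [45, 3, 5000.0, 1, 0, 0, 1]
prediction = endpoint.predict(instances=[instance])
print(prediction.predictions)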

With this we have completed the project and learned how to create the Pipeline on Vertex AI for custom train models.

I hope you will find it useful.

To learn more about our AI & ML solutions and capabilities, contact us.

See you again.

Machine Learning using Google's Vertex AI
https://www.indiumsoftware.com/blog/machine-learning-using-googles-vertex-ai/ (2 Feb 2023)

The post Machine Learning using Google’s Vertex AI appeared first on Indium.

]]>
Image by Google

What is Vertex AI?

“Vertex AI is Google’s platform which provides many Machine learning services such as training models using AutoML or Custom Training.”

Image by Google

Features of Vertex AI

We use Vertex AI to perform the following tasks in the ML workflow:

  • Creating a dataset and uploading data
  • Training an ML model
  • Evaluating model accuracy
  • Tuning hyperparameters (custom training only)
  • Storing the model in Vertex AI
  • Deploying the trained model to an endpoint for predictions
  • Sending prediction requests to the endpoint
  • Managing models and endpoints

To understand the Vertex AI workflow, we will train a "Dogs vs Cats" classification model using Vertex AI's AutoML feature.

Step 1: Creating Dataset

We will download the dataset from Kaggle. The downloaded zip file contains two zip files, train.zip and test.zip; train.zip contains the labelled images for training.

There are about 25,000 images in the train.zip file and 12,500 in the test.zip file. For this project we will only use 200 cat and 200 dog images to train. We will use the test set to evaluate the performance of our model.

After extracting the data, I uploaded the images to a Google Cloud Storage bucket called dogs_cats_bucket1, which I created in the us-central1 region. The images are stored in two folders, train and test, in the bucket.

Best Read: Top 10 AI Challenges

Now we need to create a CSV file with each image's address and label; for that, I have written the following lines of code.

from google.cloud import storage
import pandas as pd
import os

# Authentication using a service account.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/dogs-vs-cats-354105-19b7b157b2b8.json"

BUCKET = 'dogs_cats_bucket1'
DELIMITER = '/'
TRAIN_PREFIX = 'train/'
TRAIN_BASE_PATH = f'gs://{BUCKET}/{TRAIN_PREFIX}'

print("Starting the import file generation process")
print("Process Details")
print(f"BUCKET : {BUCKET}")

storage_client = storage.Client()
data = []

print("Fetching list of train objects")
train_blobs = storage_client.list_blobs(BUCKET, prefix=TRAIN_PREFIX, delimiter=DELIMITER)

# Label each image from its file name and record its full GCS path.
for blob in train_blobs:
    label = "cat" if "cat" in blob.name else "dog"
    full_path = f"gs://{BUCKET}/{blob.name}"
    data.append({
        'GCS_FILE_PATH': full_path,
        'LABEL': label
    })

df = pd.DataFrame(data)
df.to_csv('train.csv', index=False, header=False)

After running the script in a Jupyter Notebook, we have the required CSV file, which we will also upload to the same storage bucket.
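For completeness, here is a small sketch for uploading the generated train.csv back to the same bucket with the Cloud Storage client; the destination object name is an assumption.

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET)   # dogs_cats_bucket1
blob = bucket.blob("train.csv")          # destination object name (assumed)
blob.upload_from_filename("train.csv")   # local file produced by the script above
print(f"Uploaded to gs://{BUCKET}/{blob.name}")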

Now in the Vertex AI section go to Datasets and enable the Vertex AI API.

Click Create Dataset and name it. I have named it cat_dog_classification. We will select Image Classification (Single-label). Make sure the region is us-central1. Hit Create.

In the next section, choose Select import files from Cloud Storage, pick train.csv via Browse, and hit Continue.

 

Vertex AI took 16 minutes to import the data. Now we can see the data in the Browse and Analyse tabs.

 

Now we can train the model.

Step 2: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

Under Dataset, select cat_dog_classification and keep the defaults for everything else, with the model training method set to AutoML.

Click Continue through the Model Details and Explainability steps with the default settings.

For Compute and Pricing, set the budget to a maximum of 8 node hours.

Hit Start Training.

 

Model training completed after 29 minutes.

Step 3: Model Evaluation

Clicking on the trained model takes us to the model stats page, where we have stats like the precision-recall curve, precision-recall by threshold, and the confusion matrix.

With the above stats the model looks good.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. Make sure the region is us-central1.

Give dogs_cats as the name of Endpoint and click Continue.

In the Model Settings, select cat_dog_classification as the model name, Version 1 as the version, and 2 as the number of compute nodes.

Click Done and Create.

It takes about 10 minutes to deploy the model.

With this our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can test it by uploading a test image or creating a batch prediction.

To Test the Model, we go to the Deploy and Test section on the Model page.

Click Upload Image to upload a test image.

With this, we can see our model is working well on test images.

We can also connect to the Endpoint using Python and get the results.
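As an illustrative sketch (the project ID, endpoint ID, and image file name below are placeholders, not values from this post), an AutoML image classification endpoint can be called from Python by sending base64-encoded image bytes:

import base64
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

# Replace with the numeric ID of the dogs_cats endpoint from the Endpoints page.
endpoint = aiplatform.Endpoint("YOUR_ENDPOINT_ID")

with open("test_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# AutoML image models expect base64-encoded image bytes in the "content" field.
response = endpoint.predict(instances=[{"content": encoded}])
print(response.predictions)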

For more details on our AI and ML services

Visit this link

This is the end of my blog. We have learned how to train an image classification model on Google's Vertex AI using the AutoML feature. I have enjoyed every minute of working on it.

In the next article, we will see how to train a custom model on Vertex AI with TensorFlow.

Stay Tuned.

The post Machine Learning using Google’s Vertex AI appeared first on Indium.

]]>
Using Amazon QuickSight for Cloud BI with ML Capabilities  https://www.indiumsoftware.com/blog/using-amazon-quicksight-for-cloud-bi-with-ml-capabilities/ Thu, 06 Oct 2022 08:45:44 +0000 https://www.indiumsoftware.com/?p=12509 Forty-six of the 50 top telecasts of 2017 were the NFL games (National Football League) with an average of 15.8 million viewers through the season. What made it very interesting was the kind of statistics about the game and the players NFL was able to present using an advanced system to collect data and draw

The post Using Amazon QuickSight for Cloud BI with ML Capabilities  appeared first on Indium.

]]>

Forty-six of the 50 top telecasts of 2017 were NFL (National Football League) games, with an average of 15.8 million viewers through the season. What made them very interesting was the kind of statistics about the game and the players that the NFL was able to present, using an advanced system to collect data and draw insights from it.

NFL uses Next Gen Stats (NGS), a sophisticated tracking technology to collect data about the players in its 32 teams, playing in different stadiums at any given moment. It uses RFID devices embedded in the shoulder pads of the players to identify the location of the player within inches and calculate the speed and direction in which they are moving. This wealth of information shared through its multiple media partners hooks its more than 180 million fans worldwide. NFL partners with Amazon Web Services (AWS) to leverage its data using QuickSight for generating sophisticated analytics and machine learning models to interpret the gameplay, enthrall viewers, and improve the quality of the game.  

To know more about how Indium can help you leverage QuickSight for your cloud BI, contact us now

Get in touch

Amazon QuickSight is a cloud-based, machine learning-powered business intelligence tool that uses natural language for exploring data using interactive dashboards. It helps business users identify patterns and outliers through relevant visualizations without the need for IT admins or experts to generate charts and graphs. It breaks silos by accessing data stored in different sources to provide a holistic view.

QuickSight Features

One of the key features of QuickSight is ML Insights, which enables advanced data analytics solutions for deeper insights by leveraging AWS’s machine learning (ML) and natural language capabilities. Using these powerful, out-of-the-box features, users can identify hidden trends and outliers, determine key business opportunities, and forecast trends to accelerate growth. It does not require technical know-how or ML experience to use the tool, thus empowering business users with the tool to generate their insights quickly and easily to make timely decisions. 

Are you looking for an AWS partner for your clouds and data analytics implementations? Contact us

Get in touch

Contextual visualization is made possible by QuickSight’s machine learning capabilities that assess the meaning and relationships between the different business data. This helps improve the accuracy of the response to queries, supported by relevant visualizations. 

The tool can be trained for different functions and using the language and terms relevant to your organization. This helps to customize and increase the relevance of the insights. Machine learning also helps to improve the answers to the questions over time. 

Amazon QuickSight allows the seamless embedding of interactive dashboards and visualizations into applications. This helps to personalize the user interface for reports and dashboards. This also allows for a consistent user experience across the application. 

The QuickSight Enterprise Edition allows the embedding of rich, interactive dashboards in wikis, apps, and portals without the need to call embedding APIs. It does not need server deployments or infrastructure licensing, allowing authorized end-users to access the dashboards instantly. With the QuickSight Enterprise Edition, custom-branded reports can be shared over emails to increase credibility.  

Ad-hoc analysis and ML capabilities help with detecting anomalies, forecasting trends, and customizing natural language queries to generate dashboards and reports. This also increases avenues for differentiating and monetizing your applications. 

Being serverless, QuickSight allows analytics workflows to be managed programmatically, whether that means setting up a multi-tenant architecture, isolating data between multiple users, automating deployments, moving dashboards across accounts, or managing access for users with Single Sign-On (SSO).
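To illustrate this programmatic management (a generic sketch, not from the article; the account ID, user ARN, and dashboard ID are placeholders), the boto3 QuickSight client can list dashboards and generate an embed URL for a registered user:

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Enumerate the dashboards in the account.
for summary in quicksight.list_dashboards(AwsAccountId="123456789012")["DashboardSummaryList"]:
    print(summary["Name"], summary["DashboardId"])

# Generate an embed URL for a registered QuickSight user (Enterprise Edition).
embed = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="123456789012",
    UserArn="arn:aws:quicksight:us-east-1:123456789012:user/default/analyst",
    ExperienceConfiguration={"Dashboard": {"InitialDashboardId": "YOUR_DASHBOARD_ID"}},
)
print(embed["EmbedUrl"])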

Benefits of QuickSight

Some of the key benefits of QuickSight include

Scalability: Using robust AWS APIs, developers can deploy scalable embedded analytics for thousands of users.  

Collaboration: Being cloud-based, QuickSight allows data visualizations and insights to be shared across the organization via the web, email, mobile, or embedded applications. 

Consistency: Admins can ensure consistent, scalable performance. Periodic updates help improve performance without version conflicts, any downtime, or compatibility issues.  

Cost-Effective: QuickSight’s pay-per-session pricing makes it a cost-effective solution even for large-scale deployments.  

Forecast Trends: Interactive, point-and-click what-if analysis using business metrics helps businesses predict future trends and make informed decisions. 

Auto-Narratives: Customizable auto-narratives that can be woven into dashboards provide deeper context for improving actionable insights. 

Self-Service: End users can generate their own reports without deep technical know-how, accelerating decision making. 

Simple Data Pipelines: With Amazon SageMaker integration, sophisticated ML models can be generated without the need for complex data pipelines. 

Compliance: The solution provides end-to-end encryption for data and encryption at rest for data in SPICE, ensuring security, governance, and compliance. 

Check out our customizable AI/ML-powered text analytics accelerator

Use Cases of Amazon QuickSight

Some of the use cases of Amazon QuickSight include sentiment analysis to understand customer views and customize products and solutions to improve customer delight. It can also help create focused campaigns and targeted promotions to increase sales.  

The forecasting ability can help predict demand and manage production schedules effectively to reduce costs and wastage and improve profitability. It can also increase operational efficiency by identifying strengths and areas of improvement. 

QuickSight can be used across industries and in organizations of any size. It is a next-gen solution that can scale as the company grows, thereby protecting investments and increasing RoI. 

Indium Approach to QuickSight

Indium Software is an Amazon partner with a team of experts in AWS solutions, including QuickSight, that helps businesses break barriers to innovation. Our team of data engineers has experience working across industries and functional areas, which helps us develop solutions that incorporate best practices for optimal results. 

The post Using Amazon QuickSight for Cloud BI with ML Capabilities  appeared first on Indium.

]]>
Why Indium is the Right Partner for Advanced Analytics https://www.indiumsoftware.com/blog/why-indium-is-the-right-partner-for-advanced-analytics/ Mon, 27 Apr 2020 09:38:09 +0000 https://www.indiumsoftware.com/blog/?p=3035 In a recent report published by Gartner, highlighting the Top 10 Data and Analytics Trends, there was a section dedicated to how augmented analytics will be the dominant driver of new purchases related to analytics and Business Intelligence (BI).   The phrase augmented analytics refers to the use of machine learning (ML) and natural language processing

The post Why Indium is the Right Partner for Advanced Analytics appeared first on Indium.

]]>
In a recent report published by Gartner, highlighting the Top 10 Data and Analytics Trends, there was a section dedicated to how augmented analytics will be the dominant driver of new purchases related to analytics and Business Intelligence (BI).  

The phrase augmented analytics refers to the use of machine learning (ML) and natural language processing (NLP) to enhance data analytics, data sharing and business intelligence.

According to the report, businesses will increasingly implement augmented analytics where automation will help them find the most important insights – both opportunities and risks. It’ll optimize the entire process of decision making, bringing in efficiency into several key processes that may have been manual in the past.

Advantage Indium: Expertise + Experience

Indium Software is a cutting-edge Advanced Analytics solutions provider that leverages machine learning and artificial intelligence to automate data-centric applications. It offers clustering, regression, and classification services, extracting insights not just from numeric data but also from text, audio-visual, and image inputs.

Check out our Advanced Analytics Services

Read More

Indium also has cross-domain experience, having served clients across sectors including banking and financial services, retail, e-commerce, FMCG and next-generation technology companies. Indium’s current clients include a Fortune 500 FMCG company and one of the world’s leading online cab operators.

The company has built AI/ML and NLP-based solutions for different functional teams including those in marketing, finance and operations.

The team uses open-source tool sets such as Python and R, and proprietary tools such as SPSS, for data management, along with dashboarding tools such as QlikView, Tableau, Power BI, and R Shiny. The team's expertise in machine text analytics has also led to the development of teX.ai, a proprietary, patent-pending tool.

Moreover, Indium provides end-to-end services including Big Data Analytics, Pricing Analytics, Customer Analytics and Social Media analytics. The company is well-positioned to serve as a pure-play analytics partner or an end-to-end solutions provider delivering a range of analytics and digital services including product development.

Considering most modern applications (apps) incorporate data engineering and analytical models, it makes sense to work with a single partner, who can integrate big data solutions with product development.

In the last 20 years, Indium has worked on 350+ client engagements, serving a range of customers from startups to mid-size growth companies to Fortune 500 clients. From undertaking complete projects to augmenting resources, Indium Software has had the opportunity to display its capabilities and provide customer delight on every occasion.

Customer Success Story #1: Working with a leading price comparison platform for e-commerce

A leading online price comparison platform helps consumers with quick comparative information on various products from a large number of third-party online shopping players. The platform combines data collected from 1,500+ online retailers. The key challenge in the workflow was to give end users real-time visibility of products (including availability, description, etc.) and their dynamically changing prices.

However, each site would have its own breadcrumb trail, making it difficult to identify and process all products that belonged to the same sub-category. Indium was able to use text analytics to process data in real time with minimal lag and effectively crawl dozens of e-commerce sites to return only the most relevant information with the correct taxonomy.

Using batch processing, the team was able to:

– Reduce the processing time from 25 hours to 7 hours
– Reduce the data refresh and response time from 2 seconds to 1 second, providing real-time price information
– Reduce the cost of operations (including the need for manual tasks) by 35 percent

Indium’s text analytics capabilities delivered value for money for a tough problem with limited resources and in a short duration of time.

Subsequently, the client engaged Indium for two other projects: one in web analytics and another in data-driven campaign management.

Customer Success Story #2: Working with a fintech company specializing in payday advances

A Singapore-based financial services company offering a streamlined mobile app that gives payday advances was embarking on a technology modernization program to improve business workflow, increase revenue and decrease costs.

Apart from improving the performance of several business operations and legacy systems, the company was looking to derive insights from advanced analytics in two specific areas: Churn Analysis and User Profiling.

Using tool sets such as Elasticsearch, R, and Python, and techniques such as logistic regression, XGBoost, and K-means clustering for descriptive and predictive analytics, Indium was able to increase customer retention, lower costs by reducing loan losses, and increase 'tips', which were the client's main source of revenue.

· Effective user profiling led to a 10 percent increase in the user acquisition rate
· Identifying and addressing the underlying reasons led to a churn reduction of more than 20 percent
· The combined power of an operational and intuitive interface resulted in a surge in tip collections in the range of 6.5 percent
· Potentially bad clients were weeded out

Our approach to a client engagement

Indium is highly process-driven and follows a three-step process (a generic sketch of this flow follows the list):

· Step 1: Data preparation.
· Step 2: Exploratory data analysis, a preliminary investigation to assess the data and correlate it with the final output.
· Step 3: Models are built and fine-tuned through training and testing on datasets. Once tuned to the optimum extent, the model is deployed and integrated with the client's system.
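The sketch below is a generic Python illustration of this three-step flow, not Indium's actual implementation; the dataset, column names, and parameter grid are placeholders.

import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: Data preparation (placeholder file and target column).
df = pd.read_csv("customer_data.csv").dropna()
X, y = df.drop(columns=["churned"]), df["churned"]

# Step 2: Exploratory analysis - summaries and correlation with the target.
print(df.describe())
print(df.select_dtypes("number").corr()["churned"].sort_values(ascending=False))

# Step 3: Build, tune, and validate the model before deployment.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("Best CV score:", search.best_score_, "Test score:", search.score(X_test, y_test))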

Leverage your Biggest Asset: Data

Inquire Now

In some projects involving text analytics, multiple models are explored before arriving at the optimal solution keeping in mind the client’s context.

As mentioned before, Indium Software is a trusted partner for fast growing organizations around the world. We bring to the fore our deep expertise in advanced analytics, to formulate data-driven solutions for a multitude of business situations.

The post Why Indium is the Right Partner for Advanced Analytics appeared first on Indium.

]]>
Top AI and ML Trends in 2019! https://www.indiumsoftware.com/blog/top-ai-and-ml-trends/ Sun, 19 May 2019 12:49:00 +0000 https://www.indiumsoftware.com/blog/?p=347 Introduction Since, John McCarthy coined the term AI way back in 1955; AI has come a really long way. Fast forward 63 years and artificial intelligence is transforming industries like healthcare, fintech and various other industries across the spectrum. The end goal to achieve the most human like AI is still a little far from

The post Top AI and ML Trends in 2019! appeared first on Indium.

]]>
Introduction

Since John McCarthy coined the term AI back in 1955, AI has come a really long way. Fast forward 63 years, and artificial intelligence is transforming healthcare, fintech, and various other industries across the spectrum.

The end goal of achieving truly human-like AI is still a little out of reach. However, advancements in machine learning (ML) and big data have helped AI go mainstream.

Let’s have a look at the top AI and ML trends in 2019.

1. Health

A report by Accenture suggests that the AI healthcare market in the US is expected to reach $6.6 billion by 2021. In the health technology space, diagnostic and medical imaging companies are fueling the majority of AI's growth. There is proof of this.

A cloud-based AI assistant called Arterys is being used by radiologists today. In February 2018, Arterys received FDA clearance to analyze images of tumors in the liver and lungs using its Oncology AI suite.

Researchers from the University of Singapore used AI to identify and optimize different drug combinations for the treatment of myeloma.

This shows that AI is taking personalized healthcare and precision medicine to another level.

2. Fintech

Cash advance loans, mobile payments, asset management, and a whole gamut of technologies have already established fintech as a disruptive force in its own right.

When you add artificial intelligence to this mix, the fintech industry begins to soar. You can now deposit checks to your preferred bank via your mobile device using image recognition.

If that wasn't enough, the first AI-powered exchange-traded fund (ETF) was launched last October.

Check out our Machine Learning and Deep Learning Services

Read More

Another example from this space is Quantopian, a community of quantitative traders who test and share trading algorithms.

A few of these are powered by Machine Learning.

3. Energy

From the deep pockets of oil and gas to the greener pastures of renewable tech, this industry generates a lot of data.

Energy producers and consumers alike aim to use the electrical grid optimally.

For this to happen, AI is ideal as it processes large datasets and produces actionable insights with ease.

Take the case of Athena, a storage optimizer. Athena streamlines the timing of energy use by processing 400 MB of energy data per minute across close to 800 energy storage systems, saving its customers $8 million annually.

Another example is the smart thermostat Nest, which reduces energy consumption by adapting to the habits of the house's occupants.

The application of AI has come so far that Great Britain's National Grid and Google's DeepMind are on the verge of a partnership to streamline the UK's electrical grid.

4. Enterprise

Technavio's market research reveals that the global enterprise AI market will grow at a 45% CAGR from 2018 to 2022.

This growth is largely due to the increasing number of businesses using customer service chatbots.

This substantial growth can also be attributed to ML solutions that help process big data. The big guns like Amazon, Alphabet, Microsoft, and IBM provide and support the AI frameworks and tools that have helped bring ML to the masses.

Google's DeepMind, AWS combined with Apache Spark, and IBM Watson help big enterprises handle extremely large volumes of data.

5. Retail

The retail industry is a proving ground for innovation in the AI space. Chatbots are retailers' answer to reducing customer service overheads.

Predictive analytics is used to optimize product pricing and build customer personas from the reservoir of data.

AI in retail is so common today that almost all of us are aware of it. The technology has matured, and businesses have become increasingly creative with it.

Hard as it is to believe, chatbots are already old news. In physical retail spaces, AI-powered robots are used to boost footfall by 70%.

It has even gone so far that Kairos, an AI platform, uses facial recognition to learn a customer's preferences as soon as they walk through the door and notifies store executives.

When SoftBank is willing to invest $750 million in Zume, a robotic pizza delivery service, you know the future is now!

I have only touched on a few trends, industries, and applications of AI and ML. Artificial intelligence and machine learning are revolutionizing every sector and industry as we know them.

Leverage your Biggest Asset: Data

Inquire Now

In this blog, I have just scratched the surface of how far AI and ML have come and how far they can go.

From being the stuff of myth to becoming one of the most disruptive technologies ever invented, machine learning and artificial intelligence have become an integral part of our world and lives!

The post Top AI and ML Trends in 2019! appeared first on Indium.

]]>