AI Archives - Indium https://www.indiumsoftware.com/blog/tag/ai/

Collaboration of Synthetics: ML’s Evolutionary Edge https://www.indiumsoftware.com/blog/collaboration-of-synthetics-mls-evolutionary-edge/ Thu, 15 Feb 2024 07:54:00 +0000

In the world of data and analytics today, the appetite for data is bottomless. The big data analytics business is predicted to reach $103 billion this year, and 181 zettabytes of data will be produced by 2025.

Despite the massive volumes of data being generated, access and availability remain a problem. Although public databases partially address this issue, certain risks are still involved. One of them is bias caused by improper use of data sets. Another is the need for diverse data to train algorithms so that they properly satisfy real-world requirements. Data accuracy also affects the quality of the algorithm, and real data is regulated to preserve privacy and can be expensive to obtain.

These problems can be resolved by using synthetic data, which enables businesses to quickly produce the data sets required to satisfy the demands of their clients. Gartner predicts that by 2030, synthetic data will likely surpass real data in AI models, even though real data is still regarded as superior.

Decoding the Exceptional Synthetic Data

So, what do we mean by synthetic data? At the forefront of modern data-driven research, institutions like the Massachusetts Institute of Technology (MIT) are pioneering the utilization of synthetic data. Synthetic data refers to artificially generated datasets that mimic real-world data distributions, maintaining statistical properties while safeguarding privacy. This innovative approach ensures that sensitive information remains confidential, as exemplified by MIT’s creation of synthetic healthcare records that retain essential patterns for analysis without compromising patient privacy. This technique’s relevance extends to various domains, from machine learning advancements to societal insights, offering a powerful tool to unlock valuable knowledge while upholding data security and ethical considerations.

Using synthetic data, new systems can be tested without live data or if the data is biased. Small datasets not being used can be supplemented, and the accuracy of learning models can be improved. Synthetic data can also be used when real data cannot be used, shared, or moved. It can create prototypes, conduct product demos, capture market trends, and prevent fraud. It can even be used to generate novel, futuristic conditions.

Most importantly, it can help businesses comply with privacy laws, mainly health-related and personal data. It can reduce the bias in data sets by providing diverse data that reflects the real world better.

Use Cases of Synthetic Data

Synthetic data can be used in different industries for different use cases. For instance, computer graphics and image processing algorithms can generate synthetic images, audio, and video that can be used for training purposes.

Synthetic text data can be used for sentiment analysis or for building chatbots and machine translation algorithms. Synthetically generated tabular data sets are used in data analytics and training models. Unstructured data, including images, audio, and video, are being leveraged for speech recognition, computer vision, and autonomous vehicle technology. Financial institutions can use synthetic data to detect fraud, manage risks, and assess credit risk. In the manufacturing industry, it can be used for quality control testing and predictive maintenance.

Also read: The Transformative Impact Of Generative AI On The Future Of Work.

Generating Synthetic Data
How synthetic data is generated will depend on the tools and algorithms used and the use case for which it is created. Three of the popular techniques used include:

Technique #1 Random Selection of Numbers: One standard method is randomly selecting numbers from a distribution. Though this may not provide the insights of real-world data, it produces a data distribution that closely matches it.
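
As a minimal illustration of this technique (the column names, distributions, and parameters below are invented purely for the example), a few lines of NumPy and pandas are enough to draw synthetic records from chosen distributions:

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # fixed seed so the synthetic data is reproducible

# Draw 1,000 synthetic "customers" from distributions chosen to resemble real data
synthetic = pd.DataFrame({
    "age": rng.normal(loc=38, scale=10, size=1000).clip(18, 90).round(),
    "monthly_spend": rng.lognormal(mean=5.5, sigma=0.6, size=1000).round(2),
    "segment": rng.choice(["retail", "premium", "business"], size=1000, p=[0.6, 0.3, 0.1]),
})
print(synthetic.describe(include="all"))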

Technique #2 Generating Agent-based Models: Unique agents are created using simulation techniques to enable them to communicate with each other. This is especially useful in complex systems where multiple agents, such as mobile phones, apps, and people, are required to interact with each other. Pre-built core components and Python packages such as Mesa are used to develop the models quickly, and a browser-based interface is used to view them.
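
A rough sketch of this idea is shown below, using the classic Mesa 1.x/2.x API (newer Mesa releases change the interface, and the agent and model names here are invented for illustration): agents are classes with a step method, and a scheduler steps them each tick.

from mesa import Agent, Model
from mesa.time import RandomActivation

class PhoneUser(Agent):
    """A hypothetical agent that occasionally messages another random agent."""
    def __init__(self, unique_id, model):
        super().__init__(unique_id, model)
        self.messages_sent = 0

    def step(self):
        others = list(self.model.schedule.agents)
        other = self.random.choice(others)
        if other is not self:
            self.messages_sent += 1  # stand-in for a richer interaction

class MessagingModel(Model):
    """A toy model that steps every agent once per tick."""
    def __init__(self, n_agents):
        super().__init__()
        self.schedule = RandomActivation(self)
        for i in range(n_agents):
            self.schedule.add(PhoneUser(i, self))

    def step(self):
        self.schedule.step()

model = MessagingModel(n_agents=50)
for _ in range(10):
    model.step()
print(sum(a.messages_sent for a in model.schedule.agents))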

Technique #3 Generative Models: Synthetic data replicating the statistical properties or features of real-world data is generated using algorithms. A generative model learns the statistical patterns and relationships in the training data and then generates new synthetic data similar to the original. Generative adversarial networks and variational autoencoders are examples of generative models.
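
GANs and VAEs are too involved for a short snippet, but the underlying idea (fit a model to the real data's distribution, then sample brand-new records from it) can be illustrated with a simpler generative model from scikit-learn; the data here is made up for the example.

import numpy as np
from sklearn.mixture import GaussianMixture

# Pretend this is the real (numeric) data we are allowed to learn from
real_data = np.random.default_rng(0).normal(loc=[50, 3.2], scale=[12, 0.8], size=(500, 2))

# Fit a generative model to the real distribution...
gmm = GaussianMixture(n_components=3, random_state=0).fit(real_data)

# ...then draw new synthetic rows that follow the same statistics
synthetic_rows, _ = gmm.sample(n_samples=200)
print(synthetic_rows[:5])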

The quality of the generative model must be reliable to ensure the quality of the synthetic data. Additional verification is required, which involves comparing the model's results with real-world data that has been annotated manually. Users must be sure that the synthetic data is not misleading, is reliable, and is fail-safe with respect to privacy.

Synthetic Data with Databricks

Databricks offers dbldatagen, a Python library, to generate synthetic data for testing, creating POCs, and other uses such as Delta Live Tables pipelines in Databricks environments. It helps to:

● Create unique values for a column.
● Generate templated text based on specifications.
● Generate data from a specific set of values.
● Generate weighted data when values repeat.
● Write the generated data frame to storage in any format.
● Generate billions of rows of data quickly.
● Use a random seed to generate data based on the values of other fields.
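
A minimal usage sketch of dbldatagen follows (the column names, value ranges, and option names such as uniqueValues are illustrative and may vary by library version, so treat this as a sketch rather than a definitive reference). It assumes a running Spark session exposed as spark, as in a Databricks notebook.

import dbldatagen as dg

# Specification for one million synthetic transaction rows across 8 partitions
spec = (
    dg.DataGenerator(spark, name="synthetic_transactions", rows=1_000_000, partitions=8)
    .withColumn("customer_id", "long", uniqueValues=100_000)                      # repeating customer ids
    .withColumn("amount", "float", minValue=1.0, maxValue=5000.0, random=True)    # random amounts
    .withColumn("channel", "string", values=["web", "mobile", "branch"], weights=[6, 3, 1])
)

df = spec.build()   # returns a Spark DataFrame
df.show(5)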


To learn more about Indium Software, please visit

Click Here

BFSI’s Tech Ride with NLP and Sentiment Analysis! Chatting with Erica, EVA, Amy, and Aida. https://www.indiumsoftware.com/blog/bfsi-tech-nlp-sentiment-analysis/ Tue, 17 Oct 2023 09:50:00 +0000

Have you crossed paths with Erica from Bank of America, EVA from HDFC, Amy from HSBC, or Aida from SEB in Sweden?

If you’ve been dealing with banks and financial organizations, chances are you’ve chatted with these super-smart virtual assistants and chatbots. The use of Natural Language Processing (NLP) in the financial sector has been on the rise worldwide. More and more financial institutions are embracing advanced tech innovations, taking NLP beyond banking, insurance, and hedge funds (especially for sentiment analysis).

Artificial Intelligence and Machine Learning, alongside NLP, are making their mark in various areas of the financial sector, such as operations, risk assessment, sales, research and development, and customer support, among other fields. This expansion boosts efficiency, productivity, cost-effectiveness, and time and resource management.

Take, for instance, the convenience it brings: Instead of the hassle of logging into individual accounts to check your balance, users can now effortlessly access their account information through chatbots and voice assistants. These digital companions are everywhere, from chatbots to voice assistants like Amazon Alexa, Google Assistant, and Siri.

Sentiment Analysis, often hailed as the next game-changer in the finance sector, plays a central role in chatbots, voice assistants, text analysis, and NLP technology. It’s a key component of natural language processing used to decipher the sentiments behind data. Companies frequently employ sentiment analysis on various text sources such as customer reviews, social media conversations, support tickets, and more to uncover genuine customer sentiments and evaluate brand perception.

Sentiment analysis aids in recognizing the polarity of information (positive or negative), emotional cues (like anger, happiness, or sadness), and intent (e.g., interest or disinterest). It is crucial in brand reputation management by providing insights into overall customer attitudes, challenges, and needs. This allows for data categorization by different sentiments, resulting in more accurate predictions and informed strategic decisions.
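
As a quick, illustrative sketch of this idea (the sample reviews are invented, and the default pipeline model is just a generic public checkpoint; a bank would typically fine-tune on its own labeled data), a pre-trained pipeline from the Hugging Face transformers library can assign polarity labels and scores to customer text:

from transformers import pipeline

# Downloads a small pre-trained sentiment model on first use
sentiment = pipeline("sentiment-analysis")

reviews = [
    "The new mobile app makes transfers effortless.",
    "I was on hold for 40 minutes and my issue is still unresolved.",
]
for review, result in zip(reviews, sentiment(reviews)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {review}")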

So, how can BFSI make the most of sentiment analysis? This emerging field has firmly rooted itself in the financial industry. Banks and financial institutions can employ AI-driven sentiment analysis systems to understand customer opinions regarding their financial products and the overall brand perception.

Of course, this approach may necessitate a certain level of data proficiency that financial companies must acquire before launching full-fledged sentiment analysis projects. Sentiment analysis stands as a highly promising domain within NLP and is undoubtedly poised to play a substantial role in the future of financial services.

Here, we’ll delve into the seven most prominent applications of sentiment analysis in financial services.

  1. Portfolio Management and Optimization: NLP can help financial professionals analyze vast amounts of textual data from financial news and market trends to assess the sentiment surrounding specific investments. This sentiment analysis can aid in making informed decisions about portfolio management, identifying potential risks, and optimizing investment strategies.
  2. Financial Data Analytics: Sentiment analysis enables financial firms to gauge the market’s sentiment toward specific assets or companies by analyzing news articles, social media, and reports. This information can be used to assess the volatility of investments and make data-driven decisions.
  3. Predictive Analysis: NLP can be used to analyze historical data and predict the future performance of investment funds. This involves assessing sentiment and other textual data to identify high-risk investments and optimize growth potential, even in uncertain market conditions.
  4. Customer Services and Analysis: Financial institutions employ NLP-driven chatbots and virtual assistants to enhance customer service. These AI-driven tools use NLP to process and understand customer queries, improving customer experience and satisfaction.
  5. Gathering Customer Insights: By applying sentiment analysis and intelligent document search, financial firms can gain insights into customer preferences, challenges, and overall sentiments. This information is valuable for personalizing offers, measuring customer response, and refining products and services.
  6. Researching Customer Emotional Responses: AI-powered tools process vast amounts of customer data, such as social media posts, chatbot interactions, reviews, and survey responses, to determine customer sentiments. This allows companies to better understand customer attitudes toward their products, services, and brands and analyze responses to competitors’ campaigns.
  7. Credit Market Monitoring: Sentiment analysis tracks credit sentiments in the media. Financial institutions can use NLP to process information from news articles and press releases to monitor the sentiment related to specific bonds or organizations. This data can reveal correlations between media updates and credit securities’ market performance, streamlining financial research efforts.

Future of NLP – Sentiment Analysis: Where does it stand today and tomorrow?

NLP has made significant strides in the banking and financial sector, supporting various services. It enables real-time insights from call transcripts, data analysis with grammatical parsing, and contextual analysis at the paragraph level. NLP solutions extract and interpret data to provide in-depth insights into profitability, trends, and future business performance in the market.

Soon, we can anticipate NLP, alongside NLU and NLG, being extensively applied to sentiment analysis and coreference resolution, further enhancing its role in this domain.

Training computers to comprehend and process text and speech inputs is pivotal in elevating business intelligence. Driven by escalating demand, Natural Language Processing (NLP) has emerged as one of AI’s most rapidly advancing subsectors. Experts anticipate the global NLP market reaching $239.9 billion by 2032, with a robust Compound Annual Growth Rate (CAGR) of 31.3%, per Allied Market Research.

NLP-based sentiment analysis is an innovative technique that enables financial companies to effectively process and structure extensive volumes of customer data, yielding maximum benefits for both banks and customers. This technology is positioned to empower traditional financial institutions and neo-banks alike, as it enhances current customer experiences, diminishes friction in financial services, and facilitates the creation of superior financial products.

In the finance and banking sectors, NLP is harnessed to streamline repetitive tasks, reduce errors, analyze sentiments, and forecast future performance by drawing insights from historical data. Such applications enable firms to realize time and cost savings, enhance productivity and efficiency, and uphold the delivery of quality services.

 

How the SDOH machine learning model improves patients’ health and your bottom line https://www.indiumsoftware.com/blog/how-the-sdoh-machine-learning-model-improves-patients-health/ Thu, 24 Aug 2023 12:36:50 +0000

Preventive care management—Transcending traditional ways

The healthcare paradigm is shifting from a reactive approach to a proactive and holistic model. Preventive care is important for staying healthy and identifying problems early before they lead to other complications or become more difficult to treat. While early intervention has proven instrumental in advancing diagnostics and treatments, a critical element has been missing until now: the incorporation of social determinants of health (SDOH). Recognizing that health outcomes are intricately woven into the fabric of our lives, the integration of SDOH into preventive care emerges as a transformative solution.

Beyond genetics and clinical data, social determinants encompass factors like socioeconomic status, living conditions, education, and access to nutritious food. By embedding these key influencers into preventive care, healthcare providers gain an unprecedented understanding of their patients’ lives, empowering them to offer personalized and proactive interventions.

Discover the transformative potential of our Social Determinants of Health (SDOH) model and its ability to revolutionize patient care while driving significant cost savings for payers and providers.

Download White Paper

Social Determinants of Health: Impact on healthcare outcomes

The non-medical elements that affect health outcomes are referred to as social determinants of health (SDOH). Socioeconomic position, education, physical environment and neighborhood, job, and social support systems are a few of these variables. SDOH has a major effect on health and can impact healthcare outcomes in a number of ways.

For example, a patient with a lower socioeconomic status is more likely to have chronic diseases, such as diabetes and heart disease. By understanding this patient’s social determinants, a healthcare provider can recommend preventive care measures that are tailored to their needs, such as financial assistance for medication or enrolling them in wellness programs.

Patient 360: A holistic view of patient data

Patient 360 is a comprehensive view of a patient’s health information, including their medical history, social determinants, and other relevant data. By integrating SDOH into patient 360, healthcare providers can gain a better understanding of the factors that are affecting their patients’ health and make more informed decisions about preventive care.

Here are some of the benefits of leveraging SDOH parameters in the patient 360 framework:

Better patient care: Integrating SDOH elements into the patient 360 approach helps improve treatment efficiency by empowering physicians to address the factors that influence healthcare outcomes. This can save time and resources, which can be used to provide better care for patients.

Enhanced patient engagement: Addressing SDOH factors helps enhance patient engagement by giving patients more awareness of their health data. This can lead to patients being more involved in their care management and being more likely to follow treatment plans.

Clinical notes to actionable insights: Physician notes record important patient medical histories, symptoms, demographics, and clinical data. These observations provide a holistic picture of the patient’s health. SDOH factors are important predictors of preventive care needs, which is why it is important to include them in patient records.

The integration of SDOH into patient 360 is a promising way to improve preventive care and achieve better health outcomes for all patients.

Manual SDOH data extraction: Typical challenges in the current system

Manually extracting social determinants of health (SDOH) elements poses numerous challenges that can hinder the efficiency and accuracy of the process. SDOH data is often embedded in unstructured sources such as physician notes, medical records, or social service assessments, making it laborious and time-consuming for healthcare professionals to extract relevant information. Here are some of the difficulties associated with manual data extraction for SDOH:

Unstructured data: SDOH elements are often scattered throughout free-text narratives that lack a standardized format.

Human error: Human analysts are susceptible to making errors during data extraction, leading to inaccuracies in the collected information.

Incomplete data capture: Due to the sheer volume of information, manually extracting SDOH elements from various sources may result in incomplete data capture.

Limited scalability: As healthcare organizations grow and data volumes increase, manual data extraction becomes less scalable and impractical.

Cracking the code of health: Indium’s SDOH machine learning model 

Indium’s expertise in developing the SDOH ML model is based on two pillars: NLP technology and a deep understanding of the healthcare landscape. With a team of experts in data science, engineering, and healthcare, Indium is at the forefront of using AI to transform preventive care.

Indium’s journey began with a recognition of the importance of social factors in determining health outcomes. The company’s ML model is designed to identify and address these factors, which can help improve the health of individuals and communities. Recognizing that manually extracting these factors from unstructured physician notes is labor-intensive and prone to errors, Indium sought to create an efficient and accurate solution. Leveraging Natural Language Processing (NLP) techniques, the team precisely crafted a robust ML model that swiftly identifies key social determinants hidden within vast amounts of textual data.

The success of Indium’s SDOH ML model lies in its ability to provide healthcare providers and payers with invaluable insights. By seamlessly integrating social determinants into preventive care, the model empowers stakeholders to offer personalized preventive interventions, optimize patient care, and drive cost savings within the healthcare ecosystem.

Uncover the unique insights and benefits our SDOH model offers, and witness how it can be seamlessly integrated into existing healthcare systems to optimize care delivery.

Download White Paper

SDOH ML model

ML techniques can be used to identify and extract SDOH from physician notes. These techniques can identify patterns in text, such as the presence of certain words or phrases that are associated with SDOH. For example, the phrase “food insecurity” might be associated with the SDOH of food insecurity. By using the SDOH ML model, healthcare providers can make the right interventions to help improve healthcare outcomes and reduce costs.
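
A minimal sketch of this pattern-matching idea is shown below, using spaCy's PhraseMatcher with a tiny, hand-picked SDOH vocabulary. The terms and the sample note are illustrative only; Indium's production model is an ML/NLP model rather than a simple keyword lookup.

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer only; no trained model download required
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Toy dictionary mapping SDOH categories to trigger phrases
sdoh_terms = {
    "FOOD_INSECURITY": ["food insecurity", "skips meals"],
    "HOUSING": ["homeless", "unstable housing"],
    "TRANSPORT": ["no transportation"],
}
for label, phrases in sdoh_terms.items():
    matcher.add(label, [nlp.make_doc(p) for p in phrases])

note = nlp("Patient reports unstable housing and skips meals toward the end of the month.")
for match_id, start, end in matcher(note):
    print(nlp.vocab.strings[match_id], "->", note[start:end].text)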

Once SDOH have been identified and extracted from physician notes, they can be integrated into preventive care management. This information can be used to provide a more comprehensive understanding of the patient’s overall well-being and to develop a more personalized treatment plan.

The power of precision: Partner with Indium

As a leading healthcare service provider and a leader in the digital engineering space, Indium has developed the SDOH machine learning model. Understanding the profound influence that social factors have on health outcomes, and recognizing that this information is crucial to transformative advancements in patient care, Indium has trained the SDOH model to accurately extract social factors from patient records. Beyond improving patient care, the integration of social determinants also serves as a strategic tool for reducing healthcare costs by proactively addressing health issues. Unlike the traditional method, our model is 90% accurate and can identify SDOH attributes from thousands of patient records in a matter of seconds.

Want to learn in detail about how our SDOH model empowers payers and providers to transform patient care and drive significant cost savings?

Download White Paper

Maximizing AI and ML Performance: A Guide to Effective Data Collection, Storage, and Analysis https://www.indiumsoftware.com/blog/maximizing-ai-and-ml-performance-a-guide-to-effective-data-collection-storage-and-analysis/ Fri, 12 May 2023 11:42:41 +0000

Data is often referred to as the new oil of the 21st century because it is a valuable resource that powers the digital economy, much as oil fueled the industrial economy of the 20th century. Like oil, data is a raw material that must be collected, refined, and analyzed to extract its value. Companies are collecting vast amounts of data from various sources, such as social media, internet searches, and connected devices. This data can then be used to gain insights into customer behavior, market trends, and operational efficiencies.

In addition, data is increasingly being used to power artificial intelligence (AI) and machine learning (ML) systems, which are driving innovation and transforming businesses across various industries. AI and ML systems require large amounts of high-quality data to train models, make predictions, and automate processes. As such, companies are investing heavily in data infrastructure and analytics capabilities to harness the power of data.

Data is also a highly valuable resource because it is not finite, meaning that it can be generated, shared, and reused without diminishing its value. This creates a virtuous cycle where the more data that is generated and analyzed, the more insights can be gained, leading to better decision-making, increased innovation, and new opportunities for growth. Thus, data has become a critical asset for businesses and governments alike, driving economic growth and shaping the digital landscape of the 21st century.

There are various data storage methods in data science, each with its own strengths and weaknesses. Some of the most common data storage methods include:

  • Relational databases: Relational databases are the most common method of storing structured data. They are based on the relational model, which organizes data into tables with rows and columns. Relational databases use SQL (Structured Query Language) for data retrieval and manipulation and are widely used in businesses and organizations of all sizes.
  • NoSQL databases: NoSQL databases are a family of databases that do not use the traditional relational model. Instead, they use other data models such as document, key-value, or graph-based models. NoSQL databases are ideal for storing unstructured or semi-structured data and are used in big data applications where scalability and flexibility are key.
  • Data warehouses: Data warehouses are specialized databases that are designed to support business intelligence and analytics applications. They are optimized for querying and analyzing large volumes of data and typically store data from multiple sources in a structured format.
  • Data lakes: Data lakes are a newer type of data storage method that is designed to store large volumes of raw, unstructured data. Data lakes can store a wide range of data types, from structured data to unstructured data such as text, images, and videos. They are often used in big data and machine learning applications.
  • Cloud-based storage: Cloud-based storage solutions, such as Amazon S3, Microsoft Azure, or Google Cloud Storage, offer scalable, secure, and cost-effective options for storing data. They are especially useful for businesses that need to store and access large volumes of data or have distributed teams that need access to the data.
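
As a small example of the cloud-based option (the bucket and object names below are made up, and the snippet assumes default Google Cloud credentials are configured), uploading and reading back a file with the google-cloud-storage client looks like this:

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()                      # uses your default GCP credentials
bucket = client.bucket("my-analytics-bucket")  # hypothetical bucket name

# Upload a local CSV, then read it back as bytes
blob = bucket.blob("raw/transactions_2023.csv")
blob.upload_from_filename("transactions_2023.csv")
data = blob.download_as_bytes()
print(f"Stored {len(data)} bytes in gs://{bucket.name}/{blob.name}")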

To learn more about: How AI and ML models are assisting the retail sector in reimagining the consumer experience.

Data collection is an essential component of data science and there are various techniques used to collect data. Some of the most common data collection techniques include:

  • Surveys: Surveys involve collecting information from a sample of individuals through questionnaires or interviews. Surveys are useful for collecting large amounts of data quickly and can provide valuable insights into customer preferences, behavior, and opinions.
  • Experiments: Experiments involve manipulating one or more variables to measure the impact on the outcome. Experiments are useful for testing hypotheses and determining causality.
  • Observations: Observations involve collecting data by watching and recording behaviors, actions, or events. Observations can be useful for studying natural behavior in real-world settings.
  • Interviews: Interviews involve collecting data through one-on-one conversations with individuals. Interviews can provide in-depth insights into attitudes, beliefs, and motivations.
  • Focus groups: Focus groups involve collecting data from a group of individuals who participate in a discussion led by a moderator. Focus groups can provide valuable insights into customer preferences and opinions.
  • Social media monitoring: Social media monitoring involves collecting data from social media platforms such as Twitter, Facebook, or LinkedIn. Social media monitoring can provide insights into customer sentiment and preferences.
  • Web scraping: Web scraping involves collecting data from websites by extracting information from HTML pages. Web scraping can be useful for collecting large amounts of data quickly.
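
For instance, the web-scraping technique above can be as simple as a few lines of requests and BeautifulSoup. The URL and CSS selector here are placeholders, and you should always check a site's terms of use and robots.txt before scraping.

import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"  # placeholder URL
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
# Collect the text of every element matching a (hypothetical) product-name selector
names = [tag.get_text(strip=True) for tag in soup.select(".product-name")]
print(names[:10])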

Data analysis is an essential part of data science and there are various techniques used to analyze data. Some of the top data analysis techniques in data science include:

  • Descriptive statistics: Descriptive statistics involve summarizing and describing data using measures such as mean, median, mode, variance, and standard deviation. Descriptive statistics provide a basic understanding of the data and can help identify patterns or trends.
  • Inferential statistics: Inferential statistics involve making inferences about a population based on a sample of data. Inferential statistics can be used to test hypotheses, estimate parameters, and make predictions.
  • Data visualization: Making charts, graphs, and other visual representations of data to better understand patterns and relationships is known as data visualization. Data visualization is helpful for expressing complex information and spotting trends or patterns that might not be immediately apparent from the data.
  • Machine learning: Machine learning involves using algorithms to learn patterns in data and make predictions or decisions based on those patterns. Machine learning is useful for applications such as image recognition, natural language processing, and recommendation systems.
  • Text analytics: Text analytics involves analyzing unstructured data such as text to identify patterns, sentiment, and topics. Text analytics is useful for applications such as customer feedback analysis, social media monitoring, and content analysis.
  • Time series analysis: Time series analysis involves analyzing data over time to identify trends, seasonality, and cycles. Time series analysis is useful for applications such as forecasting, trend analysis, and anomaly detection.
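
A compact illustration of the first and third ideas above, descriptive statistics plus a quick visualization, using pandas and matplotlib (the data and column names are hypothetical):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales data standing in for a real extract
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "revenue": [1200.0, 950.0, 1430.0, 870.0, 1010.0, 990.0],
})

# Descriptive statistics: count, mean, std, and quartiles for numeric columns
print(df.describe())

# Aggregate and visualize to spot patterns the raw numbers hide
df.groupby("region")["revenue"].mean().plot(kind="bar", title="Average revenue by region")
plt.tight_layout()
plt.show()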

Use Cases

To illustrate the importance of data in AI and ML, let’s consider a few use cases:

  • Predictive Maintenance: In manufacturing, AI and ML can be used to predict when machines are likely to fail, enabling organizations to perform maintenance before a breakdown occurs. To achieve this, the algorithms require vast amounts of data from sensors and other sources to learn patterns that indicate when maintenance is necessary.
  • Fraud Detection: AI and ML can also be used to detect fraud in financial transactions. This requires large amounts of data on past transactions to train algorithms to identify patterns that indicate fraudulent behavior.
  • Personalization: In e-commerce, AI and ML can be used to personalize recommendations and marketing messages to individual customers. This requires data on past purchases, browsing history, and other customer behaviors to train algorithms to make accurate predictions.

Real-Time Analysis

To achieve optimal results in AI and ML applications, data must be analyzed in real-time. This means that organizations must have the infrastructure and tools necessary to process large volumes of data quickly and accurately. Real-time analysis also requires the ability to detect and respond to anomalies or unexpected events, which can impact the accuracy of the algorithms.

Wrapping Up

In conclusion, data is an essential component of artificial intelligence (AI) and machine learning (ML) applications. Collecting, storing, and analyzing data effectively is crucial to maximizing the performance of AI and ML systems and obtaining optimal results. Data visualization, machine learning, time series analysis, and other data analysis techniques can be used to gain valuable insights from data and make data-driven decisions.

No matter where you are in your transformation journey, contact us and our specialists will help you make technology work for your organization.

Click here

 

Revolutionizing Data Warehousing: The Role of AI & NLP https://www.indiumsoftware.com/blog/revolutionizing-data-warehousing-the-role-of-ai-nlp/ Wed, 10 May 2023 13:07:04 +0000

In today’s quick-paced, real-time digital era, does the data warehouse still have a place? Absolutely! Despite the rapid advancements in technologies such as AI and NLP, data warehousing continues to play a crucial role in today’s fast-moving, real-time digital enterprise. Gone are the days of traditional data warehousing methods that relied solely on manual processes and limited capabilities. With the advent of AI and NLP, data warehousing has transformed into a dynamic, efficient, and intelligent ecosystem, empowering organizations to harness the full potential of their data and gain invaluable insights.

The integration of AI and NLP in data warehousing has opened new horizons for organizations, enabling them to unlock the hidden patterns, trends, and correlations within their data that were previously inaccessible. AI, with its cognitive computing capabilities, empowers data warehousing systems to learn from vast datasets, recognize complex patterns, and make predictions and recommendations with unprecedented accuracy. NLP, on the other hand, enables data warehousing systems to understand, analyze, and respond to human language, making it possible to derive insights from non-formatted data sources such as social media posts, customer reviews, and textual data.

The importance of AI and NLP in data warehousing cannot be overstated. These technologies are transforming the landscape of data warehousing in profound ways, offering organizations unparalleled opportunities to drive innovation, optimize operations, and gain a competitive edge in today’s data-driven business landscape.

Challenges Faced by C-Level Executives

Despite the immense potential of AI and NLP in data warehousing, C-level executives face unique challenges when it comes to implementing and leveraging these technologies. Some of the key challenges include:

  • Data Complexity: The sheer volume, variety, and velocity of data generated by organizations pose a significant challenge in terms of data complexity. AI and NLP technologies need to be able to handle diverse data types, formats, and sources, and transform them into actionable insights.
  • Data Quality and Accuracy: The accuracy and quality of data are critical to the success of AI and NLP in data warehousing. Ensuring data accuracy, consistency, and integrity across different data sources can be a daunting task, requiring robust data governance practices.
  • Talent and Skills Gap: Organizations face a shortage of skilled professionals who possess the expertise in AI and NLP, making it challenging to implement and manage these technologies effectively. C-level executives need to invest in building a skilled workforce to leverage the full potential of AI and NLP in data warehousing.
  • Ethical and Legal Considerations: The ethical and legal implications of using AI and NLP in data warehousing cannot be ignored. Organizations need to adhere to data privacy regulations, ensure transparency, and establish ethical guidelines for the use of AI and NLP to avoid potential risks and liabilities.

Also check out our Success Story on Product Categorization Using Machine Learning To Boost Conversion Rates.

The Current State of Data Warehousing

  • Increasing Data Complexity: In today’s data-driven world, organizations are grappling with vast amounts of data coming from various sources such as social media, IoT devices, and customer interactions. This has led to data warehousing becoming more complex and challenging to manage.
  • Manual Data Processing: Traditional data warehousing involves manual data processing, which is labor-intensive and time-consuming. Data analysts spend hours sifting through data, which can result in delays and increased chances of human error.
  • Limited Insights: Conventional data warehousing provides limited insights, as it relies on predefined queries and reports, making it difficult to discover hidden patterns and insights buried in the data.
  • Language Barriers: Data warehousing often faces language barriers, as data is generated in various languages, making it challenging to process and analyze non-English data.

The Future of Data Warehousing

  • Augmented Data Management: AI and NLP are transforming data warehousing with augmented data management capabilities, including automated data integration, data profiling, data quality assessment, and data governance.
  • Automation with AI & NLP: The future of data warehousing lies in leveraging the power of AI and NLP to automate data processing tasks. AI-powered algorithms can analyze data at scale, identify patterns, and provide real-time insights, reducing manual efforts and improving efficiency.
  • Enhanced Data Insights: With AI and NLP, organizations can gain deeper insights from their data. These technologies can analyze unstructured data, such as social media posts or customer reviews, to uncover valuable insights and hidden patterns that can inform decision-making.
  • Advanced Language Processing: NLP can overcome language barriers in data warehousing. It can process and analyze data in multiple languages, allowing organizations to tap into global markets and gain insights from multilingual data.
  • Predictive Analytics: AI and NLP can enable predictive analytics in data warehousing, helping organizations forecast future trends, identify potential risks, and make data-driven decisions proactively. Example: By using predictive analytics through AI and NLP, a retail organization can forecast the demand for a particular product during a particular time and adjust its inventory levels accordingly, reducing the risk of stockouts and improving customer satisfaction.
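
The retail demand-forecasting example above can be prototyped in a few lines once historical demand has been extracted from the warehouse. This sketch uses Holt-Winters exponential smoothing from statsmodels on made-up monthly figures; the numbers and seasonal pattern are illustrative only.

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical monthly unit sales for one product (3 years): trend plus a year-end bump
index = pd.date_range("2020-01-01", periods=36, freq="MS")
demand = pd.Series(
    [100 + 3 * i + (20 if i % 12 in (10, 11) else 0) for i in range(36)],
    index=index,
)

model = ExponentialSmoothing(demand, trend="add", seasonal="add", seasonal_periods=12).fit()
print(model.forecast(6))  # projected demand for the next six months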

Discover how Indium Software is harnessing the power of AI & NLP for data warehousing.

Contact us

Conclusion

In conclusion, AI and NLP are reshaping the landscape of data warehousing, enabling automation, enhancing data insights, overcoming language barriers, and facilitating predictive analytics. Organizations that embrace these technologies will be better positioned to leverage their data for competitive advantage in the digital era. At Indium Software, we are committed to harnessing the power of AI and NLP to unlock new possibilities in data warehousing and help businesses thrive in the data-driven world.

Training Custom Machine Learning Model on Vertex AI with TensorFlow https://www.indiumsoftware.com/blog/training-custom-machine-learning-model-on-vertex-ai-with-tensorflow/ Fri, 03 Feb 2023 12:11:24 +0000

“Vertex AI is Google’s platform, which provides many machine learning services such as training models using AutoML or custom training.”

AutoML vs Custom Training

To quickly compare AutoML and custom training functionality, and expertise required, check out the following table given by Google.

Choose a training method | Vertex AI | Google Cloud

In this article, we are going to train a custom machine learning model on Vertex AI with TensorFlow.

To learn about Vertex AI’s AutoML feature, read my previous blog: Machine Learning using Google’s Vertex AI.

About Dataset

We will be using the Crab Age Prediction dataset from Kaggle. The dataset is used to estimate the age of a crab based on its physical attributes.

To learn more about how our AI and machine learning capabilities can assist you.

Click here

There are 9 columns in the Dataset as follows.

  1. Sex: Crab gender (Male, Female and Indeterminate)
  2. Length: Crab length (in Feet; 1 foot = 30.48 cms)
  3. Diameter: Crab Diameter (in Feet; 1 foot = 30.48 cms)
  4. Height: Crab Height (in Feet; 1 foot = 30.48 cms)
  5. Weight: Crab Weight (in ounces; 1 Pound = 16 ounces)
  6. Shucked Weight: Without Shell Weight (in ounces; 1 Pound = 16 ounces)
  7. Viscera Weight: Viscera Weight
  8. Shell Weight: Shell Weight (in ounces; 1 Pound = 16 ounces)
  9. Age: Crab Age (in months)

We must predict the Age column with the help of the rest of the columns.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from Kaggle. There is only one csv file in the downloaded dataset, called CrabAgePrediction.csv; I have uploaded this csv to a bucket called vertex-ai-custom-ml on Google Cloud Storage.

Step 2: Working on Workbench

Go to Vertex AI, then to the Workbench section, and enable the Notebook API. Then click on New Notebook and select TensorFlow Enterprise; we are using TensorFlow Enterprise 2.6 without a GPU for this project. Make sure to select the us-central1 (Iowa) region.

It will take a few minutes to create the Notebook instance. Once the notebook is created click on the Open JupyterLab to launch the JupyterLab.

In JupyterLab, open the Terminal and run the following commands one by one.

mkdir crab_folder     # This will create crab_folder                       

cd crab_folder        # To enter the folder

mkdir trainer         # This will create trainer folder

touch Dockerfile      # This will create a Dockerfile

We can see all the files and folders on the left side of JupyterLab; from there, open the Dockerfile and start editing it with the following lines.

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6

WORKDIR /

COPY trainer /trainer

ENTRYPOINT ["python", "-m", "trainer.train"]

Now save the Dockerfile; with this, we have defined the entrypoint for the Docker image.

To save the model’s output, we’ll make a bucket called crab-age-pred-bucket.

For the model training file, I have already uploaded the Python file to a GitHub repository. To clone this repository, click on Git at the top of JupyterLab, select Clone a Repository, paste the repository link, and hit Clone.

In the Lab, we can see the crab-age-pred folder; copy the train.py file from this folder to crab_folder/trainer/.

Let’s look at the train.py file before we create the Docker IMAGE.

#Importing the required packages..
import numpy as np
import pandas as pd
import pathlib
import tensorflow as tf
#Importing tensorflow 2.6
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

#Reading data from the gcs bucket
dataset = pd.read_csv(r"gs://vertex-ai-custom-ml/CrabAgePrediction.csv")
dataset.tail()

BUCKET = 'gs://crab-age-pred-bucket'

dataset.isna().sum()
dataset = dataset.dropna()

#Data transformation..
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

#Dataset splitting..
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_stats = train_dataset.describe()

#Removing age column, since it is a target column
train_stats.pop("Age")
train_stats = train_stats.transpose()
train_stats

#Removing age column from train and test data
train_labels = train_dataset.pop('Age')
test_labels = test_dataset.pop('Age')

def norma_data(x):
    #To normalise the numerical values
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norma_data(train_dataset)
normed_test_data = norma_data(test_dataset)

def build_model():
    #model building function
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model

#model = build_model()
#model.summary()

model = build_model()

EPOCHS = 10

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model.fit(normed_train_data, train_labels,
                          epochs=EPOCHS, validation_split=0.2,
                          callbacks=[early_stop])

model.save(BUCKET + '/model')

Summary of train.py

When all of the necessary packages are imported, TensorFlow 2.6 will be used for modelling. Pandas is used to read the csv file stored in the vertex-ai-custom-ml bucket, and the BUCKET variable specifies the bucket where we will store the trained model.

We then apply some transformations, such as creating dummy variables for the categorical column. Next, we split the data into training and testing sets and normalize the data.

We wrote a function called build_model that defines a simple two-layer TensorFlow model, which is trained for ten epochs. Finally, we save the trained model to crab-age-pred-bucket/model on Cloud Storage.

Now, in the JupyterLab Terminal, execute the following cmd one by one to create a Docker IMAGE.

PROJECT_ID=crab-age-pred
IMAGE_URI="gcr.io/$PROJECT_ID/crab:v1"
docker build ./ -t $IMAGE_URI

Before running the build command make sure to enable the Artifact Registry API and Google Container Registry API by going to the APIs and services in Vertex AI.

After running these commands, our Docker image is built successfully. Now we will push the Docker image with the following command.

docker push $IMAGE_URI

Once pushed we can see our Docker IMAGE in the Container registry. To find the Container registry you can search it on Vertex AI.

Best Read: Our success story about how we assisted an oil and gas company, as well as Nested Tables and Machine Drawing Text Extraction

Step 3: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In Datasets select no managed dataset and click continue.

In Model details, I have given the model’s name as “pred-age-crab”; under the advanced options, select the available service account and keep the rest at the defaults. Make sure the service account has Cloud Storage permissions; if not, grant them from the IAM and Admin section.

Select the custom container for the Container image in the Training container. Navigate to and select the newly created Docker image. Next, navigate to and select the crab-age-pred-bucket in the Model output directory. Now press the continue button.

Ignore any selections for Hyperparameters and click Continue.

In Compute and pricing, Select the machine type n1-standard-32, 32 vCPUs, 120 GiB memory and hit continue.

For Prediction Container select Pre-Built container with TensorFlow Framework 2.6 and start the model training.

You can see the model in training in the Training section.

In about 8 minutes, our custom model training is finished.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. The region should be us-central1.

Give crab_age_pred as the name of Endpoint and click Continue.

In the Model Settings, select pred_age_crab as the Model Name, Version 1 as the Version, and 2 as the number of compute nodes, with n1-standard-8 (8 vCPUs, 30 GiB memory) as the Machine Type, and select the service account. Click Done and Create.

In Model monitoring ignore this selection and click create to implement the version.

It may take 11 minutes to deploy the model.

With the above step our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can make predictions. For this project we are going to use Python to make predictions. We will need to give the Vertex AI Admin and Cloud Storage Admin permissions to the service account; we can do that in the IAM and administration section of Google Cloud. Once the permissions are given, we will download the service account key in JSON format; it will be used to authenticate from our environment.

Following is the code used for the prediction.

pip install google-cloud-aiplatform

from typing import Dict
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import os

def predict_tabular_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com"):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # for more info on the instance schema, please use get_model_sample.py
    # and look at the yaml found in instance_schema_uri
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    predictions = response.predictions
    print(predictions)

#Authentication using service account.
#We are giving the path to the JSON key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/crab-age-pred-7c1b7d9be185.json"

#normalized values
inputs = [0, 0, 1, 1.4375, 1.175, 0.4125, 0.63571550, 0.3220325, 1.5848515, 0.747181]

project_id = "crab-age-pred"           #Project Id from the Vertex AI
endpoint_id = "7762332189773004800"    #Endpoint Id from the Endpoints section

predict_tabular_sample(project_id, endpoint_id, inputs)

Output

[[8.01214314]]

This is how we can make the predictions. For the inputs make sure to do the same transformation and normalizing which we have done for the training data.

With this, we have completed the project and learned how to train, deploy, and get predictions from a custom-trained ML model.

I hope you will find it useful.

See you again.

Kubeflow Pipeline on Vertex AI for Custom ML Models https://www.indiumsoftware.com/blog/kubeflow-pipeline-on-vertex-ai-for-custom-ml-models/ Thu, 02 Feb 2023 11:56:32 +0000

What is Kubeflow?

“Kubeflow is an open-source project created to help deploy ML pipelines. It uses components, written as Python functions, for each step of the pipeline. Each component runs in an isolated container with all the required libraries, and the components run in series, one after another.”

In this article we are going to train a custom machine learning model on Vertex AI using Kubeflow Pipeline.

About Dataset

Credit Card Customers dataset from Kaggle will be used. The 10,000 customer records in this dataset include columns for age, salary, marital status, credit card limit, credit card category, and other information. In order to predict the customers who are most likely to leave, we must analyse the data to determine the causes of customer churn.

Interesting Read: In the world of hacking, we’ve reached the point where we’re wondering who is a better hacker: humans or machines.

Let’s Start

Custom Model Training

Step 1: Getting Data

We will download the dataset from GitHub. There are two csv files in the downloaded dataset, called churner_p1 and churner_p2; I have created a BigQuery dataset credit_card_churn with the tables churner_p1 and churner_p2 from these csv files. I have also created a bucket called credit-card-churn on Cloud Storage. This bucket will be used to store the artifacts of the pipeline.

Step 2: Employing Workbench

Enable the Notebook API by going to Vertex AI and then to the Workbench section. Then select Python 3 by clicking on New Notebook. Make sure to choose the us-central1 region.

It will take a few minutes to create the Notebook instance. Once the notebook is created click on the Open JupyterLab to launch the JupyterLab.

We will also have to enable the following APIs from API and services section of Vertex AI.

  1. Artifact Registry API
  2. Container Registry API
  3. AI Platform API
  4. ML API
  5. Cloud Functions API
  6. Cloud Build API

Now click on the Python 3 to open a jupyter notebook in the JupyterLab Notebook section and run the below code cells.

USER_FLAG = "--user"

!pip3 install {USER_FLAG} google-cloud-aiplatform==1.7.0
!pip3 install {USER_FLAG} kfp==1.8.9

This will install google cloud AI platform and Kubeflow packages. Make sure to restart the kernel after the packages are installed.

import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Create the variable PROJECT_ID with the name of project.

BUCKET_NAME = "gs://" + PROJECT_ID
BUCKET_NAME

Create the variable BUCKET_NAME; this will return the same bucket name we created earlier.

import matplotlib.pyplot as plt
import pandas as pd
from kfp.v2 import compiler, dsl
from kfp.v2.dsl import pipeline, component, Artifact, Dataset, Input, Metrics, Model, Output, InputPath, OutputPath
from google.cloud import aiplatform
# We'll use this namespace for metadata querying
from google.cloud import aiplatform_v1

PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

REGION = "us-central1"
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

This will import required packages and create the pipeline folder in the credit-card-churn bucket.

#First Component in the pipeline to fetch data from big query.
#Table1 data is fetched
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_1.yaml"
)
def get_data_1(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The first component of the pipeline will fetch the data from the table churner_p1 in BigQuery and pass the csv file as the output for the next component. The structure is the same for every component: we use the @component decorator to install the required packages and specify the base image and output file, and then we create the get_data_1 function to get the data from BigQuery.

#Second component in the pipeline to fetch data from BigQuery.
#Table2 data is fetched
#The first and second components do not need inputs from any other components
@component(
    packages_to_install=["google-cloud-bigquery==2.34.2", "pandas", "pyarrow"],
    base_image="python:3.9",
    output_component_file="dataset_creating_2.yaml"
)
def get_data_2(
    bq_table: str,
    output_data_path: OutputPath("Dataset")
):
    from google.cloud import bigquery
    import pandas as pd

    bqclient = bigquery.Client()
    table = bigquery.TableReference.from_string(
        bq_table
    )
    rows = bqclient.list_rows(
        table
    )
    dataframe = rows.to_dataframe(
        create_bqstorage_client=True,
    )
    dataframe.to_csv(output_data_path)

The second component of the pipeline fetches the data from the churner_p2 table in BigQuery and passes the csv file as the output to the next component. The first and second components do not need inputs from any other components.

#Third component in the pipeline to combine data from the 2 sources and do some data transformation
@component(
    packages_to_install=["sklearn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="data_transformation.yaml",
)
def data_transformation(
    dataset1: Input[Dataset],
    dataset2: Input[Dataset],
    output_data_path: OutputPath("Dataset"),
):
    import pandas as pd

    data1 = pd.read_csv(dataset1.path)
    data2 = pd.read_csv(dataset2.path)
    data = pd.merge(data1, data2, on='CLIENTNUM', how='outer')
    data.drop(["CLIENTNUM"], axis=1, inplace=True)
    data = data.dropna()
    cols_categorical = ['Gender', 'Dependent_count', 'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category']
    data['Attrition_Flag'] = [1 if cust == "Existing Customer" else 0 for cust in data['Attrition_Flag']]
    data_encoded = pd.get_dummies(data, columns=cols_categorical)
    data_encoded.to_csv(output_data_path)

The third component combines the data from the first and second components and performs the data transformation: dropping the "CLIENTNUM" column, dropping the null values, and converting the categorical columns into numerical ones. We pass this transformed data as csv to the next component.

#Fourth component in the pipeline to train the classification model using Random Forest
@component(
    packages_to_install=["sklearn", "pandas", "joblib"],
    base_image="python:3.9",
    output_component_file="model_training.yaml",
)
def training_classmod(
    data1: Input[Dataset],
    metrics: Output[Metrics],
    model: Output[Model]
):
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from joblib import dump
    import pandas as pd

    data_encoded = pd.read_csv(data1.path)
    X = data_encoded.drop(columns=['Attrition_Flag'])
    y = data_encoded['Attrition_Flag']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)

    model_classifier = RandomForestClassifier()
    model_classifier.fit(X_train, y_train)
    y_pred = model_classifier.predict(X_test)
    score = model_classifier.score(X_test, y_test)
    print('accuracy is:', score)
    metrics.log_metric("accuracy", (score * 100.0))
    metrics.log_metric("model", "RandomForest")
    dump(model_classifier, model.path + ".joblib")

In the fourth component we train the model with a Random Forest classifier and use "accuracy" as the evaluation metric.
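As an optional extension (not part of the original pipeline), a confusion matrix can also be logged as a first-class artifact through kfp's ClassificationMetrics output. A minimal sketch, assuming the usual Attrition_Flag encoding of 0 for attrited and 1 for existing customers:

from kfp.v2.dsl import component, Input, Output, Dataset, ClassificationMetrics

@component(packages_to_install=["sklearn", "pandas"], base_image="python:3.9")
def log_confusion_matrix(
    data1: Input[Dataset],
    class_metrics: Output[ClassificationMetrics]
):
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    data_encoded = pd.read_csv(data1.path)
    X = data_encoded.drop(columns=["Attrition_Flag"])
    y = data_encoded["Attrition_Flag"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=100, stratify=y
    )
    clf = RandomForestClassifier().fit(X_train, y_train)
    matrix = confusion_matrix(y_test, clf.predict(X_test))
    # Category order follows the 0/1 encoding above (0 = attrited, 1 = existing).
    class_metrics.log_confusion_matrix(
        ["Attrited Customer", "Existing Customer"], matrix.tolist()
    )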

@component(
    packages_to_install=["google-cloud-aiplatform"],
    base_image="python:3.9",
    output_component_file="model_deployment.yaml",
)
def model_deployment(
    model: Input[Model],
    project: str,
    region: str,
    vertex_endpoint: Output[Artifact],
    vertex_model: Output[Model]
):
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    deployed_model = aiplatform.Model.upload(
        display_name="custom-model-pipeline",
        artifact_uri=model.uri.replace("model", ""),
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest"
    )
    endpoint = deployed_model.deploy(machine_type="n1-standard-4")

    # Save data to the output params
    vertex_endpoint.uri = endpoint.resource_name
    vertex_model.uri = deployed_model.resource_name

The fifth and last component creates the endpoint on Vertex AI and deploys the model. We have used Google's prebuilt scikit-learn serving container image and deployed the model on an n1-standard-4 machine.

@pipeline(
    # Default pipeline root. You can override it when submitting the pipeline.
    pipeline_root=PIPELINE_ROOT,
    # A name for the pipeline.
    name="custom-pipeline",
)
def pipeline(
    bq_table_1: str = "",
    bq_table_2: str = "",
    output_data_path: str = "data.csv",
    project: str = PROJECT_ID,
    region: str = REGION
):
    dataset_task_1 = get_data_1(bq_table_1)
    dataset_task_2 = get_data_2(bq_table_2)
    data_transform = data_transformation(dataset_task_1.output, dataset_task_2.output)
    model_task = training_classmod(data_transform.output)
    deploy_task = model_deployment(model=model_task.outputs["model"], project=project, region=region)

Finally, the pipeline function calls all the components in sequence: dataset_task_1 and dataset_task_2 get the data from BigQuery, data_transform transforms the data, model_task trains the Random Forest model, and deploy_task deploys the model on Vertex AI.

compiler.Compiler().compile(pipeline_func=pipeline, package_path="custom-pipeline-classifier.json")

This compiles the pipeline into a JSON package.

run1 = aiplatform.PipelineJob(
    display_name="custom-training-vertex-ai-pipeline",
    template_path="custom-pipeline-classifier.json",
    job_id="custom-pipeline-rf8",
    parameter_values={"bq_table_1": "credit-card-churn.credit_card_churn.churner_p1", "bq_table_2": "credit-card-churn.credit_card_churn.churner_p2"},
    enable_caching=False,
)

Creating the pipeline job.

run1.submit()

Running the pipeline job.

With this, we have completed creating the Kubeflow pipeline, and we can see it in the Pipelines section of Vertex AI.

Our pipeline has run successfully, and we have managed to get 100% accuracy for the classification.

We can use this model to get online predictions using the REST API or Python; a minimal sketch of the Python route is shown below. We can also create different pipelines and compare their metrics on Vertex AI.
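A minimal online-prediction sketch with the Vertex AI Python SDK; the endpoint resource name and the feature values below are placeholders (the real endpoint ID comes from the Endpoints section or from vertex_endpoint.uri, and each instance must match the one-hot encoded column order produced by the transformation component):

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Placeholder resource name; substitute your project number and endpoint ID.
endpoint = aiplatform.Endpoint(
    "projects/PROJECT_NUMBER/locations/us-central1/endpoints/ENDPOINT_ID"
)

# One instance = one encoded feature row, in the order used for training.
instance = [45, 3, 13, 1, 3, 0.65, 1, 0]  # illustrative values only

prediction = endpoint.predict(instances=[instance])
print(prediction.predictions)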

With this, we have completed the project and learned how to create a pipeline on Vertex AI for custom-trained models.

I hope you will find it useful.

To learn more about our AI & ML Solutions and Capabilities

Contact Us

See you again.

The post Kubeflow Pipeline on Vertex AI for Custom ML Models appeared first on Indium.

]]>
Machine Learning using Google’s Vertex AI https://www.indiumsoftware.com/blog/machine-learning-using-googles-vertex-ai/ Thu, 02 Feb 2023 10:38:31 +0000 https://www.indiumsoftware.com/?p=14347 Image by Google What is Vertex AI? “Vertex AI is Google’s platform which provides many Machine learning services such as training models using AutoML or Custom Training.” Image by Google Features of Vertex AI We use Vertex AI to perform the following tasks in the ML workflow To know the workflow of Vertex AI we

The post Machine Learning using Google’s Vertex AI appeared first on Indium.

]]>
Image by Google

What is Vertex AI?

“Vertex AI is Google’s platform which provides many Machine learning services such as training models using AutoML or Custom Training.”

Image by Google

Features of Vertex AI

We use Vertex AI to perform the following tasks in the ML workflow:

  • Creating a dataset and uploading data
  • Training an ML model
  • Evaluating model accuracy
  • Hyperparameter tuning (custom training only)
  • Storing the model in Vertex AI
  • Deploying the trained model to an endpoint for predictions
  • Sending prediction requests to the endpoint
  • Managing models and endpoints

To understand the workflow of Vertex AI, we will train a "Dogs vs Cats" classification model using Vertex AI's AutoML feature.

Step 1: Creating Dataset

We will download the dataset from Kaggle. The downloaded zip file contains two further zip files, train.zip and test.zip. Train.zip contains the labelled images for training.

There are about 25,000 images in train.zip and 12,500 in test.zip. For this project we will use only 200 cat and 200 dog images for training. We will use the test set to evaluate the performance of our model.

After extracting the data, I uploaded the images to a Google Cloud Storage bucket called dogs_cats_bucket1, which I created in the us-central1 region. Images are stored in two folders, train and test, in the bucket.
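If you would rather script the upload than use the console, here is a minimal sketch with the Cloud Storage Python client (assuming the extracted images sit in local train/ and test/ folders; gsutil cp -r works just as well):

import os
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("dogs_cats_bucket1")

for folder in ["train", "test"]:
    for filename in os.listdir(folder):
        # Keep the folder structure (train/..., test/...) inside the bucket.
        blob = bucket.blob(f"{folder}/{filename}")
        blob.upload_from_filename(os.path.join(folder, filename))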

Best Read: Top 10 AI Challenges

Now we need to create a csv file with each image's path and its label. For that I have written the following lines of code.

from google.cloud import storage
import pandas as pd
import os

#Authentication using service account.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/dogs-vs-cats-354105-19b7b157b2b8.json"

BUCKET = 'dogs_cats_bucket1'
DELIMITER = '/'
TRAIN_PREFIX = 'train/'
TRAIN_BASE_PATH = f'gs://{BUCKET}/{TRAIN_PREFIX}'

print("Starting the import file generation process")
print("Process Details")
print(f"BUCKET : {BUCKET}")

storage_client = storage.Client()
data = []

print("Fetching list of Train objects")
train_blobs = storage_client.list_blobs(BUCKET, prefix=TRAIN_PREFIX, delimiter=DELIMITER)

for blob in train_blobs:
    label = "cat" if "cat" in blob.name else "dog"
    full_path = f"gs://{BUCKET}/{blob.name}"
    data.append({
        'GCS_FILE_PATH': full_path,
        'LABEL': label
    })

df = pd.DataFrame(data)
df.to_csv('train.csv', index=False, header=False)

After running the script in a Jupyter notebook, we have the required csv file, which we will upload to the same storage bucket as well.

Now in the Vertex AI section go to Datasets and enable the Vertex AI API.

Click Create Dataset and name it. I have named it cat_dog_classification. We will select Image Classification (Single-label). Make sure the region is us-central1. Hit Create.

In the next section, mark Select import files from Cloud Storage and select train.csv from Browse. Hit Continue.


Vertex AI took 16 minutes to import the data. Now we can see the data in the Browse and Analyse tabs.


Now we can train the model.

Step 2: Model Training

Go to Vertex AI, then to Training section and click Create. Make sure the region is us-central1.

In Dataset, select cat_dog_classification and keep the defaults for everything else, with AutoML as the Model Training Method.

Click continue for the Model Details and Explainability with the default settings.

For Compute and Pricing, set a maximum of 8 node hours.

Hit Start Training.


Model training completed after 29 minutes.

Step 3: Model Evaluation

Clicking on the trained model takes us to the model stats page, where we have stats like the precision-recall curve, precision-recall by threshold, and the confusion matrix.

With the above stats the model looks good.

Step 4: Model Deployment

Go to Vertex AI, then to the Endpoints section and click Create Endpoint. Make sure the region is us-central1.

Give dogs_cats as the name of Endpoint and click Continue.

In the Model Settings, select cat_dog_classification as the Model Name, Version 1 as the Version, and 2 as the number of compute nodes.

Click Done and Create.

It takes about 10 minutes to deploy the model.

With this our model is deployed.

Step 5: Testing Model

Once the model is deployed, we can test it by uploading a test image or by creating a Batch Prediction.

To test the model, we go to the Deploy and Test section on the Model page.

Click on Upload Image to upload a test image.

With this, we can see our model is working well on test images.

We can also connect to the endpoint using Python and get the results; a minimal sketch follows.
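A minimal sketch of calling the deployed image classification endpoint from Python; the endpoint ID is a placeholder, and the instance format assumes the standard AutoML image payload of base64-encoded content:

import base64
from google.cloud import aiplatform

aiplatform.init(project="dogs-vs-cats-354105", location="us-central1")

# Placeholder resource name; copy the real one from the Endpoints section.
endpoint = aiplatform.Endpoint(
    "projects/PROJECT_NUMBER/locations/us-central1/endpoints/ENDPOINT_ID"
)

with open("test_image.jpg", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

prediction = endpoint.predict(instances=[{"content": content}])
print(prediction.predictions)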

For more details on our AI and ML services

Visit this link

This is the end of my blog. We have learned how to train an image classification model on Google's Vertex AI using the AutoML feature. I have enjoyed every minute of working on it.

For the next article, we will see how to train a custom model on Vertex AI with TensorFlow.

Stay Tuned.

The post Machine Learning using Google’s Vertex AI appeared first on Indium.

]]>
10 Promising Enterprise AI Trends 2022 https://www.indiumsoftware.com/blog/enterprise-ai-trends/ Fri, 10 Jun 2022 12:20:16 +0000 https://www.indiumsoftware.com/?p=10068 A Deloitte survey shows rapid adoption of artificial intelligence solutions. Though only a few organizations are completely fueled by AI, businesses are accelerating capability building for their AI applications to scale to the enterprise level. More importantly, the willingness to experiment with AI has laid a foundation in the enterprise, which is expected to bear

The post 10 Promising Enterprise AI Trends 2022 appeared first on Indium.

]]>
A Deloitte survey shows rapid adoption of artificial intelligence solutions. Though only a few organizations are completely fueled by AI, businesses are accelerating capability building for their AI applications to scale to the enterprise level. More importantly, the willingness to experiment with AI has laid a foundation in the enterprise, which is expected to bear fruit in the coming few years.

Some of the reasons for this fast adoption of AI include:

● Using data for human-centric, systematic deployment and scaling of AI for executing core business processes

● Improving data-backed decision-making

● Enhancing customer delight and workforce experiences

● Strengthening competitiveness

To know more about Indium’s AI expertise, visit

Get in touch

For businesses already on the path or planning to invest in AI, it is important to understand the trends and invest accordingly to optimize resources and performance.

1. Automated Machine Learning (AutoML): In an age of automation, iterative tasks associated with applying machine learning are also automated. It covers the entire pipeline from raw dataset to developing the ML model to be deployed. Some of the emerging trends in this area include better tools for data labelling and automatic tuning of neural net architectures. This is expected to lower the cost of AI, hastening its wider adoption. The next step would be XOps, as Gartner calls it, with improvement in processes such as PlatformOps, MLOps and DataOps for operationalizing the models.

2. Conceptual Design with AI: DALL·E and CLIP (Contrastive Language-Image Pre-training) from OpenAI enable generating new visual designs from a text description by combining it with images. This can help create innovative designs that can be implemented on production scale and disrupt industries that rely on creativity.

3. Multi-Modal Learning: As AI matures, ML models are able to support multiple modalities including IoT sensor data, text, speech, and vision. This is being leveraged to efficiently perform common tasks such as comprehending documents. This has wide use, including in the medical field where medical diagnosis can benefit from multi-modal techniques such as optical character recognition and machine vision.

4. Tiny ML: As AI and ML find their way into devices of all sizes, Tiny ML is becoming popular in hardware-constrained devices such as microcontrollers that power cars, utility meters, and refrigerators. Localized analysis can be made possible for sound, gestures, environmental conditions, and vital signs. Solutions for security and management of Tiny ML need to be developed for greater effectiveness and governance.

5. Multi-Objective Models: Currently, AI models tend to be developed for a single objective at a time. Going forward, multi-task models that execute multiple objectives will become possible. This will improve the outcomes of the AI models because of a holistic approach to the tasks.

6. Improved Employee Experience: AI will increasingly be used to supplement human efforts, reducing the burden on employees by taking over repetitive jobs. This will free up resources to provide value and lower personnel cost, enabling businesses to remain lean and effective.

7. Democratized AI: AI tools these days do not require technical skills to be used. As a result, they empower even non-technical staff to use them and build AI models. Subject matter experts will be able to participate more proactively in the AI development process, which will further accelerate time-to-market.

8. Responsible AI: AI development is guided by regulations such as GDPR and CCPA, which demand greater AI transparency because personal data is used to make substantive decisions. Responsible practices, even in how AI algorithms themselves are developed, will become increasingly important.

9. Quantum ML: Powerful AI and machine learning models are becoming possible due to the use of quantum computing. Cloud providers such as Amazon, Microsoft, and IBM are providing quantum computing resources and simulators that can help businesses solve problems for which solutions have not yet been found.

10. Digital Twins Mature: Digital twins, which are virtual models simulating reality, are becoming popular for modeling and simulating human behaviors. They can help forecast the future and identify alternative solutions. Converging digital twins with traditional industrial simulations and AI-based agent-based simulation will also find use in applications such as ESG modeling, drug design, and smart cities.

Indium–Your AI Adoption Partner

Realizing the value of AI in improving outcomes, businesses today are investing in proofs of concept even without a clear business case. AI is being used across functions and is improving decision making, but it requires collaboration between the technology team, subject-matter experts, and business experts to achieve its goals.

An AI, data, and software specialist like Indium Software can help businesses initiate AI projects, design them for scale, and achieve results efficiently. Our team of technology and domain experts can help businesses identify AI opportunities and accelerate AI innovation. We can create intelligent systems that mimic human behavior in completing tasks.

Must Read: CEOs and CXOs, how are you handling the shift to AI technologies?

We create self-learning algorithms that increase accuracy over time. The machine learning systems constantly evolve by examining data and learning new things from it without human intervention.

Our capabilities include:

● Predictive Models using cutting-edge methodologies to design processes for complete automation using AI techniques

● Natural Language Processing to analyze existing text data and capture trends, threats, and opportunities

● Predictive & Prescriptive Analytics

● Cognitive Analytics using computer vision techniques to read not just text, but also images and videos

● Text Analytics to gain insights from big data through our automatic AI powered text analytics software

● Internet of Things (IoT) to reduce accidents and minimize equipment downtime by identifying trends and anomalies in sensor data.

The post 10 Promising Enterprise AI Trends 2022 appeared first on Indium.

]]>
Artificial Intelligence And Its Impact On Mobile Applications https://www.indiumsoftware.com/blog/artificial-intelligence-in-mobile-app-development/ Mon, 03 May 2021 02:11:05 +0000 https://www.indiumsoftware.com/blog/?p=3844 Mobile apps and user experience have evolved dramatically over the last decade. At the beginning, we had simple apps that did very little. However, everything has changed in the last decade as a result of the Smartphone revolution. These mobile apps influence everything from your daily chores to your social interactions to your business strategy.

The post Artificial Intelligence And Its Impact On Mobile Applications appeared first on Indium.

]]>
Mobile apps and user experience have evolved dramatically over the last decade.

At the beginning, we had simple apps that did very little. However, everything has changed in the last decade as a result of the Smartphone revolution.

These mobile apps influence everything from your daily chores to your social interactions to your business strategy.

When we think of artificial intelligence (AI), the first names that pop up are probably Siri, Bixby, Cortana or Alexa.

According to the most recent McKinsey Global Institute reports, Google and Apple have invested billions of dollars in artificial intelligence. The report notes that AI advancements brought in $139 billion in investment in 2020, more than three times the amount invested in AI three years earlier.

The concept of a ‘smart assistant’ which can solve everyday tasks has captivated millions of users across all business sectors, not to mention education, healthcare, and finance. However, AI is not limited to smart assistance; it is progressing at a rapid pace. Many mobile apps are now utilising AI to improve user satisfaction.

Next Gen Application Development at your fingertips!

Read More

AI is continuing to improve mobile apps by acting as a catalyst. It enables the evolution of mobile apps by transforming them into intelligent pieces of software capable of predicting user behaviour and making decisions. AI algorithms also enable mobile apps to learn from user-generated data.

It is important to note that AI in this context does not refer to pure self-aware intelligence machines. Rather, it is a catch-all term for a variety of applications used by website and mobile app developers.

Contributions Of AI to Mobile Application Development

Facial recognition: Because of its ease of use and added layer of security, face recognition unlock has become one of the most popular features on Android smartphones. These systems use AI and ML-based algorithms to recognise a person’s face to unlock the phone and the various apps installed on it.

Smartphone manufacturers are expected to implement even more advanced AI and ML in the coming years to identify a person as their facial features change, such as growing a beard or wearing glasses.

Search Engines on mobile phones: The use of voice search and voice commands is perhaps one of the most common and popular advancements in artificial intelligence and machine learning. Customers used to type their queries into search bars. It is now as easy as asking your virtual assistant to look for something for you.

Instead of signing into your computer or unlocking your phone, something as simple as “Hey Google, what’s the best restaurant near me?” provides users with the quick answer they seek while also directing them to your business. Voice commands also allow you to respond to text messages without having to type.

Smart Camera apps: The smartphone camera is one of the most important areas in which custom Android app developers and Android phone manufacturers are making significant advances in AI and ML. These advanced cameras can detect the subject within the frame, such as faces, fireworks, or food, and adjust the settings to produce the best possible image.

Artificial intelligence and machine learning can now automatically identify and enhance facial features for outstanding portrait images. More advanced features can even count the calories you eat from a simple photo of your food or provide information to businesses about how and where their products are being used when photos are shared on social media platforms.

Emotion recognition: Emotion recognition is a rising star in AI development. We have now advanced to the point where we can incorporate ML and AI into apps and capture micro and macro expressions. Through image and voice data processing, software can now read human emotions by capturing subtle variations, body language cues, and vocal inflection. Companies can use these analytics to enhance consumer experiences by identifying the need for a product or service, or to generate ideas for new products.

Real-time translation: There is a vast array of translation apps available. However, the majority of these apps are inoperable without access to the internet. AI could allow smartphones to translate & transliterate different languages in real-time without requiring an internet connection.

AI can provide a language instruction tool that allows sentences and phrases to be translated almost instantly without a time lag, similar to how interpreters work. The translation tool can be adjusted for latency using AI. This means that a user can specify the amount of time between a spoken word and its translation. This would be incredibly beneficial for languages that require a longer time lag for accurate translation.

Advantages of implementing AI

  1. AI assists you in completing monotonous tasks quickly
  2. Accuracy and completeness
  3. Enhanced customer experiences
  4. Intelligent interactions with users
  5. User retainment

Personalized user experiences

The advancement of AI technology has raised the benchmark for what mobile users expect from an app experience. Users are starting to demand more detailed and personalised mobile app behaviour.

Retail brands such as Tommy Hilfiger, Starbucks, and Nike can deliver personalised experiences that include recommendations unique to each user by collecting and analysing customer data based on purchases and locations.

Tommy Hilfiger’s chatbot, for example, lets users browse the latest collections or get a behind-the-scenes look at the most recent fashion show. The chatbot also employs natural language processing to provide style advice and product recommendations in addition to responding to customer inquiries. The bot gathers information about the user’s style preferences by asking a series of questions and then suggests an outfit based on the answers.

Smartphones have GPS tracking capabilities, as well as microphone and camera features, making them an ideal platform for AI applications. Furthermore, Apple’s iPhone XR introduced the A12 Bionic chip with a Neural Engine designed to use on-device AI hardware in previously unimaginable ways, and later models such as the iPhone 11 and 12 ship with the even more capable A13 and A14 chips.

When AI technology is combined with these built-in features, apps become more relevant and personalised. Using artificial intelligence to contextualise user behaviour will make each app session better than the previous one.

Leverage your Biggest Asset: Data

Inquire Now

Wind-Up

AI opens up a plethora of opportunities for innovation in the mobile app industry. AI is the wave of the future in mobile app development. Users’ interactions with app services and products are changing as a result of artificial intelligence. Users of mobile apps will also be linked to an ecosystem of intelligent applications that will collaborate to provide a personalised user experience.

The greater role of AI in mobile apps has demonstrated its value in terms of business growth and user engagement. Here are some examples of how AI can help you understand your users:

  • AI can collect and store user data by analysing user behaviour and interactions with the app.
  • AI collects essential data such as location, contacts, and daily actions to better serve users.
  • AI products level up the user experience.

A smart ecosystem will collect a large amount of social data and behavioural interest, which can be used to boost revenue and improve user experience. It is not an exaggeration to say that AI is reshaping the smartphone industry. As a result, it is critical to include AI in your business and mobile applications.

The post Artificial Intelligence And Its Impact On Mobile Applications appeared first on Indium.

]]>