Elevator Pitch
In software development, it is common practice to treat data as a second-class citizen. Bringing data to the forefront of our architecture and development makes it easier to integrate ML into our products in ways that simplify the development process and yield a higher return on investment.
Description
Abstract
In software development, it is common practice to treat data as a second-class citizen. It is becoming more necessary to integrate data-heavy machine learning tools into products and platforms to gain a competitive edge in the market. This has traditionally been hard because engineering teams lack experience with these tools, and because hiring and building dedicated data teams carries a high cost and a low return on investment. With advancements in, and productization of, artificial intelligence, it is easier for software engineers to leverage machine learning tools and models in product development. Bringing data to the forefront of our architecture and development makes it easier to integrate ML into our products in ways that simplify the development process and yield a higher return on investment.
Talk: Format
Types of Data
Data: Data (treated as singular, plural, or as a mass noun) is any sequence of one or more symbols given meaning by specific act(s) of interpretation.
Types:
- Machine-generated data
- User-generated data
- Structured data: data with a schema
- Unstructured data: raw, loose data
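To make the structured/unstructured distinction concrete, here is a minimal sketch. The log line format, the `ClickEvent` schema, and the `parse_line` helper are all hypothetical and chosen only for illustration. The same information appears once as a raw string and once as a record with named, typed fields.

```python
import json
from dataclasses import asdict, dataclass

# Unstructured: a raw log line with no declared schema (hypothetical format).
RAW_LINE = "2020-01-01T00:00:00Z user=42 clicked checkout"


@dataclass
class ClickEvent:
    """Structured: every field is named and typed (hypothetical schema)."""
    timestamp: str
    user_id: int
    action: str
    target: str


def parse_line(line: str) -> ClickEvent:
    """Interpret a raw line as a ClickEvent; assumes the format of RAW_LINE."""
    timestamp, user_field, action, target = line.split()
    user_id = int(user_field.split("=", 1)[1])
    return ClickEvent(timestamp, user_id, action, target)


event = parse_line(RAW_LINE)
print(json.dumps(asdict(event)))
```

Once the data has a schema, downstream consumers (dashboards, models, other services) can rely on field names instead of re-parsing strings.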
Machine-generated data
What types of machine-generated data does your software produce?
* Event data
* Error data
* Network usage data
* App actions (click-stream) data
* Database change capture (anomaly detection)
User generated
What types of user-generated data does your software produce?
* Account information
* Personal information
* Transaction (payment/currency)
* Product data
* Metadata
Ways to use your data
- Direct in Product
- Dashboard/Reports
Machine learning on stored data is a generally untapped resource for software teams.
Machine Learning Applications
- The reason for data-first architecture: the easier the data is for a data scientist to consume, the quicker you see ROI.
Monitoring usage to trigger an action (predictive autoscaling).
Forecasting outages/high network usage
Product enhancement (weave sentiment analysis)
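As a rough illustration of "monitoring usage to trigger an action", the sketch below forecasts the next usage sample and derives a replica count from it. The linear extrapolation is a deliberately naive stand-in for a trained model, and the names `forecast_next`, `desired_replicas`, and the capacity figure are assumptions made for this example, not part of any autoscaler's API.

```python
import math
from typing import Sequence


def forecast_next(usage: Sequence[float], window: int = 3) -> float:
    """Extrapolate the next sample from the average step over the last `window` samples.

    A naive placeholder; a real system would swap in a trained forecasting model.
    """
    recent = list(usage[-window:])
    if len(recent) < 2:
        return float(recent[-1])
    step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + step


def desired_replicas(forecast: float, capacity_per_replica: float, minimum: int = 1) -> int:
    """Replicas needed to serve the forecast load (capacity is an assumed constant)."""
    return max(minimum, math.ceil(forecast / capacity_per_replica))


# Hypothetical requests-per-second samples collected from monitoring.
requests_per_second = [100, 120, 150, 190]
predicted = forecast_next(requests_per_second)
print(predicted, desired_replicas(predicted, capacity_per_replica=50))
```

The point of the sketch is the shape of the loop: stored usage data goes in, a prediction comes out, and the prediction (not the current load) drives the scaling action.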
Data Architecture as part of planning process
### Questions to ask when planning
- Who is creating, consuming, or communicating the data?
- Where does my data finally come to rest, and where is it ultimately consumed?
- How much does it cost to store the data?
- What are availability needs?
- Does the product need to consume this data?
- Do other services need to consume this data?
- How often is the data consumed?
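One way to keep these questions visible during planning is to record the answers next to each dataset. The sketch below is only an illustration: the `DataAsset` record, its field names, and all the numbers (including the storage price) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAsset:
    """Planning answers for one dataset (hypothetical record, illustrative fields)."""
    name: str
    producer: str                      # who creates the data
    consumers: List[str] = field(default_factory=list)  # product and other services
    resting_place: str = "unknown"     # where the data finally lives
    size_gb: float = 0.0
    price_per_gb_month: float = 0.0    # assumed storage price, not a real quote
    availability: str = "best effort"
    reads_per_day: int = 0

    def monthly_storage_cost(self) -> float:
        """Storage cost per month: size times unit price."""
        return self.size_gb * self.price_per_gb_month


clickstream = DataAsset(
    name="clickstream",
    producer="web app",
    consumers=["product", "recommendation service"],
    resting_place="object storage",
    size_gb=500,
    price_per_gb_month=0.02,
    availability="99.9%",
    reads_per_day=24,
)
print(clickstream.name, round(clickstream.monthly_storage_cost(), 2))
```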
### Methods for adding Machine Learning Models
- API calls
- Static model
- Part of event stream
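The three methods can be sketched side by side. Everything here is an assumption made for illustration: the toy word weights stand in for a real sentiment model, and the endpoint URL and the `{"text": ...}` / `{"score": ...}` payload shapes in `score_via_api` are hypothetical. The API function is defined but not called, since no such service exists in this sketch.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator

# Static model: weights shipped with the application (toy, hypothetical values).
WEIGHTS: Dict[str, float] = {"great": 1.0, "love": 1.0, "slow": -1.0, "broken": -1.0}


def score_static(text: str) -> float:
    """Score text in-process using the bundled weights."""
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())


def score_via_api(text: str, url: str) -> float:
    """API call: POST the text to a hosted model (URL and payload shape are assumed)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return float(json.load(response)["score"])


def enrich_stream(events: Iterable[dict], score: Callable[[str], float]) -> Iterator[dict]:
    """Event stream: attach a score to each event as it flows past."""
    for event in events:
        yield {**event, "sentiment": score(event["text"])}


reviews = [
    {"id": 1, "text": "Love it, great product"},
    {"id": 2, "text": "Checkout is broken"},
]
enriched = list(enrich_stream(reviews, score_static))
print(enriched)
```

Because `enrich_stream` takes the scoring function as a parameter, the same stream step can use the static model during development and an API-backed scorer later, for example by passing a small wrapper around `score_via_api`.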
Where do we start?
- Pain points
- Central Features
- Product needs
Questions?
Resources
* wikipedia data-computing
* Software Engineering Daily May 28, 2020
* machine-generated vs human-generated
* cloudera autoscaling
* Data Driven Applications
* kubernetes autoscaling
* autoscale wikipedia
* predictive autoscaling aws
Notes
This talk was originally designed to be very interactive with heavy audience involvement.