You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Be comfortable asking questions about how to access your data
Give a broad survey of what is possible and enough concepts so you can dig into whatever might be applicable to your environment
Think of software development in three stages: get it working, get it working right, get it working quickly
Here, we focus on making it possible to get it working by providing samples of different methods of data access
Learn concepts that enable asking more informed questions when you undertake a data project
Awareness that A beginning is a very delicate time - Dune, Frank Herbert
Two phases of a software process to consider: development and operations
How you think about accessing your data influences the rest of your project
Be comfortable as you develop, but don't forget you might need to live with what you wrote for a while
What is a Datastore?
Where data is stored, organized, and persisted... what does this mean?
Method for abstracting storage details from your application
lots of different concerns/use cases result in lots of different ways to read and write data
Does this matter? yes and no... if you're reading data, there are some things you might care about:
Where it comes from
What its shape is
Performance/storage
Datastore Examples
Files (stored in formats - e.g. CSV, XML, JSON, binary, etc. - see below)
SQL databases
No-SQL databases
API
What is an API?
API == Application Programming Interface
A way for a program (application) to talk to another program programmatically
Types of APIs
Memory based (e.g programs importing logic/capabilities from other programs)
importing libraries
talking to the operating system
Network/protocols (difference between protocol and API... we'll pretend there are none for the sake of this
discussion, but I'm happy to discuss more offline)
standard protocols (e.g. HTTP, ODBC, FTP, etc.)
Custom formats accessed via networking protocols
typically data you might get over the Internet for a certain dataset or website, tailored for accessing that content
this brings us to data formats (which applies to files as well)
Web pages (plug for web scraping)
Data Formats
Assumes you have access to your pile of data
Formats are how you turn that pile of bytes into something intelligible