In part 2, the entire pipeline for ingesting data into Snowflake was automated using Azure Logic Apps and Snowpipe. The JSON data was loaded into a Snowflake landing table with a single column called JSON_DATA.
Ideally, there should also be a datetime column containing the date and time when the data was loaded into the landing table. However, due to a limitation in Snowpipe, no additional columns are allowed when using the JSON format. If you try, you will get the following error.
In part 1, vehicle inventory data was downloaded locally before being pushed to a Snowflake stage. This involved manual steps: running a Python script to download the JSON data, running commands in SnowSQL to upload the files, and then a COPY INTO command to load the data into a Snowflake table.
I was able to automate all of this with Azure Logic Apps and Blob Storage. The idea is to run the entire process automatically on a daily basis to get updated inventory from each car dealer into Snowflake.
Here is what the automated workflow looks like:
Like everyone else, I found the car-buying process complex and time consuming, mainly because one has to visit many dealers' websites to find the perfect car with all the desired options.
After looking at a few of the dealers' websites, I realized that they all looked alike. I wondered if writing a web scraper in Python could help me query all the dealers' inventory in a single go.
Based on how the filters behaved, it seemed the websites were using REST APIs to load new data. …
I live in Surrey, Canada. I enjoy good coffee and the outdoors.