Doing More with Data: An Introduction to Arrow for R Users

Speaker: Danielle Navarro, Developer Advocate at Voltron Data

As datasets become larger and more complex, the boundaries between data engineering and data science are becoming blurred. Data analysis pipelines with larger-than-memory data are becoming commonplace, creating a gap that needs to be bridged: between engineering tools designed to work with very large datasets on the one hand, and data science tools that provide the analysis capabilities used in data workflows on the other. One way to build this bridge is with Apache Arrow, a multi-language toolbox for working with larger-than-memory tabular data. Arrow is designed to improve performance and efficiency, and places emphasis on standardization and interoperability among workflow components, programming languages, and systems. This talk gives an introduction to the Arrow package in R, a mature interface to Apache Arrow, that provides an appealing solution for data scientists working …

Jun 23, 2022

Voltron Data

Chapters

Intro (0:00)
What is Apache Arrow? A multi-language toolbox for accelerated data interchange and in-memory processing (0:37)
Accelerating data interchange
Efficient in-memory processing
Where does R fit (in Arrow)?
Where does Arrow fit (in R)?
dplyr connects to an Arrow backend
Get data
The NYC taxi data. The data set
Downloading the data
Opening a dataset
Using dplyr verbs: Select and filter
Find airports
A secondary table
Extract airport zones
Wrangle data
Let's find the airport pickups...