Getting Started with NumPy – Lesson 1

Introduction

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.

Creating, Getting Info, Selecting and Util Functions

The 2009 "Wine Quality Dataset", compiled by Cortez et al. and available at the UCI Machine Learning Repository, is a well-known dataset that contains wine quality information. It includes data about the physicochemical properties of red and white wines, together with a quality score.

Before we start, let's take a look at the first rows of the example dataset.
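The post does not show how the data is loaded, so the following is only a minimal sketch of one possible way to do it. It assumes a semicolon-separated file with a header line; the column names are an assumption, and the in-memory `sample_csv` is a hypothetical stand-in that reuses the two rows printed later in this lesson. With the real file you would pass its path to `np.genfromtxt` instead.

```python
import io
import numpy as np

# Hypothetical stand-in for the first lines of the wine quality file.
# The column names are an assumption; the two data rows are the ones
# printed later in this lesson.
sample_csv = io.StringIO(
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# skip_header=1 drops the column-name line so every value parses as a float.
wines_df = np.genfromtxt(sample_csv, delimiter=";", skip_header=1)

print(wines_df[:2])    # the first rows, similar to the "head" of a table
print(wines_df.shape)  # (rows, columns) of the loaded sample
```

Despite its name, `wines_df` is a plain NumPy array here, not a DataFrame.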

Creating

In NumPy you can create arrays in several ways. We are going to see examples of the most common ones and of those that are most useful for data processing.

Unidimensional array from list:

import numpy as np
values = [1, 2, 3]  # avoid naming a variable "list", which shadows the built-in
uni_numpy_array = np.array(values)

array([1, 2, 3])

Multidimensional array from list:

nested_values = [[1, 2, 3], [4, 5, 6]]  # one inner list per row
multi_numpy_array = np.array(nested_values)

array([[1, 2, 3],
       [4, 5, 6]])

Multidimensional array filled with zeros:

zeros_array = np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

Multidimensional array filled with random values between 0 and 1:

random_array = np.random.rand(3, 4)

array([[0.98195491, 0.34964712, 0.13426036, 0.55065786],
       [0.4180283 , 0.36018953, 0.44374156, 0.4366695 ],
       [0.69893273, 0.01089244, 0.4297768 , 0.6985924 ]])
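A few other creation routines tend to come up often as well. The sketch below is only an illustration with made-up variable names; it also uses a seeded generator, assuming you want the random values to be reproducible between runs.

```python
import numpy as np

# Array filled with ones, analogous to np.zeros.
ones_array = np.ones((2, 3))

# Evenly spaced integers: start, stop (exclusive), step.
range_array = np.arange(0, 10, 2)

# Five evenly spaced floats from 0 to 1, both ends included.
linspace_array = np.linspace(0.0, 1.0, 5)

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(seed=42)
random_array = rng.random((3, 4))  # values in the interval [0, 1)

print(range_array)
print(linspace_array)
```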

Getting Info

Several attributes and functions help us extract information about the data. We are going to go through them one by one, with examples of how they work and why they are useful.

Get array dimensions:

For this we are going to use the `shape` attribute (it is not a function, so it takes no parentheses), which returns a tuple with the number of rows and the number of columns: (rows, columns).

wines_df.shape

(1599, 12)
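As a small illustration on a made-up array (the name `example` is hypothetical), the shape tuple can be unpacked directly, and it relates to two other attributes, `ndim` and `size`:

```python
import numpy as np

example = np.zeros((3, 4))

# shape is a tuple, so it can be unpacked into separate names.
rows, columns = example.shape

print(rows, columns)  # 3 4
print(example.ndim)   # 2, the number of dimensions
print(example.size)   # 12, the total number of elements (rows * columns)
```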

Get data type:

NumPy has several data types, most of which map to Python data types such as float and str. The most important NumPy data types are:

1. float – numeric floating point data.

2. int – integer data.

3. string – character data.

4. object – Python objects.

In this case we will use the `dtype` attribute that returns the data type of the array.

wines_df.dtype

dtype('float64')
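When the data type is not the one you need, `astype` returns a converted copy. The sketch below uses hypothetical quality scores rather than the real dataset:

```python
import numpy as np

# Hypothetical quality scores stored as floats.
quality = np.array([5.0, 5.0, 6.0])
print(quality.dtype)  # float64

# astype returns a new array; the original keeps its dtype.
quality_int = quality.astype(np.int64)
print(quality_int.dtype)  # int64

# Mixing integers and floats upcasts the whole array to float64.
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```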

Selecting

Use the syntax array[i, j] to retrieve the element at row index i and column index j of an array. Indexes start at 0.

To retrieve multiple elements, use the syntax array[row_values, column_values], where row_values and column_values are tuples (or lists) of the same length. The result contains one element for each (row, column) pair.
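Both forms can be sketched on a small made-up array (`grid` is a hypothetical name used only for illustration):

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

# Single element: row index 1, column index 2.
single = grid[1, 2]
print(single)  # 22

# Multiple elements: rows (0, 2) are paired with columns (1, 0),
# so this picks grid[0, 1] and grid[2, 0].
picked = grid[(0, 2), (1, 0)]
print(picked)  # [11 30]
```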

Now we are going to show different examples of how to select elements within an array:

Get first row:

first_row = wines_df[:1]

array([[ 7.4   ,  0.7   ,  0.    ,  1.9   ,  0.076 , 11.    , 34.    ,
         0.9978,  3.51  ,  0.56  ,  9.4   ,  5.    ]])

Select the second element from the third row:

second_third = wines_df[2, 1:2]

array([0.76])

Select the first three items from the fourth column:

first_three_items = wines_df[:3, 3]

array([1.9, 2.6, 2.3])

Select the entire fourth column:

fourth_column = wines_df[:, 3]

array([1.9, 2.6, 2.3, ..., 2.3, 2. , 3.6])
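Note that a slice keeps the dimension it is applied to, while an integer index removes it. This minimal sketch on a made-up array shows the difference in the resulting shapes:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

row_2d = data[:1]   # slice: still 2-dimensional, shape (1, 4)
row_1d = data[0]    # integer index: 1-dimensional, shape (4,)

elem_1d = data[2, 1:2]  # slice on the column: one-element array
elem = data[2, 1]       # two integer indexes: a single value

print(row_2d.shape, row_1d.shape)  # (1, 4) (4,)
print(elem_1d, elem)               # [9] 9
```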

Util Functions

NumPy offers a very large number of mathematical functions, so we are going to summarize, with several examples, the ones that we are most likely to use as Data Scientists.

Sum up the whole 12th column (index 11):

twelfth_column_sum = wines_df[:, 11].sum()

9012.0

Sum up all the columns:

all_columns_sum = wines_df.sum(axis=0)

array([13303.1    ,   843.985  ,   433.29   ,  4059.55   ,   139.859  ,
       25384.     , 74302.     ,  1593.79794,  5294.47   ,  1052.38   ,
       16666.35   ,  9012.     ])
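The `axis` argument decides the direction of the reduction. As a small sketch on a made-up array, `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row):

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

column_sums = table.sum(axis=0)  # one sum per column
row_sums = table.sum(axis=1)     # one sum per row
total = table.sum()              # no axis: sum of every element

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(total)        # 21
```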

Mean of the first row:

first_row_mean = wines_df[:1].mean()

6.211983333333333

Return a boolean array that is True at every position where the value of the 12th column (index 11) is greater than 5, and False otherwise:

bool_array = wines_df[:,11] > 5

array([False, False, False, ...,  True, False,  True])
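A boolean array like this is mostly useful as a mask to filter rows or to count matches. The following sketch uses a small made-up array whose two columns are assumed to hold alcohol and quality values:

```python
import numpy as np

# Hypothetical samples: each row is [alcohol, quality].
samples = np.array([[9.4, 5.0],
                    [9.8, 5.0],
                    [11.2, 6.0],
                    [10.5, 7.0]])

mask = samples[:, 1] > 5  # True where quality is greater than 5

good = samples[mask]  # keep only the rows where the mask is True
count = mask.sum()    # True counts as 1, so this is the number of matches

print(good)
print(count)  # 2
```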

Get the transpose of the wines matrix:

transposed = np.transpose(wines_df)
transposed.shape

(12, 1599)
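The `.T` attribute is a shorter way to get the same result. A minimal sketch on a made-up matrix:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

transposed = np.transpose(matrix)

print(transposed.shape)                      # (3, 2)
print(np.array_equal(transposed, matrix.T))  # True

# Rows and columns swap places: element [0, 1] was at [1, 0].
print(transposed[0, 1], matrix[1, 0])  # 3 3
```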

Get the flattened (one-dimensional) array of wines:

flatten = wines_df.ravel()
flatten.shape

(19188,)
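One detail worth keeping in mind: `ravel` returns a view of the original data when it can, while the similar `flatten` method always returns a copy. The sketch below illustrates the difference on a small made-up matrix:

```python
import numpy as np

matrix = np.array([[1, 2],
                   [3, 4]])

raveled = matrix.ravel()      # a view here, because the array is contiguous
flattened = matrix.flatten()  # always an independent copy

# Writing through the view also changes the original matrix.
raveled[0] = 99

print(matrix[0, 0])  # 99
print(flattened[0])  # 1, the copy is unaffected
```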

Turn the second row of wines (index 1), which has 12 values, into a 2-dimensional array with 3 rows and 4 columns:

wines_df[1:2].reshape((3,4))

array([[ 7.8   ,  0.88  ,  0.    ,  2.6   ],
       [ 0.098 , 25.    , 67.    ,  0.9968],
       [ 3.2   ,  0.68  ,  9.8   ,  5.    ]])
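Reshaping only works when the total number of elements stays the same. In the sketch below, on a made-up array, one dimension is given as -1 so that NumPy infers it, and an incompatible shape raises an error:

```python
import numpy as np

row = np.arange(12)  # 12 elements

grid = row.reshape((3, 4))       # 3 * 4 == 12, so this works
inferred = row.reshape((3, -1))  # -1 lets NumPy infer the 4 columns

print(inferred.shape)  # (3, 4)

# 5 * 3 == 15 does not match the 12 elements, so NumPy raises ValueError.
try:
    row.reshape((5, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)  # True
```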

Training your abilities

If you want to take your Data Science skills further, we have created a course that you can download for free here.
