What is a dtype object in Python

18.04.2022, admin

The data type object (dtype) in NumPy

Every ndarray has an associated data type object (dtype). This object describes the layout of the array, that is, it tells us about the type of the data (integer, float, Python object, and so on), the size of the data in bytes, the byte order (little-endian or big-endian) and, for a structured data type, the names and data types of its fields.

The values of an ndarray are stored in a buffer that can be thought of as a contiguous block of memory bytes. How those bytes are interpreted is determined by the dtype object.

Parameters:

# Python program to create a data type object

import numpy as np

# np.int16 is converted into a data type object.

Output:
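The statements of the listing above did not survive, only its comments. A minimal sketch of what it appears to describe, assuming the variable name dt:

```python
import numpy as np

# Convert the scalar type np.int16 into a data type object.
dt = np.dtype(np.int16)

print(dt)           # int16
print(dt.itemsize)  # size of one item in bytes
```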

# Python program to create a data type object
# holding a 32-bit big-endian integer

import numpy as np

# i4 represents an integer of size 4 bytes
# > represents big-endian byte order, and
# dt is a dtype object

Output:
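The body of this listing is also missing. The sketch below follows its comments (the '>i4' string and the name dt come from them); the printed attributes are my own choice for illustration:

```python
import numpy as np

# '>' requests big-endian byte order, 'i4' a 4-byte signed integer.
dt = np.dtype('>i4')

print("Byte order:", dt.byteorder)
print("Size in bytes:", dt.itemsize)
print("Name:", dt.name)
```

On a little-endian machine byteorder is reported as '>'; on a big-endian machine the same dtype is native and reported as '='.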

The type specifier (i4 in the case above) can take different forms:

Notes:

# Python program to differentiate
# between type and dtype.

import numpy as np

Output:
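A possible reconstruction of the comparison the comments describe: type() reports the class of the container, while dtype reports how its items are stored. The array contents here are an assumption.

```python
import numpy as np

a = np.array([1])

# type() describes the Python object that holds the data.
print("type is:", type(a))

# dtype describes how each item in the buffer is interpreted.
print("dtype is:", a.dtype)
```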

# Python program to demonstrate
# the use of fields

import numpy as np

# A structured data type containing a 16-character string (in field 'name')
# and a sub-array of two 64-bit floating-point numbers (in field 'grades'):

# Data type object of the field grades

# Data type object of the field name

Output:
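A sketch matching the comments above. The field names 'name' and 'grades' come from the text; np.str_ is used for the string field on the assumption that a Unicode string is meant.

```python
import numpy as np

# Structured dtype: a 16-character string plus a sub-array of two float64 values.
dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])

# Data type object of the field 'grades'
print(dt['grades'])

# Data type object of the field 'name'
print(dt['name'])
```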

# Python program to demonstrate
# the use of a data type object with a structured array.

import numpy as np

Output:
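The listing body is missing here as well. One way to illustrate a structured array with the same dtype, using made-up records:

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])

# Illustrative records; the names and grades are invented for this sketch.
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

print(x[1])                                # one record
print("Grades of John:", x[1]['grades'])   # one field of one record
print("Names:", x['name'])                 # one field across all records
```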


Data type objects (dtype)

A data type object (an instance of numpy.dtype class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted. It describes the following aspects of the data:

Type of the data (integer, float, Python object, etc.)

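Size of the data (how many bytes are in, e.g., the integer)

Byte order of the data (little-endian or big-endian)

If the data type is a structured data type (an aggregate of other data types), the names of the fields of the structure, the data type of each field, and which part of the memory block each field takes.

If the data type is a sub-array, its shape and data type.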

To describe the type of scalar data, there are several built-in scalar types in NumPy for various precision of integers, floating-point numbers, etc. An item extracted from an array, e.g., by indexing, will be a Python object whose type is the scalar type associated with the data type of the array.

Note that the scalar types are not dtype objects, even though they can be used in place of one whenever a data type specification is needed in NumPy.
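The distinction between scalar types and dtype objects can be checked directly. A small sketch (the array contents are arbitrary):

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)
item = arr[0]

# An extracted item is an instance of the scalar type, not of np.dtype.
print(type(item))

# The scalar type itself is a class, not a dtype object...
print(isinstance(np.int32, np.dtype))    # False

# ...but it is accepted wherever a data type specification is needed.
print(arr.dtype == np.dtype(np.int32))   # True
```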

Finally, a data type can describe items that are themselves arrays of items of another data type. These sub-arrays must, however, be of a fixed size.

Sub-arrays always have a C-contiguous memory layout.
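As a rough illustration of a fixed-size sub-array, the sketch below uses a hypothetical field named 'grid' whose items are 2x3 blocks of int32:

```python
import numpy as np

# 'grid' is an illustrative field name; each item holds a fixed 2x3 int32 block.
dt = np.dtype([('grid', np.int32, (2, 3))])
arr = np.zeros(4, dtype=dt)

print(dt['grid'].shape)    # shape of the sub-array
print(dt['grid'].base)     # element data type of the sub-array
print(arr['grid'].shape)   # sub-array dimensions are appended to the array shape
```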

A simple data type containing a 32-bit big-endian integer: (see Specifying and constructing data types for details on construction)

A structured data type containing a 16-character string (in field 'name') and a sub-array of two 64-bit floating-point numbers (in field 'grades'):

Items of an array of this data type are wrapped in an array scalar type that also has two fields:

Specifying and constructing data types

Whenever a data-type is required in a NumPy function or method, either a dtype object or something that can be converted to one can be supplied. Such conversions are done by the dtype constructor:

What can be converted to a data-type object is described below:

The 24 built-in array scalar type objects all convert to an associated data-type object. This is true for their sub-classes as well.

Note that not all data-type information can be supplied with a type-object: for example, flexible data-types have a default itemsize of 0, and require an explicitly given size to be useful.
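The point about flexible data types can be seen by comparing item sizes. A short sketch:

```python
import numpy as np

# A flexible type given only as a type object has no size yet.
print(np.dtype(np.str_).itemsize)        # 0

# Supplying a length makes it usable: 16 characters, 4 bytes each.
print(np.dtype((np.str_, 16)).itemsize)  # 64
print(np.dtype('S5').itemsize)           # 5
```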

The generic hierarchical type objects convert to corresponding type objects according to the associations:

Several python types are equivalent to a corresponding array scalar when used to generate a dtype object:

All other types map to object_ for convenience. Code should expect that such types may map to a specific (new) dtype in the future.
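A quick way to see these conversions is shown below. The class Point is a hypothetical stand-in for "any other type":

```python
import numpy as np


class Point:
    """Hypothetical user-defined class with no NumPy equivalent."""


print(np.dtype(float))    # float64
print(np.dtype(complex))  # complex128
print(np.dtype(bool))     # bool
print(np.dtype(Point))    # object
```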

Any type object with a dtype attribute: The attribute will be accessed and used directly. The attribute must return something that is convertible into a dtype object.

Several kinds of strings can be converted. Recognized strings can be prepended with '>' (big-endian), '<' (little-endian), or '=' (hardware-native, the default), to specify the byte order.

Each built-in data-type has a character code (the updated Numeric typecodes), that uniquely identifies it.

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds include ? (boolean), i (signed integer), u (unsigned integer), f (floating-point), c (complex floating-point), m (timedelta), M (datetime), O (Python object), S (byte string), U (Unicode string), and V (raw data, void).
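A few such strings can be tried out as follows. The particular specifiers are picked for illustration only:

```python
import numpy as np

for spec in ('<i4', '>f8', 'U25', 'S10', '?'):
    dt = np.dtype(spec)
    print(spec, '-> kind', dt.kind, 'itemsize', dt.itemsize)

# An item size with no matching type is refused.
try:
    np.dtype('i3')
    rejected = False
except TypeError:
    rejected = True
print("'i3' rejected:", rejected)
```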


Pandas tricks from RealPython

This is an abridged translation of a RealPython blog post about Pandas tricks. It starts with configuring the library at startup and ends with examples of working with operators and their precedence. It also touches on saving memory, compressing data frames, introspecting GroupBy objects through iteration, and other topics.

1. Interpreter startup options

When you start an interpreter session, you will see that the startup script has run and Pandas is imported automatically with your set of options:

Let's use the abalone data from the UCI Machine Learning Repository to demonstrate the formatting set in the startup file. The output is truncated to 14 rows, with 4 digits of precision for floating-point numbers:

You will see this data set in other examples later on.
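The startup file itself is not reproduced in this text. A minimal sketch of what such a file might contain, assuming it is the script the interpreter runs at startup (for example via the PYTHONSTARTUP environment variable); the function name start and the exact option set are illustrative:

```python
import pandas as pd


def start():
    """Apply display options; 14 rows and 4 digits follow the text above."""
    options = {
        'display.max_rows': 14,      # truncate long frames to 14 rows
        'display.precision': 4,      # digits shown for floating-point values
        'display.max_columns': None, # do not hide columns
    }
    for name, value in options.items():
        pd.set_option(name, value)


start()
```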

2. Toy data structures with the Pandas testing module

The Pandas testing module hides a number of convenient functions for quickly building quasi-realistic Series and data frames:

There are about 30 of them; you can see the full list by calling dir() on the module object. Here are a few:

They are useful for benchmarking, testing assertions, and experimenting with Pandas methods you are less familiar with.

3. Take advantage of accessor methods

You may have heard the term accessor, which is somewhat like a getter (although getters and setters are used infrequently in Python). In this article, an accessor is a property that serves as an interface to additional methods. A Series had three of them when the original article was written; today there are four:

Yes, the definition above is a mouthful, so let's look at some examples before discussing the internals.

.cat is for categorical data;

.str is for string (object) data;

.dt is for datetime-like data.
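The three accessors listed above can be tried on small, made-up Series:

```python
import pandas as pd

names = pd.Series(['alice', 'bob'])
print(names.str.upper().tolist())        # string methods via .str

dates = pd.Series(pd.to_datetime(['2021-01-15', '2022-06-30']))
print(dates.dt.year.tolist())            # datetime components via .dt

colors = pd.Series(['red', 'blue', 'red'], dtype='category')
print(colors.cat.categories.tolist())    # category metadata via .cat
```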

4. Creating a DatetimeIndex from component columns

Finally, you can drop the old individual columns and convert the result to a Series:

Intuitively, passing a data frame works because a DataFrame resembles a Python dictionary in which the column names are the keys and the individual columns (Series) are the values. That is why pd.to_datetime(df[datecols].to_dict(orient='list')) would also work here.
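The code for this trick is missing from the text. A sketch under the assumption that the component columns are named year, month and day (the data and the names frame, alt and series are invented):

```python
import pandas as pd

frame = pd.DataFrame({
    'year': [2017, 2017, 2018],
    'month': [1, 6, 12],
    'day': [15, 1, 31],
    'value': [10, 20, 30],
})
datecols = ['year', 'month', 'day']

# Build the index from the component columns.
frame.index = pd.to_datetime(frame[datecols])

# The dictionary form mentioned above yields the same timestamps.
alt = pd.to_datetime(frame[datecols].to_dict(orient='list'))

# Drop the components; the single remaining column squeezes to a Series.
series = frame.drop(datecols, axis=1).squeeze()
print(series)
```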

5. Using categorical data to save time and space

What if we could take the unique colors listed above and map each one to an integer that takes up less space? A naive implementation:

Another way to do the same thing in Pandas is pd.factorize(colors):

Either way, the object is encoded as an enumerated type (a categorical variable).
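Both encodings can be sketched as follows. The color values are illustrative, chosen so that there are two values per category as described:

```python
import pandas as pd

colors = pd.Series([
    'periwinkle', 'mint green', 'burnt orange', 'periwinkle', 'burnt orange',
    'rose', 'rose', 'mint green', 'rose', 'navy',
])

# Naive approach: number the unique values in order of appearance.
mapper = {value: code for code, value in enumerate(colors.unique())}
as_int = colors.map(mapper)

# The same encoding with pd.factorize.
codes, uniques = pd.factorize(colors)

print(as_int.tolist())
print(codes.tolist())
```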

"The memory usage of a Categorical is proportional to the number of categories plus the length of the data. In contrast, an object dtype is a constant times the length of the data" (Source).

In colors above there is a ratio of two values for every unique value, that is, for every category:

The memory savings from converting to Categorical are good, but not large:

But if you have, for example, a lot of demographic data with few unique values, the memory required shrinks by a factor of 10:
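The effect can be measured with memory_usage. In this sketch the data is synthetic and the exact ratio depends on the values and the Pandas version, so no particular factor is claimed:

```python
import pandas as pd

# Five categories repeated many times, stored as Python objects.
colors = pd.Series(
    ['periwinkle', 'mint green', 'burnt orange', 'rose', 'navy'] * 2000,
    dtype=object,
)

object_bytes = colors.memory_usage(index=False, deep=True)
category_bytes = colors.astype('category').memory_usage(index=False, deep=True)

print(object_bytes, category_bytes, object_bytes / category_bytes)
```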

You can reproduce something similar to the manual example above:

All you need to do to exactly mimic the earlier manual output is to reorder the codes:

Note that the dtype is NumPy's int8, an 8-bit signed integer that can take values from -128 to 127. Only a single byte is needed to represent a value in memory. 64-bit signed ints would be overkill in terms of memory usage. The rough manual example produced int64 data by default, whereas Pandas is smart enough to downcast categorical data to the smallest numerical dtype possible.
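A sketch of the categorical codes and their dtype, with illustrative color values; reorder_categories is used here as one way to make the codes follow order of first appearance:

```python
import numpy as np
import pandas as pd

colors = pd.Series([
    'periwinkle', 'mint green', 'burnt orange', 'periwinkle', 'burnt orange',
    'rose', 'rose', 'mint green', 'rose', 'navy',
])
ccolors = colors.astype('category')

# Codes are stored in the smallest integer dtype that fits.
print(ccolors.cat.codes.dtype)

info = np.iinfo(np.int8)
print(info.min, info.max)   # -128 127

# Reorder the categories so the codes match order of first appearance.
manual = ccolors.cat.reorder_categories(list(colors.unique()))
print(manual.cat.codes.tolist())
```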

6. Introspecting GroupBy objects via iteration

When you call df.groupby('x'), the resulting Pandas groupby objects can be a bit opaque. The object is instantiated lazily and has no meaningful representation of its own. Let's demonstrate this with the abalone data set from the first example:
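Since the abalone data is not included in this text, the sketch below uses a tiny stand-in frame with hypothetical columns sex and rings to show what iteration yields:

```python
import pandas as pd

# Stand-in data; the real example uses the abalone data set.
df = pd.DataFrame({
    'sex': ['M', 'F', 'I', 'M', 'F'],
    'rings': [15, 7, 9, 10, 20],
})

grouped = df.groupby('sex')

# Iterating yields (group name, sub-frame) pairs.
summary = {}
for name, group in grouped:
    print(name, group.shape)
    summary[name] = group['rings'].tolist()
```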

7. Use this mapping trick for binning

Imagine you have a Series and a corresponding "mapping table" in which each value belongs to a multi-member group or to no group at all:

In other words, you need to map countries to the following result:

The code is significantly faster than a nested Python loop over the groups for each country:

The task is to map each group in groups to an integer. However, Series.map() will not recognize 'ab'; it needs the broken-out version in which each character of each group is mapped to an integer. That is what the dictionary comprehension does:

This dictionary can be passed to s.map() to map, or "translate", its values to the corresponding group indices.
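The dictionary comprehension can be sketched on the 'ab' style groups mentioned above. The other group strings and the Series values are assumptions:

```python
import pandas as pd

# Each group is a string of member characters, numbered 0, 1, 2.
groups = dict(enumerate(('ab', 'cd', 'xy')))

# Break the groups out: one entry per member, pointing at its group index.
mapping = {char: idx for idx, group in groups.items() for char in group}

s = pd.Series(['a', 'c', 'y', 'q'])
result = s.map(mapping)   # 'q' belongs to no group and becomes NaN
print(result)
```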

8. Loading data from the clipboard

This lets you copy structured text directly into a DataFrame or Series. In Excel, the data would look something like this:

Its plain-text representation might look like this:

Simply select and copy the text above and call pd.read_clipboard():

9. Writing Pandas objects to a compressed format

This short example rounds out the list. Starting with Pandas 0.21.0, you can write Pandas objects directly to gzip, bz2, zip, or xz compression, rather than keeping the uncompressed file in memory and converting it. Here is an example using the abalone data from the first trick:

The size difference is a factor of 11.6:
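A sketch of the idea with stand-in data written to a temporary directory. CSV and gzip are chosen here for simplicity, and the ratio for this synthetic frame will differ from the 11.6 quoted above:

```python
import os
import tempfile

import pandas as pd

# Repetitive stand-in data; the original example uses the abalone data set.
df = pd.DataFrame({'sex': ['M', 'F', 'I', 'M'] * 5000,
                   'rings': [15, 7, 9, 10] * 5000})

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, 'data.csv')
    packed = os.path.join(tmp, 'data.csv.gz')

    df.to_csv(plain, index=False)
    df.to_csv(packed, index=False, compression='gzip')

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # Reading back goes through the same compression argument.
    roundtrip = pd.read_csv(packed, compression='gzip')

print(plain_size, packed_size, plain_size / packed_size)
```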


Practical Business Python

Taking care of business, one python script at a time

Overview of Pandas Data Types

Introduction

When doing data analysis, it is important to make sure you are using the correct data types; otherwise you may get unexpected results or errors. In the case of pandas, it will correctly infer data types in many cases and you can move on with your analysis without any further thought on the topic.

Despite how well pandas works, at some point in your data analysis processes, you will likely need to explicitly convert data from one type to another. This article will discuss the basic pandas data types (aka dtypes), how they map to python and numpy data types and the options for converting from one pandas type to another.

Pandas Data Types

A data type is essentially an internal construct that a programming language uses to understand how to store and manipulate data. For instance, a program needs to understand that you can add two numbers together like 5 + 10 to get 15. Or, if you have two strings such as "cat" and "hat" you could concatenate (add) them together to get "cathat."

A possible confusing point about pandas data types is that there is some overlap between pandas, python and numpy. This table summarizes the key points:

Pandas dtype mapping

Pandas dtype	Python type	NumPy type	Usage
object	str or mixed	string_, unicode_, mixed types	Text or mixed numeric and non-numeric values
int64	int	int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64	Integer numbers
float64	float	float_, float16, float32, float64	Floating point numbers
bool	bool	bool_	True/False values
datetime64	NA	datetime64[ns]	Date and time values
timedelta[ns]	NA	NA	Differences between two datetimes
category	NA	NA	Finite list of text values

For the most part, there is no need to worry about determining if you should try to explicitly force the pandas type to a corresponding NumPy type. Most of the time, using pandas default int64 and float64 types will work. The only reason I included them in this table is that sometimes you may see the numpy types pop up on-line or in your own analysis.

For this article, I will focus on the following pandas types:

The category and timedelta types are better served in an article of their own if there is interest. However, the basic approaches outlined in this article apply to these types as well.

Why do we care?

Data types are one of those things that you don't tend to care about until you get an error or some unexpected results. It is also one of the first things you should check once you load new data into pandas for further analysis.

I will use a very simple CSV file to illustrate a couple of common errors you might see in pandas if the data type is not correct. Additionally, an example notebook is up on github.

Customer Number	Customer Name	2016	2017	Percent Growth	Jan Units	Month	Day	Year	Active
0	10002.0	Quest Industries	$125,000.00	$162500.00	30.00%	500	1	10	2015	Y
1	552278.0	Smith Plumbing	$920,000.00	$101,2000.00	10.00%	700	6	15	2014	Y
2	23477.0	ACME Industrial	$50,000.00	$62500.00	25.00%	125	3	29	2016	Y
3	24900.0	Brekke LTD	$350,000.00	$490000.00	4.00%	75	10	27	2015	Y
4	651029.0	Harbor Co	$15,000.00	$12750.00	-15.00%	Closed	2	2	2014	N

Upon first glance, the data looks ok so we could try doing some operations to analyze the data. Let's try adding together the 2016 and 2017 sales:

This does not look right. We would like to get totals added together but pandas is just concatenating the two values together to create one long string. A clue to the problem is the line that says dtype: object. An object column in pandas usually holds strings, so it performs a string operation instead of a mathematical one.
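The behaviour is easy to reproduce on two rows of the sample data. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    '2016': ['$125,000.00', '$920,000.00'],
    '2017': ['$162500.00', '$101,2000.00'],
})

# Both columns hold strings, so "+" concatenates instead of adding.
total = df['2016'] + df['2017']
print(total)
```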

If we want to see what all the data types are in a dataframe, use df.dtypes.

Additionally, the df.info() function shows even more useful info.

After looking at the automatically assigned data types, there are several concerns:

Until we clean up these data types, it is going to be very difficult to do much additional analysis on this data.

In order to convert data types in pandas, there are three basic options: use astype() to force a dtype, write a custom conversion function, or use a pandas helper function such as pd.to_numeric() or pd.to_datetime().

Using the astype() function

In order to actually change the customer number in the original dataframe, make sure to assign it back since the astype() function returns a copy.

And here is the new data frame with the Customer Number as an integer:
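A short sketch of the difference between calling astype() and assigning its result back, using three of the customer numbers from the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Customer Number': [10002.0, 552278.0, 23477.0]})

# astype() returns a new Series; the dataframe itself is unchanged.
converted = df['Customer Number'].astype('int')
kind_before = df['Customer Number'].dtype.kind   # still 'f' (float)

# Assign the result back to change the column in the dataframe.
df['Customer Number'] = df['Customer Number'].astype('int')
print(df.dtypes)
```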

Customer Number	Customer Name	2016	2017	Percent Growth	Jan Units	Month	Day	Year	Active
0	10002	Quest Industries	$125,000.00	$162500.00	30.00%	500	1	10	2015	Y
1	552278	Smith Plumbing	$920,000.00	$101,2000.00	10.00%	700	6	15	2014	Y
2	23477	ACME Industrial	$50,000.00	$62500.00	25.00%	125	3	29	2016	Y
3	24900	Brekke LTD	$350,000.00	$490000.00	4.00%	75	10	27	2015	Y
4	651029	Harbor Co	$15,000.00	$12750.00	-15.00%	Closed	2	2	2014	N

This all looks good and seems pretty simple. Let's try to do the same thing to our 2016 column and convert it to a floating point number:

In a similar manner, we can try to convert the Jan Units column to an integer:

Both of these return ValueError exceptions which mean that the conversions did not work.

In each of the cases, the data included values that could not be interpreted as numbers. In the sales columns, the data includes a currency symbol as well as a comma in each value. In the Jan Units column the last value is "Closed" which is not a number; so we get the exception.
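Both failures can be reproduced on a few of the sample values. The Series are built with an explicit object dtype here, as an assumption about how the CSV columns were loaded:

```python
import pandas as pd

sales = pd.Series(['$125,000.00', '$920,000.00'], dtype=object)
units = pd.Series(['500', '700', 'Closed'], dtype=object)

errors = {}
for label, column, target in (('2016', sales, 'float'),
                              ('Jan Units', units, 'int')):
    try:
        column.astype(target)
    except ValueError as exc:
        # Record the message so both failures can be inspected.
        errors[label] = str(exc)

for label, message in errors.items():
    print(label, '->', message)
```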

So far it's not looking so good for astype() as a tool. We should give it one more try on the Active column.

At first glance, this looks ok but upon closer inspection, there is a big problem. All values were interpreted as True but the last customer has an Active flag of N so this does not seem right.
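The reason is that any non-empty string is truthy. A minimal reproduction, again assuming an object column:

```python
import pandas as pd

active = pd.Series(['Y', 'Y', 'N'], dtype=object)

# Every non-empty string converts to True, including 'N'.
as_bool = active.astype('bool')
print(as_bool.tolist())
```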

The takeaway from this section is that astype() will only work if:

If the data has non-numeric characters or is not homogeneous, then astype() will not be a good choice for type conversion. You will need to do additional transforms for the type change to work correctly.

Custom Conversion Functions

Since this data is a little more complex to convert, we can build a custom function that we apply to each value and convert to the appropriate data type.

For currency conversion (of this specific data set), here is a simple function we can use:

The code uses python's string functions to strip out the '$' and ',' and then convert the value to a floating point number. In this specific case, we could convert the values to integers as well but I'm choosing to use floating point in this case.

I also suspect that someone will recommend that we use a Decimal type for currency. This is not a native data type in pandas so I am purposely sticking with the float approach.

Also of note, is that the function converts the number to a python float but pandas internally converts it to a float64. As mentioned earlier, I recommend that you allow pandas to convert to specific size float or int as it determines appropriate. There is no need for you to try to downcast to a smaller or upcast to a larger byte size unless you really know why you need to do it.

Now, we can use the pandas apply function to apply this to all the values in the 2016 column.
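The function itself is missing from this text. The sketch below reconstructs it from the description (strip '$' and ',', then convert to float); the name convert_currency is the one the article uses later, while the body is an assumption:

```python
import pandas as pd


def convert_currency(val):
    """Remove '$' and ',' from a currency string and return a float."""
    new_val = val.replace(',', '').replace('$', '')
    return float(new_val)


df = pd.DataFrame({
    '2016': ['$125,000.00', '$920,000.00'],
    '2017': ['$162500.00', '$101,2000.00'],
})

df['2016'] = df['2016'].apply(convert_currency)
df['2017'] = df['2017'].apply(convert_currency)
print(df.dtypes)
```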

Success! All the values are showing as float64 so we can do all the math functions we need to.

I'm sure that the more experienced readers are asking why I did not just use a lambda function? Before I answer, here is what we could do in 1 line with a lambda function:

Using lambda we can streamline the code into 1 line which is a perfectly valid approach. I have three main concerns with this approach:

Some may also argue that other lambda-based approaches have performance improvements over the custom function. That may be true but for the purposes of teaching new users, I think the function approach is preferable.

Here's a full example of converting the data in both sales columns using the convert_currency function.

For another example of using lambda vs. a function, we can look at the process for fixing the Percent Growth column.

Doing the same thing with a custom function:

Both produce the same value:
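Neither version is shown in this text. A sketch of the two approaches on a few sample percentages; the function name convert_percent and both bodies are assumptions:

```python
import pandas as pd

growth = pd.Series(['30.00%', '10.00%', '-15.00%'], dtype=object)

# Lambda version: strip the '%', convert to float, scale to a fraction.
via_lambda = growth.apply(lambda x: x.replace('%', '')).astype('float') / 100


def convert_percent(val):
    """Remove '%' from a percentage string and return it as a fraction."""
    new_val = val.replace('%', '')
    return float(new_val) / 100


via_function = growth.apply(convert_percent)
print(via_function)
```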

The final custom function I will cover is using np.where() to convert the active column to a boolean. There are several possible ways to solve this specific problem. The np.where() approach is useful for many types of problems so I'm choosing to include it here.

The basic idea is to use the np.where() function to convert all "Y" values to True and everything else assigned False

Which results in the following dataframe:
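In code, the idea might look like this (three sample flags only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Active': ['Y', 'Y', 'N']})

# "Y" becomes True; every other value becomes False.
df['Active'] = np.where(df['Active'] == 'Y', True, False)
print(df)
```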

Customer Number	Customer Name	2016	2017	Percent Growth	Jan Units	Month	Day	Year	Active
0	10002.0	Quest Industries	$125,000.00	$162500.00	30.00%	500	1	10	2015	True
1	552278.0	Smith Plumbing	$920,000.00	$101,2000.00	10.00%	700	6	15	2014	True
2	23477.0	ACME Industrial	$50,000.00	$62500.00	25.00%	125	3	29	2016	True
3	24900.0	Brekke LTD	$350,000.00	$490000.00	4.00%	75	10	27	2015	True
4	651029.0	Harbor Co	$15,000.00	$12750.00	-15.00%	Closed	2	2	2014	False

Pandas helper functions

Pandas has a middle ground between the blunt astype() function and the more complex custom functions. These helper functions can be very useful for certain data type conversions.

The reason the Jan Units conversion is problematic is the inclusion of a non-numeric value in the column. If we tried to use astype() we would get an error (as described earlier). The pd.to_numeric() function can handle these values more gracefully:
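A sketch on the Jan Units values from the sample data. errors='coerce' turns unparseable entries into NaN; filling them with 0 afterwards is one possible follow-up, not necessarily the article's:

```python
import pandas as pd

units = pd.Series(['500', '700', '125', '75', 'Closed'])

# Values that cannot be parsed become NaN instead of raising.
converted = pd.to_numeric(units, errors='coerce')

# Optionally replace the missing values with a default.
filled = converted.fillna(0)
print(converted)
```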

The pd.to_datetime() function can similarly be applied to the Month, Day and Year columns. In this case, the function combines the columns into a new series of the appropriate datetime64 dtype.

We need to make sure to assign these values back to the dataframe:
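Assuming the helper in question is pd.to_datetime() applied to the Month, Day and Year columns (the Start_Date column in the table below suggests so), the assignment could be sketched as:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1, 6, 3],
    'Day': [10, 15, 29],
    'Year': [2015, 2014, 2016],
})

# The component columns are combined into a single datetime64 Series.
df['Start_Date'] = pd.to_datetime(df[['Month', 'Day', 'Year']])
print(df)
```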

Customer Number	Customer Name	2016	2017	Percent Growth	Jan Units	Month	Day	Year	Active	Start_Date
0	10002	Quest Industries	125000.0	162500.0	0.30	500.0	1	10	2015	True	2015-01-10
1	552278	Smith Plumbing	920000.0	1012000.0	0.10	700.0	6	15	2014	True	2014-06-15
2	23477	ACME Industrial	50000.0	62500.0	0.25	125.0	3	29	2016	True	2016-03-29
3	24900	Brekke LTD	350000.0	490000.0	0.04	75.0	10	27	2015	True	2015-10-27
4	651029	Harbor Co	15000.0	12750.0	-0.15	NaN	2	2	2014	False	2014-02-02

Now the data is properly converted to all the types we need:

The dataframe is ready for analysis!

Bringing it all together

The basic concepts of using astype() and custom functions can be included very early in the data intake process. If you have a data file that you intend to process repeatedly and it always comes in the same format, you can define the dtype and converters to be applied when reading the data. It is helpful to think of dtype as performing astype() on the data. The converters arguments allow you to apply functions to the various input columns similar to the approaches outlined above.

It is important to note that you can only apply a dtype or a converter function to a specified column once using this approach. If you try to apply both to the same column, then the dtype will be skipped.

Here is a streamlined example that does almost all of the conversion at the time the data is read into the dataframe:
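The original listing is not included in this text. A sketch of the approach using an in-memory CSV with two rows modelled on the sample data; the helper bodies and the inline lambdas are assumptions:

```python
import io

import pandas as pd

# Two illustrative rows; the article reads its own CSV file instead.
CSV_TEXT = '''Customer Number,Customer Name,2016,2017,Percent Growth,Jan Units,Active
10002,Quest Industries,"$125,000.00",$162500.00,30.00%,500,Y
651029,Harbor Co,"$15,000.00",$12750.00,-15.00%,Closed,N
'''


def convert_currency(val):
    """Remove '$' and ',' and return a float."""
    return float(val.replace(',', '').replace('$', ''))


def convert_percent(val):
    """Remove '%' and return the value as a fraction."""
    return float(val.replace('%', '')) / 100


df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={'Customer Number': 'int'},
    converters={
        '2016': convert_currency,
        '2017': convert_currency,
        'Percent Growth': convert_percent,
        'Jan Units': lambda x: pd.to_numeric(x, errors='coerce'),
        'Active': lambda x: x == 'Y',
    },
)
print(df.dtypes)
```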

Summary

One of the first steps when exploring a new data set is making sure the data types are set correctly. Pandas makes reasonable inferences most of the time but there are enough subtleties in data sets that it is important to know how to use the various data conversion options available in pandas. If you have any other tips you have used or if there is interest in exploring the category data type, feel free to comment below.
