# Types

The set of all types comprises dtypes and arrays.

The rest of this document assumes that the `ndtypes` module has been imported:

```
from ndtypes import ndt
```

## Dtypes

An important notion in datashape is the `dtype`, which roughly translates to
the element type of an array. In datashape, the `dtype` can be of arbitrary
complexity and can contain e.g. tuples, records and functions.

### Scalars

Scalars are the primitive C/C++ types. Most scalars are fixed-size and platform independent.

#### Fixed size

Datashape offers a number of fixed-size scalars. Here’s how to construct a simple
`int64_t` type:

```
>>> ndt('int64')
ndt("int64")
```

All fixed-size scalars:

| void | boolean | signed int | unsigned int | float [2] | complex |
|---|---|---|---|---|---|
| `void` | `bool` [1] | `int8` | `uint8` | `float16` | `complex32` |
| | | `int16` | `uint16` | `float32` | `complex64` [3] |
| | | `int32` | `uint32` | `float64` | `complex128` [4] |
| | | `int64` | `uint64` | `bfloat16` | `bcomplex32` |

[1] implemented as `char`

[2] IEEE 754-2008 binary floating point types

[3] implemented as `complex<float32>`

[4] implemented as `complex<float64>`
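To make "fixed-size and platform independent" concrete, here is a minimal sketch that uses Python's standard `struct` module rather than ndtypes itself. The `SCALAR_FORMATS` mapping and the `scalar_size` helper are hypothetical names introduced for illustration; they only cover the integer and IEEE float scalars that `struct` can express.

```python
import struct

# Hypothetical mapping from datashape scalar names to struct format codes.
# The "<" prefix selects struct's standard sizes, which do not depend on
# the platform the code runs on.
SCALAR_FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float16": "<e", "float32": "<f", "float64": "<d",
}


def scalar_size(name):
    """Return the size in bytes of a fixed-size scalar (illustration only)."""
    return struct.calcsize(SCALAR_FORMATS[name])


sizes = {name: scalar_size(name) for name in SCALAR_FORMATS}
```

Under these assumptions, `sizes["int64"]` is 8 on every platform, which is what distinguishes these scalars from the machine-dependent aliases.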

#### Aliases

Datashape has a number of aliases for scalars, which are internally mapped
to their corresponding platform specific fixed-size types. This is how to
construct an `intptr_t`:

```
>>> ndt('intptr')
ndt("int64")
```

Machine dependent aliases:

| alias | C type |
|---|---|
| `intptr` | `intptr_t` |
| `uintptr` | `uintptr_t` |
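The mapping from an alias to a fixed-size type depends on the pointer width of the machine. The following sketch mimics that resolution with the standard `ctypes` module; `resolve_pointer_alias` is a hypothetical helper, not part of ndtypes, and it assumes that `intptr_t` has the same width as a `void *`.

```python
import ctypes


def resolve_pointer_alias(alias):
    """Map 'intptr' or 'uintptr' to a fixed-size scalar name (illustration only)."""
    # Pointer width in bits on the machine running this code.
    bits = 8 * ctypes.sizeof(ctypes.c_void_p)
    if alias == "intptr":
        return f"int{bits}"
    if alias == "uintptr":
        return f"uint{bits}"
    raise ValueError(f"unknown alias: {alias!r}")


resolved = resolve_pointer_alias("intptr")
```

On a typical 64-bit machine this yields `"int64"`, matching the `ndt('intptr')` output shown earlier; on a 32-bit machine it would yield `"int32"`.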

### Chars, strings, bytes

#### Encodings

Datashape defines the following encodings for strings and characters. Each encoding has several aliases:

| canonical form | aliases |
|---|---|
| `'ascii'` | `'A'`, `'us-ascii'` |
| `'utf8'` | `'U8'`, `'utf-8'` |
| `'utf16'` | `'U16'`, `'utf-16'` |
| `'utf32'` | `'U32'`, `'utf-32'` |
| `'ucs2'` | `'ucs_2'`, `'ucs2'` |

As seen in the table, encodings must be given in string form:

```
>>> ndt("char('utf16')")
ndt("char('utf16')")
```
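The relationship between aliases and canonical forms can be sketched as a plain lookup table. The data below is transcribed from the encodings listed in this section; the `canonical_encoding` function is a hypothetical helper and does not reflect how ndtypes resolves encodings internally.

```python
# Canonical encoding names and their aliases, as listed in this section.
ENCODING_ALIASES = {
    "ascii": ("A", "us-ascii"),
    "utf8": ("U8", "utf-8"),
    "utf16": ("U16", "utf-16"),
    "utf32": ("U32", "utf-32"),
    "ucs2": ("ucs_2", "ucs2"),
}

# Reverse index: every accepted spelling points at its canonical form.
_CANONICAL = {}
for canonical, aliases in ENCODING_ALIASES.items():
    _CANONICAL[canonical] = canonical
    for alias in aliases:
        _CANONICAL[alias] = canonical


def canonical_encoding(name):
    """Return the canonical form of an encoding name (illustration only)."""
    try:
        return _CANONICAL[name]
    except KeyError:
        raise ValueError(f"unknown encoding: {name!r}") from None


utf16 = canonical_encoding("U16")
```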

#### Chars

The `char` constructor accepts `'ascii'`, `'ucs2'` and `'utf32'` encoding
arguments. `char` without arguments is equivalent to `char('utf32')`.

```
>>> ndt("char('ascii')")
ndt("char('ascii')")
>>> ndt("char('utf32')")
ndt("char('utf32')")
>>> ndt("char")
ndt("char('utf32')")
```

#### UTF-8 strings

The `string` type is a variable length NUL-terminated UTF-8 string:

```
>>> ndt("string")
ndt("string")
```

#### Fixed size strings

The `fixed_string` type takes a length and an optional encoding argument:

```
>>> ndt("fixed_string(1729)")
ndt("fixed_string(1729)")
>>> ndt("fixed_string(1729, 'utf16')")
ndt("fixed_string(1729, 'utf16')")
```

#### Bytes

The `bytes` type is variable length and takes an optional alignment argument.
Valid values are powers of two in the range `[1, 16]`.

```
>>> ndt("bytes")
ndt("bytes")
>>> ndt("bytes(align=2)")
ndt("bytes(align=2)")
```
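The alignment rule stated above (a power of two in the range `[1, 16]`) can be checked with a few lines of Python. The helper name `is_valid_align` is hypothetical and only illustrates the rule; it is not an ndtypes function.

```python
def is_valid_align(align):
    """Return True if align is a power of two in the range [1, 16]."""
    if not isinstance(align, int):
        return False
    # A positive power of two has exactly one bit set, so n & (n - 1) == 0.
    return 1 <= align <= 16 and (align & (align - 1)) == 0


valid_alignments = [n for n in range(0, 33) if is_valid_align(n)]
```

With this check, the only accepted values are 1, 2, 4, 8 and 16.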

#### Fixed size bytes

The `fixed_bytes` type takes a length and an optional alignment argument.
The latter is a keyword-only argument in order to prevent accidental swapping of
the two integer arguments:

```
>>> ndt("fixed_bytes(size=32)")
ndt("fixed_bytes(size=32)")
>>> ndt("fixed_bytes(size=128, align=8)")
ndt("fixed_bytes(size=128, align=8)")
```

### References

Datashape references are fully general and can point to types of arbitrary complexity:

```
>>> ndt("ref(int64)")
ndt("ref(int64)")
>>> ndt("ref(10 * {a: int64, b: 10 * float64})")
ndt("ref(10 * {a : int64, b : 10 * float64})")
```

### Categorical type

The categorical type allows specifying subsets of types. It is implemented
as a set of typed values. The types of the values are inferred and interpreted
as int64, float64 or strings. The *NA* keyword creates a category for missing values.

```
>>> ndt("categorical(1, 10)")
ndt("categorical(1, 10)")
>>> ndt("categorical(1.2, 100.0)")
ndt("categorical(1.2, 100)")
>>> ndt("categorical('January', 'August')")
ndt("categorical('January', 'August')")
>>> ndt("categorical('January', 'August', NA)")
ndt("categorical('January', 'August', NA)")
```
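As a rough model of "a set of typed values", the sketch below infers a type tag for each Python value and stores the pairs in a set. Everything here is an assumption made for illustration: the `NA` sentinel, the `infer_value_type` and `make_categorical` helpers, and the choice to reject booleans are not taken from ndtypes.

```python
NA = object()  # hypothetical stand-in for the NA keyword


def infer_value_type(value):
    """Infer the type tag of a categorical value (illustration only)."""
    if value is NA:
        return "NA"
    if isinstance(value, bool):
        # bool is a subclass of int in Python; this sketch rejects it.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported categorical value: {value!r}")


def make_categorical(*values):
    """Model a categorical type as a frozen set of (type, value) pairs."""
    return frozenset((infer_value_type(v), v) for v in values)


months = make_categorical("January", "August", NA)
is_member = ("string", "August") in months
```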

### Option type

The option type provides safe handling of values that may or may not be present. The concept is well-known from languages like ML or SQL.

```
>>> ndt("?complex64")
ndt("?complex64")
```
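For readers who know the idea from Python rather than ML or SQL, `typing.Optional` plays a comparable role. The sketch below is an analogy only and does not use ndtypes; the `magnitude` function is a hypothetical example of code that handles a possibly missing complex value explicitly.

```python
from typing import Optional


def magnitude(value: Optional[complex]) -> Optional[float]:
    """Return abs(value), propagating a missing value instead of failing."""
    if value is None:
        return None
    return abs(value)


present = magnitude(3 + 4j)
missing = magnitude(None)
```

The missing case is part of the type, so callers have to deal with it rather than discover it at runtime.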

### Dtype variables

Dtype variables are used in quantifier-free type schemes and pattern matching. The range of a variable extends over the entire type term.

```
>>> ndt("T")
ndt("T")
>>> ndt("10 * 16 * T")
ndt("10 * 16 * T")
```

### Symbolic constructors

Symbolic constructors stand for any constructor that takes the given datashape argument. They are used in pattern matching.

```
>>> ndt("Coulomb(float64)")
ndt("Coulomb(float64)")
```

### Type kinds

Type kinds denote specific subsets of dtypes, types or dimension types. Type kinds are in the dtype section because of the way the grammar is organized. Currently available are:

| type kind | set | specific subset |
|---|---|---|
| `Any` | `datashape` | `datashape` |
| `Scalar` | `dtypes` | `scalars` |
| `Categorical` | `dtypes` | `categoricals` |
| `FixedString` | `dtypes` | `fixed_strings` |
| `FixedBytes` | `dtypes` | `fixed_bytes` |
| `Fixed` | `dimension kind instances` | `fixed dimensions` |

Type kinds are used in pattern matching.

### Composite types

Datashape has container and function dtypes.

#### Tuples

As usual, the tuple type is the product type of a fixed number of types:

```
>>> ndt("(int64, float32, string)")
ndt("(int64, float32, string)")
```

Tuples can be nested:

```
>>> ndt("(bytes, (int8, fixed_string(10)))")
ndt("(bytes, (int8, fixed_string(10)))")
```

#### Records

Records are equivalent to tuples with named fields:

```
>>> ndt("{a: float32, b: float64}")
ndt("{a : float32, b : float64}")
```
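The equivalence between records and tuples has a close analogue in Python's `collections.namedtuple`, shown here purely as an illustration. The `Record` class name and the sample values are hypothetical; the field names `a` and `b` are taken from the example above.

```python
from collections import namedtuple

# A record with fields a and b, modelled as a tuple with named fields.
Record = namedtuple("Record", ["a", "b"])

r = Record(a=1.5, b=2.5)

# The same data is reachable by field name and by tuple position.
by_name = (r.a, r.b)
by_position = (r[0], r[1])
as_plain_tuple = tuple(r)
```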

#### Functions

In datashape, function types can have positional and keyword arguments. Internally, positional arguments are represented by a tuple and keyword arguments by a record. Both kinds of arguments can be variadic.

##### Positional-only

This is a function type with a single positional `int32` argument, returning
an `int32`:

```
>>> ndt("(int32) -> int32")
ndt("(int32) -> int32")
```

This is a function type with three positional arguments:

```
>>> ndt("(int32, complex128, string) -> float64")
ndt("(int32, complex128, string) -> float64")
```

##### Positional-variadic

This is a function type with a single required positional argument, followed by any number of additional positional arguments:

```
>>> ndt("(int32, ...) -> int32")
ndt("(int32, ...) -> int32")
```

## Arrays

In datashape, dimension kinds [5] are part of array type declarations. Datashape supports the following dimension kinds:

### Fixed Dimension

A fixed dimension denotes an array type with a fixed number of elements of a specific type. The type can be written in two ways:

```
>>> ndt("fixed(shape=10) * uint64")
ndt("10 * uint64")
>>> ndt("10 * uint64")
ndt("10 * uint64")
```

Formally, `fixed(shape=10)` is a dimension constructor, not a type constructor.
The `*` is the array type constructor in infix notation, taking as arguments
a dimension and an element type.

The second form is equivalent to the first one. For users of other languages,
it may be helpful to view this type as `array[10] of uint64`.

Multidimensional arrays are constructed in the same manner; the `*` is
right associative:

```
>>> ndt("10 * 25 * float64")
ndt("10 * 25 * float64")
```

Again, it may help to view this type as `array[10] of (array[25] of float64)`.

In this case, `float64` is the dtype of the multidimensional array.
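The right associativity of `*` can be made explicit with a small sketch that folds a type string from the right. This is not the ndtypes parser: `parse_array_type` and `describe` are hypothetical helpers that only handle integer dimensions followed by a simple dtype name, so records, `var` and symbolic dimensions are out of scope.

```python
def parse_array_type(text):
    """Turn 'd1 * d2 * dtype' into nested (dimension, element_type) pairs."""
    parts = [part.strip() for part in text.split("*")]
    *dims, dtype = parts
    result = dtype
    # Fold from the right: the innermost array is built first.
    for dim in reversed(dims):
        result = (int(dim), result)
    return result


def describe(parsed):
    """Render the nested pairs in 'array[n] of ...' notation."""
    if isinstance(parsed, str):
        return parsed
    dim, element = parsed
    inner = describe(element)
    if isinstance(element, tuple):
        inner = f"({inner})"
    return f"array[{dim}] of {inner}"


nested = parse_array_type("10 * 25 * float64")
description = describe(nested)
```

Here `nested` is `(10, (25, "float64"))`, so the outer dimension wraps an element type that is itself an array.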

Dtypes can be arbitrarily complex. Here is an array with a dtype of a record that contains another array:

```
>>> ndt("120 * {size: int32, items: 10 * int8}")
ndt("120 * {size : int32, items : 10 * int8}")
```

### Variable Dimension

The variable dimension kind describes an array type with a variable number of elements of a specific type:

```
>>> ndt("var * float32")
ndt("var * float32")
```

In this case, `var` is the dimension constructor and the `*` fulfils the
same role as above. Many managed languages have variable sized arrays, so this
type could be viewed as `array of float32`. In a sense, fixed size arrays
are just a special case of variable sized arrays.

### Symbolic Dimension

Datashape supports symbolic dimensions, which are used in pattern matching. A symbolic dimension is an uppercase variable that stands for a fixed dimension.

In this manner entire sets of array types can be specified. The following type
describes the set of all `M * N` matrices with a `float32` dtype:

```
>>> ndt("M * N * float32")
ndt("M * N * float32")
```

The next type describes a function that performs matrix multiplication on any
permissible pair of input matrices with dtype `T`:

```
>>> ndt("(M * N * T, N * P * T) -> M * P * T")
ndt("(M * N * T, N * P * T) -> M * P * T")
```

In this case, we have used both symbolic dimensions and the type variable `T`.
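To show what "permissible pair" means for the matrix multiplication signature, here is a minimal sketch of how symbolic dimensions could be bound to concrete sizes. It is an illustration under simplifying assumptions, not the ndtypes matching algorithm: `bind_dims` and `matmul_type` are hypothetical helpers, shapes are plain tuples, and dtypes are compared as strings.

```python
def bind_dims(pattern, shape, bindings):
    """Bind symbolic dimensions in pattern to the sizes in shape.

    Integers in the pattern must match exactly; a symbol must map to the
    same size everywhere it occurs. Returns False on any conflict.
    """
    if len(pattern) != len(shape):
        return False
    for symbol, size in zip(pattern, shape):
        if isinstance(symbol, int):
            if symbol != size:
                return False
        elif bindings.setdefault(symbol, size) != size:
            return False
    return True


def matmul_type(a, b):
    """Apply (M * N * T, N * P * T) -> M * P * T to two (shape, dtype) pairs."""
    (a_shape, a_dtype), (b_shape, b_dtype) = a, b
    bindings = {}
    if not bind_dims(("M", "N"), a_shape, bindings):
        return None
    if not bind_dims(("N", "P"), b_shape, bindings):
        return None
    if a_dtype != b_dtype:  # the type variable T must be bound consistently
        return None
    return ((bindings["M"], bindings["P"]), a_dtype)


result = matmul_type(((2, 3), "float64"), ((3, 4), "float64"))
```

Because `N` appears in both arguments, a `2 * 3` matrix can only be paired with a matrix whose first dimension is 3; any other pairing is rejected.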

Symbolic dimensions can be mixed with fixed dimensions:

```
>>> ndt("10 * N * float64")
ndt("10 * N * float64")
```

### Ellipsis Dimension

The ellipsis, used in pattern matching, stands for any number of dimensions. Datashape supports both named and unnamed ellipses:

```
>>> ndt("... * float32")
ndt("... * float32")
```

Named form:

```
>>> ndt("Dim... * float32")
ndt("Dim... * float32")
```
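The following sketch illustrates how a leading ellipsis could absorb dimensions during matching. It is a simplified model, not the ndtypes implementation: `match_leading_ellipsis` is a hypothetical helper, shapes are plain tuples, and the sketch assumes that "any number of dimensions" includes zero.

```python
def match_leading_ellipsis(shape, trailing=()):
    """Match a pattern of the form '... * trailing dims' against a shape.

    Returns the tuple of dimensions absorbed by the ellipsis, or None if
    the trailing fixed dimensions do not match the end of the shape.
    """
    n = len(trailing)
    if n > len(shape):
        return None
    split = len(shape) - n
    head, tail = shape[:split], shape[split:]
    if tuple(tail) != tuple(trailing):
        return None
    return tuple(head)


# '... * float32' against a three-dimensional array: everything is absorbed.
absorbed_all = match_leading_ellipsis((2, 3, 4))

# A pattern with one trailing fixed dimension of size 4.
absorbed_head = match_leading_ellipsis((2, 3, 4), trailing=(4,))
```

For a named ellipsis such as `Dim...`, the returned tuple is the value that the name would be bound to in this model.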

Ellipsis dimensions play an important role in broadcasting; more on the topic in the section on pattern matching.

[5] In the whole text, dimension kind and dimension are synonymous.