20 Feb 2015
I’ve been using Python for quite a while now, but always on pet projects, small scripts and programming contests, and I feel that my knowledge hasn’t really improved much. I had a chance to work on a Python project and realized there are many basic things I don’t know.
Every time I dedicate some time to learning and writing about a subject, I feel like my knowledge of it improves a lot. Thus, I decided to revisit some Python concepts and write a series of posts about things I didn’t know in Python 2.7. The first post covers the built-in types.
Boolean types can take True or False as values. Falsy values include False, None, 0, empty sequences and containers (string, list, dictionary) and instances of any class that implements __len__() returning 0 (or, in Python 2.7, __nonzero__() returning False).
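A small sketch of truthiness. EmptyBox is a hypothetical class made up for this example. The code only calls print() with a single argument, so it runs on both Python 2.7 and Python 3.

```python
class EmptyBox(object):
    """Hypothetical container that always reports a length of 0."""

    def __len__(self):
        return 0


falsy = [False, None, 0, 0.0, "", [], (), {}, EmptyBox()]
print(any(falsy))    # False: every value in the list is falsy
print(bool([0]))     # True: a non-empty list is truthy even if its element is falsy
```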
Logical operators. Many languages use && and ||. In Python they are the keywords "and" and "or", and they work the same way (short-circuiting).
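A short sketch of short-circuiting. One detail that may surprise people coming from C++: "and" and "or" return one of their operands, not necessarily True or False. The explode() helper is hypothetical and only exists to show that the second operand is never evaluated.

```python
def explode():
    # Hypothetical helper: fails loudly if it is ever called.
    raise RuntimeError("this should not be evaluated")


print(0 or "default")      # default: "or" returns the first truthy operand
print([] and explode())    # []: "and" stops at the first falsy operand
print(1 and 2)             # 2: both are truthy, so the last operand is returned
print(not [])              # True
```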
Negation is also more verbose: not instead of !. The in operator tests whether an element belongs to a sequence/container or whether a string is a substring of another. Note that it can be combined with the not operator for better readability: x not in y rather than not x in y. There is also the identity operator, is. Unlike ==, which compares values, it only returns True if both operands refer to the same object. This operator can also be combined with not, as in a is not b.
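A minimal sketch contrasting membership, equality and identity. The variable names are arbitrary.

```python
print(3 in [1, 2, 3])        # True: membership in a list
print("ell" in "hello")      # True: substring test
print("x" not in "hello")    # True: reads better than not ("x" in "hello")

a = [1, 2]
b = [1, 2]
c = a
print(a == b)                # True: equal contents
print(a is b)                # False: two distinct list objects
print(a is not b)            # True
print(c is a)                # True: both names refer to the same object
```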
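Comparison operators. They are the same as in most languages (<, <=, >, >=, == and !=). In Python, they work with custom classes if these classes implement the rich comparison methods __lt__(), __le__(), __gt__(), __ge__(), __eq__() and __ne__() (Python 2.7 also falls back to __cmp__()).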
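A sketch of a comparable class. Version is a hypothetical example. functools.total_ordering, available since Python 2.7, fills in the remaining ordering methods from __eq__() and __lt__(). __ne__() is defined explicitly because Python 2.7 does not derive != from __eq__().

```python
from functools import total_ordering


@total_ordering
class Version(object):
    """Hypothetical version number compared by (major, minor)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __ne__(self, other):
        # Needed on Python 2.7, where != is not derived from __eq__.
        return not self == other

    def __lt__(self, other):
        return self._key() < other._key()


print(Version(1, 2) < Version(1, 10))     # True: compares 2 < 10, not as strings
print(Version(2, 0) >= Version(1, 9))     # True: >= supplied by total_ordering
print(sorted([Version(2, 0), Version(1, 5)])[0].major)   # 1
```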
Numeric types can be one of the following:
int (plain integer),
float (floating point numbers),
long (long integers) and
complex (complex numbers).
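ints are implemented using C’s long, which gives at least 32 bits of precision (64 bits on most 64-bit Unix platforms). When a result doesn’t fit, Python 2.7 automatically promotes it to a long.
floats are implemented using C’s double, usually a 64-bit IEEE 754 value: 52 bits for the mantissa (53 bits of precision, counting the implicit leading bit), 11 bits for the exponent and 1 bit for the sign.
longs have unlimited precision, limited only by available memory.
complex is a pair of floats, accessible through the real and imag attributes.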
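A quick sketch for inspecting these properties. sys.float_info reports the platform’s double parameters. The exact value of sys.maxsize depends on the platform, so it is deliberately not printed here.

```python
import sys

# Precision of a C double: 53 bits, including the implicit leading bit.
print(sys.float_info.mant_dig)    # 53

# Integer arithmetic never overflows; Python 2.7 switches to long here.
big = 2 ** 100
print(big)                        # 1267650600228229401496703205376

# A complex number is a pair of floats.
z = 3 + 4j
print(z.real)                     # 3.0
print(z.imag)                     # 4.0
print(abs(z))                     # 5.0
```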
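Math operations. Most operators are the same as in other languages. The different ones include //, which performs floored division. The quotient is rounded toward negative infinity, so -7 // 2 is -4, whereas C++ truncates -7 / 2 to -3.
Note that in Python 2.7, if both operands are integers, / also performs integer (floored) division. // floors regardless of the operand types.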
divmod() is also interesting. It’s basically divmod(a, b) = (a // b, a % b).
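A sketch of floored division and divmod() with a negative operand, which is where Python differs most from C++. The code avoids plain / because its result on integers differs between Python 2.7 and Python 3.

```python
print(7 // 2)           # 3
print(-7 // 2)          # -4: rounded toward negative infinity
print(7.5 // 2)         # 3.0: // floors floats as well
print(-7 % 2)           # 1: the remainder takes the sign of the divisor
print(divmod(-7, 2))    # (-4, 1)

q, r = divmod(-7, 2)
assert q * 2 + r == -7  # the invariant that ties // and % together
```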
Bitwise operators work as in C++.
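A small sketch of the bitwise operators. One caveat: Python integers have no fixed width, so left shifts never overflow and right-shifting a negative number is always arithmetic. C++ integers, by contrast, have a fixed width.

```python
print(5 & 3)      # 1
print(5 | 3)      # 7
print(5 ^ 3)      # 6
print(~5)         # -6, i.e. -(5 + 1)
print(-8 >> 1)    # -4: arithmetic shift keeps the sign
print(1 << 64)    # 18446744073709551616: no overflow
```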
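Numeric types are classes too. int and long are registered as numbers.Integral, float as numbers.Real and complex as numbers.Complex. We can invoke methods on variables, but calling a method directly on an integer literal, as in 5.bit_length(), is a syntax error because the parser reads 5. as a float. Wrapping the literal in parentheses, (5).bit_length(), works.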
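A sketch using the numbers module’s abstract base classes and a method call on an integer. int.bit_length() exists in Python 2.7 and Python 3.

```python
import numbers

print(isinstance(5, numbers.Integral))      # True
print(isinstance(2.5, numbers.Real))        # True
print(isinstance(2.5, numbers.Integral))    # False
print(isinstance(1j, numbers.Complex))      # True

x = 5
print(x.bit_length())      # 3, since 5 is 0b101
print((5).bit_length())    # 3: the parentheses make the call on a literal legal
```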
Python supports a concept of iteration over containers. Classes that implement the __iter__() and next() methods are iterators. The next() method should return the current value and advance to the next one, raising StopIteration when the end is reached.
We can implement our own (x)range function as an example.
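A minimal sketch of a range-like iterator. MyXRange is a hypothetical name, and it supports only a positive step. The class aliases __next__ to next because Python 3 renamed the method, so the same code runs on both versions.

```python
class MyXRange(object):
    """Hypothetical iterator over [start, stop) with a positive step."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:
            # Like range(n): a single argument is the stop value.
            start, stop = 0, start
        if step <= 0:
            raise ValueError("this sketch only supports a positive step")
        self.current = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def next(self):
        # Python 2 iterator protocol: return the current value, then advance.
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value

    # Python 3 looks for __next__ instead of next.
    __next__ = next


print(list(MyXRange(5)))          # [0, 1, 2, 3, 4]
print(list(MyXRange(2, 10, 3)))   # [2, 5, 8]
```

Because the object is its own iterator, it can be consumed only once. The built-in xrange object, by contrast, can be iterated repeatedly.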
The sequence types are strings, Unicode strings, tuples, lists, bytearrays, buffers and xrange objects.
Strings. A str is an immutable sequence of 8-bit characters (bytes), usually holding ASCII text (strings have some overhead besides the characters). String literals can be written in single or double quotes.
Formatting strings. The % operator accepts a syntax similar to sprintf from C. One interesting form is passing a dictionary of values and naming each placeholder by its key, as in %(name)s.
There is also an alternative formatting syntax using the .format() method. A discussion can be read here.
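A sketch of both formatting styles side by side. The person dictionary is hypothetical data.

```python
person = {"name": "Ana", "age": 30}   # hypothetical data

# %-formatting with named placeholders looked up in a dictionary
print("%(name)s is %(age)d years old" % person)        # Ana is 30 years old
print("%.2f" % 3.14159)                                 # 3.14

# The same output using str.format()
print("{name} is {age} years old".format(**person))     # Ana is 30 years old
print("{0}-{1}-{0}".format("a", "b"))                   # a-b-a
```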
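Unicode strings. The unicode type stores Unicode code points, internally as UCS-2 or UCS-4 depending on how Python was built. UTF-8 is one of the encodings used to convert between unicode and byte strings via .encode() and .decode(). Literals of this type can be created by prefixing the value with a u, for example u'hello'.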
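A sketch of encoding and decoding. The word and the choice of UTF-8 are only illustrations. The non-ASCII character is written as an escape (\xe9 is é) so the snippet doesn’t need a source-encoding declaration on Python 2.7.

```python
text = u"caf\xe9"                  # 4 code points; \xe9 is 'é'
encoded = text.encode("utf-8")     # byte string: 'é' becomes two bytes

print(len(text))                   # 4
print(len(encoded))                # 5
print(encoded == b"caf\xc3\xa9")   # True
print(encoded.decode("utf-8") == text)   # True: decoding round-trips
```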
Tuples are shallowly “immutable” containers. Their elements can’t be reassigned, but the objects the elements refer to may still be mutable. Tuples can be written without parentheses and can be used on the left-hand side of an assignment to unpack values. For example, in a, b = 1, 2 both (1, 2) and (a, b) are tuples. Elements are accessed with square brackets, as in t[0].
Tuples are hashable if all their elements are hashable. This allows using tuples in sets or as dictionary keys.
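A sketch covering unpacking, shallow immutability and hashability. All names are arbitrary. The swap is printed as a single tuple so the output is identical on Python 2.7 and Python 3.

```python
a, b = 1, 2              # packing on the right, unpacking on the left
a, b = b, a              # swap without a temporary variable
print((a, b))            # (2, 1)

t = ([1, 2], "x")
t[0].append(3)           # allowed: the list inside the tuple is mutable
print(t)                 # ([1, 2, 3], 'x')
try:
    t[0] = []            # not allowed: the tuple's slots can't be reassigned
except TypeError:
    print("tuples are immutable")

coords = {(0, 0): "origin"}    # a tuple of hashable values works as a dict key
print(coords[(0, 0)])          # origin
try:
    hash(([1], 2))             # a tuple containing a list is not hashable
except TypeError:
    print("unhashable tuple")
```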
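Lists are mutable sequences. They’re indexed from 0 to length-1, and negative indices count from the end (-1 is the last element). Any index outside that range raises an IndexError. The + operator concatenates lists. Conveniently, the * operator, with a list as the first operand and an integer N as the second, creates a new list that repeats the original contents N times (this works for tuples too). The elements themselves are not copied: the new list holds N references to each original element.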
Lists can also be sliced with a range of indices, which returns a new list. For example, l[1:3] returns the elements at positions 1 and 2.
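A sketch of indexing, slicing and the list operators. The values are arbitrary.

```python
nums = [10, 20, 30, 40]

print(nums[-1])        # 40: negative indices count from the end
print(nums[1:3])       # [20, 30]: a slice is a new list
print(nums[::-1])      # [40, 30, 20, 10]: a step of -1 reverses
print(nums + [50])     # [10, 20, 30, 40, 50]
print([0] * 3)         # [0, 0, 0]
try:
    nums[4]            # valid indices are -4 to 3
except IndexError:
    print("index out of range")
```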
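We must be careful when copying lists whose elements are references to other objects (for example, other lists). Both slicing and the * operator copy only the references. For example, [[]] * 3 creates a list holding three references to the same inner list, so appending to one of them appears to change all three.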
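A sketch of the aliasing pitfall and two common ways around it: a list comprehension and copy.deepcopy from the standard library.

```python
import copy

# [[]] * 3 repeats the same reference three times.
shared = [[]] * 3
shared[0].append("x")
print(shared)            # [['x'], ['x'], ['x']]

# A list comprehension creates three distinct inner lists.
separate = [[] for _ in range(3)]
separate[0].append("x")
print(separate)          # [['x'], [], []]

# Slicing copies only the outer list; copy.deepcopy also copies the inner lists.
original = [[1], [2]]
shallow = original[:]
deep = copy.deepcopy(original)
original[0].append(9)
print(shallow[0])        # [1, 9]: shares the inner list with original
print(deep[0])           # [1]: an independent copy
```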
Xranges. The xrange type exists to support the xrange() function. The Python docs explain it well:
This function is very similar to range(), but returns an xrange object instead of a list. This is an opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously. The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break).
Bytearrays are essentially mutable strings.
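A sketch of in-place modification with a bytearray. The comparison uses bytes(), which is an alias of str on Python 2.7, so the result is the same on both versions.

```python
data = bytearray(b"hello")
print(data[0])                       # 104: items are integers in the range 0-255
data[0] = ord("j")                   # modify in place, which a str can't do
data.append(ord("!"))
print(bytes(data) == b"jello!")      # True
```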
Buffer. The buffer type is intended for memory-efficient manipulation of large arrays, avoiding the copies that would otherwise be made. Guido van Rossum describes an example:
It was created with the desire to avoid an expensive memory-copy operation when reading or writing large arrays. For example, if you have an array object containing several millions of double precision floating point numbers, and you want to dump it to a file, you might prefer to do the I/O directly from the array’s memory buffer rather than first copying it to a string.
This Stack Overflow question also discusses the subject.
Set and Frozenset. The main difference between these two is that set is mutable while frozenset is immutable. Both are implemented using a hash table, so all elements in a set/frozenset must be hashable. An object is hashable if its type implements __hash__() (whose value must not change during the object’s lifetime) and either __eq__() or __cmp__(). Since frozenset is immutable and all its elements are hashable, frozenset itself is hashable.
In Python 2.7, sets can also be constructed with the literal shorthand {1, 2, 3}. Note that {} still creates an empty dictionary; an empty set is written set().
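A sketch of common set operations and of frozenset’s hashability. The results are sorted before printing because set iteration order is not guaranteed.

```python
a = {1, 2, 3}                 # set literal (Python 2.7+)
b = set([3, 4])

print(sorted(a | b))          # [1, 2, 3, 4]: union
print(sorted(a & b))          # [3]: intersection
print(sorted(a - b))          # [1, 2]: difference

frozen = frozenset([1, 2])
groups = {frozen: "small numbers"}    # frozensets are hashable, so they can be keys
print(groups[frozenset([2, 1])])      # small numbers

try:
    {[1, 2]}                  # a list can't be a set element
except TypeError:
    print("unhashable element")

empty = {}
print(type(empty) is dict)    # True: {} is a dict, not a set
```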
Dictionaries are associative arrays. They’re called dict in Python code. Dictionary keys must be hashable.
We can get a dictionary’s keys and values with .keys() and .values() respectively. The .items() method returns a list of (key, value) pairs. In Python 2.7 these methods all return new lists. So if we assign .keys() to a variable and then change the dictionary, the changes won’t be reflected in that list.
To get live references instead of copies, we can use the .viewkeys(), .viewvalues() and .viewitems() methods. They return read-only views that reflect later changes to the dictionary.
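A sketch contrasting a view with a copy. On Python 3, .keys() already returns a view and .viewkeys() no longer exists, so the snippet picks whichever is available in order to run on both versions.

```python
d = {"a": 1, "b": 2}

# Python 2.7: viewkeys() returns a live view; Python 3: keys() already does.
keys_view = d.viewkeys() if hasattr(d, "viewkeys") else d.keys()
keys_copy = list(d.keys())   # an independent snapshot

d["c"] = 3
print(sorted(keys_view))     # ['a', 'b', 'c']: the view sees the new key
print(sorted(keys_copy))     # ['a', 'b']: the copy does not
print("c" in keys_view)      # True
```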
File objects are returned by the open() function and are used for file manipulation operations.
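A sketch of writing and reading a file object. The file is a throwaway created with tempfile.mkstemp(), and explicit try/finally blocks close it. File objects are also iterable line by line.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()   # throwaway file for the example
os.close(fd)
try:
    f = open(path, "w")
    try:
        f.write("first line\nsecond line\n")
    finally:
        f.close()

    f = open(path)
    try:
        # Iterating over a file object yields one line at a time.
        lines = [line.rstrip("\n") for line in f]
    finally:
        f.close()
    print(lines)                # ['first line', 'second line']
finally:
    os.remove(path)
```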
The memoryview type was introduced in Python 2.7 and is a replacement for the buffer type.
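A sketch of a memoryview over a bytearray. It shows that slicing the view doesn’t copy the data and that writes through the view reach the underlying buffer. The slice assignment uses a byte string of matching length, which both Python 2.7 and Python 3 accept.

```python
data = bytearray(b"hello world")
view = memoryview(data)

head = view[0:5]          # slicing a memoryview returns another view, not a copy
head[0:1] = b"J"          # writing through the view modifies the bytearray
print(bytes(data) == b"Jello world")   # True
print(head.tobytes() == b"Jello")      # True: tobytes() makes an explicit copy
```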
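Any user-defined class that implements __enter__() and __exit__() is a context manager type. These types are useful for abstracting a specific try/finally pattern. More specifically, imagine code that first sets things up, then does something inside a try block, and finally tears things down in the finally clause, so the cleanup runs even if an exception is raised.
If we “set things up” and “tear things down” in many places and only change “do something”, we can move the set-up and tear-down steps into a class implementing a context manager type and use it with the with statement.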
This post from Effbot has a very clear explanation.
In this post we covered the basic types of the Python environment. We learned about some interesting features even of basic types like booleans and numeric types. We also covered some more esoteric types like buffer and xrange, and got some exposure to other features like context managers.
In the next post in the series we’ll talk about functions.