Not long ago, I wrote an article about Many quirks of JavaScript, highlighting some of the language's peculiarities and shedding light on things that seem nonsensical but, in a strange, twisted way, actually make sense.
However, JavaScript isn't the only programming language with oddities. In fact, I don't think I've ever encountered a language that didn't have at least a few quirks hidden under the hood, except maybe for Brainf*ck. That language is perfect.
Today, I've decided to delve into Python.
I'm not going to rehash some of the more generic issues in the programming world, like the well-known problem of 0.2 + 0.1 not equaling 0.3 or other numerical oddities. Those aren't really Python-specific problems, and I've already covered them in the JavaScript article.
However, there is one magical thing worth repeating:
3 > 2 > 1
As I explained in the previous article, in JavaScript this returns false because the condition is evaluated from left to right: 3 > 2 gives true, and true > 1 is false. Python is different, though; you can chain the comparison operators quite intuitively, and the above condition will be evaluated as (3 > 2) and (2 > 1), therefore returning True. I'm not sure if this qualifies as a quirk in Python's case, but I thought it was worth mentioning.
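Here is a small sketch of the chaining rule in Python. The variable names are just illustrative; the last line forces the left-to-right grouping that JavaScript uses, to show where the difference comes from.

```python
# Chained comparison: 3 > 2 > 1 means (3 > 2) and (2 > 1).
chained = 3 > 2 > 1
explicit = (3 > 2) and (2 > 1)

# Forcing the JavaScript-style grouping gives a different answer:
# (3 > 2) is True, and True > 1 is the same as 1 > 1, which is False.
grouped = (3 > 2) > 1

print(chained, explicit, grouped)  # True True False
```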
Now, let's dive into something more fun.
Strings and floats may break my bones...
Since I've already brought up the topic of numbers, let's delve into this interesting behavior. It only works in Python 2.7, but still:
print(1 < float("+inf"))
print(1 > float("+inf"))
print("1" < float("+inf"))
print("1" > float("+inf"))
What do you think the output of the code above is? Well, the number 1 is less than infinity, right? So maybe something like True, False, True, and False?
Nope. The first two lines are True and False, but then we get False and True, respectively. And what's also odd is that when you try it with -inf instead of +inf, you get False, True, and... also False and True?
But it gets weirder. If we try str(float("+inf")) and str(float("-inf")), we get "inf" and "-inf" respectively. And if we try "1" < "inf" and "1" < "-inf", we get True and False. What is going on? Why is "1" < float("+inf") equal to False, when "1" < "inf" is True?
Well, apparently, Python 2.7 doesn't convert one operand to the type of the other, as one might expect. Instead, it compares the objects as they are, and when the types don't match, CPython 2 falls back to an arbitrary but consistent ordering in which numbers always sort before strings. So a string is always bigger than a float, and yes, even "" > 99999 will return True.
Now, bear in mind that in Python 3 and above, the last two lines will just raise a TypeError because you can't natively compare a string and a float anymore, which is probably for the better, as we just demonstrated.
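For comparison, this is roughly what the same experiment looks like in Python 3. It is only a sketch; the names inf and error are mine, and the explicit conversions at the end show the two comparisons you might actually have meant.

```python
inf = float("+inf")
print(1 < inf)  # True

# Ordering a str against a float is refused in Python 3.
error = None
try:
    "1" < inf
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Converting explicitly states which comparison is intended.
print(float("1") < inf)  # True: numeric comparison
print("1" < str(inf))    # True: text comparison, "1" sorts before "inf"
```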
Fun with booleans
Now let's try something else. Did you know that a boolean is actually a subclass of an integer in Python? That's right, isinstance(True, int) returns True.
What's also quite fun is that you can multiply a string by a number. "abc" * 2 will return "abcabc". With the combination of numerical booleans, you can create your own shorter ternary operator. Instead of doing print("Hello" if Enabled else ""), you can just do print(Enabled * "Hello").
Not sure how helpful that is, but hey, it's something. By the way, did you know that in Python 2.7, True and False were ordinary names rather than keywords, so you could reassign them? That's right, you used to be able to do True, False = False, True and watch the world burn.
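A quick sketch of the boolean arithmetic described above, written for Python 3. The variable enabled is just an illustrative name.

```python
enabled = True

print(issubclass(bool, int))  # True
print(True + True)            # 2
print("abc" * 2)              # abcabc

# True behaves like 1 and False like 0 when multiplying a string.
shown = enabled * "Hello"
hidden = (not enabled) * "Hello"
print(repr(shown), repr(hidden))  # 'Hello' ''
```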
Alright, alright, you might say that this is all cool and all, but it's either not that crazy or it only works in the old versions of Python. From Python 3 onwards, it's flawless, right? Is it?
Let's rewrite math
As you probably know, Python doesn't really have a concept of encapsulation, access modifiers, or anything like that. Anyone can change anything. What's quite crazy is that you can even change stuff from other libraries, like this:
import math
math.pi = 42
math.e = 69
math.tau = 0
And yes, before you ask, this does work in Python 3. Nothing has changed in how you can manipulate constants (which aren't really constants) from any object or module.
I remember back in my college days when I primarily used C#, I hated this so much that I even used to define all my constants in Python as methods. Which... doesn't really make much sense, since you can rewrite those as well:
class myclass:
    def mypi():
        return 3.1415926535
myclass.mypi = lambda: "I said no."
So yeah, no point in doing that either, but I guess it just felt better to have at least some semblance of constant-like behaviour. Or maybe I didn't even know you could just rewrite methods like that (again, I was used to C#).
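To see why rebinding a module attribute is risky, here is a minimal sketch. The helper circle_area and the names pi_copy and original_pi are made up for illustration. Anything that looks the value up through the module sees the change, while a name imported earlier keeps pointing at the old float.

```python
import math
from math import pi as pi_copy  # a separate name bound to the original float

original_pi = math.pi
math.pi = 42  # rebinds the attribute on the shared module object

def circle_area(radius):
    # math.pi is looked up at call time, so it sees whatever is bound now.
    return math.pi * radius ** 2

patched_area = circle_area(1)
math.pi = original_pi  # put things back before anyone notices
restored_area = circle_area(1)

print(patched_area)   # 42
print(restored_area == pi_copy)  # True
```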
Same numbers are equal, but some are more equal than others
x = 20
y = 20
z = x + y
a = 15
b = 25
c = a + b
print(c is z)
As you might expect, you will get True as the output. But let's do this now:
x = 200
y = 200
z = x + y
a = 150
b = 250
c = a + b
print(c is z)
Suddenly, you will get False for some reason. And what's weirder is that 400 is 400 returns True as expected if you try it by hand.
Now you might say, "That's because you're comparing by reference. Use a normal equality operator instead, and you will get True in all cases." Sure, but that's boring. Instead, let's look into why the reference comparison sometimes works but sometimes doesn't.
The reason for that is that in CPython, the standard interpreter, small integers are cached and reused, behaving pretty much like singletons. These integers range from -5 to 256. The rationale behind this optimization is that small integers are commonly used in many programs, so by caching and reusing them, Python can save memory and improve performance.
So when we compared the references for variables holding the number 40, it really was the same object in memory for both c and z. But that wasn't the case for the number 400.
One disclaimer: the above example doesn't work on https://pythonsandbox.com/ and maybe some other non-standard setups. I'm not entirely sure what the specifics are and when exactly the optimization kicks in, but it's definitely an interesting behaviour.
But yeah, I guess comparing numbers by reference might be a bad idea. Who would have thought?
(UPDATE: A colleague says that the behavior of the is operator for numbers is, in fact, undefined. That's why the behavior might slightly change on different setups or versions.)
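If you want to poke at the cache boundaries yourself, here is a small sketch. The helper compare is a hypothetical name. It parses the same text twice at runtime so the interpreter can't merge the two values at compile time. I'm deliberately not listing the expected identity results, since they depend on the interpreter; only the equality column is guaranteed.

```python
def compare(text):
    """Parse the same text twice and report (equal, same_object)."""
    first = int(text)
    second = int(text)
    return first == second, first is second

for text in ("-6", "-5", "40", "256", "257", "400"):
    equal, same_object = compare(text)
    print(text, "equal:", equal, "same object:", same_object)
```

The takeaway is the boring one: use == for numbers and keep is for identity checks such as comparing against None.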
To the infinite references and beyond!
Now, before we get to the big thing, I want to look into one last quirk that I find interesting, although probably not very practical.
You know how some object topologies can form closed graphs? An object can have an attribute referencing itself and so on; that's nothing new. Well, in Python, you can have self-contained lists with essentially infinite recursion. Yes, you can do that in pretty much any programming language using some form of generic class containing non-specific objects that act like an array, which you later fill with a reference to itself. But in Python, you can do this without any of that, simply using:
mylist = [42, "hello"]
mylist.append(mylist)
Thought that was curious.
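Here is a short sketch of what such a list looks like once it exists. The list is built first and then appended to itself, since the name has to exist before it can be referenced. Python's repr notices the cycle and prints [...] instead of recursing forever.

```python
mylist = [42, "hello"]
mylist.append(mylist)  # the third element is the list itself

print(mylist)               # [42, 'hello', [...]]
print(mylist[2] is mylist)  # True
print(mylist[2][2][2][1])   # hello
```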
But now, let's try some other magic with arrays:
arr_2D = [[0]*4]*4
print(arr_2D)
arr_2D[1][1] = 1
print(arr_2D)
What do you think will be the output of the code above? Go ahead, think about it. I'll wait.
The first print is easy, it's just [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]. But what will the second print look like?
We are just changing the second number of the second array, right? So the sixth zero in the above output will switch to 1, and that's all, right?
Nope, the second output will be:
[[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
What in the world is going on here?
Well, apparently, Python really likes to save things by reference whenever it gets a chance.
The first line doesn't actually generate a 2D array, but just four references to the same array, one after the other. What you need to do is create your array like this:
arr_2D = [[0]*4 for i in range(4)]
Then everything works as expected.
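You can verify the difference with an identity check. This is just a sketch with made-up names (shared and independent) putting the two constructions side by side.

```python
shared = [[0] * 4] * 4                      # four references to ONE inner list
independent = [[0] * 4 for _ in range(4)]   # four separate inner lists

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False

shared[1][1] = 1
independent[1][1] = 1
print(shared)       # every row shows the change
print(independent)  # only the second row changes
```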
Let's talk about mutable arguments
Alright, I have to address the elephant in the room now. You might have heard about something called mutable default arguments. What is that? Well, to put it simply, Python's default arguments are evaluated when the functions are defined, which means exactly once. They are then shared when the functions are actually called.
It's kind of similar to using the static keyword in any C-like language, although the effect only shows up with mutable types, as demonstrated later. When you mutate the argument in the body of the function, you will get the mutated version of the variable in all subsequent calls. The most common example uses a default list like this:
def add(element, array=[]):
    array.append(element)
    return array
print(add(42))
print(add(42))
Output:
[42]
[42, 42]
Now, you might say, "That's kind of cool; I can implement a counter that's shared across all calls, right?" Well, no. Numbers are immutable, so this won't work:
def do_something(txt, counter=0):
    counter = counter + 1
    print(txt)
    print("Counter: " + str(counter))
do_something("Doing a task...")
do_something("Doing a task...")
Output:
Doing a task...
Counter: 1
Doing a task...
Counter: 1
That's what makes it so problematic, in my opinion. It takes the worst aspects of static variables without implementing the best, resulting in a counter-intuitive mess that's mostly useless outside of some niche uses, like a semi-global cache.
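For the record, the semi-global cache use looks something like the sketch below. The function fib and the parameter name _cache are my own illustration, not an established pattern you should reach for; functools.lru_cache does the same job more cleanly.

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is created once, when the function is defined,
    # and the same dict is reused by every call.
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(50))                   # 12586269025
print(len(fib.__defaults__[0]))  # 51 entries now live in the shared default
```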
And that's why it's a common practice to define default arguments as None in Python and then initialize them in the actual body of the function. This approach, while effective, can be a bit clunky and lead to a lot of code repetition at the start of each function that uses default arguments.
There is a proposal to simplify the syntax a little bit, so it might be possible to at least write something like array ??= []. This doesn't really address the core problem but it would at least make the code less convoluted. But until such a proposal is accepted and implemented, we need to get used to a bunch of ugly conditions at the top of each function.
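The None-default workaround, applied to the add function from earlier, would look roughly like this:

```python
def add(element, array=None):
    # None is a placeholder; the real default is built on every call.
    if array is None:
        array = []
    array.append(element)
    return array

print(add(42))  # [42]
print(add(42))  # [42]

existing = [1, 2]
print(add(3, existing))  # [1, 2, 3], an explicitly passed list is still mutated
```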
Scopes and late binding
Now, there is one last thing that I'd like to address, and that is how Python binds values of variables used in blocks and functions, which uses pretty much the opposite ideology of default arguments. You see, Python uses "Late binding," which means the variables are bound at the time of function execution, not at the time of their definition. Just pick one, Python!
Since Python doesn't require variable definitions before their first use like some other languages, it kind of makes sense that the binding is a bit more lenient, and you can do something like this:
for x in range(43):
    y = x
print(x)
print(y)
Output:
42
42
Yeah, both work; both variables remain bound outside of the actual for block. It's a bit weird coming from C-like languages, but at least it allows some extra flexibility when breaking out of loops and other blocks and then using the variables from those nested blocks.
Now let's delve into the crazy town of Late binding. Let's say you have something like this:
def make_functions():
    functions = []
    for i in range(5):
        def fun():
            return 'X is: ' + str(i)
        functions.append(fun)
    return functions
for f in make_functions():
    print(f())
What do you think will be the output of the script above? Intuitively, it should be 0, 1, 2, 3, and 4, right? We will have five functions, each of them having their own closed-over variable to print, right?
Well, no. We will get the number 4 printed five times. The values of the variables used in the closure are looked up in the surrounding scope when the function is called, not when it is defined. But by then, the loop has completed, and the variable is left with its final value of 4.
Now this is something that can happen in other languages as well. JavaScript in particular works similarly and lets you choose how scoping works by using either var or let. In C#, some things might get a bit unintuitive with lambda functions, and don't get me started on variable binding in something like Lua. Each language does indeed have its quirks.
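If you actually want each function to remember its own value, there are two common workarounds in Python, sketched below. The function names are mine. The first one, amusingly, uses the early-bound default arguments from the previous section to capture the value at definition time; the second uses functools.partial.

```python
from functools import partial

def make_functions_default():
    functions = []
    for i in range(5):
        def fun(i=i):  # the default is evaluated now, once per iteration
            return 'X is: ' + str(i)
        functions.append(fun)
    return functions

def describe(i):
    return 'X is: ' + str(i)

def make_functions_partial():
    # partial stores the current value of i alongside the function
    return [partial(describe, i) for i in range(5)]

for f in make_functions_default():
    print(f())  # X is: 0 ... X is: 4
```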
But I have one last headscratcher for you - courtesy of a colleague who's an expert on Python:
x = 10
def a():
    print(x)
def b():
    print(x)
    x += 1
    print(x)
a()
b()
What will be the output here?
Is this a trick question? Surely it's 10, 10 and 11, right? There's no other way it's anything else, or is there?
Surprise, you will get 10 and then... UnboundLocalError.
Now this is what's happening, and it's completely insane. In the function a, we are referencing x from outside the scope of the function. That is fine, we can do that. We are just reading the value. But in function b, not only are we referencing x, but we are also changing its value. What Python sees is: "Oh, we are going to set the variable x, so I guess x is now a new variable in the scope of the function. And what should its value be? Oh, right, the incremented value of x. The x that we have in the scope of the function. The new x that we are trying to declare now."
In other words, Python sees the code more like this:
x = 10
def a():
    print(x)
def b():
    print(x2)
    x2 += 1
    print(x2)
a()
b()
The difference between functions a and b is that a only READS the value. Python sees that and doesn't create a new x in the scope of the function like in function b.
And why does at least the first print not work in function b, before we try to change the value of x? Well, Python decides which names are local when it compiles the function, before running any of it. It looks at which variables are assigned anywhere in the body and only then executes the code. So the first print already references the "new" x, which does not have a value yet. If you wish to reference the old x, you need to declare it with the global keyword inside the function.
Quite wild if you ask me. And again, kind of the opposite of the ideology of late binding we mentioned earlier. It's almost like Python just does whatever it wants in different scenarios. Yes, there are rules and it is deterministic. But maybe the rules aren't the most intuitive. What do you think?
Final word
So yeah, there you have it. Python has its quirks just like any other language. And maybe it is so approachable because of those quirks and because you don't have to worry about the scope of your variables too much. Or maybe you will spend countless nights debugging something because you thought you didn't have to worry about the scope of your variables too much.
Who knows.
Anyway, you can read about some other Python quirks here: https://python-quirks.readthedocs.io/en/latest/