The benchmark was updated and the fastest tool is NOT written in Python.

* Added ClickHouse (written in C++) to the benchmark: I was unaware that the clickhouse-local tool would handle these tasks. ClickHouse is now the fastest (together with OctoSQL).
* OctoSQL (written in Go) was updated in response to the benchmark: the updates included switching to fastjson, short-circuiting LIMIT, and eagerly printing when outputting JSON and CSV. OctoSQL is now one of the fastest tools, and its memory usage is stable.
* SPyQL (written in Python) is now third: SPyQL leverages orjson (Rust) to parse JSON, while the query engine is written in Python. When processing 1GB of input data, SPyQL takes 4x-5x more time than the best tool, while still achieving up to 2x higher performance than jq (written in C).
* I removed Pandas from the benchmark and focused on command-line tools. I am planning a separate benchmark on Python libs where Pandas, Polars and Modin (and eventually others) will be included. If you are interested in receiving updates, please subscribe to the following issue:

Please take this claim and these results with a pinch of salt. SPyQL was not created with the goal of being the fastest tool for querying data, and the same tools might outperform SPyQL with different datasets or in different use cases. There might also be other tools that I was not aware of when I wrote the benchmark (I just learned about a new one that we will be adding to the benchmark).

For me, the lesson was that in certain problems (e.g. I/O-intensive ones) the architecture/design might have a higher impact than the choice of programming language. I guess it's in the nature of Python to leverage internal/external modules written in a statically-typed compiled language to deliver high performance on core functionalities. SPyQL can leverage both the Python standard library for parsing JSON (written in C) and orjson (written in Rust). In this benchmark we used the latter, which shows considerable performance improvements. Still, query processing (expression evaluation, filtering, aggregations, etc.) is implemented in Python.
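The sketch below illustrates the split between compiled JSON parsing and Python-level query evaluation. It is a minimal illustration and not SPyQL's actual implementation. The `run_query` function, its `where`/`select`/`limit` parameters and the JSON Lines input format are hypothetical. The sketch uses `orjson.loads` when orjson is installed and falls back to the standard-library `json.loads` otherwise. Filtering and projection run as ordinary Python callables, once per record. The limit is applied lazily with `itertools.islice`, so input beyond the last needed row is never parsed. This is a simple form of short-circuiting LIMIT.

```python
import io
import itertools
from typing import Any, Callable, Iterable, Iterator, Optional

# Prefer orjson (Rust) when installed; otherwise fall back to the
# standard-library json module, whose decoder has a C accelerator.
try:
    import orjson
    _loads: Callable[[str], Any] = orjson.loads
except ImportError:
    import json
    _loads = json.loads


def run_query(
    lines: Iterable[str],
    where: Callable[[dict], bool],
    select: Callable[[dict], Any],
    limit: Optional[int] = None,
) -> Iterator[Any]:
    """Hypothetical mini query engine over JSON Lines input.

    Parsing is delegated to compiled code; filtering (`where`) and
    projection (`select`) are plain Python, evaluated per record.
    """
    records = (_loads(line) for line in lines if line.strip())
    results = (select(rec) for rec in records if where(rec))
    # Lazy LIMIT: islice stops pulling records once `limit` rows are
    # produced, so the remaining input is never parsed.
    return itertools.islice(results, limit)


# Usage: names of people aged 18 or over, keeping only the first match.
data = io.StringIO(
    '{"name": "a", "age": 30}\n'
    '{"name": "b", "age": 17}\n'
    '{"name": "c", "age": 42}\n'
)
adults = run_query(data, where=lambda r: r["age"] >= 18,
                   select=lambda r: r["name"], limit=1)
print(list(adults))  # ['a']
```

The parser can be swapped without touching the query logic. Only the decoding step benefits from the faster library, while the per-record `where` and `select` calls still run in Python.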