Web Dev Stories
AboutArticlesProjectsBookmarksContact

written on 10/14/2019

How to query with PostgreSQL wildcards like a Pro

Querying in SQL often requires wildcard queries. Below you can find a cheat sheet with all common operations. And even more below an explanation on how it works.

A cheat sheet for all PostgreSQL wildcard/like operations

1
# Get all entries like Melissa, Melli.
2
# Not depending on how many characters are wildcarded.
3
SELECT * FROM example_table WHERE example_column LIKE 'Mel%';
4
5
# Get all entries like Meli, Melo.
6
# Depending on how many characters are wildcarded.
7
SELECT * FROM example_table WHERE example_column LIKE 'Mel_';
8
9
# Or for multiple characters, here 2
10
SELECT * FROM example_table WHERE example_column LIKE 'Mel__';
11
12
# Get all entries which include mel, not case-sensitive.
13
# To match: Melissa, Emelio, Mel
14
SELECT * FROM example_table WHERE LOWER(example_column) LIKE '%mel%';
15
16
# With an ILIKE expression
17
SELECT * FROM example_table WHERE example_column ILIKE '%mel%';
18
19
# With a regular expression (Regex)
20
SELECT * FROM example_table WHERE example_column ~* 'mel';
  1. Basic filtering in PostgreSQL or SQL
  2. The % wildcard operator in a nutshell
  3. The _ wildcard operator in a nutshell
  4. Search with dynamic cases or full text search
  5. Conclusion

Basic filtering in PostgreSQL or SQL

PostgreSQL wildcards are an important tool for searching in SQL databases. Most of the time text in PostgreSQL databases is saved in columns as TEXT or VARCHAR type. The general recommendation is to use TEXT though. It is no problem though if something like the VARCHAR type is used. All the solutions work the same here in PostgreSQL.

Searching for an exact text in a column is quite easy with a statement like:

1
SELECT * FROM example_table WHERE example_column='Mel';

From a table with a lot of entries of fictional person data, it will filter all rows which have the column example_column with the value Mel.

id...example_column...
............
203...Max...
204...Michael...
205...Mel...
206...Melissa...
207...Mel...
208...Taylor...
209...Emelio...
............

After executing the query it would filter out all non-highlighted rows above, resulting in the following table:

id...example_column...
205...Mel...
207...Mel...

Exactly what we wanted to achieve. But for more advanced use cases like filtering all names which start with Mel, we would need to use wildcards.

The % wildcard operator in a nutshell

In our previous dataset, we filtered all values which start with Mel in the exact casing. Now we filter all rows which have the value like Mel out of our original dataset. This can be achieved with a wildcard character. In this case, here it is not important how many characters are wildcarded. This basically means it could be anything like Melissa, Melloremipsum or Mel. For getting this to work, PostgreSQL offers the character %. With this, we can build a query in theory which can match all of the words above. There is one nuance though, instead of using the WHERE ... = '...' filter, we need to use a statement called LIKE. The statement will look like SELECT * FROM ... WHERE ... LIKE '...'.

id...example_column...
............
203...Max...
204...Michael...
205...Mel...
206...Melissa...
207...Mel...
208...Taylor...
209...Emelio...
............

To get these results the query could look like:

1
SELECT * FROM example_table WHERE example_column LIKE 'Mel%';

And it would result in the following table as the SQL query result.

id...example_column...
205...Mel...
206...Melissa...
207...Mel...

The % sign in the query acts as a match for any character. But not even just a match. It is a match for zero to infinite characters. That is why this operator is matching Mel and Melissa. A lot of use cases result out of this and that is why this is the most used wildcard in SQL or PostgreSQL overall.

The _ wildcard operator in a nutshell

In the last search, we have defined infinite wildcards with the % which match strings with zero to an infinite amount of characters and other filters. Sometimes we have use-cases to just wildcard a specific character in some text. For example, there could be types with or without a single number attached. These types could be SALE but also SALE1 or SALE8. In this example, our matches look really similar, but if we know that there is just one number in the matching string, we could use the wildcard operator _ to match exactly the cases we want. For our sale types, we could use SALE_.

In our example table, we could have different results based on this. To make a good search though, we could find all the names in the table which have three characters but must start with M. In the example table, there are three entries that will match the condition.

id...example_column...
............
203...Max...
204...Michael...
205...Mel...
206...Melissa...
207...Mel...
208...Taylor...
209...Emelio...
............

The highlighted results in this table should be queried for with a PostgreSQL wildcard. The query is looking quite easy again.

1
SELECT * FROM example_table WHERE example_column LIKE 'M__';

An important point here is that the query is using two PostgreSQL wildcard characters which is a simple underscore _. The reason for this is that one underscore _ will replace one character in the queried value.

After executing the query with the PostgreSQL wildcard, it will output the following result:

id...example_column...
203...Max...
205...Mel...
207...Mel...

This result is not a surprise and resulted in our expected output. It includes the rows for the example_columns Max and Mel, where for Mel there exist two rows.

Another problem most people are facing is to search in different cases with PostgreSQL wildcards. This functionality in most programming languages is called include or similar. In JavaScript, the includes prototype function is existing for arrays. This can be used to check if an array includes a specific value. This works relatively well for primitive data types.

1
const numbers = [1, 4, 9];
2
const includesFour = numbers.includes(4);
3
console.log(includesFour);

The above script will output true to the console since the list includes the number 4. The includes functionality can also be used for texts like:

1
const name = "Emilio";
2
const includesPattern = name.includes("mel");
3
console.log(includesPattern);

This script also outputs true to the console. A problem here is that names like Melissa would not be matched with the same script but just exchanging the name because the functionality is case-sensitive.

1
const name = "Melissa";
2
const includesPattern = name.includes("mel");
3
console.log(includesPattern);
4
5
// Outputs false

A simple trick is existing in almost all programming languages to make the incoming value, which is the name in this case, lowercase. Basically it will transform all letters of a word to the small version of the letter. Melissa would get to melissa.

In JavaScript this is done in a way like this:

1
const name = "Melissa";
2
const includesPattern = name.toLowerCase().includes("mel");
3
console.log(includesPattern);
4
5
// Outputs true

The includes functionality is not existing in PostgreSQL or SQL in general. But for these cases, the PostgreSQL wildcards exist. As in the section earlier mentioned, there is an operator % which matches zero to an infinite amount of random characters in a word. We can use this operator to match our desired rows.

id...example_column...
............
203...Max...
204...Michael...
205...Mel...
206...Melissa...
207...Mel...
208...Taylor...
209...Emelio...
............

All of the highlighted rows have Mel or mel of the value of the example_column included. A simple approach here is to write a query which includes both like:

1
SELECT * FROM example_table
2
WHERE example_column LIKE '%mel%'
3
OR example_column LIKE '%Mel%';
id...example_column...
205...Mel...
206...Melissa...
207...Mel...
209...Emelio...

This is quite tedious, if there are more cases then just the first letter changing. The more dynamic approach to this would be to convert the example_column value to lower-case to work in a simpler way with it. To enable this feature and adjust the query, we just have to use the LOWER functionality of PostgreSQL. The functionality should be applied to the example_column column. The SQL query for this scenario would look similar to this:

1
SELECT * FROM example_table WHERE LOWER(example_column) LIKE '%mel%';

It would result in the exact same table output as before but the query looks far more condense and understandable.

id...example_column...
205...Mel...
206...Melissa...
207...Mel...
209...Emelio...

Another approach would be to use the ILIKE keyword. This keyword is supported in a large amount of PostgreSQL versions. It is not in the SQL standard which means that it might be not transferrable to other SQL dialects. Our SQL query from before would look like the following snippet:

1
SELECT * FROM example_table WHERE example_column ILIKE '%mel%';

This looks a bit cleaner, but it is good to mention that this is a bit slower than the LOWER conversion which we have done before. But it is a good alternative for case-insensitive SQL queries.

Another alternative would be to use a Regular expression (also called Regex).

This could be realized with an SQL query like the following.

1
SELECT * FROM example_table WHERE example_column ~* 'mel';

This query is even slower, but I would not recommend using Regular Expressions in these cases because they are hard to understand for a lot of people, even though the expressions are much more flexible. Sometimes you need to have the flexibilities and then regular expressions are exactly what you want to use.

All of these approaches here work in simple solutions for full-text searches. But when performance is important for your project it might be worth it to look into to_tsvector and to_tsquery and proper indexing of your PostgreSQL tables. You can find more information about the text search in tables in the PostgreSQL documentation here: https://www.postgresql.org/docs/current/textsearch-tables.html.

Conclusion

PostgreSQL wildcards are an amazing tool to extend queries. Queries are adjusted to include a wide range of dynamic filtering which is important for systems with big data. Most of the time this data is unstructured on the value-base. A good example is the sale types mentioned in an earlier example which could be SALE1 or SALE2. For some sub-systems, this difference might be important but for some systems, it is not important. Also, the data should rather be structured to include a detailed_... column for these cases to achieve SQL normalization. With the wildcards, we can search for these "unnormalized" values in an easy way.

You might also like

© Kevin Peters 2021

Imprint