Date: December 2009
Most developers already know how to use Pervasive PSQL’s SELECT statement, or how to write queries. And I believe them when they say that. But do they really understand all the features, and do they really use all the capabilities? By talking to many developers over the years I’ve discovered that in most cases that’s not the case. Most developers restrict themselves to using a subset of this statement, and therefore they are limiting themselves. As a result the programs become unnecessarily complex, and they are not as efficient as they could be. It would be comparable to buying a house, and restricting yourself by using just a few of the rooms. That wouldn’t make sense. The better a developer understands SQL, the more efficient his or her programs will become and the better Pervasive PSQL is exploited. This and coming blogs will make developers more aware of the features of SQL as implemented in Pervasive PSQL. This first one is devoted to subqueries.
Subqueries have been part of Pervasive PSQL's SQL dialect for a long time now. But in the beginning subqueries could only be used within the WHERE and HAVING clauses of the SELECT, UPDATE, and DELETE statements. Nowhere else could you specify a subquery. In fact, its use was constrained to selecting rows from intermediate results. Currently, it is possible to specify subqueries almost anywhere, also in the SELECT clause. Besides making the SQL dialect more orthogonal, it also simplifies writing certain queries. Through a few examples, this article shows the power of using subqueries in the SELECT clause.
By the way, the word orthogonal in the context of programming languages means, amongst other things, that the number of odd exceptions is reduced to a minimum.
To illustrate the use of subqueries in the SELECT clause, we use the three tables presented in Figure 1. In these three tables the columns PLAYERS.PNO, PENALTIES.PENNO, and TEAMS.TEAMNO represent the primary keys. The two columns PENALTIES.PNO and TEAMS.PNO are foreign keys pointing to the primary key of the PLAYERS table. Note: This example is derived from the standard example used throughout the book The SQL Guide to Pervasive PSQL.

Figure 3: The Three Sample Tables
Let’s start this discussion with an example.
Example 1: For each player get the name and the number of his or her penalties. Without subqueries the query would look as follows:
SELECT PLAYERS.PNO, PLAYERS.NAME, COUNT(PENALTIES.PENNO)
FROM PLAYERS LEFT OUTER JOIN PENALTIES
ON (PLAYERS.PNO = PENALTIES.PNO)
GROUP BY PLAYERS.PNO, PLAYERS.NAME
The result is:
PNO NAME COUNT(PENALTIES.PENNO)
--- --------- ----------------------
1 Metheny 2
2 Pastorius 1
3 Zawinul 0
If you use subqueries within the SELECT clause instead, the query would look differently:
SELECT PLAYERS.PNO, PLAYERS.NAME,
(SELECT COUNT(PENALTIES.PENNO)
FROM PENALTIES
WHERE PENALTIES.PNO = PLAYERS.PNO)
FROM PLAYERS
What is special about this solution is that the main query does not contain a join, but its SELECT clause contains a subquery. This subquery is correlated because it contains a reference to the main query (PLAYERS.PNO). That means we have to assume that for each row in the PLAYERS table Pervasive PSQL processes this subquery. And because it contains a correlation, it might return a different result for each row. This subquery counts for each player separately his or her number of penalties.
In the solution with the join, we have to be sure we specify the correct join, in this case a left outer join. If we hadn’t done that, but had used an inner join instead, we would not have seen player 3 in the end result, because that player doesn’t have penalties. In the second solution, this is not an issue. The subquery will automatically return the right result. If a player has no penalties, the subquery will return the value zero, which is exactly what we want.
Is this query really easier to formulate than the previous one? Probably not, they are comparable in complexity. However, an advantage of the second solution is that the query resembles the original question somewhat more, and that makes it a little easier to write and read. But that is not a very strong argument for using the second solution. However, there are queries that are only easy to formulate if this construct is used. Let’s demonstrate that with another example.
Example 2: For each player get the name, the number of penalties, plus the number of times this player appears in the TEAMS table.
If we would extend the first query to cater for this extension, we would get this solution:
SELECT PLAYERS.PNO, PLAYERS.NAME, COUNT(PENALTIES.AMOUNT), COUNT(TEAMS.TEAMNO)
FROM (PLAYERS LEFT OUTER JOIN PENALTIES
ON (PLAYERS.PNO = PENALTIES.PNO))
LEFT OUTER JOIN TEAMS
ON (PLAYERS.PNO = TEAMS.PNO)
GROUP BY PLAYERS.PNO, PLAYERS.NAME
The result is:
PNO NAME COUNT(PENALTIES.AMOUNT) COUNT(TEAMS.TEAMNO)
--- --------- ----------------------- -------------------
1 Metheny 2 0
2 Pastorius 2 2
3 Zawinul 0 1
The two joins make this query hard to read. But more importantly, this result might look correct, but it isn’t. The second row shows that player 2 has two penalties, which is not true; he has only one penalty. The problem is that player 2 appears once in PENALTIES and twice in TEAMS. The join results in a multiplication of rows: 1 times 2, and as a result the number of penalties is too high.
With subqueries the query would look different and would return a correct result:
SELECT PLAYERS.PNO, PLAYERS.NAME,
(SELECT COUNT(*)
FROM PENALTIES
WHERE PENALTIES.PNO = PLAYERS.PNO),
(SELECT COUNT(*)
FROM TEAMS
WHERE TEAMS.PNO = PLAYERS.PNO)
FROM PLAYERS
The result is:
PNO NAME COUNT(PENALTIES.AMOUNT) COUNT(TEAMS.TEAMNO)
--- --------- ----------------------- -------------------
1 Metheny 2 0
2 Pastorius 1 2
3 Zawinul 0 1
This result is correct, you can check that yourself if you want to. In addition, the solution is quite straightforward. This is a clear example of where the use of subqueries in the SELECT clause is the only correct solution. We could not have written this query using joins.
There are other examples as well that show that when you allow the use of subqueries in the SELECT clause, it makes life easier for developers.
Example 3: For each penalty get the difference between the amount and the average amount.
SELECT PENNO, PNO,
ABS(AMOUNT - (SELECT AVG(AMOUNT) FROM PENALTIES)) AS DIFF
FROM PENALTIES
The result is:
PENNO PNO DIFF
----- --- ----
1 1 10
2 1 0
3 2 10
Example 4: For each penalty show the amount plus the cumulative amount.
SELECT PENNO, AMOUNT,
(SELECT SUM(AMOUNT)
FROM PENALTIES AS SUB_PEN
WHERE SUB_PEN.PENNO <= MAIN_PEN.PENNO) AS CUM_TOTAL
FROM PENALTIES AS MAIN_PEN
The result is:
PENNO AMOUNT CUM_TOTAL
----- ------ ---------
1 60 60
2 50 110
3 40 150
Summary: Using subqueries in the SELECT clause really does extend the power of Pervasive PSQL’s SQL dialect. It reduces the complexity of certain queries, and it can lead to SELECT statements that are easy to write, understand, and maintain. If you haven’t been using this feature yet, I strongly recommend you to get comfortable with it.