Why did comparison and conditional aggregation fail in BigQuery ?

**BigQuery(BQ)** is very useful for data analysis or processing.

It is good at handling huge data. It returns summary result in short time.

When we handle big data, sometimes we want to extract or summarize data that meets specific conditions.

Usually we use comparison operator like below.

`WHERE COLUMN1 = "A"`

`SUM(IF(COLUMN1 = "A",1,0)`

But **it fails in specific case.**

Why did it fail ?

So today I introduce about **"Why comparison and conditional aggregation fails in BigQuery"**.

## Prepare data

First, prepare data.

Import this csv file as `null_sample`

table.

This data has null cells.

```
col1,col2,col3
a,b,c
,b,c
,,c
a,b,
a,,
```

Then you can see table like below.

Table

Row | col1 | col2 | col3 |
---|---|---|---|

1 | null | b | c |

2 | null | null | c |

3 | a | b | c |

4 | a | b | null |

5 | a | null | null |

## Comparison and conditional aggregation

There is a good example of comparison and conditional aggregation.

We use operator `=`

.

```
SELECT * FROM test.null_sample
WHERE col1 = "a"
```

Result

Row | col1 | col2 | col3 |
---|---|---|---|

1 | a | b | c |

2 | a | b | null |

3 | a | null | null |

In order to get conditional sum, we can use `SUM`

and `IF`

function.

```
SELECT
SUM(IF(col1="a",1,0)) as sum_col1_a
FROM test.null_sample
```

Result

Row | sum_col1_a |
---|---|

1 | 3 |

## Example of failure case

Then there is an example of failure.

When we use `!=`

, its comparison does not work.

```
SELECT * FROM test.null_sample
WHERE col1 != "a"
```

Result

No result.

Even in case of conditional aggregation, it fails like below.

```
SELECT
SUM(IF(col1="a",1,0)) as sum_col1_a,
SUM(IF(col1!="a",1,0)) as sum_col1_not_a,
SUM(IF(col2!="a",1,0)) as sum_col2_not_a,
SUM(IF(col3!="a",1,0)) as sum_col3_not_a
FROM test.null_sample
```

Result

Row | sum_col1_a | sum_col1_not_a | sum_col2_not_a | sum_col3_not_a |
---|---|---|---|---|

1 | 3 | 0 | 3 | 3 |

Sample table has 5 records.

In the table, about non- `a`

count, it should be 2 in col1.

And it should be 5 in col2 or col3.

Why did it fail to aggregate ?

## The reason why comparison and conditional aggregation fail in BigQuery

The reason why comparison and conditional aggregation fails in BigQuery is `null`

.

`null`

is special **that does not return true to both = nor !=.**

So if you want to compare data, you should replace null with other value by `IFNULL`

or `COALESCE`

.

```
SELECT * FROM test.null_sample
WHERE IFNULL(col1,"") != "a"
```

Result

Row | col1 | col2 | col3 |
---|---|---|---|

1 | null | b | c |

2 | null | null | c |

For conditional aggregation, we should use `IFNULL`

or `COALESCE`

.

```
SELECT
SUM(IF(IFNULL(col1,"")="a",1,0)) as sum_col1_a,
SUM(IF(IFNULL(col1,"")!="a",1,0)) as sum_col1_not_a,
SUM(IF(IFNULL(col2,"")!="a",1,0)) as sum_col2_not_a,
SUM(IF(IFNULL(col3,"")!="a",1,0)) as sum_col3_not_a
FROM test.null_sample
```

Result

Row | sum_col1_a | sum_col1_not_a | sum_col2_not_a | sum_col3_not_a |
---|---|---|---|---|

1 | 3 | 2 | 5 | 5 |

In this case we got correct non-`a`

records and non-`a`

count.

## Conclusion

Today I explained about **"Why comparison and conditional aggregation fails in BigQuery"**.

The reason why comparison and conditional aggregation fails in BigQuery is `null`

.

It fails when we try to **compare null directly**.

Import point is below.

Replace null by `IFNULL`

or `COALESCE`

Handling `null`

is complicated.