How can I get minus difference in BigQuery ?
BigQuery(BQ) is very useful for data analysis or processing.
It is good at handling huge data. It returns summary result in short time.
And also we can use BQ with very cheap price.
Sometimes we want to compare data between 2 tables.
For example, when we change some data processing program, we must check difference between previous result and modified one.
But How can we get minus difference in BigQuery ?
So today I introduce about "How to get minus difference in BigQuery".
Prepare data
First, we have to prepare data.
This case we try to get table difference. So we prepare 2 tables.
Import CSV data below as table minus_sample01
.
col1,col2,col3
a,b,c
aa,bb,cc
aaa,bbb,ccc
aaaa,bbbb,cccc
Next, import this data into table minus_sample02
.
col1,col2,col3
a,b,c
aa,bb,cc2
aaa2,bbb,ccc
aaaa,bbbb,cccc
Try "MINUS" to get difference
Then how can we get difference between these tables ?
In case of Oracle, we can use minus
.
So try minus
clause.
SELECT * FROM test.minus_sample01
MINUS
SELECT * FROM test.minus_sample02
Result is below.
Syntax error: Expected end of input but got keyword SELECT at [3:1]
Syntax error
, it means SQL sentence was wrong.
So we can't use MINUS
in BQ.
How to get minus difference in BigQuery
In order to get minus difference in BigQuery, you can use set operator EXCEPT DISTINCT
.
You can use it like MINUS
clause.
SELECT * FROM test.minus_sample01
EXCEPT DISTINCT
SELECT * FROM test.minus_sample02
Result is like below.
Row | col1 | col2 | col3 |
---|---|---|---|
1 | aa | bb | cc |
2 | aaa | bbb | ccc |
You can get rows that only minus_sample01
has.
If you want records that only minus_sample02
has, switch front sentence for back sentence.
SELECT * FROM test.minus_sample02
EXCEPT DISTINCT
SELECT * FROM test.minus_sample01
Result is like this.
Row | col1 | col2 | col3 |
---|---|---|---|
1 | aa | bb | cc2 |
2 | aaa2 | bbb | ccc |
We got unique rows in second table.
Conclusion
Today I introduced about "How to get minus difference in BigQuery".
In order to get minus difference in BigQuery, we can use method below.
- Use set operator
EXCEPT DISTINCT
.
A + B
is UNION ALL
, A - B
is EXCEPT DISTINCT
.