
How can I get cross join data with pandas DataFrame ?
pandas.DataFrame
is useful to handle table format data.
With using pandas, we can import CSV or Excel data and join them with common key column.
And sometimes you want to get all combination data without any join condition.
But how can we get such all combination data.
So today I will introduce about "How to get cross join with pandas.DataFrame".
pandas.DataFrame "merge"


In order to join two DataFrame, we can use merge
.
So let's try to use merge
.
Prepare 2 DataFrame with common key column.
import pandas as pd
data_list1 = [
["a",1],
["b",2],
["c",3],
["d",4]
]
col_list1 = ["column1-1","key"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)
# column1-1 key
# 0 a 1
# 1 b 2
# 2 c 3
# 3 d 4
data_list2 = [
["e",1],
["f",2],
["g",3],
["h",4]
]
col_list2 = ["column2-1","key"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)
# column2-1 key
# 0 e 1
# 1 f 2
# 2 g 3
# 3 h 4
Then with using pd.merge()
, we can join two DataFrame by common key.
df_crossjoin = pd.merge(
df1,
df2,
on="key"
)
print(df_crossjoin)
# column1-1 key column2-1
# 0 a 1 e
# 1 b 2 f
# 2 c 3 g
# 3 d 4 h
Cross join with merge


Next, in cross join case, there are no common keys.
So can we use merge
without common key ?
Let's try to use merge
.
import pandas as pd
data_list1 = [
["a"],
["b"],
["c"],
["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)
# column1-1
# 0 a
# 1 b
# 2 c
# 3 d
data_list2 = [
["e"],
["f"],
["g"],
["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)
# column2-1
# 0 e
# 1 f
# 2 g
# 3 h
df_crossjoin = pd.merge(
df1,
df2
)
print(df_crossjoin)
# MergeError: No common columns to perform merge on.
# Merge options: left_on=None, right_on=None, left_index=False, right_index=False
As this result, if we don't set common column in merge
, it shows error like above.
So how can we get cross join data ?
How to get cross join with pandas.DataFrame


In order to join DataFrame, we can use merge
.
But without common key, merge
raises error.
So how can we do cross join ?
In order to get cross join with pandas.DataFrame, we can use parameter how = "cross"
.
df_crossjoin = pd.merge(
df1,
df2,
how = "cross"
)
print(df_crossjoin)
# column1-1 column2-1
# 0 a e
# 1 a f
# 2 a g
# 3 a h
# 4 b e
# 5 b f
# 6 b g
# 7 b h
# 8 c e
# 9 c f
# 10 c g
# 11 c h
# 12 d e
# 13 d f
# 14 d g
# 15 d h
With using how = "cross"
, we can get cross join DataFrame.
how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
pandas.merge — pandas 2.2.1 documentation
- left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
- right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
- outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
- inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
- cross: creates the cartesian product from both frames, preserves the order of the left keys.
Dummy common key


There is another way to get cross join DataFrame.
There is a method that we add dummy common column to both DataFrame.
Check following code.
import pandas as pd
data_list1 = [
["a"],
["b"],
["c"],
["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
df1["dummy_key"] = 1
print(df1)
# column1-1 dummy_key
# 0 a 1
# 1 b 1
# 2 c 1
# 3 d 1
data_list2 = [
["e"],
["f"],
["g"],
["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
df2["dummy_key"] = 1
print(df2)
# column2-1 dummy_key
# 0 e 1
# 1 f 1
# 2 g 1
# 3 h 1
df_crossjoin = pd.merge(
df1,
df2,
on="dummy_key"
)
print(df_crossjoin)
# column1-1 dummy_key column2-1
# 0 a 1 e
# 1 a 1 f
# 2 a 1 g
# 3 a 1 h
# 4 b 1 e
# 5 b 1 f
# 6 b 1 g
# 7 b 1 h
# 8 c 1 e
# 9 c 1 f
# 10 c 1 g
# 11 c 1 h
# 12 d 1 e
# 13 d 1 f
# 14 d 1 g
# 15 d 1 h
Add dummy columns with same value.
Then with using dummy columns as common key, we can do cross join.
In addition, if you don't want to show dummy column, you can delete it by drop
.
df_crossjoin.drop(columns=["dummy_key"], inplace=True)
print(df_crossjoin)
# column1-1 column2-1
# 0 a e
# 1 a f
# 2 a g
# 3 a h
# 4 b e
# 5 b f
# 6 b g
# 7 b h
# 8 c e
# 9 c f
# 10 c g
# 11 c h
# 12 d e
# 13 d f
# 14 d g
# 15 d h
Conclusion


Today I described about "How to get cross join with pandas.DataFrame".
Important points are following.
- Use
merge
to join DataFrame merge
requires common key- In order to get cross join, use parameter
how = "cross"
- You can also add dummy column as common key



Without on
parameter, you can use how
parameter.