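| trans_date_trans_time | cc_num | merchant | category | amt |
| --- | --- | --- | --- | --- |
| 2020-06-21T12:14:25 | 2291163933867244 | fraud_Kirlin… | personal_care | 2.86 |

| first | last | gender | street | city | state | zip | lat | long |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jeff | Elliott | M | 351 Darlene Green | Columbia | SC | 29209 | 33.9659 | -80.9355 |

| city_pop | job | dob | trans_num | unix_time | is_fraud |
| --- | --- | --- | --- | --- | --- |
| 333497 | Mechanical engineer | 1968-03-19 | 2da90c7d74… | 1371816865 | 0 |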
- Which transactions exist in the dataset?
- What is the amount of a given transaction?
- What is the merchant of a transaction?
- What is the category of a transaction?
- Is a given transaction fraudulent?
- What is the cardholder's name?
- What is the transaction date and time?
- What is the card number used?
- What is the merchant's location?
- What is the cardholder's location?
These questions require no joins, no aggregations, no geo reasoning. They don't encode the intended entity structure.
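As a rough illustration of why these questions need no joins or aggregations, each one can be answered by reading a single field of one flat record. The sketch below uses a subset of the sample row; the dict layout and the `lookup` helper are hypothetical and only serve the example.

```python
# A subset of the sample row, kept as one flat record.
row = {
    "trans_date_trans_time": "2020-06-21T12:14:25",
    "cc_num": "2291163933867244",
    "category": "personal_care",
    "amt": 2.86,
    "first": "Jeff",
    "last": "Elliott",
    "is_fraud": 0,
}


def lookup(record, column):
    """Answer a simple question by reading one column (hypothetical helper)."""
    return record[column]


# "What is the amount of a given transaction?"
amount = lookup(row, "amt")
# "Is a given transaction fraudulent?"
is_fraud = bool(lookup(row, "is_fraud"))
# "What is the cardholder's name?"
cardholder_name = f"{lookup(row, 'first')} {lookup(row, 'last')}"
```

Nothing here relates one row to another, which is why a flat representation is enough to answer every question in the list.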
| Entity | Contents |
| --- | --- |
| CreditCardTransaction | hasPart, merchant as string |
| Merchant | lat/long only |
| Metadata | flat blob of all other columns |
- Which transactions are flagged as fraudulent?
- Which cardholders made >3 fraudulent transactions?
- Total amount per merchant category?
- Job and date of birth of a cardholder?
- Cardholders in cities with pop > 100 000?
- Merchants with most fraud flags?
- Which categories are most fraud-associated?
- Transactions >50 km from cardholder location?
- Fraudulent transactions in a specific US state?
- Transactions within the same hour?
- Cardholder with multiple transactions in 10 min?
These questions encode the cardholder–card–merchant–address entity structure present in the gold KG.
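To make the contrast concrete, here is a minimal sketch of three of the questions above: one aggregation (more than 3 fraudulent transactions), one geo check (more than 50 km away), and one temporal window (multiple transactions in 10 minutes). The rows are toy data invented for illustration, not taken from the dataset. The `merch_lat` and `merch_long` fields are assumed names for the merchant coordinates, and the card number stands in for the cardholder.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Txn(NamedTuple):
    cc_num: str
    time: str          # ISO 8601 timestamp
    is_fraud: int
    lat: float         # cardholder latitude
    long: float        # cardholder longitude
    merch_lat: float   # assumed merchant latitude field
    merch_long: float  # assumed merchant longitude field


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, long) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Toy rows, invented for illustration only.
HOME = (33.9659, -80.9355)
rows = [
    Txn("A", "2020-06-21T12:00:00", 1, *HOME, 33.9700, -80.9400),
    Txn("A", "2020-06-21T12:05:00", 1, *HOME, 35.2271, -80.8431),
    Txn("A", "2020-06-21T15:00:00", 1, *HOME, 33.9700, -80.9400),
    Txn("A", "2020-06-22T09:00:00", 1, *HOME, 33.9700, -80.9400),
    Txn("B", "2020-06-21T12:01:00", 0, 40.0, -75.0, 40.01, -75.01),
]

# Aggregation: cards with more than 3 fraudulent transactions.
fraud_counts = Counter(t.cc_num for t in rows if t.is_fraud)
repeat_fraud_cards = {cc for cc, n in fraud_counts.items() if n > 3}

# Geo reasoning: transactions more than 50 km from the cardholder location.
far_transactions = [
    i for i, t in enumerate(rows)
    if haversine_km(t.lat, t.long, t.merch_lat, t.merch_long) > 50
]

# Temporal window: cards with two transactions at most 10 minutes apart.
times_by_card = defaultdict(list)
for t in rows:
    times_by_card[t.cc_num].append(datetime.fromisoformat(t.time))

rapid_cards = set()
for cc, times in times_by_card.items():
    times.sort()
    if any(later - earlier <= timedelta(minutes=10)
           for earlier, later in zip(times, times[1:])):
        rapid_cards.add(cc)
```

Each query has to group or compare rows by card, location, or time, so answering it depends on those entities being linked rather than stored as independent strings.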
| Entity | Contents |
| --- | --- |
| CreditCardTransaction | isFraudulent, cardholder link |
| Merchant | has_part transaction link |
| Person | gender, jobTitle, birthDate ✓ |
| Metadata | transDateTime, state, zip ✓ |