A search engine company hires you to maintain a database that records the searched words per customer and the advertisements paid per advertisers.
You must store all sentences searched by customers, the timestamp of the query, and the customer identifier (e.g., using browser cookies). These sentences consist of different words (called keywords). You should also determine the language the customer used to search each sentence based on the keywords used. Note that a given keyword may appear in one or more languages.
Advertisers will also share their pages to be displayed once customers type a keyword related to their page. The price paid by advertisers is proportional to the probability to return their page first when queried by customers.
(Q1): Draw an ER diagram, and provide the database schema. Underline the primary keys.
(Q2): Write the CREATE TABLE statements including all the constraints.
(Q3): List the top 20 most common keywords searched per customer, in decreasing frequency order.
(Q4): Returns all languages that have never been used in any searched query.
(Q5): Write a SQL query to show bilingual customers. Return the customer identifier, his name ordered by decreasing the number of languages that he potentially speaks. We assume that a customer speaks a language if he searched at least once a sentence from the language.
Solution:
1.
ERD:
Schema:
2.
create table Customer (
CookieID varchar(100) not null,
Name varchar(50),
primary key(CookieID)
);
create table Search (
Timestamp varchar(15) not null,
CookieID varchar(50) not null,
Sentence varchar(100) not null,
primary key(Timestamp, CookieID),
foreign key(CookieID) references Customer(CookieID)
);
create table Keyword (
Word varchar(50) not null,
Language varchar(50) not null,
primary key(Word, Language)
);
create table SearchKeyword (
Timestamp varchar(15) not null,
CookieID varchar(50) not null,
Word varchar(50) not null,
Language varchar(50) not null,
primary key (Timestamp, CookieID, Word, Language),
foreign key (Timestamp, CookieID)
references Search(Timestamp, CookieID),
foreign key (Word, Language)
references Keyword(Word, Language)
);
create table PagesHit (
Advertiser varchar(50) not null,
Url varchar(255) not null,
Timestamp varchar(15) not null,
CookieID varchar(50) not null,
Word varchar(50) not null,
Language varchar(50) not null,
primary key(Advertiser, Url, Timestamp, CookieID, Word, Language),
foreign key (Timestamp, CookieID, Word, Language)
references SearchKeyword(Timestamp, CookieID, Word, Language)
);
3.
select top 20
Customer.CookieID,
SearchKeyword.Word,
count(SearchKeyword.word) as SearchCount
from
Customer,
Search,
SearchKeyword
where
Customer.CookieID = Search.CookieID and
Search.Timestamp = SearchKeyword.Timetamp and
Search.CookieID = SearchKeyword.CookieID
group by
Customer.CookieID,
SearchKeyword.Word
order by
count(SearchKeyword.word) desc;
4.
select
Keyword.Language
from
Keyword
where
Keyword.Language not in (
select
SearchKeyword.Language
from
SearchKeyword
);
5.
select
Customer.CookieID,
Customer.Name,
count(DistinctLanguages.Language) as LanguageCount
from
Customer,
(select distinct
SearchKeyword.CookieID,
SearchKeyword.Language
from
SearchKeyword) as DistinctLanguages
where
Customer.CookieID = DistinctLanguages.CookieID
group by
Customer.CookieID,
Customer.Name
having
count(DistinctLanguages.Language) >= 2
order by
count(DistinctLanguages.Language) desc;