The Glade 4.0

"Turn the lights down, the party just got wilder."
It is currently Sun Nov 24, 2024 2:47 am

All times are UTC - 6 hours [ DST ]




Post new topic Reply to topic  [ 24 posts ] 
Author Message
 Post subject: Database Design
PostPosted: Mon Jan 27, 2014 11:30 am 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
So I am working on a project to organize data for my company. I have roughly 10,000 records and up to 1,000 users trying to read it at the same time. Today our navigation and content management is utter crap (basically full text search of content), so to facilitate retrieval I want to tag data by essentially building an address book but I am banging my head against how to set up the relationships for the data tags. I can't figure out how to make SQL recognize a one-to-many relationship for the various components. Here's what my "address book" needs to contain:

Page - URL of a page, can be external or internal
Page Description - brief text summary of content
Page Name - brief text name
Categories - Can be 1 or any combination of 10 distinct categories
Sub-Categories - Can be 1 or any combination of 30 distinct subcategories
Topics - Can be 1 or any combination of 30 distinct topics

Users will pick a category and be presented with a valid list of subcategories and based on their subcategory selection they are presented with a list of valid topics (cascading filters).

Access 2010 can handle multi-valued fields such as Topics, Categories and SubCategories but it cannot handle 1000 concurrent readers. JavaScript and XML can handle the users but chokes on data sets over 1000 records. SharePoint can also handle the users but chokes on content over 2500 records. SQL in theory can handle both the users and dataset but I cannot figure out how to allow a record that has multiple Categories, SubCategories and Topics.

Anybody have a suggestion?

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Last edited by Hopwin on Mon Jan 27, 2014 11:45 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 11:38 am 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
Oh and I apologize if this is retarded, the answer feels like it is simple but my brain is derping out on it.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 11:45 am 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Code:
tblPages
--------------------
PageID int NOT NULL identity seed bla bla as your PK
PageURL varchar(255/max)
PageDescription varchar(255/max)
PageName varchar(50)

tblCategories
--------------------
CategoryID int etc... as your PK
CategoryName varchar(50)

tblSubCategories
--------------------
SubCategoryID int etc... as your PK
SubCategoryName varchar(50)
CategoryID int FK to tblCategories.CategoryID --(assuming 1:M with Category)

tblPageCategories
--------------------
PageCategoryID int as your PK --(I like an identifier on bridge tables)
PageID int FK to tblPages.PageID
CategoryID int FK to tblCategories.CategoryID

tblPageCategories is the bridge table, you could use the same thing for topics.

I apologize in advance if I misunderstood the question.

_________________
Image Image Image Image Image


Last edited by Vladimirr on Mon Jan 27, 2014 11:55 am, edited 2 times in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 11:50 am 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
So on tblPageCategories wouldn't I still be constrained to only have one CategoryID (FK) per PageID (FK)? It looks like I would need to create an entry for each possible combination of PageID and CategoryID.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Mon Jan 27, 2014 11:53 am 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
So on tblPageCategories wouldn't I still be constrained to only have one CategoryID (FK) per PageID (FK)? It looks like I would need to create an entry for each possible combination of PageID and CategoryID.


Your second sentence is correct.

Code:
PageID 1
CategoryID 1

PageID 1
CategoryID 2

PageID 1
CategoryID 4

PageID 2
CategoryID 5

etc...


At your own peril, you could go with the denormalized approach of:

Code:
tblPages
--------------------
PageID int NOT NULL identity seed bla bla as your PK
PageURL varchar(255/max)
PageDescription varchar(255/max)
PageName varchar(50)
Category1 int NULL
Category2 int NULL
Category3 int NULL
Category4 int NULL
etc...


Or even the even more bad:

Code:
tblPages
--------------------
PageID int NOT NULL identity seed bla bla as your PK
PageURL varchar(255/max)
PageDescription varchar(255/max)
PageName varchar(50)
Categories varchar(255/max)  --comma-delimited string of CategoryIDs


I've seen both implementations in production but they're harder to maintain and somewhat of a nightmare for query performance.

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:01 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Actually, now that I'm thinking of it, you really need to join PageID to SubCategoryID. It sounds like SubCategoryID is the thing that'll really tie to the page. Right? Or can you associate a page with a category AND a subcategory? If so, you probably should rethink the data structure.

_________________
Image Image Image Image Image


Last edited by Vladimirr on Mon Jan 27, 2014 1:14 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:02 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
So if I went this route I'd need 10,000 x 10 entries into the table to account for the categories, then 10,000 x 30 to account for subcategories, then 10,000 x 30 to account for topics? Or would it actually be permutations of like 10,000x 10^25?

To answer your question Category is independent of SubCategory is independent of Topic.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Mon Jan 27, 2014 12:07 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
So if I went this route I'd need 10,000 x 10 entries into the table to account for the categories, then 10,000 x 30 to account for subcategories, then 10,000 x 30 to account for topics? Or would it actually be permutations of like 10,000x 10^25?


No, you wouldn't need all those rows. Just the ones where there's a match.

Think of the bridge table as a grid. For our previous scenario, let's say you want the four relationships listed below.
PageID 1
CategoryID 1

PageID 1
CategoryID 2

PageID 1
CategoryID 4

PageID 2
CategoryID 5

Code:
    P A G E
    1 2 3 4 5 6
C 1 X
A 2 X
T 3
E 4 X
G 5   X
O 6
R 7
Y 8


All you need is four rows, not forty-eight. Unless every page is in every category.

_________________
Image Image Image Image Image


Last edited by Vladimirr on Mon Jan 27, 2014 12:17 pm, edited 3 times in total.

Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Mon Jan 27, 2014 12:10 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
To answer your question Category is independent of SubCategory is independent of Topic.


You might want to rethink that design, and relate back to the page on just a single tier of categorization (in this case, subcategory). Otherwise, things can get ugly.

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:13 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
So for example (since I work for an insurance company)

Name - Vehicle Identification Number (VIN)
URL - http://www.google.com (not really)
Description - Tool to break down VIN
Categories - Acceptable, Unacceptable
SubCategories - New Business, Endorsement
Topics - Pricing, State Reporting, Add a vehicle, Point of Sale

Which gives our users access to the tool based on what they are doing at the time. This is obviously very paired back because we VIN for a ton of our processes, but gives a flavor for the types of paths that we are trying to provide them to get to the data without doing a full text search and getting back 500+ pages of processes that reference VIN.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Mon Jan 27, 2014 12:14 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
So for example (since I work for an insurance company)

Name - Vehicle Identification Number (VIN)
URL - http://www.google.com (not really)
Description - Tool to break down VIN
Categories - Acceptable, Unacceptable
SubCategories - New Business, Endorsement
Topics - Pricing, State Reporting, Add a vehicle, Point of Sale

Which gives our users access to the tool based on what they are doing at the time. This is obviously very paired back because we VIN for a ton of our processes, but gives a flavor for the types of paths that we are trying to provide them to get to the data without doing a full text search and getting back 500+ pages of processes that reference VIN.


So something can be Acceptable New Business, but also Unacceptable New Business?

_________________
Image Image Image Image Image


Last edited by Vladimirr on Mon Jan 27, 2014 12:41 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:19 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
Correct, if a VIN breaks down as a pick-up truck that's cool. If a VIN breaks down as a garbage truck that's no bueno.

Just to further complicate matters, a VIN can be unacceptable at New Business but acceptable for endorsement because we have a relationship with the customer. So a dually F-550 might be unacceptable for new business because it's a big unknown risk, but if you've been with us for four years and we have a handle on your claims history we might allow it at endorsement.

I made that particular example up but it is not far off base.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:38 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
So really it's not category vs subcategory. It's category 1 and category 2? Or Category and Status? Type of Business and Purpose? Process Step and Status? (Trying to infer from the examples)

_________________
Image Image Image Image Image


Last edited by Vladimirr on Mon Jan 27, 2014 1:23 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 12:57 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Well, either way, no matter what you call it, if you have a set of fixed values that need to be tied back to Page on a one to many basis, a bridge table is the way to do it IMHO.

What will you be doing with these values? I mean, you might be able to get away with a delimited string for something like "Topics", if it's an open ended entity that people can do a keyword search on. That way you can use a LIKE statement and still get sort of decent performance... just don't try to use that sort of field for doing analytics. If you go down that road and decide that you need more capability from that field though, it's going to be an uphill battle to convert it.

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 1:35 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
I want users to select a category from a picklist and be presented with all the valid subcategories under that category (based on how pages are tagged)
Then they choose a subcategory from that picklist to get a list of the valid topics under that subcategory (based on how pages are tagged)
Then they choose their topic from that picklist and are presented with a display of all the pages that match those criteria

Cascading filters.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 1:38 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
It lets us drive users to valid selections. Gives us the benefit of searching without full text.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Mon Jan 27, 2014 2:01 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
I want users to select a category from a picklist and be presented with all the valid subcategories under that category (based on how pages are tagged)
Then they choose a subcategory from that picklist to get a list of the valid topics under that subcategory (based on how pages are tagged)
Then they choose their topic from that picklist and are presented with a display of all the pages that match those criteria

Cascading filters.


Just so I'm clear;
For a new page, any subcategory is valid under any category, and any topic is valid under any subcategory?
And when they do a search, you want to limit categories to the ones already in use, the subcategories to any ones already in use for that category, and the topics to any ones already in use for the subcategory?

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 2:41 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
And when they do a search, you want to limit categories to the ones already in use, the subcategories to any ones already in use for that category, and the topics to any ones already in use for the subcategory?

Yeah. When user enters a category I want to query the pages to find what subcategories exist on pages where user_selection is contained within page_categories. Then I want to present only those subcategories for the next user selection.

Then I want to query the pages to find what topics exist on pages where user_subcat_selection is contained in page_subcats. Then I will present only those topics to the user for their final selection.

Then I will return the pages that contain a combination of all three of those tags.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 2:53 pm 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Well, since it's a many to many (and many to many, and another many to many) I'd still go with the bridge tables, but you might consider refactoring your data to be hierarchical. Drive the users down the right path. Even if a subcategory might exist under two different categories, it probably doesn't need to exist under all ten.

_________________
Image Image Image Image Image


Last edited by Vladimirr on Tue Jan 28, 2014 8:18 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 27, 2014 6:07 pm 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
So how would you go about it?

Page_Table
ID PageName Address Description

Category_Table
ID Category

SubCategory_Table
ID SubCategory

Topic_Table
ID Topic

Then have a table of all combos?
PageID CategoryID SubCategoryID TopicID

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Tue Jan 28, 2014 8:03 am 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
So how would you go about it?

Page_Table
ID PageName Address Description

Category_Table
ID Category

SubCategory_Table
ID SubCategory

Topic_Table
ID Topic


Yes.

Hopwin wrote:
Then have a table of all combos?
PageID CategoryID SubCategoryID TopicID


No. The normalized option would be a PageCategories table, a PageSubCategories table, and a PageTopic table, since each categorization entity has a many to many relationship to Pages, but no relationship to its siblings.

...
...

If you wanted to denormalize that a bit, you could get rid of the Categories, SubCategories, and Topic tables, and just create one master table ("Groups" because I'm bad at naming - I'm sure you can come up with a better one):

Code:
GroupID int (your PK)
GroupTypeID int FK to GroupTypes table
GroupName varchar(50)



GroupTypes
GroupTypeID int
GroupTypeName varchar(50)

GroupTypes sample data:
GroupTypeID 1 Category
GroupTypeID 2 SubCategory
GroupTypeID 3 Topic


So this way all your categories, subcategories, and topics would be in the same table, since they all pretty much do the same thing and share the same columns.

Group Table Sample data:
Code:
GroupID 1
GroupTypeID 1
GroupName Acceptable

GroupID 2
GroupTypeID 1
GroupName Unacceptable

GroupID 3
GroupTypeID 2
GroupName New Business

GroupID 4
GroupTypeID 2
GroupName Endorsement

GroupID 5
GroupTypeID 3
GroupName Pricing

GroupID 6
GroupTypeID 3
GroupName State Reporting

GroupID 7
GroupTypeID 3
GroupName Add a vehicle

GroupID 8
GroupTypeID 3
GroupName Point of Sale


Then just create the one bridge table to establish the many-to-many relationship, PageGroups:
Code:
PageGroupID int
PageID int
GroupID int


Depending on your real data, this could either make things more confusing or less confusing. It'd still perform decently, and would be readable. If you never expect categories, subcategories, and topics to grow by an order of magnitude, or change in complexity, this might be a way to go. If you expect to add attributes to only one type of grouping but not another (add a CategoryCode but not SubcategoryCode or something), then go with the first option (three entity tables, three bridge tables).

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Jan 28, 2014 9:42 am 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
Hmmm interesting, I am trying to think of how I would present choices to users. For example if Acceptable, Unacceptable, Endorsement and New Business were flat then a user could pick a funky combination of Acceptable, Unacceptable and New Business and get the VIN Results page which would be a bit counterintuitive. Or they could choose two separate transactions that would never occur together like New Business, Endorsement and Point of Sale.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Jan 28, 2014 10:53 am 
Offline
The Dancing Cat
User avatar

Joined: Wed Nov 04, 2009 2:21 pm
Posts: 9354
Location: Ohio
I think I've got it:

Categories Table
CatID | Category

SubCategories Table
SCID | SubCategory

Topics Table
TOPID | Topic

Page Table
PageID | Name | Address | Description

ClusterIndex:
CATID | SCID | TOPID | PAGEID

The cluster index will contain every possible valid permutation of Categories, Sub Categories and Topics associated with the Page

So then we present the user with a Category and their selection will query the clusterindex to return SubCategories
The user picks one of those SubCategories it queries the clusterindex (with Cat and now SubCat) to return Topics
The user picks a topic and it queries the clusterindex with Cat, SubCat and Topics to return PAGEIDs which ten hit the Page Table to return addresses, names, descriptions.

_________________
Quote:
In comic strips the person on the left always speaks first. - George Carlin


Top
 Profile  
Reply with quote  
 Post subject: Re:
PostPosted: Wed Jan 29, 2014 7:55 am 
Offline
User avatar

Joined: Wed Feb 03, 2010 8:20 am
Posts: 1037
Hopwin wrote:
Hmmm interesting, I am trying to think of how I would present choices to users. For example if Acceptable, Unacceptable, Endorsement and New Business were flat then a user could pick a funky combination of Acceptable, Unacceptable and New Business and get the VIN Results page which would be a bit counterintuitive. Or they could choose two separate transactions that would never occur together like New Business, Endorsement and Point of Sale.


Well, they'd be flat in the table, but you'd have to expand them in the UI. Make one dropdown just show Type 1 items, another Type 2, and another Type 3.

_________________
Image Image Image Image Image


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 24 posts ] 

All times are UTC - 6 hours [ DST ]


Who is online

Users browsing this forum: No registered users and 197 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group