The Glade 4.0 https://gladerebooted.net/ |
|
Database Design https://gladerebooted.net/viewtopic.php?f=5&t=10730 |
Page 1 of 1 |
Author: | Hopwin [ Mon Jan 27, 2014 11:30 am ] |
Post subject: | Database Design |
So I am working on a project to organize data for my company. I have roughly 10,000 records and up to 1,000 users trying to read it at the same time. Today our navigation and content management is utter crap (basically full text search of content), so to facilitate retrieval I want to tag data by essentially building an address book but I am banging my head against how to set up the relationships for the data tags. I can't figure out how to make SQL recognize a one-to-many relationship for the various components. Here's what my "address book" needs to contain: Page - URL of a page, can be external or internal Page Description - brief text summary of content Page Name - brief text name Categories - Can be 1 or any combination of 10 distinct categories Sub-Categories - Can be 1 or any combination of 30 distinct subcategories Topics - Can be 1 or any combination of 30 distinct topics Users will pick a category and be presented with a valid list of subcategories and based on their subcategory selection they are presented with a list of valid topics (cascading filters). Access 2010 can handle multi-valued fields such as Topics, Categories and SubCategories but it cannot handle 1000 concurrent readers. JavaScript and XML can handle the users but chokes on data sets over 1000 records. SharePoint can also handle the users but chokes on content over 2500 records. SQL in theory can handle both the users and dataset but I cannot figure out how to allow a record that has multiple Categories, SubCategories and Topics. Anybody have a suggestion? |
Author: | Hopwin [ Mon Jan 27, 2014 11:38 am ] |
Post subject: | |
Oh and I apologize if this is retarded, the answer feels like it is simple but my brain is derping out on it. |
Author: | Vladimirr [ Mon Jan 27, 2014 11:45 am ] |
Post subject: | |
Code: tblPages -------------------- PageID int NOT NULL identity seed bla bla as your PK PageURL varchar(255/max) PageDescription varchar(255/max) PageName varchar(50) tblCategories -------------------- CategoryID int etc... as your PK CategoryName varchar(50) tblSubCategories -------------------- SubCategoryID int etc... as your PK SubCategoryName varchar(50) CategoryID int FK to tblCategories.CategoryID --(assuming 1:M with Category) tblPageCategories -------------------- PageCategoryID int as your PK --(I like an identifier on bridge tables) PageID int FK to tblPages.PageID CategoryID int FK to tblCategories.CategoryID tblPageCategories is the bridge table, you could use the same thing for topics. I apologize in advance if I misunderstood the question. |
Author: | Hopwin [ Mon Jan 27, 2014 11:50 am ] |
Post subject: | |
So on tblPageCategories wouldn't I still be constrained to only have one CategoryID (FK) per PageID (FK)? It looks like I would need to create an entry for each possible combination of PageID and CategoryID. |
Author: | Vladimirr [ Mon Jan 27, 2014 11:53 am ] |
Post subject: | Re: |
Hopwin wrote: So on tblPageCategories wouldn't I still be constrained to only have one CategoryID (FK) per PageID (FK)? It looks like I would need to create an entry for each possible combination of PageID and CategoryID. Your second sentence is correct. Code: PageID 1 CategoryID 1 PageID 1 CategoryID 2 PageID 1 CategoryID 4 PageID 2 CategoryID 5 etc... At your own peril, you could go with the denormalized approach of: Code: tblPages -------------------- PageID int NOT NULL identity seed bla bla as your PK PageURL varchar(255/max) PageDescription varchar(255/max) PageName varchar(50) Category1 int NULL Category2 int NULL Category3 int NULL Category4 int NULL etc... Or even the even more bad: Code: tblPages -------------------- PageID int NOT NULL identity seed bla bla as your PK PageURL varchar(255/max) PageDescription varchar(255/max) PageName varchar(50) Categories varchar(255/max) --comma-delimited string of CategoryIDs I've seen both implementations in production but they're harder to maintain and somewhat of a nightmare for query performance. |
Author: | Vladimirr [ Mon Jan 27, 2014 12:01 pm ] |
Post subject: | |
Actually, now that I'm thinking of it, you really need to join PageID to SubCategoryID. It sounds like SubCategoryID is the thing that'll really tie to the page. Right? Or can you associate a page with a category AND a subcategory? If so, you probably should rethink the data structure. |
Author: | Hopwin [ Mon Jan 27, 2014 12:02 pm ] |
Post subject: | |
So if I went this route I'd need 10,000 x 10 entries into the table to account for the categories, then 10,000 x 30 to account for subcategories, then 10,000 x 30 to account for topics? Or would it actually be permutations of like 10,000x 10^25? To answer your question Category is independent of SubCategory is independent of Topic. |
Author: | Vladimirr [ Mon Jan 27, 2014 12:07 pm ] |
Post subject: | Re: |
Hopwin wrote: So if I went this route I'd need 10,000 x 10 entries into the table to account for the categories, then 10,000 x 30 to account for subcategories, then 10,000 x 30 to account for topics? Or would it actually be permutations of like 10,000x 10^25? No, you wouldn't need all those rows. Just the ones where there's a match. Think of the bridge table as a grid. For our previous scenario, let's say you want the four relationships listed below. PageID 1 CategoryID 1 PageID 1 CategoryID 2 PageID 1 CategoryID 4 PageID 2 CategoryID 5 Code: P A G E 1 2 3 4 5 6 C 1 X A 2 X T 3 E 4 X G 5 X O 6 R 7 Y 8 All you need is four rows, not forty-eight. Unless every page is in every category. |
Author: | Vladimirr [ Mon Jan 27, 2014 12:10 pm ] |
Post subject: | Re: |
Hopwin wrote: To answer your question Category is independent of SubCategory is independent of Topic. You might want to rethink that design, and relate back to the page on just a single tier of categorization (in this case, subcategory). Otherwise, things can get ugly. |
Author: | Hopwin [ Mon Jan 27, 2014 12:13 pm ] |
Post subject: | |
So for example (since I work for an insurance company) Name - Vehicle Identification Number (VIN) URL - http://www.google.com (not really) Description - Tool to break down VIN Categories - Acceptable, Unacceptable SubCategories - New Business, Endorsement Topics - Pricing, State Reporting, Add a vehicle, Point of Sale Which gives our users access to the tool based on what they are doing at the time. This is obviously very paired back because we VIN for a ton of our processes, but gives a flavor for the types of paths that we are trying to provide them to get to the data without doing a full text search and getting back 500+ pages of processes that reference VIN. |
Author: | Vladimirr [ Mon Jan 27, 2014 12:14 pm ] |
Post subject: | Re: |
Hopwin wrote: So for example (since I work for an insurance company) Name - Vehicle Identification Number (VIN) URL - http://www.google.com (not really) Description - Tool to break down VIN Categories - Acceptable, Unacceptable SubCategories - New Business, Endorsement Topics - Pricing, State Reporting, Add a vehicle, Point of Sale Which gives our users access to the tool based on what they are doing at the time. This is obviously very paired back because we VIN for a ton of our processes, but gives a flavor for the types of paths that we are trying to provide them to get to the data without doing a full text search and getting back 500+ pages of processes that reference VIN. So something can be Acceptable New Business, but also Unacceptable New Business? |
Author: | Hopwin [ Mon Jan 27, 2014 12:19 pm ] |
Post subject: | |
Correct, if a VIN breaks down as a pick-up truck that's cool. If a VIN breaks down as a garbage truck that's no bueno. Just to further complicate matters, a VIN can be unacceptable at New Business but acceptable for endorsement because we have a relationship with the customer. So a dually F-550 might be unacceptable for new business because it's a big unknown risk, but if you've been with us for four years and we have a handle on your claims history we might allow it at endorsement. I made that particular example up but it is not far off base. |
Author: | Vladimirr [ Mon Jan 27, 2014 12:38 pm ] |
Post subject: | |
So really it's not category vs subcategory. It's category 1 and category 2? Or Category and Status? Type of Business and Purpose? Process Step and Status? (Trying to infer from the examples) |
Author: | Vladimirr [ Mon Jan 27, 2014 12:57 pm ] |
Post subject: | |
Well, either way, no matter what you call it, if you have a set of fixed values that need to be tied back to Page on a one to many basis, a bridge table is the way to do it IMHO. What will you be doing with these values? I mean, you might be able to get away with a delimited string for something like "Topics", if it's an open ended entity that people can do a keyword search on. That way you can use a LIKE statement and still get sort of decent performance... just don't try to use that sort of field for doing analytics. If you go down that road and decide that you need more capability from that field though, it's going to be an uphill battle to convert it. |
Author: | Hopwin [ Mon Jan 27, 2014 1:35 pm ] |
Post subject: | |
I want users to select a category from a picklist and be presented with all the valid subcategories under that category (based on how pages are tagged) Then they choose a subcategory from that picklist to get a list of the valid topics under that subcategory (based on how pages are tagged) Then they choose their topic from that picklist and are presented with a display of all the pages that match those criteria Cascading filters. |
Author: | Hopwin [ Mon Jan 27, 2014 1:38 pm ] |
Post subject: | |
It lets us drive users to valid selections. Gives us the benefit of searching without full text. |
Author: | Vladimirr [ Mon Jan 27, 2014 2:01 pm ] |
Post subject: | Re: |
Hopwin wrote: I want users to select a category from a picklist and be presented with all the valid subcategories under that category (based on how pages are tagged) Then they choose a subcategory from that picklist to get a list of the valid topics under that subcategory (based on how pages are tagged) Then they choose their topic from that picklist and are presented with a display of all the pages that match those criteria Cascading filters. Just so I'm clear; For a new page, any subcategory is valid under any category, and any topic is valid under any subcategory? And when they do a search, you want to limit categories to the ones already in use, the subcategories to any ones already in use for that category, and the topics to any ones already in use for the subcategory? |
Author: | Hopwin [ Mon Jan 27, 2014 2:41 pm ] |
Post subject: | |
And when they do a search, you want to limit categories to the ones already in use, the subcategories to any ones already in use for that category, and the topics to any ones already in use for the subcategory? Yeah. When user enters a category I want to query the pages to find what subcategories exist on pages where user_selection is contained within page_categories. Then I want to present only those subcategories for the next user selection. Then I want to query the pages to find what topics exist on pages where user_subcat_selection is contained in page_subcats. Then I will present only those topics to the user for their final selection. Then I will return the pages that contain a combination of all three of those tags. |
Author: | Vladimirr [ Mon Jan 27, 2014 2:53 pm ] |
Post subject: | |
Well, since it's a many to many (and many to many, and another many to many) I'd still go with the bridge tables, but you might consider refactoring your data to be hierarchical. Drive the users down the right path. Even if a subcategory might exist under two different categories, it probably doesn't need to exist under all ten. |
Author: | Hopwin [ Mon Jan 27, 2014 6:07 pm ] |
Post subject: | |
So how would you go about it? Page_Table ID PageName Address Description Category_Table ID Category SubCategory_Table ID SubCategory Topic_Table ID Topic Then have a table of all combos? PageID CategoryID SubCategoryID TopicID |
Author: | Vladimirr [ Tue Jan 28, 2014 8:03 am ] |
Post subject: | Re: |
Hopwin wrote: So how would you go about it? Page_Table ID PageName Address Description Category_Table ID Category SubCategory_Table ID SubCategory Topic_Table ID Topic Yes. Hopwin wrote: Then have a table of all combos? PageID CategoryID SubCategoryID TopicID No. The normalized option would be a PageCategories table, a PageSubCategories table, and a PageTopic table, since each categorization entity has a many to many relationship to Pages, but no relationship to its siblings. ... ... If you wanted to denormalize that a bit, you could get rid of the Categories, SubCategories, and Topic tables, and just create one master table ("Groups" because I'm bad at naming - I'm sure you can come up with a better one): Code: GroupID int (your PK) GroupTypeID int FK to GroupTypes table GroupName varchar(50) GroupTypes GroupTypeID int GroupTypeName varchar(50) GroupTypes sample data: GroupTypeID 1 Category GroupTypeID 2 SubCategory GroupTypeID 3 Topic So this way all your categories, subcategories, and topics would be in the same table, since they all pretty much do the same thing and share the same columns. Group Table Sample data: Code: GroupID 1 GroupTypeID 1 GroupName Acceptable GroupID 2 GroupTypeID 1 GroupName Unacceptable GroupID 3 GroupTypeID 2 GroupName New Business GroupID 4 GroupTypeID 2 GroupName Endorsement GroupID 5 GroupTypeID 3 GroupName Pricing GroupID 6 GroupTypeID 3 GroupName State Reporting GroupID 7 GroupTypeID 3 GroupName Add a vehicle GroupID 8 GroupTypeID 3 GroupName Point of Sale Then just create the one bridge table to establish the many-to-many relationship, PageGroups: Code: PageGroupID int PageID int GroupID int Depending on your real data, this could either make things more confusing or less confusing. It'd still perform decently, and would be readable. If you never expect categories, subcategories, and topics to grow by an order of magnitude, or change in complexity, this might be a way to go. If you expect to add attributes to only one type of grouping but not another (add a CategoryCode but not SubcategoryCode or something), then go with the first option (three entity tables, three bridge tables). |
Author: | Hopwin [ Tue Jan 28, 2014 9:42 am ] |
Post subject: | |
Hmmm interesting, I am trying to think of how I would present choices to users. For example if Acceptable, Unacceptable, Endorsement and New Business were flat then a user could pick a funky combination of Acceptable, Unacceptable and New Business and get the VIN Results page which would be a bit counterintuitive. Or they could choose two separate transactions that would never occur together like New Business, Endorsement and Point of Sale. |
Author: | Hopwin [ Tue Jan 28, 2014 10:53 am ] |
Post subject: | |
I think I've got it: Categories Table CatID | Category SubCategories Table SCID | SubCategory Topics Table TOPID | Topic Page Table PageID | Name | Address | Description ClusterIndex: CATID | SCID | TOPID | PAGEID The cluster index will contain every possible valid permutation of Categories, Sub Categories and Topics associated with the Page So then we present the user with a Category and their selection will query the clusterindex to return SubCategories The user picks one of those SubCategories it queries the clusterindex (with Cat and now SubCat) to return Topics The user picks a topic and it queries the clusterindex with Cat, SubCat and Topics to return PAGEIDs which ten hit the Page Table to return addresses, names, descriptions. |
Author: | Vladimirr [ Wed Jan 29, 2014 7:55 am ] |
Post subject: | Re: |
Hopwin wrote: Hmmm interesting, I am trying to think of how I would present choices to users. For example if Acceptable, Unacceptable, Endorsement and New Business were flat then a user could pick a funky combination of Acceptable, Unacceptable and New Business and get the VIN Results page which would be a bit counterintuitive. Or they could choose two separate transactions that would never occur together like New Business, Endorsement and Point of Sale. Well, they'd be flat in the table, but you'd have to expand them in the UI. Make one dropdown just show Type 1 items, another Type 2, and another Type 3. |
Page 1 of 1 | All times are UTC - 6 hours [ DST ] |
Powered by phpBB® Forum Software © phpBB Group https://www.phpbb.com/ |