Business Problem & Model Documentation
Okay, everybody, it is 12:30 and I would like to get started. First thing today, we are going to start with the quiz. Which is not where I expected it to be. If you go to Readings, [unclear], so, Modules, and Quiz. Okay, if you refresh the page the quiz should now be visible. For anyone who's going to be taking the quiz, I'll give you a few minutes at the start of class to get going on that.

The content for today is pretty short. I mean, I'm sure when you guys read the chapters you realized, oh, huh, straightforward, not much here. So it's gonna be that and then mostly project time today. There is a difference between the two; it is addressed in the chapter. The first [unclear] talked about this. We'll see. Yes.

Okay. Let's get started. Hey, you know, in case there are some stragglers, how about I actually just mention one other thing real quick. The Titanic novel idea sharing has its first idea on it. The rough intuition, which I'm very impressed with, is this: a group realized, hey, there might be some features that are predictive, namely what deck you're on and how far that deck is from the lifeboats. However, we don't have that data for every passenger. When we look at ticket numbers that have what room people are in, we only have that data for maybe twenty or thirty percent of passengers; the rest, we don't know what deck they're on. However, we happen to have a very cool tool in DataRobot that's good at predicting things. So let's train a new model to predict what deck people are on. We use the deck information we have data on and then we just build a predictive model from there.

It was a cool and interesting idea: there's something I want to know, and we have the tool to learn it. Exactly what I'm looking for. It was an excellent suggestion. I'm happy that it came up so early; I was not expecting it until the second project, maybe the third. So, cool
beans, that's an idea that somebody's got.

Now, on to the reading from last night. So far we have talked about a lot of details once we get into the nitty-gritty of the model. However, we kind of glossed over how you select what should be going on in the model. So that's what we're going over today.

We start with a problem statement. It's pretty straightforward. There's a list of three criteria in there that define what a good problem statement is: is it presented in the language of business, does it specify an action that can result from the project, and how does solving the problem impact your bottom line. This keeps what you're doing nice and grounded within the context of business. It means you can present it to management, it means you can present it to other departments, because you're including concrete numbers and using the terms they're familiar with instead
of overloading them with jargon, and it's showing them what the concrete goals are. But there are both goals and the outcome, the results of those goals. So you're not just saying here's what's wrong; you're saying here is exactly what's wrong, here's how wrong it is, by using the numbers that you're throwing in here. You're proposing a way to fix it, and you're saying how that fix makes things better.

So, using the Lending Club: I slightly tweaked the last Lending Club example he had in the book. For the first part, are we using the language of business? Yes. We are talking about investments, interest rates, and losses. We're giving concrete numbers for these. We're not just saying we could save some money, or there's problems here with the numbers; we're saying exactly what they are, where domain experts who know the details of loans can follow.

What are we proposing that it will do? Our model is going to screen out risky borrowers. Then we're going to use the model to justify investing to lenders. If I'm remembering how Lending Club works, it's like a matchmaking site between borrowers, who show up and say, hey, I want this loan, and lenders, who show up and say, sure, I'll give you this loan. So: being able to say who are the bad borrowers, let's avoid them; being able to say this person might look risky but they're really not, and here's why; and finally using it to choose which loans to fund.

How does that impact our bottom line? We reject high-risk borrowers. They're people that we were worried about, or that we should have been worried about, and we can just say, nah, not giving you money. It also lets us increase our profits. How does it do that? There might be someone who looks bad, looks like a high risk, but actually isn't. We can charge them a higher interest rate because they look like a bad loan, even though we know that they're actually pretty good.

So: a concrete description of the problem, what
we plan to do to fix it, how fixing it helps us. You don't have to write one of these for the Titanic project. For some of the other projects in the course you will.

Next we're going to talk about unit of analysis. Before you can make any predictions, you need to know what you're making predictions for. Unit of analysis answers your big questions about who or what you're making predictions for. These decisions will impact both what data you need and how you design the model. I think this can be kind of abstract, so I'm gonna start with an example.

Start by imagining that you're a real estate broker. You know that you need to get some new people who want to list their house. You're going to send out some sort of mailer, some "Hey, I'm awesome, come work with me."

If you want to send out bulk mail, so I just want to send something to everyone in this zip code, I tell the post office, here's $300, give everyone in this zip code a little generic flyer with my name on it. Then your unit of analysis should be zip code. Everyone within that zip code will receive the same decision. Whether they will or won't receive a mailer depends on what their zip code is.

For normal mail, I want to send a postcard; it's got my name on it. My unit of analysis will be household. Why is it household? I'm deciding which households are receiving mailers. I'm only going to send one or zero to any given house. I'm not going to send a postcard to two people in the same house. So it's either the house gets one or the house doesn't; there is no "parts of the house get it."

For email it's a little bit different. My unit of analysis is individual. I'm saying which person is going to be likely to list with me. Once I have decided which people I want to send it to, then I will look up their email addresses.
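A minimal sketch of the mailer example, under stated assumptions: the records, the field names, and the `score` column (a stand-in for "how likely is this person to list with me") are all made up for illustration, and averaging scores within a unit is just one hypothetical decision rule. The point it shows is that the same people yield different decisions depending on which key defines the unit, and that everyone inside one unit gets the same answer.

```python
from collections import defaultdict

# Hypothetical individual-level records; every name and number is invented.
people = [
    {"name": "Ana",  "household": "H1", "zip": "80301", "score": 0.9},
    {"name": "Ben",  "household": "H1", "zip": "80301", "score": 0.2},
    {"name": "Cara", "household": "H2", "zip": "80301", "score": 0.1},
    {"name": "Dev",  "household": "H3", "zip": "80501", "score": 0.4},
]

def decide_by_unit(records, unit, threshold=0.5):
    """Return {unit value: send a mailer?}, exactly one decision per unit."""
    scores = defaultdict(list)
    for record in records:
        scores[record[unit]].append(record["score"])
    # Assumed rule: a unit gets a mailer when its average score clears the threshold.
    return {u: sum(s) / len(s) >= threshold for u, s in scores.items()}

by_zip = decide_by_unit(people, "zip")              # bulk mail
by_household = decide_by_unit(people, "household")  # postcards
by_person = decide_by_unit(people, "name")          # email
```

With these invented numbers, household H1 gets a postcard and Ana gets an email, but no zip code clears the bar for bulk mail. Nothing about the data changed; only the unit did.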
The unit of analysis is not email address here. Why isn't it email address? Because I'm never gonna be in a situation where I am making a decision that your @colorado.edu email address does not get a mailer but your @gmail does. I'm going to pick who I want to send the email to, and then once I have figured that out I'll figure out how to get it to you.

The big thing is that everyone within your unit of analysis receives the same decision, and that's how you're going to be predicting things, what you're going to be using in your model. So in Titanic we're predicting whether individuals are going to survive or die. We're not predicting if families are going to live or die. We are not predicting if first-class passengers are going to live or die. Yes, those are features that go into the decision, but
we're not trying to predict everyone in this class will live, everyone in this class will die.

Selecting your unit of analysis can be tricky, I'm not going to deny that, but it is critically important, and once you start getting some practice with it, it gets very intuitive very quickly. So there'll be some initial hiccups, don't worry about that, but you'll catch on pretty quick.

So, some of the questions. There were the generic who, what, when, where questions. When selecting my unit of analysis I can answer questions like who. I could be interested in users: everybody on my website. I want to come up with a decision over which users should receive an ad for Cabela's camping equipment. Customers: I can decide which of my customers, so not everybody on my website, but of the people on my website that are also customers, which of them should receive a flyer. My unit of analysis could be a prospective customer: people that are looking at my website but haven't made that buy-something decision yet. What do I send to get them to change their mind?

Switching gears: I'm a political campaign. I know that some voter outreach efforts are more effective than others. I also know that some groups are more likely to vote than others. I want to know where I should spend my limited campaign funds in order to maximize my gain. So I look at demographic blocks. I'll create four bins for age, I'll create three bins for race, I'll create five bins for income, two bins for voted last election or not, and then I'll create some fancy hybrid of all of those categories together. Which will give me, what, two times three times five times, whatever all the numbers I said were multiplied together, it'll give me that many different groups. Those are demographic blocks. And then I want to predict what my return on investment will be for each block. I'm
not caring about individual voters there. I am caring about white men, 18 to 35, that are in the third income quintile. They will all receive the same amount of attention from me; I will not care beyond that.

I can also look at it from an organizational perspective. Let's say that I am a regional manager for Kroger, King Soopers, that chain. I could be interested in which stores are performing well or poorly. I could also be interested in which departments are performing well or poorly. So I could say, oh man, this store is doing great, or I could say, oh man, all of our meat sales are lagging. They're both reasonable units of analysis, trying to look for patterns within things, but I'm going to make very different decisions based on what I select. They answer different business problems.

If I'm finding out that all of my meat departments are suffering, my solution will be to contact a different distributor of meat. If I find out that stores in Longmont are doing poorly, my decision might be, let's shut down our Longmont store. I could not reasonably make a decision that, let's shut down the Longmont store because our dairy section is doing great.
There's no meaningful connection or information there.

So in addition to who you're caring about, you also need to think about what you are caring about when determining unit of analysis. So, I own a store. I want to know if my customer will buy anything. I want to know what my customer will buy. I want to know how many of something my customer will buy. I want to know how much they will spend, so I can allocate: oh, that means they could buy four of these and two of those, or one of these and six of those. Or I want to know if a customer's likely to change their habits.

These are all different things that we could predict; they are different targets. But they also entail that we must be changing our unit of analysis. If what I want to know is whether a customer will buy anything, customer is a good unit of analysis. If I want to know how much a customer will spend, customer is one possible unit of analysis, but it might be more reasonable to say transaction.

So, I go to King Soopers four or five times a week. They have donuts and Red Bull. Don't judge. They could model at the customer level: how much is James going to spend this month? They could also model at the transaction level: how much is James going to spend today? Those are both valid units of analysis, but they address different business problems.

Similarly, what will a customer buy, I'm sorry, how many units of a product will a customer buy. Customer isn't a great answer there; you need the intersection between customer and product. Because how many of X James is going to buy is going to change depending on whether X is Red Bull, Brussels sprouts, or car shammies. Trying to keep it at the customer level would only let you predict, oh, James is gonna buy seven, no matter what it is we're talking about. Similarly, if we go only from the product level and we say anyone
who buys Red Bull will buy four of them, that's not going to help us either. So our unit of analysis there is the intersection between a product and a customer. And that's for how many units of a certain product.

Then finally, time is an interesting issue. When will a customer make their next purchase? Interesting question; it definitely can operate at the customer level. Will a customer make a purchase in the next two weeks? Sure, again, customer level: am I going in two weeks? Yeah. All of these would be at the customer level.

For this one here, instead of how long will the customer's contract last, if it was how long will a customer have this device, that would be an interaction between customer and device. Think Verizon. The family plan that I've got has nine devices on it amongst the four of us. Three, because our brother's in the UK now. So I have a phone, my mom has a phone and an iPad, my dad has a phone and an iPad. There is also a wireless jet-something. Basically it's Wi-Fi, but local Wi-Fi; I think it's a cellular signal. So we've got all these different devices. How long will my family have that device is family-to-device; you need both. Which is different than family-to-contract: how long are we under contract? Eternity. That's how cellphone contracts work now. Always have worked, I suppose.

And again, one of the biggest things I want to stress here for unit of analysis is that there are multiple right answers and they will depend on your business problem.

So let's go through a few examples, see if you guys can tell me what you think the unit of analysis is. I am a city government of some sort: mayor, town council, whatever. We have the question: is our airport competitive with other cities? What's the unit of analysis going to be here? Exactly. We're trying to predict how
bad the delays are on a per-airport level.
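A small sketch of what "per-airport level" means in practice, assuming a made-up table of flight records (the airport codes are real, but every delay value, the `airline` letters, and the field names are invented). Each flight is rolled up to one average per unit, and the units are then ranked. The `unit` argument is the whole idea: the same function ranks by any other column just by swapping the key.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flight records; all values are invented for illustration.
flights = [
    {"airport": "DEN", "airline": "A", "delay_min": 30},
    {"airport": "DEN", "airline": "B", "delay_min": 10},
    {"airport": "IAD", "airline": "A", "delay_min": 50},
    {"airport": "IAD", "airline": "B", "delay_min": 5},
    {"airport": "SLC", "airline": "B", "delay_min": 0},
]

def mean_delay_by(records, unit):
    """Average delay per value of `unit` (one row per unit of analysis)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[unit]].append(record["delay_min"])
    return {u: mean(delays) for u, delays in groups.items()}

def rank(records, unit):
    """Units sorted from least to most average delay."""
    averages = mean_delay_by(records, unit)
    return sorted(averages, key=averages.get)

airport_ranking = rank(flights, "airport")
airline_ranking = rank(flights, "airline")
```

Note that the two rankings come from identical rows. Which one is useful depends entirely on who is asking and what decision they get to make.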
So airport is our unit of analysis. That'll let us say, hey, our airport is fifth from the top, awesome, we're doing pretty well. Or we're 12th from the bottom, oh man. Arson is the solution.

Next, let's consider I'm an employer. We've got some fancy system for reimbursing employees who go on business trips. However, we want to make sure that, for simplicity's sake, we're only going with a handful of airlines. There's only a few different people they can book with, just so that we're being one giant block and we get a discount for that. What's the unit of analysis here? Bingo. It is airline. How many people just have the slides open right now and are clicking next one step ahead of me? Okay.

So the unit of analysis is airline here, because yes, there's lots of delays and you're figuring that out, but you're trying to figure out which airlines are really bad about delays. You don't care what individual planes do or what individual routes or tickets have. You care about, in general, does this airline suck? I mean, yes, but does it suck worse than the others?

Now you're a mechanic; you work for the airline. Which planes need the most maintenance time? You know that you're going to have a bunch of crew that you have to send to different things; you have to budget their time. What are you going to choose to budget based on? You're probably not gonna want to select individual planes. You're not going to say, oh man, this plane always has problems with the fuel line freezing. Instead you're going to say, this model of plane tends to have problems with the fuel line freezing. Therefore whenever you get one of those planes you know that you need to allocate a little more time.

A competitor can be interested in what routes have a lot of delays. Because, oh man, if all the other airlines are screwing up Denver to Dulles, I have a chance to do well there. I'll just add more flights that go that route and make sure my delays aren't bad. Then
I become the attractive one for that Denver to Dulles route.

If you're an air traffic controller, you might want to know: is thunder or snow a bigger source of delay? It'll help you plan and mitigate, it'll help you announce delays further in advance, or know when you need to get extra staff on.

And it's possible you'd want combinations of any of these, especially things like which weather is worst. I'd
imagine that there are people who would be interested in what the model of plane is combined with the weather. So you're a mechanic, you know, hey, these planes are really bad in the heat.

The point of this slide is: there are lots of right answers and they depend on your business problem. I couldn't just say what's the unit of analysis and give you a data set; you would not be able to come up with a unique answer to that.

And the last thing I want to go over is target selection. We've talked a little bit about this before in the types and goals of machine learning, but I'm just gonna bring it up again real quick because it's a good refresher to have.

Classification: you're trying to predict a category. Binary is when you have exactly two categories. Usually it's a does exist / doesn't exist, did happen / didn't happen, good idea / bad idea.

The other kinds of classification, I technically was a little sloppy there, are multi-class and multi-label. The only difference here is how many you're allowed to have. Multi-class means I have more than two groups but each person is only in one group. So is your grade an A, a B, a C, a D, or an F? You cannot have an "A-D" grade in a specific class.

Multi-label classification is for something where, and you can actually think of it as a series of binaries, you have multiple different labels and a specific item can have more than one. So is this movie a comedy, a drama, a historical fiction, a fantasy, a sci-fi? It is possible to have a comedy sci-fi movie. That answer has multiple labels, so it is a multi-label classification. And as I said, you can think of it as two different binary classifications: is it a comedy, true/false; is it sci-fi, true/false. So it is in fact possible to have "no, it doesn't meet any of my groups." The more groups you have the less likely it is, but that's how multi-label versus multi-class work.

Regression
is when you're predicting a continuous value, almost always a number. Most of the time regression is going to be predicting a value. How tall is this baby going to be when it grows up? What will this person's credit score be in six months? How much can I get away with charging this person before they will stop being my customer?

The other thing you can wind up with is using regression to predict a probability. So you can say I am 80% sure this person will survive; I am 22% sure this person is going to buy something. That's the only difference. Mathematically they're mostly the same thing; it's just that what you're trying to predict is ever so slightly different conceptually.

One other thing that this chapter covered that I think is important, so I want to go over it, is how data collection can interact with selecting your target. So I'm gonna go with an example of divorce. I'm sure everyone here has heard that half of all marriages end in divorce. Turns out that's a statistical fallacy, and let's go over why.

How can a marriage end? One or more partners dies, or there's a divorce. There aren't a lot of other ways to get out of a marriage. On average a divorce will happen eight years after the marriage, or at least a first marriage will end eight years after if it's gonna end in a divorce. The average length of a marriage that doesn't end in a divorce is about 50 years.

So when we generate nine marriages here, I've got in red two divorces: one, two. I have two successful marriages, because they ended with somebody dying: one, two. When I collected my data in 2018, I have five more marriages that I don't know how they're gonna end. That's five marriages that I have incomplete data on. How can I categorize them? How can I label them as a successful marriage or an unsuccessful marriage? I
could call them successful, because, well, they haven't ended yet. Or I could call them failures, because I don't know that they'll end with somebody dying. A more common solution, or generally the preferred solution, is to throw out data where we don't have an end condition. So you have a 50/50 here. Instead of trying to say it's 7 and 2, or 2 and 7, we're saying it's 2 and 2, and then just saying we have 5 we don't know on.

Now, this does come with its own set of problems. Most notably, we've thrown out most of our data. But beyond that, we've also thrown out stuff like this one, which looks really long: if they got married in '65 or so, well, they're still married in 2018; their odds of divorce aren't super high. So I've thrown out something that probably could have been useful. Similarly with this couple that got married in, mmm, '82. I'm a little less certain for this '98, and I have no idea for these 2009, 2011s, but I'm throwing out data that's probably useful. That is not the ideal thing; you generally don't want to throw out useful data. However, if you really don't know, it can be tough.

If I'm remembering correctly, the example from the book was: is a loan going to default? They had paid on time, paid on time, paid on time, paid on time, and then they had 15-day windows for how late they were; once it's past 90 days it's in default. So that was how they labelled the data. Now, anyone who had paid on time for the whole thing and then they were done, great. Anyone who had defaulted before our cutoff date, well, we know what their category is. How do we classify someone that is twenty-two days past due? They haven't defaulted yet; they have 68 days left.

And the answer is, that's exactly how this is a problem and exactly what we need to worry about when trying to figure out how to select data for, or how to create, our targets and decide what we're doing. This is the problem we come across.
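The loan labelling problem above can be sketched in a few lines. This is an illustration under assumptions, not the book's procedure: the 90-day threshold comes from the example as described, while the function name, the tuple layout, and the four loans are hypothetical. It labels each loan as of the cutoff date and then applies the "throw out what has no end condition" rule.

```python
DEFAULT_AFTER_DAYS = 90  # from the example: past 90 days late counts as default

def label_loan(paid_off, days_past_due):
    """Label one loan as of the data-collection cutoff date."""
    if paid_off:
        return "repaid"
    if days_past_due > DEFAULT_AFTER_DAYS:
        return "default"
    return "ongoing"  # outcome not known yet, e.g. 22 days past due

# Hypothetical loans: (loan id, paid off?, days past due at the cutoff)
loans = [
    ("L1", True, 0),
    ("L2", False, 120),
    ("L3", False, 22),
    ("L4", False, 0),
]

labels = {loan_id: label_loan(paid, late) for loan_id, paid, late in loans}

# "Throw out data where we don't have an end condition."
training_rows = [loan_id for loan_id, label in labels.items() if label != "ongoing"]
dropped = len(labels) - len(training_rows)
```

Here half of the invented loans are dropped, including the one that is 22 days late, which is exactly the kind of row whose eventual label is still unknown.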
It also entails the messy complication, for the loan data specifically, that you won't get your most recent data. Because the frequent solution is you'd say, well, I ignore anything that I'm currently unsure of. So when I get rid of all the blues here, I'm getting rid of anything that's currently ongoing. That means that none of my predictions apply to anything I currently have. So I've just created a model that I know doesn't apply to anything I have; that sounds bad.

Unfortunately there aren't a lot of great solutions to it. It's just something that you need to be aware of, and you need to try different things to get around it. So: only include data that's been concluded. Pick an earlier cutoff so that you know what the final outcome is even if you only have partial data. Try putting ongoing in its most likely category. Try creating a multi-class classifier that has the two classes you care about plus ongoing, so that you'll know: these are things I think will end in divorce, these are things I think will end in death, these are things that I'm pretty sure are still going on, so my model's going to be bad. You need to take all of these approaches so that you can be robust against the kinds of errors this creates.

So with that, we're good for our content today. Now we're going to go and play around in DataRobot and Alteryx and work on the Titanic project. My understanding is there's a number of questions about that, and there's some confusion and some stuff that isn't working, so we're gonna work through all of that.

So get in your groups. [unclear] per team; I don't care who submits. Each team is responsible for the four parts of the assignment. But once you have created a team it should just have all of the leaderboard together as one.
So once you've joined the Kaggle team it should all count as one thing.

So how about you come on up and we'll take a look at it. Yeah. Seems to be requesting a merger. So you're unable to... ah, that could be it. Yeah, see if Chris has the option to do it. And if not, it says unable to merge with other teams in this competition, so it might be you need to add him to this instead of merging the two teams. I'm pretty sure that the team leader should have some more options here that'll do that.

Cosmo, you had a question at the start of class that I thought was pretty good. So the question Cosmo asked was, "I'm confused about the exact flow going on here."

So the rough overview of what we have is: Kaggle has some files in it. Most important are train and test. What you are going to do is open up Alteryx, take the train, and transform it. This transformation will include feature engineering, imputing missing data, anything like that; anything you can do in Alteryx to get more information out of the data you have. You make all those changes.

Then you go to DataRobot, where you take your transformed stuff and you generate a model. The model will be, here's my prediction of who survived and who didn't, using the transformed training data.

Once you have this model, you then take the test data from Kaggle. You apply the same transformation. So this will create new features, it will impute missing data, anything like that, so that you will have the exact same columns here that you do in the transformed version here. And then you will use the model to predict from the transformed test data; you will use the predict function to make a prediction of what your data is.

So: 1, 2, 3, 4. You will then take this prediction and upload it. The predictions that you generated are what gets uploaded to Kaggle. This
is, generally speaking, what you're doing in your process.

Now, last class we talked about things you can learn from your model. We talked about feature impact and variable importance and that sort of thing. There are two things that you can do with that information. One, you can use it in your write-up, because the write-up requires you to say what features are important,
what is highly predictive, that sort of thing. That's a component of your write-up, so having feature impact and variable importance helps with that. Being able to justify your positions and justify how you said why does sex matter, why does age matter: that's how you'll be using the feature importance stuff. That is the way that everyone in the class is going to be using those sorts of information.

Some of you who choose to do so may also add an intermediate step here. So instead of making your model in DataRobot and then immediately starting to transform your test and predict, you can also say, these are the good features from the model; I am going to make model two. So, using the same transformed data, instead of using all of the columns I'm only going to use three or four or five very predictive columns, or highly impactful columns.

You can also say, oh man, this model taught me something; I want to go back to Alteryx, make some new transformations, and then make a new model from those. So if you find out that name is highly predictive, and the reason name is highly predictive is because title is very good, so it says Mr., Master, Miss, Colonel; if it finds those are the impactful parts of name, then you could go back into Alteryx and make some new transformations where you would say, let's make a new column called title, and then we can impute that data for people who don't have labels. Or we could say, make a new category for professional titles versus titles related to age, or other things like that, based on the insights we gained from the first model.

That's not required for this project, but it is one way that you can use the feature importances, variable impacts, the text stuff, the word clouds; it's how you can use all of those to refine your workflow. So everyone's gotta use it in their write-up; anyone who wants can use
it to refine their transformations in Alteryx, which entails of course making new models and that sort of thing. Cool.

Yes? Okay, could you come up here and log in to your DataRobot? Yep, you can be logged in at multiple locations at once. One other question that I've had a few people ask... It's all the same, um. So let me log out. Yeah, so basically once you log in I want you to show me what's going on.

So, a quick question: how many groups have submitted at least two predictions? How many groups have submitted at least five? How many have submitted at least ten? How many have submitted at least twenty? Cool. I am looking for hopefully between ten and twenty from each group. As for the number of submissions, I am expecting the number of submissions with improvements to be somewhere around three or four.

Why the discrepancy here? DataRobot is very good at what it does, so you're probably going to get a pretty high score pretty quick. However, there is still utility and value in trying things that you think might be good even if their numbers aren't quite exactly the highest you have access to, because it's possible there's problems in something. So you can go ahead and submit: well, this is my third-best model on my leaderboard, but I still want to try it and submit it on Kaggle. So that's why I'm looking for 10 to 20 submissions,
with 3, 4, maybe 5 that show improvement. You guys do have until Friday at 5 p.m. So with 10 submissions per day as the cap per team, that gives you, say Monday is over, 10, 20, 30, 40; you will either have 50 or 60, I don't know when it resets, because I don't know Greenwich Mean Time off the top of my head. But you have plenty of time to make all these submissions.

So, y'all doing good? Yeah? Did you want me to come up here? I mean, that's kind of the point here; I'm assuming a lot of people are having the same problems, so that's what I want to show everybody up here at once. So what have you got?

Okay, so we went to Alteryx, we created a row called Title, where we parsed four different titles, and then we put it in here and it came out as categorical, even though we were messing with the data type in Alteryx to try to change it, to try to get it to text. And we want it to be text type, because then we can go to text mining and insights, but we don't have that right now and we can't figure out how to get that.

Okay, I can show you that one. So if you are interested in changing a data type from what Alteryx tries to figure out on its own, you can go and click this arrow labeled "create a new feature," and "change var type," and then you can go transform to, in this case we want text. So that'll just create the feature, and now you have a new thing, once it's done being created: title text.

And now once you've got this, you are going to want to create a new feature list. So you'll select the features that you care about. There's probably a select-all button. And then create a new feature list, type in "title text," create. And now you can run autopilot on a different feature list and select title text. Push to restart. When you click push to restart, it will not get rid of existing models you have. It
will stop running anything it's currently working on, but it will not eliminate existing models. So once
you get to your Models tab, you can see all the stuff with Informative Features is still here. You can then, within your leaderboard for models, select only the ones you're interested in. So title text is what I care about, and it shows me the status on the new feature list we created.

This process right here, of creating a new feature list, is probably going to be your easiest way to use the model Insights tab. Just select only the features you really care about. So instead of going to Alteryx and only uploading some of it, if you just want to say "show me the same thing but with only three features," you can go into your Data tab, check the three features that were best, create a new thing called top three, and run Autopilot on that. That way you don't have to leave DataRobot in order to make the next round of changes here.

It's a similar thing to what we did in Alteryx, except in this one we have titles, and then we binned for different age groups. We're just kind of curious: when we put some of that into DataRobot, how can we use the models it gives us to see how they relate to each other, predictively? Like, I guess I get that it gives us some graphs and stuff on how predictive each bin is, but it's hard to tell how we can apply that.

Okay. So I suspect that is what the Hotspots tool is supposed to be for, but as I admitted, I don't super know how that works. But you can, within... where is it? So, you know what, I'm going to say, now that I look at this, that would be a great thing for you to do for the novel idea contribution: look into a good way to show interaction between variables. I have a few ideas, if you want to message me on Slack about those. But that's a good enough idea that, if you guys want to stake a claim to that, go ahead and do so.

So, as long as you're not doing the make-new-models-and-transform-again thing, that absolutely is a great way to do it, simply because if
you upload a new data set, it creates a new project. You can't have multiple. Or rather, you can have all the data that's in a single project. So if you just want to go and create "here's a thousand different fields," upload them all at once, and then select "here are four fields at a time" or "here are twelve fields at a time," that is absolutely a reasonable approach to go for.

The biggest thing I would caution you on, if you take that approach, is that by default it will use something called Informative Features as what it selects to build a model on. That Informative Features selection takes into account interactions to some degree, and if you're just uploading "what happens if I binned age with three, five, and ten bins," then the impact of two of those options is going to be almost nothing. So you can't rely on the Informative Features selection if you do something like that. But it's still a good approach, and if you want to have that as your novel contribution, show a comparison of how you did that, look at some of the differences, or how you overcame the Informative Features being lacking. That sounds like a great novel idea. You can go on the PowerPoint and claim it quick.

Sure. Come on up here and log in. Nope, you can log in from multiple places; it's not going to change anything. For this project it's not going to matter; for future projects it will.

So you want to know how to add features here? Okay. So once you are on the Data tab, if you want to make new features or change features in some way, you will go and select the little check boxes next to the features that you want to use together. So let's just make... yeah. By default it's sorted by importance, so let's go with prefix,
fare, cabin, class. Just that will give you the option to create a new feature list: once at least one of them is checked, you can create a new feature list and call it whatever you want. So go ahead and name that something. Okay, and then click Create. So now you have a new feature list that has importance relative to that list and gives you the information about it. To go back and look at things as they were originally, you'll go to Feature List and then say All Features.

And now, if you want to make some transformations, you click this arrow and Change Var Type... let's see, that's not going to be super interesting for... yeah. So let's say passenger class. Right now that is listed as numeric. Seems reasonable: there's 1st, 2nd, 3rd. However, it's possible that it's not a linear relationship between first, second, and third class. Maybe the gap between first and second is big and the gap between second and third is small, or something like that. So I would change that to a categorical. I'm selecting to treat NaN as missing, because I know that there's some stuff there. And now I have a categorical feature out of here, and that's Pclass as categorical.

And let's also look at fare. Fare is numeric. Maybe fare isn't linear. Maybe the relation should be something else. So let's look at fare squared: the more you pay, the more it helps that you paid more. And we can add another one, f of fare... oh, custom feature equals... oh, it doesn't have a help button right here. Well, never mind; don't bother trying to make your own function. But maybe fare is used on a squared scale. And I believe the reason it's blocking out log is because there's a zero here. But something like that. Let's see, what else do we have? Age, there we go. Let's add log of age. I have no reason to think log of age is actually going to be useful, but log of age and age squared. And now I can create a new, bizarre feature list that includes
three kinds of age, two kinds of fare, the Pclass I actually care about, and sex. And I will call this feature set "nonsense," because there was no sense behind what I was doing. And then, run Autopilot on a different feature list. I could go and run it on nonsense, or on the class-and-fare one that you did, but I will not do that because it interrupts any currently running models. Oh, apparently this is finished. Sure, I will run it on something. I'll run it on nonsense, just because someone foolishly trusted me to click buttons.

So that is how you go about transforming features within DataRobot, and how you go about using those transformed features to make a new feature list. And with your new feature list, you can run the models again. Okay. In order to log out, click on this little person button and then Sign Out.

Did you have a question? Yes, your question earlier. All right. So if it was actually using just name, then name would be terrible, for that reason. Or if you changed name from a text field to, like, a categorical or something, it would have that problem and it would fan out. However, DataRobot will say, "This is text; let's see what features I can pull out of the text automatically." So it'll break it into separate words, and it'll pull titles out and other things like that, which is why name winds up being good but not perfect, because
it looks at parts of it in addition to looking at the whole. You could try both and see which one is better. I know in the sample dataset that I've shown in all of my previews, I did not check name. I mean, it was included in the set, but I unchecked it, because I pulled out title and I pulled out spouse. If you'll notice, some of the names have another name in parentheses after them; that is the spouse. So "are you traveling with your spouse" could be a reasonable feature to impute, or to engineer.

So, come on up here. You've asked a question that should be on video, so come on up. Go ahead and log in to Kaggle. Once he's done, we'll log in and look at it. I don't know, I don't know. Oh, well, then go with Google. And then there should be another... So, go to the Titanic page. Yeah. And, like, try that... a little busted anyway. Okay. Scroll down, click Data, and scroll down, and it explains what they are.

So if you are confused about what a field is in your data, you can go to the Data tab in the Kaggle competition, and scrolling down you will find descriptions of what they are. So what is SibSp? The number of siblings or spouses you have aboard the Titanic. Parch: the number of parents or children you have aboard the Titanic. So now we've got a decent way to know what our data is. You can go and log on. Come on back up; I'll wreck it.

In DataRobot, right, inside Insights, text mining, and then, so here's "effect," and it just has a scale; no idea what it was. Hovering over it doesn't tell us anything.

So all this is, is a relative weighting. I don't know what it's relative to, but don't think of it as having any particular meaning beyond: big positive numbers, good; big negative numbers, bad; numbers in the middle, less predictive. So don't worry too much about exactly what they're supposed to represent. No, that's not you, it's me.

I have
another question for everybody, or a few of them, I suppose. So for the number of models submitted, we've got one or two for most groups. Submissions are a big part of the grade. Another big part is the write-up. How many groups have started the write-up? Yeah. No, no, no, your write-up is one for the whole thing. How many of you have... all right, I know the answer. So far only one group has their novel idea up, and I think two more have ideas for what they want to do. Okay. How many of you have your Alteryx workflows ready to go and submit for the models that you have used, or for whatever the best model you've used is?

Okay. So do remember there are four parts of this assignment that your grade is based on. You're going to want to make sure you allocate time for all of them, instead of trying to pull them all off Friday afternoon. In particular, the novel idea thing you can get going on very early. You could do it before you have a model that's great or before all of it works. And the write-up you can actually get started on as soon as you've started running some models, because you'll have some notion of feature importances, or variable impacts, or things like that. So you can get started on those even though you don't have your final submission ready. Do not wait until you have your top score to do the other parts of the assignment.
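
The transformations walked through earlier in the session (passenger class as categorical, fare squared, log of age and age squared, a title pulled out of the name) can be sketched outside DataRobot in plain Python. This is only an illustration under stated assumptions: the field names Name, Pclass, Age, and Fare follow the Kaggle Titanic data, the helper name add_lecture_features and the sample rows are made up, and none of it reflects how DataRobot builds its features internally.

```python
import math
import re

TITLE_RE = re.compile(r",\s*([^.]+)\.")  # text between the comma and the first period
PAREN_RE = re.compile(r"\(.+\)")         # a second name in parentheses


def add_lecture_features(row):
    """Hypothetical helper: copy one passenger row and add the in-class features."""
    out = dict(row)
    age, fare = row.get("Age"), row.get("Fare")
    # Class as a label, not a number, so 1st/2nd/3rd need not be evenly spaced.
    out["Pclass_cat"] = f"class_{row['Pclass']}"
    out["Fare_sq"] = None if fare is None else fare ** 2
    out["Age_sq"] = None if age is None else age ** 2
    # log is undefined at zero, so missing or non-positive values stay missing.
    out["Age_log"] = math.log(age) if age is not None and age > 0 else None
    match = TITLE_RE.search(row["Name"])
    out["Title"] = match.group(1).strip() if match else None
    out["HasParenName"] = bool(PAREN_RE.search(row["Name"]))
    return out


# Invented rows in the Kaggle name format (not real passengers).
sample = [
    {"Name": "Doe, Mr. John", "Pclass": 3, "Sex": "male", "Age": 22.0, "Fare": 7.25},
    {"Name": "Doe, Mrs. John (Jane Roe)", "Pclass": 1, "Sex": "female", "Age": 38.0, "Fare": 70.0},
    {"Name": "Smith, Miss. Ann", "Pclass": 3, "Sex": "female", "Age": None, "Fare": 0.0},
]
features = [add_lecture_features(r) for r in sample]

# The "nonsense" feature list: three kinds of age, two kinds of fare, class, and sex.
NONSENSE = ["Age", "Age_log", "Age_sq", "Fare", "Fare_sq", "Pclass_cat", "Sex"]
nonsense_rows = [{name: f[name] for name in NONSENSE} for f in features]
for r in nonsense_rows:
    print(r)
```

Restricting each row to the names in NONSENSE plays the role of a feature list here: the full set of engineered fields stays available, and only the chosen subset is handed to whatever model comes next. The guard on the log mirrors the zero-value problem seen with fare in class.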
2018-03-17 16:03