Solutions Tutorials: Daylight API Setup

Hi, my name is William Wood Harter, and I go by Wood. I'm a solutions architect at Luminoso, and I'm going to go through how to use the API.

Daylight has a really robust user interface where we can explore our text and find all kinds of amazing insights, but the UI is only half as powerful as the API. Anything you can do in the UI you can do with the API, and there are many things you can do with the API that you can't do in the UI, so it's worth exploring. You'll need some basic Python skills to follow along.

First we'll look at the setup and what the API documentation looks like. Then we'll install a virtual environment with Python, and finally we'll talk about using the API and create a little notebook to do that.

First up is the setup. There are two forms of documentation. The first is the core API documentation, which is available to anyone at daylight.luminoso.com/api/v5. These topics contain the endpoints you'll use: projects and creating projects, documents and how to get them, terms, concepts, drivers, and sentiment. A lot of the rest is user management, which I won't talk about today; maybe in another episode. The most important sections here are filters and concept selectors, and we'll probably have a whole session on those as well.

Let's look at how to create a project. Again, this is just a RESTful API, so you can call it from any language. We have a Python library that I'm going to talk about next, but in general I've used this API from JavaScript, Java, and of course Python. To create a project, you call POST on /api/v5/projects. You give it a token, which we'll talk about next, plus the name and the language of the project, and that creates the project. Then you call upload to upload a batch of documents, and then you build the project.
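The create, upload, and build sequence is plain REST, so it can be sketched with nothing but the Python standard library. This is a hedged sketch, not Luminoso's client: the paths projects/, projects/<id>/upload/ and projects/<id>/build/, the body fields, the placeholder token and project ID, and the helper name make_request are assumptions drawn from the description above. Check them against the v5 documentation (which may require additional fields) before sending anything; the requests are only built here, not sent.

```python
import json
import urllib.request

# Assumed v5 root URL from the walkthrough; on-site and regional hosts differ.
ROOT_URL = "https://daylight.luminoso.com/api/v5/"


def make_request(method, path, token, body=None, root_url=ROOT_URL):
    """Build, but do not send, a JSON request to the Daylight API.

    Assumes the "Authorization: Token <token>" header format.
    """
    headers = {"Authorization": "Token " + token}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(root_url + path, data=data, headers=headers, method=method)


# Hypothetical placeholders: a real project ID comes back from the create call.
token = "EXAMPLE_TOKEN"
project_id = "pr123abc"

steps = [
    # 1. Create the project with a name and a language.
    make_request("POST", "projects/", token, {"name": "API training", "language": "en"}),
    # 2. Upload a batch of documents (assumed path and body shape).
    make_request("POST", "projects/%s/upload/" % project_id, token,
                 {"docs": [{"title": "three stars", "text": "It came half empty"}]}),
    # 3. Kick off the build (assumed path, no body).
    make_request("POST", "projects/%s/build/" % project_id, token),
]

for req in steps:
    print(req.get_method(), req.full_url)
```

In practice you would send the create request first and read the real project ID from its response before building the other two.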
We'll go through that process at some point, but today we're just going to look at an existing project.

The next piece is the Python client, which you can find on PyPI under luminoso_api: pip install luminoso_api.

Next is installation. You need a Python version between 2.7 and 3.9. You create a virtual environment and then activate it; on a Mac that's source venv/bin/activate, and on Windows it's venv\Scripts\activate. Then we install a couple of modules: luminoso_api and urllib3. There have been some compatibility issues on Macs with an SSL library, and urllib3 version 2.0 has had issues there, so I backed it down to an older version. This will probably fix itself eventually.

Let me show you how to do that. I create a virtual environment with python -m venv venv, and it creates a folder with a whole Python installation inside it. To activate it, I run source ./venv/bin/activate, and now I can see I'm running in that virtual environment. Then I run pip install luminoso_api, and then pip install urllib3==1.26.6, and that installs it.

The next step is authentication: how do we talk to the API with our account, and what do we give it to get in? The answer is tokens. Over in Daylight, I'm looking at a project. I choose my name in the upper right, choose Settings, and go to Tokens. I'm going to generate a new token: I give it a name, "API training", type in my password, and click Create token. This shows me the token only once, so I copy it to the clipboard, and I have to save it. That's going to be the important
part. The easiest way to save it is with lumi-save-token, a command that comes with the luminoso_api module. I run lumi-save-token, give it the token I just copied from the clipboard, and press Enter. It saves the token in your home directory under .luminoso, in a file called tokens.json, so all your tokens are saved in there. The Python library reads that file, so you don't have to put the token in your code or create an environment variable.
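The lookup the library does for you can be sketched in a few lines, combined with the environment-variable approach that comes up later for back-end systems. This is a minimal sketch under stated assumptions: the variable name LUMINOSO_TOKEN and the helper find_token are hypothetical, and the idea that tokens.json maps a host name to a token string is our assumption about the file's layout, not a documented format.

```python
import json
import os
from urllib.parse import urlparse

# Where lumi-save-token is described as saving tokens: ~/.luminoso/tokens.json
DEFAULT_TOKENS_PATH = os.path.join(os.path.expanduser("~"), ".luminoso", "tokens.json")


def find_token(api_url, tokens_path=DEFAULT_TOKENS_PATH, env=None, env_var="LUMINOSO_TOKEN"):
    """Return a token for api_url, or None if none is found.

    Checks an environment variable first (LUMINOSO_TOKEN is a hypothetical
    name), then tokens.json, which is assumed to map host names to tokens.
    """
    env = os.environ if env is None else env
    if env.get(env_var):
        return env[env_var]
    host = urlparse(api_url).netloc
    try:
        with open(tokens_path) as f:
            tokens = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file: no saved token.
        return None
    return tokens.get(host)
```

Checking the environment first is a design choice for servers where nobody runs lumi-save-token; on a laptop the saved file is the usual source.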

There are other ways to provide the token, such as environment variables, but this is probably the easiest way to go. If you don't want to go through the whole process of generating a token in the UI, you can simply run lumi-save-token on its own: it asks for your username and password, generates a token, and saves it in that file. That's the other way to do it.

Once you've got the token saved, you can start talking with the API. First things first, here's one way to connect if you want to put the token in your code: call LuminosoClient.connect and give it a URL. In this case the URL starts with https. If you leave off the s, you'll get some strange errors, and it might take a while to figure out: the error just says there's no endpoint there, and you think you put in the right thing, but you're not talking to the right web server. So https://daylight.luminoso.com/api/v5 is the root URL for the API. You pass that URL and the token to LuminosoClient.connect. If you don't give it the token, it looks in that tokens.json file. Again, you can save the token as shown above, or use your username and password if you want.

This next part is something I do a lot: parsing the project URL. I have a little function for that, and I'm going to copy it into our first notebook. First, though, I copy the connection code, create a new Python notebook, call it "API training", and paste in from luminoso_api import LuminosoClient. Next, I go back to my slides, find the code for the project URL I want to use, and print out all of those values. You could write
your own parser. I'm going to put the project URL in there. I usually do these in separate functions, and I'm going to use a different project URL here because I don't know what that one is. What we're going to use today is the vitamin gummies project; that's the one we were looking at before we went and got that token.

Let me go back a bit. The URL doesn't have api in it; it has app, because that's the user interface. After projects, the first ID is your workspace ID. Every user can be in multiple workspaces, and every user typically has their own specific workspace, so this is the workspace the project is saved under, and the next ID is the project ID. Typically you only need the project ID, but when I split apart the URL, it's nice to just copy the whole thing from the browser and paste it into my project URL. The URL can have highlights or another page name at the end, so my function starts at the beginning and counts slashes instead of taking something from the end. I set the project URL value, split it up, and now I can see the API URL: the function took the host from the project URL and built the API URL from it.

You do this because the host isn't always the same. We have a lot of customers with on-site versions, where the host might be yourcompanyname.luminoso.com. If you're in Europe it's eu.daylight, if you're in Australia it's au.daylight, and if you're in Japan it's jp-daylight, I'm pretty sure. So there are a lot of different hosts, and you need that URL. We've also got the workspace ID and the project ID, so all those values are saved just by using that simple URL-splitting function.

The next thing is the standard client connection. I connect with the root URL, and once I've connected with that base URL, I create a new client with client_for_path, passing projects/ plus the project ID. Remember I haven't run anything yet, so let's get both of those pieces of code. I call connect with the root URL, and then I have a client_project and a client_root. I can print out what they are: the root client points at the v5 API URL. Let's run that.

Remember, we created the project client from the root client with client_for_path and projects/ plus the project ID, so that becomes the root for this client: any path we pass to it starts at projects/<project ID>. A GET on the project client returns information about that project in JSON: the workspace, what time it was created, who created it (that's me), any description that was given, and how many documents are in it. This vitamin project has 6,797 documents. It also shows the language, the last time the metadata was updated, and the build info. If a build was kicked off, there will be build info here. You can't get concepts until the build has completed; we can talk about that in another session. The build info includes the science version, when the build started, and when it finished. Sentiment is a separate build, and we have that here too.
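The URL-splitting helper described above can be sketched with the standard library. This is a hypothetical reconstruction, not the function from the slides: it assumes project URLs shaped like /<ui prefix>/projects/<workspace_id>/<project_id>/..., and that the API root is https://<same host>/api/v5/. The example URL and IDs are made up.

```python
from urllib.parse import urlparse


def split_project_url(project_url):
    """Split a Daylight UI project URL into (api_url, workspace_id, project_id).

    Assumes paths shaped like /<ui prefix>/projects/<workspace_id>/<project_id>/...
    Segments are counted from the start, so a trailing page such as
    "highlights" is ignored. The API root is assumed to be https://<host>/api/v5/.
    """
    parsed = urlparse(project_url)
    if parsed.scheme != "https":
        # Without https you get confusing "no endpoint" errors, so fail early.
        raise ValueError("Daylight URLs must use https: %r" % project_url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 4 or parts[1] != "projects":
        raise ValueError("Unexpected project URL: %r" % project_url)
    api_url = "https://%s/api/v5/" % parsed.netloc
    return api_url, parts[2], parts[3]


# Hypothetical URL; the workspace and project IDs are made up.
api_url, workspace_id, project_id = split_project_url(
    "https://eu.daylight.luminoso.com/app/projects/w1234abc/pr567xyz/highlights"
)
project_path = "projects/" + project_id  # the path you would pass to client_for_path
print(api_url, workspace_id, project_id, project_path)
```

With the luminoso_api client, these values would feed the calls from the walkthrough: api_url goes to LuminosoClient.connect, and project_path goes to client_for_path on the root client.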
We didn't skip sentiment, so you can see the start time and end time of the sentiment build. These are all epoch times: integer timestamps measured from the Unix epoch, January 1, 1970. You can also see whether each build was successful. Remember there are different builds, the core build and the sentiment build. Then you've got the last successful build, the project name, the project ID, and the permissions you have on the project. That's how you get information about a project.

We've already got our API connection going here. If you want to connect with a token from an environment variable, this might be how you do it: if there's a token in os.environ, get it and pass it to connect. So there are a couple of different ways to connect depending on your environment. You might be running on some kind of back-end system where nobody is going to run lumi-save-token, so you just read an environment variable and pass the token in. We do that a lot too.

If you're using another language, you pass the token in a header. It's the Authorization header, and its value is the word Token, a space, and then your token. So if you're using JavaScript or Java, that's how you set the header.

I'm just going to do a quick couple of things: get a couple of documents, maybe get a couple of concepts, and then we're done. You'll know how to use the API, and we'll start digging in further in other sessions.

A document has basically three values: the text, the title, and the metadata. The text is what we process. The title is just for display in the user interface; we don't do any analysis on it. Metadata values can be strings, numbers, dates, or scores. Each metadata field has a type, the name of the field, and the value for this specific document, and that's the same for every type.
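The document shape just described can be sketched as a plain dictionary. The keys text, title, metadata, type, name, and value follow the description; the helper names, the field names "Rating" and "Vitamin type", and the example values are illustrative only, and exact value formats (especially for dates) should be checked against the v5 documentation.

```python
# Metadata types named in the walkthrough.
METADATA_TYPES = {"string", "number", "date", "score"}


def make_document(text, title="", metadata=()):
    """Build a document dict; metadata is a sequence of (type, name, value) tuples.

    Hypothetical helper: the field layout follows the walkthrough, not a spec.
    """
    fields = []
    for field_type, name, value in metadata:
        if field_type not in METADATA_TYPES:
            raise ValueError("Unknown metadata type: %r" % field_type)
        fields.append({"type": field_type, "name": name, "value": value})
    return {"text": text, "title": title, "metadata": fields}


def metadata_values(doc):
    """Map each metadata field name to its value for one document."""
    return {field["name"]: field["value"] for field in doc.get("metadata", [])}


doc = make_document(
    "It came half empty",
    title="three stars",
    metadata=[("score", "Rating", 3), ("string", "Vitamin type", "One A Day Women's")],
)
print(metadata_values(doc))
```

Only the text is analyzed; the title travels along for display, and the metadata is what filters work on.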
So let's jump right in. This section covers the documents endpoint. To get some documents, I call client_project.get with a path relative to the client's root, which in this case is already projects/<project ID>: docs, plus a limit for how many you want and an offset. That means I can call it from a batch process. If you have a lot of documents, you don't want to download two million documents in one request; you have to get them in batches. That's an HTTP issue, not a Luminoso issue; it's common web service practice.

So let's get a document: docs = client_project.get('/docs', limit=1). Remember this client is already at projects/<project ID>, so the /docs path matches the documentation for getting documents, which is projects/<project ID>/docs. There are lots of different parameters we can use there, such as filters, concept selectors, limits, and offsets, and we'll talk about those in another session. Here I just want some docs, so I limit it to one document and print it out.

The result of this call is a JSON object, a dictionary, and the result inside it is a list of documents; this one only has one document in it. The document has the text that was originally given, "It came half empty", and the title, "three stars". Then there's all the metadata, the fields defined in this project: things like the date and the rating (they gave it a score of 3), and this was a 150-count bottle of vitamins.
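The limit and offset batching described above can be sketched as a generator. This is a minimal sketch, assuming each page keeps its documents in a "result" list as described; fetch_page and iter_docs are hypothetical names, the batch size of 100 is arbitrary (stay within whatever maximum the API allows), and a fake fetcher stands in for the API so the example runs offline.

```python
def iter_docs(fetch_page, batch_size=100):
    """Yield documents one at a time, fetching them in limit/offset batches.

    fetch_page(limit, offset) returns one parsed page; with the Python client it
    could be lambda limit, offset: client_project.get('/docs', limit=limit, offset=offset).
    Assumes each page keeps its documents in a "result" list.
    """
    offset = 0
    while True:
        page = fetch_page(batch_size, offset)
        docs = page.get("result", [])
        for doc in docs:
            yield doc
        if len(docs) < batch_size:
            break  # a short (or empty) page means we've reached the end
        offset += batch_size


# Stand-in for the API so the sketch runs offline: 250 fake documents.
fake_docs = [{"text": "doc %d" % i} for i in range(250)]


def fake_fetch(limit, offset):
    return {"result": fake_docs[offset:offset + limit]}


all_docs = list(iter_docs(fake_fetch, batch_size=100))
print(len(all_docs))  # 250
```

Because it is a generator, you can process two million documents without ever holding more than one batch in memory.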
In that same document, the type of vitamin is One A Day Women's. I've also added some other fields here. We have some other scripts that can add sentiment filters to these projects; there's plenty of sentiment already, but I added some other things as well, and we can talk about how those get added.

Next are the actual terms in this document: come, half, and empty. Remember that "it" is a stop word, so it doesn't show up as a term, but you can see where each of the other terms occurs. Fragments are the non-collocated version, if any of these terms are collocations. Every document and every concept has a vector; that's how we understand the relationships between all of the concepts. This wasn't retrieved with a search, so there's no match score. The document also has a unique ID. So there's a lot of information here, including that vector, but mostly it's just the data we put in.

There's one more piece of information I want to get. If I look at the documentation for getting documents, there's a flag called include sentiment on concepts. It's fairly new, and it's very interesting. I call docs_sentiment = client_project.get('/docs', limit=1, include_sentiment_on_concepts=True) and print out the result, which has a bit more information. Now when I look at the terms, the term "come" is negative with a confidence of 99, "half" is negative with a confidence of 99, and "empty" is negative with a confidence of 99. That's a way to get sentiment and confidence for every document once the sentiment build finishes, and we can build really interesting things with it. That's one of the ways I built out this other metadata field, and we can talk about that as well.
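If you call the documents endpoint from another language, the flag has to travel in the query string. Here is a minimal sketch of building that query string. The spelling include_sentiment_on_concepts and the convention of JSON-encoding booleans and structured values (so True becomes true) are assumptions, as is the helper name encode_params; confirm them in the v5 documentation for getting documents.

```python
import json
from urllib.parse import urlencode


def encode_params(params):
    """Build a query string, JSON-encoding anything that isn't a plain str or int.

    Assumption: the REST API expects booleans as true/false and structured values
    (filters, concept selectors) as JSON text. bool is tested first because it is
    a subclass of int in Python.
    """
    encoded = {}
    for key, value in params.items():
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            encoded[key] = json.dumps(value)
        else:
            encoded[key] = value
    return urlencode(encoded)


# include_sentiment_on_concepts is our spelling of the flag named above.
query = encode_params({"limit": 1, "include_sentiment_on_concepts": True})
print("projects/<project_id>/docs/?" + query)
# projects/<project_id>/docs/?limit=1&include_sentiment_on_concepts=true
```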
I'll do one last thing on concepts. To get some concepts, I can say concepts = client_project.get('/concepts', limit=10). Oops, I have to type client_project correctly.

It says there's no limit parameter on this endpoint, so I'll just take that off and see what happens. I think the limit goes into what we call the concept selector. Oh, I didn't print it out. There are the concepts: these are the top concepts in this project. Again, vitamins is the top concept. Relevance is how I'd sort them, and each concept comes back with its texts, its terms, and its relevance. The next one is gummies, whose term is gummy, with its vector and relevance, and the next is taste. So we have the top concepts in this project, and that's how we get set up and use the API.

The next step is probably to start talking about filters. I can get the top concepts for the filter of Women's One A Day, or the top concepts for reviews rated 1 and 2 versus 4 and 5. I want to know what the top concepts are around the worst reviews and the best reviews, and filters let me get that information out.

The other way is with concept selectors, which are interesting. I can ask for the top concepts, which is the default. I can also use a concept list: remember, in the Daylight UI and with the API you can create concept lists. Say I want to watch packaging, transportation, cost, taste, and flavor; I put those concepts in a shared concept list and pass it here to get information about just those concepts. I can specify a concept that I want information on. I can get all the concepts related to something, say vitamins. You can see this in the UI: if I take excellent and look at all its related concepts, I get good, great, perfect, and so on, and I can get that through the API using related concepts. There are also suggested concepts, which are concept clusters, plus sentiment suggested, unique to filter, and drivers suggested. We could talk for a whole session on each of those, and we probably will.
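The limit error above comes from putting limit at the top level; on the concepts endpoint it belongs inside the concept selector. Here is a minimal sketch that builds a concepts query. The selector type strings in SELECTOR_TYPES, the search_concept key, and the helper name concepts_query are assumptions modeled on the selector names in the walkthrough; confirm them against the v5 concept selector documentation.

```python
import json
from urllib.parse import urlencode

# Assumed type strings for the selectors named in the walkthrough.
SELECTOR_TYPES = {
    "top", "concept_list", "specified", "related",
    "suggested", "sentiment_suggested", "unique_to_filter", "drivers_suggested",
}


def concepts_query(selector_type="top", limit=None, **options):
    """Build the query string for a GET on projects/<project_id>/concepts/.

    limit goes inside the concept selector, not at the top level. Extra
    options (for example a hypothetical search_concept for "related") are
    copied into the selector unchanged.
    """
    if selector_type not in SELECTOR_TYPES:
        raise ValueError("Unknown concept selector type: %r" % selector_type)
    selector = {"type": selector_type}
    if limit is not None:
        selector["limit"] = limit
    selector.update(options)
    return urlencode({"concept_selector": json.dumps(selector)})


top_ten = concepts_query("top", limit=10)
related = concepts_query("related", limit=10, search_concept={"texts": ["excellent"]})
print(top_ten)
```

With the Python client, the equivalent would be passing the selector dictionary as concept_selector to client_project.get('/concepts', ...), assuming the client encodes it for you.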
But that's the API training for now: how to get set up, and how to get some quick information out of the API about documents and concepts. I really appreciate your time, and thank you for listening.

2023-06-14 20:39
