This algorithm takes in raw text, either as a single string or as a set of documents, and outputs candidate tags for the text. To tag raw text, send a string to the algorithm endpoint. If you have a set of distinct documents and want keywords for the set as a whole, send your input as a JSON array of strings.
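The two input shapes described above can be sketched as a small client-side helper. This is a minimal sketch, not the service's client: the helper name `build_request_body` is hypothetical, and sending a lone string as a `text/plain` body (versus a JSON array as `application/json`) is an assumption, since the description does not state content types or the endpoint URL.

```python
import json
from typing import List, Tuple, Union


def build_request_body(data: Union[str, List[str]]) -> Tuple[str, str]:
    """Return (body, content_type) for the tagging endpoint (hypothetical helper).

    Assumption: raw text is sent as a plain-text body, while a set of
    documents is sent as a JSON array of strings.
    """
    if isinstance(data, str):
        # A single piece of raw text: send the string itself.
        return data, "text/plain"
    if isinstance(data, list) and all(isinstance(d, str) for d in data):
        # Several distinct documents: tags are computed over the whole set.
        return json.dumps(data), "application/json"
    raise TypeError("input must be a string or a list of strings")


body, ctype = build_request_body(["First document.", "Second document."])
print(ctype)  # application/json
print(body)   # ["First document.", "Second document."]
```

Keeping the serialization in one place makes it explicit that a list of documents is tagged jointly, rather than one request per document.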
This algorithm takes in a URL, retrieves the page content, and produces candidate tags using Latent Dirichlet Allocation (LDA).
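To illustrate how LDA can yield candidate tags, here is a minimal sketch that fits LDA with collapsed Gibbs sampling and returns the most frequent words of each topic. It assumes the page text has already been retrieved (URL fetching is omitted). The stopword list, tokenizer, hyperparameters (`alpha`, `beta`, `num_topics`, `top_n`) and the function names `tokenize` and `lda_tags` are all illustrative assumptions, not the service's actual settings.

```python
import random
import re

# Assumption: a tiny illustrative stopword list, not the service's own.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "for", "on", "with", "as", "by", "it", "that", "this"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords and short words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def lda_tags(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01,
             top_n=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return top topic words as tags."""
    rng = random.Random(seed)
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: doc-topic, topic-word, and tokens per topic.
    ndk = [[0] * K for _ in tokenized]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []  # current topic assignment of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(tokenized):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(tokenized):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove this token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # p(topic t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    # Candidate tags: the most frequent words in each topic, deduplicated.
    tags = []
    for t in range(K):
        ranked = sorted(range(V), key=lambda v: nkw[t][v], reverse=True)
        for v in ranked[:top_n]:
            if nkw[t][v] > 0 and vocab[v] not in tags:
                tags.append(vocab[v])
    return tags


docs = ["apples and bananas are fruit", "fruit salad with apples",
        "python code and python functions", "functions in code"]
print(lda_tags(docs, num_topics=2))
```

Because Gibbs sampling is stochastic, the tags depend on `seed` and `iterations`; fixing the seed makes results reproducible for a given input.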
This algorithm takes in a URL and extracts the content from the page. It attempts to remove non-content text such as navigation menus and footers.
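One simple way to approximate this kind of cleanup is to drop text nested inside elements that usually hold navigation or boilerplate. The sketch below uses Python's standard `html.parser`; the tag list in `SKIP_TAGS` and the names `ContentExtractor`, `extract_content`, and `extract_from_url` are assumptions for illustration. The description does not say how the algorithm decides what counts as non-content, and production extractors often use richer heuristics such as text density.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumption: elements that typically hold navigation, footer, or invisible text.
SKIP_TAGS = {"nav", "footer", "header", "aside", "script", "style", "noscript"}


class ContentExtractor(HTMLParser):
    """Collect visible text, skipping anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many SKIP_TAGS elements we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_content(html: str) -> str:
    """Return the page's text with navigation/footer-like blocks removed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_from_url(url: str) -> str:
    """Fetch a page and extract its content; assumes UTF-8 if no charset is sent."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_content(resp.read().decode(charset, errors="replace"))
```

Separating `extract_content` from the network fetch keeps the heuristic easy to test on saved HTML.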
Recognize text in your images with this algorithm. It uses Tesseract, described as "probably the most accurate open source OCR engine available". For more information on Tesseract's development, see https://code.google.com/p/tesseract-ocr/
Gets lists of N-grams from an input text. The inputs are: the N-gram size (number of words), a cutoff for the maximum number of results returned, whether to ignore capitalization, and whether to sort results from most to least frequent.
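The four inputs map naturally onto a word N-gram counter. Here is a minimal sketch; the function and parameter names are hypothetical, and treating a "word" as a run of word characters and apostrophes is an assumption, since the description does not specify tokenization. When frequency sorting is off, this sketch returns N-grams in order of first occurrence.

```python
import re
from collections import Counter
from typing import List, Tuple


def ngrams(text: str, n: int, limit: int, ignore_case: bool = True,
           sort_by_frequency: bool = True) -> List[Tuple[str, int]]:
    """Return up to `limit` word N-grams of size `n` with their counts."""
    if n < 1 or limit < 0:
        raise ValueError("n must be >= 1 and limit must be >= 0")
    if ignore_case:
        text = text.lower()
    # Assumption: a "word" is a run of word characters or apostrophes.
    words = re.findall(r"[\w']+", text)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    if sort_by_frequency:
        # Most frequent first; ties keep first-occurrence order.
        items = counts.most_common()
    else:
        # Counter preserves insertion order, i.e. first occurrence.
        items = list(counts.items())
    return items[:limit]


print(ngrams("the cat sat on the mat the cat ran", 2, 3))
# [('the cat', 2), ('cat sat', 1), ('sat on', 1)]
```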