CompaniON is an application that demonstrates the power of document-vector representations as applied to raw business data. More specifically, it converts high-dimensional abstract representations of thousands of company profiles into a human-friendly visual graph. The information for all companies was retrieved from the online database provided by Industry Canada.
POWERED BY: This app was built with help from the usual suspects: Selenium for online data scraping, Doc2vec (from gensim) for generating document-vector representations, t-SNE (from scikit-learn) for dimensionality reduction, k-means (from scikit-learn) for color-clustering, and Flask for web interfacing.
TECH to me please: Over 18,000 company profiles, as recorded on the Industry Canada site, were scraped and then used to train a Doc2vec model. The model was trained over 30 epochs. After this step, a sample set of 3000 profiles was extracted and the t-SNE algorithm was applied to it to reduce the document vectors from their original fifty dimensions to only two. Next, a simple affine transformation consisting of a translation step and a scaling step was applied to the sample dataset to make the data fit on the screen in the default view. Lastly, k-means clustering was applied to the dataset to assign different colors to different regions of the map. This was done to make the different map areas easier to tell apart by eye.
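The last three steps (t-SNE reduction, fit-to-screen transformation, k-means coloring) can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names `fit_to_screen` and `build_map`, the 800x600 view size, the choice of 8 clusters, uniform scaling, and clustering on the 2-D coordinates rather than the original vectors are all assumptions. Random vectors stand in for the Doc2vec output, and the sample is kept small so the sketch runs quickly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def fit_to_screen(points, width=800.0, height=600.0):
    """Translate and uniformly scale 2-D points to fit a width x height view.

    The view size is a hypothetical default, not taken from the app.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # guard against a degenerate (flat) axis
    # One scale factor for both axes keeps the shape of the map intact.
    scale = min(width / spans[0], height / spans[1])
    # Translation (move the minimum corner to the origin), then scaling.
    return (points - mins) * scale


def build_map(doc_vectors, n_clusters=8, seed=0):
    """Reduce vectors to 2-D, fit them to the view, and assign color clusters."""
    coords = TSNE(n_components=2, random_state=seed).fit_transform(doc_vectors)
    screen = fit_to_screen(coords)
    # Assumption: clusters are computed on the 2-D map coordinates.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(screen)
    return screen, labels


# Random stand-in for Doc2vec output: 300 "profiles" with 50 dimensions each.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(300, 50))
screen_xy, color_ids = build_map(vectors)
```

Each row of `screen_xy` is a point position in the default view, and the matching entry of `color_ids` is an integer cluster label that a front end could map to a color palette.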