As a Data Scientist who has done his own fair share of model training, I’m very familiar with spending long periods of time waiting for training to finish, especially as you increase your data size and/or model complexity.
When training on a local machine, these long wait times become inconvenient not just because of the wait itself, but also because you will often be forced to keep your computer running throughout the night, or even multiple nights in a row. Hopefully, your computer doesn’t randomly restart due to an update! …
Several months ago, I published an article which covered 3 simple ways to improve your RMarkdown. In this sequel article, I want to cover 3 more ways to improve your RMarkdown which deal with slightly more advanced techniques.
In the previous article, I covered using built-in themes. These are really easy to use, and require a single line of code in the YAML of the RMarkdown. However, what if you want to modify a specific part of the theme, such as the colors or font for a specific section? Here’s how you can do that:
Last month I talked about the multiple ways of sharing RMarkdown analyses with your target audience. When you’re sharing your RMarkdown analysis, the most important aspect is the content itself, but the formatting and aesthetics of the analysis are not far behind.
In this article, I am going to show you three simple things that can turn your analysis from a bland, flat file into a well-organized, interactive document that people will want to read.
The YAML section in an RMarkdown file is the very first section of code you see:
The above YAML…
When I first learned how to use RMarkdown, I was in absolute awe. Moving from copy/pasting SPSS output into Word documents to automatically generating a rich analysis report full of text and visualizations at the press of a button was game-changing. RMarkdown files can also be rendered into a variety of formats, including not only HTML but also PDFs, Word documents, and others. While sharing an output file is easy and works great for an analysis that is conducted once, what if there are frequent updates that you want to share with clients, colleagues, or others?
In the case…
A couple of weeks ago, a collaboration of various research groups released a huge dataset of literature which can be used to gain insights into the current COVID-19 pandemic. Additionally, Johns Hopkins maintains a dataset, updated daily, that tracks confirmed cases, fatalities, and recoveries. The data community was called on to help join the fight against the virus.
Two competitions were created on Kaggle, and many people started to share their take on working with the datasets and the different things that they have uncovered. We at CompassRed have been fascinated with a lot of the work shared. …
The topic of whether to choose R or Python for data science work has been debated ad nauseam. There are already countless articles discussing the pros and cons of both, so I won’t go into too much detail in this post. While R and Python users still debate which one to use, many agree that R is superior for data cleaning, exploration, and conducting general statistical analyses, while Python is more capable when it comes to developing and implementing models.
Given these main pros of R and Python, is there a way where we can utilize…
Thanksgiving is almost upon us, and that means lots of cooking and looking up new recipes!
As someone who frequently visits food blogs, I sometimes get a bit overwhelmed by the sheer volume of recipes, and especially the number of ingredients that they require. A question that I started thinking about recently is: can I determine which ingredients are used in most recipes, so that I can cook a variety of meals with as few ingredients as possible?
One way we could answer that question is by conducting Market-Basket Analysis. This type of analysis is frequently used in marketing to…
Since October is Breast Cancer Awareness Month, I wanted to do some type of analysis using a breast cancer dataset. There is a popular dataset available on Kaggle and the UCI Machine Learning Repository that contains diagnosis data, which looked like a good candidate for an interesting analysis.
The data used in the analysis were originally provided by the University of Wisconsin. The variables are numerical; they were obtained from digitized images of a fine needle aspirate of a breast mass and describe characteristics of the cell nuclei present in the image.
Each observation is a nucleus which is described by…
In this day and age, we have so many choices when it comes to making online purchases, watching a TV show or movie using a video streaming service, or finding songs to listen to online. Given the plethora of choices, companies need some way of curating content for customers, since browsing through millions of products on a website is just not feasible.
Enter recommender systems. If you have ever shopped on Amazon, or watched a TV show on Netflix, you have encountered recommender systems at work. …
Face detection/recognition has been a growing field for the last few years. With advancements in computer vision, companies have been able to leverage powerful algorithms to help solve a variety of problems. However, there have been debates about the balance between the benefit to society and the cost to people’s privacy.
Here are a few ways that organizations are using face detection.
One of the most controversial uses of face detection technology right now is in law enforcement. Some people might be worried, given the plethora of futuristic movies depicting an all-seeing government watching and tracking every person’s movements.