What’s Your Best Feature?

Todd DiFronzo
13 min read · Apr 30, 2021

You are part of a team tasked with creating a new feature for an application that helps users find the perfect city to settle down. The app already has several features but the team isn’t satisfied. They want more. That’s right…more.

And they are all looking at YOU!

So what do you do? You could snap a selfie, submit it to the team and hope that will suffice. But make sure it is your good side! However, unless you are Chris Hemsworth or Scarlett Johansson, chances are that just isn’t going to cut it.

But no worries, you came to the right place. I am going to discuss on a high level how I added a new feature to an already existing application, from feature ideation to feature completion, using a project I just recently worked on called CitySpire.

CitySpire is an application that aims to be a one-stop resource for users to receive the most accurate city information. It analyzes several types of data, such as rental rates, crime rates, populations and more, to help users decide where the perfect place would be to put down roots.

But to make this application more robust, the team has asked me to add a Rental Forecast feature. So where do I start?

Step 1: Discuss

https://www.interview-skills.co.uk/blog/how-to-stand-out-group-tasks-discussions

The first step should always be to find out what it is you need to do. But in order to do this, you need to communicate with all pertinent parties. This seems like a pretty obvious step, but you would be surprised how often this is exactly where a project fails.

Encourage as many meetings or interactions with the team and stakeholders as is practical and possible. If you don’t know what they want, then how can you provide what they need?

  • Meet, meet…meet!
  • Listen first
  • Ask questions next
  • Summarize what is wanted
  • Provide follow-up questions
  • Review the task to ensure you are all on the same page
  • Remember: “Tell me what you want so I know what you need”

My first steps in building the Rental Forecast feature were to attend the following types of meetings:

  • Product Document meeting with all parties
  • Stakeholders meeting
  • All-teams meeting consisting of iOS, Web and Data Science teams
  • Data Science team meetings

The key here is effective communication. Without it, the project will be doomed from the start.

Step 2: Research

https://www.discoverphds.com/blog/what-is-research-purpose-of-research

The second step for adding any feature to an already existing application would be to first do your due diligence, aka…research.

  • Review and play with the application to see how it functions
  • Go through each feature and see how it works
  • Consider how the new feature will fit in with what is already there
  • View the code and get a feel on how all the parts fit together
  • What’s the overall look and feel of the application before and after you add your new feature
  • View the datasets already being used

The list goes on but don’t leave any stones unturned because what you learn here will determine your approach and success for the project.

I took the time to thoroughly review the product documentation, the current application, the datasets, the code, the platform and more. The more I understood, the easier it became for my next step…to plan.

Step 3: Plan

https://www.blueoceanglobalwealth.com/technology-planning-process-program.php

You now know what is wanted but to know what you need to do, you need to…Plan!

Going in without a plan is like going to your job in your birthday suit. It isn’t going to work and there is a high probability no one is going to appreciate your contributions. Below are some of the items that I planned prior to diving into my task. Note that this isn’t all inclusive but just some of the items that went into my planning.

Data:

  • What type of data do I need and how do I plan to use it?
  • How and where will I acquire the data?
  • How will I prepare and clean that data?

Feature Engineering

  • What dataset features will I be looking for?
  • Do I have the right dataset features to reach my goal?
  • Do I want to add, merge or remove dataset features?

Modeling

  • What type of model will meet my needs? Linear/logarithmic regression, decision trees, KNN, etc.
  • Will I use outside modeling libraries like Prophet or start from scratch?
  • Will I use multiple models to get the best answer?

Testing

  • How will I test my feature?
  • Will these tests be adequate or do I need more?
  • What approaches will I have to fix my errors?

Creating the Endpoint

  • What API framework will I use to build the endpoint?
  • How will I test the endpoint?
  • What information will the web team need for me to hand over this feature?

Step 4: Acquire the Data

https://hirinfotech.com/what-is-web-scraping/

You’ve done your groundwork and now it’s time to get your hands dirty. Sometimes acquiring data is as simple as A-B-C. Other times it is a job in and of itself. There are two main ways to get data: download an existing dataset, or scrape it from the web.

In a perfect world, one could simply download the dataset of choice, clean and engineer it, model it and poof…you are done. But I’m not sure what world that is. Nothing worthwhile comes easy and getting the data you want is one example.

For those not lucky enough to find the holy grail of a dataset, scraping the web is the next best thing.

“Web scraping is a highly effective method to extract data from websites (depending on the website’s regulations)”, definition courtesy of Analytics Vidhya.

Scraping the web can be an arduous process, in that whatever data is returned is most likely going to be one big hot mess. But do not fret, my ambitious adventurer, as there are many helpful tutorials out in that vast internet universe for you to refer to. Simply open your web browser and search for web scraping tutorials. Tutorialspoint is a good place to start.

Lucky for me, web scraping was not needed. What I found was not set and ready to go, but it was data that I could mold into what I needed. I started off by accessing several sites that I thought would be great resources for past rental data. I ultimately narrowed it down to Zillow and HUD but soon settled on HUD. They had exactly what I needed. The biggest hiccup here was that the information I needed was not in one dataset but several. Additionally, some minor data cleaning and some feature engineering were going to be needed.
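To make the idea a bit more concrete, here is a minimal sketch of what scraping boils down to: pulling structured values out of raw HTML. It uses only Python’s standard library and parses an inline HTML string instead of a live website, so nothing is actually downloaded. The class name, the sample table, the city names and the rent values are all made up for illustration; a real site will have its own markup and its own terms of use to respect.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>...</td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # header rows use <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Rent</th></tr>
  <tr><td>Springfield</td><td>$1,000</td></tr>
  <tr><td>Shelbyville</td><td>$1,200</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE_HTML)

# Turn the scraped strings into numbers: the "hot mess" cleanup step.
rents = {
    city: int(rent.replace("$", "").replace(",", ""))
    for city, rent in parser.rows
}
print(rents)
```

Even in this tiny example, most of the work is cleanup: the values arrive as strings with dollar signs and commas and have to be converted before they are useful.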

Step 5: Clean the Data

https://towardsdatascience.com/data-cleaning-with-python-and-pandas-detecting-missing-values-3e9c6ebcf78b

Come on. Admit it. You love that moment when you type ‘dataFrame.head()’ into the terminal and see that glorious and magnificent data pop up right before your eyes. It is a sight to behold. Unless of course it… isn’t.

Although dataset cleaning can be challenging, it can also be fun…in a weird way. Don’t judge me.

But more often than not, you will be faced with some not-so-fun questions. How much data will I lose when I remove Nulls or duplicate values? How do I take care of any missing values? These, and more questions, you will undoubtedly face on your data cleaning journey. Below are some items to look for when cleaning your dataset:

  • How many missing or null values do I have and how many can I remove without destroying the integrity of the data?
  • How will I replace those missing or null values?
  • How many duplicate values are there?
  • Do I need to convert data types, such as time to datetime?
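The checks in the list above can be sketched in a few lines of pandas. This is only an illustration on a tiny made-up table, not the HUD data: the column names and values are assumptions, and filling the gap with the median is just one of several reasonable choices.

```python
import pandas as pd

# Tiny made-up table: one duplicate row, one missing rent, years as text.
df = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "rent": [1000.0, 1000.0, None, 1500.0],
    "year": ["2019", "2019", "2020", "2021"],
})

# How many missing values per column, and how many duplicated rows?
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Drop exact duplicates, then fill the missing rent with the median
# (one option among many; dropping the row is another).
cleaned = df.drop_duplicates().copy()
cleaned["rent"] = cleaned["rent"].fillna(cleaned["rent"].median())

# Convert the year strings to real datetimes.
cleaned["year"] = pd.to_datetime(cleaned["year"], format="%Y")

print(missing_per_column)
print(duplicate_rows)
print(cleaned)
```

Counting before changing anything is the important habit here: it tells you how much data a given cleaning choice will cost you.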

I used seven datasets in total from HUD for the project, and the planets must have been in alignment that day because there were almost no missing values or duplicates. In fact, there weren’t any big obstacles to overcome in this situation. I simply prepared all the datasets so the columns were comparable and merged them. With one big dataset now in hand, I turned my attention to engineering the features I wanted.

Step 6: Feature engineering

https://blog.dominodatalab.com/feature-engineering-framework-techniques/

An integral part of working with any dataset is to engineer features, or create new features out of your available data. You can combine columns, add to columns, create new columns and more. The sky is the limit and you are the pilot.

In my case, there was no time feature. Each dataset was labeled by its year but had no column to represent it. So I engineered a data column for each year, representing each merged dataset along with its historic Fair Market Rental prices. I now had a complete dataset, flush with all the columns and data I wanted. Next up…Modeling.
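As a rough sketch of the idea, here is one way to give yearly files a time feature before stacking them. It assumes a long layout with a single ‘year’ column, which may differ from the layout I actually used, and the frames, column names and values below are made-up stand-ins for the HUD files.

```python
import pandas as pd

# Hypothetical stand-ins for per-year files; the year lives only in the
# dictionary key (think: the file name), not in the data itself.
yearly = {
    2019: pd.DataFrame({"city": ["A", "B"], "fmr_1br": [900, 1200]}),
    2020: pd.DataFrame({"city": ["A", "B"], "fmr_1br": [950, 1250]}),
}

# Engineer the missing time feature: tag every row with its year...
frames = [frame.assign(year=year) for year, frame in yearly.items()]

# ...then stack the yearly frames into one dataset.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With the year stored as a column, the combined table can be filtered, grouped or handed to a time series model without looking back at file names.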

Step 7: Modeling

https://pythondata.com/forecasting-time-series-data-with-prophet-part-1/

Modeling is putting your data to work. Because this topic can span many book volumes, I won’t go into much detail here other than to say there are a large number of different models you can use depending on what it is you are trying to do. Some types available to you are:

  • Linear Regression
  • Logarithmic Regression
  • Decision Trees
  • Clustering
  • KNN
  • and a heck of a lot more…

But there are some outside sources you can use too, such as Prophet. Windows installation can be a bit challenging, which it was in my case. But once you have it up and running, it is a fantastic library. The only other caveat is that you will need to have your data in the format needed by Prophet. Once you do that, then off you go. Refer to Prophet’s documentation.
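For reference, the format Prophet expects is a two-column table: ‘ds’ for the dates and ‘y’ for the values to forecast. Below is a minimal sketch of that reshaping step plus a typical fit-and-predict call. The helper names, column names and rent values are assumptions for illustration and not my project code. The Prophet import sits inside the function so the reshaping part runs even if Prophet is not installed (older releases were published under the package name ‘fbprophet’).

```python
import pandas as pd


def to_prophet_frame(df, year_col, value_col):
    """Reshape a table into the two columns Prophet requires: ds and y."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(df[year_col].astype(str), format="%Y"),
        "y": df[value_col].astype(float),
    })
    return out.sort_values("ds").reset_index(drop=True)


def forecast_rent(history, years_ahead=5):
    """Fit Prophet on a ds/y frame and return yearly predictions."""
    from prophet import Prophet  # imported lazily; requires Prophet installed

    model = Prophet()
    model.fit(history)
    # "YS" asks for one future row per year start.
    future = model.make_future_dataframe(periods=years_ahead, freq="YS")
    forecast = model.predict(future)
    # yhat is the prediction; yhat_lower / yhat_upper bound the interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up yearly rents, only to show the reshaping.
rents = pd.DataFrame({"year": [2019, 2020, 2021], "fmr_1br": [900, 950, 1000]})
history = to_prophet_frame(rents, "year", "fmr_1br")
print(history)
```

Calling ‘forecast_rent(history)’ would then return the predicted, lower and upper values per year, which is the kind of output a forecast graphic is built from.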

Below is a rental forecast visualization for a 1 bedroom rental in New York, New York from my Rental Price Forecast feature. Prophet returned a forecasted upper, lower and average price for the rental by year.

Of course you can tweak and pimp out your graphic in several different ways. The point, though, is to make sure that the user can understand what is being displayed. I think in this case it is better to keep it simple so the information has more impact on the viewer. In my case, I believe that was accomplished, as just about anyone viewing the above graphic will immediately understand it.

Step 8: Create an endpoint

Endpoints are basically how two different systems interact with each other. As defined by SMARTBEAR AlertSite, “When an API interacts with another system, the touchpoints of this communication are considered endpoints.” Each endpoint would be one end of the communication channel.

For instance, an API (Application Programming Interface) communicating with a web application works by using requests and responses. There are different web frameworks you can use to build these endpoints, such as FastAPI or Flask.

For CitySpire, I used FastAPI to build the endpoint. Below is the code snippet from the work I did:

In the first graphic, the endpoint is denoted by: @router.post('/api/rental_forecast_graph'). This is the line of code that the Web team will need.

The second graphic is code that was created in order to do the forecasting in Prophet. The code consists of getting the data into the form that Prophet needs in order to produce the predictions.

Step 9: Testing

https://www.zeolearn.com/magazine/what-is-code-testing-and-why-is-it-important

Testing, testing…testing. It should go without saying that you need to test not only your code and model’s results, but also the endpoints.

Python’s built-in ‘unittest’ module is a great place to start, but there are also many third-party test frameworks such as ‘pytest’. Whatever you plan to go with, just make sure that when it comes time to present your work, everything is in working condition and ready for the next step.
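Here is a small sketch of what a ‘unittest’ test can look like. The ‘validate_bedrooms’ helper is hypothetical and exists only to give the tests something to check; the 0 to 3 range matches the bedroom sizes my feature supports.

```python
import unittest


def validate_bedrooms(bedrooms):
    """Hypothetical helper: accept only the supported sizes, 0 to 3."""
    is_int = isinstance(bedrooms, int) and not isinstance(bedrooms, bool)
    if not is_int or not 0 <= bedrooms <= 3:
        raise ValueError(f"unsupported bedroom count: {bedrooms!r}")
    return bedrooms


class ValidateBedroomsTest(unittest.TestCase):
    def test_accepts_supported_sizes(self):
        for n in range(4):
            self.assertEqual(validate_bedrooms(n), n)

    def test_rejects_unsupported_size(self):
        with self.assertRaises(ValueError):
            validate_bedrooms(4)


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateBedroomsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run from the command line with ‘python -m unittest’ or ‘pytest’.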

Step 10: Provide your endpoint to your Web team

https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c

It is finally time to present your work to the Web team. You provide your team with your completed forecasting model and endpoints. The tensions are high. Sweat beads are slowly, but methodically, inching down your face. As you scan everyone’s face you notice that it appears every single person’s right eyebrow is lifted, ever so subtly. But not to fear, you did the research, the planning, the hard work…the whole enchilada.

You do a quick demo wowing all those involved. ‘Minds are blown!’ The excitement in the room overflows…okay, okay…so maybe it doesn’t go down that dramatically. In fact, more likely it will be quite the opposite. The team will be expecting you to be a professional and capable of providing what was promised. And that’s what you gave them. That’s what I gave my team. Time to take a deep breath. Victory is yours. Victory is mine. Victory is ours.

Project Reflections and summary:

http://clipart-library.com/clipart/1402237.htm

CitySpire background reflection:

My task was clear: Create a feature that would predict rental price listings for each city in our database. One of my immediate main concerns with jumping fresh into an already existing application was picking up where others left off. How would I be able to balance injecting a new feature into an already working application without throwing off the balance or the message? What obstacles would I face?

CitySpire Technical Challenges reflection:

Being on a team is a wondrous thing because you know that when a problem or challenge is faced, you face it together. Now I’d use the Titanic as an analogy here but those that hopped onto the lifeboats weren’t solving Titanic’s sinking problem…they were leaving it for those stuck on board. Perhaps in this case, running away from the challenge wasn’t a bad idea. But in most situations, whatever problems a team member faces, the whole team faces.

My contributions to the team may have looked like I had one foot in the lifeboat and another on board, but they ultimately kept the ship afloat. While others went around masterfully plugging holes in the ship, I was busy obtaining the needed datasets, cleaning and preparing them, modeling them and creating the endpoint that the Web team would need.

One specific challenge was deciding on the best way to show our rental price forecast. What would take a rental price forecast and give some needed context to the user instead of just some number with a dollar sign in front of it? This problem was overcome by using a great-looking and informative visual, of course.

Using Prophet I was able to generate a simple but powerful graphic putting the returned rental price forecast into context. Below is the code:

The above produced a simple, to-the-point graphic giving the user the information they need, all in context. Of course you can polish up your outputs, and all of that is covered in Prophet’s documentation.

My feature allows a user to return Rental price forecasts for 0, 1, 2 and 3 bedrooms. Below is the output of the above code forecasted for 2 bedrooms:

Results and Future implications reflection:

The CitySpire application, although not fully deployed yet, contains a rich mixture of great content. Options presently available:

  • City names
  • City coordinates
  • Crime statistics
  • Current Weather
  • Job opportunities
  • Rental prices
  • Pollution information
  • Walkability score
  • Bike score
  • Population
  • School listings with rankings
  • Recommendations
  • Demographics visualization
  • Employment visualization
  • Crime visualization
  • Air visualization
  • Population forecast visualization
  • Rental Price forecast visualization

Below is our CitySpire API:

The CitySpire app as it stands is a very useful tool, but it can get better. Adding Sports features would be ideal.

How about adding features that cover topics such as:

  • Are there any pro league teams such as MLB, NBA, NHL or NFL teams?
  • Are there any minor league teams such as AAA, AA or A for MLB?
  • What about adding recreational sports?
  • What about age-related sports and recreation for different age and gender groups?

One challenge future developers will face is how uneven the available information is: for the above suggestions there would be an enormous amount for some cities and very little for others. A balance would have to be found when building these new features, given the varying type and amount of data available for each city.

With the help of my team and some well-received feedback, I was able to overcome some of the challenges that I faced, such as data overload and building an endpoint that worked.

Persistence and a desire to learn were among the excellent pieces of feedback I received. But the biggest one I received, the one that will help me further my career and, I hope, yours:

“Tell me what you want so I can give you what you need”

So when the whole team looks in your direction, be ready, communicate, research, plan, and then do what you do best: transform data into something useful…like a feature. Because we all know that once you are done they are going to want…more. And they are all going to be looking at You!

Todd DiFronzo

Data Scientist in the works. Excited about transforming raw data into real world solutions