Reverse engineering an Android Application
Hello, dear reader of my epic.blog!
On this lovely and steaming hot July day, I wish to take you on a reverse engineering trip across the mountains of Android and the valleys of decompilers. Pour yourself a cold drink and enjoy the voyage.
So, I wanted to demonstrate how to reverse engineer an Android application and which tools you can use to achieve this, even without owning an Android phone.
The app used for this demonstration is called Krk Bike. It is a mobile application that you can download from the Google Play Store, and it shows you many bike trails on the Croatian island of Krk. I wanted to see all of these bike trails on a single, uncluttered map. How could we get such data out of this app?
When you open up this app on your phone, it looks something like this:
The app itself has detailed trails with paths and pictures plus contacts, and it even comes with a small “navigation module” that guides you along the track while you are on the route. Pretty decent and slick app. Obviously, dear reader, you’ll choose your own target at some point, but this one is interesting to me. Either way, a good first step is to get as familiar with the app as possible.
How are Android apps built?
Well. First, you need to understand how Android apps are built and distributed. Usually, Android developers develop their apps with the help of Android Studio, compile them, sign them and upload the “APKs” to the Google Play Store. A more detailed, beginner-friendly explanation of building Android apps and getting them to end users can be found on the Android documentation page (Create an Android project).
Hold on, mister! ✋ How are apps compiled, and how do they even run?
Since you asked, here it is in a nutshell. Android apps are usually compiled in a few stages. In the first stage, depending on the language, your source code is compiled by either the Java compiler or the Kotlin compiler into Java bytecode. These compilers spit out *.class files. The *.class files are then fed into the DEX compiler (d8, the successor of the older DX tool). The DEX compiler spits out DEX bytecode that runs on Android devices, and it also lets you use Java 8 language features in your app’s code. This DEX bytecode is what the Dalvik Virtual Machine (DVM for short) actually runs.
Now, if you know anything about Android, you’re likely to say: “Sorry, Oto! Old news! DVM has been deprecated since KitKat.” And you would be right! Dalvik has been replaced with something much more magical, called Android Runtime (ART for short). But a few core concepts are the same, just better.
The successor of Dalvik is Android Runtime (ART), which uses the same bytecode and .dex files (but not .odex files), with the succession aiming at performance improvements transparent to the end users. The new runtime environment was included for the first time in Android 4.4 “KitKat” as a technology preview, and replaced Dalvik entirely in later versions; Android 5.0 “Lollipop” is the first version in which ART is the only included runtime.
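Both runtimes consume the same .dex format, so a DEX file is easy to recognise on disk: according to the DEX format documentation it starts with an 8-byte magic value, the ASCII text "dex\n", a three-digit format version such as 035, and a NUL byte. A minimal sketch (DexMagic and dexVersion are hypothetical helper names, not part of any Android tool) that checks this header, for example on a classes.dex pulled out of an APK:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Sketch: recognise a DEX file by its 8-byte magic header:
// the ASCII bytes "dex\n", a three-digit format version, then a NUL byte.
final class DexMagic {
    private static final byte[] PREFIX = {'d', 'e', 'x', '\n'};

    /** Returns the DEX format version (e.g. "035"), or null if the bytes are not a DEX header. */
    static String dexVersion(byte[] header) {
        if (header == null || header.length < 8) {
            return null;
        }
        boolean prefixMatches = Arrays.equals(Arrays.copyOfRange(header, 0, 4), PREFIX);
        if (!prefixMatches || header[7] != 0) {
            return null;
        }
        return new String(header, 4, 3, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DexMagic.java classes.dex
        // Reads the whole file for simplicity; only the first 8 bytes matter.
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        System.out.println("DEX version: " + dexVersion(bytes));
    }
}
```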
OK, so the next step in the build process for any Android app is to package it into something called an APK. An Android Package Kit (APK for short) is the package file format used by the Android operating system for the distribution and installation of mobile apps. Think of it as a more sophisticated Java *.jar or Debian *.deb package with some extra meta-information attached.
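Because an APK is, under the hood, an ordinary ZIP archive, you can peek inside one without any Android tooling. Typical entries are AndroidManifest.xml (compiled to a binary XML format), one or more classes*.dex files, resources.arsc, the res/ directory and META-INF/. A small sketch using java.util.zip (ApkEntries is a hypothetical name) that lists them:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: an APK is a ZIP archive, so java.util.zip can list what is inside.
final class ApkEntries {
    /** Returns the entry names of the archive at apkPath, sorted alphabetically. */
    static List<String> list(String apkPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ApkEntries.java hr.molekula.bikekrk.apk
        list(args[0]).forEach(System.out::println);
    }
}
```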
After developers successfully build these APKs, they “push” them to devices or to the Google Play Store. During development, that’s usually done via adb on the command line (or Android Studio does something similar for you in the background). Every APK has to be signed before Android will install it: during development Android Studio signs builds with a debug key automatically, while for a production release developers sign the APK with their own release key, either from Android Studio or directly with apksigner via the CLI. The Android system uses the certificate as a means of identifying the author of an application and establishing trust relationships between applications.
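As a side note on signing: the original v1 scheme stores the signature as JAR-style files in META-INF/ (a *.SF file plus an *.RSA, *.DSA or *.EC signature block), while the v2 and later schemes, introduced with Android 7.0, store it in an “APK Signing Block” that is not a ZIP entry at all. The sketch below (V1SignatureCheck is a hypothetical name) only detects the presence of v1 files; it verifies nothing, and an APK signed only with v2 or later reports false. For real verification, apksigner verify is the tool to use.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: detects the *presence* of legacy v1 (JAR-style) signature files.
// It does NOT verify anything, and it cannot see v2/v3 signatures, which live
// in the APK Signing Block rather than in ZIP entries. Use apksigner to verify.
final class V1SignatureCheck {
    static boolean hasV1SignatureFiles(String apkPath) throws IOException {
        boolean hasSignatureFile = false;  // META-INF/*.SF
        boolean hasSignatureBlock = false; // META-INF/*.RSA, *.DSA or *.EC
        try (ZipFile zip = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName().toUpperCase(Locale.ROOT);
                if (!name.startsWith("META-INF/")) {
                    continue;
                }
                if (name.endsWith(".SF")) {
                    hasSignatureFile = true;
                } else if (name.endsWith(".RSA") || name.endsWith(".DSA") || name.endsWith(".EC")) {
                    hasSignatureBlock = true;
                }
            }
        }
        return hasSignatureFile && hasSignatureBlock;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java V1SignatureCheck.java hr.molekula.bikekrk.apk
        System.out.println("v1 signature files present: " + hasV1SignatureFiles(args[0]));
    }
}
```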
OK, now that you know how apps are built, packaged and pushed to the store…
How can I get the APK of an app from the Google Play Store?
There are multiple ways to get the APK of any given Android app. The most trivial is to use one of the many “mirror” sites that collect these APKs. Sometimes they automate the process and sometimes, as I found out via my uber Googling skills, they just have people fetch popular APKs manually. To name a few: apkpure.com, Evozi APK Downloader and apkmirror.com. These sites have a few disadvantages. The foremost is that not “all” APKs from the Google Play Store are available there, and that, since these are “mirror” sites, there is no way of assuring that the APKs (and the apps they belong to) are untampered. Although hard, it is possible to inject malicious code and spread it via these or similar sites.
… but they don’t have my app there!
Oh, yeah. In that case, you can do what I did:
- Install Oracle VirtualBox on your machine.
- Get an Android-x86 OS image from OSBoxes.org (follow this link).
- Boot up the Android image and then go to the Google Play Store,…
- … then install the app you want to look into.
- After the app installs, download an app called MyAppSharer and use it to export the target app’s APK. I stored the APK of the targeted app in /storage/emulated/0/
- At this point, you can either install the Android tools on your machine and use adb to pull the APK to the host machine with the following command, or you can creatively just email yourself the APK from the MyAppSharer app in the emulator. (lol)
adb pull /storage/emulated/0/hr.molekula.bikekrk.apk
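If you end up pulling APKs repeatedly, the adb call is easy to script. A sketch that runs the same adb pull through Java’s ProcessBuilder, assuming adb is on your PATH and the VirtualBox VM already shows up in adb devices (AdbPull and pullCommand are hypothetical names):

```java
import java.io.IOException;
import java.util.List;

// Sketch: run "adb pull" from Java, e.g. to script repeated pulls.
// Assumes adb is on the PATH and the emulator is already visible to "adb devices".
final class AdbPull {
    /** Builds the argument list for: adb pull <remotePath> <localDir> */
    static List<String> pullCommand(String remotePath, String localDir) {
        return List.of("adb", "pull", remotePath, localDir);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> command = pullCommand("/storage/emulated/0/hr.molekula.bikekrk.apk", ".");
        // inheritIO() forwards adb's own progress output and errors to this console.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            System.err.println("adb pull failed with exit code " + exitCode);
            System.exit(exitCode);
        }
    }
}
```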
At this point, I wish to clarify a few things. First, although it works for this example, Android-x86 is an open-source effort to port Android to the x86 architecture, an experiment that has since extended beyond the primary x86 platform. Please make sure you visit their site and buy them a coffee if you like the project. Second, there are also other ways to pull an APK from an Android phone or emulator, but I found all of them to be hard or unreliable, which is why I’m sharing this app here.
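Coming back to the mirror sites: one cheap sanity check is to compare the SHA-256 digest of a mirrored APK with the digest of a copy you pulled yourself as described above. This only means something if the reference digest comes from a source you trust, and two honest APKs can still differ (for example, a different app version), so a mismatch is a reason to dig deeper, not proof of tampering. A sketch using MessageDigest (ApkHash is a hypothetical name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: compute the SHA-256 digest of an APK so two copies can be compared.
final class ApkHash {
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ApkHash.java mirror-copy.apk my-own-copy.apk
        for (String file : args) {
            try (InputStream in = Files.newInputStream(Path.of(file))) {
                System.out.println(sha256Hex(in) + "  " + file);
            }
        }
    }
}
```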
I have the APK! ✅ Now what? 🤔
After you obtain the APK of the app you wish to peek into, the next step is to decompile the *.apk into something useful, i.e. *.java source code. Decompiling an Android app is usually a two-step process. First, you unpack the raw APK back into “files”; then the Dalvik bytecode (which apktool disassembles into *.smali files) needs to be converted back into Java classes. Note that compilation usually strips some information, such as names, and introduces optimisations, so by no means expect the decompiled code to match the original source code. But in general, it is good enough that an expert will be able to figure out what is going on.
In order to do that, there are several open-source tools available. For this experiment, I used the ones below. Some of them are pretty basic and very Android specific; others, like NSA’s Ghidra, are more general-purpose reverse engineering tools.
Apktool
This reverse engineering tool is probably the most sophisticated and most Android-specific one on this list, and it can decode resources nearly to their original form, including XMLs, images and other assets.
java -jar apktool_2.4.1.jar d hr.molekula.bikekrk.apk
If you run this tool, it will spit out the tree structure of the original app, which looks something like this. Please note that it does not convert *.smali files back to Java source code, but it does an incredible job with the other resources.
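Since apktool’s output is just a directory tree of text files (smali, XML and friends), it can be searched without any GUI, which is exactly what I do later in this post when looking for “http” and “api_key”. grep -rn does the job; for completeness, here is a small Java sketch of the same idea (TreeSearch is a hypothetical name):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of a recursive "grep -rn": reports "relative/path:line: text" for every
// line under root that contains the given literal string.
final class TreeSearch {
    static List<String> search(Path root, String needle) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a char, so binary files (images etc.)
            // cannot cause a decoding error while searching for ASCII strings.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(needle)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + ": " + lines.get(i).strip());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TreeSearch.java ./hr.molekula.bikekrk api_key
        search(Path.of(args[0]), args[1]).forEach(System.out::println);
    }
}
```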
Java Decompiler
This decompiler can be used on practically any Java application. It comes in three flavours: JD-GUI, the variant with a user interface; JD-Eclipse, a plugin for the Eclipse IDE; and JD-Core, a library.
The small gotcha with Java Decompiler is that you need to convert your APK to a JAR before it can decompile and inspect it. To do that, you need a tool called dex2jar that will construct the JAR for you.
./dex2jar-2.0/d2j-dex2jar.sh -f hr.molekula.bikekrk.apk # for DEX -> JAR
java -jar jd-gui-1.6.6.jar # to bootup Java Decompiler GUI
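Once dex2jar has produced a JAR, it helps to separate the app’s own classes from bundled libraries before opening anything in JD-GUI. A sketch with java.util.jar (JarClassNames is a hypothetical name; the JAR file name in the usage line is an assumption about dex2jar’s output naming):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch: list the classes inside a JAR that belong to a given package.
final class JarClassNames {
    /** Converts an entry such as "a/b/C.class" to "a.b.C"; returns null for non-class entries. */
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        return entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
    }

    /** Returns the sorted class names under packageName (including sub-packages). */
    static List<String> classesInPackage(String jarPath, String packageName) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String className = toClassName(entries.nextElement().getName());
                if (className != null && className.startsWith(packageName + ".")) {
                    result.add(className);
                }
            }
        }
        result.sort(null); // natural (alphabetical) order
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage (file name assumed): java JarClassNames.java hr.molekula.bikekrk-dex2jar.jar hr.molekula.bikekrk
        classesInPackage(args[0], args[1]).forEach(System.out::println);
    }
}
```

Inner classes keep the $ in their names (for example R$string), exactly as they are stored in the JAR.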
JADX
This tool is very likely the most “for Android” of the tools I found and managed to use. It can be used with a GUI or purely from the CLI. It can also run directly on an APK or DEX file, without the extra conversion step that Java Decompiler needs. It also has a magical ability to deal with deobfuscation and, from what I can tell, the best full-text search, declaration jumping and usage lookup. When run, it looks like this:
Ghidra
Although I originally explored tools that are very Android or Java specific, I also got a recommendation from a friend to look into Ghidra, a more general-purpose software reverse engineering (SRE) suite of tools developed by the NSA’s Research Directorate in support of its Cybersecurity mission.
Initially it feels a bit clunky, and although it should work directly on APKs, I had to convert the APK to a JAR with dex2jar (the same way as I did for Java Decompiler) before I could use it. I later found that loading classes.dex via the “file system” import also works.
Ghidra is definitely next level when it comes to reverse engineering, and the toolbox seems to have everything, and even more than a human might need for a simple experiment like the one I’m demonstrating here.
javadecompilers.com
In some cases you might not even need to install any of these tools and frameworks; you can just use an online solution like javadecompilers.com. Although a bit cumbersome, it can still be used to look into simple Java / Android apps.
I have the source. ✅ Now what?! 💡
After you’ve successfully obtained the source code through decompilation, the next step is probably to limit the scope of your investigation and start reading the code. In my case I was looking for two things. First, how the app connects to the web service that exposes the “tracks”, and where the coordinates of the tracks themselves come from. And second, what security measures are used on the server and client side to bridge the communication gap.
For my case, the easiest thing was to search through the whole codebase for the strings “http” and “https” and start from there. With the help of JADX I quickly found the following sweet breadcrumb:
That led me to this lovely class named Url:
package hr.molekula.bikekrk.net;
public class Url {
    private static String base = "http://krk.molekula.net/";
    private static String jsonBase = (base + "api/v1/");
    // few lines omitted here
    public static String tracks = (jsonBase + "tracks");
    // and a few here,...
}
So, with simple logical deduction, we can conclude that the tracks are fetched by an HTTP request to the following path:
GET http://krk.molekula.net/api/v1/tracks
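The deduction is easy to double-check by replaying the concatenation from the decompiled Url class; only the fields shown above are reproduced, and UrlReplay is a hypothetical name:

```java
// Sketch: replays the string concatenation from the decompiled Url class
// (only the fields shown in the post; the omitted ones are not reproduced).
final class UrlReplay {
    private static final String BASE = "http://krk.molekula.net/";
    private static final String JSON_BASE = BASE + "api/v1/";
    static final String TRACKS = JSON_BASE + "tracks";

    public static void main(String[] args) {
        // prints: GET http://krk.molekula.net/api/v1/tracks
        System.out.println("GET " + TRACKS);
    }
}
```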
Quick test with HTTPie,… 🍰
http --follow get http://krk.molekula.net/api/v1/tracks
Spits out a few more breadcrumbs… but no luck! Looks like we need the second piece: an API key?
HTTP/1.1 401 Unauthorized
Content-Type: application/json
# few headers omitted.
expires: -1
pragma: no-cache
{
    "error": "Unauthorized. Wrong or missing API key."
}
At this point it might be a good exercise to look deeper into how the app itself builds HTTP requests before they are dispatched to the service. Let’s see which classes use hr.molekula.bikekrk.net.Url and what they do with it, and follow the trail from there… which quickly leads us to a wrapper, BikeClient, that looks like this:
package hr.molekula.bikekrk.net;
// ... few imports
public class BikeClient {
    private final AsyncHttpClient client = new AsyncHttpClient();
    public BikeClient(String lang, String apiKey) {
        this.client.addHeader("Language", lang);
        this.client.addHeader("API-Key", apiKey);
    }
    public void getJsonContent(String url, JsonHttpResponseHandler responseHandler) {
        this.client.get(url, new RequestParams(), responseHandler);
    }
}
Meaning that before the request is built and dispatched, it also gets two additional HTTP headers: Language and API-Key. Nice find. Now, where is this API-Key stored, and what is the Language? Again, a few clicks in JADX’s usage finder point us to the class that, in most cases, sets the language to whatever the user’s device uses. For demonstration purposes we’ll just default it to “en”, so now we only need api_key. With text search we can now look around, and we find this:
package hr.molekula.bikekrk;
public final class R {
    // A lot of stuff omitted ...
    public static final class string {
        public static final int abc_action_bar_home_description = 2131230720;
        public static final int action_settings = 2131230768;
        public static final int api_key = 2131230840; // <- this, maybe?
        public static final int app_name = 2131230841;
        public static final int app_name_short = 2131230842;
    }
    // ...
}
The R class in Android projects is a special class. R.java is generated during the build process and assigns an integer ID to every resource (from strings to Android widgets to layouts), so that the Java code in the app can reference those resources. That means that 2131230840 is not the API key itself, but only the ID of the api_key string resource, whose actual value is defined in XML.
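Those integers are not arbitrary. An Android resource ID is laid out as 0xPPTTEEEE: a package byte (0x7f for the app’s own resources), a type byte, and a 16-bit entry index. Type IDs are assigned per build, so 0x08 meaning “string” holds for this app only. A small sketch (ResourceId is a hypothetical name) that decodes the api_key ID from the R class above; it comes out as 0x7f080078, the same value apktool shows in public.xml:

```java
// Sketch: split an Android resource ID (layout 0xPPTTEEEE) into its parts.
final class ResourceId {
    static int packageId(int id)  { return (id >>> 24) & 0xff; } // PP: 0x7f = the app itself
    static int typeId(int id)     { return (id >>> 16) & 0xff; } // TT: resource type, assigned per build
    static int entryIndex(int id) { return id & 0xffff; }        // EEEE: index within that type

    public static void main(String[] args) {
        int apiKey = 2131230840; // R.string.api_key from the decompiled R class
        // prints: 0x7f080078 -> package 0x7f, type 0x08, entry 0x0078
        System.out.printf("0x%08x -> package 0x%02x, type 0x%02x, entry 0x%04x%n",
                apiKey, packageId(apiKey), typeId(apiKey), entryIndex(apiKey));
    }
}
```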
At this point we know that there is an API key somewhere in the source/APK, but it is hidden somewhere else. A good time to go back to the decompiled output (in my case I went back to the apktool output) and grep the whole decompiled code with resources included. And I found the key itself…
grep -nw "api_key" -r .
./res/values/public.xml:941: <public type="string" name="api_key" id="0x7f080078" />
./res/values/strings.xml:124: <string name="api_key">[ API KEY HERE ]</string>
./smali/hr/molekula/bikekrk/R$string.smali:80:.field public static final api_key:I = 0x7f080078
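grep is the quickest route. If you want to pull a single value out programmatically instead, here is a minimal sketch with the JDK’s DOM parser (StringResource and lookup are hypothetical names) that reads the apktool-decoded res/values/strings.xml:

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch: look up <string name="..."> in an apktool-decoded res/values/strings.xml.
final class StringResource {
    /** Returns the raw text of the named string resource, or null if it is not defined. */
    static String lookup(File stringsXml, String name)
            throws IOException, ParserConfigurationException, SAXException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stringsXml);
        NodeList strings = doc.getElementsByTagName("string");
        for (int i = 0; i < strings.getLength(); i++) {
            Element element = (Element) strings.item(i);
            if (name.equals(element.getAttribute("name"))) {
                // Android escape sequences (e.g. \') are returned as-is, not unescaped.
                return element.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args)
            throws IOException, ParserConfigurationException, SAXException {
        // Usage: java StringResource.java res/values/strings.xml api_key
        System.out.println(lookup(new File(args[0]), args[1]));
    }
}
```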
And now we can reconstruct the full HTTP request, with all the right headers, using HTTPie:
http get https://krk.molekula.net/api/v1/tracks Language:en API-Key:[API KEY HERE]
And bingo… we get a JSON response listing all tracks, with URLs to GPX files containing the coordinates for all paths.
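For completeness, the same request can be sent with Java 11’s built-in java.net.http client, mirroring the two headers BikeClient adds. A sketch; TracksRequest is a hypothetical name, and reading the key from a KRK_API_KEY environment variable is my own hypothetical choice, not something the app does:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: the HTTPie request above, rebuilt with java.net.http (Java 11+).
final class TracksRequest {
    static final URI TRACKS = URI.create("https://krk.molekula.net/api/v1/tracks");

    /** Builds GET /api/v1/tracks with the Language and API-Key headers that BikeClient adds. */
    static HttpRequest build(String language, String apiKey) {
        return HttpRequest.newBuilder(TRACKS)
                .header("Language", language)
                .header("API-Key", apiKey)
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String apiKey = System.getenv("KRK_API_KEY"); // hypothetical variable name
        if (apiKey == null) {
            System.err.println("Set KRK_API_KEY first.");
            System.exit(1);
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // similar to HTTPie's --follow
                .build();
        HttpResponse<String> response =
                client.send(build("en", apiKey), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```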
Reconstruction of the map 🗺
Now that we know where the data is and how to get it, we can fetch it and re-use it in some way. For example, this is a simple Bash script that fetches all the GPX track files.
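#!/usr/bin/env bash
set -ex
export LANGUAGE=en
export API_KEY=API_KEY_COULD_BE_HERE
# Remove GPX files from a previous run, then fetch the track list and
# download every GPX URL from the JSON response in parallel.
rm -rf ./*.gpx* && \
  http get https://krk.molekula.net/api/v1/tracks \
    "Language:${LANGUAGE}" "API-Key:${API_KEY}" \
  | jq -r ".data[].gpx" | parallel --gnu "wget {}"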
And the final result, with all the GPX tracks visualised together, is pretty lovely.
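GPX is plain XML: each recorded position of a track is a trkpt element with lat and lon attributes. Before drawing the tracks on a map, it can help to parse the downloaded files and, for instance, compute a bounding box to frame the map. A sketch using the JDK’s DOM parser (GpxPoints and trackPoints are hypothetical names; only track points are read, not waypoints or route points):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch: extract track points from GPX files and compute an overall bounding box.
final class GpxPoints {
    /** Returns {lat, lon} pairs for every <trkpt> element in the GPX document. */
    static List<double[]> trackPoints(InputStream gpx)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // GPX files usually declare a default namespace
        NodeList nodes = factory.newDocumentBuilder().parse(gpx).getElementsByTagNameNS("*", "trkpt");
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element trkpt = (Element) nodes.item(i);
            points.add(new double[] {
                Double.parseDouble(trkpt.getAttribute("lat")),
                Double.parseDouble(trkpt.getAttribute("lon"))
            });
        }
        return points;
    }

    public static void main(String[] args)
            throws IOException, ParserConfigurationException, SAXException {
        // Usage: java GpxPoints.java *.gpx
        double minLat = 90, maxLat = -90, minLon = 180, maxLon = -180;
        for (String file : args) {
            try (InputStream in = Files.newInputStream(Path.of(file))) {
                List<double[]> points = trackPoints(in);
                System.out.println(file + ": " + points.size() + " track points");
                for (double[] p : points) {
                    minLat = Math.min(minLat, p[0]);
                    maxLat = Math.max(maxLat, p[0]);
                    minLon = Math.min(minLon, p[1]);
                    maxLon = Math.max(maxLon, p[1]);
                }
            }
        }
        System.out.printf("bounding box: lat %.5f..%.5f, lon %.5f..%.5f%n", minLat, maxLat, minLon, maxLon);
    }
}
```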
Summary
The purpose of this post was to take you, dear reader, on a journey through the decompilation of an Android app; for the sake of demonstration I took one app that looked interesting to me. The procedure is a bit more sophisticated than the traditional old-school “view source” on the web, but a few concepts are similar.
Please note that I’m anything but an Android engineer, and a lot of details in this post are written from studying various resources and might be wrong or incomplete. If you feel that something is off in my post, please say so! I always appreciate good feedback.
Cheers!
P.S.: I wish to deeply thank the following friends and colleagues who helped with amendments, recommendations and comments, so that I’m not spreading nonsense. :) Kudos @solarb, @damjancvetko, @lowk3y, @milangabor and @lknix for your help.
P.P.S.: I also received a brilliant recommendation to look into MobSF. You should definitely check it out if you want to dig deeper into reverse engineering of mobile apps, pen-testing and malware analysis.