With searchcode server you can search across any repository of code that has been added by your administrator.
Type in anything you want to find and you will be presented with the results that match, with the relevant lines highlighted. Searches can be filtered down using the right filter panel. Suggested search terms are,
Type any term you want to search for in the search box and press the enter key. Generally best results can be gained by searching for terms that you expect to be close to each other on the same line.
The following search operators are supported.
If a search does not return the results you are expecting, or returns no results at all, consider rewriting the query. For example, searching for Arrays.asList("a1", "a2", "b1", "c2", "c1") could be turned into a looser query by searching for Arrays.asList or Arrays asList. Another example would be EMAIL_ADDRESS_REGEX for email address regex.
To view the full file that is returned click on the name of the file, or click on any line to be taken to that line. Syntax highlighting is enabled for all files less than 1000 lines in length.
You can perform a literal search against the index by enabling literal search. To do so check the box "Literal Search" in the Search Options panel of the search result page. This search includes all the standard searches performed by Lucene.
Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries).
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search looks for terms that match that with the single character replaced. For example, to search for "text" or "test" you can use the search:
te?t
Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search:
test*
You can also use the wildcard searches in the middle of a term.
te*t
Note: You cannot use a * or ? symbol as the first character of a search.
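The wildcard semantics described above can be tried out locally. The following is a minimal sketch that uses Python's fnmatch module as a stand-in for the ? and * operators; it only approximates how the index matches single terms, and the term list and the matching_terms helper are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical list of indexed terms, used only for illustration.
terms = ["test", "text", "tests", "tester", "teapot", "toast"]


def matching_terms(pattern, candidates):
    """Return the candidates that match a ?/* wildcard pattern in full."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


print(matching_terms("te?t", terms))   # ['test', 'text']
print(matching_terms("test*", terms))  # ['test', 'tests', 'tester']
print(matching_terms("te*t", terms))   # ['test', 'text', 'teapot']
```

Note that ? stands for exactly one character, so te?t does not match tests, while * also matches zero characters, so test* matches test itself.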
Lucene supports regular expression searches matching a pattern between forward slashes "/". For example to find documents containing "moat" or "boat":
/[mb]oat/
Lucene supports fuzzy searches based on Damerau-Levenshtein Distance. To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar in spelling to "roam" use the fuzzy search:
roam~
This search will find terms like foam and roams.
An additional (optional) parameter can specify the maximum number of edits allowed. The value is between 0 and 2. For example:
roam~1
If the parameter is not given, the default is an edit distance of 2.
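To get a feel for what an edit distance of 1 or 2 means, the sketch below computes the optimal string alignment variant of Damerau-Levenshtein distance, where an insertion, deletion, substitution or swap of two adjacent characters each counts as one edit. It is an illustration only, not the algorithm Lucene runs internally, and the function name osa_distance is made up for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a
    for j in range(cols):
        dist[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


for candidate in ["foam", "roams", "raom", "rooms"]:
    print(candidate, osa_distance("roam", candidate))
```

Under this measure foam, roams and raom are each one edit away from roam, so they would fall within roam~1, while rooms needs two edits and would only fall within roam~2.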
Lucene supports finding words that are within a specific distance of each other. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for "apache" and "jakarta" within 10 words of each other in a document use the search:
"jakarta apache"~10
The following fields are supported. All spaces and / characters are replaced with _
You can search using a pure HTML interface (no javascript) by clicking here. Note that this page generally lags behind the regular interface in functionality.
Any search can be filtered down to a specific repository, source, identified language or code owner using the refinement options. Select one or multiple repositories, sources, languages or owners and click the "Filter Selected" or "refine search" button to do this.
Note that in the case that a filter has only a single option for source it will not display any option to filter. This is to avoid cluttering the display where you have only a single source to search from.
Filters on the normal interface persist between searches. This allows you to select a specific repository or language and continue searching. To clear applied filters uncheck the filters individually and click on "Filter Selected". You can also click the "Clear Filters" button to clear all active filters. The HTML only page filters are cleared between every new search.
The owner of any piece of code is determined differently between source control systems. See below for details.
GIT owners are determined by counting the number of lines edited by each user. This is then weighted against the last commit time. For example, Bob added a file of 100 lines in length 1 year ago. Mary modified 30 lines of the file last week. In this situation Mary would be marked as the owner as she has modified enough of the file and recently enough to be more familiar with it than Bob would be. If she had modified only a single line, however, Bob would still be marked as the owner.
The name is taken based on the git config user.name setting attached to the user in commits.
SVN owners are determined by looking at the last user to change the file. For example, Bob edited a single line in a file with 100 lines. Bob will be considered the owner even if Mary edited the other 99 previously.
Source code is complex to search. As such the following restrictions currently apply
The estimated cost for any file or project is created using the Basic COCOMO algorithmic software cost estimation model. The cost reflected includes design, coding, testing, documentation for both developers and users, equipment, buildings and so on, which can result in a higher estimate than would be expected. Generally consider this the cost of developing the code, and not what it is "worth". It is based on an average salary of $56,000 per year, but this value can be changed by the system administrator if the estimates appear too far from expectations.
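As a rough illustration of how a Basic COCOMO estimate is put together, the sketch below uses the textbook "organic" project coefficients (effort in person-months = 2.4 x KLOC^1.05) and the $56,000 salary mentioned above. The overhead multiplier is a hypothetical parameter standing in for the extra costs described above; its real value, and the exact coefficients searchcode server uses, are not stated here and are assumptions.

```python
def cocomo_effort_person_months(lines_of_code: int) -> float:
    """Basic COCOMO effort for an 'organic' project, in person-months."""
    if lines_of_code <= 0:
        return 0.0
    kloc = lines_of_code / 1000.0
    return 2.4 * kloc ** 1.05


def estimated_cost(lines_of_code: int,
                   average_salary: float = 56000.0,
                   overhead: float = 1.0) -> float:
    """Effort multiplied by monthly salary and an overhead factor.

    overhead is a hypothetical knob for non-salary costs; 1.0 means
    salary only and is not the value used by searchcode server.
    """
    effort = cocomo_effort_person_months(lines_of_code)
    return effort * (average_salary / 12.0) * overhead


print(round(cocomo_effort_person_months(10_000), 2), "person-months")
print(round(estimated_cost(10_000)), "dollars at the default salary")
```

Because the exponent is slightly above 1, the estimate grows a little faster than the line count, which is why large projects can show surprisingly high figures.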
API endpoints offered by your searchcode server instance are described below. Note that some require API authentication which will also be covered.
/api/repo/list/
Some repositories returned by this endpoint may be queued for deletion. They will continue to appear in this list until they are successfully removed.
hmac_sha1("MYPRIVATEKEY", "pub=MYPUBLICKEY")
http://localhost/api/repo/list/
http://localhost/api/repo/list/?sig=SIGNEDKEY&pub=PUBLICKEY
{ "message": "", "repoResultList": [ { "branch": "master", "name": "test", "password": "", "rowId": 1, "scm": "git", "source": "http://github.com/myuser/myrepo", "url": "git://github.com/myuser/myrepo.git", "username": "" } ], "sucessful": true }
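The signing step shown above can be sketched in Python with the standard hmac module. This is a minimal sketch, assuming the signature is the hex-encoded HMAC-SHA1 of the parameter string and that it is passed as the sig parameter ahead of the signed parameters, as in the signed example URL. The function names, keys and base URL are placeholders.

```python
import hashlib
import hmac
from urllib.parse import quote_plus


def sign(private_key: str, message: str) -> str:
    """Hex-encoded HMAC-SHA1 of message, keyed with the private key."""
    return hmac.new(
        private_key.encode("utf-8"),
        message.encode("utf-8"),
        hashlib.sha1,
    ).hexdigest()


def signed_repo_list_url(base_url: str, public_key: str, private_key: str) -> str:
    """Build a signed /api/repo/list/ URL (hex encoding is an assumption)."""
    message = "pub=" + quote_plus(public_key)
    return f"{base_url}/api/repo/list/?sig={sign(private_key, message)}&{message}"


# Placeholder keys; real ones come from the API key page.
print(signed_repo_list_url("http://localhost", "MYPUBLICKEY", "MYPRIVATEKEY"))
```

Only the public key travels in the URL. The private key is used solely to compute the signature and should never be sent with the request.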
/api/repo/add/
It is not possible to update an existing repository. To do so you must first delete the existing repository and wait for the background tasks to finish cleaning the repository.
hmac_sha1("MYPRIVATEKEY", "pub=MYPUBLICKEY&reponame=REPONAME&repourl=REPOURL&repotype=REPOTYPE&repousername=REPOUSERNAME&repopassword=REPOPASSWORD&reposource=REPOSOURCE&repobranch=REPOBRANCH")
hmac_sha1("MYPRIVATEKEY", "pub=MYPUBLICKEY&reponame=REPONAME&repourl=REPOURL&repotype=REPOTYPE&repousername=REPOUSERNAME&repopassword=REPOPASSWORD&reposource=REPOSOURCE&repobranch=REPOBRANCH&source=SOURCE&sourceuser=SOURCEUSER&sourceproject=SOURCEPROJECT")
http://localhost/api/repo/add/?reponame=testing&repourl=git://github.com/test/test.git&repotype=git&repousername=MYUSER&repopassword=MYPASSWORD&reposource=http://githib.com/test/test/&repobranch=master
http://localhost/api/repo/add/?sig=SIGNEDKEY&pub=PUBLICKEY&reponame=testing&repourl=git://github.com/test/test.git&repotype=git&repousername=MYUSER&repopassword=MYPASSWORD&reposource=http://githib.com/test/test/&repobranch=master
http://localhost/api/repo/add/?sig=SIGNEDKEY&pub=PUBLICKEY&reponame=testing&repourl=git://github.com/someone/test/test.git&repotype=git&repousername=MYUSER&repopassword=MYPASSWORD&reposource=http://githib.com/test/test/&repobranch=master&source=GitHub&sourceuser=someone&sourceproject=test
{ "message": "added repository sucessfully", "sucessful": true }
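For the add endpoint the signed string contains many parameters, so their order and encoding matter. The sketch below builds the string in the order shown in the first signing example above and URL-encodes each value before signing. Encoding before signing and the hex output are assumptions, and ADD_PARAM_ORDER, build_add_message and signed_add_url are illustrative names only.

```python
import hashlib
import hmac
from urllib.parse import quote_plus

# Parameter order copied from the documented signing string.
ADD_PARAM_ORDER = [
    "pub", "reponame", "repourl", "repotype",
    "repousername", "repopassword", "reposource", "repobranch",
]


def build_add_message(params: dict) -> str:
    """Join the parameters in the documented order, URL-encoding each value."""
    return "&".join(f"{name}={quote_plus(params[name])}" for name in ADD_PARAM_ORDER)


def signed_add_url(base_url: str, private_key: str, params: dict) -> str:
    """Sign the message with HMAC-SHA1 and prepend the signature."""
    message = build_add_message(params)
    sig = hmac.new(private_key.encode("utf-8"), message.encode("utf-8"),
                   hashlib.sha1).hexdigest()
    return f"{base_url}/api/repo/add/?sig={sig}&{message}"


example = {
    "pub": "MYPUBLICKEY",
    "reponame": "testing",
    "repourl": "git://github.com/test/test.git",
    "repotype": "git",
    "repousername": "MYUSER",
    "repopassword": "MYPASSWORD",
    "reposource": "http://github.com/test/test/",
    "repobranch": "master",
}
print(signed_add_url("http://localhost", "MYPRIVATEKEY", example))
```

Using a fixed list for the order avoids signature mismatches caused by a dictionary or query builder reordering the parameters.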
/api/repo/delete/
Successful calls to this endpoint will insert a request into a queue to remove the repository. The actual deletion can take several minutes.
hmac_sha1("MYPRIVATEKEY", "pub=MYPUBLICKEY&reponame=REPONAME")
http://localhost/api/repo/delete/?reponame=testing
http://localhost/api/repo/delete/?sig=SIGNEDKEY&pub=PUBLICKEY&reponame=testing
{ "message": "deleted repository sucessfully", "sucessful": true }
/api/repo/reindex/
Successful calls to this endpoint will cause the index and repository directories to be deleted and schedule all repositories to be reindexed. Note that queries to the system while the reindex is running may not return expected results.
hmac_sha1("MYPRIVATEKEY", "pub=MYPUBLICKEY")
http://localhost/api/repo/reindex/?sig=SIGNEDKEY&pub=PUBLICKEY
http://localhost/api/repo/delete/?sig=SIGNEDKEY&pub=PUBLICKEY
{ "message": "reindex forced", "sucessful": true }
/api/repo/index/
Successful calls to this endpoint will suggest to searchcode that a repository has been updated and add it to the index queue. If already on the queue this method does nothing. The queue is a first in first out queue and repositories will be processed in order.
http://localhost/api/repo/index/?repoUrl=https://github.com/boyter/searchcode-server.git
http://localhost/api/repo/index/?repoUrl=/disk/location/
{ "message": "Enqueued repository https://github.com/boyter/searchcode-server.git", "sucessful": true }
searchcode server is designed to require as little maintenance as possible and look after itself once setup and repositories are indexed. However it can be tuned using the settings mentioned below in the searchcode.properties file or through the admin settings page.
searchcode server uses the high performance Jetty web server. It should perform well even with thousands of requests as a front facing solution. If a reverse proxy solution is required there is no need to configure static assets; simply configure all requests to pass back to searchcode server. You should also set the config property only_localhost to true in this case.
There are two properties files in the base directory of searchcode server, searchcode.properties and quartz.properties.
The searchcode.properties file in the base directory is a simple text file that can be used to configure aspects of searchcode server. By default it is set up using suggested defaults. It is important to note that the password to administer your server is located in this file. To apply changes, modify the file as required then restart searchcode. All slashes used in the properties file should be forward not backwards, i.e. Unix style not Windows.
The quartz.properties file in the base directory should only need to be modified when changing the searchcode.properties values of number_git_processors, number_svn_processors and number_file_processors. By default searchcode spawns 10 background threads which are used for repository processing and internal processing logic. searchcode itself uses 5 of these threads, leaving 5 for background repository processing tasks. If you increase the number of repository processors, you should also increase the value of org.quartz.threadPool.threadCount, up to a maximum of 100.
The admin settings page can be used to change look and feel settings for searchcode server. Change the settings on the page. Changes are applied instantly.
The api key page is used to maintain keys used for authenticated API requests. This page is only relevant if you first enable the API through properties and then enable authenticated API requests as well. To generate a key click the "Generate New API Key" button. A new API key will be created and appear at the bottom of the list. The key consists of two parts. The first portion is the public key which is used to identify who is making the request to the API. The second is the private key and should be shared only with the consuming application. This key is used to sign the request. To delete a key click the delete button next to the key you wish to remove. Generally it is considered good practice to create individual keys for each application using the API.
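One way to pick a value for org.quartz.threadPool.threadCount is sketched below. It assumes, purely for illustration, that each configured processor needs one thread on top of the 5 used internally. That one-to-one relationship is not stated in this documentation, and the function name is made up.

```python
def suggested_thread_count(git_processors: int,
                           svn_processors: int,
                           file_processors: int,
                           internal_threads: int = 5,
                           minimum: int = 10,
                           maximum: int = 100) -> int:
    """Suggest a thread pool size, clamped between the default 10 and 100.

    Assumes one thread per processor, which is an illustrative guess.
    """
    needed = internal_threads + git_processors + svn_processors + file_processors
    return max(minimum, min(maximum, needed))


print(suggested_thread_count(2, 2, 1))     # 10
print(suggested_thread_count(10, 5, 5))    # 25
print(suggested_thread_count(60, 30, 30))  # 100
```

The result never drops below the default of 10 and never exceeds the stated maximum of 100.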
Generally searchcode server should only need the searchcode.properties and searchcode.sqlite files to be backed up. However where many repositories are indexed or when connectivity to source control can be problematic you may want to back up the index and repo directories and their contents.
Assuming you want to recover searchcode you will need to install the application sources. Then copy a backup of the searchcode.sqlite and searchcode.properties files into the same directory. When started searchcode will analyse the code and rebuild the index. This process will take longer for setups that contain many or large repositories. If faster restores are required restore the index and repo directories as well.
To index a repository browse to the admin page. Enter a repository name and URL for publicly available repositories; for private repositories also enter a username and password for a user with enough access to check out a copy of the repository. Repo Source should be a URL that relates to the repository (but can be anything) and will appear as a link on the code pages. When done click "Add Repo". The repository will be downloaded and indexed as soon as any other indexing operations are finished. Note that repository names cannot include a space, and any spaces will be replaced with a hyphen character.
GIT and SVN repositories are able to be indexed. To enable indexing of SVN repositories set the property value svn_enabled to true and svn_binary_path to the path of your SVN executable.
File locations on the machine searchcode server is running on are also able to be indexed. This allows you to index code that is not in a repository or is in a SCM that searchcode server currently does not support. To do so select the file option from the drop down and replace the repository URL with the path on the local machine such as /opt/projects
Note that searchcode server needs permission to read the directory, subdirectories and contents of all files, otherwise it will crash out with an AccessDeniedException in the logs. There are a few things to note
To delete a repository click the delete button at the end of the repository list. This will remove all copies of code from disk (not for file repositories however) and the index. This action is not reversible. To undo the operation add the repository again. Note that all delete operations are queued and it may take several minutes for the repository to be removed.
Updating the details of a repository will require you to delete the repository, wait for the delete operation to finish and add it again with the new details.
To quickly add a large number of repositories use the bulk admin page. This page will only allow the adding of repositories using a CSV format with one repository per line. Use the values git, svn or file for the choice of repository.
The format for adding follows.
reponame,scm,gitrepolocation,username,password,repourl,branch
For example a public repository which does not require username or password
phindex,git,https://github.com/boyter/Phindex.git,,,https://github.com/boyter/Phindex,master
*
For example a private repository which requires a username and password
searchcode,git,https://searchcode@bitbucket.org/searchcode/hosting.git,myusername,mypassword,https://searchcode@bitbucket.org/searchcode/,master
* This is a real repository that can be indexed. Copy and paste it into the bulk admin page to test.
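When many repositories need to be added, the bulk lines can be generated rather than typed. The sketch below produces lines in the format given above. The bulk_line helper is an illustrative name, and because it is not stated whether the bulk page understands CSV quoting, values containing a comma or a line break are rejected instead of quoted.

```python
VALID_SCM = ("git", "svn", "file")


def bulk_line(reponame, scm, location, username="", password="",
              repourl="", branch="master"):
    """Format one line as reponame,scm,gitrepolocation,username,password,repourl,branch."""
    if scm not in VALID_SCM:
        raise ValueError(f"scm must be one of {VALID_SCM}, got {scm!r}")
    fields = [reponame, scm, location, username, password, repourl, branch]
    for value in fields:
        # Quoting support on the bulk page is unknown, so refuse risky values.
        if "," in value or "\n" in value or "\r" in value:
            raise ValueError(f"value cannot be represented safely: {value!r}")
    return ",".join(fields)


print(bulk_line("phindex", "git", "https://github.com/boyter/Phindex.git",
                repourl="https://github.com/boyter/Phindex"))
```

The printed line is identical to the public repository example above, so the helper can be checked against it before generating a larger list.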
A repository is not being indexed?
Check the console output, you should see something similar to
ERROR - caught a class org.eclipse.jgit.api.errors.TransportException with message: https://username@bitbucket.org/username/myrepo.git: not authorized
A file in a repository is not being indexed?
Files with an average file line length >= 255 are considered minified and will not be indexed. Files that are considered binary will also not be indexed. When such a file is skipped, a message like the ones below is printed on the console.
Appears to be minified will not index FILENAME
Appears to be binary will not index FILENAME
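To check ahead of time whether a file is likely to be skipped as minified, the average line length rule can be approximated as below. This is a sketch only: how searchcode server counts lines and characters (for example line endings or trailing blank lines) is not documented here, so borderline files may be classified differently.

```python
def average_line_length(text: str) -> float:
    """Mean number of characters per line, excluding line terminators."""
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def looks_minified(text: str, threshold: int = 255) -> bool:
    """True when the average line length reaches the documented threshold."""
    return average_line_length(text) >= threshold


sample = "var a=1;" * 100  # one 800-character line, typical of minified JS
print(looks_minified(sample))                 # True
print(looks_minified("short\nlines\nonly"))   # False
```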
A repository is not being indexed on Windows
There are reserved file names on Windows such as CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
If a repository created on another OS contains one of these filenames it is likely that the attempt to clone or checkout will fail. Generally
it is better to deploy searchcode server on a Unix style OS to avoid this problem. If you are only going to index repositories that
were created using Windows then Windows is still a valid choice.
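Before indexing a repository on Windows, its file list can be scanned for the reserved names listed above. The sketch below is one way to do that; it treats names case-insensitively and ignores the extension, since Windows also reserves names such as aux.txt. The function name and sample paths are made up for illustration.

```python
from pathlib import PurePosixPath

RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_paths(paths):
    """Return the paths in which any component is a reserved Windows name."""
    flagged = []
    for path in paths:
        for part in PurePosixPath(path).parts:
            # Compare only the text before the first dot, ignoring case.
            stem = part.split(".")[0].upper()
            if stem in RESERVED_NAMES:
                flagged.append(path)
                break
    return flagged


sample_paths = ["src/aux.c", "docs/readme.md", "con", "lib/console.py"]
print(reserved_paths(sample_paths))  # ['src/aux.c', 'con']
```

A list of paths for a Git repository could be obtained with git ls-files on a machine where the clone succeeds.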
OutOfMemoryError
If you are getting the classic Java out of memory error such as
java.lang.OutOfMemoryError: Java heap space
java.io.IOException: Too many open files
This issue typically occurs on Unix/Linux servers with a low ulimit.
If you are getting errors like the above you may need to change your ulimit to a higher number as the default
of 1024 for most systems can be too low.
Also consider lowering the values for number_git_processors, number_svn_processors and number_file_processors.
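The current open file limit can be inspected from Python on Unix-like systems using the standard resource module, as sketched below. The helper names are illustrative, and the 1024 threshold simply mirrors the default mentioned above. Raising the limit itself is done through the operating system and is not shown.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_looks_low(soft_limit: int, threshold: int = 1024) -> bool:
    """True when the soft limit is at or below the common default of 1024."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return soft_limit <= threshold


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
if limit_looks_low(soft):
    print("The open file limit may be too low for searchcode server.")
```

The check should be run as the same user that runs searchcode server, since limits can differ between users.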
java.nio.file.AccessDeniedException
This is usually caused when using filepath indexing. Usually it means that the user running searchcode server does not have the required permissions to read from the selected path. You will need to set the permissions so that searchcode server has full read rights on the directory. Otherwise it can be caused by the index or repo directories being inaccessible to searchcode server, which requires full read, write and delete permissions for these directories.
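A quick way to find the offending files is to walk the directory as the user that runs searchcode server and list everything that cannot be read. The sketch below does this with the standard os module; the function name is illustrative and /opt/projects is only an example path.

```python
import os


def unreadable_paths(root: str):
    """Return directories and files under root that the current user cannot read."""
    problems = []

    def record_error(error: OSError) -> None:
        # os.walk reports directories it could not list through this callback.
        problems.append(error.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_error):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.access(full_path, os.R_OK):
                problems.append(full_path)
    return problems


if __name__ == "__main__":
    for path in unreadable_paths("/opt/projects"):
        print("cannot read:", path)
```

An empty result means every file and directory under the path is readable by the current user.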
How do I index code in Perforce/BitKeeper/Fossil
You can index code in unsupported repositories by checking out a copy of the repository on disk and adding a file repository which is pointed at this location. Suggested methods to keep it in sync would be setting up a cron job or scheduled task to constantly update the repositories.
Odd Results
If you have had an instance that has been running for a long time or that has stopped and started without notice
the index may need to be rebuilt. Click the "Recrawl & Rebuild Indexes" button in the admin pages. This will clear
the repository and index directories and rebuild everything from scratch which should resolve the issue. Note that
this process may take some time if you have a lot of repositories or very large ones.
Help! Nothing is working!
It's possible that you may enter a state where nothing is working. In this case save the console output and try
restarting searchcode. This may resolve the issue. If not, try stopping searchcode and deleting the index and repo directories.
This will force searchcode server to re-download and re-index. If all else fails contact support.
To get support for your searchcode server instance email Ben directly at searchcode@boyter.org. Please include the following information along with the problem you are experiencing.