I’ve been working a lot in Python recently.

Besides using it at work, I’ve also been converting my broker project, which was originally written in C, to Python.

While doing this, I started with the network socket code, and I'm amazed how easily Python can serialize data over the wire using JSON, with so little fuss. I'll explore this a bit further...

Originally I didn't use JSON in the C version; I used MsgPack, which is like JSON but is a binary format and faster. Either way, it achieves roughly the same goal: sending messages across the wire.

One thing you always need to do in TCP programming is agree on a protocol between the two parties in the communication. Something like "I'll send the size of the packet first, and then the rest of the data", so that the receiving end reads only that much data. Also, you need to convert multi-byte integers such as that size to network byte order before sending, and back to host order on receiving.
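As a rough illustration of that kind of agreement, here is a minimal Python sketch of a length-prefixed frame. The 4-byte unsigned prefix and the names pack_frame and unpack_frame are my own assumptions for the example, not something taken from either project. The "!" in the struct format asks for network (big-endian) byte order.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length in network byte order

def pack_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver knows how much to read."""
    return HEADER.pack(len(payload)) + payload

def unpack_frame(buffer: bytes):
    """Return (payload, leftover bytes), or (None, buffer) if the frame is incomplete."""
    if len(buffer) < HEADER.size:
        return None, buffer
    (length,) = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None, buffer
    return buffer[HEADER.size:end], buffer[end:]

frame = pack_frame(b"hello")
payload, rest = unpack_frame(frame + b"extra")
```

Because TCP is a byte stream, a receiver may see half a frame or several frames at once, which is why the sketch hands back whatever bytes are left over.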

Anyway, you can't really get away from this whether you're in C#, plain old C, or Python for that matter. So just for interest's sake, here is how it's done in C compared to how it's done in Python.

C:

/* readn - read exactly n bytes */
int netReadn( SOCKET fd, char *bp, size_t len)
{
	int cnt;
	int rc;

	cnt = len;
	while ( cnt > 0 )
	{
		rc = recv( fd, bp, cnt, 0 );
		if ( rc < 0 )				/* read error? */
		{
			if ( errno == EINTR )	/* interrupted? */
				continue;			/* restart the read */
			return -1;				/* return error */
		}
		if ( rc == 0 )				/* EOF? */
			return len - cnt;		/* return short count */
		bp += rc;
		cnt -= rc;
	}
	return len;
}

Python:

  length_str = b''
  char = socket.recv(1)
  while char != b'\n':
    length_str += char
    char = socket.recv(1)
  total = int(length_str)
 
  view = memoryview(bytearray(total))
  next_offset = 0
  while total - next_offset > 0:
    recv_size = socket.recv_into(view[next_offset:], total - next_offset)
    next_offset += recv_size 

The Python code comes from the jsonsocket source, while the C code comes from Stulibc.
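One thing the C version handles that the Python snippet does not is the peer closing the connection part-way through (recv() returns an empty bytes object and recv_into() returns 0 in that case, so the loops would spin forever). Here is a hedged sketch of a round trip that checks for it. The sender side is my assumption, inferred from the receiver: the length as ASCII digits, a newline, then the JSON body. The function names are made up for the example.

```python
import json
import socket

def send_json(sock, obj):
    """Send obj as JSON, preceded by its byte length and a newline."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(str(len(body)).encode("ascii") + b"\n" + body)

def recv_exactly(sock, count):
    """Read exactly count bytes, raising if the peer closes early."""
    view = memoryview(bytearray(count))
    offset = 0
    while offset < count:
        received = sock.recv_into(view[offset:], count - offset)
        if received == 0:  # 0 bytes means the peer has closed the connection
            raise ConnectionError("socket closed after %d of %d bytes" % (offset, count))
        offset += received
    return view.tobytes()

def recv_json(sock):
    """Read one length-prefixed JSON message."""
    digits = b""
    while True:
        char = sock.recv(1)
        if char == b"":
            raise ConnectionError("socket closed while reading the length")
        if char == b"\n":
            break
        digits += char
    return json.loads(recv_exactly(sock, int(digits)).decode("utf-8"))

# A connected pair of sockets stands in for a real client and server.
left, right = socket.socketpair()
send_json(left, {"symbol": "ABC", "units": 10})
message = recv_json(right)
left.close()
right.close()
```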

That was a little diversion that I found quite interesting while converting my C to Python.

Apart from that, I’ve also been working in Python at work recently and have written some interesting code...

Chunking up data before sending it up to the internet. This is very cool: numpy.array_split turns one array into many smaller arrays. Note that its second argument is the number of pieces to produce, not the size of each piece. I like.

        # ChunkSize pieces when chunking, otherwise a single piece holding everything
        if shouldChunk:
            chunks = numpy.array_split(array(holdings), ChunkSize)
        else:
            chunks = numpy.array_split(array(holdings), 1)
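To make the difference between "N pieces" and "pieces of N items" concrete, here is a small pure-Python sketch. The helper names are hypothetical. section_sizes only reproduces the documented numpy.array_split rule (for length l and n sections, l % n pieces of size l // n + 1 and the rest of size l // n); it does not call numpy.

```python
def section_sizes(total, sections):
    """Piece sizes when `total` items are split into `sections` near-equal pieces."""
    base, extra = divmod(total, sections)
    return [base + 1] * extra + [base] * (sections - extra)

def chunks_of_size(items, size):
    """Yield consecutive slices holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 10,861 items split into 200 pieces: every piece has 54 or 55 items.
sizes = section_sizes(10861, 200)

# The same items in pieces of at most 200 items: far fewer, larger pieces.
fixed = list(chunks_of_size(list(range(10861)), 200))
```

Which one you want depends on whether the service you are calling limits the number of requests or the number of items per request.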

Using a list comprehension in Python to construct objects in one line, much like LINQ's .Select() function:

requests = [models.HoldingDto(security_uid=request["securityUid"], holding_type=request["holdingType"], units=request["units"], settled_units=request["units"], cost=0, properties=None, transaction=None ) for request in chunk ]      
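Here is a cut-down version you can run on its own. The Holding dataclass is a hypothetical stand-in for models.HoldingDto and the sample rows are made up. It also shows that a trailing "if" filters the items, a bit like putting a .Where() in front of the .Select().

```python
from dataclasses import dataclass

@dataclass
class Holding:  # hypothetical stand-in for models.HoldingDto
    security_uid: str
    units: float

# made-up sample rows shaped like the request dictionaries
chunk = [
    {"securityUid": "SEC_1", "units": 100},
    {"securityUid": "SEC_2", "units": 250},
]

# map: one object per row
holdings = [Holding(security_uid=row["securityUid"], units=row["units"]) for row in chunk]

# filter and map in one expression
large = [h.security_uid for h in holdings if h.units > 200]
```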

Caching and reloading of data. In Python it's a doddle to dump and reload dictionaries and lists. It always amazes me how much we can do with simple lists. Here I'm serializing isin_to_secuid, which a routine generates from a list of dictionaries called ticker_to_isin. That routine need not be rerun if we can cache the results... which we can:

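    # needs "import pickle" at the top of the module
    if isNewLoad or shouldResolveLusid:
        isin_to_secuid = GetSecuids(ticker_to_isin, client)
        # cache the result so the next run can skip the lookup
        with open(secuid_cache_name, "wb") as cache_file:
            pickle.dump(isin_to_secuid, cache_file)
    else:
        # reuse the cached result from an earlier run
        with open(secuid_cache_name, "rb") as cache_file:
            isin_to_secuid = pickle.load(cache_file)
    return (ticker_to_isin, isin_to_secuid)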
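The same pattern can be pulled out into a small helper. This is only a sketch: load_or_build, slow_lookup and the sample identifier are names and data I made up, and it rebuilds when the cache file is missing, which the snippet above does not do. Only unpickle files you wrote yourself, since loading a pickle can run arbitrary code.

```python
import os
import pickle
import tempfile

def load_or_build(cache_path, build, force=False):
    """Return cached data if present, otherwise call build() and cache its result."""
    if force or not os.path.exists(cache_path):
        data = build()
        with open(cache_path, "wb") as handle:
            pickle.dump(data, handle)
        return data
    with open(cache_path, "rb") as handle:
        return pickle.load(handle)

calls = []

def slow_lookup():
    """Hypothetical stand-in for an expensive lookup such as a remote call."""
    calls.append(1)
    return {"GB0000000001": "SEC_1"}  # made-up identifiers

cache_path = os.path.join(tempfile.mkdtemp(), "secuid_cache.pkl")
first = load_or_build(cache_path, slow_lookup)   # builds and writes the cache
second = load_or_build(cache_path, slow_lookup)  # served from the cache file
```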

I've also had to call into the Thomson Reuters DataScope platform recently, basically to get information about RICs (Reuters Instrument Codes), which I've come to learn are how Thomson Reuters identifies securities.

Here is how you can obtain a token and issue REST requests to DataScope. The link above is to their documentation page, which I used to come up with the code below. So, below I'm requesting ISIN codes for tickers, aka RIC codes.

This also shows how easy it is to manipulate headers and send plain HTTP requests in Python, much like in TypeScript or JavaScript.

import requests

def GetDataScopeToken(username = "x", password = "y"):
    headers = {'Prefer': 'respond-async', 'Content-Type': 'application/json'}
    url = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Authentication/RequestToken'
    payload = {
        "Credentials": {
            "Username": username,
            "Password": password
        }
    }
    # POST the credentials as JSON and return the parsed JSON response
    r = requests.post(url, json=payload, headers=headers)
    return r.json()

def GetDataScopeInstrument(token, source):
    headers = {'Prefer': 'respond-async', 'Content-Type': 'application/json',
               'Authorization': 'Token {token}'.format(token=token)}
    url = 'https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/InstrumentSearch'
    # search by RIC and ask for up to 10 matches, identified by ISIN
    payload = {"IdentifierType": "Ric", "Identifier": source,
               "InstrumentTypeGroups": ["CollatetizedMortgageObligations", "Commodities", "Equities",
                                        "FuturesAndOptions", "GovCorp", "MortgageBackedSecurities",
                                        "Money", "Municipals", "Funds"],
               "PreferredIdentifierType": "Isin", "MaxSearchResult": 10}
    r = requests.post(url, json=payload, headers=headers)
    return r.json()

I've also discovered an interesting library (tqdm) which lets you track iterative processes visually. What's more, it works with Python's concurrent.futures library, so you get progress bars for concurrent code too, which makes me weak at the knees! I like!

It's just a matter of wrapping the iterable of a for loop in tqdm() and boom - progress bars! Incredible.

Notice that I'm sending up 10,861 items and I'm dividing this into 200 chunks, which is roughly 54 items a request. Parallelize these and you're in business!

Here is how you can do it using a tqdm iterator:

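from tqdm import tqdm

# keep a handle on the bar so its description can be updated inside the loop
progress = tqdm(HoldingDataDf.groupby("HoldingsDate"))
for group_name, group in progress:
    progress.set_description("Processing Group ({name}), holdings = {size}".format(name=group_name, size=len(group)))
    # do work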

And here is how you can do it using concurrent.futures:

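import concurrent.futures
from concurrent.futures import as_completed
from tqdm import tqdm

with concurrent.futures.ThreadPoolExecutor(max_workers=MaxThreads) as executor:
    futures = [executor.submit(SendHoldingsThreadFunc, chunk) for chunk in chunks]

    kwargs = { 'total': len(futures), 'desc' : 'Uploading data' }

    # as_completed yields each future as it finishes, so the bar ticks once per chunk
    for f in tqdm(as_completed(futures), **kwargs):
        f.result()  # re-raises any exception from the worker thread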

Basically, you wrap as_completed(futures) in tqdm(), so the bar advances each time a future completes.
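Here is a self-contained version of the same idea that you can run without any of my upload code. send_chunk is a hypothetical stand-in for the real upload, the chunks are made-up numbers, and the try/except is only there so the sketch still runs (without a bar) if tqdm isn't installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

try:
    from tqdm import tqdm
except ImportError:  # fall back to a no-op wrapper so the sketch still runs
    def tqdm(iterable, **kwargs):
        return iterable

def send_chunk(chunk):
    """Hypothetical stand-in for an upload; returns how many items were 'sent'."""
    time.sleep(0.01)
    return len(chunk)

# 23 made-up items in chunks of at most 5
chunks = [list(range(start, min(start + 5, 23))) for start in range(0, 23, 5)]

sent = 0
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(send_chunk, chunk) for chunk in chunks]
    # total is needed because as_completed() is a generator with no length
    for future in tqdm(as_completed(futures), total=len(futures), desc="Uploading data"):
        sent += future.result()  # re-raises any exception from the worker
```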

One-line assignments using conditional expressions. I like these. They are so useful because they are concise. I use them when I'm reading in command-line switches:

InputFile = options['-i'] if '-i' in options else "HoldingsSummary.xml"
MaxThreads = int(options['-t']) if '-t' in options else 2
ShouldResolveTickers = True if '-k' in options else False
Verbose = True if '-v' in options else False
Scope = options['-s'] if '-s' in options else "tr"
ChunkSize = int(options['-u']) if '-u' in options else 200
DoSynchronous = True if '-0' in options else False
show_holding_count_by_date = True if '-c' in options else False
isNewLoad = True if '-n' in options else False
shouldResolveLusid = True if '-l' in options else False
shouldChunk = True if '-p' in options else False
start = int(options['-x']) if '-x' in options else None
end = int(options['-y']) if '-y' in options else None
dryRun = True if '-a' in options else False
timeout = int(options['-r']) if '-r' in options else 100
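If options is a plain dictionary (an assumption on my part, and the sample contents below are made up), the same defaults can be written a little shorter with dict.get() and a bare membership test. It's a matter of taste rather than a correction.

```python
options = {"-t": "4", "-v": ""}  # hypothetical parsed switches

# dict.get returns the default when the switch is absent
InputFile = options.get("-i", "HoldingsSummary.xml")
MaxThreads = int(options.get("-t", 2))

# "in" already yields True or False
Verbose = "-v" in options
dryRun = "-a" in options
```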

Cool huh?

Python's humble format() makes life so much more manageable. Thanks!

print("python {program_name} -i <inputfile>".format(program_name=program_name))
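On Python 3.6 or later, an f-string gives the same result with a bit less typing. The program name here is a made-up value for the example.

```python
program_name = "upload_holdings.py"  # hypothetical script name

with_format = "python {program_name} -i <inputfile>".format(program_name=program_name)
with_fstring = f"python {program_name} -i <inputfile>"
```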

Here is something I've not spent too much time on but it's interesting nonetheless: Python's ability to declare the types of arguments and return values:

from typing import Optional

# Optional[str]: the result is either a str or None
def TickerToIsin(ticker: str) -> Optional[str]:
    result = GetDataScopeInstrument(token['value'], ticker)
    value = result['value']
    if len(value) > 0:
        isin = value[0]['Identifier']
        print('{ticker}={isin}'.format(ticker=ticker, isin=isin))
        return isin
    else:
        return None

See how it's saying that ticker is a string and that TickerToIsin() returns a string or None. Usually you don't do this in Python; you just return whatever as whatever, just like in JavaScript. But you can add this "typedness" if you like, it seems. Python itself doesn't enforce these hints at runtime; they are there for readers and for tools such as type checkers.
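One detail worth knowing: an annotation is an ordinary expression, so writing "str or None" evaluates to plain str, and the usual way to say "a string or None" is Optional[str] from the typing module. A small sketch with made-up function names shows both points, plus the fact that nothing is checked when the code runs.

```python
from typing import Optional

def first_or_none(values: list) -> Optional[str]:
    """Return the first item, or None for an empty list."""
    return values[0] if values else None

def loosely_typed(values: list) -> str or None:
    """Same body; the annotation collapses to plain str."""
    return values[0] if values else None

# "str or None" is just the "or" operator applied to two objects
collapsed = loosely_typed.__annotations__["return"]

# hints are not enforced: an int comes back without any complaint
not_a_string = first_or_none([42])
```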

That's it for now :-)